Publications

Journal Articles: 73; Conference Papers: 154; Workshop Papers: 37; Book Chapters: 7; Posters: 13; Edited Volumes and Journal Special Issues: 18; Invited Papers: 6; Keynote Presentations: 6; Magazine Articles: 2; Technical Reports: 31

Journal Articles

MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI

Published in IEEE Open Journal of Circuits and Systems, 2025

Presents MARVEL, an end-to-end framework for generating model-class aware custom RISC-V extensions specifically designed for lightweight AI applications.

Recommended citation: Kumar, A., O'Mahoney, C., Kreutz Werle, P., Shanker, S., Nikolopoulos, D. S., Ji, B., Vandierendonck, H., & John, D. (2025). "MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI." IEEE Open Journal of Circuits and Systems, 6, 445-456.

On Robust Optimal Joint Deployment and Assignment of RAN Intelligent Controllers in O-RANs

Published in IEEE Open Journal of the Communications Society, 2024

Addresses robust optimal joint deployment and assignment of RAN Intelligent Controllers (RICs) in Open Radio Access Networks using chance-constrained stochastic optimization and two-stage stochastic optimization with recourse.

Recommended citation: Abdel-Rahman, M. J., Mazied, E. A., Hassan, F., Teague, K., Mackenzie, A. B., Midkiff, S. F., Cardoso, K. V., & Nikolopoulos, D. S. (2024). "On Robust Optimal Joint Deployment and Assignment of RAN Intelligent Controllers in O-RANs." IEEE Open Journal of the Communications Society, 5, 2358-2376. https://doi.org/10.1109/OJCOMS.2024.3383607

Auto-scaling edge cloud for network slicing

Published in Frontiers in High Performance Computing, 2023

Presents a study on resource control for autoscaling virtual radio access networks (RAN slices) using chance-constrained programming to address stochastic bin packing problems in next-generation wireless networks.

Recommended citation: Mazied, E. A., Nikolopoulos, D. S., Hanafy, Y., & Midkiff, S. F. (2023). "Auto-scaling edge cloud for network slicing." Frontiers in High Performance Computing, 1. https://doi.org/10.3389/fhpcp.2023.1167162

Power Log’n’Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols

Published in IEEE Transactions on Parallel and Distributed Systems, 2022

Presents Power Log’n’Roll, a power-efficient localized rollback mechanism for MPI applications that uses message logging protocols to provide fault tolerance while reducing energy consumption.

Recommended citation: Dichev, K., De Sensi, D., Nikolopoulos, D. S., Cameron, K. W., & Spence, I. (2022). "Power Log'n'Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols." IEEE Transactions on Parallel and Distributed Systems, 33(6), 1276-1288. https://doi.org/10.1109/TPDS.2021.3107745

gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers

Published in Future Generation Computer Systems, 2022

Presents gShare, a centralized GPU memory management framework that enables efficient GPU memory sharing for containers with near-native performance and secure isolation.

Recommended citation: Lee, M., Ahn, H., Hong, C.-H., & Nikolopoulos, D. S. (2022). "gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers." Future Generation Computer Systems, 130, 181-192. https://doi.org/10.1016/j.future.2021.12.016

Mixed-Precision Kernel Recursive Least Squares

Published in IEEE Transactions on Neural Networks and Learning Systems, 2022

Presents a mixed-precision approach to kernel recursive least squares for budget machine learning, enabling efficient online learning with improved throughput and memory management.

Recommended citation: Lee, J., Nikolopoulos, D. S., & Vandierendonck, H. (2022). "Mixed-Precision Kernel Recursive Least Squares." IEEE Transactions on Neural Networks and Learning Systems, 33(3), 1284-1298. https://doi.org/10.1109/TNNLS.2020.3041677

Efficient, Dynamic Multi-Task Execution on FPGA-Based Computing Systems

Published in IEEE Transactions on Parallel and Distributed Systems, 2022

Presents a runtime and scheduling framework for dynamic task virtualization and mapping on FPGA systems with high throughput and efficiency.

Recommended citation: Minhas, U. I., Woods, R., Nikolopoulos, D. S., & Karakonstantis, G. (2022). "Efficient, Dynamic Multi-Task Execution on FPGA-Based Computing Systems." IEEE Transactions on Parallel and Distributed Systems, 33(3), 710–722. https://doi.org/10.1109/TPDS.2021.3101153

Revealing DRAM Operating GuardBands Through Workload-Aware Error Predictive Modeling

Published in IEEE Transactions on Computers, 2021

Reveals DRAM operating guardbands through workload-aware error predictive modeling to optimize memory reliability and energy consumption trade-offs in low-power computing systems.

Recommended citation: Mukhanov, L., Tovletoglou, K., Vandierendonck, H., Nikolopoulos, D. S., & Karakonstantis, G. (2021). "Revealing DRAM Operating GuardBands Through Workload-Aware Error Predictive Modeling." IEEE Transactions on Computers, 70(11), 1976-1987. https://doi.org/10.1109/TC.2020.3033627

ENORM: A Framework For Edge NOde Resource Management

Published in IEEE Transactions on Services Computing, 2020

This paper presents ENORM, a framework for managing resources on edge nodes to support fog computing environments, focusing on provisioning, scaling, and QoS guarantees.

Recommended citation: Wang, N., Varghese, B., Matthaiou, M., & Nikolopoulos, D. S. (2020). "ENORM: A Framework For Edge NOde Resource Management." IEEE Transactions on Services Computing, 13(6), 1086–1099. https://doi.org/10.1109/TSC.2017.2753775

AIR: Iterative refinement acceleration using arbitrary dynamic precision

Published in Parallel Computing, 2020

Introduces AIR, an algorithm that dynamically adjusts arithmetic precision in iterative refinement to improve performance while maintaining backward stability.

Recommended citation: Lee, J., Peterson, G. D., Nikolopoulos, D. S., & Vandierendonck, H. (2020). "AIR: Iterative refinement acceleration using arbitrary dynamic precision." Parallel Computing, 97, 102663. https://doi.org/10.1016/j.parco.2020.102663

DYVERSE: DYnamic VERtical Scaling in Multi-Tenant Edge Environments

Published in Future Generation Computer Systems, 2020

DYVERSE introduces a lightweight vertical scaling mechanism to manage multi-tenancy in Edge computing through static and dynamic priority policies, reducing SLO violations and latency.

Recommended citation: Wang, N., Matthaiou, M., Nikolopoulos, D. S., & Varghese, B. (2020). "DYVERSE: DYnamic VERtical Scaling in Multi-Tenant Edge Environments." Future Generation Computer Systems, 108, 598–612. https://doi.org/10.1016/j.future.2020.02.043

Fast load balance parallel graph analytics with an automatic graph data structure selection algorithm

Published in Future Generation Computer Systems, 2020

Proposes GraphGrind, a shared-memory graph analytics framework with automatic data structure selection and reordering algorithms, outperforming Ligra by up to 10.4× and Polymer by up to 8.3×.

Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2020). "Fast load balance parallel graph analytics with an automatic graph data structure selection algorithm." Future Generation Computer Systems, 112, 612-623. https://doi.org/10.1016/j.future.2020.06.005

Hyperqueues: Design and Implementation of Deterministic Concurrent Queues

Published in ACM Transactions on Parallel Computing (TOPC), 2019

Presents hyperqueues, a programming abstraction that extends Cilk++ hyperobjects to provide deterministic and scale-free parallel programs with concurrent queue operations.

Recommended citation: Vandierendonck, H., & Nikolopoulos, D. S. (2019). "Hyperqueues: Design and Implementation of Deterministic Concurrent Queues." ACM Transactions on Parallel Computing (TOPC), 6(4), Article 23. https://doi.org/10.1145/3365660

Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems

Published in IEEE Transactions on Computers, 2019

Presents techniques for fast and energy-efficient OLAP data management on hybrid main memory systems combining volatile and non-volatile memory technologies.

Recommended citation: Hassan, A., Nikolopoulos, D. S., & Vandierendonck, H. (2019). "Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems." IEEE Transactions on Computers, 68(11), 1597-1611. https://doi.org/10.1109/TC.2019.2919287

Significance-Driven Data Truncation for Preventing Timing Failures

Published in IEEE Transactions on Device and Materials Reliability, 2019

Presents a significance-driven approach to data truncation that prevents timing failures in computing systems while maintaining output quality.

Recommended citation: Tsiokanos, I., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2019). "Significance-Driven Data Truncation for Preventing Timing Failures." IEEE Transactions on Device and Materials Reliability, 19(1), 25-36. https://doi.org/10.1109/TDMR.2019.2898949

Shimmer: Implementing a Heterogeneous-Reliability DRAM Framework on a Commodity Server

Published in IEEE Computer Architecture Letters, 2019

Presents Shimmer, a heterogeneous-reliability DRAM framework implemented on commodity servers that manages critical data with different reliability levels for improved power efficiency and energy savings.

Recommended citation: Tovletoglou, K., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2019). "Shimmer: Implementing a Heterogeneous-Reliability DRAM Framework on a Commodity Server." IEEE Computer Architecture Letters, 18(1), 26-29. https://doi.org/10.1109/LCA.2019.2893189

Energy-Efficient Iterative Refinement Using Dynamic Precision

Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2018

Proposes a dynamic precision refinement technique for iterative algorithms that adapts computational accuracy at runtime to reduce energy consumption without sacrificing convergence.

Recommended citation: Lee, J., Vandierendonck, H., Arif, M., Peterson, G. D., & Nikolopoulos, D. S. (2018). "Energy-Efficient Iterative Refinement Using Dynamic Precision." IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 8(4), 722–735. https://doi.org/10.1109/JETCAS.2018.2850665

Expediting assessments of database performance for streams of respiratory parameters

Published in Computers in Biology and Medicine, 2018

Proposes new methodology and metrics for comparing database performance when handling streams of patient respiratory data in intensive care settings, using non-parametric bootstrapping to optimize testing time.

Recommended citation: Gillan, C. J., Novakovic, A., Marshall, A. H., Shyamsundar, M., & Nikolopoulos, D. S. (2018). "Expediting assessments of database performance for streams of respiratory parameters." Computers in Biology and Medicine, 100, 186-195. https://doi.org/10.1016/j.compbiomed.2018.05.028

NanoStreams: A Microserver Architecture for Real-Time Analytics on Fast Data Streams

Published in IEEE Transactions on Multi-Scale Computing Systems, 2018

Presents NanoStreams, a microserver architecture that leverages FPGAs and reconfigurable computing for real-time analytics on fast data streams.

Recommended citation: Minhas, U. I., Russell, M., Kaloutsakis, S., Barber, P., Woods, R., Georgakoudis, G., Gillan, C., Nikolopoulos, D. S., & Bilas, A. (2018). "NanoStreams: A Microserver Architecture for Real-Time Analytics on Fast Data Streams." IEEE Transactions on Multi-Scale Computing Systems, 4(3), 396-409. https://doi.org/10.1109/TMSCS.2017.2764087

GPU Virtualization and Scheduling Methods: A Comprehensive Survey

Published in ACM Computing Surveys, 2017

This survey presents a comprehensive review of GPU virtualization techniques and scheduling strategies, covering methods across libraries, drivers, and hardware, with implications for heterogeneous cloud computing.

Recommended citation: Hong, C.-H., Spence, I., & Nikolopoulos, D. S. (2017). "GPU Virtualization and Scheduling Methods: A Comprehensive Survey." ACM Computing Surveys, 50(3), Article 35. https://doi.org/10.1145/3068281

Intra-Node Memory Safe GPU Co-Scheduling

Published in IEEE Transactions on Parallel and Distributed Systems, 2018

Proposes SchedGPU, a co-scheduling mechanism for GPU workloads that ensures memory safety and improves utilization on shared memory systems.

Recommended citation: Reaño, C., Silla, F., Nikolopoulos, D. S., & Varghese, B. (2018). "Intra-Node Memory Safe GPU Co-Scheduling." IEEE Transactions on Parallel and Distributed Systems, 29(5), 1089–1102. https://doi.org/10.1109/TPDS.2017.2784428

A Taxonomy of Task-Based Parallel Programming Technologies for High-Performance Computing

Published in The Journal of Supercomputing, 2018

This paper introduces a taxonomy of task-based programming models and runtime systems for high-performance computing, providing a comprehensive classification of contemporary technologies in the context of many-core and heterogeneous systems.

Recommended citation: Thoman, P., Dichev, K., Heller, T., Iakymchuk, R., Aguilar, X., Hasanov, K., Gschwandtner, P., Lemarinier, P., Markidis, S., Jordan, H., Fahringer, T., Katrinis, K., Laure, E., & Nikolopoulos, D. S. (2018). "A Taxonomy of Task-Based Parallel Programming Technologies for High-Performance Computing." The Journal of Supercomputing, 74(4), 1422–1434. https://doi.org/10.1007/s11227-018-2238-4

DARE: Data-Access Aware Refresh via spatial-temporal application resilience on commodity servers

Published in The International Journal of High Performance Computing Applications, 2018

Presents DARE, a Data-Access Aware Refresh system that leverages spatial-temporal application resilience to aggressively relax DRAM refresh rates on commodity servers, disabling hardware refresh completely with only 2-18% quality loss.

Recommended citation: Chalios, C., Georgakoudis, G., Tovletoglou, K., Karakonstantis, G., Vandierendonck, H., & Nikolopoulos, D. S. (2018). "DARE: Data-Access Aware Refresh via spatial-temporal application resilience on commodity servers." The International Journal of High Performance Computing Applications, 32(1), 74-88. https://doi.org/10.1177/1094342017718612

SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads

Published in ACM Transactions on Architecture and Code Optimization, 2017

Presents SCALO, a runtime framework for orchestrating thread parallelism across co-executing applications on multicore machines, improving system throughput by up to 40%.

Recommended citation: Georgakoudis, G., Vandierendonck, H., Thoman, P., et al. (2017). "SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads." ACM Transactions on Architecture and Code Optimization (TACO), 14(4), Article 54. https://doi.org/10.1145/3158643

FairGV: Fair and Fast GPU Virtualization

Published in IEEE Transactions on Parallel and Distributed Systems, 2017

FairGV proposes a GPU virtualization mechanism that combines fair queuing and trap-less architecture to improve scheduling efficiency across virtual machines.

Recommended citation: Hong, C.-H., Spence, I., & Nikolopoulos, D. S. (2017). "FairGV: Fair and Fast GPU Virtualization." IEEE Transactions on Parallel and Distributed Systems, 28(12), 3472–3485. https://doi.org/10.1109/TPDS.2017.2717908

Error-Resilient Server Ecosystems for Edge and Cloud Datacenters

Published in IEEE Computer, 2017

Presents error-resilient server ecosystems for edge and cloud datacenters, addressing performance and power variability through hardware exposure interfaces and energy-efficient microserver architectures for IoT applications.

Recommended citation: Karakonstantis, G., Nikolopoulos, D. S., Gizopoulos, D., Trancoso, P., Sazeides, Y., Antonopoulos, C. D., Venugopal, S., & Das, S. (2017). "Error-Resilient Server Ecosystems for Edge and Cloud Datacenters." Computer, 50(12), 78-81. https://doi.org/10.1109/MC.2017.4451208

A Real Time Metabolomic Profiling Approach to Detecting Fish Fraud Using Rapid Evaporative Ionisation Mass Spectrometry

Published in Metabolomics, 2017

This article explores the use of REIMS for real-time fish fraud detection, avoiding the lengthy preparation steps of genomic profiling while maintaining result accuracy.

Recommended citation: Black, C., Chevallier, O. P., Haughey, S. A., Balog, J., Stead, S., Pringle, S. D., Riina, M. V., Martucci, F., Acutis, P. L., Morris, M., Nikolopoulos, D. S., Takats, Z., & Elliott, C. T. (2017). "A Real Time Metabolomic Profiling Approach to Detecting Fish Fraud Using Rapid Evaporative Ionisation Mass Spectrometry." Metabolomics, 13(12), 153. https://doi.org/10.1007/s11306-017-1291-y

On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework

Published in International Journal of Parallel Programming, 2017

This paper presents recent developments in the GVirtuS framework, enabling transparent GPU virtualization and remoting across ARM and x86 systems.

Recommended citation: Montella, R., Giunta, G., Laccetti, G., Lapegna, M., Palmieri, C., Ferraro, C., Pelliccia, V., Hong, C.-H., Spence, I., & Nikolopoulos, D. S. (2017). "On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework." International Journal of Parallel Programming, 45(5), 1142–1163. https://doi.org/10.1007/s10766-016-0462-1

Managed acceleration for In-Memory database analytic workloads

Published in International Journal of Parallel, Emergent and Distributed Systems, 2017

Presents managed acceleration techniques for in-memory database analytic workloads to improve query performance and resource utilization in database management systems.

Recommended citation: O'Neill, E., McGlone, J., Kilpatrick, P., & Nikolopoulos, D. S. (2017). "Managed acceleration for In-Memory database analytic workloads." International Journal of Parallel, Emergent and Distributed Systems, 32(4), 406-427. https://doi.org/10.1080/17445760.2016.1170832

ALEA: A Fine-Grained Energy Profiling Tool

Published in ACM Transactions on Architecture and Code Optimization, 2017

Introduces ALEA, a fine-grained energy profiling tool based on probabilistic analysis, enabling detailed association of energy consumption with source code structures.

Recommended citation: Mukhanov, L., Petoumenos, P., Wang, Z., et al. (2017). "ALEA: A Fine-Grained Energy Profiling Tool." ACM Transactions on Architecture and Code Optimization (TACO), 14(1), Article 1. https://doi.org/10.1145/3050436

Exploiting Significance of Computations for Energy-Constrained Approximate Computing

Published in International Journal of Parallel Programming, 2016

This work introduces a runtime and programming model for optimizing quality under energy constraints using significance-aware execution and task-level approximations.

Recommended citation: Vassiliadis, V., Chalios, C., Parasyris, K., et al. (2016). "Exploiting Significance of Computations for Energy-Constrained Approximate Computing." International Journal of Parallel Programming, 44(5), 1078–1098. https://doi.org/10.1007/s10766-016-0409-6

Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics

Published in IET Computers & Digital Techniques, 2016

Evaluates fault tolerance techniques on asymmetric multicore SoCs using ARM big.LITTLE processors, focusing on near-threshold voltage computing and algorithm-based fault tolerance for low-power HPC systems.

Recommended citation: Chalios, C., Nikolopoulos, D. S., Catalán, S., & Quintana-Ortí, E. S. (2016). "Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics." IET Computers & Digital Techniques, 10(2), 85-92. https://doi.org/10.1049/iet-cdt.2015.0056

Methods and metrics for fair server assessment under real-time financial workloads

Published in Concurrency and Computation: Practice and Experience, 2016

Presents a rigorous methodology and new metrics for fair comparison of server and microserver platforms under real-time financial analytics workloads, comparing ARM and x86 architectures.

Recommended citation: Georgakoudis, G., Gillan, C. J., Sayed, A., Spence, I., Faloon, R., & Nikolopoulos, D. S. (2016). "Methods and metrics for fair server assessment under real-time financial workloads." Concurrency and Computation: Practice and Experience, 28(3), 916-928. https://doi.org/10.1002/cpe.3704

Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics

Published in Parallel Processing Letters, 2015

Presents a mathematically rigorous iso-Quality-of-Service metric for ranking servers based on energy efficiency while meeting QoS targets for real-time analytics services.

Recommended citation: Georgakoudis, G., Gillan, C., Sayed, A., Spence, I., Faloon, R., & Nikolopoulos, D. S. (2015). "Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics." Parallel Processing Letters, 25(3), 1541004. https://doi.org/10.1142/S0129626415410042

A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing

Published in SIGPLAN Notices, 2015

This paper introduces a task-based model that trades output quality for energy efficiency, achieving up to 83% energy savings via significance-aware execution policies.

Recommended citation: Vassiliadis, V., Parasyris, K., Chalios, C., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing." SIGPLAN Notices, 50(8), 275–276. https://doi.org/10.1145/2858788.2688546

On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory

Published in IEEE Computer Architecture Letters, 2015

Analyzes the energy efficiency characteristics of byte-addressable non-volatile memory systems and their implications for main memory design.

Recommended citation: Vandierendonck, H., Hassan, A., & Nikolopoulos, D. S. (2015). "On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory." IEEE Computer Architecture Letters, 14(2), 144-147. https://doi.org/10.1109/LCA.2014.2355195

On the Potential of Significance-Driven Execution for Energy-Aware HPC

Published in Computer Science - Research and Development, 2015

Explores hybrid near-threshold and above-threshold voltage execution based on algorithmic significance to achieve 35–67% energy savings without compromising performance.

Recommended citation: Gschwandtner, P., Chalios, C., Nikolopoulos, D. S., et al. (2015). "On the Potential of Significance-Driven Execution for Energy-Aware HPC." Computer Science - Research and Development, 30(2), 197–206. https://doi.org/10.1007/s00450-014-0265-9

TProf: An Energy Profiler for Task-Parallel Programs

Published in Sustainable Computing: Informatics and Systems, 2015

Introduces TProf, a profiler for estimating energy usage in task-parallel applications, enabling per-task DVFS optimizations.

Recommended citation: Manousakis, I., Zakkak, F. S., Pratikakis, P., & Nikolopoulos, D. S. (2015). "TProf: An Energy Profiler for Task-Parallel Programs." Sustainable Computing: Informatics and Systems, 5, 1–13. https://doi.org/10.1016/j.suscom.2014.07.004

Power and energy implications of the number of threads used on the Intel Xeon Phi

Published in Annals of Multicore and GPU Programming (AMGP), 2015

Studies power and energy usage of PARSEC and SPLASH-2X benchmarks on Intel Xeon Phi across different thread configurations to find optimal performance-energy relationships.

Recommended citation: Lorenzo, O. G., Pena, T. F., Cabaleiro, J. C., Pichel, J. C., Rivera, F. F., & Nikolopoulos, D. S. (2015). "Power and energy implications of the number of threads used on the Intel Xeon Phi." Annals of Multicore and GPU Programming (AMGP), 2(1), 55-65.

Hybrid address spaces: A methodology for implementing scalable high-level programming models on non-coherent many-core architectures

Published in Journal of Systems and Software, 2014

Introduces hybrid address spaces as a design methodology for implementing scalable runtime systems on many-core architectures without cache coherence, demonstrated through HyMR MapReduce and HyRMA remote memory access implementations.

Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2014). "Hybrid address spaces: A methodology for implementing scalable high-level programming models on non-coherent many-core architectures." Journal of Systems and Software, 97, 47-64. https://doi.org/10.1016/j.jss.2014.06.058

Distributed region-based memory allocation and synchronization

Published in The International Journal of High Performance Computing Applications, 2014

Presents distributed region-based memory allocation and synchronization techniques for high-performance computing applications to improve memory management and coordination in distributed systems.

Recommended citation: Symeonidou, C., Pratikakis, P., Nikolopoulos, D. S., & Bilas, A. (2014). "Distributed region-based memory allocation and synchronization." The International Journal of High Performance Computing Applications, 28(4), 406-414. https://doi.org/10.1177/1094342014552863

Energy Efficiency through Significance-Based Computing

Published in IEEE Computer, 2014

Presents significance-based computing as an approach to improve energy efficiency by distinguishing between critical and non-critical computations.

Recommended citation: Nikolopoulos, D. S., Vandierendonck, H., Bellas, N., Antonopoulos, C. D., Lalis, S., Karakonstantis, G., Burg, A., & Naumann, U. (2014). "Energy Efficiency through Significance-Based Computing." Computer, 47(7), 82-85. https://doi.org/10.1109/MC.2014.182

FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards

Published in Journal of Systems Architecture, 2014

Presents a scalable 512-core FPGA-based prototype using Formic boards for modeling manycore architectures, demonstrating performance 50,000 times faster than software simulation.

Recommended citation: Lyberis, S., Kalokerinos, G., Lygerakis, M., Papaefstathiou, V., Mavroidis, I., Katevenis, M., Pnevmatikatos, D., & Nikolopoulos, D. S. (2014). "FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards." Journal of Systems Architecture, 60(6), 481-493. https://doi.org/10.1016/j.sysarc.2014.03.002

Scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-cores

Published in International Journal of Parallel, Emergent and Distributed Systems, 2014

Presents scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-core systems to optimize performance and resource allocation across diverse workloads.

Recommended citation: Khasymski, A., & Nikolopoulos, D. S. (2014). "Scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-cores." International Journal of Parallel, Emergent and Distributed Systems, 30(3), 193-210. https://doi.org/10.1080/17445760.2014.895346

Analysis of Dependence Tracking Algorithms for Task Dataflow Execution

Published in ACM Transactions on Architecture and Code Optimization, 2013

Evaluates efficient schemes for managing task graphs in task dataflow programming, including graphs, hypergraphs, and edgeless schemes.

Recommended citation: Vandierendonck, H., Tzenakis, G., & Nikolopoulos, D. S. (2013). "Analysis of Dependence Tracking Algorithms for Task Dataflow Execution." ACM Transactions on Architecture and Code Optimization, 10(4), Article 61. https://doi.org/10.1145/2541228.2555316

Strategies for Energy-Efficient Resource Management of Hybrid Programming Models

Published in IEEE Transactions on Parallel and Distributed Systems, 2013

This work proposes dynamic concurrency throttling and DVFS strategies for energy-efficient resource management in hybrid parallel applications on multicore platforms.

Recommended citation: Li, D., de Supinski, B. R., Schulz, M., Nikolopoulos, D. S., & Cameron, K. W. (2013). "Strategies for Energy-Efficient Resource Management of Hybrid Programming Models." IEEE Transactions on Parallel and Distributed Systems, 24(1), 144–157. https://doi.org/10.1109/TPDS.2012.95

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Published in International Journal of Parallel Programming, 2012

Presents cache-integrated network interfaces that combine the flexibility of caches with the efficiency of scratchpad memories, providing configurable on-chip SRAM sharing and event response mechanisms for scalable multicore architectures with less than 20% logic overhead.

Recommended citation: Kavadias, S., Katevenis, M., Zampetakis, M., & Nikolopoulos, D. S. (2012). "Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs." International Journal of Parallel Programming, 40(6), 583-604. https://doi.org/10.1007/s10766-011-0173-6

Critical Path-Based Thread Placement for NUMA Systems

Published in SIGMETRICS Performance Evaluation Review, 2012

This paper presents a runtime and algorithms that improve OpenMP performance on NUMA systems by optimizing thread placement along the critical path.

Recommended citation: Su, C., Li, D., Nikolopoulos, D. S., Grove, M., Cameron, K., & de Supinski, B. R. (2012). "Critical Path-Based Thread Placement for NUMA Systems." SIGMETRICS Performance Evaluation Review, 40(2), 106–112. https://doi.org/10.1145/2381056.2381079

BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism

Published in SIGPLAN Notices, 2012

The journal version of BDDT, a runtime for structured task parallelism that uses fine-grained memory access footprints for dynamic dependence analysis.

Recommended citation: Tzenakis, G., Papatriantafyllou, A., Kesapides, J., Pratikakis, P., Vandierendonck, H., & Nikolopoulos, D. S. (2012). "BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism." SIGPLAN Notices, 47(8), 301–302. https://doi.org/10.1145/2370036.2145864

The Myrmics Memory Allocator: Hierarchical, Message-Passing Allocation for Global Address Spaces

Published in SIGPLAN Notices, 2012

Myrmics implements a scalable, hierarchical memory allocator supporting dynamic regions and message-passing for task-based programming in distributed systems.

Recommended citation: Lyberis, S., Pratikakis, P., Nikolopoulos, D. S., et al. (2012). "The Myrmics Memory Allocator: Hierarchical, Message-Passing Allocation for Global Address Spaces." SIGPLAN Notices, 47(11), 15–24. https://doi.org/10.1145/2426642.2259001

EPC: a power instrumentation controller for embedded applications

Published in SIGBED Review, 2012

Proposes and implements a real-time power monitor controller based on an 8-bit AVR controller and analog Hall effect current sensor for automated power measurement and energy accounting in embedded applications.

Recommended citation: Manousakis, I., & Nikolopoulos, D. S. (2012). "EPC: a power instrumentation controller for embedded applications." SIGBED Review, 9(2), 28-32. https://doi.org/10.1145/2318836.2318841

A Capabilities-Aware Framework for Using Computational Accelerators in Data-Intensive Computing

Published in Journal of Parallel and Distributed Computing, 2011

This work proposes a framework using heterogeneous accelerators like GPUs and Cell processors for data-intensive workloads, improving performance through capability-aware resource allocation.

Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2011). "A Capabilities-Aware Framework for Using Computational Accelerators in Data-Intensive Computing." *JPDC*, 71(2), 185–197. https://doi.org/10.1016/j.jpdc.2010.09.004
Download Paper

Parallel Programming Models for Heterogeneous Multicore Architectures

Published in IEEE Micro, Volume 30, Issue 5, 2010

Explores programming models for heterogeneous multicore architectures, focusing on concurrency, hardware/software interfaces, and multiprocessor system environments, as part of the SARC European Project.

Recommended citation: SARC European Project. (2010). Parallel Programming Models for Heterogeneous Multicore Architectures. *IEEE Micro*, 30(5), 42–53. https://doi.org/10.1109/MM.2010.94

Explicit Communication and Synchronization in SARC

Published in IEEE Micro, 2010

Discusses the SARC architecture that uses explicit communication and synchronization primitives with scratchpad memory and RDMA support.

Recommended citation: Katevenis, M., Papaefstathiou, V., Kavadias, S., et al. (2010). "Explicit Communication and Synchronization in SARC." IEEE Micro, 30(5), 30–41. https://doi.org/10.1109/MM.2010.77
Download Paper

Programming Multiprocessors with Explicitly Managed Memory Hierarchies

Published in IEEE Computer, 2009

This article discusses programming techniques for multiprocessors with explicitly managed memory, using the Cell Broadband Engine as a case study for efficient parallel memory handling.

Recommended citation: Schneider, S., Yeom, J.-S., & Nikolopoulos, D. S. (2009). "Programming Multiprocessors with Explicitly Managed Memory Hierarchies." *IEEE Computer*, 42(12), 28–34. https://doi.org/10.1109/MC.2009.407
Download Paper

Algorithm, Software, and Hardware Optimizations for Delaunay Mesh Generation on Simultaneous Multithreaded Architectures

Published in Journal of Parallel and Distributed Computing, 2009

Details multi-level optimizations that improve performance of a parallel Delaunay mesh generator by up to 6x on SMT-based SMP systems.

Recommended citation: Antonopoulos, C. D., Blagojevic, F., Chernikov, A. N., Chrisochoides, N. P., & Nikolopoulos, D. S. (2009). "Algorithm, Software, and Hardware Optimizations for Delaunay Mesh Generation on Simultaneous Multithreaded Architectures." JPDC, 69(7), 601–612. https://doi.org/10.1016/j.jpdc.2009.03.005
Download Paper

A Multigrain Delaunay Mesh Generation Method for Multicore SMT-Based Architectures

Published in Journal of Parallel and Distributed Computing, 2009

This paper evaluates a multigrain Delaunay mesh generation approach across multiple architectural layers, optimizing for SMT and multicore platforms.

Recommended citation: Antonopoulos, C. D., Blagojevic, F., Chernikov, A. N., Chrisochoides, N. P., & Nikolopoulos, D. S. (2009). "A Multigrain Delaunay Mesh Generation Method for Multicore SMT-Based Architectures." *JPDC*, 69(7), 589–600. https://doi.org/10.1016/j.jpdc.2009.03.009
Download Paper

A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies

Published in SIGPLAN Notices, 2009

This journal article compares abstractions for programming multiprocessors with explicitly managed memory, analyzing programmability and efficiency trade-offs.

Recommended citation: Schneider, S., Yeom, J.-S., Rose, B., Linford, J. C., Sandu, A., & Nikolopoulos, D. S. (2009). "A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies." *SIGPLAN Not.*, 44(4), 131–140. https://doi.org/10.1145/1594835.1504197
Download Paper

Supporting MapReduce on Large-Scale Asymmetric Multi-Core Clusters

Published in SIGOPS Operating Systems Review, 2009

This article explores the design and performance of MapReduce on hybrid clusters with asymmetric multi-core accelerators and general-purpose processors.

Recommended citation: Rafique, M. M., Rose, B., Butt, A. R., & Nikolopoulos, D. S. (2009). "Supporting MapReduce on Large-Scale Asymmetric Multi-Core Clusters." SIGOPS Operating Systems Review, 43(2), 25–34. https://doi.org/10.1145/1531793.1531800
Download Paper

Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes

Published in IEEE Transactions on Parallel and Distributed Systems, 2008

This paper presents a prediction-based runtime framework for adapting multithreaded scientific codes to optimize power and performance, using multivariate regression and application-aware models for energy-efficient execution on multicore systems.

Recommended citation: Curtis-Maury, M., Blagojevic, F., Antonopoulos, C. D., & Nikolopoulos, D. S. (2008). "Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes." IEEE Transactions on Parallel and Distributed Systems, 19(10), 1396–1410. https://doi.org/10.1109/TPDS.2007.70804
Download Paper

Runtime Scheduling of Dynamic Parallelism on Accelerator-Based Multi-Core Systems

Published in Parallel Computing, 2007

This paper investigates runtime mechanisms for multi-grain parallelism scheduling on heterogeneous multi-core systems, introducing S-MGPS for dynamic optimization on the Cell Broadband Engine.

Recommended citation: Blagojevic, F., Nikolopoulos, D. S., Stamatakis, A., Antonopoulos, C. D., & Curtis-Maury, M. (2007). "Runtime Scheduling of Dynamic Parallelism on Accelerator-Based Multi-Core Systems." *Parallel Computing*, 33(10), 700–719. https://doi.org/10.1016/j.parco.2007.09.004
Download Paper

Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell

Published in The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2007

This paper enhances RAxML, a phylogenetic inference tool, with novel tree search heuristics and a high-performance implementation on the IBM Cell Broadband Engine, yielding substantial speedups and addressing multi-level parallelism and optimization challenges.

Recommended citation: Stamatakis, A., Blagojevic, F., Nikolopoulos, D. S., & Antonopoulos, C. D. (2007). "Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell." The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 48(3), 271–286. https://doi.org/10.1007/s11265-007-0067-4
Download Paper

Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory

Published in Journal of Grid Computing, 2007

Extends the MMlib framework to provide fully customizable memory malleability in scientific applications, treating DRAM as a dynamic cache with local disk and remote memory capabilities.

Recommended citation: Mills, R. T., Yue, C., Stathopoulos, A., & Nikolopoulos, D. S. (2007). "Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory." Journal of Grid Computing, 5(2), 213-234. https://doi.org/10.1007/s10723-007-9075-7
Download Paper

On the design of online predictors for autonomic power-performance adaptation of multithreaded programs

Published in Journal of Autonomic and Trusted Computing, 2006

Investigates the design space for techniques that enable runtime, autonomic program adaptation for high-performance and low-power execution via event-driven performance prediction on multithreaded and multicore architectures.

Recommended citation: Curtis-Maury, M., Dzierwa, J., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "On the design of online predictors for autonomic power-performance adaptation of multithreaded programs." Journal of Autonomic and Trusted Computing, 1.
Download Paper

Dynamic Tiling for Effective Use of Shared Caches on Multithreaded Processors

Published in International Journal of High Performance Computing and Networking, 2004

Proposes dynamic tiling transformations to partition shared caches in SMT processors and improve performance of multithreaded workloads.

Recommended citation: Nikolopoulos, D. S. (2004). "Dynamic Tiling for Effective Use of Shared Caches on Multithreaded Processors." IJHPCN, 2(1), 22–35. https://doi.org/10.1504/IJHPCN.2004.009265
Download Paper

Quantifying contention and balancing memory load on hardware DSM multiprocessors

Published in Journal of Parallel and Distributed Computing, 2003

Proposes a methodology for quantifying remote memory access contention on hardware DSM multiprocessors and presents an algorithm for detecting hot spots and balancing memory load using dynamic page migration.

Recommended citation: Nikolopoulos, D. S. (2003). "Quantifying contention and balancing memory load on hardware DSM multiprocessors." Journal of Parallel and Distributed Computing, 63(9), 866-886. https://doi.org/10.1016/S0743-7315(03)00105-9
Download Paper

Adaptive Scheduling Under Memory Constraints on Non-Dedicated Computational Farms

Published in Future Generation Computer Systems, 2003

Proposes a scheduler for parallel programs that adapts to memory constraints in non-dedicated environments using thrashing prevention and co-scheduling extensions.

Recommended citation: Nikolopoulos, D. S., & Polychronopoulos, C. D. (2003). "Adaptive Scheduling Under Memory Constraints on Non-Dedicated Computational Farms." *FGCS*, 19(4), 505–519. https://doi.org/10.1016/S0167-739X(03)00031-1
Download Paper

Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules

Published in Scientific Programming, 2003

Explores customizing and reusing loop schedules to improve scalability of non-regular numerical codes in shared-memory architectures, establishing thread-data affinity while maintaining programming simplicity.

Recommended citation: Nikolopoulos, D. S., Artiaga, E., Ayguadé, E., & Labarta, J. (2003). "Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules." Scientific Programming, 11(2), 379739. https://doi.org/10.1155/2003/379739
Download Paper

Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models

Published in International Journal of Parallel Programming, 2002

Compares data distribution methodologies for scaling OpenMP performance on NUMA architectures, presenting novel runtime techniques that can effectively replace manual data distribution in regular applications.

Recommended citation: Nikolopoulos, D. S., Ayguadé, E., & Polychronopoulos, C. D. (2002). "Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models." International Journal of Parallel Programming, 30(4), 225-255. https://doi.org/10.1023/A:1019899812171
Download Paper

Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors

Published in Journal of Parallel and Distributed Computing, 2002

Presents a novel dynamic page migration algorithm that improves data locality in multiprogrammed shared-memory multiprocessors through communication between the scheduler and the page migration engine.

Recommended citation: Nikolopoulos, D. S., Polychronopoulos, C. D., Papatheodorou, T. S., Labarta, J., & Ayguadé, E. (2002). "Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors." Journal of Parallel and Distributed Computing, 62(6), 1069-1103. https://doi.org/10.1006/jpdc.2001.1817
Download Paper

Exploiting Memory Affinity in OpenMP through Schedule Reuse

Published in SIGARCH Computer Architecture News, 2001

This work introduces the concept of reusing iteration schedules in OpenMP to improve memory affinity and scalability on NUMA shared-memory systems.

Recommended citation: Nikolopoulos, D. S., Artiaga, E., Ayguadé, E., & Labarta, J. (2001). "Exploiting Memory Affinity in OpenMP through Schedule Reuse." *SIGARCH Comput. Archit. News*, 29(5), 49–55. https://doi.org/10.1145/563647.563657
Download Paper

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors

Published in International Journal of Parallel Programming, 2001

Investigates architectural and OS-level implications for efficient synchronization on ccNUMA platforms, analyzing hardware and software optimizations.

Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (2001). "The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors." IJPP, 29(3), 249–282. https://doi.org/10.1023/A:1011168003859
Download Paper

A Transparent Runtime Data Distribution Engine for OpenMP

Published in Scientific Programming, 2000

Introduces a runtime mechanism for transparent page migration in OpenMP programs to improve performance on NUMA systems without explicit data placement directives.

Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., et al. (2000). "A Transparent Runtime Data Distribution Engine for OpenMP." Scientific Programming, 8(3), Article 417570. https://doi.org/10.1155/2000/417570
Download Paper

Conference Papers

dLLM-Serve: Bridging the Memory Gap in Diffusion Language Model Serving

Published in Proceedings of the 40th ACM International Conference on Supercomputing (ICS), 2026

dLLM-Serve bridges the memory gap in diffusion language model serving, enabling more efficient deployment by addressing memory bottlenecks in practical serving pipelines.

Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D. S. (2026). dLLM-Serve: Bridging the Memory Gap in Diffusion Language Model Serving. In Proceedings of the 40th ACM International Conference on Supercomputing (ICS), Belfast, Northern Ireland, UK.
Download Paper

MARCO: Multi-Agent Code Optimization for High-Performance Computing

Published in ACM Capital Region Celebration of Women in Computing Conference (CAPWIC), 2026

MARCO presents a multi-agent approach to code optimization for high-performance computing.

Recommended citation: Nafisi, A., Cvetkovic, V., Shin, C., Reece, K., Reddy, K., Torres, B., Heidari, S., Ellis, M., & Nikolopoulos, D. S. (2026). MARCO: Multi-Agent Code Optimization for High-Performance Computing. In ACM Capital Region Celebration of Women in Computing Conference (CAPWIC).
Download Paper

APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs

Published in Proceedings of the 40th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2026

APEX is a profiling-guided hybrid LLM scheduler that maximizes CPU–GPU overlap during decode, delivering large throughput gains on memory-constrained GPUs without increasing latency or requiring batch splitting.

Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D. S. (2026). *APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs*. In *Proceedings of the 40th IEEE International Parallel and Distributed Processing Symposium (IPDPS).*
Download Paper

Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices

Published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026

Polymorph introduces an energy-efficient approach for multi-label classification of video streams on embedded devices, combining lightweight architectures with adaptive processing to achieve high accuracy at low power.

Recommended citation: Ghafouri, S., Fayyaz, M., Li, X., John, D., Ji, B., Nikolopoulos, D., & Vandierendonck, H. (2026). *Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices*. In *Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).*
Download Paper

SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Published in Tenth ACM/IEEE Symposium on Edge Computing (SEC), Washington, D.C., 2025

Presents SLED, a speculative LLM decoding framework designed for efficient inference and serving at the edge.

Recommended citation: Li, X., Spatharakis, D., Ghafouri, S., Fan, J., Vandierendonck, H., John, D., Ji, B., & Nikolopoulos, D. S. (2025). "SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving." In Proceedings of the Tenth ACM/IEEE Symposium on Edge Computing (SEC), Washington, D.C.
Download Paper

HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

Published in AAAI Conference on Artificial Intelligence, 2025

This paper proposes High-Resolution Early Dropping (HiRED), a plug-and-play token-dropping method that enhances efficiency in high-resolution vision-language models while maintaining performance.

Recommended citation: Arif, K. H. I., Yoon, J., Nikolopoulos, D. S., Vandierendonck, H., John, D., & Ji, B. (2025). "HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models." Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1773-1781. https://doi.org/10.1609/aaai.v39i2.32171
Download Paper

ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024

Presents ParvaGPU, an efficient spatial GPU sharing system designed for large-scale deep neural network inference in cloud computing environments.

Recommended citation: Lee, M., Seong, S., Kang, M., Lee, J., Na, G.-J., Chun, I.-G., Nikolopoulos, D., & Hong, C.-H. (2024). "ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments." In SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. https://doi.org/10.1109/SC41406.2024.00048
Download Paper

Application-Attuned Memory Management for Containerized HPC Workflows

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024

Presents application-attuned memory management techniques for containerized HPC workflows, optimizing memory allocation and bandwidth utilization in tiered memory systems.

Recommended citation: Arif, M., Maurya, A., Rafique, M. M., Nikolopoulos, D. S., & Butt, A. R. (2024). "Application-Attuned Memory Management for Containerized HPC Workflows." In 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 114-127. https://doi.org/10.1109/IPDPS57955.2024.00019
Download Paper

Parallel Islands: A Parallel Computing Educational Video Game

Published in ACM Technical Symposium on Computer Science Education (SIGCSE), 2024

Presents Parallel Islands, an educational video game designed to teach parallel computing concepts through interactive gameplay and visualization.

Recommended citation: Cameron, M., Ellis, M., & Nikolopoulos, D. (2024). "Parallel Islands: A Parallel Computing Educational Video Game." In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2, 1586-1587.
Download Paper

Towards Efficient Python Interpreter for Tiered Memory Systems

Published in USENIX Conference on File and Storage Technologies (FAST) - Poster Session, 2024

Presents optimizations for Python interpreter performance on tiered memory systems to improve execution efficiency and memory utilization.

Recommended citation: Li, Y., Yao, S., Mobin, J., Rafique, M. M., Nikolopoulos, D., Sundararajah, K., Li, H., & Butt, A. R. (2024). "Towards Efficient Python Interpreter for Tiered Memory Systems." In Poster and Work-in-Progress in Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST). USENIX.
Download Paper

Decentralised Biomedical Signal Classification using Early Exits

Published in IEEE Interregional NEWCAS Conference, 2023

Presents decentralised biomedical signal classification techniques using early exits for ECG arrhythmia detection in distributed wireless sensor networks.

Recommended citation: Li, X., Vandierendonck, H., Nikolopoulos, D. S., Ji, B., Cardiff, B., & John, D. (2023). "Decentralised Biomedical Signal Classification using Early Exits." In 2023 21st IEEE Interregional NEWCAS Conference (NEWCAS), 1-2. https://doi.org/10.1109/NEWCAS57931.2023.10198098
Download Paper

On Realizing Efficient Deep Learning Using Serverless Computing

Published in IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022

Explores approaches for implementing efficient deep learning training and inference using serverless computing platforms, addressing challenges in resource management and data parallelism.

Recommended citation: Assogba, K., Arif, M., Rafique, M. M., & Nikolopoulos, D. S. (2022). On Realizing Efficient Deep Learning Using Serverless Computing. In *2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)*, 220-229. https://doi.org/10.1109/CCGrid54584.2022.00031
Download Paper

Linear Regression Based DDoS Attack Detection

Published in International Conference on Machine Learning and Computing (ICMLC), 2021

Proposes a linear regression based DDoS attack detection technique that reduces false positives by analyzing the correlation between average and standard deviation of network throughput in time series data.

Recommended citation: Barbhuiya, S., Kilpatrick, P., & Nikolopoulos, D. S. (2021). "Linear Regression Based DDoS Attack Detection." In Proceedings of the 2021 13th International Conference on Machine Learning and Computing, 568-574. https://doi.org/10.1145/3457682.3457769
Download Paper

DStress: Automatic Synthesis of DRAM Reliability Stress Viruses using Genetic Algorithms

Published in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020

Presents DStress, a technique to synthesize DRAM stress viruses using genetic algorithms to expose reliability faults. Best Paper Award Nominee.

Recommended citation: Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2020). "DStress: Automatic Synthesis of DRAM Reliability Stress Viruses Using Genetic Algorithms." MICRO 2020, 298–312. https://doi.org/10.1109/MICRO50266.2020.00035
Download Paper

RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices

Published in USENIX Conference on Operational Machine Learning (OpML), 2020

Presents RIANN, a real-time incremental learning framework using approximate nearest neighbor algorithms optimized for mobile device constraints.

Recommended citation: Liu, J., Xie, Z., Nikolopoulos, D., & Li, D. (2020). "RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices." In 2020 USENIX Conference on Operational Machine Learning (OpML 20). USENIX Association.
Download Paper

Fast Analysis and Prediction in Large Scale Virtual Machines Resource Utilisation

Published in 10th International Conference on Cloud Computing and Services Science (CLOSER), 2020

Presents fast analysis and prediction techniques for resource utilization in large-scale virtual machine environments to optimize cloud resource management and capacity planning.

Recommended citation: Abubakar, A., Barbhuiya, S., Kilpatrick, P., Vien, N., & Nikolopoulos, D. (2020). "Fast Analysis and Prediction in Large Scale Virtual Machines Resource Utilisation." In Proceedings of the 10th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, 115-126. SciTePress. https://doi.org/10.5220/0009408701150126
Download Paper

Cross Architectural Power Modelling

Published in IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2020

Presents cross-architectural power modeling techniques using hardware counters and noise filtering to enable accurate power prediction across different processor architectures.

Recommended citation: Chen, K., Kilpatrick, P., Nikolopoulos, D. S., & Varghese, B. (2020). "Cross Architectural Power Modelling." In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 390-399. https://doi.org/10.1109/CCGrid49817.2020.00-54
Download Paper

DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search

Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020

Presents DEFCON, a method for generating and detecting failure-prone instruction sequences using stochastic search techniques. Best Paper Award.

Recommended citation: Tsiokanos, I., Mukhanov, L., Georgakoudis, G., Nikolopoulos, D. S., & Karakonstantis, G. (2020). DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search. In *2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 1121-1126. https://doi.org/10.23919/DATE48585.2020.9116363
Download Paper

HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers

Published in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020

HaRMony is a system that combines heterogeneous-reliability memory with QoS-aware energy management policies for virtualized servers, reducing DRAM energy and performance overhead.

Recommended citation: Tovletoglou, K., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2020). "HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers." ASPLOS '20, 575–590. https://doi.org/10.1145/3373376.3378489
Download Paper

DroidLight: Lightweight Anomaly-Based Intrusion Detection System for Smartphone Devices

Published in International Conference on Distributed Computing and Networking, 2020

DroidLight is a lightweight one-class classifier-based IDS designed to detect zero-day malware on smartphones with low overhead and high accuracy.

Recommended citation: Barbhuiya, S., Kilpatrick, P., & Nikolopoulos, D. S. (2020). "DroidLight: Lightweight Anomaly-Based Intrusion Detection System for Smartphone Devices." ICDCN '20, Article 31. https://doi.org/10.1145/3369740.3369796
Download Paper

Workload-Aware DRAM Error Prediction using Machine Learning

Published in IEEE International Symposium on Workload Characterization (IISWC), 2019

This study presents a machine learning framework for predicting DRAM errors in HPC systems by analyzing workload characteristics.

Recommended citation: Mukhanov, L., Tovletoglou, K., Vandierendonck, H., Nikolopoulos, D. S., & Karakonstantis, G. (2019). "Workload-Aware DRAM Error Prediction using Machine Learning." *IISWC 2019*, 106–118. https://doi.org/10.1109/IISWC47752.2019.9041963
Download Paper

Implementing efficient message logging protocols as MPI application extensions

Published in European MPI Users' Group Meeting (EuroMPI), 2019

Implements efficient message logging protocols as MPI application extensions to enable local rollback capabilities without requiring a complete MPI library redesign, demonstrated on CG and LULESH kernels.

Recommended citation: Dichev, K., & Nikolopoulos, D. S. (2019). "Implementing efficient message logging protocols as MPI application extensions." In Proceedings of the 26th European MPI Users' Group Meeting, Article 8. https://doi.org/10.1145/3343211.3343219
Download Paper

TAPAS: Train-Less Accuracy Predictor for Architecture Search

Published in AAAI Conference on Artificial Intelligence, 2019

TAPAS introduces a novel train-less predictor for neural architecture accuracy estimation across datasets, enabling rapid architecture search with minimal computational cost.

Recommended citation: Istrate, R., Scheidegger, F., Mariani, G., Nikolopoulos, D., Bekas, C., & Malossi, A. C. I. (2019). "TAPAS: Train-Less Accuracy Predictor for Architecture Search." *AAAI 2019*, 33(01), 3927–3934. https://doi.org/10.1609/aaai.v33i01.33013927
Download Paper

SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019

Presents SAFIRE, a fault injection framework that targets soft errors in multithreaded applications with high scalability and accuracy.

Recommended citation: Georgakoudis, G., Laguna, I., Vandierendonck, H., et al. (2019). "SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications." In IPDPS 2019, 890–899. https://doi.org/10.1109/IPDPS.2019.00097
Download Paper

VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing

Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2019

Introduces VEBO, a vertex- and edge-balanced ordering heuristic that improves load balancing for parallel graph processing by balancing both edges and unique destination vertices.

Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2019). VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing. In *Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP '19)*, 391-392. https://doi.org/10.1145/3293883.3295703
Download Paper

SmartMaaS: A Framework for Smart Manufacturing-as-a-Service

Published in Advances in Manufacturing Technology XXXIII (Advances in Transdisciplinary Engineering, Vol. 9), 2019

Introduces SmartMaaS, a framework for Smart Manufacturing-as-a-Service that enables manufacturers to offer production capabilities as on-demand services with intelligent negotiation and optimization capabilities.

Recommended citation: Barbhuiya, S., Nikolopoulos, D. S., Price, M., Robinson, T., Nolan, D., Zhang, W., & Kyle, S. (2019). "SmartMaaS: A Framework for Smart Manufacturing-as-a-Service." In Advances in Manufacturing Technology XXXIII (pp. 16-21). Advances in Transdisciplinary Engineering, Vol. 9. https://doi.org/10.3233/ATDE190005
Download Paper

Design Gene Representations for Emergent Innovative Design

Published in Advances in Manufacturing Technology XXXIII (Advances in Transdisciplinary Engineering, Vol. 9), 2019

Presents an alternative bottom-up engineering design system using “design genes” that trigger and control design growth within CAD systems, allowing unpredicted-but-valuable designs to emerge with minimal constraints.

Recommended citation: Zhang, W., Price, M., Robinson, T., Nolan, D., Nikolopoulos, D., Barbhuiya, S., & Kyle, S. (2019). "Design Gene Representations for Emergent Innovative Design." In Advances in Manufacturing Technology XXXIII (pp. 386-392). Advances in Transdisciplinary Engineering, Vol. 9. https://doi.org/10.3233/ATDE190068
Download Paper

Bio-Inspired Growth: Introducing Emergence into Computational Design

Published in Advances in Manufacturing Technology XXXIII (Advances in Transdisciplinary Engineering, Vol. 9), 2019

Introduces emergence into computational design through bio-inspired growth principles, presenting a four-tiered structure that enables computers to create unexpected and innovative solutions using predictive non-determinism and stochastic rules.

Recommended citation: Kyle, S., Nolan, D., Price, M., Zhang, W., Robinson, T., Nikolopoulos, D. S., & Barbhuiya, S. (2019). "Bio-Inspired Growth: Introducing Emergence into Computational Design." In Advances in Manufacturing Technology XXXIII (pp. 379-385). Advances in Transdisciplinary Engineering, Vol. 9. https://doi.org/10.3233/ATDE190067
Download Paper

Userspace Hypervisor Data Characterization in Virtualized Environment

Published in IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2018

Characterizes userspace hypervisor data patterns in virtualized environments using error injection and reliability analysis to improve data structure management and virtual machine monitor performance.

Recommended citation: Wang, B., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2018). "Userspace Hypervisor Data Characterization in Virtualized Environment." In 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), 638-645. https://doi.org/10.1109/PADSW.2018.8644612
Download Paper

Code and Data Transformations to Address Garbage Collector Performance in Big Data Processing

Published in IEEE International Conference on High Performance Computing (HiPC), 2018

Presents code and data transformation techniques to address garbage collector performance bottlenecks in big data processing applications, focusing on memory management optimizations for Spark and Java-based systems.

Recommended citation: Fenacci, D., Vandierendonck, H., & Nikolopoulos, D. (2018). "Code and Data Transformations to Address Garbage Collector Performance in Big Data Processing." In 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 284-293. https://doi.org/10.1109/HiPC.2018.00040
Download Paper

Energy-efficient localised rollback via data flow analysis and frequency scaling

Published in European MPI Users' Group Meeting (EuroMPI), 2018

Introduces Data Flow Rollback (DFR), an approach that localizes recovery after failures in HPC systems by analyzing data flow patterns, reducing energy consumption via frequency scaling of idle nodes.

Recommended citation: Dichev, K., Cameron, K., & Nikolopoulos, D. S. (2018). "Energy-efficient localised rollback via data flow analysis and frequency scaling." In Proceedings of the 25th European MPI Users' Group Meeting (EuroMPI '18), Article 11. https://doi.org/10.1145/3236367.3236379
Download Paper

Supporting Cloud IaaS Users in Detecting Performance-Based Violation for Streaming Applications

Published in IEEE International Conference on Autonomic Computing (ICAC), 2018

Helps cloud IaaS users detect performance-based violations in streaming applications, using cloud monitoring and QoS violation detection to check that service quality and throughput requirements are met.

Recommended citation: Barlaskar, E., Dichev, K., Kilpatrick, P., Spence, I., & Nikolopoulos, D. S. (2018). "Supporting Cloud IaaS Users in Detecting Performance-Based Violation for Streaming Applications." In 2018 IEEE International Conference on Autonomic Computing (ICAC), 163-168. https://doi.org/10.1109/ICAC.2018.00027
Download Paper

Variation-Aware Pipelined Cores through Path Shaping and Dynamic Cycle Adjustment: Case Study on a Floating-Point Unit

Published in International Symposium on Low Power Electronics and Design (ISLPED), 2018

Proposes a framework for minimizing variation-induced timing failures in pipelined designs through path shaping and dynamic cycle adjustment, demonstrated on an IEEE-754 double precision floating-point unit.

Recommended citation: Tsiokanos, I., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "Variation-Aware Pipelined Cores through Path Shaping and Dynamic Cycle Adjustment: Case Study on a Floating-Point Unit." In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), Article 52. https://doi.org/10.1145/3218603.3218617
Download Paper

The VINEYARD integrated framework for hardware accelerators in the cloud

Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2018

Presents the VINEYARD framework for seamless deployment and utilization of hardware accelerators in the cloud, achieving up to 25× speedup without increasing programming complexity for machine learning and neurocomputing applications.

Recommended citation: Kachris, C., Soudris, D., Mavridis, S., Pavlidakis, M., Symeonidou, C., Kozanitis, C., Bilas, A., Fenacci, D., Bogaraju, S. V., Vandierendonck, H., & Nikolopoulos, D. S. (2018). "The VINEYARD integrated framework for hardware accelerators in the cloud." In Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 236-243. https://doi.org/10.1145/3229631.3236093
Download Paper

Minimization of Timing Failures in Pipelined Designs via Path Shaping and Operand Truncation

Published in IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), 2018

Presents techniques for minimizing timing failures in pipelined designs through path shaping and operand truncation methods to improve reliability and performance.

Recommended citation: Tsiokanos, I., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "Minimization of Timing Failures in Pipelined Designs via Path Shaping and Operand Truncation." In 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), 171-176. https://doi.org/10.1109/IOLTS.2018.8474084
Download Paper

DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server

Published in IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), 2018

Characterizes DRAM behavior under relaxed refresh periods in commodity servers, analyzing system-level effects including temperature and reliability impacts for improved memory management.

Recommended citation: Mukhanov, L., Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server." In 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), 236-239. https://doi.org/10.1109/IOLTS.2018.8474184
Download Paper

Characterization of HPC workloads on an ARMv8 based server under relaxed DRAM refresh and thermal stress

Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2018

Develops an experimental framework on a 64-bit ARM server to characterize DRAM reliability under relaxed refresh periods and thermal stress, evaluating HPC workloads and demonstrating a 35× refresh period relaxation with 11.2% power savings.

Recommended citation: Mukhanov, L., Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "Characterization of HPC workloads on an ARMv8 based server under relaxed DRAM refresh and thermal stress." In Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 230-235. https://doi.org/10.1145/3229631.3236091
Download Paper

The Transprecision Computing Paradigm: Concept, Design, and Applications

Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018

This paper introduces the transprecision computing paradigm, focusing on its architectural design, energy-efficient implementations, and applications in low-power computing environments.

Recommended citation: Malossi, A. C. I., Schaffner, M., Molnos, A., Gammaitoni, L., Tagliavini, G., Emerson, A., Tomás, A., Nikolopoulos, D. S., Flamand, E., & Wehn, N. (2018). "The Transprecision Computing Paradigm: Concept, Design, and Applications." DATE 2018, 1105–1110. https://doi.org/10.23919/DATE.2018.8342176
Download Paper

A Taxonomy of Task-Based Technologies for High-Performance Computing

Published in Parallel Processing and Applied Mathematics (PPAM), 2018

Presents a comprehensive taxonomy and classification of task-based technologies for high-performance computing, covering diverse programming models and runtime features across heterogeneous and many-core systems.

Recommended citation: Thoman, P., Hasanov, K., Dichev, K., Iakymchuk, R., Aguilar, X., Gschwandtner, P., Lemarinier, P., Markidis, S., Jordan, H., Laure, E., Katrinis, K., Nikolopoulos, D. S., & Fahringer, T. (2018). "A Taxonomy of Task-Based Technologies for High-Performance Computing." In Parallel Processing and Applied Mathematics (pp. 264-274). Springer. https://doi.org/10.1007/978-3-319-78054-2_25
Download Paper

An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits

Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018

Presents an energy-efficient, error-resilient server ecosystem that operates beyond conservative scaling limits by combining hardware and software innovations.

Recommended citation: Karakonstantis, G., Tovletoglou, K., Mukhanov, L., et al. (2018). "An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits." DATE 2018, 1099–1104. https://doi.org/10.23919/DATE.2018.8342175
Download Paper

Power Modelling for Heterogeneous Cloud-Edge Data Centers

Published in Parallel Computing is Everywhere (Advances in Parallel Computing, Vol. 32), 2018

Develops a method for deploying power models on emerging processors for cloud-edge data centers, proposing automated hardware counter selection and a two-stage power model that works across ARM and Intel architectures.

Recommended citation: Chen, K., Varghese, B., Kilpatrick, P., & Nikolopoulos, D. S. (2018). "Power Modelling for Heterogeneous Cloud-Edge Data Centers." In Parallel Computing is Everywhere (pp. 804-813). Advances in Parallel Computing, Vol. 32. https://doi.org/10.3233/978-1-61499-843-3-804
Download Paper

Using Docker Swarm with a User-Centric Decision-Making Framework for Cloud Application Migration

Published in Cloud Computing and Service Science, 2018

Proposes MyMinder, a Multi-objective dYnamic MIgratioN Decision makER framework that assists cloud users in inter-cloud migration decisions and provides automated migration capabilities using Docker Swarm technology to overcome vendor lock-in challenges.

Recommended citation: Barlaskar, E., Kilpatrick, P., Spence, I., & Nikolopoulos, D. S. (2018). "Using Docker Swarm with a User-Centric Decision-Making Framework for Cloud Application Migration." In Cloud Computing and Service Science (pp. 81-101). Springer. https://doi.org/10.1007/978-3-319-94959-8_5
Download Paper

REFINE: Realistic Fault Injection via Compiler-Based Instrumentation for Accuracy, Portability and Speed

Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2017

REFINE is a compiler-based framework for fault injection that balances the fidelity of binary-level injection with the speed and portability of source-level techniques.

Recommended citation: Georgakoudis, G., Laguna, I., Nikolopoulos, D. S., & Schulz, M. (2017). "REFINE: Realistic Fault Injection via Compiler-Based Instrumentation for Accuracy, Portability and Speed." SC 2017, Article 29. https://doi.org/10.1145/3126908.3126972
Download Paper

A Taxonomy of Task-Based Technologies for High Performance Computing

Published in International Conference on Parallel Processing and Applied Mathematics (PPAM), 2017

Presents a comprehensive taxonomy of task-based technologies for high performance computing, categorizing various programming models and runtime systems.

Recommended citation: Nikolopoulos, D., Dichev, K., Thoman, P., Hasanov, K., Iakymchuk, R., Aguilar, X., Gschwandtner, P., Laure, E., Jordan, H., Lemarinier, P., et al. (2017). "A Taxonomy of Task-Based Technologies for High Performance Computing." In 12th International Conference on Parallel Processing and Applied Mathematics.
Download Paper

Reliability-Aware System Software Support on ARM Microservers

Published in ARM Research Summit, 2017

Presents reliability-aware system software support mechanisms for ARM microservers to improve system dependability and fault tolerance.

Recommended citation: Karakonstantis, G., Nikolopoulos, D., Antonopoulos, C., Lalis, S., Bellas, N., Gizopoulos, D., & Lawthers, P. (2017). "Reliability-Aware System Software Support on ARM Microservers." In ARM Research Summit.
Download Paper

Energy Efficiency in ARMv8-based Microservers by Hardware Margins Identification

Published in ARM Research Summit, 2017

Investigates energy efficiency improvements in ARMv8-based microservers through identification and optimization of hardware margins.

Recommended citation: Karakonstantis, G., Nikolopoulos, D., Gizopoulos, D., Sazeides, Y., Das, S., & Lawthers, P. (2017). "Energy Efficiency in ARMv8-based Microservers by Hardware Margins Identification." In 2017 ARM Research Summit.
Download Paper

Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning

Published in International Conference on Parallel Processing (ICPP), 2017

This paper explores how graph partitioning techniques can enhance memory locality and performance in graph analytics workloads.

Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2017). "Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning." ICPP '17, 181–190. https://doi.org/10.1109/ICPP.2017.27
Download Paper

Relaxing DRAM Refresh Rate through Access Pattern Scheduling: A Case Study on Stencil-Based Algorithms

Published in IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), 2017

Explores relaxing DRAM refresh rates by leveraging access patterns in stencil codes, reducing energy with negligible performance loss.

Recommended citation: Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2017). "Relaxing DRAM Refresh Rate through Access Pattern Scheduling: A Case Study on Stencil-Based Algorithms." IOLTS 2017, 45–50. https://doi.org/10.1109/IOLTS.2017.8046197
Download Paper

Access-aware DRAM failure-rate estimation under relaxed refresh operations

Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2017

Presents access-aware DRAM failure-rate estimation techniques under relaxed refresh operations using memory tracing, fault injection, and binary instrumentation to optimize memory reliability and energy consumption.

Recommended citation: Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2017). "Access-aware DRAM failure-rate estimation under relaxed refresh operations." In 2017 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 292-299. https://doi.org/10.1109/SAMOS.2017.8344643
Download Paper

GraphGrind: Addressing Load Imbalance of Graph Partitioning

Published in ACM International Conference on Supercomputing (ICS), 2017

GraphGrind proposes NUMA-aware programming and runtime strategies to address partitioning-induced load imbalance in graph analytics workloads, improving performance over state-of-the-art systems.

Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2017). "GraphGrind: Addressing Load Imbalance of Graph Partitioning." ICS '17, Article 16. https://doi.org/10.1145/3079079.3079097
Download Paper

MyMinder: A User-centric Decision Making Framework for Intercloud Migration

Published in International Conference on Cloud Computing and Services Science (CLOSER), 2017

Presents MyMinder, a user-centric decision making framework that assists users in making informed decisions about intercloud migration strategies.

Recommended citation: Barlaskar, E., Kilpatrick, P., Spence, I., & Nikolopoulos, D. S. (2017). "MyMinder: A User-centric Decision Making Framework for Intercloud Migration." In Proceedings of the 7th International Conference on Cloud Computing and Services Science - CLOSER, 588-595. https://doi.org/10.5220/0006355905880595
Download Paper

Edge-as-a-Service: Towards Distributed Cloud Architectures

Published in Advances in Parallel Computing, Volume 32: Parallel Computing is Everywhere, 2017

This chapter introduces an Edge-as-a-Service platform that integrates edge nodes into cloud environments to reduce latency and improve Quality-of-Service.

Recommended citation: Varghese, B., Wang, N., Li, J., & Nikolopoulos, D. S. (2017). "Edge-as-a-Service: Towards Distributed Cloud Architectures." Parallel Computing is Everywhere, 784–793. https://doi.org/10.3233/978-1-61499-843-3-784
Download Paper

Challenges and Opportunities in Edge Computing

Published in IEEE International Conference on Smart Cloud (SmartCloud), 2016

This position paper examines the challenges and opportunities in edge computing, focusing on moving computational load towards network edges to harness untapped capabilities in edge nodes while addressing quality-of-service concerns.

Recommended citation: Varghese, B., Wang, N., Barbhuiya, S., Kilpatrick, P., & Nikolopoulos, D. S. (2016). "Challenges and Opportunities in Edge Computing." IEEE International Conference on Smart Cloud (SmartCloud), 20-26. https://doi.org/10.1109/SmartCloud.2016.18
Download Paper

Runtime support for adaptive power capping on heterogeneous SoCs

Published in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016

Presents a runtime system for adaptive power capping on heterogeneous SoCs, enabling dynamic power management across ARM processors and FPGAs.

Recommended citation: Wu, Y., Nikolopoulos, D. S., & Woods, R. (2016). "Runtime support for adaptive power capping on heterogeneous SoCs." In 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 71-78. https://doi.org/10.1109/SAMOS.2016.7818333
Download Paper

NanoStreams: Codesigned microservers for edge analytics in real time

Published in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016

Presents NanoStreams, a codesigned microserver architecture using FPGAs for real-time edge analytics, addressing hardware-software co-optimization for edge computing applications.

Recommended citation: Georgakoudis, G., et al. (2016). "NanoStreams: Codesigned microservers for edge analytics in real time." In 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 180-187. https://doi.org/10.1109/SAMOS.2016.7818346
Download Paper

Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads

Published in ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2016

Presents an analytical energy-performance model for parallel workloads that accounts for the energy consumed by the CPU during memory accesses and the dynamic energy of idle cores, and derives the optimal frequencies for minimizing energy under global DVFS.

Recommended citation: Trehan, C., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). "Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads." In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 251-252. https://doi.org/10.1145/2935764.2935811
Download Paper

The VINEYARD project: Versatile integrated accelerator-based heterogeneous data centres

Published in International Conference on Modern Circuits and Systems Technologies (MOCAST), 2016

Presents the VINEYARD project, which develops versatile integrated accelerator-based heterogeneous data centers using FPGAs and multicore processing for improved performance and power efficiency.

Recommended citation: Kachris, C., Gaydadjiev, G., Nguyen, H.-N., Nikolopoulos, D. S., Bilas, A., Morgan, N., Strydis, C., Spatadakis, V., Gardelis, D., Jimenez-Peris, R., & Almeida, A. (2016). "The VINEYARD project: Versatile integrated accelerator-based heterogeneous data centres." In 2016 5th International Conference on Modern Circuits and Systems Technologies (MOCAST), 1-4. https://doi.org/10.1109/MOCAST.2016.7495121
Download Paper

The VINEYARD Approach: Versatile, Integrated, Accelerator-Based, Heterogeneous Data Centres

Published in Applied Reconfigurable Computing Conference, 2016

Introduces VINEYARD: a data-center architecture using programmable accelerators and a high-level programming model for big data and cloud applications.

Recommended citation: Kachris, C., Soudris, D., Gaydadjiev, G., et al. (2016). "The VINEYARD Approach: Versatile, Integrated, Accelerator-Based, Heterogeneous Data Centres." In Applied Reconfigurable Computing, 3–13. https://doi.org/10.1007/978-3-319-30481-6_1
Download Paper

Low-Cost Hardware Infrastructure for Runtime Thread Level Energy Accounting

Published in Architecture of Computing Systems (ARCS), 2016

Designs a generic low-cost hardware infrastructure for thread-level energy accounting in multi-core systems, achieving 95% correlation with physical power measurements while adding only 10% resource overhead.

Recommended citation: Marcu, M., Boncalo, O., Ghenea, M., Amaricai, A., Weinstock, J., Leupers, R., Wang, Z., Georgakoudis, G., Nikolopoulos, D. S., Cernazanu-Glavan, C., Bara, L., & Ionascu, M. (2016). "Low-Cost Hardware Infrastructure for Runtime Thread Level Energy Accounting." In Architecture of Computing Systems – ARCS 2016 (pp. 277-289). Springer. https://doi.org/10.1007/978-3-319-30695-7_21
Download Paper

LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres

Published in Cloud Computing and Services Science Conference (Springer), 2016

Presents LS-ADT, a lightweight anomaly detection tool for cloud data centers that combines extended log analysis with correlation of system metrics to automatically detect and identify performance anomalies without requiring training or complex setup.

Recommended citation: Barbhuiya, S., Papazachos, Z., Kilpatrick, P., & Nikolopoulos, D. S. (2016). "LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres." In Cloud Computing and Services Science (pp. 135-152). Springer. https://doi.org/10.1007/978-3-319-29582-4_8
Download Paper

ECOSCALE: Reconfigurable Computing and Runtime System for Future Exascale Systems

Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016

ECOSCALE proposes a reconfigurable computing architecture and runtime platform to support scalability, energy efficiency, and programmability for exascale systems.

Recommended citation: Mavroidis, I., Papaefstathiou, I., Lavagno, L., Nikolopoulos, D. S., Koch, D., Goodacre, J., Sourdis, I., Papaefstathiou, V., Coppola, M., & Palomino, M. (2016). "ECOSCALE: Reconfigurable Computing and Runtime System for Future Exascale Systems." DATE 2016, 696–701.
Download Paper

Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures

Published in Parallel Computing: On the Road to Exascale (Advances in Parallel Computing, Vol. 27), 2016

Investigates the interplay among concurrency, power dissipation, energy consumption, and voltage-frequency scaling for the Conjugate Gradient method with the ILUPACK preconditioner on low-power ARM processors.

Recommended citation: Aliaga, J. I., Catalán, S., Chalios, C., Nikolopoulos, D. S., & Quintana-Ortí, E. S. (2016). "Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures." In Parallel Computing: On the Road to Exascale (pp. 711-720). Advances in Parallel Computing, Vol. 27. https://doi.org/10.3233/978-1-61499-621-7-711
Download Paper

Power Capping: What Works, What Does Not

Published in IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2015

This study evaluates the effectiveness of multiple power capping strategies, including compiler optimizations and DVFS, across a variety of HPC workloads.

Recommended citation: Petoumenos, P., Mukhanov, L., Wang, Z., Leather, H., & Nikolopoulos, D. S. (2015). "Power Capping: What Works, What Does Not." ICPADS 2015, 525–534. https://doi.org/10.1109/ICPADS.2015.72
Download Paper

HpMC: An Energy-aware Management System of Multi-level Memory Architectures

Published in International Symposium on Memory Systems (MEMSYS), 2015

HpMC is an adaptive memory controller that switches between hierarchical and flat memory modes to reduce energy while maintaining performance in heterogeneous memory systems.

Recommended citation: Su, C., Roberts, D., León, E. A., Cameron, K. W., de Supinski, B. R., Loh, G. H., & Nikolopoulos, D. S. (2015). "HpMC: An Energy-aware Management System of Multi-level Memory Architectures." MEMSYS '15, 167–178. https://doi.org/10.1145/2818950.2818974
Download Paper

ALEA: Fine-Grain Energy Profiling with Basic Block Sampling

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015

ALEA is a fine-grain energy profiling tool that uses basic block sampling to help developers optimize energy consumption at basic-block granularity.

Recommended citation: Mukhanov, L., Nikolopoulos, D. S., & de Supinski, B. R. (2015). "ALEA: Fine-Grain Energy Profiling with Basic Block Sampling." PACT 2015, 87–98. https://doi.org/10.1109/PACT.2015.16
Download Paper

Energy-Efficient Hybrid DRAM/NVM Main Memory

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015

This short paper presents a hybrid DRAM/NVM memory architecture evaluated for energy savings in memory-intensive applications.

Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "Energy-Efficient Hybrid DRAM/NVM Main Memory." PACT '15, 492–493. https://doi.org/10.1109/PACT.2015.58
Download Paper

Towards automated data-driven model creation for cloud computing simulation

Published in International Conference on Simulation Tools and Techniques (SIMUTools), 2015

Presents an automated method for cloud computing topology definition, data collection and model creation to support decision making in complex cloud environments through simulation.

Recommended citation: Svorobej, S., Byrne, J., Liston, P., Byrne, P. J., Stier, C., Groenda, H., Papazachos, Z., & Nikolopoulos, D. S. (2015). "Towards automated data-driven model creation for cloud computing simulation." In Proceedings of the 8th International Conference on Simulation Tools and Techniques, 248–255. https://doi.org/10.4108/eai.24-8-2015.2261129
Download Paper

A significance-driven programming framework for energy-constrained approximate computing

Published in ACM International Conference on Computing Frontiers (CF), 2015

Introduces a programming framework for energy-constrained approximate computing that uses significance-aware runtime systems to maximize output quality within given energy budgets.

Recommended citation: Vassiliadis, V., Chalios, C., Parasyris, K., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "A significance-driven programming framework for energy-constrained approximate computing." In Proceedings of the 12th ACM International Conference on Computing Frontiers (CF '15), Article 9. https://doi.org/10.1145/2742854.2742857
Download Paper

A Lightweight Tool for Anomaly Detection in Cloud Data Centres

Published in International Conference on Cloud Computing and Services Science (CLOSER), 2015

This paper presents a lightweight anomaly detection tool tailored for cloud data centers, designed for rapid deployment with minimal overhead in distributed environments.

Recommended citation: Barbhuiya, S., Papazachos, Z., Kilpatrick, P., & Nikolopoulos, D. S. (2015). "A Lightweight Tool for Anomaly Detection in Cloud Data Centres." CLOSER 2015, 343–351. https://doi.org/10.5220/0005453403430351
Download Paper

Software-Managed Energy-Efficient Hybrid DRAM/NVM Main Memory

Published in ACM International Conference on Computing Frontiers (CF), 2015

This work proposes software techniques for energy-efficient management of hybrid DRAM/NVM memory systems, reducing hardware complexity while maintaining high performance.

Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "Software-Managed Energy-Efficient Hybrid DRAM/NVM Main Memory." CF '15, Article 23. https://doi.org/10.1145/2742854.2742886
Download Paper

A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing

Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2015

This work presents a runtime and programming model for approximate computing based on task significance, showing graceful quality degradation and significant energy savings.

Recommended citation: Vassiliadis, V., Parasyris, K., Chalios, C., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing." PPoPP 2015, 275–276. https://doi.org/10.1145/2688500.2688546
Download Paper

Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi. Multicore and GPU Programming

Published in Congress of GPU Programming, 2015

Studies the power and energy implications of varying thread counts on Intel Xeon Phi processors for multicore and GPU programming applications.

Recommended citation: Lorenzo, O. G., Pena, T. F., Cabaleiro, J. C., Pichel, J. C., Rivera, F. F., & Nikolopoulos, D. S. (2015). "Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi. Multicore and GPU Programming." In Second Congress of GPU Programming, 1-8.
Download Paper

Power modelling and capping for heterogeneous ARM/FPGA SoCs

Published in International Conference on Field-Programmable Technology (FPT), 2014

Presents approaches for power modeling and capping in heterogeneous System-on-Chips combining ARM processors and FPGAs.

Recommended citation: Wu, Y., Nunez-Yanez, J., Woods, R., & Nikolopoulos, D. S. (2014). "Power modelling and capping for heterogeneous ARM/FPGA SoCs." In 2014 International Conference on Field-Programmable Technology (FPT), 231-234. https://doi.org/10.1109/FPT.2014.7082782
Download Paper

The CACTOS Vision of Context-Aware Cloud Topology Optimization and Simulation

Published in IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2014

The CACTOS project outlines an approach for optimizing cloud topologies using context-awareness and autonomic resource management techniques.

Recommended citation: Östberg, P.-O., Groenda, H., Wesner, S., Byrne, J., Nikolopoulos, D. S., et al. (2014). "The CACTOS Vision of Context-Aware Cloud Topology Optimization and Simulation." CloudCom 2014, 26–31. https://doi.org/10.1109/CloudCom.2014.62
Download Paper

Power-capped DVFS and thread allocation with ANN models on modern NUMA systems

Published in IEEE International Conference on Computer Design (ICCD), 2014

Presents power-capped dynamic voltage and frequency scaling (DVFS) and thread allocation techniques using artificial neural network models for resource management on modern NUMA systems.

Recommended citation: Imamura, S., Sasaki, H., Inoue, K., & Nikolopoulos, D. S. (2014). "Power-capped DVFS and thread allocation with ANN models on modern NUMA systems." In 2014 IEEE 32nd International Conference on Computer Design (ICCD), 324-331. https://doi.org/10.1109/ICCD.2014.6974701
Download Paper

NanoStreams: Advancing the hardware and software stack for real-time analytics on fast data streams

Published in eChallenges e-2014 Conference, 2014

Presents NanoStreams, an advanced hardware and software stack designed for real-time analytics on fast data streams, targeting high-performance server architectures and system-on-chip implementations.

Recommended citation: Gillan, C. J., Nikolopoulos, D. S., Bilas, A., & Bekas, C. (2014). "NanoStreams: Advancing the hardware and software stack for real-time analytics on fast data streams." In eChallenges e-2014 Conference Proceedings, 1-8. https://ieeexplore.ieee.org/abstract/document/7058143
Download Paper

Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs

Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2014

Presents a fast dynamic binary rewriting technique that enables flexible thread migration across cores in shared-ISA heterogeneous multiprocessor systems-on-chip.

Recommended citation: Georgakoudis, G., Nikolopoulos, D. S., Vandierendonck, H., & Lalis, S. (2014). "Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs." In 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 156-163. https://doi.org/10.1109/SAMOS.2014.6893207
Download Paper

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2014

This paper presents scalable techniques for large-scale epidemic simulations on Blue Waters, addressing challenges in load balancing, graph partitioning, and communication.

Recommended citation: Yeom, J.-S., Bhatele, A., Bisset, K., Bohm, E., Gupta, A., Kale, L. V., Marathe, M., Nikolopoulos, D. S., Schulz, M., & Wesolowski, L. (2014). "Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters." *IPDPS 2014*, 755–764. https://doi.org/10.1109/IPDPS.2014.83
Download Paper

Deterministic Scale-Free Pipeline Parallelism with Hyperqueues

Published in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2013

Proposes hyperqueues, a programming abstraction that enables deterministic and scalable pipeline parallelism for modern multicore systems.

Recommended citation: Vandierendonck, H., Chronaki, K., & Nikolopoulos, D. S. (2013). Deterministic Scale-Free Pipeline Parallelism with Hyperqueues. In *SC '13*, Article 32. https://doi.org/10.1145/2503210.2503233
Download Paper

DRASync: distributed region-based memory allocation and synchronization

Published in European MPI Users Group Meeting (EuroMPI), 2013

Presents DRASync, a region-based allocator that implements a global address space abstraction for MPI programs with pointer-based data structures and high-level synchronization primitives.

Recommended citation: Symeonidou, C., Pratikakis, P., Bilas, A., & Nikolopoulos, D. S. (2013). "DRASync: distributed region-based memory allocation and synchronization." In Proceedings of the 20th European MPI Users' Group Meeting, 49-54. https://doi.org/10.1145/2488551.2488558
Download Paper

BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism

Published in Advanced Parallel Processing Technologies, Springer Berlin Heidelberg, 2013

Provides a comprehensive treatment of the BDDT runtime, with emphasis on block-level memory tracking and support for irregular applications in task-parallel environments.

Recommended citation: Tzenakis, G., Papatriantafyllou, A., Vandierendonck, H., Pratikakis, P., & Nikolopoulos, D. S. (2013). "BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism." In *Advanced Parallel Processing Technologies* (pp. 17–31). Springer. https://doi.org/10.1007/978-3-642-45293-2_2
Download Paper

Prefetching and Cache Management Using Task Lifetimes

Published in ACM International Conference on Supercomputing (ICS), 2013

This paper presents EBP and ECM mechanisms to leverage task lifetime information for improving prefetching and cache management in task-parallel runtimes.

Recommended citation: Papaefstathiou, V., Katevenis, M. G. H., Nikolopoulos, D. S., & Pnevmatikatos, D. (2013). "Prefetching and Cache Management Using Task Lifetimes." *ICS 2013*, 325–334. https://doi.org/10.1145/2464996.2465443
Download Paper

Model-based, Memory-centric Performance and Power Optimization on NUMA Multiprocessors

Published in IEEE International Symposium on Workload Characterization (IISWC), 2012

This work proposes a memory-centric performance and power optimization model for NUMA systems, guided by hardware counters and predictive modeling.

Recommended citation: Su, C., Li, D., Nikolopoulos, D. S., et al. (2012). "Model-based, Memory-centric Performance and Power Optimization on NUMA Multiprocessors." IISWC 2012, 164–173. https://doi.org/10.1109/IISWC.2012.6402921
Download Paper

On the Use of GPUs in Realizing Cost-Effective Distributed RAID

Published in IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2012

Evaluates GPU-based offloading techniques to improve the performance and cost-efficiency of distributed RAID storage architectures.

Recommended citation: Khasymski, A., Rafique, M. M., Butt, A. R., et al. (2012). "On the Use of GPUs in Realizing Cost-Effective Distributed RAID." MASCOTS 2012, 469–478. https://doi.org/10.1109/MASCOTS.2012.59
Download Paper

BTL: A Framework for Measuring and Modeling Energy in Memory Hierarchies

Published in International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD), 2012

Introduces BTL, a framework for fine-grained measurement and modeling of memory energy consumption in hierarchical memory systems.

Recommended citation: Manousakis, I., & Nikolopoulos, D. S. (2012). "BTL: A Framework for Measuring and Modeling Energy in Memory Hierarchies." SBAC-PAD 2012, 139–146. https://doi.org/10.1109/SBAC-PAD.2012.38
Download Paper

Inference and Declaration of Independence: Impact on Deterministic Task Parallelism

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012

Proposes static optimizations for deterministic task-parallel execution to reduce runtime overhead in task creation and dependency checks.

Recommended citation: Zakkak, F. S., Chasapis, D., Pratikakis, P., et al. (2012). "Inference and Declaration of Independence: Impact on Deterministic Task Parallelism." In PACT '12, 453–454. https://doi.org/10.1145/2370816.2370892
Download Paper

Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary

Published in International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2012

Summarizes dynamic binary rewriting and migration techniques for shared-ISA asymmetric multicore processors to enable efficient code optimization and thread migration.

Recommended citation: Georgakoudis, G., Lalis, S., & Nikolopoulos, D. S. (2012). "Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary." In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 127-128. https://doi.org/10.1145/2287076.2287096
Download Paper

Formic: Cost-Efficient and Scalable Prototyping of Manycore Architectures

Published in IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012

This paper presents Formic, a platform for efficient prototyping and validation of manycore hardware architectures using FPGA-based development environments.

Recommended citation: Lyberis, S., Kalokerinos, G., Lygerakis, M., Papaefstathiou, V., Tsaliagkos, D., Katevenis, M., Pnevmatikatos, D., & Nikolopoulos, D. (2012). "Formic: Cost-Efficient and Scalable Prototyping of Manycore Architectures." *FCCM 2012*, 61–64. https://doi.org/10.1109/FCCM.2012.20
Download Paper

BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism

Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2012

This paper presents BDDT, a runtime for deterministic task parallelism using block-level dependence analysis on dynamic memory regions.

Recommended citation: Tzenakis, G., Papatriantafyllou, A., Kesapides, J., Pratikakis, P., Vandierendonck, H., & Nikolopoulos, D. S. (2012). "BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism." *PPoPP 2012*, 301–302. https://doi.org/10.1145/2145816.2145864
Download Paper

A Unified Scheduler for Recursive and Task Dataflow Parallelism

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2011

This paper introduces a unified scheduling approach for parallel programs using recursive and task-dataflow parallelism, aiming for efficient execution with minimal overhead.

Recommended citation: Vandierendonck, H., Tzenakis, G., & Nikolopoulos, D. S. (2011). "A Unified Scheduler for Recursive and Task Dataflow Parallelism." *PACT 2011*, 1–11. https://doi.org/10.1109/PACT.2011.7
Download Paper

Task-Based Parallel H.264 Video Encoding for Explicit Communication Architectures

Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2011

Proposes a task-based parallelization strategy for H.264 encoding optimized for explicit communication architectures.

Recommended citation: Alvanos, M., Tzenakis, G., Nikolopoulos, D. S., & Bilas, A. (2011). Task-Based Parallel H.264 Video Encoding for Explicit Communication Architectures. In *SAMOS 2011*, 217–224. https://doi.org/10.1109/SAMOS.2011.6045464
Download Paper

Scalable Runtime Support for Data-Intensive Applications on the Single-Chip Cloud Computer

Published in Intel Many-core Applications Research Community Symposium (MARC), 2011

Presents scalable runtime support mechanisms for data-intensive applications running on Intel’s Single-Chip Cloud Computer architecture.

Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2011). Scalable Runtime Support for Data-Intensive Applications on the Single-Chip Cloud Computer. In *Proceedings of the 3rd Intel Many-core Applications Research Community Symposium (MARC)*, 25-30. https://tpapagian.github.io/files/paper_marc.pdf
Download Paper

Scalable memory registration for high performance networks using helper threads

Published in ACM International Conference on Computing Frontiers (CF), 2011

Proposes a memory registration strategy using helper threads to reduce registered memory requirements on multicore architectures for HPC applications with RDMA networks.

Recommended citation: Li, D., Cameron, K. W., Nikolopoulos, D. S., de Supinski, B. R., & Schulz, M. (2011). Scalable memory registration for high performance networks using helper threads. In *Proceedings of the 8th ACM International Conference on Computing Frontiers (CF '11)*, Article 38. https://doi.org/10.1145/2016604.2016652
Download Paper

Fine-grain OpenMP runtime support with explicit communication hardware primitives

Published in Design, Automation & Test in Europe (DATE), 2011

Presents fine-grain OpenMP runtime support using explicit communication hardware primitives to improve synchronization and performance in parallel applications.

Recommended citation: Tendulkar, P., Papaefstathiou, V., Nikiforos, G., Kavadias, S., Nikolopoulos, D. S., & Katevenis, M. (2011). "Fine-grain OpenMP runtime support with explicit communication hardware primitives." In 2011 Design, Automation & Test in Europe, 1-4. https://doi.org/10.1109/DATE.2011.5763299
Download Paper

Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories

Published in ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2010

Presents Strider, a runtime system for optimizing strided data access patterns on multi-core architectures with explicitly managed memory hierarchies, improving array access performance through intelligent prefetching and buffering.

Recommended citation: Yeom, J.-S., & Nikolopoulos, D. S. (2010). "Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories." In SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 1-11. https://doi.org/10.1109/SC.2010.52
Download Paper

Comparing Scalability Prediction Strategies on an SMP of CMPs

Published in International European Conference on Parallel and Distributed Computing (Euro-Par), 2010

Compares linear regression and ANN approaches for predicting scalable concurrency levels in scientific applications on CMPs.

Recommended citation: Singh, K., Curtis-Maury, M., McKee, S. A., et al. (2010). "Comparing Scalability Prediction Strategies on an SMP of CMPs." In Euro-Par 2010, 143–155. https://doi.org/10.1007/978-3-642-15277-1_14
Download Paper

Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories

Published in International Conference on Parallel Processing (ICPP), 2010

Revisits MapReduce design for heterogeneous multicore systems using explicitly managed memory hierarchies and runtime adaptability. Best Paper Award Nominee

Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2010). "Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories." ICPP 2010, 121–130. https://doi.org/10.1109/ICPP.2010.21
Download Paper

Evaluation of Streaming Aggregation on Parallel Hardware Architectures

Published in ACM International Conference on Distributed Event-Based Systems (DEBS), 2010

This study compares streaming aggregation performance across Intel CPUs, NVIDIA GPUs, and IBM Cell processors, highlighting memory access patterns and data movement as key performance drivers.

Recommended citation: Schneider, S., Andrade, H., Gedik, B., Wu, K.-L., & Nikolopoulos, D. S. (2010). "Evaluation of Streaming Aggregation on Parallel Hardware Architectures." *DEBS '10*, 248–257. https://doi.org/10.1145/1827418.1827467
Download Paper

On-chip Communication and Synchronization Mechanisms with Cache-Integrated Network Interfaces

Published in ACM International Conference on Computing Frontiers (CF), 2010

This work introduces cache-integrated network interfaces for on-chip communication and synchronization in multicore processors, combining the strengths of scratchpad memory and cache-based systems.

Recommended citation: Kavadias, S. G., Katevenis, M. G. H., Zampetakis, M., & Nikolopoulos, D. S. (2010). "On-chip Communication and Synchronization Mechanisms with Cache-Integrated Network Interfaces." *CF '10*, 217–226. https://doi.org/10.1145/1787275.1787328
Download Paper

Designing Accelerator-Based Distributed Systems for High Performance

Published in IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2010

Proposes frameworks for programming and managing asymmetric clusters composed of accelerator nodes for data-intensive applications.

Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2010). "Designing Accelerator-Based Distributed Systems for High Performance." CCGRID 2010, 165–174. https://doi.org/10.1109/CCGRID.2010.109
Download Paper

Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems

Published in International Symposium on Parallel & Distributed Processing (IPDPS), 2010

This paper presents a predictive model for MPI task aggregation in power-aware HPC systems, improving energy efficiency without sacrificing performance.

Recommended citation: Li, D., Nikolopoulos, D. S., Cameron, K., de Supinski, B. R., & Schulz, M. (2010). "Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems." *IPDPS 2010*, 1–12. https://doi.org/10.1109/IPDPS.2010.5470464
Download Paper

Hybrid MPI/OpenMP Power-Aware Computing

Published in IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

This work explores hybrid MPI/OpenMP programming for power-aware high-performance computing, introducing predictive models and heuristics to balance performance with energy efficiency through DVFS strategies.

Recommended citation: Li, D., de Supinski, B. R., Schulz, M., Cameron, K., & Nikolopoulos, D. S. (2010). "Hybrid MPI/OpenMP Power-Aware Computing." 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 1–12. https://doi.org/10.1109/IPDPS.2010.5470463
Download Paper

Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor

Published in International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), 2010

Introduces TPC, a runtime system that minimizes off-chip communication for efficient task initiation on the Cell architecture.

Recommended citation: Tzenakis, G., Kapelonis, K., Alvanos, M., et al. (2010). "Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor." In HiPEAC, 307–321. https://doi.org/10.1007/978-3-642-11515-8_23
Download Paper

Scheduling dynamic parallelism on accelerators

Published in ACM Conference on Computing Frontiers (CF), 2009

Presents scheduling approaches for dynamic parallelism on accelerator-based systems, demonstrating cooperative scheduling and work-stealing techniques on the Cell BE architecture.

Recommended citation: Blagojevic, F., Iancu, C., Yelick, K., Curtis-Maury, M., Nikolopoulos, D. S., & Rose, B. (2009). Scheduling dynamic parallelism on accelerators. In *Proceedings of the 6th ACM Conference on Computing Frontiers (CF '09)*, 161-170. https://doi.org/10.1145/1531743.1531769
Download Paper

Scheduling Dynamic Parallelism on the Cell BE

Published in IBM HPC Systems Scientific Computing User Group (SCICOMP), 2009

Presents strategies for scheduling dynamic parallelism on the Cell Broadband Engine architecture, addressing challenges in runtime system support and performance optimization.

Recommended citation: Blagojevic, F., Iancu, C., Yelick, K. A., Nikolopoulos, D., Rose, B., & Curtis-Maury, M. (2009). Scheduling Dynamic Parallelism on the Cell BE. In *Proceedings of the 15th Meeting of the IBM HPC Systems Scientific Computing User Group (SCICOMP)*, May.

CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters

Published in International Symposium on Parallel & Distributed Processing (IPDPS), 2009

This paper introduces CellMR, a runtime framework enabling MapReduce workloads on Cell-based heterogeneous clusters with a focus on resource efficiency and acceleration.

Recommended citation: Rafique, M. M., Rose, B., Butt, A. R., & Nikolopoulos, D. S. (2009). "CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters." *IPDPS 2009*, 1–12. https://doi.org/10.1109/IPDPS.2009.5161062
Download Paper

A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies

Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2009

This paper compares programming models for explicitly managed memory hierarchies (EMM), focusing on programmability and performance across application workloads.

Recommended citation: Schneider, S., Yeom, J.-S., Rose, B., Linford, J. C., Sandu, A., & Nikolopoulos, D. S. (2009). "A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies." *PPoPP '09*, 131–140. https://doi.org/10.1145/1504176.1504197
Download Paper

Prediction Models for Multi-Dimensional Power-Performance Optimization on Many Cores

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2008

This paper introduces an online, application-aware prediction framework for optimizing dynamic voltage/frequency scaling (DVFS) and dynamic concurrency throttling (DCT) in multi-core systems, achieving significant gains in energy efficiency and performance.

Recommended citation: Curtis-Maury, M., Shah, A., Blagojevic, F., Nikolopoulos, D. S., de Supinski, B. R., & Schulz, M. (2008). "Prediction Models for Multi-Dimensional Power-Performance Optimization on Many Cores." Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT), 250–259. https://doi.org/10.1145/1454115.1454151
Download Paper

Scheduling Asymmetric Parallelism on a PlayStation3 Cluster

Published in IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008

Presents scheduling techniques for asymmetric parallelism on PlayStation3 clusters, addressing performance modeling and process scheduling challenges on the Cell BE architecture.

Recommended citation: Blagojevic, F., Curtis-Maury, M., Yeom, J.-S., Schneider, S., & Nikolopoulos, D. S. (2008). "Scheduling Asymmetric Parallelism on a PlayStation3 Cluster." In 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 146-153. https://doi.org/10.1109/CCGRID.2008.64
Download Paper

Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine

Published in ACM International Conference on Computing Frontiers (CF), 2008

This paper presents a scalable model and scheduling technique for implementing wavefront computations on the Cell Broadband Engine, evaluated through Smith-Waterman alignment.

Recommended citation: Aji, A. M., Feng, W., Blagojevic, F., & Nikolopoulos, D. S. (2008). "Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine." *Computing Frontiers 2008*, 13–22. https://doi.org/10.1145/1366230.1366235
Download Paper

Supporting I/O-Intensive Workloads on the Cell Architecture

Published in USENIX Conference on File and Storage Technologies (FAST), 2008

Explores performance-enhancing techniques for I/O-intensive workloads on the Cell Broadband Engine, achieving a 30.2% performance improvement through asynchronous prefetching and decentralized DMAs.

Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2008). "Supporting I/O-Intensive Workloads on the Cell Architecture." In Proc. USENIX FAST.
Download Paper

Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE

Published in International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), 2008

This paper introduces a model for predicting scalability and performance of applications exploiting task- and data-level parallelism on heterogeneous multicore systems like the Cell BE.

Recommended citation: Blagojevic, F., Feng, X., Cameron, K. W., & Nikolopoulos, D. S. (2008). "Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE." In *High Performance Embedded Architectures and Compilers*, Springer, pp. 38–52. https://doi.org/10.1007/978-3-540-77560-7_4
Download Paper

DMA-Based Prefetching for I/O-Intensive Workloads on the Cell Architecture

Published in ACM International Conference on Computing Frontiers (CF), 2008

This paper evaluates DMA-based asynchronous prefetching techniques to improve the performance of I/O-intensive applications on the Cell Broadband Engine.

Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2008). "DMA-Based Prefetching for I/O-Intensive Workloads on the Cell Architecture." *CF '08*, 23–32. https://doi.org/10.1145/1366230.1366236
Download Paper

Experience with memory allocators for parallel mesh generation on multicore architectures

Published in International Conference on Numerical Grid Generation in Computational Field Simulations, 2007

Evaluates scalable and locality-aware multiprocessor memory allocators against custom allocators for parallel mesh generation algorithms on multithreaded and multicore architectures.

Recommended citation: Chernikov, A. N., Antonopoulos, C. D., Chrisochoides, N. P., Schneider, S., & Nikolopoulos, D. S. (2007). "Experience with memory allocators for parallel mesh generation on multicore architectures." In International Conference on Numerical Grid Generation in Computational Field Simulations.
Download Paper

RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2007

RAxML-Cell introduces a parallel phylogenetic inference engine targeting the Cell Broadband Engine, demonstrating performance gains through low-level hardware tuning.

Recommended citation: Blagojevic, F., Stamatakis, A., Antonopoulos, C. D., & Nikolopoulos, D. S. (2007). "RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine." *IPDPS 2007*, 1–10. https://doi.org/10.1109/IPDPS.2007.370267
Download Paper

Dynamic Multigrain Parallelization on the Cell Broadband Engine

Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2007

This paper presents a scheduler for adaptive, multigrain parallelism on the Cell Broadband Engine, demonstrating performance improvements using layered parallelism in RAxML workloads. Best Paper Award

Recommended citation: Blagojevic, F., Nikolopoulos, D. S., Stamatakis, A., & Antonopoulos, C. D. (2007). "Dynamic Multigrain Parallelization on the Cell Broadband Engine." PPoPP '07, 90–100. https://doi.org/10.1145/1229428.1229445
Download Paper

A comparison of online and offline strategies for program adaptation

Published in Annual ACM Southeast Conference (ACMSE), 2007

Compares online and offline strategies for program adaptation in high-performance computing, analyzing the pros and cons of different information collection and analysis approaches for dynamic adaptation based on execution length and use characteristics.

Recommended citation: Curtis-Maury, M., Antonopoulos, C. D., & Nikolopoulos, D. S. (2007). "A comparison of online and offline strategies for program adaptation." In Proceedings of the 45th Annual ACM Southeast Conference, 162-167. https://doi.org/10.1145/1233341.1233371
Download Paper

Achieving Multiprogramming Scalability of Parallel Programs on Intel SMP Platforms: Nanothreading in the Linux Kernel

Published in Parallel Computing, 2007

Presents the design and implementation of a nanothreading interface in the Linux kernel for Intel SMP platforms to achieve robust performance and increased throughput in multiprogrammed environments.

Recommended citation: Nikolopoulos, D. S., Antonopoulos, C. D., Venetis, I. E., Hadjidoukas, P. E., Polychronopoulos, E. D., & Papatheodorou, T. S. (2007). Achieving Multiprogramming Scalability of Parallel Programs on Intel SMP Platforms: Nanothreading in the Linux Kernel. In *Parallel Computing* (pp. 623-630). World Scientific. https://doi.org/10.1142/9781848160170_0074
Download Paper

PACMAN: A PerformAnce Counters MANager for Intel Hyperthreaded Processors

Published in International Conference on the Quantitative Evaluation of Systems (QEST), 2006

Presents PACMAN, a performance counters manager designed specifically for Intel Hyperthreaded processors to enable efficient performance monitoring and analysis.

Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Curtis-Maury, M. (2006). "PACMAN: A PerformAnce Counters MANager for Intel Hyperthreaded Processors." In Third International Conference on the Quantitative Evaluation of Systems (QEST), 141-144. https://doi.org/10.1109/QEST.2006.41
Download Paper

Scalable Locality-Conscious Multithreaded Memory Allocation

Published in ACM SIGPLAN International Symposium on Memory Management (ISMM), 2006

This paper presents Streamflow, a multithreaded memory manager that improves locality and reduces synchronization overhead, outperforming state-of-the-art allocators through a segregated heap design and non-blocking operations.

Recommended citation: Schneider, S., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "Scalable Locality-Conscious Multithreaded Memory Allocation." Proceedings of the 5th International Symposium on Memory Management (ISMM), 84–94. https://doi.org/10.1145/1133956.1133968
Download Paper

Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory

Published in IEEE International Conference on High Performance Distributed Computing (HPDC), 2006

Presents runtime support for memory adaptation in scientific applications using local disk and remote memory to enable dynamic memory management and improved resource utilization. Best Paper Award Nominee

Recommended citation: Yue, C., Mills, R. T., Stathopoulos, A., & Nikolopoulos, D. (2006). "Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory." In 2006 15th IEEE International Conference on High Performance Distributed Computing, 183-194. https://doi.org/10.1109/HPDC.2006.1652149
Download Paper

Online Power-Performance Adaptation of Multithreaded Programs Using Hardware Event-Based Prediction

Published in ACM International Conference on Supercomputing (ICS), 2006

This paper introduces a user-level runtime framework for online adaptation of multithreaded programs, leveraging hardware event-based prediction to optimize power and performance trade-offs in real systems with Intel Hyperthreaded processors.

Recommended citation: Curtis-Maury, M., Dzierwa, J., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "Online Power-Performance Adaptation of Multithreaded Programs Using Hardware Event-Based Prediction." Proceedings of the 20th Annual International Conference on Supercomputing (ICS), 157–166. https://doi.org/10.1145/1183401.1183426
Download Paper

MESA: Reducing Cache Conflicts by Integrating Static and Run-Time Methods

Published in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2006

Combines static code analysis with runtime instrumentation to reduce cache conflicts in multithreaded programs.

Recommended citation: Ding, X., Nikolopoulos, D. S., Jiang, S., & Zhang, X. (2006). MESA: Reducing Cache Conflicts by Integrating Static and Run-Time Methods. In *ISPASS 2006*, 189–198. https://doi.org/10.1109/ISPASS.2006.1620803
Download Paper

Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs

Published in IEEE International Conference on High Performance Computing (HiPC), 2005

This paper proposes scheduling policies that treat memory bandwidth as a first-class resource in multiprogrammed SMP systems.

Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2005). "Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs." In *HiPC 2004*, Springer, pp. 286–296. https://doi.org/10.1007/978-3-540-30474-6_33
Download Paper

Integrating Multiple Forms of Multithreaded Execution on Multi-SMT Systems: A Study with Scientific Applications

Published in International Conference on the Quantitative Evaluation of Systems (QEST), 2005

Investigates integration of simultaneous and fine-grained multithreading for improved execution of scientific codes on SMT-based systems.

Recommended citation: Curtis-Maury, M., Wang, T., Antonopoulos, C., & Nikolopoulos, D. (2005). "Integrating Multiple Forms of Multithreaded Execution on Multi-SMT Systems: A Study with Scientific Applications." QEST 2005, 199–208. https://doi.org/10.1109/QEST.2005.16
Download Paper

Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors

Published in High Performance Computing and Communications (HPCC), 2005

Introduces Factory, an object-oriented parallel programming substrate written in C++ that allows programmers to express multigrain parallelism without requiring language extensions or extra compiler support.

Recommended citation: Schneider, S., Antonopoulos, C. D., & Nikolopoulos, D. S. (2005). "Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors." In High Performance Computing and Communications (pp. 223-232). Springer. https://doi.org/10.1007/11557654_28
Download Paper

smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs

Published in International European Conference on Parallel Processing (Euro-Par), 2005

Presents smt-SPRINTS, a source-level speculative precomputation framework for scientific applications on SMTs that reduces memory latency by prefetching long streams of delinquent data accesses without requiring hardware or compiler support.

Recommended citation: Wang, T., Antonopoulos, C. D., & Nikolopoulos, D. S. (2005). "smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs." In Euro-Par 2005 Parallel Processing (pp. 710-719). Springer. https://doi.org/10.1007/11549468_78
Download Paper

Multigrain Parallel Delaunay Mesh Generation: Challenges and Opportunities for Multithreaded Architectures

Published in ACM International Conference on Supercomputing (ICS), 2005

This paper explores multigrain parallelism in Delaunay mesh generation and evaluates execution on multithreaded SMT-based systems, revealing opportunities for performance gains.

Recommended citation: Antonopoulos, C. D., Ding, X., Chernikov, A., Blagojevic, F., Nikolopoulos, D. S., & Chrisochoides, N. (2005). "Multigrain Parallel Delaunay Mesh Generation: Challenges and Opportunities for Multithreaded Architectures." *ICS '05*, 367–376. https://doi.org/10.1145/1088149.1088198
Download Paper

Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors

Published in International Parallel and Distributed Processing Symposium (IPDPS), 2005

This paper introduces scheduling algorithms that improve thread pairing for hybrid multiprocessors, targeting execution efficiency on SMT and CMP hardware architectures.

Recommended citation: McGregor, R. L., Antonopoulos, C. D., & Nikolopoulos, D. S. (2005). "Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors." IPDPS 2005. https://doi.org/10.1109/IPDPS.2005.390
Download Paper

Adapting to Memory Pressure from Within Scientific Applications on Multiprogrammed COWs

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2004

Describes adaptive memory management for scientific codes on multiprogrammed clusters of workstations (COWs), in which the application itself reacts to memory pressure at run time.

Recommended citation: Mills, R. T., Stathopoulos, A., & Nikolopoulos, D. S. (2004). "Adapting to Memory Pressure from Within Scientific Applications on Multiprogrammed COWs." IPDPS 2004. https://doi.org/10.1109/IPDPS.2004.1303002
Download Paper

Exploiting Simultaneous Multithreading for Parallel Mesh Generation: A Multigrain Approach on Deep Multiprocessors

Published in International Meshing Roundtable (IMR), 2004

Presents a multigrain parallelization strategy leveraging simultaneous multithreading (SMT) to accelerate mesh generation on deep multiprocessor systems.

Recommended citation: Antonopoulos, C. D., Chrisochoides, N., & Nikolopoulos, D. S. (2004). Exploiting Simultaneous Multithreading for Parallel Mesh Generation: A Multigrain Approach on Deep Multiprocessors. In *13th International Meshing Roundtable (IMR)*.

Scheduling Algorithms with Bus Bandwidth Considerations for SMPs

Published in International Conference on Parallel Processing (ICPP), 2003

This paper introduces scheduling algorithms that incorporate system bus bandwidth as a first-class constraint for efficient SMP scheduling.

Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2003). "Scheduling Algorithms with Bus Bandwidth Considerations for SMPs." ICPP 2003, 547–554. https://doi.org/10.1109/ICPP.2003.1240622
Download Paper

Code and Data Transformations for Improving Shared Cache Performance on SMT Processors

Published in International Symposium on High Performance Computing (ISHPC), 2003

This chapter presents software techniques such as dynamic tiling, copying, and block data layouts that improve shared cache performance on SMT processors through all-software cache partitioning. Best Paper Award

Recommended citation: Nikolopoulos, D. S. (2003). "Code and Data Transformations for Improving Shared Cache Performance on SMT Processors." In *High Performance Computing*, 54–69. https://doi.org/10.1007/978-3-540-39707-6_5
Download Paper

Malleable memory mapping: user-level control of memory bounds for effective program adaptation

Published in International Parallel and Distributed Processing Symposium (IPDPS), 2003

Presents malleable memory mapping techniques that provide user-level control of memory bounds to enable effective program adaptation in distributed computing environments.

Recommended citation: Nikolopoulos, D. S. (2003). Malleable memory mapping: user-level control of memory bounds for effective program adaptation. In *Proceedings International Parallel and Distributed Processing Symposium (IPDPS 2003)*, 8 pp. https://doi.org/10.1109/IPDPS.2003.1213074
Download Paper

Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters

Published in IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2002

Presents a scheduler design for multiprogrammed clusters that adapts to memory pressure using kernel-level extensions to control thread execution. Best Paper Award

Recommended citation: Nikolopoulos, D. S., & Polychronopoulos, C. D. (2002). Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters. In *CCGRID '02*, 22. https://doi.org/10.1109/CCGRID.2002.1017108
Download Paper

Adaptive scheduling under memory pressure on multiprogrammed SMPs

Published in International Parallel and Distributed Processing Symposium (IPDPS), 2002

Presents adaptive scheduling techniques for symmetric multiprocessors under memory pressure in multiprogrammed environments to improve system performance and throughput.

Recommended citation: Nikolopoulos, D. S., & Polychronopoulos, C. D. (2002). Adaptive scheduling under memory pressure on multiprogrammed SMPs. In *Proceedings 16th International Parallel and Distributed Processing Symposium (IPDPS 2002)*, 6 pp. https://doi.org/10.1109/IPDPS.2002.1015481
Download Paper

Quantifying and resolving remote memory access contention on hardware DSM multiprocessors

Published in International Parallel and Distributed Processing Symposium (IPDPS), 2002

Presents methods for quantifying and resolving remote memory access contention on hardware distributed shared-memory multiprocessors to improve performance and coherence. Best Paper Award

Recommended citation: Nikolopoulos, D. S. (2002). "Quantifying and resolving remote memory access contention on hardware DSM multiprocessors." In Proceedings 16th International Parallel and Distributed Processing Symposium, 10 pp. https://doi.org/10.1109/IPDPS.2002.1015503
Download Paper

Scaling Irregular Parallel Codes with Minimal Programming Effort

Published in International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC), 2001

This paper presents an OpenMP-based approach to scaling irregular parallel codes with minimal programming effort, matching MPI performance on benchmark applications. Best Paper Award Nominee

Recommended citation: Nikolopoulos, D. S., Polychronopoulos, C. D., & Ayguadé, E. (2001). "Scaling Irregular Parallel Codes with Minimal Programming Effort." *SC '01*. https://doi.org/10.1145/582034.582050
Download Paper

A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models

Published in International European Conference on Parallel Processing (Euro-Par), 2001

Defines a unified set of operating system services for embedding adaptability in thread-based programming paradigms, achieving up to 41.2% throughput improvement in multiprogrammed SMP environments.

Recommended citation: Venetis, I. E., Nikolopoulos, D. S., & Papatheodorou, T. S. (2001). "A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models." In Euro-Par 2001 Parallel Processing (pp. 514-524). Springer. https://doi.org/10.1007/3-540-44681-8_75
Download Paper

Informing Algorithms for Efficient Scheduling of Synchronizing Threads on Multiprogrammed SMPs

Published in International Conference on Parallel Processing (ICPP), 2001

Presents scheduling heuristics that use runtime feedback to improve scheduling of synchronizing threads on multiprogrammed shared-memory multiprocessors.

Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2001). Informing Algorithms for Efficient Scheduling of Synchronizing Threads on Multiprogrammed SMPs. In *ICPP 2001*, 123–130. https://doi.org/10.1109/ICPP.2001.952054
Download Paper

The Trade-Off Between Implicit and Explicit Data Distribution in Shared-Memory Programming Paradigms

Published in ACM International Conference on Supercomputing (ICS), 2001

Analyzes trade-offs among automatic page placement, page migration, and manual data distribution in OpenMP programs on NUMA systems.

Recommended citation: Nikolopoulos, D. S., Ayguadé, E., Papatheodorou, T. S., et al. (2001). "The Trade-Off Between Implicit and Explicit Data Distribution in Shared-Memory Programming Paradigms." In ICS '01, 23–37. https://doi.org/10.1145/377792.377801
Download Paper

Improving Java Server Performance with Interruptlets

Published in International Conference on Computational Science (ICCS), 2001

Proposes Interruptlets as lightweight, low-overhead interrupt handlers for improving Java server performance, reducing I/O thread and memory copy overhead in JVMs on Linux.

Recommended citation: Craig, D., Carroll, S., Breg, F., Nikolopoulos, D. S., & Polychronopoulos, C. (2001). Improving Java Server Performance with Interruptlets. In *Computational Science – ICCS 2001* (V. N. Alexandrov et al., Eds.), pp. 223–232, Springer.

Is Data Distribution Necessary in OpenMP?

Published in International Conference on High-Performance Computing, Networking, Storage and Analysis (SC), 2000

This work questions the necessity of explicit data distribution in OpenMP programming, exploring scheduling and locality-aware execution models for scalable performance. Best Paper Award

Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "Is Data Distribution Necessary in OpenMP?" *SC 2000*. https://doi.org/10.1109/SC.2000.10025
Download Paper

User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors

Published in International Conference on Parallel Processing (ICPP), 2000

This paper proposes a user-level mechanism for dynamic page migration to improve locality and performance in multiprogrammed SMPs.

Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors." ICPP 2000, 95–103. https://doi.org/10.1109/ICPP.2000.876083
Download Paper

Efficient Dynamic Parallelism with OpenMP on Linux SMPs

Published in Parallel and Distributed Processing Techniques and Applications, 2000

Presents an integrated environment for efficient support of dynamic parallelism with OpenMP on Linux-based SMPs, achieving up to 6.3 times higher throughput under multiprogramming.

Recommended citation: Antonopoulos, C. D., Venetis, I. E., Nikolopoulos, D. S., & Papatheodorou, T. S. (2000). "Efficient Dynamic Parallelism with OpenMP on Linux SMPs." In PDPTA.
Download Paper

A Case for User-Level Dynamic Page Migration

Published in ACM International Conference on Supercomputing (ICS), 2000

This paper proposes a runtime system for user-level dynamic page migration in OpenMP codes on DSM systems, improving locality and adaptivity over OS-level solutions.

Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "A Case for User-Level Dynamic Page Migration." *ICS 2000*, 119–130. https://doi.org/10.1145/335231.335243
Download Paper

Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration

Published in International Symposium on High Performance Computing (ISHPC), 2000

Describes transparent mechanisms for emulating data distribution facilities in OpenMP through user-level dynamic page migration, implementing UPMlib to improve memory locality without modifying the programming model.

Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration." In High Performance Computing, 415-427. Springer. https://doi.org/10.1007/3-540-39999-2_40
Download Paper

Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives

Published in International Parallel and Distributed Processing Symposium (IPDPS), 2000

Presents a technique for fast synchronization on scalable cache-coherent multiprocessors through the use of hybrid primitives.

Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (2000). Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives. In *Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS 2000)*, 711-719. https://doi.org/10.1109/IPDPS.2000.846056
Download Paper

An Efficient Kernel-Level Scheduling Methodology for Multiprogrammed Shared Memory Multiprocessors

Published in International Conference on Parallel and Distributed Computing Systems (PDCS), 1999

Presents a kernel-level scheduling technique for supporting the nano-threads model on shared-memory multiprocessors with hierarchical and efficient queueing mechanisms.

Recommended citation: Polychronopoulos, E. D., Nikolopoulos, D. S., Papatheodorou, T. S., et al. (1999). "An Efficient Kernel-Level Scheduling Methodology for Multiprogrammed Shared Memory Multiprocessors." PDCS '99, 148–155.
Download Paper

A Quantitative Architectural Evaluation of Synchronization Algorithms and Disciplines on ccNUMA Systems: The Case of the SGI Origin2000

Published in ACM International Conference on Supercomputing (ICS), 1999

Evaluates synchronization algorithms and disciplines on the SGI Origin2000 from both architectural and algorithmic perspectives, as a case study for ccNUMA systems.

Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (1999). "A Quantitative Architectural Evaluation of Synchronization Algorithms and Disciplines on ccNUMA Systems: The Case of the SGI Origin2000." In ICS '99, 319–328. https://doi.org/10.1145/305138.305209
Download Paper

System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors

Published in Hellenic Conference on Informatics, 1999

Presents system software techniques for reducing memory latency on distributed shared memory multiprocessors.

Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (1999). "System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors." In Proceedings of the 7th Hellenic Conference on Informatics, Greece, 61-68.
Download Paper

Fine-Grain and Multiprogramming-Conscious Nanothreading with the Solaris Operating System

Published in Parallel and Distributed Processing Techniques and Applications, 1999

Presents architectural and implementation details of a nanothreading runtime system on Solaris that addresses fine-grain parallelism exploitation and scalability in multiprogrammed environments.

Recommended citation: Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1999). "Fine-Grain and Multiprogramming-Conscious Nanothreading with the Solaris Operating System." In PDPTA, 1797-1803.
Download Paper

Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors

Published in International European Conference on Parallel Processing (Euro-Par), 1998

Presents a technique to enhance autoscheduling performance in DSM multiprocessors by partitioning application task graphs and mapping them to processor clusters to improve data locality and reduce communication costs.

Recommended citation: Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1998). "Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors." In Euro-Par'98 Parallel Processing (pp. 491-501). Springer. https://doi.org/10.1007/BFb0057892
Download Paper

Kernel-Level Scheduling for the Nano-Threads Programming Model

Published in ACM International Conference on Supercomputing (ICS), 1998

This paper introduces kernel-level scheduling mechanisms to support the nano-threads programming model for scalable parallel execution.

Recommended citation: Polychronopoulos, E. D., Martorell, X., Nikolopoulos, D. S., Labarta, J., Papatheodorou, T. S., & Navarro, N. (1998). "Kernel-Level Scheduling for the Nano-Threads Programming Model." *ICS '98*, 337–344. https://doi.org/10.1145/277830.277911
Download Paper

Workshop Papers

Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference

Published in Proceedings of the 21st International Workshop on Automatic Performance Tuning (iWAPT), 2026

Modality Inflation characterizes the energy cost of multimodal LLM inference and identifies optimization opportunities to reduce energy consumption while preserving inference quality.

Recommended citation: Moghadampanah, M., Rezaei Shahmirzadi, A., Amin, F., & Nikolopoulos, D. S. (2026). Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference. In Proceedings of the 21st International Workshop on Automatic Performance Tuning (iWAPT), New Orleans, LA.
Download Paper

Memory Tiering in the Python Virtual Machine

Published in Proceedings of the 17th SPLASH Workshop on Virtual Machines and Language Implementations (VMIL), Singapore, 2025

Presents a novel memory tiering strategy integrated into the Python Virtual Machine to improve performance and resource efficiency for memory-intensive workloads.

Recommended citation: Li, Y., Yao, S., Mobin, J., Zhan, T., Rafique, M. M., Nikolopoulos, D. S., Sundararajah, K., & Butt, A. R. (2025). Memory Tiering in the Python Virtual Machine. In *Proceedings of the 17th SPLASH Workshop on Virtual Machines and Language Implementations (VMIL)*, Singapore, October.

FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference

Published in IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024

Presents FrameFeedback, a closed-loop control system for dynamic offloading of real-time edge inference tasks, enabling adaptive offloading with feedback control for deep learning applications.

Recommended citation: Jackson, M., Ji, B., & Nikolopoulos, D. S. (2024). "FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference." In 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 584-591. https://doi.org/10.1109/IPDPSW63119.2024.00116
Download Paper

Fine-Grain Slicing of Edge Cloud Servers for Radio Workloads

Published in IEEE Workshop on Hot Topics in System Infrastructure (HotInfra), 2023

Presents fine-grain slicing techniques for edge cloud servers to optimize performance for radio workloads, enabling efficient resource allocation and management in edge computing environments.

Recommended citation: Mazied, E. A., Nikolopoulos, D. S., & Midkiff, S. (2023). "Fine-Grain Slicing of Edge Cloud Servers for Radio Workloads." In IEEE Workshop on Hot Topics in System Infrastructure (HotInfra'23), in conjunction with ACM FCRC 2023. Orlando, FL.
Download Paper

Incremental Training of Deep Convolutional Neural Networks

Published in International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms, 2017

Presents incremental training techniques for deep convolutional neural networks to improve training efficiency and adaptation capabilities.

Recommended citation: Nikolopoulos, D., Istrate, R., Malossi, A. C. I., & Bekas, C. (2017). "Incremental Training of Deep Convolutional Neural Networks." In Proceedings of the International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms.
Download Paper

Heterogeneous Servers based on Programmable Cores and Dataflow Engines

Published in Workshop on Energy-efficient Servers for Cloud and Edge Computing (EnESCE), 2017

Presents energy-efficient server architectures based on programmable accelerators and dataflow engines, demonstrating 40% better energy efficiency than standard Xeon servers and up to 374x speedup across a range of data center workloads.

Recommended citation: Wu, Y., Gillan, C., Minhas, U., Barbhuiya, S., Novakovic, A., Tovletoglou, K., Tzenakis, G., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2017). "Heterogeneous Servers based on Programmable Cores and Dataflow Engines." In Workshop on Energy-efficient Servers for Cloud and Edge Computing 2017.
Download Paper

An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits

Published in Workshop on Energy-efficient Servers for Cloud and Edge Computing (EnESCE 2017), 2016

Presents the UniServer approach for developing energy-efficient micro-servers that exceed conservative scaling boundaries through novel mechanisms across all design stack layers, including hardware heterogeneity exploitation and fault tolerance enhancement.

Recommended citation: Tovletoglou, K., Chalios, C., Karakonstantis, G., Mukhanov, L., Vandierendonck, H., Nikolopoulos, D., Koutsovasilis, P., Maroudas, M., Antonopoulos, C., Kalogirou, C., Bellas, N., Lalis, S., Rafique, M. M., Venugopal, S., Prat-Perez, A., Diavastos, A., Hadjilambrou, Z., Nikolaou, P., Sazeides, Y., Trancoso, P., Papadimitriou, G., Kaliorakis, M., Chatzidimitriou, A., & Gizopoulos, D. (2016). "An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits." In Workshop on Energy-efficient Servers for Cloud and Edge Computing 2017.
Download Paper

A scalable and composable map-reduce system

Published in IEEE International Conference on Big Data (Big Data), 2016

Presents a scalable and composable map-reduce system that improves performance, composition capabilities, and programmability for big data processing applications.

Recommended citation: Arif, M., Vandierendonck, H., Nikolopoulos, D. S., & de Supinski, B. R. (2016). "A scalable and composable map-reduce system." In 2016 IEEE International Conference on Big Data (Big Data), 2233-2242. https://doi.org/10.1109/BigData.2016.7840854
Download Paper

HPTA: High-performance text analytics

Published in IEEE International Conference on Big Data (Big Data) Workshops, 2016

Presents HPTA, a high-performance text analytics framework that optimizes data structures, memory management, and sparse matrix operations for improved text processing performance.

Recommended citation: Vandierendonck, H., Murphy, K., Arif, M., & Nikolopoulos, D. S. (2016). "HPTA: High-performance text analytics." In 2016 IEEE International Conference on Big Data (Big Data), 416-423. https://doi.org/10.1109/BigData.2016.7840632
Download Paper

Big data availability: Selective partial checkpointing for in-memory database queries

Published in IEEE International Conference on Big Data (Big Data) Workshops, 2016

Presents selective partial checkpointing techniques for improving availability in in-memory database queries, addressing fault tolerance challenges in big data processing systems.

Recommended citation: Playfair, D., Trehan, A., McLarnon, B., & Nikolopoulos, D. S. (2016). "Big data availability: Selective partial checkpointing for in-memory database queries." In 2016 IEEE International Conference on Big Data (Big Data), 2785-2794. https://doi.org/10.1109/BigData.2016.7840926
Download Paper

Accelerating Data Center Applications with Reconfigurable DataFlow Engines

Published in Second International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2016

Addresses the seamless integration of energy-efficient programmable accelerators into cloud-based data analytics frameworks, pushing the limits of computation capacity and density in future data centers.

Recommended citation: Barbhuiya, S., Wu, Y., Murphy, K., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). "Accelerating Data Center Applications with Reconfigurable DataFlow Engines." In Second International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'16).
Download Paper

TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods

Published in IEEE International Conference on Cluster Computing (CLUSTER) Workshops, 2016

Presents TwinPCG, a fault tolerance approach using dual thread redundancy and forward recovery techniques specifically designed for preconditioned conjugate gradient methods.

Recommended citation: Dichev, K., & Nikolopoulos, D. S. (2016). "TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods." In 2016 IEEE International Conference on Cluster Computing (CLUSTER), 506-514. https://doi.org/10.1109/CLUSTER.2016.99
Download Paper

A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform

Published in International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), 2016

Presents the design of a new runtime for heterogeneous hardware platforms that extends OpenCL to simplify programming and automate scheduling across FPGAs and other devices for exascale computing.

Recommended citation: Harvey, P., Bakanov, K., Spence, I., & Nikolopoulos, D. S. (2016). "A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform." In Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, Article 7. https://doi.org/10.1145/2931088.2931090
Download Paper

Operator and Workflow Optimization for High-Performance Analytics

Published in International Workshop on Multi-Engine Data Analytics (MEDAL), 2016

Studies the impact of intra-node parallelism on data analytics performance, identifying four performance optimizations enabled by the growing number of processing cores and examining how they interact on analytics operators.

Recommended citation: Vandierendonck, H., Murphy, K. L., Arif, M., Sun, J., & Nikolopoulos, D. S. (2016). "Operator and Workflow Optimization for High-Performance Analytics." In 1st International Workshop on Multi-Engine Data Analytics (MEDAL).
Download Paper

Energy Optimization of Parallel Programs on Unreliable Hardware

Published in Workshop on Approximate Computing, 2016

Presents a work-in-progress report on minimizing energy consumption of parallel applications on unreliable hardware platforms, specifically unreliable memory, using analytical models to capture CPU energy consumption and select optimal frequencies.

Recommended citation: Trehan, C., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). "Energy Optimization of Parallel Programs on Unreliable Hardware." In Second Workshop on Approximate Computing.
Download Paper

Energy Optimization of Parallel Workloads on Unreliable Hardware

Published in WAPCO '16 (HiPEAC Workshop), 2016

This paper explores techniques for optimizing energy efficiency of parallel workloads on unreliable hardware, presented at WAPCO in conjunction with HiPEAC 2016.

Recommended citation: Trehan, C., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). *Energy Optimization of Parallel Workloads on Unreliable Hardware*. In Proceedings of the Second Workshop on Approximate Computing (WAPCO), Prague, Czech Republic.
Download Paper

Application-Level Energy Awareness for OpenMP

Published in OpenMP: Heterogenous Execution and Data Movements, Springer International Publishing, 2015

This chapter introduces OpenMPE, a programming model extension to OpenMP that supports application-level energy optimizations through annotations and runtime tuning.

Recommended citation: Alessi, F., Thoman, P., Georgakoudis, G., Fahringer, T., & Nikolopoulos, D. S. (2015). "Application-Level Energy Awareness for OpenMP." In *OpenMP: Heterogenous Execution and Data Movements*, Springer, pp. 219–232. https://doi.org/10.1007/978-3-319-24595-9_16
Download Paper

Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies

Published in International Workshop on Data Management on New Hardware (DaMoN), 2015

Proposes energy-efficient hybrid DRAM/NVM memory management for modern data stores using application-level policies for data placement.

Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies." DaMoN '15, Article 1. https://doi.org/10.1145/2771937.2771940
Download Paper

On the Viability of Microservers for Financial Analytics

Published in Workshop on High Performance Computational Finance (WHPCF), 2014

Evaluates the viability of microserver architectures for financial analytics applications, examining energy efficiency and performance characteristics for numerical simulation and event processing workloads.

Recommended citation: Gillan, C. J., Nikolopoulos, D. S., Georgakoudis, G., Faloon, R., Tzenakis, G., & Spence, I. (2014). "On the Viability of Microservers for Financial Analytics." In 2014 Seventh Workshop on High Performance Computational Finance, 29-36. https://doi.org/10.1109/WHPCF.2014.11
Download Paper

Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores

Published in International Workshop on Code OptimiSation for MultI and Many Cores (COSMIC), 2013

Presents a low-overhead binary code rewriting method for shared-ISA multicore processors that enables thread migration among heterogeneous cores while preserving functional equivalence. Best Paper Award

Recommended citation: Georgakoudis, G., Nikolopoulos, D. S., & Lalis, S. (2013). "Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores." In Proceedings of the First International Workshop on Code OptimiSation for MultI and Many Cores, Article 4. https://doi.org/10.1145/2446920.2446924
Download Paper

A Programming Model for Deterministic Task Parallelism

Published in ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC), 2011

Presents a model for deterministic parallelism using tasks with isolated footprints, removing the need for cache coherence and enabling provably deterministic execution.

Recommended citation: Pratikakis, P., Vandierendonck, H., Lyberis, S., & Nikolopoulos, D. S. (2011). "A Programming Model for Deterministic Task Parallelism." MSPC '11, 7–12. https://doi.org/10.1145/1988915.1988918
Download Paper

Parallel Programming of General-Purpose Programs Using Task-Based Programming Models

Published in USENIX Workshop on Hot Topics in Parallelism (HotPar), 2011

This paper extends the Cilk programming model by introducing input, output, and inout dependency types on task arguments, enabling concise expression of complex parallelism patterns like pipelines and speculative execution in general-purpose programs. The proposed extensions improve code readability and maintain performance comparable to existing models.

Recommended citation: Vandierendonck, H., Pratikakis, P., & Nikolopoulos, D. S. (2011). "Parallel Programming of General-Purpose Programs Using Task-Based Programming Models." *HotPar '11*, USENIX Association. https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Vandierendonck.pdf
Download Paper

An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors

Published in OpenMP Shared Memory Parallel Programming, Springer Berlin Heidelberg, 2008

This chapter evaluates OpenMP performance across SMT and CMP architectures, highlighting architectural bottlenecks and benefits of adaptive runtime mechanisms. Best Paper Award

Recommended citation: Curtis-Maury, M., Ding, X., Antonopoulos, C. D., & Nikolopoulos, D. S. (2008). "An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors." In *OpenMP Shared Memory Parallel Programming* (pp. 133–144). Springer. https://doi.org/10.1007/978-3-540-68555-5_11
Download Paper

VT-ASOS: Holistic system software customization for many cores

Published in IEEE International Symposium on Parallel and Distributed Processing (IPDPS) Workshops, 2008

Presents VT-ASOS, a holistic approach to system software customization for many-core architectures, addressing virtualization, resource management, and fault tolerance.

Recommended citation: Nikolopoulos, D. S., Back, G., Tripathi, J., & Curtis-Maury, M. (2008). "VT-ASOS: Holistic system software customization for many cores." In 2008 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 1-5. https://doi.org/10.1109/IPDPS.2008.4536390
Download Paper

Identifying Energy-Efficient Concurrency Levels Using Machine Learning

Published in IEEE International Conference on Cluster Computing (CLUSTER) Workshops, 2007

This work uses machine learning models to automatically determine energy-optimal concurrency levels for parallel workloads, improving performance-per-watt.

Recommended citation: Curtis-Maury, M., Singh, K., McKee, S. A., Blagojevic, F., Nikolopoulos, D. S., de Supinski, B. R., & Schulz, M. (2007). "Identifying Energy-Efficient Concurrency Levels Using Machine Learning." *Cluster 2007*, 488–495. https://doi.org/10.1109/CLUSTR.2007.4629274
Download Paper

Application-specific customization on many-core platforms: the VT-ASOS framework

Published in Second Workshop on Software and Tools for Multi-Core Systems, 2007

Presents the VT-ASOS framework for application-specific customization on many-core platforms, enabling tailored system software solutions for diverse computing environments.

Recommended citation: Back, G., & Nikolopoulos, D. S. (2007). "Application-specific customization on many-core platforms: the VT-ASOS framework." In Proceedings of the Second Workshop on Software and Tools for Multi-Core Systems.
Download Paper

Synthesizing Parallel Programming Models for Asymmetric Multi-core Systems

Published in 11th Workshop on High Performance Embedded Computing, 2007

Derives a methodology for synthesizing polymorphic programming models for asymmetric multi-core processors, focusing on runtime performance modeling and scheduling of dynamic parallelism.

Recommended citation: Nikolopoulos, D. S., & Cameron, K. W. (2007). "Synthesizing Parallel Programming Models for Asymmetric Multi-core Systems." In Proceedings of the 11th Workshop on High Performance Embedded Computing.
Download Paper

Dynamic Program Stirring on Multiple Cores: How Hardware Performance Monitors Can Help Regulate Performance, Power, and Temperature Simultaneously

Published in Workshop on Functionality of Hardware Performance Monitors, 2006

Explores how hardware performance monitors can provide insights into software-hardware interaction to regulate performance, power, and temperature simultaneously on multicore platforms through dynamic adaptation.

Recommended citation: Curtis-Maury, M., Nikolopoulos, D. S., & Antonopoulos, C. D. (2006). "Dynamic Program Stirring on Multiple Cores: How Hardware Performance Monitors Can Help Regulate Performance, Power, and Temperature Simultaneously." In Proc. of the Second Workshop on Functionality of Hardware Performance Monitors.
Download Paper

Online Strategies for High-Performance Power-Aware Thread Execution on Emerging Multiprocessors

Published in IEEE International Parallel & Distributed Processing Symposium (IPDPS Workshops), 2006

This paper proposes runtime strategies for dynamically adjusting thread execution to optimize energy consumption and performance on multiprocessors.

Recommended citation: Curtis-Maury, M., Dzierwa, J., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "Online Strategies for High-Performance Power-Aware Thread Execution on Emerging Multiprocessors." *IPDPS 2006*. https://doi.org/10.1109/IPDPS.2006.1639598
Download Paper

2-D Parallel Constrained Delaunay Mesh Generation: A Multigrain Approach on Deep Multiprocessors

Published in PMUA '05 (Workshop at ICS), 2005

Presents a multigrain approach to 2-D constrained Delaunay mesh generation for deep multiprocessors, delivered as an invited presentation at PMUA held with ICS 2005.

Recommended citation: Antonopoulos, C. D., Chrisochoides, N., & Nikolopoulos, D. S. (2005). *2-D Parallel Constrained Delaunay Mesh Generation: A Multigrain Approach on Deep Multiprocessors*. In Abstracts of the Workshop on Programming Models for HPCS Ultra-Scale Applications (PMUA), held with the 19th ACM International Conference on Supercomputing (ICS), Cambridge, MA, USA.
Download Paper

Power-aware Resource Allocation via Online Simulation with Multiple-queue Backfilling

Published in Workshop on Performability Modeling of Computer and Communication Systems, 2005

Presents power-aware resource allocation techniques using online simulation with multiple-queue backfilling for efficient energy management in computing systems.

Recommended citation: Lawson, B., Yue, C., Smirni, E., & Nikolopoulos, D. (2005). "Power-aware Resource Allocation via Online Simulation with Multiple-queue Backfilling." In Proceedings of the 7th Workshop on Performability Modeling of Computer and Communication Systems, held in conjunction with the Second International Conference on the Quantitative Evaluation of Systems (QEST).
Download Paper

Runtime Support for Integrating Precomputation and Thread-Level Parallelism on Simultaneous Multithreaded Processors

Published in Workshop on Languages, Compilers and Runtime Systems for Parallel Computing (LCR), 2004

Introduces runtime mechanisms that coordinate speculative precomputation and thread-level parallelism on SMT processors for improved performance.

Recommended citation: Wang, T., Blagojevic, F., & Nikolopoulos, D. S. (2004). Runtime Support for Integrating Precomputation and Thread-Level Parallelism on Simultaneous Multithreaded Processors. In *LCR '04*, 1–12. https://doi.org/10.1145/1066650.1066667
Download Paper

Effective cross-platform, multilevel parallelism via dynamic adaptive execution

Published in International Parallel and Distributed Processing Symposium (IPDPS) Workshops, 2002

Presents an approach for achieving effective cross-platform and multilevel parallelism through dynamic adaptive execution techniques.

Recommended citation: Ko, W., Yankelevsky, M., Nikolopoulos, D. S., & Polychronopoulos, C. D. (2002). Effective cross-platform, multilevel parallelism via dynamic adaptive execution. In *Proceedings 16th International Parallel and Distributed Processing Symposium (IPDPS 2002)*, 8 pp. https://doi.org/10.1109/IPDPS.2002.1016495
Download Paper

A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks

Published in International Workshop on OpenMP (IWOMP), 2001

Evaluates the effectiveness of runtime data distribution methods in OpenMP programs using SPEC benchmarks, achieving 20-25% speedup improvements through automatic data distribution without API extensions.

Recommended citation: Nikolopoulos, D. S., & Ayguadé, E. (2001). "A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks." In OpenMP Shared Memory Parallel Programming (pp. 115-129). Springer. https://doi.org/10.1007/3-540-44587-0_11
Download Paper

UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors

Published in Languages, Compilers, and Run-Time Systems for Scalable Computers (LCR), 2000

UPMLIB provides dynamic memory tuning for OpenMP programs by performing runtime page migrations using compiler and OS feedback on shared-memory multiprocessors.

Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors." In *Languages, Compilers, and Run-Time Systems for Scalable Computers*, 85–99. https://doi.org/10.1007/3-540-40889-4_7
Download Paper

A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager

Published in Job Scheduling Strategies for Parallel Processing (JSSPP), 2000

Introduces the NANOS CPU Manager, a runtime environment for optimizing processor scheduling policies for shared-memory multiprocessors.

Recommended citation: Martorell, X., Corbalán, J., Nikolopoulos, D. S., et al. (2000). "A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager." In Job Scheduling Strategies for Parallel Processing, 87–112. https://doi.org/10.1007/3-540-39997-6_7
Download Paper

Efficient Runtime Thread Management for the Nano-Threads Programming Model

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS Workshops), 1998

This chapter introduces memory management and scheduling strategies for the nano-threads programming model, enhancing performance on NUMA multiprocessors with hierarchical queues.

Recommended citation: Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1998). "Efficient Runtime Thread Management for the Nano-Threads Programming Model." In *Parallel and Distributed Processing*, 183–194. https://link.springer.com/chapter/10.1007/3-540-64359-1_688
Download Paper

Book Chapters

Punching Holes in the Cloud: Direct Communication Between Serverless Functions

Published in Serverless Computing: Principles and Paradigms, 2023

Presents an ephemeral communication framework for serverless environments that enables direct network connections between functions, achieving 680 Mbps throughput and 4.7× performance improvement over object storage solutions.

Recommended citation: Moyer, D., & Nikolopoulos, D. S. (2023). "Punching Holes in the Cloud: Direct Communication Between Serverless Functions." In Serverless Computing: Principles and Paradigms (pp. 15-41). Springer. https://doi.org/10.1007/978-3-031-26633-1_2
Download Paper

Feasibility of Fog Computing

Published in Handbook of Integration of Cloud Computing, Cyber Physical Systems and Internet of Things, 2020

This book chapter discusses the feasibility of fog computing as a decentralized alternative to cloud-centric models, showing improved latency and reduced cloud traffic in an online gaming use case and advocating for broader integration of edge resources.

Recommended citation: Varghese, B., Wang, N., Nikolopoulos, D. S., & Buyya, R. (2020). "Feasibility of Fog Computing." In R. Ranjan, K. Mitra, P. Prakash Jayaraman, L. Wang, & A. Y. Zomaya (Eds.), Handbook of Integration of Cloud Computing, Cyber Physical Systems and Internet of Things (pp. 127–146). Springer. https://doi.org/10.1007/978-3-030-43795-4_5
Download Paper

Programming and Managing Resources on Accelerator-Enabled Clusters

Published in Programming multi‐core and many‐core computing systems, 2017

Explores system design alternatives for clusters with computational accelerators and capability-aware task scheduling strategies using the MapReduce programming model for asymmetric clusters.

Recommended citation: Mustafa Rafique, M., Butt, A. R., & Nikolopoulos, D. S. (2017). "Programming and Managing Resources on Accelerator-Enabled Clusters." In Programming multi‐core and many‐core computing systems (pp. 405-429). Wiley. https://doi.org/10.1002/9781119332015.ch20
Download Paper

Realizing Accelerated Cost-Effective Distributed RAID

Published in Handbook on Data Centers, 2015

Addresses the challenges of storing and retrieving massive scientific data reliably and cost-effectively, proposing distributed RAID solutions for large-scale storage systems and parallel file systems.

Recommended citation: Khasymski, A., Rafique, M. M., Butt, A. R., Vazhkudai, S. S., & Nikolopoulos, D. S. (2015). Realizing Accelerated Cost-Effective Distributed RAID. In S. U. Khan & A. Y. Zomaya (Eds.), *Handbook on Data Centers* (pp. 729-752). Springer New York. https://doi.org/10.1007/978-1-4939-2092-1_25
Download Paper

Modeling and Algorithms for Scalable and Energy Efficient Execution on Multicore Systems

Published in Scalable Computing and Communications: Theory and Practice, 2013

Presents modeling techniques and algorithms for achieving scalable and energy-efficient execution on multicore systems in the context of scalable computing and communications.

Recommended citation: Li, D., Nikolopoulos, D. S., & Cameron, K. W. (2013). "Modeling and Algorithms for Scalable and Energy Efficient Execution on Multicore Systems." In Scalable Computing and Communications: Theory and Practice (pp. 157-184). Wiley-Blackwell.
Download Paper

Parallel Programming

Published in Encyclopedia of Software Engineering, 2013

An overview of common abstractions in parallel programming, including models based on shared and distributed memory, with discussions on programmability and performance trade-offs.

Recommended citation: Vandierendonck, H., Nikolopoulos, D. S., & Pratikakis, P. (2013). Parallel Programming. In *Encyclopedia of Software Engineering* (Taylor and Francis), February 27.

Scheduling Algorithms with Bus Bandwidth Considerations for SMPs

Published in High-Performance Computing, John Wiley & Sons, 2005

This book chapter presents gang-like scheduling techniques for SMP systems to optimize use of shared bus bandwidth based on runtime monitoring.

Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2005). "Scheduling Algorithms with Bus Bandwidth Considerations for SMPs." In High-Performance Computing (pp. 313–332). John Wiley & Sons. https://doi.org/10.1002/0471732710.ch16
Download Paper

Magazine Articles

NanoStreams: A Hardware and Software Stack for Real-Time Analytics on Fast Data Streams

Published in HiPEAC Info, Volume 38, 2014

Introduces the NanoStreams project, focusing on an integrated hardware-software stack designed for low-latency, real-time analytics on high-throughput data streams.

Recommended citation: Nikolopoulos, D. (2014). NanoStreams: A Hardware and Software Stack for Real-Time Analytics on Fast Data Streams. *HiPEAC Info*, 38, April.

Processors: The Challenge of Cooperation

Published in Economist Special Edition, Volume 71, 2009

Article in The Economist Special Edition discussing the future of processor design and the need for coordination in multicore systems.

Recommended citation: Katevenis, M. G. H., & Nikolopoulos, D. (2009). Processors: The Challenge of Cooperation. *Economist Special Edition*, 71, 26–28.

Posters

HydraCache: Long-Context Prefill Parallelization via Distributed Cache Blending

Published in SC 2025 (St. Louis, MO) — Poster, 2025

Refereed poster: HydraCache: Long-Context Prefill Parallelization via Distributed Cache Blending.

Recommended citation: Adib Rezaei Shahmirzadi, Shayan Shabihi, Furong Huang, and Dimitrios S. Nikolopoulos (2025). "HydraCache: Long-Context Prefill Parallelization via Distributed Cache Blending." In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC25). Poster.
Download Paper

Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls

Published in SC 2025 (St. Louis, MO) — Poster, 2025

Refereed poster: Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls.

Recommended citation: Mona Moghadampanah, Adib Rezaei Shahmirzadi, and Dimitrios S. Nikolopoulos (2025). "Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls." In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC25). Poster.
Download Paper

Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference

Published in SC 2025 (St. Louis, MO) — Poster, 2025

Refereed poster: Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference.

Recommended citation: Farhana Amin, Kanchon Gharami, and Dimitrios S. Nikolopoulos (2025). "Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference." In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC25). Poster.
Download Paper

DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference

Published in SC 2025 (St. Louis, MO) — Poster, 2025

Refereed poster: DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference.

Recommended citation: Farhana Amin, Kanchon Gharami, and Dimitrios S. Nikolopoulos (2025). "DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference." In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC25). Poster.
Download Paper

Energy-Efficient Transprecision Techniques for Iterative Refinement

Published in International Conference on High Performance Computing, Networking, Storage and Analysis (SC) Posters, 2017

Presents energy-efficient transprecision techniques for iterative refinement algorithms to reduce computational energy while maintaining accuracy.

Recommended citation: Lee, J., Vandierendonck, H., & Nikolopoulos, D. (2017). "Energy-Efficient Transprecision Techniques for Iterative Refinement." In Supercomputing'17 (SC17): International Conference on High Performance Computing, Networking, Storage and Analysis.
Download Paper

Student Research Poster: A Scalable General Purpose System for Large-Scale Graph Processing

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2016

Presents a student research poster on building a scalable graph analytics framework that hides the complexity of parallelism, data distribution and memory locality behind an abstract interface using NUMA-awareness.

Recommended citation: Sun, J. (2016). "Student Research Poster: A Scalable General Purpose System for Large-Scale Graph Processing." In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 456. https://doi.org/10.1145/2967938.2971465
Download Paper

SCoRPiO: Significance Based Computing for Reliability and Power Optimization

Published in International Symposium on Code Generation and Optimization (CGO), 2016

Presents SCoRPiO, a significance-based computing approach for reliability and power optimization that leverages computational significance to balance energy efficiency with system reliability.

Recommended citation: Vassiliadis, V., Parasyris, K., Antonopoulos, C. D., Bellas, N., Riehme, J., & Nikolopoulos, D. (2016). "SCoRPiO: Significance Based Computing for Reliability and Power Optimization." In 2016 International Symposium on Code Generation and Optimization (CGO).
Download Paper

Energy-Efficient Hybrid DRAM/NVM Main Memory: ACM Student Research Competition

Published in International Conference on Parallel Architectures and Compilation Techniques (PACT) - ACM Student Research Competition, 2015

Student research competition poster presenting energy-efficient hybrid DRAM/NVM main memory architectures for improved performance and reduced power consumption in memory systems.

Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. (2015). "Energy-Efficient Hybrid DRAM/NVM Main Memory: ACM Student Research Competition." In 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), 492-493.
Download Paper

MapReduce for the Single-Chip-Cloud Architecture

Published in International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES), 2011

Presents a scalable implementation of MapReduce on the Intel SCC (Single-Chip Cloud), addressing scalability bottlenecks with customized data partitioning, combining and sorting algorithms for the SCC network-on-chip architecture.

Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2011). "MapReduce for the Single-Chip-Cloud Architecture." In ACACES Journal-Seventh International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems.
Download Paper

C Source Level Transformations & Optimizations for Task-Based Parallelism

Published in International Symposium on Code Generation and Optimization (CGO), Poster Session, 2011

Student poster presented at CGO 2011, exploring source-level transformations and optimizations for enabling task-based parallelism in C programs.

Recommended citation: Zakkak, F., Chassapis, D., Pratikakis, P., Nikolopoulos, D., & Bilas, A. (2011). C Source Level Transformations & Optimizations for Task-Based Parallelism. Student Poster Session, *2011 International Symposium on Code Generation and Optimization (CGO)*, April.

Model-Based Hybrid MPI/OpenMP Power-Aware Computing

Published in ACM/IEEE International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC) Poster Session, 2009

Poster presented at SC 2009 on leveraging hybrid MPI/OpenMP programming for power-aware high-performance computing through model-based approaches.

Recommended citation: Li, D., Cameron, K., Nikolopoulos, D., Schulz, M., & de Supinski, B. (2009). Model-Based Hybrid MPI/OpenMP Power-Aware Computing. Poster presented at *ACM/IEEE Supercomputing 2009 (SC)*, November.

Supporting Data-Intensive Applications on Accelerator-Based Distributed Systems

Published in Poster Session, USENIX Conference on File and Storage Technologies (FAST), 2009

Poster presented at USENIX FAST 2009 outlining architecture and programming challenges in deploying data-intensive applications on accelerator-based distributed systems.

Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2009). Supporting Data-Intensive Applications on Accelerator-Based Distributed Systems. Poster presented at *USENIX Conference on File and Storage Technologies (FAST)*, 2009.

Using machine descriptors to select parallelization models and strategies on hierarchical systems

Published in International Conference on High Performance Computing, Networking, Storage and Analysis (SC) Poster Session, 2001

Presents an approach for using machine descriptors to automatically select appropriate parallelization models and strategies for hierarchical computing systems.

Recommended citation: Yankelevsky, M., Ko, W., Nikolopoulos, D. S., & Polychronopoulos, C. D. (2001). "Using machine descriptors to select parallelization models and strategies on hierarchical systems." In Poster Session of SC2001: High Performance Networking and Computing (SC'01).
Download Paper

Edited Volumes and Special Issues

Proceedings of the International Workshop on Deployment and Use of Accelerators

Published in International Conference on Parallel Processing (ICPP) Workshops, 2023

Message from the organizing committee of the 3rd International Workshop on Deployment and Use of Accelerators, highlighting workshop objectives and contributions.

Recommended citation: Reaño, C., & Nikolopoulos, D. S. (2023). "The 3rd International Workshop on Deployment and Use of Accelerators (DUAC 2023): message from the DUAC 2023 Organizing Committee." In 3rd International Workshop on Deployment and Use of Accelerators 2023 (co-located with 52nd International Conference on Parallel Processing), vi. Association for Computing Machinery.
Download Paper

Proceedings of the International Workshop on Deployment and Use of Accelerators

Published in International Conference on Parallel Processing (ICPP) Workshops, 2022

Message from the organizing committee of the 2nd International Workshop on Deployment and Use of Accelerators.

Recommended citation: Reaño, C., & Nikolopoulos, D. S. (2022). "Message from the 2nd DUAC Organizing Committee." In 51st International Conference on Parallel Processing, ICPP 2022.
Download Paper

Proceedings of the Workshop on Deployment and Use of Accelerators (DUAC)

Published in International Conference on Parallel Processing (ICPP) Workshops, 2021

Organizes the DUAC 2021 workshop on deployment and use of accelerators in high performance computing environments.

Recommended citation: Reaño, C., & Nikolopoulos, D. S. (2021). "Deployment and Use of Accelerators (DUAC 2021)." In 50th International Conference on Parallel Processing. Association for Computing Machinery.
Download Paper

Proceedings: 2018 IEEE International Conference on Cluster Computing (CLUSTER)

Published in IEEE CLUSTER 2018, 2018

Editorial contribution to the proceedings of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), held in October 2018.

Recommended citation: Nikolopoulos, D. S., & de Supinski, B. R. (Eds.). *Proceedings: 2018 IEEE International Conference on Cluster Computing (CLUSTER)*. IEEE, October 2018.
Download Paper

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Published in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2018

Welcome message from the General Chairs of the 2018 IEEE International Symposium on Performance Analysis of Systems and Software.

Recommended citation: General Chairs. (2018). "Welcome from the General Chairs." In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 9. https://doi.org/10.1109/ISPASS.2018.00005
Download Paper

Proceedings of the MiniSymposium on Edge Computing

Published in Parallel Computing is Everywhere, 2018

Organizes and introduces a minisymposium on edge computing, covering current trends and challenges in edge computing technologies.

Recommended citation: Antonopoulos, C. D., & Nikolopoulos, D. S. (2018). "MiniSymposium on Edge Computing." In Parallel Computing is Everywhere, 783. IOS Press.
Download Paper

Special issue on Disruptive Technologies for Energy Efficient Computing

Published in Sustainable Computing: Informatics and Systems, 2016

Editorial for a special issue focusing on disruptive technologies for energy efficient computing in sustainable computing systems and informatics.

Recommended citation: Butt, A. R., Gniady, C., & Nikolopoulos, D. S. (2016). "Special issue on Disruptive technologies for energy efficient computing." Sustainable Computing: Informatics and Systems, 12, 56. https://doi.org/10.1016/j.suscom.2016.11.005
Download Paper

Proceedings of the Workshop on Variability in Computer Systems (VarSys)

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS) Workshops, 2016

Presents the introductory welcome message for the VarSys workshop, including conference officers’ congratulations and acknowledgments for the workshop event and proceedings publication.

Recommended citation: Cameron, K., Gamblin, T., & Nikolopoulos, D. S. (2016). "VarSys Introduction." In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1068. https://doi.org/10.1109/IPDPSW.2016.239
Download Paper

Proceedings of the Mini-symposium on Energy and Resilience in Parallel Programming

Published in Parallel Computing (Advances in Parallel Computing), 2016

Organizes and introduces a mini-symposium on energy and resilience in parallel programming, covering current trends and challenges in energy-aware and fault-tolerant parallel computing.

Recommended citation: Nikolopoulos, D. S., & Antonopoulos, C. D. (2016). "Mini-symposium on energy and resilience in parallel programming." In Parallel Computing. Advances in Parallel Computing. IOS Press. https://doi.org/10.3233/978-1-61499-621-7-709
Download Paper

Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing

Published in International Conference on High-Performance Computing, Networking, Storage and Analysis (SC) Workshops, 2015

Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, focusing on power and energy consumption as primary concerns for Exascale systems and revolutionary methods for energy efficient computing.

Recommended citation: E2SC '15: Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing. (2015). Association for Computing Machinery.
Download Paper

Special Issue: Energy efficient computing with adaptive and heterogeneous architectures

Published in IET Computers & Digital Techniques, 2015

Guest editorial for a special issue on energy efficient computing with adaptive and heterogeneous architectures, addressing energy efficiency challenges in mobile devices and server systems.

Recommended citation: Nunez-Yanez, J., Moreno, J. M., & Nikolopoulos, D. S. (2015). "Guest Editorial: Special Issue: Energy efficient computing with adaptive and heterogeneous architectures." IET Computers & Digital Techniques, 9(1), 1-2.
Download Paper

Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Published in IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2014

Preface message from the Technical Program Committee Co-Chairs of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2014), highlighting the scope and goals of the conference.

Recommended citation: Cameron, K. W., & Nikolopoulos, D. S. (2014). Message from Technical Program Committee Co-Chairs. *Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2014)*, IEEE. https://doi.org/10.1109/CCGrid.2014.5

Topic 1: Support Tools and Environments

Published in International European Conference on Parallel Processing (EuroPar), 2013

Introduces the Euro-Par 2013 topic on support tools and environments for parallel and distributed computing, focusing on issues such as correctness, performance, and energy efficiency.

Recommended citation: de Supinski, B. R., Krammer, B., Fürlinger, K., Labarta, J., & Nikolopoulos, D. S. (2013). Topic 1: Support Tools and Environments. In *Euro-Par 2013 Parallel Processing* (F. Wolf, B. Mohr, D. an Mey, Eds.), p. 3, Springer Berlin Heidelberg.

Topic 16: GPU and Accelerators Computing

Published in International European Conference on Parallel Processing (EuroPar), 2012

Introduces the Euro-Par 2012 topic on GPU and accelerator computing, highlighting research challenges and trends in programming and optimizing heterogeneous architectures.

Recommended citation: Nikolopoulos, D. (2012). Topic 16: GPU and Accelerators Computing. In *Euro-Par 2012 Parallel Processing – 18th International Conference, Rhodes Island, Greece*, LNCS Vol. 7484, pp. 857–858, Springer. https://doi.org/10.1007/978-3-642-32820-6_84

Recent Advances in the Message Passing Interface

Published in European MPI Users' Group Meeting (EuroMPI), 2011

Edited proceedings from the 18th European MPI Users’ Group Meeting (EuroMPI 2011), featuring research on advances in message passing systems and applications.

Recommended citation: Cotronis, Y., Danalis, A., Nikolopoulos, D. S., & Dongarra, J. (Eds.). (2011). *Recent Advances in the Message Passing Interface*. Proceedings of the 18th European MPI Users’ Group Meeting (EuroMPI 2011), Santorini, Greece. Lecture Notes in Computer Science, Springer. https://doi.org/10.1007/978-3-642-24449-0

PPAC 2011 Workshop Proceedings

Published in 2011 IEEE International Conference on Cluster Computing, 2011

Workshop organizing committee information for the PPAC 2011 workshop held in conjunction with the IEEE International Conference on Cluster Computing.

Recommended citation: "PPAC 2011 Workshop Organizing Committee." In 2011 IEEE International Conference on Cluster Computing, xix-xix. https://doi.org/10.1109/CLUSTER.2011.87
Download Paper

Proceedings of the 2010 IEEE International Conference on Cluster Computing

Published in IEEE International Conference on Cluster Computing (CLUSTER), 2010

Foreword to the proceedings of the 2010 IEEE International Conference on Cluster Computing, highlighting key themes and contributions of the conference.

Recommended citation: Nikolopoulos, D. S., Bianchini, R., & Bilas, A. (2010). Foreword CLUSTER 2010. In *Proceedings of the 2010 IEEE International Conference on Cluster Computing (CLUSTER 2010)*. IEEE. https://doi.org/10.1109/CLUSTER.2010.5

Proceedings of the Parallel Programming with Accelerators Workshop

Published in IEEE International Conference on Cluster Computing (CLUSTER) Workshops, 2009

Welcome message for the 2009 IEEE International Conference on Cluster Computing (CLUSTER '09), introducing the PPAC workshop and setting the stage for the conference.

Recommended citation: Nikolopoulos, D., & Ribbens, C. (2009). Welcome to New Orleans and PPAC'09! In *Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER 2009)*, December 21. IEEE. https://doi.org/10.1109/CLUSTR.2009.5289209

Invited Papers

New Approaches to Memory Reliability Management for Big Data Workloads

Published in SIAM Conference on Parallel Processing for Scientific Computing, 2018

Presents new approaches to memory reliability management specifically designed for big data workloads to improve system resilience and data integrity.

Recommended citation: Nikolopoulos, D. (2018). "New Approaches to Memory Reliability Management for Big Data Workloads." In SIAM Conference on Parallel Processing for Scientific Computing.
Download Paper

The Challenges and Opportunities of Micro-Servers in the HPC Ecosystem

Published in Workshop on Clusters, Clouds, and Data for Scientific Computing (CCDSC), 2014

Examines the potential of micro-servers in high-performance computing, highlighting architectural trends, energy efficiency, and emerging system design opportunities.

Recommended citation: Nikolopoulos, D. (2014). The Challenges and Opportunities of Micro-Servers in the HPC Ecosystem. In *Clusters, Clouds, and Data for Scientific Computing (CCDSC '14)*, September 4.
Download Paper

Green Building Blocks - Software Stacks for Energy-Efficient Clusters and Data Centres

Published in ERCIM News, 2009

Presents the Green Building Blocks (GBB) project, a software architecture for reducing energy consumption in clusters and data centers while maintaining performance.

Recommended citation: Nikolopoulos, D. S. (2009). Green Building Blocks - Software Stacks for Energy-Efficient Clusters and Data Centres. *ERCIM News*, 79. https://ercim-news.ercim.eu/en79/special/green-building-blocks
Download Paper

Set-top supercomputing: scalable software for scientific simulations on game consoles

Published in ERCIM News, 2008

Explores scalable software approaches for scientific simulations on game consoles, demonstrating the potential of consumer hardware for high-performance computing applications.

Recommended citation: Nikolopoulos, D. S. (2008). "Set-top supercomputing: scalable software for scientific simulations on game consoles." ERCIM News, 2008(74).
Download Paper

System Software Challenges and Opportunities on Asymmetric Multi-core Processors

Published in Fall Creek Falls Conference, 2007

Explores the implications of architectural asymmetry in multi-core systems for system software design, addressing challenges in scheduling, resource management, and performance tuning.

Recommended citation: Nikolopoulos, D. (2007). System Software Challenges and Opportunities on Asymmetric Multi-core Processors. Presented at 2007 Fall Creek Falls conference, Tennessee, September 2007.

Exploring Programming Models and Optimizations for the Cell Broadband Engine using RAxML

Published in Virginia Tech High-End Computing Challenge, 2006

Presents the porting and optimization of RAxML phylogenetic tree computation on Cell processors, achieving a 5× performance improvement through multilevel parallelization and Cell-specific optimizations.

Recommended citation: Blagojevic, F., & Nikolopoulos, D. S. (2006). "Exploring Programming Models and Optimizations for the Cell Broadband Engine using RAxML." In Proc. of the 2006 Virginia Tech High-End Computing Challenge.
Download Paper

Keynote Presentations

Energy Efficient Computing using Computational Significance Abstractions: Keynote Talk at the UK-China Workshop on Shaping the Low Carbon Energy Future

Published in UK-China Workshop on Shaping the Low Carbon Energy Future, 2016

Keynote presentation on energy efficient computing using computational significance abstractions at the UK-China Workshop on Shaping the Low Carbon Energy Future.

Recommended citation: Nikolopoulos, D. (2016). "Energy Efficient Computing using Computational Significance Abstractions: Keynote Talk at the UK-China Workshop on Shaping the Low Carbon Energy Future." Keynote presentation at UK-China Workshop on Shaping the Low Carbon Energy Future.
Download Paper

Using Computational Significance and Resilience in System Software Stacks: Keynote Talk

Published in International Workshop on Energy-Aware High Performance Computing, 2016

Keynote talk exploring how runtime systems and operating systems can leverage computational significance and resilience metrics to reduce the energy footprint of parallel applications while tolerating higher error rates in future processors and memory technologies.

Recommended citation: Nikolopoulos, D. (2016). "Using Computational Significance and Resilience in System Software Stacks: Keynote Talk." Keynote presentation at First International Workshop on Energy-Aware High Performance Computing.
Download Paper

Programming the Energy Efficiency of High Performance Computing Systems

Published in International Conference on Energy-Aware High Performance Computing, 2013

Discusses programming methodologies and techniques for improving energy efficiency in high performance computing systems, presented at the Fourth International Conference on Energy-Aware High Performance Computing.

Recommended citation: Nikolopoulos, D. S. (2013). "Programming the Energy Efficiency of High Performance Computing Systems." In Fourth International Conference on Energy-Aware High Performance Computing.
Download Paper

Connecting the Dots between Parallel Programming and Energy

Published in Euromicro International Conference on Parallel, Distributed and Network-Based Computing (PDP), 2013

Keynote address discussing the interplay between parallel programming models and energy efficiency in modern computing systems, delivered at PDP 2013.

Recommended citation: Nikolopoulos, D. (2013). Connecting the Dots between Parallel Programming and Energy. Keynote at *PDP 2013 – 21st Euromicro International Conference on Parallel, Distributed and Network-Based Computing*, March 1.

To Program or Not To Program the Memory Hierarchy?

Published in Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2011

Keynote talk at MULTIPROG 2011 exploring the challenges and trade-offs in managing memory hierarchies in heterogeneous multicore systems.

Recommended citation: Nikolopoulos, D. (2011). To Program or Not To Program the Memory Hierarchy? Keynote at *4th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG)*, January 10.

Facing the challenges of multicore processor technologies using autonomic system software

Published in International Parallel and Distributed Processing Symposium (IPDPS) Workshops, 2006

Discusses major challenges of software adaptation to multicore technologies and motivates the use of autonomic, self-optimizing system software for high performance portability and energy-efficient execution.

Recommended citation: Nikolopoulos, D. (2006). "Facing the challenges of multicore processor technologies using autonomic system software." In Proceedings. 20th International Parallel and Distributed Processing Symposium, 347. https://doi.org/10.1109/IPDPS.2006.1639604
Download Paper

Technical Reports

WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching

Published in arXiv preprint arXiv:2601.11652, 2026

As Large Language Models (LLMs) become increasingly accessible to end users, an ever-growing number of inference requests are initiated from edge devices and computed on centralized GPU clusters. However, the resulting exponential growth in computation workload is placing significant strain on data centers, while edge devices remain largely underutilized, leading to imbalanced workloads and resource inefficiency across the network. Integrating edge devices into the LLM inference process via speculative decoding helps balance the workload between the edge and the cloud, while maintaining lossless prediction accuracy. In this paper, we identify and formalize two critical bottlenecks that limit the efficiency and scalability of distributed speculative LLM serving: Wasted Drafting Time and Verification Interference. To address these challenges, we propose WISP, an efficient and SLO-aware distributed LLM inference system that consists of an intelligent speculation controller, a verification time estimator, and a verification batch scheduler. These components collaboratively enhance drafting efficiency and optimize verification request scheduling on the server. Extensive numerical results show that WISP improves system capacity by up to 2.1x and 4.1x, and increases system goodput by up to 1.94x and 3.7x, compared to centralized serving and SLED, respectively.

Recommended citation: Li, X., Fan, J., Wang, Q., Spatharakis, D., Ghafouri, S., Vandierendonck, H., John, D., Butt, A.R., & Nikolopoulos, D.S. (2026). *WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching*. arXiv:2601.11652 [cs.CV].
Download Paper

Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving

Published in arXiv preprint arXiv:2512.17077, 2025

dLLM‑Serve is a holistic serving framework for diffusion LLMs that solves their memory‑footprint and scheduling bottlenecks through novel budgeting, multiplexing, and sparse‑attention techniques, delivering up to ~1.8× higher throughput and 4× lower tail latency across diverse GPUs.

Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D.S. (2025). *Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving*. arXiv:2512.17077 [cs.CV].
Download Paper

DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference

Published in arXiv preprint arXiv:2511.11446, 2025

Introduces DiffPro, a unified framework that optimizes both timestep counts and layer-wise precision to accelerate diffusion model inference while preserving output quality.

Recommended citation: Amin, F., Afroz, S., Gharami, K., Moghadampanah, M., & Nikolopoulos, D. S. (2025). *DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference*. arXiv:2511.11446 [cs.LG].
Download Paper

Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices

Published in arXiv preprint arXiv:2507.14959, 2025

Proposes Polymorph, an energy-efficient approach for multi-label classification of video streams optimized for embedded devices, leveraging lightweight architectures and adaptive processing.

Recommended citation: Ghafouri, S., Fayyaz, M., Li, X., John, D., Ji, B., Nikolopoulos, D., & Vandierendonck, H. (2025). *Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices*. arXiv:2507.14959 [cs.CV].
Download Paper

MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing

Published in arXiv, 2025

This paper presents MARCO, a novel framework that enhances LLM-generated code for high-performance computing through a specialized multi-agent architecture.

Recommended citation: Rahman, A., Cvetkovic, V., Reece, K., Walters, A., Hassan, Y., Tummeti, A., Torres, B., Cooney, D., Ellis, M., & Nikolopoulos, D.S. (2025). "MARCO: A Multi-Agent System for Optimizing HPC Code Generation Using Large Language Models." arXiv preprint. arXiv:2505.03906.
Download Paper

SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Published in arXiv preprint, 2025

Introduces SLED, a speculative decoding framework designed to improve the efficiency of large language model serving on edge devices through advanced decoding strategies.

Recommended citation: Li, X., Spatharakis, D., Ghafouri, S., Fan, J., Vandierendonck, H., John, D., Ji, B., & Nikolopoulos, D. (2025). "SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving." arXiv preprint arXiv:2506.09397.
Download Paper

Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs

Published in arXiv preprint, 2025

Presents parallel CPU-GPU execution techniques for large language model inference on constrained GPU systems to improve performance and resource utilization in memory-limited environments.

Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D. S. (2025). "Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs." arXiv preprint arXiv:2506.03296.
Download Paper

Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory

Published in arXiv preprint, 2024

Presents modeling techniques for page migration in tiered memory systems to optimize fast memory size allocation and improve system performance.

Recommended citation: Chen, S., Huang, J., Yang, S., Liu, J., Li, H., Nikolopoulos, D., Ryu, J., Baek, J., Shin, K., & Li, D. (2024). "Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory." arXiv preprint arXiv:2410.00328.
Download Paper

Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications

Published in arXiv preprint, 2023

Explores persistent memory based stateful serverless computing approaches for big data applications to improve performance and reduce cold start overhead in serverless environments.

Recommended citation: Li, Y., Assogba, K., Tripathy, A., Arif, M., Rafique, M. M., Butt, A. R., & Nikolopoulos, D. (2023). "Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications." arXiv preprint arXiv:2309.01662.
Download Paper