Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Publications
Efficient Runtime Thread Management for the Nano-Threads Programming Model
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS Workshops), 1998
This chapter introduces memory management and scheduling strategies for the nano-threads programming model, enhancing performance on NUMA multiprocessors with hierarchical queues.
Recommended citation: Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1998). "Efficient Runtime Thread Management for the Nano-Threads Programming Model." In *Parallel and Distributed Processing*, 183–194. https://link.springer.com/chapter/10.1007/3-540-64359-1_688
Download Paper
Kernel-Level Scheduling for the Nano-Threads Programming Model
Published in ACM International Conference on Supercomputing (ICS), 1998
This paper introduces kernel-level scheduling mechanisms to support the nano-threads programming model for scalable parallel execution.
Recommended citation: Polychronopoulos, E. D., Martorell, X., Nikolopoulos, D. S., Labarta, J., Papatheodorou, T. S., & Navarro, N. (1998). "Kernel-Level Scheduling for the Nano-Threads Programming Model." *ICS '98*, 337–344. https://doi.org/10.1145/277830.277911
Download Paper
Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors
Published in International European Conference on Parallel Processing (Euro-Par), 1998
Presents a technique to enhance autoscheduling performance in DSM multiprocessors by partitioning application task graphs and mapping them to processor clusters to improve data locality and reduce communication costs.
Recommended citation: Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1998). "Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors." In Euro-Par'98 Parallel Processing (pp. 491-501). Springer. https://doi.org/10.1007/BFb0057892
Download Paper
Fine-Grain and Multiprogramming-Conscious Nanothreading with the Solaris Operating System
Published in Parallel and Distributed Processing Techniques and Applications, 1999
Presents architectural and implementation details of a nanothreading runtime system on Solaris that addresses fine-grain parallelism exploitation and scalability in multiprogrammed environments.
Recommended citation: Nikolopoulos, D. S., Polychronopoulos, E. D., & Papatheodorou, T. S. (1999). "Fine-Grain and Multiprogramming-Conscious Nanothreading with the Solaris Operating System." In PDPTA, 1797-1803.
Download Paper
Nano-Threads: Programming Model Specification
Published in ESPRIT Project NANOS Technical Report, 1999
Specifies the programming model for Nano-Threads, defining the framework for fine-grain parallelism exploitation and multiprogramming in the ESPRIT Project NANOS.
Recommended citation: Ayguade, E., Furnari, M., Giordano, M., Hoppe, H.-C., Labarta, J., Martorell, X., Navarro, N., Nikolopoulos, D., Papatheodorou, T., & Polychronopoulos, E. (1999). "Nano-Threads: Programming Model Specification." Deliverable M1. D1, ESPRIT Project NANOS, Technical Report 21907.
Download Paper
NANOS: Effective integration of fine-grain parallelism exploitation and multiprogramming
Published in Technical Report, 1999
Presents the NANOS project environment for achieving high system throughput and application performance in multiprogrammed shared-memory multiprocessors, targeting SGI Origin2000 systems.
Recommended citation: Ayguadé, E., Calidonna, C. R., Corbalan, J., Giordano, M., Gonzalez, M., Hoppe, H. C., Labarta, J., Furnari, M. M., Martorell, X., Navarro, N., Nikolopoulos, D. S., Oliver, J., Papatheodorou, T. S., & Polychronopoulos, E. D. (1999). "NANOS: Effective integration of fine-grain parallelism exploitation and multiprogramming." Technical Report.
Download Paper
System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
Published in Hellenic Conference on Informatics, 1999
Presents system software support techniques for reducing memory latency on distributed shared memory multiprocessors to improve performance.
Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (1999). "System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors." In Proceedings of the 7th Hellenic conference on informatics, Greece, 61-68.
Download Paper
A Quantitative Architectural Evaluation of Synchronization Algorithms and Disciplines on ccNUMA Systems: The Case of the SGI Origin2000
Published in ACM International Conference on Supercomputing (ICS), 1999
Evaluates synchronization performance on SGI Origin2000 using architectural and algorithmic perspectives for ccNUMA systems.
Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (1999). "A Quantitative Architectural Evaluation of Synchronization Algorithms and Disciplines on ccNUMA Systems: The Case of the SGI Origin2000." In ICS '99, 319–328. https://doi.org/10.1145/305138.305209
Download Paper
An Efficient Kernel-Level Scheduling Methodology for Multiprogrammed Shared Memory Multiprocessors
Published in International Conference on Parallel and Distributed Computing Systems (PDCS), 1999
Presents a kernel-level scheduling technique for supporting the nano-threads model on shared-memory multiprocessors with hierarchical and efficient queueing mechanisms.
Recommended citation: Polychronopoulos, E. D., Nikolopoulos, D. S., Papatheodorou, T. S., et al. (1999). "An Efficient Kernel-Level Scheduling Methodology for Multiprogrammed Shared Memory Multiprocessors." PDCS '99, 148–155.
Download Paper
Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives
Published in International Parallel and Distributed Processing Symposium (IPDPS), 2000
Presents a technique for fast synchronization on scalable cache-coherent multiprocessors through the use of hybrid primitives.
Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (2000). Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives. In *Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS 2000)*, 711-719. https://doi.org/10.1109/IPDPS.2000.846056
Download Paper
Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration
Published in International Symposium on High Performance Computing (ISHPC), 2000
Describes transparent mechanisms for emulating data distribution facilities in OpenMP through user-level dynamic page migration, implementing UPMlib to improve memory locality without modifying the programming model.
Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration." In High Performance Computing, 415-427. Springer. https://doi.org/10.1007/3-540-39999-2_40
Download Paper
A Case for User-Level Dynamic Page Migration
Published in ACM International Conference on Supercomputing (ICS), 2000
This paper proposes a runtime system for user-level dynamic page migration in OpenMP codes on DSM systems, improving locality and adaptivity over OS-level solutions.
Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "A Case for User-Level Dynamic Page Migration." *ICS 2000*, 119–130. https://doi.org/10.1145/335231.335243
Download Paper
Efficient Dynamic Parallelism with OpenMP on Linux SMPs
Published in Parallel and Distributed Processing Techniques and Applications, 2000
Presents an integrated environment for efficient support of dynamic parallelism with OpenMP on Linux-based SMPs, achieving up to 6.3 times higher throughput under multiprogramming.
Recommended citation: Antonopoulos, C. D., Venetis, I. E., Nikolopoulos, D. S., & Papatheodorou, T. S. (2000). "Efficient Dynamic Parallelism with OpenMP on Linux SMPs." In PDPTA.
Download Paper
A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager
Published in Job Scheduling Strategies for Parallel Processing (JSSPP), 2000
Introduces the NANOS CPU Manager, a runtime environment for optimizing processor scheduling policies for shared-memory multiprocessors.
Recommended citation: Martorell, X., Corbalán, J., Nikolopoulos, D. S., et al. (2000). "A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager." In Job Scheduling Strategies for Parallel Processing, 87–112. https://doi.org/10.1007/3-540-39997-6_7
Download Paper
A Transparent Runtime Data Distribution Engine for OpenMP
Published in Scientific Programming, 2000
Introduces a runtime mechanism for transparent page migration in OpenMP programs to improve performance on NUMA systems without explicit data placement directives.
Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., et al. (2000). "A Transparent Runtime Data Distribution Engine for OpenMP." Scientific Programming, 8(3), Article 417570. https://doi.org/10.1155/2000/417570
Download Paper
User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors
Published in International Conference on Parallel Processing (ICPP), 2000
This paper proposes a user-level mechanism for dynamic page migration to improve locality and performance in multiprogrammed SMPs.
Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguade, E. (2000). "User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors." ICPP 2000, 95–103. https://doi.org/10.1109/ICPP.2000.876083
Download Paper
UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors
Published in Languages, Compilers, and Run-Time Systems for Scalable Computers (LCR), 2000
UPMLIB provides dynamic memory tuning for OpenMP programs by performing runtime page migrations using compiler and OS feedback on shared-memory multiprocessors.
Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguadé, E. (2000). "UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors." In *Languages, Compilers, and Run-Time Systems for Scalable Computers*, 85–99. https://doi.org/10.1007/3-540-40889-4_7
Download Paper
Is Data Distribution Necessary in OpenMP?
Published in International Conference on High-Performance Computing, Networking, Storage and Analysis (SC), 2000
This work questions the necessity of explicit data distribution in OpenMP programming, exploring scheduling and locality-aware execution models for scalable performance. Best Paper Award
Recommended citation: Nikolopoulos, D. S., Papatheodorou, T. S., Polychronopoulos, C. D., Labarta, J., & Ayguade, E. (2000). "Is Data Distribution Necessary in OpenMP?" *SC 2000*. https://doi.org/10.1109/SC.2000.10025
Download Paper
Improving Java Server Performance with Interruptlets
Published in International Conference on Computational Science (ICCS), 2001
Proposes Interruptlets as lightweight, low-overhead interrupt handlers for improving Java server performance, reducing I/O thread and memory copy overhead in JVMs on Linux.
Recommended citation: Craig, D., Carroll, S., Breg, F., Nikolopoulos, D. S., & Polychronopoulos, C. (2001). Improving Java Server Performance with Interruptlets. In *Computational Science – ICCS 2001* (V. N. Alexandrov et al., Eds.), pp. 223–232, Springer.
The Trade-Off Between Implicit and Explicit Data Distribution in Shared-Memory Programming Paradigms
Published in ACM International Conference on Supercomputing (ICS), 2001
Analyzes trade-offs among automatic page placement, page migration, and manual data distribution in OpenMP programs on NUMA systems.
Recommended citation: Nikolopoulos, D. S., Ayguadé, E., Papatheodorou, T. S., et al. (2001). "The Trade-Off Between Implicit and Explicit Data Distribution in Shared-Memory Programming Paradigms." In ICS '01, 23–37. https://doi.org/10.1145/377792.377801
Download Paper
A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks
Published in International Workshop on OpenMP (IWOMP), 2001
Evaluates the effectiveness of runtime data distribution methods in OpenMP programs using SPEC benchmarks, achieving 20-25% speedup improvements through automatic data distribution without API extensions.
Recommended citation: Nikolopoulos, D. S., & Ayguadé, E. (2001). "A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks." In OpenMP Shared Memory Parallel Programming (pp. 115-129). Springer. https://doi.org/10.1007/3-540-44587-0_11
Download Paper
The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors
Published in International Journal of Parallel Programming, 2001
Investigates architectural and OS-level implications for efficient synchronization on ccNUMA platforms, analyzing hardware and software optimizations.
Recommended citation: Nikolopoulos, D. S., & Papatheodorou, T. S. (2001). "The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors." IJPP, 29(3), 249–282. https://doi.org/10.1023/A:1011168003859
Download Paper
Informing Algorithms for Efficient Scheduling of Synchronizing Threads on Multiprogrammed SMPs
Published in International Conference on Parallel Processing (ICPP), 2001
Presents scheduling heuristics that use runtime feedback to improve scheduling of synchronizing threads on multiprogrammed shared-memory multiprocessors.
Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2001). Informing Algorithms for Efficient Scheduling of Synchronizing Threads on Multiprogrammed SMPs. In *ICPP 2001*, 123–130. https://doi.org/10.1109/ICPP.2001.952054
Download Paper
A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models
Published in International European Conference on Parallel Processing (Euro-Par), 2001
Defines a unified set of operating system services for embedding adaptability in thread-based programming paradigms, achieving up to 41.2% throughput improvement in multiprogrammed SMP environments.
Recommended citation: Venetis, I. E., Nikolopoulos, D. S., & Papatheodorou, T. S. (2001). "A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models." In Euro-Par 2001 Parallel Processing (pp. 514-524). Springer. https://doi.org/10.1007/3-540-44681-8_75
Download Paper
Using machine descriptors to select parallelization models and strategies on hierarchical systems
Published in International Conference on High Performance Computing, Networking, Storage and Analysis (SC) Poster Session, 2001
Presents an approach for using machine descriptors to automatically select appropriate parallelization models and strategies for hierarchical computing systems.
Recommended citation: Yankelevsky, M., Ko, W., Nikolopoulos, D. S., & Polychronopoulos, C. D. (2001). "Using machine descriptors to select parallelization models and strategies on hierarchical systems." In Poster Session of SC2001: High Performance Networking and Computing (SC'01).
Download Paper
Scaling Irregular Parallel Codes with Minimal Programming Effort
Published in International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC), 2001
This paper presents an OpenMP-based approach to scaling irregular parallel codes with minimal programming effort, matching MPI performance on benchmark applications. Best Paper Award Nominee
Recommended citation: Nikolopoulos, D. S., Polychronopoulos, C. D., & Ayguadé, E. (2001). "Scaling Irregular Parallel Codes with Minimal Programming Effort." *SC '01*. https://doi.org/10.1145/582034.582050
Download Paper
Exploiting Memory Affinity in OpenMP through Schedule Reuse
Published in SIGARCH Computer Architecture News, 2001
This work introduces the concept of reusing iteration schedules in OpenMP to improve memory affinity and scalability on NUMA shared-memory systems.
Recommended citation: Nikolopoulos, D. S., Artiaga, E., Ayguadé, E., & Labarta, J. (2001). "Exploiting Memory Affinity in OpenMP through Schedule Reuse." *SIGARCH Comput. Archit. News*, 29(5), 49–55. https://doi.org/10.1145/563647.563657
Download Paper
Quantifying and resolving remote memory access contention on hardware DSM multiprocessors
Published in International Parallel and Distributed Processing Symposium (IPDPS), 2002
Presents methods for quantifying and resolving remote memory access contention on hardware distributed shared-memory multiprocessors to improve performance and coherence. Best Paper Award
Recommended citation: Nikolopoulos, D. S. (2002). "Quantifying and resolving remote memory access contention on hardware DSM multiprocessors." In Proceedings 16th International Parallel and Distributed Processing Symposium, 10 pp. https://doi.org/10.1109/IPDPS.2002.1015503
Download Paper
Adaptive scheduling under memory pressure on multiprogrammed SMPs
Published in International Parallel and Distributed Processing Symposium (IPDPS), 2002
Presents adaptive scheduling techniques for symmetric multiprocessors under memory pressure in multiprogrammed environments to improve system performance and throughput.
Recommended citation: Nikolopoulos, D. S., & Polychronopoulos, C. D. (2002). Adaptive scheduling under memory pressure on multiprogrammed SMPs. In *Proceedings 16th International Parallel and Distributed Processing Symposium (IPDPS 2002)*, 6 pp. https://doi.org/10.1109/IPDPS.2002.1015481
Download Paper
Effective cross-platform, multilevel parallelism via dynamic adaptive execution
Published in International Parallel and Distributed Processing Symposium (IPDPS) Workshops, 2002
Presents an approach for achieving effective cross-platform and multilevel parallelism through dynamic adaptive execution techniques.
Recommended citation: Ko, W., Yankelevsky, M., Nikolopoulos, D. S., & Polychronopoulos, C. D. (2002). Effective cross-platform, multilevel parallelism via dynamic adaptive execution. In *Proceedings 16th International Parallel and Distributed Processing Symposium (IPDPS 2002)*, 8 pp. https://doi.org/10.1109/IPDPS.2002.1016495
Download Paper
Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters
Published in IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2002
Presents a scheduler design for multiprogrammed clusters that adapts to memory pressure using kernel-level extensions to control thread execution. Best Paper Award
Recommended citation: Nikolopoulos, D. S., & Polychronopoulos, C. D. (2002). Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters. In *CCGRID '02*, 22. https://doi.org/10.1109/CCGRID.2002.1017108
Download Paper
Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors
Published in Journal of Parallel and Distributed Computing, 2002
Presents a novel dynamic page migration algorithm that improves data locality in multiprogrammed shared-memory multiprocessors through communication between the scheduler and the page migration engine.
Recommended citation: Nikolopoulos, D. S., Polychronopoulos, C. D., Papatheodorou, T. S., Labarta, J., & Ayguadé, E. (2002). "Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors." Journal of Parallel and Distributed Computing, 62(6), 1069-1103. https://doi.org/10.1006/jpdc.2001.1817
Download Paper
Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models
Published in International Journal of Parallel Programming, 2002
Compares data distribution methodologies for scaling OpenMP performance on NUMA architectures, presenting novel runtime techniques that can effectively replace manual data distribution in regular applications.
Recommended citation: Nikolopoulos, D. S., Ayguadé, E., & Polychronopoulos, C. D. (2002). "Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models." International Journal of Parallel Programming, 30(4), 225-255. https://doi.org/10.1023/A:1019899812171
Download Paper
Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules
Published in Scientific Programming, 2003
Explores customizing and reusing loop schedules to improve scalability of non-regular numerical codes in shared-memory architectures, establishing thread-data affinity while maintaining programming simplicity.
Recommended citation: Nikolopoulos, D. S., Artiaga, E., Ayguadé, E., & Labarta, J. (2003). "Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules." Scientific Programming, 11(2), 379739. https://doi.org/10.1155/2003/379739
Download Paper
Malleable memory mapping: user-level control of memory bounds for effective program adaptation
Published in International Parallel and Distributed Processing Symposium (IPDPS), 2003
Presents malleable memory mapping techniques that provide user-level control of memory bounds to enable effective program adaptation in distributed computing environments.
Recommended citation: Nikolopoulos, D. S. (2003). Malleable memory mapping: user-level control of memory bounds for effective program adaptation. In *Proceedings International Parallel and Distributed Processing Symposium (IPDPS 2003)*, 8 pp. https://doi.org/10.1109/IPDPS.2003.1213074
Download Paper
Adaptive Scheduling Under Memory Constraints on Non-Dedicated Computational Farms
Published in Future Generation Computer Systems, 2003
Proposes a scheduler for parallel programs that adapts to memory constraints in non-dedicated environments using thrashing prevention and co-scheduling extensions.
Recommended citation: Nikolopoulos, D. S., & Polychronopoulos, C. D. (2003). "Adaptive Scheduling Under Memory Constraints on Non-Dedicated Computational Farms." *FGCS*, 19(4), 505–519. https://doi.org/10.1016/S0167-739X(03)00031-1
Download Paper
Code and Data Transformations for Improving Shared Cache Performance on SMT Processors
Published in International Symposium on High Performance Computing (ISHPC), 2003
This chapter presents software techniques like dynamic tiling, copying, and block data layouts to improve cache performance on SMT processors through all-software partitioning approaches. Best Paper Award
Recommended citation: Nikolopoulos, D. S. (2003). "Code and Data Transformations for Improving Shared Cache Performance on SMT Processors." In *High Performance Computing*, 54–69. https://doi.org/10.1007/978-3-540-39707-6_5
Download Paper
Quantifying contention and balancing memory load on hardware DSM multiprocessors
Published in Journal of Parallel and Distributed Computing, 2003
Proposes a methodology for quantifying remote memory access contention on hardware DSM multiprocessors and presents an algorithm for detecting hot spots and balancing memory load using dynamic page migration.
Recommended citation: Nikolopoulos, D. S. (2003). "Quantifying contention and balancing memory load on hardware DSM multiprocessors." Journal of Parallel and Distributed Computing, 63(9), 866-886. https://doi.org/10.1016/S0743-7315(03)00105-9
Download Paper
Scheduling Algorithms with Bus Bandwidth Considerations for SMPs
Published in International Conference on Parallel Processing (ICPP), 2003
This paper introduces scheduling algorithms that incorporate system bus bandwidth as a first-class constraint for efficient SMP scheduling.
Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2003). "Scheduling Algorithms with Bus Bandwidth Considerations for SMPs." ICPP 2003, 547–554. https://doi.org/10.1109/ICPP.2003.1240622
Download Paper
Dynamic Tiling for Effective Use of Shared Caches on Multithreaded Processors
Published in International Journal of High Performance Computing and Networking, 2004
Proposes dynamic tiling transformations to partition shared caches in SMT processors and improve performance of multithreaded workloads.
Recommended citation: Nikolopoulos, D. S. (2004). "Dynamic Tiling for Effective Use of Shared Caches on Multithreaded Processors." IJHPCN, 2(1), 22–35. https://doi.org/10.1504/IJHPCN.2004.009265
Download Paper
Exploiting Simultaneous Multithreading for Parallel Mesh Generation: A Multigrain Approach on Deep Multiprocessors
Published in International Meshing Roundtable (IMR), 2004
Presents a multigrain parallelization strategy leveraging simultaneous multithreading (SMT) to accelerate mesh generation on deep multiprocessor systems.
Recommended citation: Antonopoulos, C. D., Chrisochoides, N., & Nikolopoulos, D. (2004). Exploiting Simultaneous Multithreading for Parallel Mesh Generation: A Multigrain Approach on Deep Multiprocessors. In *13th International Meshing Roundtable (IMR)*.
Adapting to Memory Pressure from Within Scientific Applications on Multiprogrammed COWs
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2004
Describes adaptive memory management for scientific codes on clusters of workstations (COWs), reacting to memory pressure from the runtime.
Recommended citation: Mills, R. T., Stathopoulos, A., & Nikolopoulos, D. S. (2004). "Adapting to Memory Pressure from Within Scientific Applications on Multiprogrammed COWs." IPDPS 2004. https://doi.org/10.1109/IPDPS.2004.1303002
Download Paper
Runtime Support for Integrating Precomputation and Thread-Level Parallelism on Simultaneous Multithreaded Processors
Published in Workshop on Languages, Compilers and Runtime Systems for Parallel Computing (LCR), 2004
Introduces runtime mechanisms that coordinate speculative precomputation and thread-level parallelism on SMT processors for improved performance.
Recommended citation: Wang, T., Blagojevic, F., & Nikolopoulos, D. S. (2004). Runtime Support for Integrating Precomputation and Thread-Level Parallelism on Simultaneous Multithreaded Processors. In *LCR '04*, 1–12. https://doi.org/10.1145/1066650.1066667
Download Paper
Scheduling Algorithms with Bus Bandwidth Considerations for SMPs
Published in High-Performance Computing, John Wiley & Sons, 2005
This book chapter presents gang-like scheduling techniques for SMP systems to optimize use of shared bus bandwidth based on runtime monitoring.
Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2005). "Scheduling Algorithms with Bus Bandwidth Considerations for SMPs." In High-Performance Computing (pp. 313–332). John Wiley & Sons. https://doi.org/10.1002/0471732710.ch16
Download Paper
Power-aware Resource Allocation via Online Simulation with Multiple-queue Backfilling
Published in Workshop on Performability Modeling of Computer and Communication Systems, 2005
Presents power-aware resource allocation techniques using online simulation with multiple-queue backfilling for efficient energy management in computing systems.
Recommended citation: Lawson, B., Yue, C., Smirni, E., & Nikolopoulos, D. (2005). "Power-aware Resource Allocation via Online Simulation with Multiple-queue Backfilling." In Proceedings of the 7th Workshop on Performability Modeling of Computer and Communication Systems, held in conjunction with the Second International Conference on the Quantitative Evaluation of Systems (QEST).
Download Paper
Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors
Published in International Parallel and Distributed Processing Symposium (IPDPS), 2005
This paper introduces scheduling algorithms that improve thread pairing for hybrid multiprocessors, targeting execution efficiency on SMT and CMP hardware architectures.
Recommended citation: McGregor, R. L., Antonopoulos, C. D., & Nikolopoulos, D. S. (2005). "Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors." IPDPS 2005. https://doi.org/10.1109/IPDPS.2005.390
Download Paper
Multigrain Parallel Delaunay Mesh Generation: Challenges and Opportunities for Multithreaded Architectures
Published in ACM International Conference on Supercomputing (ICS), 2005
This paper explores multigrain parallelism in Delaunay mesh generation and evaluates execution on multithreaded SMT-based systems, revealing opportunities for performance gains.
Recommended citation: Antonopoulos, C. D., Ding, X., Chernikov, A., Blagojevic, F., Nikolopoulos, D. S., & Chrisochoides, N. (2005). "Multigrain Parallel Delaunay Mesh Generation: Challenges and Opportunities for Multithreaded Architectures." *ICS '05*, 367–376. https://doi.org/10.1145/1088149.1088198
Download Paper
2-D Parallel Constrained Delaunay Mesh Generation: A Multigrain Approach on Deep Multiprocessors
Published in PMUA '05 (Workshop at ICS), 2005
Presents a multigrain approach to 2-D constrained Delaunay mesh generation for deep multiprocessors, delivered as an invited presentation at PMUA held with ICS 2005.
Recommended citation: Antonopoulos, C. D., Chrisochoides, N., & Nikolopoulos, D. S. (2005). *2-D Parallel Constrained Delaunay Mesh Generation: A Multigrain Approach on Deep Multiprocessors*. In Abstracts of the Workshop on Programming Models for HPCS Ultra-Scale Applications (PMUA), held with the 19th ACM International Conference on Supercomputing (ICS), Cambridge, MA, USA.
Download Paper
smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs
Published in International European Conference on Parallel Processing (Euro-Par), 2005
Presents SPRINTS, a source-level speculative precomputation framework for scientific applications on SMTs that reduces memory latency by prefetching long streams of delinquent data accesses without requiring hardware or compiler support.
Recommended citation: Wang, T., Antonopoulos, C. D., & Nikolopoulos, D. S. (2005). "smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs." In Euro-Par 2005 Parallel Processing (pp. 710-719). Springer. https://doi.org/10.1007/11549468_78
Download Paper
Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors
Published in High Performance Computing and Communications (HPCC), 2005
Introduces Factory, an object-oriented parallel programming substrate written in C++ that allows programmers to express multigrain parallelism without requiring language extensions or extra compiler support.
Recommended citation: Schneider, S., Antonopoulos, C. D., & Nikolopoulos, D. S. (2005). "Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors." In High Performance Computing and Communications (pp. 223-232). Springer. https://doi.org/10.1007/11557654_28
Download Paper
Integrating Multiple Forms of Multithreaded Execution on Multi-SMT Systems: A Study with Scientific Applications
Published in International Conference on the Quantitative Evaluation of Systems (QEST), 2005
Investigates the integration of simultaneous and fine-grained multithreading for improved execution of scientific codes on SMT-based systems.
Recommended citation: Curtis-Maury, M., Wang, T., Antonopoulos, C., & Nikolopoulos, D. (2005). "Integrating Multiple Forms of Multithreaded Execution on Multi-SMT Systems: A Study with Scientific Applications." QEST 2005, 199–208. https://doi.org/10.1109/QEST.2005.16
Download Paper
Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs
Published in IEEE International Conference on High Performance Computing (HiPC), 2005
This chapter proposes scheduling policies that treat memory bandwidth as a first-class resource in multiprogrammed SMP systems.
Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Papatheodorou, T. S. (2005). "Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs." In *HiPC 2004*, Springer, pp. 286–296. https://doi.org/10.1007/978-3-540-30474-6_33
Download Paper
Designing effective memory allocators for multicore and multithreaded systems: A case study with irregular and adaptive applications
Published in Department of Computer Science, College of William & Mary, 2006
Presents a case study on designing effective memory allocators for multicore and multithreaded systems, focusing on irregular and adaptive applications.
Recommended citation: Schneider, S., Antonopoulos, C. D., Chernikov, A. N., Nikolopoulos, D. S., & Chrisochoides, N. P. (2006). "Designing effective memory allocators for multicore and multithreaded systems: A case study with irregular and adaptive applications." Submitted to the Supercomputing Conference.
Download Paper
Exploring Programming Models and Optimizations for the Cell Broadband Engine using RAxML
Published in Virginia Tech High-End Computing Challenge, 2006
Presents the port and optimization of RAxML phylogenetic tree computation on Cell processors, achieving a 5× performance improvement through multilevel parallelization and Cell-specific optimizations.
Recommended citation: Blagojevic, F., & Nikolopoulos, D. S. (2006). "Exploring Programming Models and Optimizations for the Cell Broadband Engine using RAxML." In Proc. of the 2006 Virginia Tech High-End Computing Challenge.
Download Paper
Facing the challenges of multicore processor technologies using autonomic system software
Published in International Parallel and Distributed Processing Symposium (IPDPS) Workshops, 2006
Discusses major challenges of software adaptation to multicore technologies and motivates the use of autonomic, self-optimizing system software for high performance portability and energy-efficient execution.
Recommended citation: Nikolopoulos, D. (2006). "Facing the challenges of multicore processor technologies using autonomic system software." In Proceedings of the 20th International Parallel and Distributed Processing Symposium, 347. https://doi.org/10.1109/IPDPS.2006.1639604
Download Paper
MESA: Reducing Cache Conflicts by Integrating Static and Run-Time Methods
Published in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2006
Combines static code analysis with runtime instrumentation to reduce cache conflicts in multithreaded programs.
Recommended citation: Ding, X., Nikolopoulos, D. S., Jiang, S., & Zhang, X. (2006). MESA: Reducing Cache Conflicts by Integrating Static and Run-Time Methods. In *ISPASS 2006*, 189–198. https://doi.org/10.1109/ISPASS.2006.1620803
Download Paper
Online Strategies for High-Performance Power-Aware Thread Execution on Emerging Multiprocessors
Published in IEEE International Parallel & Distributed Processing Symposium (IPDPS Workshops), 2006
This paper proposes runtime strategies for dynamically adjusting thread execution to optimize energy consumption and performance on multiprocessors.
Recommended citation: Curtis-Maury, M., Dzierwa, J., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "Online Strategies for High-Performance Power-Aware Thread Execution on Emerging Multiprocessors." *IPDPS 2006*. https://doi.org/10.1109/IPDPS.2006.1639598
Download Paper
On the design of online predictors for autonomic power-performance adaptation of multithreaded programs
Published in Journal of Autonomic and Trusted Computing, 2006
Investigates the design space for techniques that enable runtime, autonomic program adaptation for high-performance and low-power execution via event-driven performance prediction on multithreaded and multicore architectures.
Recommended citation: Curtis-Maury, M., Dzierwa, J., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "On the design of online predictors for autonomic power-performance adaptation of multithreaded programs." Journal of Autonomic and Trusted Computing, 1.
Download Paper
Online Power-Performance Adaptation of Multithreaded Programs Using Hardware Event-Based Prediction
Published in ACM International Conference on Supercomputing (ICS), 2006
This paper introduces a user-level runtime framework for online adaptation of multithreaded programs, leveraging hardware event-based prediction to optimize power and performance trade-offs in real systems with Intel Hyperthreaded processors.
Recommended citation: Curtis-Maury, M., Dzierwa, J., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "Online Power-Performance Adaptation of Multithreaded Programs Using Hardware Event-Based Prediction." Proceedings of the 20th Annual International Conference on Supercomputing (ICS), 157–166. https://doi.org/10.1145/1183401.1183426
Download Paper
Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory
Published in IEEE International Conference on High Performance Distributed Computing (HPDC), 2006
Presents runtime support for memory adaptation in scientific applications using local disk and remote memory to enable dynamic memory management and improved resource utilization. Best Paper Award Nominee
Recommended citation: Yue, C., Mills, R. T., Stathopoulos, A., & Nikolopoulos, D. (2006). "Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory." In 2006 15th IEEE International Conference on High Performance Distributed Computing, 183-194. https://doi.org/10.1109/HPDC.2006.1652149
Download Paper
Scalable Locality-Conscious Multithreaded Memory Allocation
Published in ACM SIGPLAN International Symposium on Memory Management (ISMM), 2006
This paper presents Streamflow, a multithreaded memory manager that improves locality and reduces synchronization overhead, outperforming state-of-the-art allocators through a segregated heap design and non-blocking operations.
Recommended citation: Schneider, S., Antonopoulos, C. D., & Nikolopoulos, D. S. (2006). "Scalable Locality-Conscious Multithreaded Memory Allocation." Proceedings of the 5th International Symposium on Memory Management (ISMM), 84–94. https://doi.org/10.1145/1133956.1133968
Download Paper
PACMAN: A PerformAnce Counters MANager for Intel Hyperthreaded Processors
Published in International Conference on the Quantitative Evaluation of Systems (QEST), 2006
Presents PACMAN, a performance counters manager designed specifically for Intel Hyperthreaded processors to enable efficient performance monitoring and analysis.
Recommended citation: Antonopoulos, C. D., Nikolopoulos, D. S., & Curtis-Maury, M. (2006). "PACMAN: A PerformAnce Counters MANager for Intel Hyperthreaded Processors." In Third International Conference on the Quantitative Evaluation of Systems (QEST), 141-144. https://doi.org/10.1109/QEST.2006.41
Download Paper
Dynamic Program Stirring on Multiple Cores: How Hardware Performance Monitors Can Help Regulate Performance, Power, and Temperature Simultaneously
Published in Workshop on Functionality of Hardware Performance Monitors, 2006
Explores how hardware performance monitors can provide insights into software-hardware interaction to regulate performance, power, and temperature simultaneously on multicore platforms through dynamic adaptation.
Recommended citation: Curtis-Maury, M., Nikolopoulos, D. S., & Antonopoulos, C. D. (2006). "Dynamic Program Stirring on Multiple Cores: How Hardware Performance Monitors Can Help Regulate Performance, Power, and Temperature Simultaneously." In Proc. of the Second Workshop on Functionality of Hardware Performance Monitors.
Download Paper
Achieving Multiprogramming Scalability of Parallel Programs on Intel SMP Platforms: Nanothreading in the Linux Kernel
Published in Parallel Computing, 2007
Presents the design and implementation of a nanothreading interface in the Linux kernel for Intel SMP platforms to achieve robust performance and increased throughput in multiprogrammed environments.
Recommended citation: Nikolopoulos, D. S., Antonopoulos, C. D., Venetis, I. E., Hadjidoukas, P. E., Polychronopoulos, E. D., & Papatheodorou, T. S. (2007). Achieving Multiprogramming Scalability of Parallel Programs on Intel SMP Platforms: Nanothreading in the Linux Kernel. In *Parallel Computing* (pp. 623-630). World Scientific. https://doi.org/10.1142/9781848160170_0074
Download Paper
Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors
Published in Technical Report TR-07-26, Department of Computer Science, Virginia Tech, 2007
Technical report proposing a computational model for multi-grain parallelization on heterogeneous multi-core processors, applied to phylogenetics workloads on the IBM Cell BE.
Recommended citation: Blagojevic, F., Feng, X., Cameron, K., & Nikolopoulos, D. (2007). *Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors*. Technical Report TR-07-26, Department of Computer Science, Virginia Tech.
Synthesizing Parallel Programming Models for Asymmetric Multi-core Systems
Published in 11th Workshop on High Performance Embedded Computing, 2007
Derives a methodology for synthesizing polymorphic programming models for asymmetric multi-core processors, focusing on runtime performance modeling and scheduling of dynamic parallelism.
Recommended citation: Nikolopoulos, D. S., & Cameron, K. W. (2007). "Synthesizing Parallel Programming Models for Asymmetric Multi-core Systems." In Proceedings of the 11th Workshop on High Performance Embedded Computing.
Download Paper
A comparison of online and offline strategies for program adaptation
Published in Annual ACM Southeast Conference (ACMSE), 2007
Compares online and offline strategies for program adaptation in high-performance computing, analyzing the pros and cons of different information collection and analysis approaches for dynamic adaptation based on execution length and use characteristics.
Recommended citation: Curtis-Maury, M., Antonopoulos, C. D., & Nikolopoulos, D. S. (2007). "A comparison of online and offline strategies for program adaptation." In Proceedings of the 45th Annual ACM Southeast Conference, 162-167. https://doi.org/10.1145/1233341.1233371
Download Paper
Dynamic Multigrain Parallelization on the Cell Broadband Engine
Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2007
This paper presents a scheduler for adaptive, multigrain parallelism on the Cell Broadband Engine, demonstrating performance improvements using layered parallelism in RAxML workloads. Best Paper Award
Recommended citation: Blagojevic, F., Nikolopoulos, D. S., Stamatakis, A., & Antonopoulos, C. D. (2007). "Dynamic Multigrain Parallelization on the Cell Broadband Engine." PPoPP '07, 90–100. https://doi.org/10.1145/1229428.1229445
Download Paper
RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2007
Introduces RAxML-Cell, a parallel phylogenetic inference engine targeting the Cell Broadband Engine, and demonstrates performance gains through low-level hardware tuning.
Recommended citation: Blagojevic, F., Stamatakis, A., Antonopoulos, C. D., & Nikolopoulos, D. S. (2007). "RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine." *IPDPS 2007*, 1–10. https://doi.org/10.1109/IPDPS.2007.370267
Download Paper
Application-specific customization on many-core platforms: the VT-ASOS framework
Published in Second Workshop on Software and Tools for Multi-Core Systems, 2007
Presents the VT-ASOS framework for application-specific customization on many-core platforms, enabling tailored system software solutions for diverse computing environments.
Recommended citation: Back, G., & Nikolopoulos, D. S. (2007). "Application-specific customization on many-core platforms: the VT-ASOS framework." In Proceedings of the Second Workshop on Software and Tools for Multi-Core Systems.
Download Paper
Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory
Published in Journal of Grid Computing, 2007
Extends the MMlib framework to provide fully customizable memory malleability in scientific applications, treating DRAM as a dynamic cache with local disk and remote memory capabilities.
Recommended citation: Mills, R. T., Yue, C., Stathopoulos, A., & Nikolopoulos, D. S. (2007). "Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory." Journal of Grid Computing, 5(2), 213-234. https://doi.org/10.1007/s10723-007-9075-7
Download Paper
System Software Challenges and Opportunities on Asymmetric Multi-core Processors
Published in Fall Creek Falls Conference, 2007
Explores the implications of architectural asymmetry in multi-core systems for system software design, addressing challenges in scheduling, resource management, and performance tuning.
Recommended citation: Nikolopoulos, D. (2007). System Software Challenges and Opportunities on Asymmetric Multi-core Processors. Presented at 2007 Fall Creek Falls conference, Tennessee, September 2007.
Identifying Energy-Efficient Concurrency Levels Using Machine Learning
Published in IEEE International Conference on Cluster Computing (CLUSTER) Workshops, 2007
This work uses machine learning models to automatically determine energy-optimal concurrency levels for parallel workloads, improving performance-per-watt.
Recommended citation: Curtis-Maury, M., Singh, K., McKee, S. A., Blagojevic, F., Nikolopoulos, D. S., de Supinski, B. R., & Schulz, M. (2007). "Identifying Energy-Efficient Concurrency Levels Using Machine Learning." *Cluster 2007*, 488–495. https://doi.org/10.1109/CLUSTR.2007.4629274
Download Paper
Experience with memory allocators for parallel mesh generation on multicore architectures
Published in International Conference on Numerical Grid Generation in Computational Field Simulations, 2007
Evaluates scalable and locality-aware multiprocessor memory allocators against custom allocators for parallel mesh generation algorithms on multithreaded and multicore architectures.
Recommended citation: Chernikov, A. N., Antonopoulos, C. D., Chrisochoides, N. P., Schneider, S., & Nikolopoulos, D. S. (2007). "Experience with memory allocators for parallel mesh generation on multicore architectures." In International Conference on Numerical Grid Generation in Computational Field Simulations.
Download Paper
Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell
Published in The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2007
This paper enhances RAxML, a phylogenetic inference tool, with novel tree search heuristics and a high-performance implementation on the IBM Cell Broadband Engine, yielding substantial speedups and addressing multi-level parallelism and optimization challenges.
Recommended citation: Stamatakis, A., Blagojevic, F., Nikolopoulos, D. S., & Antonopoulos, C. D. (2007). "Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell." The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 48(3), 271–286. https://doi.org/10.1007/s11265-007-0067-4
Download Paper
Runtime Scheduling of Dynamic Parallelism on Accelerator-Based Multi-Core Systems
Published in Parallel Computing, 2007
This paper investigates runtime mechanisms for multi-grain parallelism scheduling on heterogeneous multi-core systems, introducing S-MGPS for dynamic optimization on the Cell Broadband Engine.
Recommended citation: Blagojevic, F., Nikolopoulos, D. S., Stamatakis, A., Antonopoulos, C. D., & Curtis-Maury, M. (2007). "Runtime Scheduling of Dynamic Parallelism on Accelerator-Based Multi-Core Systems." *Parallel Computing*, 33(10), 700–719. https://doi.org/10.1016/j.parco.2007.09.004
Download Paper
DMA-Based Prefetching for I/O-Intensive Workloads on the Cell Architecture
Published in ACM International Conference on Computing Frontiers (CF), 2008
This paper evaluates DMA-based asynchronous prefetching techniques to improve the performance of I/O-intensive applications on the Cell Broadband Engine.
Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2008). "DMA-Based Prefetching for I/O-Intensive Workloads on the Cell Architecture." *CF '08*, 23–32. https://doi.org/10.1145/1366230.1366236
Download Paper
MELISSES: Liquid Services for Scalable Multithreaded and Multicore Execution on Emerging Supercomputers
Published in Technical Report, Virginia Tech, 2008
Technical report introducing MELISSES, a framework for liquid services enabling scalable multithreaded and multicore execution on future-generation supercomputers.
Recommended citation: Nikolopoulos, D. S. (2008). *MELISSES: Liquid Services for Scalable Multithreaded and Multicore Execution on Emerging Supercomputers*. Technical Report, Virginia Tech.
Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE
Published in International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), 2008
This chapter introduces a model for predicting scalability and performance of applications exploiting task- and data-level parallelism on heterogeneous multicore systems like the Cell BE.
Recommended citation: Blagojevic, F., Feng, X., Cameron, K. W., & Nikolopoulos, D. S. (2008). "Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE." In *High Performance Embedded Architectures and Compilers*, Springer, pp. 38–52. https://doi.org/10.1007/978-3-540-77560-7_4
Download Paper
Set-top supercomputing: scalable software for scientific simulations on game consoles
Published in ERCIM News, 2008
Explores scalable software approaches for scientific simulations on game consoles, demonstrating the potential of consumer hardware for high-performance computing applications.
Recommended citation: Nikolopoulos, D. S. (2008). "Set-top supercomputing: scalable software for scientific simulations on game consoles." ERCIM News, 2008(74).
Download Paper
Supporting I/O-Intensive Workloads on the Cell Architecture
Published in USENIX Conference on File and Storage Technologies (FAST), 2008
Explores performance-enhancing techniques for I/O-intensive workloads on the Cell Broadband Engine, achieving a 30.2% performance improvement through asynchronous prefetching and decentralized DMAs.
Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2008). "Supporting I/O-Intensive Workloads on the Cell Architecture." In Proc. USENIX FAST.
Download Paper
VT-ASOS: Holistic system software customization for many cores
Published in IEEE International Symposium on Parallel and Distributed Processing (IPDPS) Workshops, 2008
Presents VT-ASOS, a holistic approach to system software customization for many-core architectures, addressing virtualization, resource management, and fault tolerance.
Recommended citation: Nikolopoulos, D. S., Back, G., Tripathi, J., & Curtis-Maury, M. (2008). "VT-ASOS: Holistic system software customization for many cores." In 2008 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 1-5. https://doi.org/10.1109/IPDPS.2008.4536390
Download Paper
Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine
Published in ACM International Conference on Computing Frontiers (CF), 2008
This paper presents a scalable model and scheduling technique for implementing wavefront computations on the Cell Broadband Engine, evaluated through Smith-Waterman alignment.
Recommended citation: Aji, A. M., Feng, W., Blagojevic, F., & Nikolopoulos, D. S. (2008). "Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine." *Computing Frontiers 2008*, 13–22. https://doi.org/10.1145/1366230.1366235
Download Paper
Scheduling Asymmetric Parallelism on a PlayStation3 Cluster
Published in IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008
Presents scheduling techniques for asymmetric parallelism on PlayStation3 clusters, addressing performance modeling and process scheduling challenges on the Cell BE architecture.
Recommended citation: Blagojevic, F., Curtis-Maury, M., Yeom, J.-S., Schneider, S., & Nikolopoulos, D. S. (2008). "Scheduling Asymmetric Parallelism on a PlayStation3 Cluster." In 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 146-153. https://doi.org/10.1109/CCGRID.2008.64
Download Paper
An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors
Published in OpenMP Shared Memory Parallel Programming, Springer Berlin Heidelberg, 2008
This chapter evaluates OpenMP performance across SMT and CMP architectures, highlighting architectural bottlenecks and benefits of adaptive runtime mechanisms. Best Paper Award
Recommended citation: Curtis-Maury, M., Ding, X., Antonopoulos, C. D., & Nikolopoulos, D. S. (2008). "An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors." In *OpenMP Shared Memory Parallel Programming* (pp. 133–144). Springer. https://doi.org/10.1007/978-3-540-68555-5_11
Download Paper
Hardware Support for Explicit Communication in Scalable CMPs
Published in Technical Report, Computer Architecture Dept., Polytechnic University of Catalonia (UPC), 2008
Presents hardware support mechanisms for explicit communication in scalable chip multiprocessors to improve inter-core communication efficiency.
Recommended citation: Villavieja, C., Katevenis, M., Navarro, N., Pnevmatikatos, D., Ramirez, A., Kavadias, S., Papaefstathiou, V., & Nikolopoulos, D. S. (2008). "Hardware Support for Explicit Communication in Scalable CMPs." Technical Report, Computer Architecture Dept., Polytechnic University of Catalonia (UPC), Barcelona.
Download Paper
Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes
Published in IEEE Transactions on Parallel and Distributed Systems, 2008
This paper presents a prediction-based runtime framework for adapting multithreaded scientific codes to optimize power and performance, using multivariate regression and application-aware models for energy-efficient execution on multicore systems.
Recommended citation: Curtis-Maury, M., Blagojevic, F., Antonopoulos, C. D., & Nikolopoulos, D. S. (2008). "Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes." IEEE Transactions on Parallel and Distributed Systems, 19(10), 1396–1410. https://doi.org/10.1109/TPDS.2007.70804
Download Paper
Prediction Models for Multi-Dimensional Power-Performance Optimization on Many Cores
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2008
This paper introduces an online, application-aware prediction framework for optimizing dynamic voltage/frequency scaling (DVFS) and dynamic concurrency throttling (DCT) in multi-core systems, achieving significant gains in energy efficiency and performance.
Recommended citation: Curtis-Maury, M., Shah, A., Blagojevic, F., Nikolopoulos, D. S., de Supinski, B. R., & Schulz, M. (2008). "Prediction Models for Multi-Dimensional Power-Performance Optimization on Many Cores." Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT), 250–259. https://doi.org/10.1145/1454115.1454151
Download Paper
Supporting Data-Intensive Applications on Accelerator-Based Distributed Systems
Published in Poster Session, USENIX Conference on File and Storage Technologies (FAST), 2009
Poster presented at USENIX FAST 2009 outlining architecture and programming challenges in deploying data-intensive applications on accelerator-based distributed systems.
Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2009). Supporting Data-Intensive Applications on Accelerator-Based Distributed Systems. Poster presented at *USENIX Conference on File and Storage Technologies (FAST)*, 2009.
A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies
Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2009
This paper compares programming models for explicitly managed memory hierarchies (EMM), focusing on programmability and performance across application workloads.
Recommended citation: Schneider, S., Yeom, J.-S., Rose, B., Linford, J. C., Sandu, A., & Nikolopoulos, D. S. (2009). "A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies." *PPoPP '09*, 131–140. https://doi.org/10.1145/1504176.1504197
Download Paper
Supporting MapReduce on Large-Scale Asymmetric Multi-Core Clusters
Published in SIGOPS Operating Systems Review, 2009
This article explores the design and performance of MapReduce on hybrid clusters with asymmetric multi-core accelerators and general-purpose processors.
Recommended citation: Rafique, M. M., Rose, B., Butt, A. R., & Nikolopoulos, D. S. (2009). "Supporting MapReduce on Large-Scale Asymmetric Multi-Core Clusters." SIGOPS Operating Systems Review, 43(2), 25–34. https://doi.org/10.1145/1531793.1531800
Download Paper
A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies
Published in SIGPLAN Notices, 2009
This journal article compares abstractions for programming multiprocessors with explicitly managed memory, analyzing programmability and efficiency trade-offs.
Recommended citation: Schneider, S., Yeom, J.-S., Rose, B., Linford, J. C., Sandu, A., & Nikolopoulos, D. S. (2009). "A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies." *SIGPLAN Not.*, 44(4), 131–140. https://doi.org/10.1145/1594835.1504197
Download Paper
CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters
Published in International Symposium on Parallel & Distributed Processing (IPDPS), 2009
This paper introduces CellMR, a runtime framework enabling MapReduce workloads on Cell-based heterogeneous clusters with a focus on resource efficiency and acceleration.
Recommended citation: Rafique, M. M., Rose, B., Butt, A. R., & Nikolopoulos, D. S. (2009). "CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters." *IPDPS 2009*, 1–12. https://doi.org/10.1109/IPDPS.2009.5161062
Download Paper
Scheduling Dynamic Parallelism on the Cell BE
Published in IBM HPC Systems Scientific Computing User Group (SCICOMP), 2009
Presents strategies for scheduling dynamic parallelism on the Cell Broadband Engine architecture, addressing challenges in runtime system support and performance optimization.
Recommended citation: Blagojevic, F., Iancu, C., Yelick, K. A., Nikolopoulos, D., Rose, B., & Curtis-Maury, M. (2009). Scheduling Dynamic Parallelism on the Cell BE. In *Proceedings of the 15th Meeting of the IBM HPC Systems Scientific Computing User Group (SCICOMP)*, May.
Scheduling dynamic parallelism on accelerators
Published in ACM Conference on Computing Frontiers (CF), 2009
Presents scheduling approaches for dynamic parallelism on accelerator-based systems, demonstrating cooperative scheduling and work-stealing techniques on the Cell BE architecture.
Recommended citation: Blagojevic, F., Iancu, C., Yelick, K., Curtis-Maury, M., Nikolopoulos, D. S., & Rose, B. (2009). Scheduling dynamic parallelism on accelerators. In *Proceedings of the 6th ACM Conference on Computing Frontiers (CF '09)*, 161-170. https://doi.org/10.1145/1531743.1531769
Download Paper
A Multigrain Delaunay Mesh Generation Method for Multicore SMT-Based Architectures
Published in Journal of Parallel and Distributed Computing, 2009
This paper evaluates a multigrain Delaunay mesh generation approach across multiple architectural layers, optimizing for SMT and multicore platforms.
Recommended citation: Antonopoulos, C. D., Blagojevic, F., Chernikov, A. N., Chrisochoides, N. P., & Nikolopoulos, D. S. (2009). "A Multigrain Delaunay Mesh Generation Method for Multicore SMT-Based Architectures." *JPDC*, 69(7), 589–600. https://doi.org/10.1016/j.jpdc.2009.03.009
Download Paper
Algorithm, Software, and Hardware Optimizations for Delaunay Mesh Generation on Simultaneous Multithreaded Architectures
Published in Journal of Parallel and Distributed Computing, 2009
Details multi-level optimizations that improve the performance of a parallel Delaunay mesh generator by up to 6× on SMT-based SMP systems.
Recommended citation: Antonopoulos, C. D., Blagojevic, F., Chernikov, A. N., Chrisochoides, N. P., & Nikolopoulos, D. S. (2009). "Algorithm, Software, and Hardware Optimizations for Delaunay Mesh Generation on Simultaneous Multithreaded Architectures." JPDC, 69(7), 601–612. https://doi.org/10.1016/j.jpdc.2009.03.005
Download Paper
A Runtime Framework for Optimizing Multi-Dimensional Array Accesses on Multi-core Processors
Published in Technical Report, 2009
Presents Strider, a runtime library framework for programming and optimization of multi-dimensional data accesses in nested loops on multi-core processors with explicitly managed memory hierarchies.
Recommended citation: Yeom, J.-S., & Nikolopoulos, D. S. (2009). "A Runtime Framework for Optimizing Multi-Dimensional Array Accesses on Multi-core Processors." Technical Report.
Download Paper
Green Building Blocks - Software Stacks for Energy-Efficient Clusters and Data Centres
Published in ERCIM News, 2009
Presents the Green Building Blocks (GBB) project, a software architecture for reducing energy consumption in clusters and data centers while maintaining performance.
Recommended citation: Nikolopoulos, D. S. (2009). Green Building Blocks - Software Stacks for Energy-Efficient Clusters and Data Centres. *ERCIM News*, 79. https://ercim-news.ercim.eu/en79/special/green-building-blocks
Download Paper
Model-Based Hybrid MPI/OpenMP Power-Aware Computing
Published in ACM/IEEE International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC) Poster Session, 2009
Poster presented at SC 2009 on leveraging hybrid MPI/OpenMP programming for power-aware high-performance computing through model-based approaches.
Recommended citation: Li, D., Cameron, K., Nikolopoulos, D., Schulz, M., & de Supinski, B. (2009). Model-Based Hybrid MPI/OpenMP Power-Aware Computing. Poster presented at *ACM/IEEE Supercomputing 2009 (SC)*, November.
Processors: The Challenge of Cooperation
Published in Economist Special Edition, Volume 71, 2009
Article in The Economist Special Edition discussing the future of processor design and the need for coordination in multicore systems.
Recommended citation: Katevenis, M. G. H., & Nikolopoulos, D. (2009). Processors: The Challenge of Cooperation. *Economist Special Edition*, 71, 26–28.
Programming Multiprocessors with Explicitly Managed Memory Hierarchies
Published in IEEE Computer, 2009
This article discusses programming techniques for multiprocessors with explicitly managed memory, using the Cell Broadband Engine as a case study for efficient parallel memory handling.
Recommended citation: Schneider, S., Yeom, J.-S., & Nikolopoulos, D. S. (2009). "Programming Multiprocessors with Explicitly Managed Memory Hierarchies." *IEEE Computer*, 42(12), 28–34. https://doi.org/10.1109/MC.2009.407
Download Paper
Proceedings of the Parallel Programming with Accelerators Workshop
Published in IEEE International Conference on Cluster Computing (CLUSTER) Workshops, 2009
Welcome message for the 2009 IEEE International Conference on Cluster Computing (CLUSTER ’09), introducing the PPAC workshop and setting the stage for the conference.
Recommended citation: Nikolopoulos, D., & Ribbens, C. (2009). Welcome to New Orleans and PPAC'09! In *Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER 2009)*, December 21. IEEE. https://doi.org/10.1109/CLUSTR.2009.5289209
Parallelization and Performance of an H.264 Video Encoder on the Cell BE
Published in Technical Report, Foundation for Research and Technology Hellas, Institute of Computer Science, 2010
This technical report analyzes parallelization strategies and performance of an H.264 video encoder targeting the Cell Broadband Engine architecture.
Recommended citation: Alvanos, M., Tzenakis, G., Nikolopoulos, D. S., & Bilas, A. *Parallelization and Performance of an H.264 Video Encoder on the Cell BE*. Technical Report, January 2010.
Download Paper
SCOOP: Source-level COmpiler Optimizations for Parallelism
Published in Technical Report, Foundation for Research and Technology Hellas, Institute of Computer Science, 2010
SCOOP introduces source-level compiler optimizations to automatically expose parallelism in sequential code.
Recommended citation: Zakkak, F. S., Chasapis, D., Pratikakis, P., Bilas, A., & Nikolopoulos, D. S. *SCOOP: Source-level COmpiler Optimizations for Parallelism*. Technical Report, 2010.
Download Paper
Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor
Published in International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), 2010
Introduces TPC, a runtime system that minimizes off-chip communication for efficient task initiation on the Cell architecture.
Recommended citation: Tzenakis, G., Kapelonis, K., Alvanos, M., et al. (2010). "TPC: Efficient Runtime for Task-Based Parallelism on the Cell." In HiPEAC, 307–321. https://doi.org/10.1007/978-3-642-11515-8_23
Download Paper
Scalability and Productivity of Parallel Programming Models for Heterogeneous–ISA Multi-Core Architectures with Local Memories
Published in Technical Report, 2010
This technical report evaluates parallel programming models in terms of scalability and productivity for heterogeneous–ISA multi-core systems with local memories.
Recommended citation: Ferrer, R., Bellens, P., Koukos, K., Alvanos, M., Yeom, J.-S., Schneider, S., Beltrán, V., González, M., Martorell, X., Badia, R. M., et al. *Scalability and Productivity of Parallel Programming Models for Heterogeneous–ISA Multi-Core Architectures with Local Memories*. Technical Report, March 2010.
Download Paper
Hybrid MPI/OpenMP Power-Aware Computing
Published in IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
This work explores hybrid MPI/OpenMP programming for power-aware high-performance computing, introducing predictive models and heuristics to balance performance with energy efficiency through DVFS strategies.
Recommended citation: Li, D., de Supinski, B. R., Schulz, M., Cameron, K., & Nikolopoulos, D. S. (2010). "Hybrid MPI/OpenMP Power-Aware Computing." 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 1–12. https://doi.org/10.1109/IPDPS.2010.5470463
Download Paper
Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems
Published in International Symposium on Parallel & Distributed Processing (IPDPS), 2010
This paper presents a predictive model for MPI task aggregation in power-aware HPC systems, improving energy efficiency without sacrificing performance.
Recommended citation: Li, D., Nikolopoulos, D. S., Cameron, K., de Supinski, B. R., & Schulz, M. (2010). "Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems." *IPDPS 2010*, 1–12. https://doi.org/10.1109/IPDPS.2010.5470464
Download Paper
Designing Accelerator-Based Distributed Systems for High Performance
Published in IEEE International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2010
Proposes frameworks for programming and managing asymmetric clusters composed of accelerator nodes for data-intensive applications.
Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2010). "Designing Accelerator-Based Distributed Systems for High Performance." CCGRID 2010, 165–174. https://doi.org/10.1109/CCGRID.2010.109
Download Paper
On-chip Communication and Synchronization Mechanisms with Cache-Integrated Network Interfaces
Published in ACM International Conference on Computing Frontiers (CF), 2010
This work introduces cache-integrated network interfaces for on-chip communication and synchronization in multicore processors, combining the strengths of scratchpad memory and cache-based systems.
Recommended citation: Kavadias, S. G., Katevenis, M. G. H., Zampetakis, M., & Nikolopoulos, D. S. (2010). "On-chip Communication and Synchronization Mechanisms with Cache-Integrated Network Interfaces." *CF '10*, 217–226. https://doi.org/10.1145/1787275.1787328
Download Paper
Evaluation of Streaming Aggregation on Parallel Hardware Architectures
Published in ACM International Conference on Distributed Event-Based Systems (DEBS), 2010
This study compares streaming aggregation performance across Intel CPUs, NVIDIA GPUs, and IBM Cell processors, highlighting memory access patterns and data movement as key performance drivers.
Recommended citation: Schneider, S., Andrade, H., Gedik, B., Wu, K.-L., & Nikolopoulos, D. S. (2010). "Evaluation of Streaming Aggregation on Parallel Hardware Architectures." *DEBS '10*, 248–257. https://doi.org/10.1145/1827418.1827467
Download Paper
Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories
Published in International Conference on Parallel Processing (ICPP), 2010
Revisits MapReduce design for heterogeneous multicore systems using explicitly managed memory hierarchies and runtime adaptability. Best Paper Award Nominee
Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2010). "Rearchitecting MapReduce for Heterogeneous Multicore Processors." ICPP 2010, 121–130. https://doi.org/10.1109/ICPP.2010.21
Download Paper
Comparing Scalability Prediction Strategies on an SMP of CMPs
Published in International European Conference on Parallel and Distributed Computing (Euro-Par), 2010
Compares linear regression and ANN approaches for predicting scalable concurrency levels in scientific applications on CMPs.
Recommended citation: Singh, K., Curtis-Maury, M., McKee, S. A., et al. (2010). "Comparing Scalability Prediction Strategies on an SMP of CMPs." In Euro-Par 2010, 143–155. https://doi.org/10.1007/978-3-642-15277-1_14
Download Paper
Explicit Communication and Synchronization in SARC
Published in IEEE Micro, 2010
Discusses the SARC architecture that uses explicit communication and synchronization primitives with scratchpad memory and RDMA support.
Recommended citation: Katevenis, M., Papaefstathiou, V., Kavadias, S., et al. (2010). "Explicit Communication and Synchronization in SARC." IEEE Micro, 30(5), 30–41. https://doi.org/10.1109/MM.2010.77
Download Paper
Parallel Programming Models for Heterogeneous Multicore Architectures
Published in IEEE Micro, Volume 30, Issue 5, 2010
Explores programming models for heterogeneous multicore architectures, focusing on concurrency, hardware/software interfaces, and multiprocessor system environments, as part of the SARC European Project.
Recommended citation: SARC European Project. (2010). Parallel Programming Models for Heterogeneous Multicore Architectures. *IEEE Micro*, 30(5), 42–53. https://doi.org/10.1109/MM.2010.94
Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories
Published in ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2010
Presents Strider, a runtime system for optimizing strided data access patterns on multi-core architectures with explicitly managed memory hierarchies, improving array access performance through intelligent prefetching and buffering.
Recommended citation: Yeom, J.-S., & Nikolopoulos, D. S. (2010). "Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories." In SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 1-11. https://doi.org/10.1109/SC.2010.52
Download Paper
Proceedings of the 2010 IEEE International Conference on Cluster Computing
Published in IEEE International Conference on Cluster Computing (CLUSTER), 2010
Foreword to the proceedings of the 2010 IEEE International Conference on Cluster Computing, highlighting key themes and contributions of the conference.
Recommended citation: Nikolopoulos, D. S., Bianchini, R., & Bilas, A. (2010). Foreword CLUSTER 2010. In *Proceedings of the 2010 IEEE International Conference on Cluster Computing (CLUSTER 2010)*. IEEE. https://doi.org/10.1109/CLUSTER.2010.5
PPAC 2011 Workshop Proceedings
Published in 2011 IEEE International Conference on Cluster Computing, 2011
Workshop organizing committee information for the PPAC 2011 workshop held in conjunction with the IEEE International Conference on Cluster Computing.
Recommended citation: "PPAC 2011 Workshop Organizing Committee." In 2011 IEEE International Conference on Cluster Computing, xix-xix. https://doi.org/10.1109/CLUSTER.2011.87
Download Paper
To Program or Not To Program the Memory Hierarchy?
Published in Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2011
Keynote talk at MULTIPROG 2011 exploring the challenges and trade-offs in managing memory hierarchies in heterogeneous multicore systems.
Recommended citation: Nikolopoulos, D. (2011). To Program or Not To Program the Memory Hierarchy? Keynote at *4th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG)*, January 10.
A Capabilities-Aware Framework for Using Computational Accelerators in Data-Intensive Computing
Published in Journal of Parallel and Distributed Computing, 2011
This work proposes a framework using heterogeneous accelerators like GPUs and Cell processors for data-intensive workloads, improving performance through capability-aware resource allocation.
Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. S. (2011). "A Capabilities-Aware Framework for Using Computational Accelerators in Data-Intensive Computing." *JPDC*, 71(2), 185–197. https://doi.org/10.1016/j.jpdc.2010.09.004
Download Paper
Fine-grain OpenMP runtime support with explicit communication hardware primitives
Published in Design, Automation & Test in Europe (DATE), 2011
Presents fine-grain OpenMP runtime support using explicit communication hardware primitives to improve synchronization and performance in parallel applications.
Recommended citation: Tendulkar, P., Papaefstathiou, V., Nikiforos, G., Kavadias, S., Nikolopoulos, D. S., & Katevenis, M. (2011). "Fine-grain OpenMP runtime support with explicit communication hardware primitives." In 2011 Design, Automation & Test in Europe, 1-4. https://doi.org/10.1109/DATE.2011.5763299
Download Paper
C Source Level Transformations & Optimizations for Task-Based Parallelism
Published in International Symposium on Code Generation and Optimization (CGO), Poster Session, 2011
Student poster presented at CGO 2011, exploring source-level transformations and optimizations for enabling task-based parallelism in C programs.
Recommended citation: Zakkak, F., Chasapis, D., Pratikakis, P., Nikolopoulos, D., & Bilas, A. (2011). C Source Level Transformations & Optimizations for Task-Based Parallelism. Student Poster Session, *2011 International Symposium on Code Generation and Optimization (CGO)*, April.
Parallel Programming of General-Purpose Programs Using Task-Based Programming Models
Published in USENIX Workshop on Hot Topics in Parallelism (HotPar), 2011
This paper extends the Cilk programming model by introducing input, output, and inout dependency types on task arguments, enabling concise expression of complex parallelism patterns like pipelines and speculative execution in general-purpose programs. The proposed extensions improve code readability and maintain performance comparable to existing models.
Recommended citation: Vandierendonck, H., Pratikakis, P., & Nikolopoulos, D. S. (2011). "Parallel Programming of General-Purpose Programs Using Task-Based Programming Models." *HotPar '11*, USENIX Association. https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Vandierendonck.pdf
Download Paper
Scalable memory registration for high performance networks using helper threads
Published in ACM International Conference on Computing Frontiers (CF), 2011
Proposes a memory registration strategy using helper threads to reduce registered memory requirements on multicore architectures for HPC applications with RDMA networks.
Recommended citation: Li, D., Cameron, K. W., Nikolopoulos, D. S., de Supinski, B. R., & Schulz, M. (2011). Scalable memory registration for high performance networks using helper threads. In *Proceedings of the 8th ACM International Conference on Computing Frontiers (CF '11)*, Article 38. https://doi.org/10.1145/2016604.2016652
Download Paper
A Programming Model for Deterministic Task Parallelism
Published in ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC), 2011
Presents a model for deterministic parallelism using tasks with isolated footprints, removing the need for cache coherence and enabling provably deterministic execution.
Recommended citation: Pratikakis, P., Vandierendonck, H., Lyberis, S., & Nikolopoulos, D. S. (2011). "A Programming Model for Deterministic Task Parallelism." MSPC '11, 7–12. https://doi.org/10.1145/1988915.1988918
Download Paper
MapReduce for the Single-Chip-Cloud Architecture
Published in International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES), 2011
Presents a scalable implementation of MapReduce on the Intel SCC (Single-Chip Cloud), addressing scalability bottlenecks with customized data partitioning, combining and sorting algorithms for the SCC network-on-chip architecture.
Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2011). "MapReduce for the Single-Chip-Cloud Architecture." In ACACES Journal-Seventh International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems.
Download Paper
Scalable Runtime Support for Data-Intensive Applications on the Single-Chip Cloud Computer
Published in Intel Many-core Applications Research Community Symposium (MARC), 2011
Presents scalable runtime support mechanisms for data-intensive applications running on Intel’s Single-Chip Cloud Computer architecture.
Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2011). Scalable Runtime Support for Data-Intensive Applications on the Single-Chip Cloud Computer. In *Proceedings of the 3rd Intel Many-core Applications Research Community Symposium (MARC)*, 25-30. https://tpapagian.github.io/files/paper_marc.pdf
Download Paper
Task-Based Parallel H.264 Video Encoding for Explicit Communication Architectures
Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2011
Proposes a task-based parallelization strategy for H.264 encoding optimized for explicit communication architectures.
Recommended citation: Alvanos, M., Tzenakis, G., Nikolopoulos, D. S., & Bilas, A. (2011). Task-Based Parallel H.264 Video Encoding for Explicit Communication Architectures. In *SAMOS 2011*, 217–224. https://doi.org/10.1109/SAMOS.2011.6045464
Download Paper
A Unified Scheduler for Recursive and Task Dataflow Parallelism
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2011
This paper introduces a unified scheduling approach for parallel programs using recursive and task-dataflow parallelism, aiming for efficient execution with minimal overhead.
Recommended citation: Vandierendonck, H., Tzenakis, G., & Nikolopoulos, D. S. (2011). "A Unified Scheduler for Recursive and Task Dataflow Parallelism." *PACT 2011*, 1–11. https://doi.org/10.1109/PACT.2011.7
Download Paper
Recent Advances in the Message Passing Interface
Published in European MPI Users' Group Meeting (EuroMPI), 2011
Edited proceedings from the 18th European MPI Users’ Group Meeting (EuroMPI 2011), featuring research on advances in message passing systems and applications.
Recommended citation: Cotronis, Y., Danalis, A., Nikolopoulos, D. S., & Dongarra, J. (Eds.). (2011). *Recent Advances in the Message Passing Interface*. Proceedings of the 18th European MPI Users’ Group Meeting (EuroMPI 2011), Santorini, Greece. Lecture Notes in Computer Science, Springer. https://doi.org/10.1007/978-3-642-24449-0
BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism
Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2012
This paper presents BDDT, a runtime for deterministic task parallelism using block-level dependence analysis on dynamic memory regions.
Recommended citation: Tzenakis, G., Papatriantafyllou, A., Kesapides, J., Pratikakis, P., Vandierendonck, H., & Nikolopoulos, D. S. (2012). "BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism." *PPoPP 2012*, 301–302. https://doi.org/10.1145/2145816.2145864
Download Paper
Formic: Cost-Efficient and Scalable Prototyping of Manycore Architectures
Published in IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012
This paper presents Formic, a platform for efficient prototyping and validation of manycore hardware architectures using FPGA-based development environments.
Recommended citation: Lyberis, S., Kalokerinos, G., Lygerakis, M., Papaefstathiou, V., Tsaliagkos, D., Katevenis, M., Pnevmatikatos, D., & Nikolopoulos, D. (2012). "Formic: Cost-Efficient and Scalable Prototyping of Manycore Architectures." *FCCM 2012*, 61–64. https://doi.org/10.1109/FCCM.2012.20
Download Paper
Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary
Published in International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2012
Summarizes dynamic binary rewriting and migration techniques for shared-ISA asymmetric multicore processors to enable efficient code optimization and thread migration.
Recommended citation: Georgakoudis, G., Lalis, S., & Nikolopoulos, D. S. (2012). "Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary." In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 127-128. https://doi.org/10.1145/2287076.2287096
Download Paper
EPC: a power instrumentation controller for embedded applications
Published in SIGBED Review, 2012
Proposes and implements a real-time power monitor controller based on an 8-bit AVR controller and analog Hall effect current sensor for automated power measurement and energy accounting in embedded applications.
Recommended citation: Manousakis, I., & Nikolopoulos, D. S. (2012). "EPC: a power instrumentation controller for embedded applications." SIGBED Rev., 9(2), 28-32. https://doi.org/10.1145/2318836.2318841
Download Paper
The Myrmics Memory Allocator: Hierarchical, Message-Passing Allocation for Global Address Spaces
Published in SIGPLAN Notices, 2012
Myrmics implements a scalable, hierarchical memory allocator supporting dynamic regions and message-passing for task-based programming in distributed systems.
Recommended citation: Lyberis, S., Pratikakis, P., Nikolopoulos, D. S., et al. (2012). "The Myrmics Memory Allocator." *SIGPLAN Not.*, 47(11), 15–24. https://doi.org/10.1145/2426642.2259001
Download Paper
BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism
Published in SIGPLAN Notices, 2012
A journal article version of BDDT, describing a runtime for structured task parallelism using fine-grained memory access footprints for dynamic analysis.
Recommended citation: Tzenakis, G., Papatriantafyllou, A., Kesapides, J., Pratikakis, P., Vandierendonck, H., & Nikolopoulos, D. S. (2012). "BDDT: Block-Level Dynamic Dependence Analysis for Deterministic Task-Based Parallelism." *SIGPLAN Notices*, 47(8), 301–302. https://doi.org/10.1145/2370036.2145864
Download Paper
Topic 16: GPU and Accelerators Computing
Published in International European Conference on Parallel Processing (Euro-Par), 2012
Introduces the Euro-Par 2012 topic on GPU and accelerator computing, highlighting research challenges and trends in programming and optimizing heterogeneous architectures.
Recommended citation: Nikolopoulos, D. (2012). Topic 16: GPU and Accelerators Computing. In *Euro-Par 2012 Parallel Processing – 18th International Conference, Rhodes Island, Greece*, LNCS Vol. 7484, pp. 857–858, Springer. https://doi.org/10.1007/978-3-642-32820-6_84
Inference and Declaration of Independence: Impact on Deterministic Task Parallelism
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012
Proposes static optimizations for deterministic task-parallel execution to reduce runtime overhead in task creation and dependency checks.
Recommended citation: Zakkak, F. S., Chasapis, D., Pratikakis, P., et al. (2012). "Inference and Declaration of Independence." In PACT '12, 453–454. https://doi.org/10.1145/2370816.2370892
Download Paper
Critical Path-Based Thread Placement for NUMA Systems
Published in SIGMETRICS Performance Evaluation Review, 2012
This paper presents a runtime and algorithms that improve OpenMP performance on NUMA systems by optimizing thread placement along the critical path.
Recommended citation: Su, C., Li, D., Nikolopoulos, D. S., Grove, M., Cameron, K., & de Supinski, B. R. (2012). "Critical Path-Based Thread Placement for NUMA Systems." *SIGMETRICS Perform. Eval. Rev.*, 40(2), 106–112. https://doi.org/10.1145/2381056.2381079
Download Paper
BTL: A Framework for Measuring and Modeling Energy in Memory Hierarchies
Published in International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD), 2012
Introduces BTL, a framework for fine-grained measurement and modeling of memory energy consumption in hierarchical memory systems.
Recommended citation: Manousakis, I., & Nikolopoulos, D. S. (2012). "BTL: A Framework for Measuring and Modeling Energy in Memory Hierarchies." SBAC-PAD 2012, 139–146. https://doi.org/10.1109/SBAC-PAD.2012.38
Download Paper
On the Use of GPUs in Realizing Cost-Effective Distributed RAID
Published in IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2012
Evaluates GPU-based offloading techniques to improve the performance and cost-efficiency of distributed RAID storage architectures.
Recommended citation: Khasymski, A., Rafique, M. M., Butt, A. R., et al. (2012). "On the Use of GPUs in Realizing Cost-Effective Distributed RAID." MASCOTS 2012, 469–478. https://doi.org/10.1109/MASCOTS.2012.59
Download Paper
Model-based, Memory-centric Performance and Power Optimization on NUMA Multiprocessors
Published in IEEE International Symposium on Workload Characterization (IISWC), 2012
This work proposes a memory-centric performance and power optimization model for NUMA systems, guided by hardware counters and predictive modeling.
Recommended citation: Su, C., Li, D., Nikolopoulos, D. S., et al. (2012). "Model-based, Memory-centric Performance and Power Optimization on NUMA Multiprocessors." IISWC 2012, 164–173. https://doi.org/10.1109/IISWC.2012.6402921
Download Paper
Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs
Published in International Journal of Parallel Programming, 2012
Presents cache-integrated network interfaces that combine the flexibility of caches with the efficiency of scratchpad memories, providing configurable on-chip SRAM sharing and event response mechanisms for scalable multicore architectures with less than 20% logic overhead.
Recommended citation: Kavadias, S., Katevenis, M., Zampetakis, M., & Nikolopoulos, D. S. (2012). "Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs." International Journal of Parallel Programming, 40(6), 583-604. https://doi.org/10.1007/s10766-011-0173-6
Download Paper
Strategies for Energy-Efficient Resource Management of Hybrid Programming Models
Published in IEEE Transactions on Parallel and Distributed Systems, 2013
This work proposes dynamic concurrency throttling and DVFS strategies for energy-efficient resource management in hybrid parallel applications on multicore platforms.
Recommended citation: Li, D., de Supinski, B. R., Schulz, M., Nikolopoulos, D. S., & Cameron, K. W. (2013). "Strategies for Energy-Efficient Resource Management of Hybrid Programming Models." *IEEE TPDS*, 24(1), 144–157. https://doi.org/10.1109/TPDS.2012.95
Download Paper
Topic 1: Support Tools and Environments
Published in International European Conference on Parallel Processing (Euro-Par), 2013
Introduces the Euro-Par 2013 topic on support tools and environments for parallel and distributed computing, focusing on issues such as correctness, performance, and energy efficiency.
Recommended citation: de Supinski, B. R., Krammer, B., Fürlinger, K., Labarta, J., & Nikolopoulos, D. S. (2013). Topic 1: Support Tools and Environments. In *Euro-Par 2013 Parallel Processing* (F. Wolf, B. Mohr, D. an Mey, Eds.), pp. 3–3, Springer Berlin Heidelberg.
Overcoming the scalability challenges of contagion simulations on Blue Waters
Published in Technical Report 13-057, NDSSL, Virginia Bioinformatics Institute, 2013
Addresses scalability challenges in contagion simulations on the Blue Waters supercomputer, presenting solutions for large-scale epidemiological modeling and simulation.
Recommended citation: Yeom, J.-S., Bhatele, A., Bisset, K., Bohm, E., Gupta, A., Kale, L. V., Marathe, M., Nikolopoulos, D. S., Schulz, M., & Wesolowski, L. (2013). "Overcoming the scalability challenges of contagion simulations on Blue Waters." Technical Report 13-057, NDSSL, Virginia Bioinformatics Institute.
Download Paper
Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores
Published in International Workshop on Code OptimiSation for MultI and Many Cores (COSMIC), 2013
Presents a low overhead binary code rewriting method for shared-ISA multicore processors that enables thread migration among heterogeneous cores while preserving functional equivalence. Best Paper Award
Recommended citation: Georgakoudis, G., Nikolopoulos, D. S., & Lalis, S. (2013). "Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores." In Proceedings of the First International Workshop on Code OptimiSation for MultI and Many Cores, Article 4. https://doi.org/10.1145/2446920.2446924
Download Paper
Parallel Programming
Published in Encyclopedia of Software Engineering, 2013
An overview of common abstractions in parallel programming, including models based on shared and distributed memory, with discussions on programmability and performance trade-offs.
Recommended citation: Vandierendonck, H., Nikolopoulos, D. S., & Pratikakis, P. (2013). Parallel Programming. In *Encyclopedia of Software Engineering* (Taylor and Francis), February 27.
Modeling and Algorithms for Scalable and Energy Efficient Execution on Multicore Systems
Published in Scalable Computing and Communications: Theory and Practice, 2013
Presents modeling techniques and algorithms for achieving scalable and energy-efficient execution on multicore systems in the context of scalable computing and communications.
Recommended citation: Li, D., Nikolopoulos, D. S., & Cameron, K. W. (2013). "Modeling and Algorithms for Scalable and Energy Efficient Execution on Multicore Systems." In Scalable Computing and Communications: Theory and Practice (pp. 157-184). Wiley-Blackwell.
Download Paper
Connecting the Dots between Parallel Programming and Energy
Published in Euromicro International Conference on Parallel, Distributed and Network-Based Computing (PDP), 2013
Keynote address discussing the interplay between parallel programming models and energy efficiency in modern computing systems, delivered at PDP 2013.
Recommended citation: Nikolopoulos, D. (2013). Connecting the Dots between Parallel Programming and Energy. Keynote at *PDP 2013 – 21st Euromicro International Conference on Parallel, Distributed and Network-Based Computing*, March 1.
Prefetching and Cache Management Using Task Lifetimes
Published in ACM International Conference on Supercomputing (ICS), 2013
This paper presents EBP and ECM mechanisms to leverage task lifetime information for improving prefetching and cache management in task-parallel runtimes.
Recommended citation: Papaefstathiou, V., Katevenis, M. G. H., Nikolopoulos, D. S., & Pnevmatikatos, D. (2013). "Prefetching and Cache Management Using Task Lifetimes." *ICS 2013*, 325–334. https://doi.org/10.1145/2464996.2465443
Download Paper
BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism
Published in Advanced Parallel Processing Technologies, Springer Berlin Heidelberg, 2013
A comprehensive treatment of the BDDT runtime with emphasis on block-level memory tracking and support for irregular applications in task-parallel environments.
Recommended citation: Tzenakis, G., Papatriantafyllou, A., Vandierendonck, H., Pratikakis, P., & Nikolopoulos, D. S. (2013). "BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism." In *Advanced Parallel Processing Technologies* (pp. 17–31). Springer. https://doi.org/10.1007/978-3-642-45293-2_2
Download Paper
DRASync: distributed region-based memory allocation and synchronization
Published in European MPI Users Group Meeting (EuroMPI), 2013
Presents DRASync, a region-based allocator that implements a global address space abstraction for MPI programs with pointer-based data structures and high-level synchronization primitives.
Recommended citation: Symeonidou, C., Pratikakis, P., Bilas, A., & Nikolopoulos, D. S. (2013). "DRASync: distributed region-based memory allocation and synchronization." In Proceedings of the 20th European MPI Users' Group Meeting, 49-54. https://doi.org/10.1145/2488551.2488558
Download Paper
Programming the Energy Efficiency of High Performance Computing Systems
Published in International Conference on Energy-Aware High Performance Computing, 2013
Discusses programming methodologies and techniques for improving energy efficiency in high performance computing systems presented at the Fourth International Conference on Energy-Aware High Performance Computing.
Recommended citation: Nikolopoulos, D. S. (2013). "Programming the Energy Efficiency of High Performance Computing Systems." In Fourth International Conference on Energy-Aware High Performance Computing.
Download Paper
Deterministic Scale-Free Pipeline Parallelism with Hyperqueues
Published in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2013
Proposes hyperqueues, a programming abstraction that enables deterministic and scalable pipeline parallelism for modern multicore systems.
Recommended citation: Vandierendonck, H., Chronaki, K., & Nikolopoulos, D. S. (2013). Deterministic Scale-Free Pipeline Parallelism with Hyperqueues. In *SC '13*, Article 32. https://doi.org/10.1145/2503210.2503233
Download Paper
Analysis of Dependence Tracking Algorithms for Task Dataflow Execution
Published in ACM Transactions on Architecture and Code Optimization, 2013
Evaluates efficient schemes for managing task graphs in task dataflow programming, including graphs, hypergraphs, and edgeless schemes.
Recommended citation: Vandierendonck, H., Tzenakis, G., & Nikolopoulos, D. S. (2013). "Analysis of dependence tracking algorithms for task dataflow execution." ACM Transactions on Architecture and Code Optimization, 10(4), Article 61. https://doi.org/10.1145/2541228.2555316
Download Paper
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Published in IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2014
Preface message from the Technical Program Committee Co-Chairs of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2014), highlighting the scope and goals of the conference.
Recommended citation: Cameron, K. W., & Nikolopoulos, D. S. (2014). Message from Technical Program Committee Co-Chairs. *Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2014)*, IEEE. https://doi.org/10.1109/CCGrid.2014.5
NanoStreams: A Hardware and Software Stack for Real-Time Analytics on Fast Data Streams
Published in HiPEAC Info, Volume 38, 2014
Introduces the NanoStreams project, focusing on an integrated hardware-software stack designed for low-latency, real-time analytics on high-throughput data streams.
Recommended citation: Nikolopoulos, D. (2014). NanoStreams: A Hardware and Software Stack for Real-Time Analytics on Fast Data Streams. *HiPEAC Info*, 38, April.
Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2014
This paper presents scalable techniques for large-scale epidemic simulations on Blue Waters, addressing challenges in load balancing, graph partitioning, and communication.
Recommended citation: Yeom, J.-S., Bhatele, A., Bisset, K., Bohm, E., Gupta, A., Kale, L. V., Marathe, M., Nikolopoulos, D. S., Schulz, M., & Wesolowski, L. (2014). "Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters." *IPDPS 2014*, 755–764. https://doi.org/10.1109/IPDPS.2014.83
Download Paper
Scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-cores
Published in International Journal of Parallel, Emergent and Distributed Systems, 2014
Presents scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-core systems to optimize performance and resource allocation across diverse workloads.
Recommended citation: Khasymski, A., & Nikolopoulos, D. S. (2014). "Scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-cores." International Journal of Parallel, Emergent and Distributed Systems, 30(3), 193-210. https://doi.org/10.1080/17445760.2014.895346
Download Paper
FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards
Published in Journal of Systems Architecture, 2014
Presents a scalable 512-core FPGA-based prototype using Formic boards for modeling manycore architectures, demonstrating performance 50,000 times faster than software simulation.
Recommended citation: Lyberis, S., Kalokerinos, G., Lygerakis, M., Papaefstathiou, V., Mavroidis, I., Katevenis, M., Pnevmatikatos, D., & Nikolopoulos, D. S. (2014). FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards. *Journal of Systems Architecture*, 60(6), 481-493. https://doi.org/10.1016/j.sysarc.2014.03.002
Download Paper
Energy Efficiency through Significance-Based Computing
Published in IEEE Computer, 2014
Presents significance-based computing as an approach to improve energy efficiency by distinguishing between critical and non-critical computations.
Recommended citation: Nikolopoulos, D. S., Vandierendonck, H., Bellas, N., Antonopoulos, C. D., Lalis, S., Karakonstantis, G., Burg, A., & Naumann, U. (2014). Energy Efficiency through Significance-Based Computing. *Computer*, 47(7), 82-85. https://doi.org/10.1109/MC.2014.182
Download Paper
Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs
Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2014
Presents a fast dynamic binary rewriting technique that enables flexible thread migration across cores in shared-ISA heterogeneous multiprocessor systems-on-chip.
Recommended citation: Georgakoudis, G., Nikolopoulos, D. S., Vandierendonck, H., & Lalis, S. (2014). "Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs." In 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 156-163. https://doi.org/10.1109/SAMOS.2014.6893207
Download Paper
The Challenges and Opportunities of Micro-Servers in the HPC Ecosystem
Published in Workshop on Clusters, Clouds, and Data for Scientific Computing (CCDSC), 2014
Examines the potential of micro-servers in high-performance computing, highlighting architectural trends, energy efficiency, and emerging system design opportunities.
Recommended citation: Nikolopoulos, D. (2014). The Challenges and Opportunities of Micro-Servers in the HPC Ecosystem. In *Clusters, Clouds, and Data for Scientific Computing (CCDSC '14)*, September 4.
Download Paper
NanoStreams: Advancing the hardware and software stack for real-time analytics on fast data streams
Published in eChallenges e-2014 Conference, 2014
Presents NanoStreams, an advanced hardware and software stack designed for real-time analytics on fast data streams, targeting high-performance server architectures and system-on-chip implementations.
Recommended citation: Gillan, C. J., Nikolopoulos, D. S., Bilas, A., & Bekas, C. (2014). "NanoStreams: Advancing the hardware and software stack for real-time analytics on fast data streams." In eChallenges e-2014 Conference Proceedings, 1-8. https://ieeexplore.ieee.org/abstract/document/7058143
Download Paper
Power-capped DVFS and thread allocation with ANN models on modern NUMA systems
Published in IEEE International Conference on Computer Design (ICCD), 2014
Presents power-capped dynamic voltage and frequency scaling (DVFS) and thread allocation techniques using artificial neural network models for resource management on modern NUMA systems.
Recommended citation: Imamura, S., Sasaki, H., Inoue, K., & Nikolopoulos, D. S. (2014). "Power-capped DVFS and thread allocation with ANN models on modern NUMA systems." In 2014 IEEE 32nd International Conference on Computer Design (ICCD), 324-331. https://doi.org/10.1109/ICCD.2014.6974701
Download Paper
Distributed region-based memory allocation and synchronization
Published in The International Journal of High Performance Computing Applications, 2014
Presents distributed region-based memory allocation and synchronization techniques for high-performance computing applications to improve memory management and coordination in distributed systems.
Recommended citation: Symeonidou, C., Pratikakis, P., Nikolopoulos, D. S., & Bilas, A. (2014). "Distributed region-based memory allocation and synchronization." The International Journal of High Performance Computing Applications, 28(4), 406-414. https://doi.org/10.1177/1094342014552863
Download Paper
Hybrid address spaces: A methodology for implementing scalable high-level programming models on non-coherent many-core architectures
Published in Journal of Systems and Software, 2014
Introduces hybrid address spaces as a design methodology for implementing scalable runtime systems on many-core architectures without cache coherence, demonstrated through HyMR MapReduce and HyRMA remote memory access implementations.
Recommended citation: Papagiannis, A., & Nikolopoulos, D. S. (2014). "Hybrid address spaces: A methodology for implementing scalable high-level programming models on non-coherent many-core architectures." Journal of Systems and Software, 97, 47-64. https://doi.org/10.1016/j.jss.2014.06.058
Download Paper
On the Viability of Microservers for Financial Analytics
Published in Workshop on High Performance Computational Finance (WHPCF), 2014
Evaluates the viability of microserver architectures for financial analytics applications, examining energy efficiency and performance characteristics for numerical simulation and event processing workloads.
Recommended citation: Gillan, C. J., Nikolopoulos, D. S., Georgakoudis, G., Faloon, R., Tzenakis, G., & Spence, I. (2014). "On the Viability of Microservers for Financial Analytics." In 2014 Seventh Workshop on High Performance Computational Finance, 29-36. https://doi.org/10.1109/WHPCF.2014.11
Download Paper
Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing
Published in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2014
The CACTOS Vision of Context-Aware Cloud Topology Optimization and Simulation
Published in IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2014
The CACTOS project outlines an approach for optimizing cloud topologies using context-awareness and autonomic resource management techniques.
Recommended citation: Östberg, P.-O., Groenda, H., Wesner, S., Byrne, J., Nikolopoulos, D. S., et al. (2014). "The CACTOS Vision of Context-Aware Cloud Topology Optimization and Simulation." *CloudCom 2014*, 26–31. https://doi.org/10.1109/CloudCom.2014.62
Download Paper
Power modelling and capping for heterogeneous ARM/FPGA SoCs
Published in International Conference on Field-Programmable Technology (FPT), 2014
Presents approaches for power modeling and capping in heterogeneous System-on-Chips combining ARM processors and FPGAs.
Recommended citation: Wu, Y., Nunez-Yanez, J., Woods, R., & Nikolopoulos, D. S. (2014). Power modelling and capping for heterogeneous ARM/FPGA SoCs. In *2014 International Conference on Field-Programmable Technology (FPT)*, 231-234. https://doi.org/10.1109/FPT.2014.7082782
Download Paper
Special Issue: Energy efficient computing with adaptive and heterogeneous architectures
Published in IET Computers & Digital Techniques, 2015
Guest editorial for a special issue on energy efficient computing with adaptive and heterogeneous architectures, addressing energy efficiency challenges in mobile devices and server systems.
Recommended citation: Nunez-Yanez, J., Moreno, J. M., & Nikolopoulos, D. S. (2015). "Guest Editorial: Special Issue: Energy efficient computing with adaptive and heterogeneous architectures." IET Computers & Digital Techniques, 9(1), 1-2.
Download Paper
Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi. Multicore and GPU Programming
Published in Congress of GPU Programming, 2015
Studies the power and energy implications of varying thread counts on Intel Xeon Phi processors for multicore and GPU programming applications.
Recommended citation: Lorenzo, O. G., Pena, T. F., Cabaleiro, J. C., Pichel, J. C., Rivera, F. F., & Nikolopoulos, D. S. (2015). "Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi. Multicore and GPU Programming." In Second Congress of GPU Programming, 1-8.
Download Paper
Power and energy implications of the number of threads used on the Intel Xeon Phi
Published in Annals of Multicore and GPU Programming (AMGP), 2015
Studies power and energy usage of PARSEC and SPLASH-2X benchmarks on Intel Xeon Phi across different thread configurations to find optimal performance-energy relationships.
Recommended citation: Lorenzo, O. G., Pena, T. F., Cabaleiro, J. C., Pichel, J. C., Rivera, F. F., & Nikolopoulos, D. S. (2015). "Power and energy implications of the number of threads used on the Intel Xeon Phi." Annals of Multicore and GPU Programming (AMGP), 2(1), 55-65.
Download Paper
Programming and Managing Resources on Accelerator-Enabled Clusters
Published in Department of Computer Science, Virginia Tech, 2015
Discusses programming models and resource management techniques for accelerator-enabled clusters to optimize performance and resource utilization.
Recommended citation: Rafique, M. M., Butt, A. R., & Nikolopoulos, D. (2015). "Programming and Managing Resources on Accelerator-Enabled Clusters." Technical Report.
Download Paper
Realizing Accelerated Cost-Effective Distributed RAID
Published in Handbook on Data Centers, 2015
Addresses the challenges of storing and retrieving massive scientific data reliably and cost-effectively, proposing distributed RAID solutions for large-scale storage systems and parallel file systems.
Recommended citation: Khasymski, A., Rafique, M. M., Butt, A. R., Vazhkudai, S. S., & Nikolopoulos, D. S. (2015). Realizing Accelerated Cost-Effective Distributed RAID. In S. U. Khan & A. Y. Zomaya (Eds.), *Handbook on Data Centers* (pp. 729-752). Springer New York. https://doi.org/10.1007/978-1-4939-2092-1_25
Download Paper
TProf: An Energy Profiler for Task-Parallel Programs
Published in Sustainable Computing: Informatics and Systems, 2015
Introduces TProf, a profiler for estimating energy usage in task-parallel applications, enabling per-task DVFS optimizations.
Recommended citation: Manousakis, I., Zakkak, F. S., Pratikakis, P., & Nikolopoulos, D. S. (2015). TProf: An Energy Profiler for Task-Parallel Programs. *Sustainable Computing: Informatics and Systems*, 5, 1–13. https://doi.org/10.1016/j.suscom.2014.07.004
Download Paper
A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing
Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2015
This work presents a runtime and programming model for approximate computing based on task significance, showing graceful quality degradation and significant energy savings.
Recommended citation: Vassiliadis, V., Parasyris, K., Chalios, C., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing." *PPoPP 2015*, 275–276. https://doi.org/10.1145/2688500.2688546
Download Paper
Software-Managed Energy-Efficient Hybrid DRAM/NVM Main Memory
Published in ACM International Conference on Computing Frontiers (CF), 2015
This work proposes software techniques for energy-efficient management of hybrid DRAM/NVM memory systems, reducing hardware complexity while maintaining high performance.
Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "Software-Managed Energy-Efficient Hybrid DRAM/NVM Main Memory." *CF '15*, Article 23. https://doi.org/10.1145/2742854.2742886
Download Paper
A Lightweight Tool for Anomaly Detection in Cloud Data Centres
Published in International Conference on Cloud Computing and Services Science (CLOSER), 2015
This paper presents a lightweight anomaly detection tool tailored for cloud data centers, designed for rapid deployment with minimal overhead in distributed environments.
Recommended citation: Barbhuiya, S., Papazachos, Z., Kilpatrick, P., & Nikolopoulos, D. S. (2015). "A Lightweight Tool for Anomaly Detection in Cloud Data Centres." *CLOSER 2015*, 343–351. https://doi.org/10.5220/0005453403430351
Download Paper
On the Potential of Significance-Driven Execution for Energy-Aware HPC
Published in Computer Science - Research and Development, 2015
Explores hybrid near-threshold and above-threshold voltage execution based on algorithmic significance to achieve 35–67% energy savings without compromising performance.
Recommended citation: Gschwandtner, P., Chalios, C., Nikolopoulos, D. S., et al. (2015). "On the Potential of Significance-Driven Execution for Energy-Aware HPC." Computer Science - Research and Development, 30(2), 197–206. https://doi.org/10.1007/s00450-014-0265-9
Download Paper
A significance-driven programming framework for energy-constrained approximate computing
Published in ACM International Conference on Computing Frontiers (CF), 2015
Introduces a programming framework for energy-constrained approximate computing that uses significance-aware runtime systems to maximize output quality within given energy budgets.
Recommended citation: Vassiliadis, V., Chalios, C., Parasyris, K., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., & Nikolopoulos, D. S. (2015). A significance-driven programming framework for energy-constrained approximate computing. In *Proceedings of the 12th ACM International Conference on Computing Frontiers (CF '15)*, Article 9. https://doi.org/10.1145/2742854.2742857
Download Paper
Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies
Published in International Workshop on Data Management on New Hardware (DaMoN), 2015
Proposes energy-efficient hybrid DRAM/NVM memory management for modern data stores using application-level policies for data placement.
Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies." DaMoN '15, Article 1. https://doi.org/10.1145/2771937.2771940
Download Paper
On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory
Published in IEEE Computer Architecture Letters, 2015
Analyzes the energy efficiency characteristics of byte-addressable non-volatile memory systems and their implications for main memory design.
Recommended citation: Vandierendonck, H., Hassan, A., & Nikolopoulos, D. S. (2015). "On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory." IEEE Computer Architecture Letters, 14(2), 144-147. https://doi.org/10.1109/LCA.2014.2355195
Download Paper
A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing
Published in SIGPLAN Notices, 2015
This paper introduces a task-based model that trades output quality for energy efficiency, achieving up to 83% energy savings via significance-aware execution policies.
Recommended citation: Vassiliadis, V., Parasyris, K., Chalios, C., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "A Programming Model and Runtime System for Significance-Aware Energy-Efficient Computing." *SIGPLAN Notices*, 50(8), 275–276. https://doi.org/10.1145/2858788.2688546
Download Paper
Towards automated data-driven model creation for cloud computing simulation
Published in International Conference on Simulation Tools and Techniques (SIMUTools), 2015
Presents an automated method for cloud computing topology definition, data collection and model creation to support decision making in complex cloud environments through simulation.
Recommended citation: Svorobej, S., Byrne, J., Liston, P., Byrne, P. J., Stier, C., Groenda, H., Papazachos, Z., & Nikolopoulos, D. S. (2015). "Towards automated data-driven model creation for cloud computing simulation." In Proceedings of the 8th International Conference on Simulation Tools and Techniques, 248–255. https://doi.org/10.4108/eai.24-8-2015.2261129
Download Paper
Energy-Efficient Hybrid DRAM/NVM Main Memory
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015
This short paper presents a hybrid DRAM/NVM memory architecture evaluated for energy savings in memory-intensive applications.
Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. S. (2015). "Energy-Efficient Hybrid DRAM/NVM Main Memory." PACT '15, 492–493. https://doi.org/10.1109/PACT.2015.58
Download Paper
Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics
Published in Parallel Processing Letters, 2015
Presents a mathematically rigorous iso-Quality-of-Service metric for ranking servers based on energy efficiency while meeting QoS targets for real-time analytics services.
Recommended citation: Georgakoudis, G., Gillan, C., Sayed, A., Spence, I., Faloon, R., & Nikolopoulos, D. S. (2015). "Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics." Parallel Processing Letters, 25(03), 1541004. https://doi.org/10.1142/S0129626415410042
Download Paper
ALEA: Fine-Grain Energy Profiling with Basic Block Sampling
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015
Introduces ALEA, a fine-grain energy profiling tool based on basic block sampling that helps developers optimize energy consumption at the instruction level.
Recommended citation: Mukhanov, L., Nikolopoulos, D. S., & De Supinski, B. R. (2015). "ALEA: Fine-Grain Energy Profiling with Basic Block Sampling." *PACT 2015*, 87–98. https://doi.org/10.1109/PACT.2015.16
Download Paper
HpMC: An Energy-aware Management System of Multi-level Memory Architectures
Published in International Symposium on Memory Systems (MEMSYS), 2015
HpMC is an adaptive memory controller that switches between hierarchical and flat memory modes to reduce energy while maintaining performance in heterogeneous memory systems.
Recommended citation: Su, C., Roberts, D., León, E. A., Cameron, K. W., de Supinski, B. R., Loh, G. H., & Nikolopoulos, D. S. (2015). "HpMC: An Energy-aware Management System of Multi-level Memory Architectures." *MEMSYS '15*, 167–178. https://doi.org/10.1145/2818950.2818974
Download Paper
Application-Level Energy Awareness for OpenMP
Published in OpenMP: Heterogenous Execution and Data Movements, Springer International Publishing, 2015
This chapter introduces OpenMPE, a programming model extension to OpenMP that supports application-level energy optimizations through annotations and runtime tuning.
Recommended citation: Alessi, F., Thoman, P., Georgakoudis, G., Fahringer, T., & Nikolopoulos, D. S. (2015). "Application-Level Energy Awareness for OpenMP." In *OpenMP: Heterogenous Execution and Data Movements*, Springer, pp. 219–232. https://doi.org/10.1007/978-3-319-24595-9_16
Download Paper
Energy-Efficient Hybrid DRAM/NVM Main Memory: ACM Student Research Competition
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT) - ACM Student Research Competition, 2015
Student research competition poster presenting energy-efficient hybrid DRAM/NVM main memory architectures for improved performance and reduced power consumption in memory systems.
Recommended citation: Hassan, A., Vandierendonck, H., & Nikolopoulos, D. (2015). "Energy-Efficient Hybrid DRAM/NVM Main Memory: ACM Student Research Competition." In 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), 492-493.
Download Paper
Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing
Published in International Conference for High Performance Computing, Networking, Storage and Analysis (SC) Workshops, 2015
Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, focusing on power and energy consumption as primary concerns for Exascale systems and revolutionary methods for energy efficient computing.
Recommended citation: E2SC '15: Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing. (2015). Association for Computing Machinery.
Download Paper
Power Capping: What Works, What Does Not
Published in IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2015
This study evaluates the effectiveness of multiple power capping strategies, including compiler optimizations and DVFS, across a variety of HPC workloads.
Recommended citation: Petoumenos, P., Mukhanov, L., Wang, Z., Leather, H., & Nikolopoulos, D. S. (2015). "Power Capping: What Works, What Does Not." *ICPADS 2015*, 525–534. https://doi.org/10.1109/ICPADS.2015.72
Download Paper
Energy Optimization of Parallel Workloads on Unreliable Hardware
Published in WAPCO '16 (HiPEAC Workshop), 2016
This paper explores techniques for optimizing energy efficiency of parallel workloads on unreliable hardware, presented at WAPCO in conjunction with HiPEAC 2016.
Recommended citation: Trehan, C., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). *Energy Optimization of Parallel Workloads on Unreliable Hardware*. In Proceedings of the Second Workshop on Approximate Computing (WAPCO), Prague, Czech Republic.
Download Paper
Proceedings of the Mini-symposium on Energy and Resilience in Parallel Programming
Published in Parallel Computing (Advances in Parallel Computing), 2016
Organizes and introduces a mini-symposium on energy and resilience in parallel programming, covering current trends and challenges in energy-aware and fault-tolerant parallel computing.
Recommended citation: Nikolopoulos, D. S., & Antonopoulos, C. D. (2016). "Mini-symposium on energy and resilience in parallel programming." In Parallel Computing. Advances in Parallel Computing. IOS Press. https://doi.org/10.3233/978-1-61499-621-7-709
Download Paper
Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures
Published in Parallel Computing: On the Road to Exascale (Advances in Parallel Computing, Vol. 27), 2016
Investigates the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for Conjugate Gradient method with ILUPACK preconditioner on low-power ARM processors.
Recommended citation: Aliaga, J. I., Catalán, S., Chalios, C., Nikolopoulos, D. S., & Quintana-Ortí, E. S. (2016). "Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures." In Parallel Computing: On the Road to Exascale (pp. 711-720). Advances in Parallel Computing, Vol. 27. https://doi.org/10.3233/978-1-61499-621-7-711
Download Paper
Energy Optimization of Parallel Programs on Unreliable Hardware
Published in Workshop on Approximate Computing, 2016
Presents a work-in-progress report on minimizing energy consumption of parallel applications on unreliable hardware platforms, specifically unreliable memory, using analytical models to capture CPU energy consumption and select optimal frequencies.
Recommended citation: Trehan, C., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. (2016). "Energy Optimization of Parallel Programs on Unreliable Hardware." In Second Workshop on Approximate Computing.
Download Paper
Methods and metrics for fair server assessment under real-time financial workloads
Published in Concurrency and Computation: Practice and Experience, 2016
Presents a rigorous methodology and new metrics for fair comparison of server and microserver platforms under real-time financial analytics workloads, comparing ARM and x86 architectures.
Recommended citation: Georgakoudis, G., Gillan, C. J., Sayed, A., Spence, I., Faloon, R., & Nikolopoulos, D. S. (2016). Methods and metrics for fair server assessment under real-time financial workloads. *Concurrency and Computation: Practice and Experience*, 28(3), 916-928. https://doi.org/10.1002/cpe.3704
Download Paper
Using Computational Significance and Resilience in System Software Stacks: Keynote Talk
Published in International Workshop on Energy-Aware High Performance Computing, 2016
Keynote talk exploring how runtime systems and operating systems can leverage computational significance and resilience metrics to reduce energy footprint of parallel applications while tolerating higher error rates in future processors and memory technologies.
Recommended citation: Nikolopoulos, D. (2016). "Using Computational Significance and Resilience in System Software Stacks: Keynote Talk." Keynote presentation at First International Workshop on Energy-Aware High Performance Computing.
Download Paper
ECOSCALE: Reconfigurable Computing and Runtime System for Future Exascale Systems
Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016
ECOSCALE proposes a reconfigurable computing architecture and runtime platform to support scalability, energy efficiency, and programmability for exascale systems.
Recommended citation: Mavroidis, I., Papaefstathiou, I., Lavagno, L., Nikolopoulos, D. S., Koch, D., Goodacre, J., Sourdis, I., Papaefstathiou, V., Coppola, M., & Palomino, M. (2016). "ECOSCALE: Reconfigurable Computing and Runtime System for Future Exascale Systems." *DATE 2016*, 696–701.
Download Paper
Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics
Published in IET Computers & Digital Techniques, 2016
Evaluates fault tolerance techniques on asymmetric multicore SoCs using ARM big.LITTLE processors, focusing on near-threshold voltage computing and algorithm-based fault tolerance for low-power HPC systems.
Recommended citation: Chalios, C., Nikolopoulos, D. S., Catalán, S., & Quintana-Ortí, E. S. (2016). "Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics." IET Computers & Digital Techniques, 10(2), 85-92. https://doi.org/10.1049/iet-cdt.2015.0056
Download Paper
LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres
Published in Cloud Computing and Services Science Conference (Springer), 2016
Presents LS-ADT, a lightweight anomaly detection tool for cloud data centers that combines extended log analysis with correlation of system metrics to automatically detect and identify performance anomalies without requiring training or complex setup.
Recommended citation: Barbhuiya, S., Papazachos, Z., Kilpatrick, P., & Nikolopoulos, D. S. (2016). "LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres." In Cloud Computing and Services Science (pp. 135-152). Springer. https://doi.org/10.1007/978-3-319-29582-4_8
Download Paper
Operator and Workflow Optimization for High-Performance Analytics
Published in International Workshop on Multi-Engine Data Analytics (MEDAL), 2016
Studies the impact of intra-node parallelism on data analytics performance, identifying four performance optimizations enabled by increasing processing cores and their interactions on analytics operators.
Recommended citation: Vandierendonck, H., Murphy, K. L., Arif, M., Sun, J., & Nikolopoulos, D. S. (2016). "Operator and Workflow Optimization for High-Performance Analytics." In 1st International Workshop on Multi-Engine Data Analytics (MEDAL).
Download Paper
SCoRPiO: Significance Based Computing for Reliability and Power Optimization
Published in International Symposium on Code Generation and Optimization (CGO), 2016
Presents SCoRPiO, a significance-based computing approach for reliability and power optimization that leverages computational significance to balance energy efficiency with system reliability.
Recommended citation: Vassiliadis, V., Parasyris, K., Antonopoulos, C. D., Bellas, N., Riehme, J., & Nikolopoulos, D. (2016). "SCoRPiO: Significance Based Computing for Reliability and Power Optimization." In 2016 International Symposium on Code Generation and Optimization (CGO).
Download Paper
Low-Cost Hardware Infrastructure for Runtime Thread Level Energy Accounting
Published in Architecture of Computing Systems (ARCS), 2016
Designs a generic low-cost hardware infrastructure for thread-level energy accounting in multi-core systems, achieving 95% correlation with physical power measurements while adding only 10% resource overhead.
Recommended citation: Marcu, M., Boncalo, O., Ghenea, M., Amaricai, A., Weinstock, J., Leupers, R., Wang, Z., Georgakoudis, G., Nikolopoulos, D. S., Cernazanu-Glavan, C., Bara, L., & Ionascu, M. (2016). "Low-Cost Hardware Infrastructure for Runtime Thread Level Energy Accounting." In Architecture of Computing Systems -- ARCS 2016 (pp. 277-289). Springer. https://doi.org/10.1007/978-3-319-30695-7_21
Download Paper
The VINEYARD Approach: Versatile, Integrated, Accelerator-Based, Heterogeneous Data Centres
Published in Applied Reconfigurable Computing Conference, 2016
Introduces VINEYARD: a data-center architecture using programmable accelerators and a high-level programming model for big data and cloud applications.
Recommended citation: Kachris, C., Soudris, D., Gaydadjiev, G., et al. (2016). "The VINEYARD Approach." In Applied Reconfigurable Computing, 3–13. https://doi.org/10.1007/978-3-319-30481-6_1
Download Paper
TwinCG: Dual Thread Redundancy with Forward Recovery for Conjugate Gradient Methods
Published in arXiv preprint, 2016
Presents TwinCG, a dual thread redundancy approach with forward recovery specifically designed for Conjugate Gradient methods to improve fault tolerance in iterative solvers.
Recommended citation: Dichev, K., & Nikolopoulos, D. S. (2016). "TwinCG: Dual Thread Redundancy with Forward Recovery for Conjugate Gradient Methods." arXiv preprint arXiv:1605.04580.
Download Paper
Proceedings of the Workshop on Variability in Computer Systems (VarSys)
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS) Workshops, 2016
Presents the introductory welcome message for the VarSys workshop, including conference officers’ congratulations and acknowledgments for the workshop event and proceedings publication.
Recommended citation: Cameron, K., Gamblin, T., & Nikolopoulos, D. S. (2016). "VarSys Introduction." In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1068. https://doi.org/10.1109/IPDPSW.2016.239
Download Paper
The VINEYARD project: Versatile integrated accelerator-based heterogeneous data centres
Published in International Conference on Modern Circuits and Systems Technologies (MOCAST), 2016
Presents the VINEYARD project, which develops versatile integrated accelerator-based heterogeneous data centers using FPGAs and multicore processing for improved performance and power efficiency.
Recommended citation: Kachris, C., Gaydadjiev, G., Nguyen, H.-N., Nikolopoulos, D. S., Bilas, A., Morgan, N., Strydis, C., Spatadakis, V., Gardelis, D., Jimenez-Peris, R., & Almeida, A. (2016). "The VINEYARD project: Versatile integrated accelerator-based heterogeneous data centres." In 2016 5th International Conference on Modern Circuits and Systems Technologies (MOCAST), 1-4. https://doi.org/10.1109/MOCAST.2016.7495121
Download Paper
HPTA: A Library for High-Performance Text Analytics
Published in Technical Report, School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, 2016
Presents the HPTA library for high-performance text analytics that maps textual data to dense numeric representations with three key optimizations: efficient memory management, parallel computation on associative data structures, and context-dependent optimization of data structures.
Recommended citation: Vandierendonck, H., Murphy, K., Arif, M., & Nikolopoulos, D. S. (2016). "HPTA: A Library for High-Performance Text Analytics." Technical Report.
Download Paper
BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores
Published in arXiv preprint, 2016
Presents BDDT-SCC, a task-parallel runtime system designed specifically for non cache-coherent multicore architectures to enable efficient parallel execution without hardware cache coherence support.
Recommended citation: Labrineas, A., Pratikakis, P., Nikolopoulos, D. S., & Bilas, A. (2016). "BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores." arXiv preprint arXiv:1606.04288.
Download Paper
Myrmics: Scalable, Dependency-aware Task Scheduling on Heterogeneous Manycores
Published in arXiv preprint, 2016
Presents Myrmics, a scalable task scheduling system that handles dependency-aware scheduling on heterogeneous manycore architectures for improved parallel execution efficiency.
Recommended citation: Lyberis, S., Pratikakis, P., Mavroidis, I., & Nikolopoulos, D. S. (2016). "Myrmics: Scalable, Dependency-aware Task Scheduling on Heterogeneous Manycores." arXiv preprint arXiv:1606.04282.
Download Paper
A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform
Published in International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), 2016
Presents the design of a new runtime for heterogeneous hardware platforms that extends OpenCL to simplify programming and automate scheduling across FPGAs and other devices for exascale computing.
Recommended citation: Harvey, P., Bakanov, K., Spence, I., & Nikolopoulos, D. S. (2016). "A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform." In Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, Article 7. https://doi.org/10.1145/2931088.2931090
Download Paper
Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads
Published in ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2016
Presents an analytical energy-performance model for parallel workloads that accounts for the energy the CPU consumes on memory accesses and the dynamic energy of idle cores, providing optimal frequencies for global DVFS energy minimization.
Recommended citation: Trehan, C., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). "Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads." In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 251-252. https://doi.org/10.1145/2935764.2935811
Download Paper
NanoStreams: Codesigned microservers for edge analytics in real time
Published in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016
Presents NanoStreams, a codesigned microserver architecture using FPGAs for real-time edge analytics, addressing hardware-software co-optimization for edge computing applications.
Recommended citation: Georgakoudis, G., et al. (2016). "NanoStreams: Codesigned microservers for edge analytics in real time." In 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 180-187. https://doi.org/10.1109/SAMOS.2016.7818346
Download Paper
Runtime support for adaptive power capping on heterogeneous SoCs
Published in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016
Presents a runtime system for adaptive power capping on heterogeneous SoCs, enabling dynamic power management across ARM processors and FPGAs.
Recommended citation: Wu, Y., Nikolopoulos, D. S., & Woods, R. (2016). "Runtime support for adaptive power capping on heterogeneous SoCs." In 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 71-78. https://doi.org/10.1109/SAMOS.2016.7818333
Download Paper
Energy Efficient Computing using Computational Significance Abstractions: Keynote Talk at the UK-China Workshop on Shaping the Low Carbon Energy Future
Published in UK-China Workshop on Shaping the Low Carbon Energy Future, 2016
Keynote presentation on energy efficient computing using computational significance abstractions at the UK-China Workshop on Shaping the Low Carbon Energy Future.
Recommended citation: Nikolopoulos, D. (2016). "Energy Efficient Computing using Computational Significance Abstractions: Keynote Talk at the UK-China Workshop on Shaping the Low Carbon Energy Future." Keynote presentation at UK-China Workshop on Shaping the Low Carbon Energy Future.
Download Paper
Student Research Poster: A Scalable General Purpose System for Large-Scale Graph Processing
Published in International Conference on Parallel Architectures and Compilation (PACT), 2016
Presents a student research poster on building a scalable graph analytics framework that hides the complexity of parallelism, data distribution and memory locality behind an abstract interface using NUMA-awareness.
Recommended citation: Sun, J. (2016). "Student Research Poster: A Scalable General Purpose System for Large-Scale Graph Processing." In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 456. https://doi.org/10.1145/2967938.2971465
Download Paper
TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods
Published in IEEE International Conference on Cluster Computing (CLUSTER) Workshops, 2016
Presents TwinPCG, a fault tolerance approach using dual thread redundancy and forward recovery techniques specifically designed for preconditioned conjugate gradient methods.
Recommended citation: Dichev, K., & Nikolopoulos, D. S. (2016). "TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods." In 2016 IEEE International Conference on Cluster Computing (CLUSTER), 506-514. https://doi.org/10.1109/CLUSTER.2016.99
Download Paper
Challenges and Opportunities in Edge Computing
Published in IEEE International Conference on Smart Cloud (SmartCloud), 2016
This position paper examines the challenges and opportunities in edge computing, focusing on moving computational load towards network edges to harness untapped capabilities in edge nodes while addressing quality-of-service concerns.
Recommended citation: Varghese, B., Wang, N., Barbhuiya, S., Kilpatrick, P., & Nikolopoulos, D. S. (2016). "Challenges and Opportunities in Edge Computing." IEEE International Conference on Smart Cloud (SmartCloud), 20-26. https://doi.org/10.1109/SmartCloud.2016.18
Download Paper
Exploiting Significance of Computations for Energy-Constrained Approximate Computing
Published in International Journal of Parallel Programming, 2016
This work introduces a runtime and programming model for optimizing quality under energy constraints using significance-aware execution and task-level approximations.
Recommended citation: Vassiliadis, V., Chalios, C., Parasyris, K., et al. (2016). "Exploiting Significance of Computations for Energy-Constrained Approximate Computing." *IJPP*, 44(5), 1078–1098. https://doi.org/10.1007/s10766-016-0409-6
Download Paper
Accelerating Data Center Applications with Reconfigurable DataFlow Engines
Published in Second International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2016
Addresses the seamless integration of energy-efficient programmable accelerators into cloud-based data analytics frameworks, with the aim of pushing the limits on computation capacity and density of future data centers.
Recommended citation: Barbhuiya, S., Wu, Y., Murphy, K., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2016). "Accelerating Data Center Applications with Reconfigurable DataFlow Engines." In Second International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'16).
Download Paper
Big data availability: Selective partial checkpointing for in-memory database queries
Published in IEEE International Conference on Big Data (Big Data) Workshops, 2016
Presents selective partial checkpointing techniques for improving availability in in-memory database queries, addressing fault tolerance challenges in big data processing systems.
Recommended citation: Playfair, D., Trehan, A., McLarnon, B., & Nikolopoulos, D. S. (2016). "Big data availability: Selective partial checkpointing for in-memory database queries." In 2016 IEEE International Conference on Big Data (Big Data), 2785-2794. https://doi.org/10.1109/BigData.2016.7840926
Download Paper
HPTA: High-performance text analytics
Published in IEEE International Conference on Big Data (Big Data) Workshops, 2016
Presents HPTA, a high-performance text analytics framework that optimizes data structures, memory management, and sparse matrix operations for improved text processing performance.
Recommended citation: Vandierendonck, H., Murphy, K., Arif, M., & Nikolopoulos, D. S. (2016). "HPTA: High-performance text analytics." In 2016 IEEE International Conference on Big Data (Big Data), 416-423. https://doi.org/10.1109/BigData.2016.7840632
Download Paper
A scalable and composable map-reduce system
Published in IEEE International Conference on Big Data (Big Data), 2016
Presents a scalable and composable map-reduce system that improves performance, composition capabilities, and programmability for big data processing applications.
Recommended citation: Arif, M., Vandierendonck, H., Nikolopoulos, D. S., & de Supinski, B. R. (2016). "A scalable and composable map-reduce system." In 2016 IEEE International Conference on Big Data (Big Data), 2233-2242. https://doi.org/10.1109/BigData.2016.7840854
Download Paper
Special issue on Disruptive Technologies for Energy Efficient Computing
Published in Sustainable Computing: Informatics and Systems, 2016
Editorial for a special issue focusing on disruptive technologies for energy efficient computing in sustainable computing systems and informatics.
Recommended citation: Butt, A. R., Gniady, C., & Nikolopoulos, D. S. (2016). "Special issue on Disruptive technologies for energy efficient computing." Sustainable Computing: Informatics and Systems, 12, 56. https://doi.org/10.1016/j.suscom.2016.11.005
Download Paper
An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits
Published in Workshop on Energy-efficient Servers for Cloud and Edge Computing (EnESCE 2017), 2016
Presents the UniServer approach for developing energy-efficient micro-servers that exceed conservative scaling boundaries through novel mechanisms across all design stack layers, including hardware heterogeneity exploitation and fault tolerance enhancement.
Recommended citation: Tovletoglou, K., Chalios, C., Karakonstantis, G., Mukhanov, L., Vandierendonck, H., Nikolopoulos, D., Koutsovasilis, P., Maroudas, M., Antonopoulos, C., Kalogirou, C., Bellas, N., Lalis, S., Rafique, M. M., Venugopal, S., Prat-Perez, A., Diavastos, A., Hadjilambrou, Z., Nikolaou, P., Sazeides, Y., Trancoso, P., Papadimitriou, G., Kaliorakis, M., Chatzidimitriou, A., & Gizopoulos, D. (2016). "An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits." In Workshop on Energy-efficient Servers for Cloud and Edge Computing 2017.
Download Paper
Edge-as-a-Service: Towards Distributed Cloud Architectures
Published in Advances in Parallel Computing, Volume 32: Parallel Computing is Everywhere, 2017
This chapter introduces an Edge-as-a-Service platform that integrates edge nodes into cloud environments to reduce latency and improve Quality-of-Service.
Recommended citation: Varghese, B., Wang, N., Li, J., & Nikolopoulos, D. S. (2017). "Edge-as-a-Service: Towards Distributed Cloud Architectures." *Parallel Computing is Everywhere*, 784–793. https://doi.org/10.3233/978-1-61499-843-3-784
Download Paper
Programming and Managing Resources on Accelerator-Enabled Clusters
Published in Programming multi‐core and many‐core computing systems, 2017
Explores system design alternatives for clusters with computational accelerators and capability-aware task scheduling strategies using the MapReduce programming model for asymmetric clusters.
Recommended citation: Mustafa Rafique, M., Butt, A. R., & Nikolopoulos, D. S. (2017). "Programming and Managing Resources on Accelerator-Enabled Clusters." In Programming multi‐core and many‐core computing systems (pp. 405-429). Wiley. https://doi.org/10.1002/9781119332015.ch20
Download Paper
Heterogeneous Servers based on Programmable Cores and Dataflow Engines
Published in Workshop on Energy-efficient Servers for Cloud and Edge Computing (EnESCE), 2017
Presents energy-efficient server architectures based on programmable accelerators and dataflow engines, demonstrating 40% better energy efficiency than standard Xeon servers and up to 374x speedup for various workloads in data center applications.
Recommended citation: Wu, Y., Gillan, C., Minhas, U., Barbhuiya, S., Novakovic, A., Tovletoglou, K., Tzenakis, G., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. (2017). "Heterogeneous Servers based on Programmable Cores and Dataflow Engines." In Workshop on Energy-efficient Servers for Cloud and Edge Computing 2017.
Download Paper
ALEA: A Fine-Grained Energy Profiling Tool
Published in ACM Transactions on Architecture and Code Optimization, 2017
Introduces ALEA, a fine-grained energy profiling tool based on probabilistic analysis, enabling detailed association of energy consumption with source code structures.
Recommended citation: Mukhanov, L., Petoumenos, P., Wang, Z., et al. (2017). "ALEA: A Fine-Grained Energy Profiling Tool." ACM Transactions on Architecture and Code Optimization (TACO), 14(1), Article 1. https://doi.org/10.1145/3050436
Download Paper
MyMinder: A User-centric Decision Making Framework for Intercloud Migration
Published in International Conference on Cloud Computing and Services Science (CLOSER), 2017
Presents MyMinder, a user-centric decision making framework that assists users in making informed decisions about intercloud migration strategies.
Recommended citation: Barlaskar, E., Kilpatrick, P., Spence, I., & Nikolopoulos, D. S. (2017). "MyMinder: A User-centric Decision Making Framework for Intercloud Migration." In Proceedings of the 7th International Conference on Cloud Computing and Services Science - CLOSER, 588-595. https://doi.org/10.5220/0006355905880595
Download Paper
Managed acceleration for In-Memory database analytic workloads
Published in International Journal of Parallel, Emergent and Distributed Systems, 2017
Presents managed acceleration techniques for in-memory database analytic workloads to improve query performance and resource utilization in database management systems.
Recommended citation: O'Neill, E., McGlone, J., Kilpatrick, P., & Nikolopoulos, D. (2017). "Managed acceleration for In-Memory database analytic workloads." International Journal of Parallel, Emergent and Distributed Systems, 32(4), 406-427. https://doi.org/10.1080/17445760.2016.1170832
Download Paper
Dependency-Aware Rollback and Checkpoint-Restart for Distributed Task-Based Runtimes
Published in arXiv preprint, 2017
Presents dependency-aware rollback and checkpoint-restart mechanisms for distributed task-based runtime systems to improve fault tolerance and recovery capabilities.
Recommended citation: Dichev, K., Jordan, H., Tovletoglou, K., Heller, T., Nikolopoulos, D. S., Karakonstantis, G., & Gillan, C. (2017). "Dependency-Aware Rollback and Checkpoint-Restart for Distributed Task-Based Runtimes." arXiv preprint arXiv:1705.10208.
Download Paper
GraphGrind: Addressing Load Imbalance of Graph Partitioning
Published in ACM International Conference on Supercomputing (ICS), 2017
GraphGrind proposes NUMA-aware programming and runtime strategies to address partitioning-induced load imbalance in graph analytics workloads, improving performance over state-of-the-art systems.
Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2017). "GraphGrind: Addressing Load Imbalance of Graph Partitioning." ICS '17, Article 16. https://doi.org/10.1145/3079079.3079097
Download Paper
Access-aware DRAM failure-rate estimation under relaxed refresh operations
Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2017
Presents access-aware DRAM failure-rate estimation techniques under relaxed refresh operations using memory tracing, fault injection, and binary instrumentation to optimize memory reliability and energy consumption.
Recommended citation: Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2017). "Access-aware DRAM failure-rate estimation under relaxed refresh operations." In 2017 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 292-299. https://doi.org/10.1109/SAMOS.2017.8344643
Download Paper
Relaxing DRAM Refresh Rate through Access Pattern Scheduling: A Case Study on Stencil-Based Algorithms
Published in IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), 2017
Explores relaxing DRAM refresh rates by leveraging access patterns in stencil codes, reducing energy with negligible performance loss.
Recommended citation: Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2017). "Relaxing DRAM Refresh Rate through Access Pattern Scheduling." IOLTS 2017, 45–50. https://doi.org/10.1109/IOLTS.2017.8046197
Download Paper
Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning
Published in International Conference on Parallel Processing (ICPP), 2017
This paper explores how graph partitioning techniques can enhance memory locality and performance in graph analytics workloads.
Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2017). "Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning." ICPP '17, 181–190. https://doi.org/10.1109/ICPP.2017.27
Download Paper
Energy Efficiency in ARMv8-based Microservers by Hardware Margins Identification
Published in ARM Research Summit, 2017
Investigates energy efficiency improvements in ARMv8-based microservers through identification and optimization of hardware margins.
Recommended citation: Karakonstantis, G., Nikolopoulos, D., Gizopoulos, D., Sazeides, Y., Das, S., & Lawthers, P. (2017). "Energy Efficiency in ARMv8-based Microservers by Hardware Margins Identification." In 2017 ARM Research Summit.
Download Paper
Incremental Training of Deep Convolutional Neural Networks
Published in International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms, 2017
Presents incremental training techniques for deep convolutional neural networks to improve training efficiency and adaptation capabilities.
Recommended citation: Nikolopoulos, D., Istrate, R., Malossi, A. C. I., & Bekas, C. (2017). "Incremental Training of Deep Convolutional Neural Networks." In Proceedings of the International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms.
Download Paper
Reliability-Aware System Software Support on ARM Microservers
Published in ARM Research Summit, 2017
Presents reliability-aware system software support mechanisms for ARM microservers to improve system dependability and fault tolerance.
Recommended citation: Karakonstantis, G., Nikolopoulos, D., Antonopoulos, C., Lalis, S., Bellas, N., Gizopoulos, D., & Lawthers, P. (2017). "Reliability-Aware System Software Support on ARM Microservers." In ARM Research Summit.
Download Paper
A Taxonomy of Task-Based Technologies for High Performance Computing
Published in International Conference on Parallel Processing and Applied Mathematics (PPAM), 2017
Presents a comprehensive taxonomy of task-based technologies for high performance computing, categorizing various programming models and runtime systems.
Recommended citation: Nikolopoulos, D., Dichev, K., Thoman, P., Hasanov, K., Iakymchuk, R., Aguilar, X., Gschwandtner, P., Laure, E., Jordan, H., Lemarinier, P., et al. (2017). "A Taxonomy of Task-Based Technologies for High Performance Computing." In 12th International Conference on Parallel Processing and Applied Mathematics.
Download Paper
On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework
Published in International Journal of Parallel Programming, 2017
This paper presents recent developments in the GVirtuS framework, enabling transparent GPU virtualization and remoting across ARM and x86 systems.
Recommended citation: Montella, R., Giunta, G., Laccetti, G., Lapegna, M., Palmieri, C., Ferraro, C., Pelliccia, V., Hong, C.-H., Spence, I., & Nikolopoulos, D. S. (2017). "On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework." *Int. J. Parallel Prog.*, 45(5), 1142–1163. https://doi.org/10.1007/s10766-016-0462-1
Download Paper
Energy-Efficient Transprecision Techniques for Iterative Refinement
Published in International Conference on High Performance Computing, Networking, Storage and Analysis (SC) Posters, 2017
Presents energy-efficient transprecision techniques for iterative refinement algorithms to reduce computational energy while maintaining accuracy.
Recommended citation: Lee, J., Vandierendonck, H., & Nikolopoulos, D. (2017). "Energy-Efficient Transprecision Techniques for Iterative Refinement." In Supercomputing'17 (SC17): International Conference on High Performance Computing, Networking, Storage and Analysis.
Download Paper
REFINE: Realistic Fault Injection via Compiler-Based Instrumentation for Accuracy, Portability and Speed
Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2017
REFINE is a compiler-based framework for fault injection that balances the fidelity of binary-level injection with the speed and portability of source-level techniques.
Recommended citation: Georgakoudis, G., Laguna, I., Nikolopoulos, D. S., & Schulz, M. (2017). "REFINE: Realistic Fault Injection via Compiler-Based Instrumentation for Accuracy, Portability and Speed." *SC 2017*, Article 29. https://doi.org/10.1145/3126908.3126972
Download Paper
A Real Time Metabolomic Profiling Approach to Detecting Fish Fraud Using Rapid Evaporative Ionisation Mass Spectrometry
Published in Metabolomics, 2017
This article explores the use of REIMS for real-time fish fraud detection, avoiding the lengthy preparation steps of genomic profiling while maintaining result accuracy.
Recommended citation: Black, C., Chevallier, O. P., Haughey, S. A., Balog, J., Stead, S., Pringle, S. D., Riina, M. V., Martucci, F., Acutis, P. L., Morris, M., Nikolopoulos, D. S., Takats, Z., & Elliott, C. T. (2017). "A Real Time Metabolomic Profiling Approach to Detecting Fish Fraud Using Rapid Evaporative Ionisation Mass Spectrometry." Metabolomics, 13(12), 153. https://doi.org/10.1007/s11306-017-1291-y
Download Paper
Error-Resilient Server Ecosystems for Edge and Cloud Datacenters
Published in IEEE Computer, 2017
Presents error-resilient server ecosystems for edge and cloud datacenters, addressing performance and power variability through hardware exposure interfaces and energy-efficient microserver architectures for IoT applications.
Recommended citation: Karakonstantis, G., Nikolopoulos, D. S., Gizopoulos, D., Trancoso, P., Sazeides, Y., Antonopoulos, C. D., Venugopal, S., & Das, S. (2017). "Error-Resilient Server Ecosystems for Edge and Cloud Datacenters." Computer, 50(12), 78-81. https://doi.org/10.1109/MC.2017.4451208
Download Paper
FairGV: Fair and Fast GPU Virtualization
Published in IEEE Transactions on Parallel and Distributed Systems, 2017
FairGV proposes a GPU virtualization mechanism that combines fair queuing and trap-less architecture to improve scheduling efficiency across virtual machines.
Recommended citation: Hong, C.-H., Spence, I., & Nikolopoulos, D. S. (2017). "FairGV: Fair and Fast GPU Virtualization." *IEEE TPDS*, 28(12), 3472–3485. https://doi.org/10.1109/TPDS.2017.2717908
Download Paper
SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads
Published in ACM Transactions on Architecture and Code Optimization, 2017
Presents SCALO, a runtime framework for orchestrating thread parallelism across co-executing applications on multicore machines, improving system throughput by up to 40%.
Recommended citation: Georgakoudis, G., Vandierendonck, H., Thoman, P., et al. (2017). "SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads." ACM Transactions on Architecture and Code Optimization (TACO), 14(4), Article 54. https://doi.org/10.1145/3158643
Download Paper
DARE: Data-Access Aware Refresh via spatial-temporal application resilience on commodity servers
Published in The International Journal of High Performance Computing Applications, 2018
Presents DARE, a Data-Access Aware Refresh system that leverages spatial-temporal application resilience to aggressively relax DRAM refresh rates on commodity servers, completely disabling hardware refresh with only 2-18% quality loss.
Recommended citation: Chalios, C., Georgakoudis, G., Tovletoglou, K., Karakonstantis, G., Vandierendonck, H., & Nikolopoulos, D. S. (2018). "DARE: Data-Access Aware Refresh via spatial-temporal application resilience on commodity servers." The International Journal of High Performance Computing Applications, 32(1), 74-88. https://doi.org/10.1177/1094342017718612
Download Paper
Using Docker Swarm with a User-Centric Decision-Making Framework for Cloud Application Migration
Published in Cloud Computing and Service Science, 2018
Proposes MyMinder, a Multi-objective dYnamic MIgratioN Decision makER framework that assists cloud users in inter-cloud migration decisions and provides automated migration capabilities using Docker Swarm technology to overcome vendor lock-in challenges.
Recommended citation: Barlaskar, E., Kilpatrick, P., Spence, I., & Nikolopoulos, D. S. (2018). "Using Docker Swarm with a User-Centric Decision-Making Framework for Cloud Application Migration." In Cloud Computing and Service Science (pp. 81-101). Springer. https://doi.org/10.1007/978-3-319-94959-8_5
Download Paper
Proceedings of the MiniSymposium on Edge Computing
Published in Parallel Computing is Everywhere, 2018
Organizes and introduces a minisymposium on edge computing, covering current trends and challenges in edge computing technologies.
Recommended citation: Antonopoulos, C. D., & Nikolopoulos, D. S. (2018). "MiniSymposium on Edge Computing." In Parallel Computing is Everywhere, 783. IOS Press.
Download Paper
Power Modelling for Heterogeneous Cloud-Edge Data Centers
Published in Parallel Computing is Everywhere (Advances in Parallel Computing, Vol. 32), 2018
Develops a method for deploying power models on emerging processors for cloud-edge data centers, proposing automated hardware counter selection and a two-stage power model that works across ARM and Intel architectures.
Recommended citation: Chen, K., Varghese, B., Kilpatrick, P., & Nikolopoulos, D. S. (2018). "Power Modelling for Heterogeneous Cloud-Edge Data Centers." In Parallel Computing is Everywhere (pp. 804-813). Advances in Parallel Computing, Vol. 32. https://doi.org/10.3233/978-1-61499-843-3-804
Download Paper
New Approaches to Memory Reliability Management for Big Data Workloads
Published in SIAM Conference on Parallel Processing for Scientific Computing, 2018
Presents new approaches to memory reliability management specifically designed for big data workloads to improve system resilience and data integrity.
Recommended citation: Nikolopoulos, D. (2018). "New Approaches to Memory Reliability Management for Big Data Workloads." In SIAM Conference on Parallel Processing for Scientific Computing.
Download Paper
An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits
Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018
Presents a resilient and energy-efficient server architecture designed to exceed conventional scalability limits by combining hardware and software innovations.
Recommended citation: Karakonstantis, G., Tovletoglou, K., Mukhanov, L., et al. (2018). "An Energy-Efficient and Error-Resilient Server Ecosystem Exceeding Conservative Scaling Limits." DATE 2018, 1099–1104. https://doi.org/10.23919/DATE.2018.8342175
Download Paper
Incremental Training of Deep Convolutional Neural Networks
Published in arXiv preprint, 2018
This paper introduces techniques for incremental training of deep convolutional neural networks to improve training efficiency over traditional methods.
Recommended citation: Istrate, R., Malossi, A. C. I., Bekas, C., & Nikolopoulos, D. (2018). "Incremental Training of Deep Convolutional Neural Networks." arXiv:1803.10232. https://arxiv.org/abs/1803.10232
Download Paper
A Taxonomy of Task-Based Technologies for High-Performance Computing
Published in Parallel Processing and Applied Mathematics (PPAM), 2018
Presents a comprehensive taxonomy and classification of task-based technologies for high-performance computing, covering diverse programming models and runtime features across heterogeneous and many-core systems.
Recommended citation: Thoman, P., Hasanov, K., Dichev, K., Iakymchuk, R., Aguilar, X., Gschwandtner, P., Lemarinier, P., Markidis, S., Jordan, H., Laure, E., Katrinis, K., Nikolopoulos, D. S., & Fahringer, T. (2018). "A Taxonomy of Task-Based Technologies for High-Performance Computing." In Parallel Processing and Applied Mathematics (pp. 264-274). Springer. https://doi.org/10.1007/978-3-319-78054-2_25
Download Paper
The Transprecision Computing Paradigm: Concept, Design, and Applications
Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018
This paper introduces the transprecision computing paradigm, focusing on its architectural design, energy-efficient implementations, and applications in low-power computing environments.
Recommended citation: Malossi, A. C. I., Schaffner, M., Molnos, A., Gammaitoni, L., Tagliavini, G., Emerson, A., Tomás, A., Nikolopoulos, D. S., Flamand, E., & Wehn, N. (2018). "The Transprecision Computing Paradigm: Concept, Design, and Applications." *DATE 2018*, 1105–1110. https://doi.org/10.23919/DATE.2018.8342176
Download Paper
A Taxonomy of Task-Based Parallel Programming Technologies for High-Performance Computing
Published in The Journal of Supercomputing, 2018
This paper introduces a taxonomy of task-based programming models and runtime systems for high-performance computing, providing a comprehensive classification of contemporary technologies in the context of many-core and heterogeneous systems.
Recommended citation: Thoman, P., Dichev, K., Heller, T., Iakymchuk, R., Aguilar, X., Hasanov, K., Gschwandtner, P., Lemarinier, P., Markidis, S., Jordan, H., Fahringer, T., Katrinis, K., Laure, E., & Nikolopoulos, D. S. (2018). "A Taxonomy of Task-Based Parallel Programming Technologies for High-Performance Computing." The Journal of Supercomputing, 74(4), 1422–1434. https://doi.org/10.1007/s11227-018-2238-4
Download Paper
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Published in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2018
Welcome message from the General Chairs of the 2018 IEEE International Symposium on Performance Analysis of Systems and Software.
Recommended citation: General Chairs. (2018). "Welcome from the General Chairs." In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 9. https://doi.org/10.1109/ISPASS.2018.00005
Download Paper
Intra-Node Memory Safe GPU Co-Scheduling
Published in IEEE Transactions on Parallel and Distributed Systems, 2018
Proposes SchedGPU, a co-scheduling mechanism for GPU workloads that ensures memory safety and improves utilization on shared memory systems.
Recommended citation: Reaño, C., Silla, F., Nikolopoulos, D. S., & Varghese, B. (2018). "Intra-Node Memory Safe GPU Co-Scheduling." *IEEE TPDS*, 29(5), 1089–1102. https://doi.org/10.1109/TPDS.2017.2784428
Download Paper
Energy-efficient localised rollback after failures via data flow analysis
Published in arXiv preprint, 2018
Presents energy-efficient localised rollback mechanisms after failures using data flow analysis to minimize recovery overhead.
Recommended citation: Dichev, K., Cameron, K., & Nikolopoulos, D. (2018). "Energy-efficient localised rollback after failures via data flow analysis." arXiv preprint arXiv:1806.01611.
Download Paper
GPU Virtualization and Scheduling Methods: A Comprehensive Survey
Published in ACM Computing Surveys, 2017
This survey presents a comprehensive review of GPU virtualization techniques and scheduling strategies, covering methods across libraries, drivers, and hardware, with implications for heterogeneous cloud computing.
Recommended citation: Hong, C.-H., Spence, I., & Nikolopoulos, D. S. (2017). "GPU Virtualization and Scheduling Methods: A Comprehensive Survey." *ACM Computing Surveys*, 50(3), Article 35. https://doi.org/10.1145/3068281
Download Paper
VEBO: A Vertex- and Edge-Balanced Ordering Heuristic to Load Balance Parallel Graph Processing
Published in arXiv preprint, 2018
Presents VEBO, a vertex- and edge-balanced ordering heuristic designed to achieve load balancing in parallel graph processing applications through improved data distribution strategies.
Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2018). "VEBO: A Vertex- and Edge-Balanced Ordering Heuristic to Load Balance Parallel Graph Processing." arXiv preprint arXiv:1806.06576.
Download Paper
Characterization of HPC workloads on an ARMv8 based server under relaxed DRAM refresh and thermal stress
Published in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2018
Develops an experimental framework on a 64-bit ARM server to characterize DRAM reliability under relaxed refresh periods and thermal stress, evaluating HPC workloads and demonstrating a 35× refresh period relaxation with 11.2% power savings.
Recommended citation: Mukhanov, L., Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "Characterization of HPC workloads on an ARMv8 based server under relaxed DRAM refresh and thermal stress." In Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 230-235. https://doi.org/10.1145/3229631.3236091
Download Paper
DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server
Published in IEEE International Symposium on On-Line Testing And Robust System Design (IOLTS), 2018
Characterizes DRAM behavior under relaxed refresh periods in commodity servers, analyzing system-level effects including temperature and reliability impacts for improved memory management.
Recommended citation: Mukhanov, L., Tovletoglou, K., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server." In 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), 236-239. https://doi.org/10.1109/IOLTS.2018.8474184
Download Paper
Minimization of Timing Failures in Pipelined Designs via Path Shaping and Operand Truncation
Published in IEEE International Symposium on On-Line Testing And Robust System Design (IOLTS), 2018
Presents techniques for minimizing timing failures in pipelined designs through path shaping and operand truncation methods to improve reliability and performance.
Recommended citation: Tsiokanos, I., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2018). "Minimization of Timing Failures in Pipelined Designs via Path Shaping and Operand Truncation." In 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), 171-176. https://doi.org/10.1109/IOLTS.2018.8474084
Download Paper
NanoStreams: A Microserver Architecture for Real-Time Analytics on Fast Data Streams
Published in IEEE Transactions on Multi-Scale Computing Systems, 2018
Presents NanoStreams, a microserver architecture that leverages FPGAs and reconfigurable computing for real-time analytics on fast data streams.
Recommended citation: Minhas, U. I., Russell, M., Kaloutsakis, S., Barber, P., Woods, R., Georgakoudis, G., Gillan, C., Nikolopoulos, D. S., & Bilas, A. (2018). "NanoStreams: A Microserver Architecture for Real-Time Analytics on Fast Data Streams." IEEE Transactions on Multi-Scale Computing Systems, 4(3), 396-409. https://doi.org/10.1109/TMSCS.2017.2764087
Download Paper
The VINEYARD integrated framework for hardware accelerators in the cloud
Published in 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2018
Presents the VINEYARD framework for seamless deployment and utilization of hardware accelerators in the cloud, achieving up to 25× speedup without increasing programming complexity for machine learning and neurocomputing applications.
Recommended citation: Kachris, C., Soudris, D., Mavridis, S., Pavlidakis, M., Symeonidou, C., Kozanitis, C., Bilas, A., Fenacci, D., Bogaraju, S. V., Vandierendonck, H., & Nikolopoulos, D. S. (2018). "The VINEYARD integrated framework for hardware accelerators in the cloud." In Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 236-243. https://doi.org/10.1145/3229631.3236093
Download Paper
Variation-Aware Pipelined Cores through Path Shaping and Dynamic Cycle Adjustment: Case Study on a Floating-Point Unit
Published in International Symposium on Low Power Electronics and Design (ISLPED), 2018
Proposes a framework for minimizing variation-induced timing failures in pipelined designs through path shaping and dynamic cycle adjustment, demonstrated on an IEEE-754 double precision floating-point unit.
Recommended citation: Tsiokanos, I., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2018). Variation-Aware Pipelined Cores through Path Shaping and Dynamic Cycle Adjustment: Case Study on a Floating-Point Unit. In *Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18)*, Article 52. https://doi.org/10.1145/3218603.3218617
Download Paper
Expediting assessments of database performance for streams of respiratory parameters
Published in Computers in Biology and Medicine, 2018
Proposes new methodology and metrics for comparing database performance when handling streams of patient respiratory data in intensive care settings, using non-parametric bootstrapping to optimize testing time.
Recommended citation: Gillan, C. J., Novakovic, A., Marshall, A. H., Shyamsundar, M., & Nikolopoulos, D. S. (2018). Expediting assessments of database performance for streams of respiratory parameters. *Computers in Biology and Medicine*, 100, 186-195. https://doi.org/10.1016/j.compbiomed.2018.05.028
Download Paper
Supporting Cloud IaaS Users in Detecting Performance-Based Violation for Streaming Applications
Published in IEEE International Conference on Autonomic Computing (ICAC), 2018
Helps cloud IaaS users detect performance-based violations in streaming applications, using cloud monitoring and QoS violation detection to check that service quality and throughput requirements are met.
Recommended citation: Barlaskar, E., Dichev, K., Kilpatrick, P., Spence, I., & Nikolopoulos, D. S. (2018). "Supporting Cloud IaaS Users in Detecting Performance-Based Violation for Streaming Applications." In 2018 IEEE International Conference on Autonomic Computing (ICAC), 163-168. https://doi.org/10.1109/ICAC.2018.00027
Download Paper
Energy-efficient localised rollback via data flow analysis and frequency scaling
Published in European MPI Users' Group Meeting (EuroMPI), 2018
Introduces Data Flow Rollback (DFR), an approach that localizes recovery after failures in HPC systems by analyzing data flow patterns, reducing energy consumption via frequency scaling of idle nodes.
Recommended citation: Dichev, K., Cameron, K., & Nikolopoulos, D. S. (2018). Energy-efficient localised rollback via data flow analysis and frequency scaling. In *Proceedings of the 25th European MPI Users' Group Meeting (EuroMPI '18)*, Article 11. https://doi.org/10.1145/3236367.3236379
Download Paper
Proceedings: 2018 IEEE International Conference on Cluster Computing (CLUSTER)
Published in IEEE International Conference on Cluster Computing (CLUSTER), 2018
Editorial contribution to the proceedings of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), held in October 2018.
Recommended citation: Nikolopoulos, D. S., & De Supinski, B. R. (Eds.). *Proceedings: 2018 IEEE International Conference on Cluster Computing (CLUSTER)*. IEEE, October 2018.
Download Paper
RADS: Real-time Anomaly Detection System for Cloud Data Centres
Published in arXiv preprint, 2018
RADS presents a real-time system for detecting anomalies in cloud data centers, emphasizing responsiveness and low-latency decision making for scalable cloud environments.
Recommended citation: Barbhuiya, S., Papazachos, Z., Kilpatrick, P., & Nikolopoulos, D. S. (2018). "RADS: Real-time Anomaly Detection System for Cloud Data Centres." *arXiv preprint*, arXiv:1811.04481. https://arxiv.org/abs/1811.04481
Download Paper
Code and Data Transformations to Address Garbage Collector Performance in Big Data Processing
Published in IEEE International Conference on High Performance Computing (HiPC), 2018
Presents code and data transformation techniques to address garbage collector performance bottlenecks in big data processing applications, focusing on memory management optimizations for Spark and Java-based systems.
Recommended citation: Fenacci, D., Vandierendonck, H., & Nikolopoulos, D. (2018). "Code and Data Transformations to Address Garbage Collector Performance in Big Data Processing." In 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 284-293. https://doi.org/10.1109/HiPC.2018.00040
Download Paper
Energy-Efficient Iterative Refinement Using Dynamic Precision
Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2018
Proposes a dynamic precision refinement technique for iterative algorithms that adapts computational accuracy at runtime to reduce energy consumption without sacrificing convergence.
Recommended citation: Lee, J., Vandierendonck, H., Arif, M., Peterson, G. D., & Nikolopoulos, D. S. (2018). Energy-Efficient Iterative Refinement Using Dynamic Precision. *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, 8(4), 722–735. https://doi.org/10.1109/JETCAS.2018.2850665
Download Paper
Userspace Hypervisor Data Characterization in Virtualized Environment
Published in IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2018
Characterizes userspace hypervisor data patterns in virtualized environments using error injection and reliability analysis to improve data structure management and virtual machine monitor performance.
Recommended citation: Wang, B., Vandierendonck, H., Karakonstantis, G., & Nikolopoulos, D. S. (2018). "Userspace Hypervisor Data Characterization in Virtualized Environment." In 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), 638-645. https://doi.org/10.1109/PADSW.2018.8644612
Download Paper
Bio-Inspired Growth: Introducing Emergence into Computational Design
Published in Advances in Manufacturing Technology XXXIII (Advances in Transdisciplinary Engineering, Vol. 9), 2019
Introduces emergence into computational design through bio-inspired growth principles, presenting a four-tiered structure that enables computers to create unexpected and innovative solutions using predictive non-determinism and stochastic rules.
Recommended citation: Kyle, S., Nolan, D., Price, M., Zhang, W., Robinson, T., Nikolopoulos, D. S., & Barbhuiya, S. (2019). "Bio-Inspired Growth: Introducing Emergence into Computational Design." In Advances in Manufacturing Technology XXXIII (pp. 379-385). Advances in Transdisciplinary Engineering, Vol. 9. https://doi.org/10.3233/ATDE190067
Download Paper
Design Gene Representations for Emergent Innovative Design
Published in Advances in Manufacturing Technology XXXIII (Advances in Transdisciplinary Engineering, Vol. 9), 2019
Presents an alternative bottom-up engineering design system using “design genes” that trigger and control design growth within CAD systems, allowing unpredicted-but-valuable designs to emerge with minimal constraints.
Recommended citation: Zhang, W., Price, M., Robinson, T., Nolan, D., Nikolopoulos, D., Barbhuiya, S., & Kyle, S. (2019). "Design Gene Representations for Emergent Innovative Design." In Advances in Manufacturing Technology XXXIII (pp. 386-392). Advances in Transdisciplinary Engineering, Vol. 9. https://doi.org/10.3233/ATDE190068
Download Paper
Shimmer: Implementing a Heterogeneous-Reliability DRAM Framework on a Commodity Server
Published in IEEE Computer Architecture Letters, 2019
Presents Shimmer, a heterogeneous-reliability DRAM framework implemented on commodity servers that manages critical data with different reliability levels for improved power efficiency and energy savings.
Recommended citation: Tovletoglou, K., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2019). "Shimmer: Implementing a Heterogeneous-Reliability DRAM Framework on a Commodity Server." IEEE Computer Architecture Letters, 18(1), 26-29. https://doi.org/10.1109/LCA.2019.2893189
Download Paper
SmartMaaS: A Framework for Smart Manufacturing-as-a-Service
Published in Advances in Manufacturing Technology XXXIII (Advances in Transdisciplinary Engineering, Vol. 9), 2019
Introduces SmartMaaS, a framework for Smart Manufacturing-as-a-Service that enables manufacturers to offer production capabilities as on-demand services with intelligent negotiation and optimization capabilities.
Recommended citation: Barbhuiya, S., Nikolopoulos, D. S., Price, M., Robinson, T., Nolan, D., Zhang, W., & Kyle, S. (2019). "SmartMaaS: A Framework for Smart Manufacturing-as-a-Service." In Advances in Manufacturing Technology XXXIII (pp. 16-21). Advances in Transdisciplinary Engineering, Vol. 9. https://doi.org/10.3233/ATDE190005
Download Paper
VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing
Published in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2019
Introduces VEBO, a vertex- and edge-balanced ordering heuristic that improves load balancing for parallel graph processing by balancing both edges and unique destination vertices.
Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2019). VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing. In *Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP '19)*, 391-392. https://doi.org/10.1145/3293883.3295703
Download Paper
Significance-Driven Data Truncation for Preventing Timing Failures
Published in IEEE Transactions on Device and Materials Reliability, 2019
Presents a significance-driven approach to data truncation that prevents timing failures in computing systems while maintaining output quality.
Recommended citation: Tsiokanos, I., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2019). Significance-Driven Data Truncation for Preventing Timing Failures. *IEEE Transactions on Device and Materials Reliability*, 19(1), 25-36. https://doi.org/10.1109/TDMR.2019.2898949
Download Paper
SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019
Presents SAFIRE, a fault injection framework that targets soft errors in multithreaded applications with high scalability and accuracy.
Recommended citation: Georgakoudis, G., Laguna, I., Vandierendonck, H., et al. (2019). "SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications." In IPDPS 2019, 890–899. https://doi.org/10.1109/IPDPS.2019.00097
Download Paper
TAPAS: Train-Less Accuracy Predictor for Architecture Search
Published in AAAI Conference on Artificial Intelligence, 2019
TAPAS introduces a novel train-less predictor for neural architecture accuracy estimation across datasets, enabling rapid architecture search with minimal computational cost.
Recommended citation: Istrate, R., Scheidegger, F., Mariani, G., Nikolopoulos, D., Bekas, C., & Malossi, A. C. I. (2019). "TAPAS: Train-Less Accuracy Predictor for Architecture Search." *AAAI 2019*, 33(01), 3927–3934. https://doi.org/10.1609/aaai.v33i01.33013927
Download Paper
Implementing efficient message logging protocols as MPI application extensions
Published in European MPI Users' Group Meeting (EuroMPI), 2019
Implements efficient message logging protocols as MPI application extensions to enable local rollback capabilities without requiring a complete MPI library redesign, demonstrated on CG and LULESH kernels.
Recommended citation: Dichev, K., & Nikolopoulos, D. S. (2019). "Implementing efficient message logging protocols as MPI application extensions." In Proceedings of the 26th European MPI Users' Group Meeting, Article 8. https://doi.org/10.1145/3343211.3343219
Download Paper
Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems
Published in IEEE Transactions on Computers, 2019
Presents techniques for fast and energy-efficient OLAP data management on hybrid main memory systems combining volatile and non-volatile memory technologies.
Recommended citation: Hassan, A., Nikolopoulos, D. S., & Vandierendonck, H. (2019). "Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems." IEEE Transactions on Computers, 68(11), 1597-1611. https://doi.org/10.1109/TC.2019.2919287
Download Paper
Hyperqueues: Design and Implementation of Deterministic Concurrent Queues
Published in ACM Transactions on Parallel Computing (TOPC), 2019
Presents hyperqueues, a programming abstraction that extends Cilk++ hyperobjects to provide deterministic and scale-free parallel programs with concurrent queue operations.
Recommended citation: Vandierendonck, H., & Nikolopoulos, D. S. (2019). "Hyperqueues: Design and Implementation of Deterministic Concurrent Queues." ACM Trans. Parallel Comput., 6(4), Article 23. https://doi.org/10.1145/3365660
Download Paper
Workload-Aware DRAM Error Prediction using Machine Learning
Published in IEEE International Symposium on Workload Characterization (IISWC), 2019
This study presents a machine learning framework for predicting DRAM errors in HPC systems by analyzing workload characteristics.
Recommended citation: Mukhanov, L., Tovletoglou, K., Vandierendonck, H., Nikolopoulos, D. S., & Karakonstantis, G. (2019). "Workload-Aware DRAM Error Prediction using Machine Learning." *IISWC 2019*, 106–118. https://doi.org/10.1109/IISWC47752.2019.9041963
Download Paper
Feasibility of Fog Computing
Published in Handbook of Integration of Cloud Computing, Cyber Physical Systems and Internet of Things, 2020
This book chapter discusses the feasibility of fog computing as a decentralized alternative to cloud-centric models, showing improved latency and reduced cloud traffic in an online gaming use case and advocating for broader integration of edge resources.
Recommended citation: Varghese, B., Wang, N., Nikolopoulos, D. S., & Buyya, R. (2020). "Feasibility of Fog Computing." In R. Ranjan, K. Mitra, P. Prakash Jayaraman, L. Wang, & A. Y. Zomaya (Eds.), Handbook of Integration of Cloud Computing, Cyber Physical Systems and Internet of Things (pp. 127–146). Springer. https://doi.org/10.1007/978-3-030-43795-4_5
Download Paper
DroidLight: Lightweight Anomaly-Based Intrusion Detection System for Smartphone Devices
Published in International Conference on Distributed Computing and Networking, 2020
DroidLight is a lightweight one-class classifier-based IDS designed to detect zero-day malware on smartphones with low overhead and high accuracy.
Recommended citation: Barbhuiya, S., Kilpatrick, P., & Nikolopoulos, D. S. (2020). "DroidLight: Lightweight Anomaly-Based Intrusion Detection System for Smartphone Devices." ICDCN '20, Article 31. https://doi.org/10.1145/3369740.3369796
Download Paper
Fast load balance parallel graph analytics with an automatic graph data structure selection algorithm
Published in Future Generation Computer Systems, 2020
Proposes GraphGrind, a shared-memory graph analytics framework with automatic data structure selection and reordering algorithms, outperforming Ligra by up to 10.4× and Polymer by up to 8.3×.
Recommended citation: Sun, J., Vandierendonck, H., & Nikolopoulos, D. S. (2020). "Fast load balance parallel graph analytics with an automatic graph data structure selection algorithm." Future Generation Computer Systems, 112, 612-623. https://doi.org/10.1016/j.future.2020.06.005
Download Paper
DYVERSE: DYnamic VERtical Scaling in Multi-Tenant Edge Environments
Published in Future Generation Computer Systems, 2020
DYVERSE introduces a lightweight vertical scaling mechanism to manage multi-tenancy in Edge computing through static and dynamic priority policies, reducing SLO violations and latency.
Recommended citation: Wang, N., Matthaiou, M., Nikolopoulos, D. S., & Varghese, B. (2020). "DYVERSE: DYnamic VERtical Scaling in Multi-Tenant Edge Environments." *Future Generation Computer Systems*, 108, 598–612. https://doi.org/10.1016/j.future.2020.02.043
Download Paper
HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers
Published in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020
HaRMony is a system that combines heterogeneous-reliability memory with QoS-aware energy management policies for virtualized servers, reducing DRAM energy and performance overhead.
Recommended citation: Tovletoglou, K., Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2020). "HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers." ASPLOS '20, 575–590. https://doi.org/10.1145/3373376.3378489
Download Paper
DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search
Published in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020
Presents DEFCON, a method for generating and detecting failure-prone instruction sequences using stochastic search techniques. Best Paper Award.
Recommended citation: Tsiokanos, I., Mukhanov, L., Georgakoudis, G., Nikolopoulos, D. S., & Karakonstantis, G. (2020). DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search. In *2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 1121-1126. https://doi.org/10.23919/DATE48585.2020.9116363
Download Paper
Cross Architectural Power Modelling
Published in IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2020
Presents cross-architectural power modeling techniques using hardware counters and noise filtering to enable accurate power prediction across different processor architectures.
Recommended citation: Chen, K., Kilpatrick, P., Nikolopoulos, D. S., & Varghese, B. (2020). "Cross Architectural Power Modelling." In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 390-399. https://doi.org/10.1109/CCGrid49817.2020.00-54
Download Paper
Fast Analysis and Prediction in Large Scale Virtual Machines Resource Utilisation
Published in 10th International Conference on Cloud Computing and Services Science (CLOSER), 2020
Presents fast analysis and prediction techniques for resource utilization in large-scale virtual machine environments to optimize cloud resource management and capacity planning.
Recommended citation: Abubakar, A., Barbhuiya, S., Kilpatrick, P., Vien, N., & Nikolopoulos, D. (2020). "Fast Analysis and Prediction in Large Scale Virtual Machines Resource Utilisation." In Proceedings of the 10th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, 115-126. SciTePress. https://doi.org/10.5220/0009408701150126
Download Paper
RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices
Published in USENIX Conference on Operational Machine Learning (OpML), 2020
Presents RIANN, a real-time incremental learning framework using approximate nearest neighbor algorithms optimized for mobile device constraints.
Recommended citation: Liu, J., Xie, Z., Nikolopoulos, D., & Li, D. (2020). "RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices." In 2020 USENIX Conference on Operational Machine Learning (OpML 20). USENIX Association.
Download Paper
AIR: Iterative refinement acceleration using arbitrary dynamic precision
Published in Parallel Computing, 2020
Introduces AIR, an algorithm that dynamically adjusts arithmetic precision in iterative refinement to improve performance while maintaining backward stability.
Recommended citation: Lee, J., Peterson, G. D., Nikolopoulos, D. S., & Vandierendonck, H. (2020). AIR: Iterative refinement acceleration using arbitrary dynamic precision. *Parallel Computing*, 97, 102663. https://doi.org/10.1016/j.parco.2020.102663
Download Paper
DStress: Automatic Synthesis of DRAM Reliability Stress Viruses using Genetic Algorithms
Published in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020
Presents DStress, a technique to synthesize DRAM stress viruses using genetic algorithms to expose reliability faults. Best Paper Award Nominee.
Recommended citation: Mukhanov, L., Nikolopoulos, D. S., & Karakonstantis, G. (2020). "DStress: Automatic Synthesis of DRAM Reliability Stress Viruses using Genetic Algorithms." MICRO 2020, 298–312. https://doi.org/10.1109/MICRO50266.2020.00035
Download Paper
ENORM: A Framework For Edge NOde Resource Management
Published in IEEE Transactions on Services Computing, 2020
This paper presents ENORM, a framework for managing resources on edge nodes to support fog computing environments, focusing on provisioning, scaling, and QoS guarantees.
Recommended citation: Wang, N., Varghese, B., Matthaiou, M., & Nikolopoulos, D. S. (2020). "ENORM: A Framework For Edge NOde Resource Management." IEEE Transactions on Services Computing, 13(6), 1086–1099. https://doi.org/10.1109/TSC.2017.2753775
Download Paper
Linear Regression Based DDoS Attack Detection
Published in International Conference on Machine Learning and Computing (ICMLC), 2021
Proposes a linear regression based DDoS attack detection technique that reduces false positives by analyzing the correlation between average and standard deviation of network throughput in time series data.
Recommended citation: Barbhuiya, S., Kilpatrick, P., & Nikolopoulos, D. S. (2021). "Linear Regression Based DDoS Attack Detection." In Proceedings of the 2021 13th International Conference on Machine Learning and Computing, 568-574. https://doi.org/10.1145/3457682.3457769
Download Paper
Proceedings of the Workshop on Deployment and Use of Accelerators (DUAC)
Published in International Conference on Parallel Processing (ICPP) Workshops, 2021
Organizes the DUAC 2021 workshop on deployment and use of accelerators in high performance computing environments.
Recommended citation: Reaño, C., & Nikolopoulos, D. S. (2021). "Deployment and Use of Accelerators (DUAC 2021)." In 50th International Conference on Parallel Processing. Association for Computing Machinery.
Download Paper
Revealing DRAM Operating GuardBands Through Workload-Aware Error Predictive Modeling
Published in IEEE Transactions on Computers, 2021
Reveals DRAM operating guardbands through workload-aware error predictive modeling to optimize memory reliability and energy consumption trade-offs in low-power computing systems.
Recommended citation: Mukhanov, L., Tovletoglou, K., Vandierendonck, H., Nikolopoulos, D. S., & Karakonstantis, G. (2021). "Revealing DRAM Operating GuardBands Through Workload-Aware Error Predictive Modeling." IEEE Transactions on Computers, 70(11), 1976-1987. https://doi.org/10.1109/TC.2020.3033627
Download Paper
Efficient, Dynamic Multi-Task Execution on FPGA-Based Computing Systems
Published in IEEE Transactions on Parallel and Distributed Systems, 2022
Presents a runtime and scheduling framework for dynamic task virtualization and mapping on FPGA systems with high throughput and efficiency.
Recommended citation: Minhas, U. I., Woods, R., Nikolopoulos, D. S., & Karakonstantis, G. (2022). "Efficient, Dynamic Multi-Task Execution on FPGA-Based Computing Systems." IEEE TPDS, 33(3), 710–722. https://doi.org/10.1109/TPDS.2021.3101153
Download Paper
Mixed-Precision Kernel Recursive Least Squares
Published in IEEE Transactions on Neural Networks and Learning Systems, 2022
Presents a mixed-precision approach to kernel recursive least squares for budget machine learning, enabling efficient online learning with improved throughput and memory management.
Recommended citation: Lee, J., Nikolopoulos, D. S., & Vandierendonck, H. (2022). "Mixed-Precision Kernel Recursive Least Squares." IEEE Transactions on Neural Networks and Learning Systems, 33(3), 1284-1298. https://doi.org/10.1109/TNNLS.2020.3041677
Download Paper
gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers
Published in Future Generation Computer Systems, 2022
Presents gShare, a centralized GPU memory management framework that enables efficient GPU memory sharing for containers with near-native performance and secure isolation.
Recommended citation: Lee, M., Ahn, H., Hong, C.-H., & Nikolopoulos, D. S. (2022). "gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers." Future Generation Computer Systems, 130, 181-192. https://doi.org/10.1016/j.future.2021.12.016
Download Paper
On Realizing Efficient Deep Learning Using Serverless Computing
Published in IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022
Explores approaches for implementing efficient deep learning training and inference using serverless computing platforms, addressing challenges in resource management and data parallelism.
Recommended citation: Assogba, K., Arif, M., Rafique, M. M., & Nikolopoulos, D. S. (2022). "On Realizing Efficient Deep Learning Using Serverless Computing." In *2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)*, 220-229. https://doi.org/10.1109/CCGrid54584.2022.00031
Download Paper
Power Log’n’Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols
Published in IEEE Transactions on Parallel and Distributed Systems, 2022
Presents Power Log’n’Roll, a power-efficient localized rollback mechanism for MPI applications that uses message logging protocols to provide fault tolerance while reducing energy consumption.
Recommended citation: Dichev, K., De Sensi, D., Nikolopoulos, D. S., Cameron, K. W., & Spence, I. (2022). "Power Log'n'Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols." IEEE Transactions on Parallel and Distributed Systems, 33(6), 1276-1288. https://doi.org/10.1109/TPDS.2021.3107745
Download Paper
Proceedings of the International Workshop on Deployment and Use of Accelerators
Published in International Conference on Parallel Processing (ICPP) Workshops, 2022
Message from the organizing committee of the 2nd International Workshop on Deployment and Use of Accelerators.
Recommended citation: Reaño, C., & Nikolopoulos, D. S. (2022). "Message from the 2nd DUAC Organizing Committee." In 51st International Conference on Parallel Processing, ICPP 2022.
Download Paper
Auto-scaling edge cloud for network slicing
Published in Frontiers in High Performance Computing, 2023
Presents a study on resource control for autoscaling virtual radio access networks (RAN slices) using chance-constrained programming to address stochastic bin packing problems in next-generation wireless networks.
Recommended citation: Mazied, E. A., Nikolopoulos, D. S., Hanafy, Y., & Midkiff, S. F. (2023). "Auto-scaling edge cloud for network slicing." Frontiers in High Performance Computing, Volume 1 - 2023. https://doi.org/10.3389/fhpcp.2023.1167162
Download Paper
Punching Holes in the Cloud: Direct Communication Between Serverless Functions
Published in Serverless Computing: Principles and Paradigms, 2023
Presents an ephemeral communication framework for serverless environments that enables direct network connections between functions, achieving 680 Mbps throughput and 4.7× performance improvement over object storage solutions.
Recommended citation: Moyer, D., & Nikolopoulos, D. S. (2023). "Punching Holes in the Cloud: Direct Communication Between Serverless Functions." In Serverless Computing: Principles and Paradigms (pp. 15-41). Springer. https://doi.org/10.1007/978-3-031-26633-1_2
Download Paper
Fine-Grain Slicing of Edge Cloud Servers for Radio Workloads
Published in IEEE Workshop on Hot Topics in System Infrastructure (HotInfra), 2023
Presents fine-grain slicing techniques for edge cloud servers to optimize performance for radio workloads, enabling efficient resource allocation and management in edge computing environments.
Recommended citation: Mazied, E. A., Nikolopoulos, D. S., & Midkiff, S. (2023). "Fine-Grain Slicing of Edge Cloud Servers for Radio Workloads." In IEEE Workshop on Hot Topics in System Infrastructure (HotInfra'23), in conjunction with ACM FCRC 2023. Orlando, FL.
Download Paper
Decentralised Biomedical Signal Classification using Early Exits
Published in IEEE Interregional NEWCAS Conference, 2023
Presents decentralised biomedical signal classification techniques using early exits for ECG arrhythmia detection in distributed wireless sensor networks.
Recommended citation: Li, X., Vandierendonck, H., Nikolopoulos, D. S., Ji, B., Cardiff, B., & John, D. (2023). "Decentralised Biomedical Signal Classification using Early Exits." In 2023 21st IEEE Interregional NEWCAS Conference (NEWCAS), 1-2. https://doi.org/10.1109/NEWCAS57931.2023.10198098
Download Paper
Proceedings of the International Workshop on Deployment and Use of Accelerators
Published in International Conference on Parallel Processing (ICPP) Workshops, 2023
Message from the organizing committee of the 3rd International Workshop on Deployment and Use of Accelerators, highlighting workshop objectives and contributions.
Recommended citation: Reaño, C., & Nikolopoulos, D. S. (2023). "The 3rd International Workshop on Deployment and Use of Accelerators (DUAC 2023): message from the DUAC 2023 Organizing Committee." In 3rd International Workshop on Deployment and Use of Accelerators 2023 (co-located with 52nd International Conference on Parallel Processing), vi. Association for Computing Machinery.
Download Paper
Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications
Published in arXiv preprint, 2023
Explores persistent memory based stateful serverless computing approaches for big data applications to improve performance and reduce cold start overhead in serverless environments.
Recommended citation: Li, Y., Assogba, K., Tripathy, A., Arif, M., Rafique, M. M., Butt, A. R., & Nikolopoulos, D. (2023). "Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications." arXiv preprint arXiv:2309.01662.
Download Paper
Towards Efficient Python Interpreter for Tiered Memory Systems
Published in USENIX Conference on File and Storage Technologies (FAST) - Poster Session, 2024
Presents optimizations for Python interpreter performance on tiered memory systems to improve execution efficiency and memory utilization.
Recommended citation: Li, Y., Yao, S., Mobin, J., Rafique, M. M., Nikolopoulos, D., Sundararajah, K., Li, H., & Butt, A. R. (2024). "Towards Efficient Python Interpreter for Tiered Memory Systems." In Poster and Work-in-Progress in Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST). USENIX.
Download Paper
Parallel Islands: A Parallel Computing Educational Video Game
Published in ACM Technical Symposium on Computer Science Education (SIGCSE), 2024
Presents Parallel Islands, an educational video game designed to teach parallel computing concepts through interactive gameplay and visualization.
Recommended citation: Cameron, M., Ellis, M., & Nikolopoulos, D. (2024). "Parallel Islands: A Parallel Computing Educational Video Game." In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2, 1586-1587.
Download Paper
On Robust Optimal Joint Deployment and Assignment of RAN Intelligent Controllers in O-RANs
Published in IEEE Open Journal of the Communications Society, 2024
Addresses robust optimal joint deployment and assignment of RAN Intelligent Controllers (RICs) in Open Radio Access Networks using chance-constrained stochastic optimization and two-stage stochastic optimization with recourse.
Recommended citation: Abdel-Rahman, M. J., Mazied, E. A., Hassan, F., Teague, K., Mackenzie, A. B., Midkiff, S. F., Cardoso, K. V., & Nikolopoulos, D. S. (2024). "On Robust Optimal Joint Deployment and Assignment of RAN Intelligent Controllers in O-RANs." IEEE Open Journal of the Communications Society, 5, 2358-2376. https://doi.org/10.1109/OJCOMS.2024.3383607
Download Paper
Application-Attuned Memory Management for Containerized HPC Workflows
Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
Presents application-attuned memory management techniques for containerized HPC workflows, optimizing memory allocation and bandwidth utilization in tiered memory systems.
Recommended citation: Arif, M., Maurya, A., Rafique, M. M., Nikolopoulos, D. S., & Butt, A. R. (2024). "Application-Attuned Memory Management for Containerized HPC Workflows." In 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 114-127. https://doi.org/10.1109/IPDPS57955.2024.00019
Download Paper
FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference
Published in IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024
Presents FrameFeedback, a closed-loop control system for dynamic offloading of real-time edge inference tasks, enabling adaptive offloading with feedback control for deep learning applications.
Recommended citation: Jackson, M., Ji, B., & Nikolopoulos, D. S. (2024). "FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference." In 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 584-591. https://doi.org/10.1109/IPDPSW63119.2024.00116
Download Paper
Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory
Published in arXiv preprint, 2024
Presents modeling techniques for page migration in tiered memory systems to optimize fast memory size allocation and improve system performance.
Recommended citation: Chen, S., Huang, J., Yang, S., Liu, J., Li, H., Nikolopoulos, D., Ryu, J., Baek, J., Shin, K., & Li, D. (2024). "Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory." arXiv preprint arXiv:2410.00328.
Download Paper
ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
Published in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024
Presents ParvaGPU, an efficient spatial GPU sharing system designed for large-scale deep neural network inference in cloud computing environments.
Recommended citation: Lee, M., Seong, S., Kang, M., Lee, J., Na, G.-J., Chun, I.-G., Nikolopoulos, D., & Hong, C.-H. (2024). "ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments." In SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. https://doi.org/10.1109/SC41406.2024.00048
Download Paper
Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs
Published in arXiv preprint, 2025
Presents parallel CPU-GPU execution techniques for large language model inference on constrained GPU systems to improve performance and resource utilization in memory-limited environments.
Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D. S. (2025). "Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs." arXiv preprint arXiv:2506.03296.
Download Paper
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models
Published in AAAI Conference on Artificial Intelligence, 2025
This paper proposes High-Resolution Early Dropping (HiRED), a plug-and-play token-dropping method that enhances efficiency in high-resolution vision-language models while maintaining performance.
Recommended citation: Arif, K. H. I., Yoon, J., Nikolopoulos, D. S., Vandierendonck, H., John, D., & Ji, B. (2025). "HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models." Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1773-1781. https://doi.org/10.1609/aaai.v39i2.32171
Download Paper
MARCO: A Multi-Agent System for Optimizing HPC Code Generation Using Large Language Models
Published in arXiv preprint, 2025
This paper presents MARCO, a novel framework that enhances LLM-generated code for high-performance computing through a specialized multi-agent architecture.
Recommended citation: Rahman, A., Cvetkovic, V., Reece, K., Walters, A., Hassan, Y., Tummeti, A., Torres, B., Cooney, D., Ellis, M., & Nikolopoulos, D. S. (2025). "MARCO: A Multi-Agent System for Optimizing HPC Code Generation Using Large Language Models." arXiv preprint arXiv:2505.03906.
Download Paper
talks
A Case for User-Level Page Migration
Published:
Interoperable System Software
Published:
Interoperable System Software
Published:
Program Transformations and Scheduling Algorithms for Managing Shared Caches on SMT Processors
Published:
Design and Implementation of Time- and Power-Efficient Software Stacks for Multicore Processors
Published:
Design and Implementation of Time- and Power-Efficient Software Stacks for Multicore Processors
Published:
teaching
CS5914: Emerging Topics in Computer Science
Graduate Course, Virginia Tech, Department of Computer Science, 2025
High Performance Code Generation using LLMs
This course explores the use of Large Language Models (LLMs) for high-performance code generation. Students will engage with recent research papers, participate in class discussions, and undertake a semester-long research project. The course is designed to build a deep understanding of LLMs, their applications, and their implications in software development.