APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
Published in Proceedings of the 40th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2026
Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D.S. (2026). *APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs*. In *Proceedings of the 40th IEEE International Parallel and Distributed Processing Symposium (IPDPS).*
