Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs
Published as an arXiv preprint, 2025
Recommended citation: Fan, J., Zhang, Y., Li, X., & Nikolopoulos, D. S. (2025). "Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs." arXiv preprint arXiv:2506.03296.