Publications

(2026). KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference. In ACM MobiSys ’26.
(2026). The new compiler stack: a survey on the synergy of LLMs and compilers. In CCF THPC.
(2026). From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization. In CGO ’26.
(2025). Leveraging Compilation Statistics for Compiler Phase Ordering. In IPDPS ’25.
(2025). Accelerating Tensor-train Decomposition on Graph Neural Networks. In IPDPS ’25.
(2024). Optimizing Deep Learning Inference via Global Analysis and Tensor Expression. In ASPLOS ’24.
(2022). HOPE: a heterogeneity-oriented parallel execution engine for inference on mobiles. In HTL ’22.
(2021). Optimizing Deep Learning Inference via Global Analysis and Tensor Expression. In ATS ’21.