You can also find my articles on my Google Scholar profile.


Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection


QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models


Retrospective Sparse Attention for Efficient Long-Context Generation


DUDA: Distilled Unsupervised Domain Adaptation for Lightweight Semantic Segmentation