Low-Compute Test-Time Adaptation Research Track
Overview
This page is for students who are interested in improving models at test time while keeping GPU usage as small as possible.
You do not need to start with large-scale training or expensive hardware. This track is designed for students who want a modern research topic with realistic experiments and clear questions.
The goal of this track is to explore questions such as:
How can we improve a pretrained model at test time without full retraining?
How can we make test-time adaptation accurate and cheap?
How can we study modern CVPR-level topics with limited compute?
Can these ideas work for vision-only, language-only, vision-language, or vision-language-action models?
If you are interested in these questions, this track may be a good place to start.
What to avoid at the beginning
Do not begin with:
huge multimodal LLM fine-tuning
methods requiring heavy backpropagation at test time
projects with unclear evaluation settings
very large benchmarks before understanding the core problem
When to contact me
If you read some of the papers on this page and feel interested, feel free to contact me.
You do not need to understand everything before reaching out.
Interest, curiosity, and steady effort are enough.
A careful research attitude matters more than starting with a large model.
Part I. A simple starting path
A good starting path is the following:
Start with CLIP to understand zero-shot transfer.
Read TDA to see how test-time improvement can be done efficiently.
Read one recent paper on realistic test-time evaluation.
Reproduce one baseline on a small benchmark with distribution shift.
Measure not only accuracy, but also latency and memory cost.
You do not need to do everything at once.
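Step 5 above asks you to measure latency and memory, not only accuracy. Below is a minimal, stdlib-only sketch of how any adaptation method could be wrapped for such measurement. The function names (`measure`, `forward`) are illustrative placeholders, not part of any paper's code.

```python
import time
import tracemalloc

def measure(fn, *args, repeats=10):
    """Report wall-clock latency and peak Python-heap memory for one call.
    (For GPU work you would instead synchronize with torch.cuda and read
    torch.cuda.max_memory_allocated; this stdlib sketch covers CPU code.)"""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(*args)
    latency_ms = (time.perf_counter() - start) * 1000 / repeats
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return out, latency_ms, peak_bytes

# Hypothetical stand-in for a model's forward pass.
def forward(n):
    return sum(i * i for i in range(n))

result, ms, peak = measure(forward, 10_000)
print(f"latency: {ms:.3f} ms, peak heap: {peak} bytes")
```

Reporting these two overheads alongside accuracy is what separates a "low-compute" claim from an unverified one.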
Part II. Core background for efficient test-time learning
These are the most important shared starting points for this page.
1. CLIP (ICML 2021)
Paper: Learning Transferable Visual Models From Natural Language Supervision
Code: https://github.com/openai/CLIP
Why read it: a strong starting point for modern vision-language transfer
Focus on: image-text alignment, zero-shot classification, prompt-based inference
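The zero-shot mechanism to focus on can be summarized in a few lines: both image and text embeddings are L2-normalized, and each image is assigned to the class whose text embedding has the highest cosine similarity. The toy sketch below illustrates only this mechanism with made-up embeddings; it is not the CLIP library itself.

```python
import numpy as np

def zero_shot_classify(image_feats, text_feats):
    """Assign each image to the class whose text embedding is
    closest in cosine similarity (the CLIP zero-shot rule)."""
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T          # (n_images, n_classes) cosine similarities
    return logits.argmax(axis=1)  # predicted class index per image

# Toy 4-dim embeddings: 3 classes, 2 images near classes 2 and 0.
text_feats = np.eye(3, 4)
image_feats = np.array([[0.0, 0.0, 1.0, 0.0],
                        [1.0, 0.1, 0.0, 0.0]])
print(zero_shot_classify(image_feats, text_feats))  # -> [2 0]
```

In the real model the text embeddings come from prompts such as "a photo of a {class}", which is why prompt-based inference matters for this track.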
2. Tent (ICLR 2021)
Paper: Fully Test-Time Adaptation by Entropy Minimization
Code: https://github.com/DequanWang/tent
Why read it: one of the simplest and most influential starting points for test-time adaptation
Focus on: entropy minimization, online adaptation, normalization-based updates
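Two ingredients of Tent are easy to sketch in isolation: the adaptation objective (mean entropy of the model's softmax predictions) and the normalization-based update (re-estimating normalization statistics from the current test batch instead of stale source statistics). The sketch below shows both pieces separately, under toy inputs; the full method additionally backpropagates the entropy loss into the normalization affine parameters, which is omitted here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def prediction_entropy(logits):
    """Tent's adaptation objective: mean Shannon entropy of the predictions."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def renormalize(x, eps=1e-5):
    """Normalization-statistics update: re-center and re-scale features
    using statistics estimated from the current test batch."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

# A shifted test batch: features drifted away from zero mean / unit variance.
x = np.array([[5.0, -2.0], [7.0, -4.0], [6.0, -3.0]])
print(np.round(renormalize(x).mean(axis=0), 6))     # re-centered to [0. 0.]

# The entropy objective rewards confident predictions.
print(prediction_entropy(np.array([[10.0, 0.0]])))  # near 0: confident
print(prediction_entropy(np.array([[0.0, 0.0]])))   # log 2 ~ 0.693: uncertain
```

Minimizing this entropy over only the normalization parameters is what keeps Tent cheap enough for test time.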
3. TDA (CVPR 2024)
Paper: Efficient Test-Time Adaptation of Vision-Language Models
Code: https://github.com/kdiAAA/TDA
Why read it: a clear starting point for efficient test-time adaptation in multimodal settings
Focus on: training-free adaptation, cache-based updates, low-cost improvement at inference time
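The cache-based idea to focus on can be sketched without any training: keep a small cache of previously seen test features with their pseudo-labels, and at inference add a similarity-weighted vote from the cache to the frozen zero-shot logits. The sketch below is a simplified illustration of this mechanism with toy features and a hypothetical weighting; TDA's actual cache management (confidence filtering, a negative cache, eviction) is richer than this.

```python
import numpy as np

def cache_logits(feat, cache_feats, cache_labels, n_classes, beta=5.0):
    """Training-free cache lookup: similarity-weighted votes from
    previously seen high-confidence test features."""
    sims = cache_feats @ feat               # cosine sims (features pre-normalized)
    weights = np.exp(-beta * (1.0 - sims))  # closer neighbors vote harder
    votes = np.zeros(n_classes)
    for w, y in zip(weights, cache_labels):
        votes[y] += w
    return votes

# Toy cache built from earlier confident test samples (unit-norm features).
cache_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.96, 0.28]])
cache_feats /= np.linalg.norm(cache_feats, axis=1, keepdims=True)
cache_labels = [0, 1, 0]

query = np.array([0.98, 0.199])  # close to the class-0 cache entries
query /= np.linalg.norm(query)

# Combine frozen zero-shot logits with the cache term; no gradients involved.
zero_shot = np.array([0.1, 0.1])
combined = zero_shot + cache_logits(query, cache_feats, cache_labels, n_classes=2)
print(combined.argmax())  # -> 0
```

Because the update is a cache write rather than a gradient step, the per-sample cost stays near that of a frozen model, which is exactly the low-compute property this track cares about.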
Part III. Main track — Efficient Test-Time Adaptation
Typical question:
How can we improve a pretrained model on shifted data without expensive retraining?
Why this track is good
This track is suitable for students who want:
a modern topic with manageable experiments
strong connections to robustness and deployment
a practical path toward publication
Possible directions
A. Vision-only test-time adaptation
Why study it: often the simplest place to begin
Good for students because: experiments are lighter and the core adaptation issue is easier to isolate
Recommended papers:
DELTA (ICLR 2023)
Code: https://github.com/bwbwzhao/DELTA
B. Language or multimodal test-time adaptation
Why study it: modern models increasingly rely on cross-modal or language-conditioned inference
Good for students because: this direction is timely and closely connected to current foundation model research
Recommended papers:
Realistic Test-Time Adaptation of Vision-Language Models (CVPR 2025)
Code: https://github.com/MaxZanella/StatA
Bayesian Test-Time Adaptation for Vision-Language Models (CVPR 2025)
Code: https://github.com/buerzlh/BCA
C. Vision-Language-Action adaptation (emerging direction)
Why study it: action-conditioned systems face distribution shift and low-latency constraints at deployment, so cheap test-time improvement is especially valuable for them.
Good for students because: this can become a distinctive topic if framed carefully around test-time robustness, online correction, and compute limits
Recommended papers:
On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning (2026)
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models (2025)
Code: https://github.com/sylvestf/LIBERO-plus
LIBERO-X: Robustness Litmus for Vision-Language-Action Models (2026)
Code: https://github.com/zackhxn/LIBERO-X
Why these papers matter:
the first paper is directly about on-the-fly adaptation at deployment time
the latter two provide realistic robustness benchmarks for testing whether VLA adaptation is actually useful under distribution shift
Possible directions:
test-time adaptation for policy robustness
low-compute calibration for action-conditioned models
online correction under visual distribution shift
evaluation under realistic robotic perturbations
What students should reproduce first in this track
Choose one:
TDA
Bayesian Test-Time Adaptation
Then compare it against:
one frozen pretrained baseline
one realistic evaluation setting
Good starter experiments for this track
low-resolution corruption
blur / noise corruption
small online test streams
few-shot target-domain adaptation
modality-specific settings depending on the chosen direction
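The corruption settings above (low resolution, blur, noise) are easy to generate yourself before reaching for a full corruption benchmark. A minimal numpy sketch, assuming grayscale images with values in [0, 1]; these are simplified stand-ins, not the exact corruption pipelines used by published benchmarks.

```python
import numpy as np

def gaussian_noise(img, sigma=0.1, seed=0):
    """Additive Gaussian noise, clipped back to the valid [0, 1] range."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def box_blur(img, k=3):
    """Simple k x k mean filter as a stand-in for blur corruption."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def low_resolution(img, factor=2):
    """Downsample by striding, then repeat pixels back to the original size."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # toy grayscale image
for corrupt in (gaussian_noise, box_blur, low_resolution):
    print(corrupt.__name__, corrupt(img).shape)  # each keeps the (4, 4) shape
```

Sweeping the severity parameters (sigma, k, factor) gives you a controlled shift axis for comparing an adapted model against its frozen baseline.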
Part IV. A promising publication direction
A strong project in this track could be:
Realistic and efficient test-time adaptation under strict compute constraints
This direction is attractive because it combines:
modern vision-language learning
realistic deployment constraints
low-GPU experimentation
a clear and focused research question
The page should stay focused on one question:
How can we improve a deployed model at test time with minimal extra compute?
Part V. Good starter benchmarks
CIFAR-100
Oxford Flowers
DTD
EuroSAT
ImageNet subset
corruption benchmarks with blur, noise, or low resolution
Avoid very large datasets at the beginning.
Part VI. Suggested first mini-project
A strong first project is:
reproduce one lightweight test-time adaptation baseline
compare it with one frozen pretrained baseline
evaluate under one realistic shift setting
report accuracy, latency overhead, and memory overhead
This is a good starting point because it answers a concrete and modern question:
Can test-time adaptation remain useful when compute is limited and deployment is realistic?
Final note
It is better to have one clear track than to force separate tracks with the same papers.
So this page should stay centered on:
test-time adaptation
realistic evaluation
low-compute improvement
robustness under shift
modality flexibility across vision, language, multimodal, or VLA settings