Channel: #retrieval-and-search
The Retrieval and Search Subprogram drives foundational and applied research in the design, evaluation, and deployment of next-generation retrieval systems. With growing demand for multimodal, multilingual, and context-aware search, our work spans theory, algorithms, and systems across classical IR and neural architectures.
Leads:
Focus Areas
1. Neural Information Retrieval (Neural IR)
Dense retrieval using dual encoders, late interaction (ColBERT), and generative retrievers.
Specialized architectures for domain-adapted retrieval (scientific, biomedical, multilingual corpora).
In-context retrieval strategies for grounding large language models (LLMs).
Retrieval-Augmented Generation (RAG) optimization techniques.
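The core mechanic behind the dual-encoder approach above can be sketched in a few lines: queries and documents are embedded independently, and retrieval is a top-k inner-product search. This is a minimal illustration only; `toy_encode` is a hypothetical stand-in (a hashed bag-of-words), where a real system would use a trained transformer encoder.

```python
import hashlib
import numpy as np

def toy_encode(text, dim=64):
    """Stand-in for a learned encoder: hashes tokens into a fixed-size
    vector and L2-normalizes it. (Assumption: a trained dual-encoder
    model would replace this in practice.)"""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def dense_retrieve(query, corpus, k=2):
    """Dual-encoder retrieval: embed query and documents separately,
    score by inner product, return the top-k document indices."""
    doc_matrix = np.stack([toy_encode(d) for d in corpus])
    scores = doc_matrix @ toy_encode(query)
    return list(np.argsort(-scores)[:k])

corpus = [
    "neural networks for information retrieval",
    "classical BM25 term weighting",
    "dense passage retrieval with dual encoders",
]
top = dense_retrieve("dense retrieval with encoders", corpus)
```

Because the two towers are independent, document embeddings can be precomputed and indexed offline, which is what makes this family latency-competitive with lexical retrieval.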
2. Representation Learning for Search
Pretraining and fine-tuning of text, image, and multimodal encoders for semantic similarity.
Contrastive learning and mutual information maximization.
Evaluation of learned embeddings for clustering, nearest-neighbor search, and knowledge transfer.
Cross-modal representation learning (image-text, code-docstring, audio-text).
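The contrastive objective underlying much of the work above is the in-batch InfoNCE loss: each query's paired positive is the target and the other positives in the batch serve as negatives. A minimal numpy sketch, assuming row-wise L2-normalized embeddings:

```python
import numpy as np

def info_nce_loss(queries, positives, temperature=0.1):
    """In-batch contrastive (InfoNCE) loss over a batch of B pairs.
    queries, positives: (B, d) row-normalized embedding matrices."""
    logits = (queries @ positives.T) / temperature   # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on matched pairs
```

When matched pairs are more similar than mismatched ones, the diagonal dominates the similarity matrix and the loss approaches zero; the same structure applies unchanged to cross-modal pairs (image-text, code-docstring).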
3. Advances in Ranking and Reranking Models
Learning-to-Rank (LTR) frameworks.
Differentiable ranking approximations and unbiased learning-to-rank from click data.
Sparse vs. dense retrieval trade-offs and hybrid reranking pipelines.
Fine-grained ranking for recommendation and personalized search.
4. Multimodal and Multilingual Search
Retrieval models that generalize across modalities (e.g., image→text, table→text).
Query reformulation and disambiguation in multilingual retrieval scenarios.
Multilingual dense retrieval benchmarks and transfer learning across languages.
Language-agnostic embedding spaces via knowledge distillation and adversarial learning.
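The distillation recipe for language-agnostic spaces can be reduced to its core: a student model maps non-English sentences to the embeddings a frozen teacher produces for their English translations. A heavily simplified sketch, where the teacher's outputs are simulated and the student is just a linear map fit by gradient descent on MSE (all names and dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parallel data: X holds features of non-English sentences,
# Y holds the frozen teacher's embeddings of their English translations.
dim_in, dim_out, n = 8, 4, 100
X = rng.normal(size=(n, dim_in))
W_teacher = rng.normal(size=(dim_in, dim_out))   # stands in for the teacher
Y = X @ W_teacher

# Linear "student": minimize MSE to the teacher's embeddings.
W = np.zeros((dim_in, dim_out))
lr = 0.01
for _ in range(1000):
    grad = 2 * X.T @ (X @ W - Y) / n             # MSE gradient
    W -= lr * grad

mse = float(np.mean((X @ W - Y) ** 2))
```

After training, student embeddings of non-English text land near the teacher's English embeddings, so a single index serves all languages.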
5. Recommendation Systems and Personalization
Sequential, graph-based, and session-aware recommenders.
Large-scale implicit feedback modeling with user/item embeddings.
Causal modeling in recommendations (counterfactual reasoning, uplift modeling).
Representation learning for user behavior and item metadata integration.
Early- and late-fusion methods for combining signals from multiple sources.
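The implicit-feedback modeling above typically starts from matrix factorization: learn low-dimensional user and item embeddings whose inner products reconstruct observed interactions. A minimal full-batch sketch on a toy interaction matrix (learning rate, rank, and iteration count are illustrative choices, not tuned values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy implicit-feedback matrix: 1 = observed interaction, 0 = unobserved.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
U = 0.1 * rng.normal(size=(n_users, k))   # user embeddings
V = 0.1 * rng.normal(size=(n_items, k))   # item embeddings

lr, reg = 0.05, 0.01
for _ in range(2000):
    E = R - U @ V.T                        # residual on all cells
    U += lr * (E @ V - reg * U)            # gradient step on squared error
    V += lr * (E.T @ U - reg * V)

pred = U @ V.T                             # predicted affinity scores
```

Real systems replace the squared loss with implicit-feedback-aware objectives (confidence weighting, sampled ranking losses), but the embedding structure is the same one the bullets above build on.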
6. Efficient and Scalable Retrieval
Vector search algorithms (e.g., ScaNN, HNSW) for billion-scale corpora.
Memory-efficient approximate nearest neighbor (ANN) and Product Quantization techniques.
Knowledge distillation for retrieval model compression.
Index-aware training and optimization; latency-recall trade-offs in production settings.
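Product quantization, mentioned above, compresses vectors by splitting them into sub-vectors and replacing each sub-vector with the index of its nearest learned centroid. A minimal sketch with a small hand-rolled k-means (parameters are illustrative; libraries like Faiss implement this at scale):

```python
import numpy as np

def train_pq(X, n_sub=2, n_centroids=4, iters=10, seed=0):
    """Minimal product quantization training: split each vector into
    n_sub sub-vectors and run k-means independently in every subspace."""
    rng = np.random.default_rng(seed)
    sub_dim = X.shape[1] // n_sub
    codebooks = []
    for s in range(n_sub):
        sub = X[:, s * sub_dim:(s + 1) * sub_dim]
        cent = sub[rng.choice(len(sub), n_centroids, replace=False)]
        for _ in range(iters):
            # Assign each sub-vector to its nearest centroid, then re-center.
            d = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for c in range(n_centroids):
                if (assign == c).any():
                    cent[c] = sub[assign == c].mean(0)
        codebooks.append(cent)
    return codebooks

def encode_pq(X, codebooks):
    """Replace each sub-vector with its nearest centroid's index,
    yielding one small integer code per subspace."""
    sub_dim = X.shape[1] // len(codebooks)
    codes = []
    for s, cent in enumerate(codebooks):
        sub = X[:, s * sub_dim:(s + 1) * sub_dim]
        d = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
        codes.append(d.argmin(1))
    return np.stack(codes, axis=1)   # (n_vectors, n_sub) integer codes
```

With 4 centroids per subspace, each 8-dimensional float vector collapses to two 2-bit codes; distances are then approximated from precomputed centroid tables, which is what makes billion-scale ANN memory-feasible.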
7. Evaluation, Robustness, and Fairness
Robustness of retrieval and ranking models under distributional shift.
Evaluation metrics for semantic retrieval beyond precision/recall (e.g., nDCG, MRR).
Fairness, diversity, and explainability in ranking and recommendation.
Behavioral testing and model auditing for retrievers and recommenders.
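The two ranking metrics named above are short enough to define exactly. MRR averages the reciprocal rank of the first relevant result across queries; nDCG discounts graded relevance logarithmically by position and normalizes by the ideal ordering:

```python
import math

def mrr(ranked_relevance):
    """Mean reciprocal rank: ranked_relevance is a list of queries, each
    a list of 0/1 relevance labels in ranked order."""
    total = 0.0
    for labels in ranked_relevance:
        rr = 0.0
        for rank, rel in enumerate(labels, start=1):
            if rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_relevance)

def ndcg(labels, k=None):
    """nDCG@k for one ranked list of graded relevance labels."""
    k = k or len(labels)
    dcg = sum(l / math.log2(i + 1) for i, l in enumerate(labels[:k], start=1))
    ideal = sorted(labels, reverse=True)
    idcg = sum(l / math.log2(i + 1) for i, l in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

Having the metrics as plain functions also makes behavioral testing concrete: perturb a ranking, recompute the metric, and assert the model degrades (or doesn't) as expected.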
And much more.