Aim: A multi-agent system that uses lightweight, open‑source LLMs to process research papers, interactively answer questions, generate hypotheses, and design research proposals.
Potential impact: This system streamlines literature review by extracting key insights and enabling targeted, interactive querying. It enhances rigor through structured multi‑agent debates and generates novel hypotheses and research plans grounded in evidence.
Methodology: It uses a quantized open-source Mistral-7B model (Mistral AI) to translate natural-language requests into structured queries. It then runs a hybrid search that first scans local metadata and then issues live, date-filtered queries to NCBI GEO, retrieving timely, relevant datasets while minimizing API load.
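The hybrid search can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the JSON schema the model is prompted to emit (`keywords`, `organism`, `min_date`), the local record format, and all function names are hypothetical. One reading of "minimizing API load" is assumed: the live query only covers publications newer than the local metadata snapshot. The live stage targets the public NCBI E-utilities `esearch` endpoint with `db=gds` and a publication-date window (`datetype=pdat`, `mindate`, `maxdate`). The code only builds the URL, so it makes no network call.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import List, Optional
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint; db="gds" targets GEO DataSets.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


@dataclass
class StructuredQuery:
    """Hypothetical schema the LLM is prompted to fill in as JSON."""
    keywords: List[str]
    organism: Optional[str] = None
    min_date: Optional[date] = None


def parse_llm_output(raw: str) -> StructuredQuery:
    """Validate the model's JSON reply and convert it to a StructuredQuery."""
    data = json.loads(raw)
    keywords = [str(k).lower() for k in data.get("keywords", [])]
    if not keywords:
        raise ValueError("LLM output contains no keywords")
    min_date = data.get("min_date")
    return StructuredQuery(
        keywords=keywords,
        organism=data.get("organism"),
        min_date=date.fromisoformat(min_date) if min_date else None,
    )


def search_local(query: StructuredQuery, records: List[dict]) -> List[str]:
    """Stage 1: filter cached metadata records (hypothetical format), no API call."""
    hits = []
    for rec in records:
        text = (rec["title"] + " " + rec.get("summary", "")).lower()
        if not all(k in text for k in query.keywords):
            continue
        if query.organism and rec.get("organism") != query.organism:
            continue
        if query.min_date and rec["date"] < query.min_date:
            continue
        hits.append(rec["accession"])
    return hits


def build_geo_url(query: StructuredQuery, cache_synced: date, today: date) -> str:
    """Stage 2: esearch URL limited to records published after the local snapshot."""
    since = max(cache_synced, query.min_date) if query.min_date else cache_synced
    term = " AND ".join(f'"{k}"' for k in query.keywords)
    if query.organism:
        term += f' AND "{query.organism}"[Organism]'
    params = {
        "db": "gds",
        "term": term,
        "retmode": "json",
        "retmax": 20,
        "datetype": "pdat",  # filter on publication date
        "mindate": since.strftime("%Y/%m/%d"),
        "maxdate": today.strftime("%Y/%m/%d"),
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching the URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.idlist` holds GEO DataSets UIDs, which would then be merged with the local hits.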
The system combines Python for orchestration with R for bioinformatics analysis, automatically choosing DESeq2 or limma based on the dataset type. It runs differential expression and KEGG pathway analysis, with Gene Ontology (GO) support in development. Results include volcano plots, pathway maps, and LLM-driven biological interpretations with suggested research directions.
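A dataset-type rule for picking DESeq2 or limma might look like the sketch below. The source does not state the exact criterion, so the rule is an assumption: the GEO series type ("Expression profiling by high throughput sequencing" or "Expression profiling by array") is checked first; otherwise non-negative whole-number values are treated as raw RNA-seq counts (suited to DESeq2) and anything else as normalized intensities (suited to limma). The R script names are hypothetical placeholders.

```python
import math
from typing import List, Optional, Sequence


def looks_like_counts(matrix: Sequence[Sequence[float]]) -> bool:
    """True if every value is a finite, non-negative whole number (raw counts)."""
    return all(
        math.isfinite(v) and v >= 0 and float(v).is_integer()
        for row in matrix
        for v in row
    )


def choose_de_method(matrix, platform: Optional[str] = None) -> str:
    """Assumed rule: GEO series type first, then inspect the values."""
    if platform:
        p = platform.lower()
        if "sequencing" in p:
            return "DESeq2"  # count-based negative binomial model
        if "array" in p:
            return "limma"   # linear models on normalized intensities
    return "DESeq2" if looks_like_counts(matrix) else "limma"


def build_r_command(method: str, expr_csv: str, design_csv: str, out_dir: str) -> List[str]:
    """Rscript invocation for hypothetical per-method wrapper scripts."""
    scripts = {"DESeq2": "run_deseq2.R", "limma": "run_limma.R"}
    return ["Rscript", scripts[method], expr_csv, design_csv, out_dir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a failing R step stops the Python orchestration instead of silently producing empty results.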
Current progress: The full analysis pipeline, including differential expression and pathway analysis, is live and validated. The hybrid search reliably returns recently published datasets. KEGG pathway enrichment results and visual outputs are generated automatically, and the LLM-based biological interpretations align with expert expectations. Implementation of natural-language querying is underway.
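The source does not specify how KEGG enrichment is computed. A common choice, used for example by R's clusterProfiler, is an over-representation test: for each pathway, the hypergeometric probability of observing at least the seen overlap between DE genes and pathway members, followed by Benjamini-Hochberg correction. The sketch below implements that test in pure Python; the gene sets are hypothetical inputs that, in practice, would come from KEGG annotations.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(de_genes, pathways, universe):
    """Over-representation test of DE genes in each pathway gene set."""
    de = set(de_genes) & universe
    names, pvals = [], []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(de & members)
        if overlap == 0:
            continue  # no DE genes in this pathway
        names.append(name)
        pvals.append(hypergeom_sf(overlap, len(universe), len(members), len(de)))
    padj = benjamini_hochberg(pvals)
    # (pathway, raw p-value, adjusted p-value), most significant first
    return sorted(zip(names, pvals, padj), key=lambda t: t[1])
```

The choice of `universe` matters: restricting it to genes actually measured in the dataset, rather than the whole genome, avoids inflating significance.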
Next steps: Expand pathway analysis with GO support, enhance reasoning depth, and improve performance on large datasets. Begin pilot testing with genomics researchers to evaluate usability and impact on research workflows.
LLM-based interpretation (example output):
The differential expression analysis results suggest that there are 7 significantly upregulated genes and 7 significantly downregulated genes between the treated and control groups in the data. Further validation and analysis would be required.
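The example above reports only gene counts. A minimal sketch of how such counts might be derived from a DE results table and handed to the model follows. The cutoffs (adjusted p < 0.05, |log2 fold change| ≥ 1), the `DEResult` record, and the prompt wording are assumptions, not the pipeline's documented settings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DEResult:
    gene: str
    log2_fc: float
    padj: Optional[float]  # DESeq2 may report NA adjusted p-values


def summarize_de(
    results: List[DEResult], padj_cutoff: float = 0.05, lfc_cutoff: float = 1.0
) -> Tuple[List[str], List[str]]:
    """Split significant genes into up- and downregulated lists (assumed cutoffs)."""
    sig = [r for r in results if r.padj is not None and r.padj < padj_cutoff]
    up = [r.gene for r in sig if r.log2_fc >= lfc_cutoff]
    down = [r.gene for r in sig if r.log2_fc <= -lfc_cutoff]
    return up, down


def build_interpretation_prompt(up, down, contrast="treated vs control", max_genes=20):
    """Hypothetical prompt text passed to the LLM for biological interpretation."""
    def fmt(genes):
        return ", ".join(genes[:max_genes]) or "none"

    return (
        f"Differential expression ({contrast}): "
        f"{len(up)} significantly upregulated genes ({fmt(up)}) and "
        f"{len(down)} significantly downregulated genes ({fmt(down)}). "
        "Summarize the likely biological meaning, state uncertainties, "
        "and suggest follow-up experiments."
    )
```

Including gene names, and ideally the enriched pathways, gives the model concrete material to interpret rather than counts alone.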