Engineering queries often span multiple domains, but standard retrieval can over-focus on one discipline and miss relevant adjacent context.
For a Boskalis project at SWI 2026, I developed BunnyRAG, a two-stage retrieval method designed to improve cross-community coverage without giving up relevance. In the figure below, farther right means higher relevance and farther up means broader community coverage.
Each point is one run (500-node synthetic graph; 30 queries). X = semantic relevance, Y = cross-community coverage (entropy).
I implemented a two-stage retrieval method that combines:
semantic similarity, and
a global graph-structure signal (effective resistance / conductance) that encourages selecting nodes spanning multiple graph communities.
Graph communities serve as a proxy for distinct disciplines.
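One way to combine the two signals is a greedy blend: seed with the best semantic match, then reward candidates that are structurally far (in effective resistance) from everything already selected. The sketch below is an illustration under assumed names (`two_stage_select`, blend weight `alpha`), not the production implementation; effective resistance is computed from the Laplacian pseudoinverse:

```python
import numpy as np

def effective_resistance_matrix(A):
    """All-pairs effective resistance from a symmetric adjacency matrix.

    R[i, j] = (e_i - e_j)^T L^+ (e_i - e_j), where L^+ is the
    Moore-Penrose pseudoinverse of the graph Laplacian L = D - A.
    """
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

def two_stage_select(sem, A, k, alpha=0.5):
    """Greedily pick k nodes scoring alpha * semantic similarity plus
    (1 - alpha) * mean effective resistance to the already-selected set,
    so later picks are rewarded for being structurally far away."""
    R = effective_resistance_matrix(A)
    selected = [int(np.argmax(sem))]   # stage 1: best semantic match
    while len(selected) < k:           # stage 2: blended greedy expansion
        cand = [i for i in range(len(sem)) if i not in selected]
        def score(i):
            return alpha * sem[i] + (1 - alpha) * R[i, selected].mean()
        selected.append(max(cand, key=score))
    return selected
```

On a toy "barbell" graph (two triangles joined by one bridge edge), alpha = 1 reproduces pure semantic top-k inside the seed triangle, while alpha = 0 immediately hops to the far triangle.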
To understand behavior under realistic uncertainty, I evaluated BunnyRAG across two families of synthetic regimes:
Seed-community regime: all seed nodes drawn from a single community (representing a single-discipline starting point) versus a mixed seeding strategy that spans communities.
Edge-weight regime (stochastic): uniformly random edge weights vs. edge weights from embedding-space distances with stochastic noise vs. query-aware edge weights.
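These regimes can be emulated with a planted-partition generator along the following lines. The parameter names and defaults are illustrative assumptions, and only the uniformly-random edge-weight variant is shown:

```python
import numpy as np

def planted_partition(n_comm=4, size=8, p_in=0.6, p_out=0.05, seed=0):
    """Synthetic graph with planted communities: dense within blocks, sparse between."""
    rng = np.random.default_rng(seed)
    n = n_comm * size
    labels = np.repeat(np.arange(n_comm), size)
    p = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = np.triu((rng.random((n, n)) < p).astype(float), 1)  # no self-loops
    return A + A.T, labels

def seed_nodes(labels, mixed, k=3, seed=1):
    """Seed-community regime: k seeds from one community, or spread over k communities."""
    rng = np.random.default_rng(seed)
    if mixed:
        comms = rng.choice(np.unique(labels), size=k, replace=False)
        return [int(rng.choice(np.flatnonzero(labels == c))) for c in comms]
    c = rng.choice(np.unique(labels))
    return [int(i) for i in rng.choice(np.flatnonzero(labels == c), size=k, replace=False)]

def random_edge_weights(A, seed=2):
    """Edge-weight regime (one variant): i.i.d. uniform weights on existing edges."""
    rng = np.random.default_rng(seed)
    W = np.triu(rng.random(A.shape) * A, 1)
    return W + W.T
```

Fixing the RNG seed per regime keeps every run reproducible, which is what makes cross-regime comparisons meaningful.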
The global effective-resistance signal provided stronger cross-community coverage than expected.
Contrary to the initial hypothesis that a semantic penalty is necessary to force “hops” across communities, the experiments indicated that rewarding semantic similarity can still preserve strong community coverage (high entropy) across several regimes.
Designed the retrieval method, built the synthetic benchmarking regimes, ran the evaluation pipeline, and refactored the prototype into a reproducible Python workflow.
Turn a one-week prototype into a workflow that is easier to rerun, inspect, and extend.
Consolidated core components (retrieval logic, graph routines, evaluation scripts).
Improved reproducibility and maintainability through Git-based workflow, clearer run paths, and documentation aimed at new contributors.
Added test scaffolding and checks to support safer iteration.
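As an illustration of the kind of check involved, a pytest-style sanity test can pin down invariants of the coverage metric. The helper below is a standalone re-sketch of an entropy metric, not the project's actual test suite:

```python
import math
from collections import Counter

def coverage_entropy(labels):
    """Shannon entropy (bits) of community labels; mirrors the evaluation metric."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def test_entropy_invariants():
    # A single community must give zero coverage, and a set drawn from
    # k distinct communities can never exceed log2(k) bits.
    assert coverage_entropy(["A"] * 5) == 0.0
    labels = ["A", "B", "A", "C", "B", "A"]
    assert 0.0 <= coverage_entropy(labels) <= math.log2(len(set(labels)))

test_entropy_invariants()
```

Checks like these catch metric regressions early, before they silently distort a whole benchmarking run.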
Python; Git
Evaluation used synthetic graphs due to data and IP constraints; the write-up focuses on algorithmic behavior and robustness rather than proprietary datasets.
This work originated as a one-week study during SWI 2026 on a problem posed by Boskalis. The initial concept and experimental direction were developed during SWI in collaboration with the team. I led the subsequent work: designing the synthetic benchmarking regimes, running the evaluation pipeline, analyzing results, and refactoring the prototype into a reproducible Python codebase.