Beyond Textual
Repository Exploration
Dual-Modal Structural Reasoning for Agentic Issue Resolution
DualView brings visual reasoning into repository exploration. It exposes code structure as four complementary graph views — each returned as a synchronized image and text — so issue-resolution agents can perceive dependencies directly instead of reconstructing them from fragmented textual observations.
Existing agents explore repositories through a sequence of textual tool invocations, gradually piecing together structural relationships from fragmented observations. Although repository dependencies are inherently graph-structured, they are typically presented as linear text, leaving the underlying topology implicit. Representing these dependencies directly as graph-structured observations instead makes connectivity and multi-hop relationships immediately perceptible while preserving the semantic information required for subsequent source-level inspection.
Limitation ❶: Repository exploration relies on fragmented textual observations.
Limitation ❷: Graph-structured repository information is still presented as text.
DualView is implemented as an agent-agnostic structural reasoning layer that sits on top of an existing issue-resolution agent. Rather than reconstructing repository structure from a sequence of textual observations, the agent combines dual-modal structural observations with its native agent tools to navigate the codebase, inspect implementation details, and progressively localize and repair the bug.
DualView represents repository structure through four complementary graph views — Module Coupling Graph (MCG), Function Call Graph (FCG), Class Hierarchy Graph (CHG), and Program Dependence Graph (PDG) — and exposes them through a queryable interface with visual and textual responses. These views capture four recurring classes of structural relationships encountered during repository exploration: module coupling, function invocation, class inheritance, and statement-level data/control dependence. Together, the four views provide a coarse-to-fine exploration hierarchy: agents first identify relevant subsystems, then analyze interprocedural interactions, object-oriented relationships, and finally fine-grained dependencies around candidate implementations.
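To make the idea of a queryable, multi-view interface concrete, here is a minimal sketch. It is not DualView's actual API: the `GraphView` enum, the `TOY_GRAPHS` data, and the `query_slice` function are hypothetical names introduced for illustration, and the toy repository is invented. It only shows how one query shape (a view, a root node, a hop limit) could serve all four views.

```python
from collections import deque
from enum import Enum


class GraphView(Enum):
    """The four views named in the text, ordered coarse to fine."""
    MCG = "module coupling"
    FCG = "function call"
    CHG = "class hierarchy"
    PDG = "program dependence"


# Hypothetical toy repository: one adjacency map per view (assumption).
TOY_GRAPHS = {
    GraphView.MCG: {"app.api": ["app.core"], "app.core": ["app.utils"]},
    GraphView.FCG: {"handle_request": ["parse", "validate"],
                    "validate": ["check_schema"]},
    GraphView.CHG: {"BaseParser": ["JsonParser", "XmlParser"]},
    GraphView.PDG: {"x = read()": ["y = x + 1"], "y = x + 1": ["return y"]},
}


def query_slice(view, root, max_hops=1):
    """Return the edges reachable from `root` within `max_hops`, in BFS order."""
    graph = TOY_GRAPHS[view]
    edges = []
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand nodes at the hop limit
        for succ in graph.get(node, []):
            edges.append((node, succ))
            if succ not in seen:
                seen.add(succ)
                frontier.append((succ, depth + 1))
    return edges


one_hop = query_slice(GraphView.FCG, "handle_request", max_hops=1)
two_hop = query_slice(GraphView.FCG, "handle_request", max_hops=2)
```

Bounding the slice by hop count is one plausible way to keep each response small enough to render and read; the source does not say how DualView selects slices.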
Modern MLLMs can jointly reason over visual and textual inputs, yet the two modalities offer complementary strengths for structural reasoning. Rather than relying on a single representation, DualView exposes every queried repository graph through synchronized visual and textual observations. Both observations are derived from the same graph slice and therefore represent the same structural context. The visual modality emphasizes topology perception, whereas the textual modality provides the semantic grounding required for precise repository navigation and repair.
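The requirement that both observations come from the same graph slice can be sketched as a single function that takes one edge list and emits two renderings. This is an illustrative assumption, not DualView's rendering pipeline: `render_dual` is a hypothetical name, and Graphviz DOT source stands in for the image (turning it into pixels would need an external renderer, which is omitted here).

```python
def render_dual(edges):
    """Derive a visual description and a textual listing from one edge list."""
    nodes = sorted({name for edge in edges for name in edge})

    # Visual side: DOT source that a graph renderer could turn into an image.
    # Node names are assumed to contain no double quotes.
    dot_lines = ["digraph slice {"]
    dot_lines += [f'  "{name}";' for name in nodes]
    dot_lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    dot_lines.append("}")

    # Textual side: one line per edge, keeping exact identifiers for navigation.
    text_lines = [f"{src} -> {dst}" for src, dst in edges]

    return {"visual_dot": "\n".join(dot_lines), "text": "\n".join(text_lines)}


observation = render_dual([("a", "b"), ("b", "c")])
```

Because both outputs are computed from the same `edges` argument, they cannot drift apart: every arrow in the picture has a matching line in the text.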
To make effective use of the multi-grained structural views and dual-modal observations, DualView introduces an adaptive structural reasoning layer inside the agent's original repair loop. Existing agents rely exclusively on native tool execution (e.g., source inspection, search), which surfaces concrete source-level facts; DualView expands the agent's action space with the four structural graph views, which supply repository-level structure. The agent continuously evaluates its current reasoning state from the context it has gathered and switches between native tools and structural views accordingly. The reasoning process is therefore adaptive rather than fixed: for example, an agent may begin with the MCG to identify a relevant subsystem, switch to native tools to inspect candidate code, invoke the FCG or CHG to refine structural hypotheses, and finally use the PDG for statement-level analysis.
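The example trajectory above can be traced with a small sketch. In DualView the agent itself decides which action to take from its gathered context; the rule-based `choose_action` below is only a stand-in for that decision, and `ReasoningState`, its fields, and `simulate` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReasoningState:
    """Hypothetical summary of what the agent has gathered so far."""
    subsystem: Optional[str] = None      # relevant module, once identified
    candidate_code_read: bool = False    # has candidate source been inspected?
    structure_confirmed: bool = False    # call/inheritance hypotheses checked?
    statement_localized: bool = False    # faulty statement pinned down?


def choose_action(state):
    """Stand-in policy: pick the action that fills the first missing fact."""
    if state.subsystem is None:
        return "MCG"
    if not state.candidate_code_read:
        return "native_tools"
    if not state.structure_confirmed:
        return "FCG_or_CHG"
    if not state.statement_localized:
        return "PDG"
    return "native_tools"  # everything localized: edit and test the fix


def simulate(steps=5):
    """Run the policy, pretending every action succeeds (an assumption)."""
    state = ReasoningState()
    trace = []
    for _ in range(steps):
        action = choose_action(state)
        trace.append(action)
        if action == "MCG":
            state.subsystem = "app.core"
        elif action == "native_tools":
            state.candidate_code_read = True
        elif action == "FCG_or_CHG":
            state.structure_confirmed = True
        elif action == "PDG":
            state.statement_localized = True
    return trace


trace = simulate()
```

The fixed priority order here merely reproduces the one trajectory given as an example; an adaptive agent could revisit a coarser view at any point, for instance returning to the MCG if source inspection rules out the first subsystem.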
Comparison with text-centric agent systems
Comparison with textual graph-based repository representations