Applied Research Division

We transform research into product value in a changing development landscape.

This page presents an overview of our teams in the Applied Research (AR) Division and our current projects.

Our research interests

Here's what we're working on right now. You can find further details, including our ongoing projects, below.

Agent-environment interaction

Agent reliability

Agent-environment interaction

Codebase interaction

Modern codebases are complex and full of implicit structure, making them hard for AI agents to navigate efficiently. As a result, current systems often waste compute and produce poorly grounded changes.

This research focuses on improving how agents explore code before acting. By introducing structured representations and targeted search capabilities, we aim to separate understanding from modification – leading to more accurate, efficient, and project-aware AI developer tools.

Ongoing projects:

Local embedding models: We train code-embedding models to improve local code search.
Context-aware code retrieval: We improve code generation by retrieving relevant code with context-aware embeddings.
Search subagent: We reduce the cost of SWE agents by encapsulating search in a separate subagent.
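
As a rough illustration of embedding-based local search, the sketch below ranks code snippets by cosine similarity to a query. The `embed` function is a toy bag-of-tokens stand-in for a trained code-embedding model, and the file names and snippets are hypothetical; none of this reflects the actual models or pipeline.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a trained embedding model: counts of lowercase word tokens."""
    return Counter(re.findall(r"[A-Za-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, snippets: dict[str, str], top_k: int = 1) -> list[str]:
    """Return the names of the top_k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda name: cosine(q, embed(snippets[name])), reverse=True)
    return ranked[:top_k]


# Hypothetical mini-codebase used only for this example.
SNIPPETS = {
    "auth.py": "def check_password(user, password): return hash(password) == user.password_hash",
    "cart.py": "def add_item(cart, item): cart.items.append(item)",
}

print(search("verify user password", SNIPPETS))
```

A real embedding model would replace `embed` with dense vectors, but the ranking step would look much the same.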

Tooling interaction

Real-world software development depends on tools like IDEs, terminals, and CI/CD systems. Agents that only generate code without using these tools remain limited, but naïve tool use is often brittle and inefficient.

This research focuses on making tool use more reliable and scalable for AI agents. By improving how agents select, execute, and learn from tool interactions, we aim to enable more capable and practical AI systems for software development.

Ongoing projects:

IntelliJ Model Context Protocol (MCP) tool selection: We enable agentic frameworks to effectively use IDE tools.
Execution traces for Junie: We optimize Junie with runtime traces.

Multi-agent systems

Research on multi-agent systems examines how multiple specialized AI agents can reliably work together to solve complex, multi-step tasks. Single-agent setups often hit limits on context and specialization. Multi-agent systems address these limits through decomposition and parallelism, but they are currently fragile, expensive, and poorly understood.

This research investigates stable architectures, coordination protocols, and division-of-labor strategies that make multi-agent systems more predictable and efficient. By learning from real-world implementations and extracting reusable patterns, we aim to inform orchestration tooling and provide best-practice guidance for organizations adopting multi-agent architectures.

Ongoing projects:

Best practices in multi-agent systems: We gather common agentic development practices from real-world repositories.

Agent reliability

Evaluation and benchmarks

Many teams struggle to find and apply trustworthy evaluation. This is especially the case when results are noisy, benchmarks are hard to maintain, and risks like data leakage or contamination are easy to overlook.

This research builds scalable, realistic evaluation systems through benchmark mining and generation, careful dataset curation, and techniques that reduce leakage and overfitting. Our aim is to make it much easier for both internal teams and customers to create and maintain high-quality benchmarks, speeding up evaluation workflows and improving the reliability of AI-assisted coding tools.

Ongoing projects:

BenchRoom: We build a pipeline for automatically collecting SWE-Bench-like benchmarks at scale.
Anonymization for SWE benchmarks: We apply metamorphic testing to reduce data leakage in benchmarks.
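
One possible reading of metamorphic anonymization is sketched below: identifiers in a task are renamed so that memorized names no longer match, and the metamorphic relation is that behavior must stay the same before and after the rename. The `Renamer` class, the mapping, and the sample function are hypothetical and are not taken from the actual project.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Rename functions, arguments, and variables according to a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also rename arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def anonymize(source, mapping):
    """Return the source with mapped identifiers replaced (requires Python 3.9+)."""
    return ast.unparse(Renamer(mapping).visit(ast.parse(source)))


def run(source, func_name, *args):
    """Execute the source and call one of its functions."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


# Hypothetical benchmark snippet and identifier mapping.
ORIGINAL = "def total_price(prices, tax):\n    return sum(prices) * (1 + tax)\n"
MAPPING = {"total_price": "f0", "prices": "a0", "tax": "a1"}
ANONYMIZED = anonymize(ORIGINAL, MAPPING)

# Metamorphic check: the renamed program must behave like the original.
assert run(ORIGINAL, "total_price", [1.0, 2.0], 0.5) == run(ANONYMIZED, "f0", [1.0, 2.0], 0.5)
print(ANONYMIZED)
```

A model that solves the original task but fails the renamed one is a hint that it may be relying on memorized benchmark content.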

Observability and debugging

When agents fail, their reasoning is often opaque, execution is spread across many steps and systems, and errors may only surface late in the process. All this makes debugging and agent improvement both slow and unreliable.

This research builds tools to capture, visualize, and analyze agent behavior, including detailed execution traces, anomaly detection, and methods for selecting high-quality traces for training. Our goal is to give both IDE users and internal agent developers better observability, so they can iterate on AI agents more quickly and with greater confidence.

Ongoing projects:

Trace selection for LLM post-training: We build a tool to identify the agent traces that improve the quality of post-trained agentic models.
Anomaly detection for AI agents: We analyze execution traces to detect anomalous agent behavior.
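
To make trace-based anomaly detection concrete, here is a minimal sketch that flags traces whose step count is an outlier, using the modified z-score based on the median absolute deviation. The single feature (steps per trace), the threshold of 3.5, and the sample data are assumptions chosen for illustration; real traces carry far richer signals.

```python
from __future__ import annotations

from statistics import median


def flag_anomalies(trace_lengths: dict[str, int], threshold: float = 3.5) -> list[str]:
    """Flag traces whose length has a modified z-score above the threshold."""
    values = list(trace_lengths.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread in the bulk of the data: anything off the median stands out.
        return [name for name, v in trace_lengths.items() if v != med]
    return [
        name
        for name, v in trace_lengths.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical step counts per agent run.
TRACES = {"t1": 12, "t2": 14, "t3": 11, "t4": 13, "t5": 95}

print(flag_anomalies(TRACES))
```

A median-based statistic is used here because a single runaway trace would inflate a mean and standard deviation and could mask itself.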

Robustness and adaptation

Today’s agents are fragile: small prompt tweaks, tool updates, or shifts in context can unexpectedly break behavior. On top of that, manual prompt engineering does not scale.

This research investigates automated ways to improve robustness, such as self-optimization, adaptive prompting, and dynamic system configuration. Our goal is to simplify prompt engineering, automate choices like agent topology and tool descriptions, and streamline debugging of unexpected agent behavior, ultimately giving both product teams and internal agent developers more reliable, maintainable AI systems.

Ongoing project:

Auto-optimizing agents: We use evolutionary algorithms to optimize prompts and agents.
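
The general shape of evolutionary prompt optimization can be sketched as a mutate-and-select loop over candidate prompts. Everything below is hypothetical: the instruction fragments, the `score` function (a stand-in for an expensive benchmark run), and the loop parameters are assumptions for illustration, not the project's actual method.

```python
from __future__ import annotations

import random

# Hypothetical instruction fragments that a prompt can be assembled from.
FRAGMENTS = [
    "Read the failing test first.",
    "Run the tests after each edit.",
    "Keep diffs minimal.",
    "Explain every line in detail.",
]


def score(prompt: tuple[str, ...]) -> float:
    """Stand-in fitness: reward useful fragments, penalize prompt length."""
    useful = set(FRAGMENTS[:3])
    return sum(1.0 for f in set(prompt) if f in useful) - 0.1 * len(prompt)


def mutate(prompt: tuple[str, ...], rng: random.Random) -> tuple[str, ...]:
    """Randomly drop one fragment or append a new one."""
    items = list(prompt)
    if items and rng.random() < 0.5:
        items.pop(rng.randrange(len(items)))
    else:
        items.append(rng.choice(FRAGMENTS))
    return tuple(items)


def evolve(generations: int = 30, population_size: int = 8, seed: int = 0) -> tuple[str, ...]:
    """Keep the best candidates each generation (elitist selection)."""
    rng = random.Random(seed)
    population: list[tuple[str, ...]] = [()] * population_size
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        population = sorted(population + children, key=score, reverse=True)[:population_size]
    return population[0]


best = evolve()
print(score(best), best)
```

Because parents compete with their children, the best score never decreases between generations; in a real setting the cost is dominated by evaluating `score` on a benchmark.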

Correctness and verification

Code generation only delivers value when outputs meet real specifications, yet current agents often produce solutions that look plausible but are wrong. In addition, verification can be a major bottleneck.

This research develops ways to enforce correctness through testing, formal checks, and other validation strategies, and explores how to weave these checks directly into the generation loop. The aim is to strengthen automated test generation in IDEs and make benchmarks like SWE-Bench more robust by expanding and tightening their test suites.

Ongoing projects:

Test generation agent: We implement a dedicated test generation agent and compare it with general-purpose SWE agents.
AI-based test repair: We fix outdated tests that cause build failures in CI/CD.

Collaborate with us

We are open to collaborating with other researchers from both academia and industry.

If you’d be interested in working with us on any of the above projects, please reach out!