At JetBrains, we are passionate about the way AI transforms software engineering.
The Machine Learning (ML) division at JetBrains Research explores ways to use ML techniques and agentic approaches to help developers and enhance software development processes. We aim to improve ML adoption for code by turning the latest academic advances into practical applications.
This page presents an overview of our research teams and collaboration opportunities.
The Code Modeling Research Team focuses on advancing model capabilities in understanding and producing code. We specialize in fine-tuning procedures applicable across a wide spectrum of models and tasks. Our recent projects include supervised fine-tuning for Kotlin, reinforcement learning (RL) fine-tuning using compiler feedback, and enhancing model contexts by leveraging project information. We also actively develop comprehensive benchmarks, for instance for plot generation, Kotlin Q&A, and test-based evaluation, because every ML venture begins with robust benchmarking.
Our current flagship project is Project Adaptation: we are fine-tuning a model to generate more accurate and efficient code for a specific project. This setup presents a unique challenge due to the limited amount of available data, but it also offers valuable opportunities, such as an existing CI/CD pipeline that can provide feedback on newly generated code. These factors shape our current focus areas: data synthesis and reinforcement learning approaches.
In Code Editing Research, we study how to make code models better at a variety of editing-related tasks, including reasoning through edits, improving edit representations, and generating synthetic editing data. We also explore broader ML questions, such as improving post-RL model performance, developing new optimization methods, and applying low-variance RL techniques to language modeling.
Diff-XYZ is a benchmark of 1,000 real-world code edits designed to isolate how different edit representations affect LLM behavior. It enables controlled evaluation across three tasks (Apply, Anti-Apply, and Diff Generation) to show how well models understand and generate code edits in various formats.
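To make the three task types concrete, here is a minimal sketch of how a single edit instance could be posed in each of them. The `EditExample` schema, the unified-diff representation, and the prompt wording are illustrative assumptions, not the benchmark's actual format.

```python
from dataclasses import dataclass

@dataclass
class EditExample:
    """One hypothetical Diff-XYZ-style instance: a code snippet before and
    after an edit, plus the edit in some representation (here, a unified diff)."""
    old_code: str
    new_code: str
    diff: str  # the edit representation under study

def apply_task_prompt(example: EditExample) -> str:
    # Apply: given the old code and the edit, ask the model for the new code.
    return (
        "Apply the following diff to the code and output the updated file.\n\n"
        f"### Code\n{example.old_code}\n\n### Diff\n{example.diff}\n"
    )

def anti_apply_task_prompt(example: EditExample) -> str:
    # Anti-Apply: given the new code and the edit, recover the original code.
    return (
        "The following diff was applied to produce this code. "
        "Output the file as it was before the edit.\n\n"
        f"### Code\n{example.new_code}\n\n### Diff\n{example.diff}\n"
    )

def diff_generation_task_prompt(example: EditExample) -> str:
    # Diff Generation: given old and new code, ask the model to write the edit.
    return (
        "Write a diff that turns the first file into the second one.\n\n"
        f"### Before\n{example.old_code}\n\n### After\n{example.new_code}\n"
    )
```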
In this project, we address the loss of generation diversity caused by standard RL fine-tuning by deriving an RL training objective that directly optimizes the max@k metric, aligning training with Best-of-N inference. We provide an unbiased on-policy gradient estimator and an approximately unbiased off-policy version compatible with modern RL with verifiable rewards (RLVR) pipelines, along with better-performing baselines for them.
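For context, max@k is the expected best reward among k independent samples for the same prompt, which is exactly what Best-of-N inference exploits. The snippet below is a generic, minimal way to estimate that metric from n ≥ k sampled rewards; it illustrates the quantity being optimized, not the gradient estimators derived in the project.

```python
from math import comb

def estimate_max_at_k(rewards: list[float], k: int) -> float:
    """Unbiased estimate of max@k from n >= k sampled rewards for one prompt.

    Averages max(r_1, ..., r_k) over all size-k subsets of the n samples:
    the i-th smallest reward (1-indexed) is the maximum of C(i-1, k-1)
    of the C(n, k) subsets.
    """
    n = len(rewards)
    if k > n:
        raise ValueError("need at least k sampled rewards")
    sorted_rewards = sorted(rewards)
    total = sum(
        r * comb(i - 1, k - 1) for i, r in enumerate(sorted_rewards, start=1)
    )
    return total / comb(n, k)

# Example: 6 sampled completions scored by a verifier, Best-of-3 inference.
# With binary rewards, max@k coincides with the familiar pass@k estimate.
print(estimate_max_at_k([0.0, 1.0, 0.0, 0.0, 1.0, 0.0], k=3))
```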
Our team's goal is to enable better decision-making for JetBrains AI agents. We have a wide variety of projects ranging from benchmarking to studying agents' behavior.
Environment setup is an inevitable part of any modern coding agent training or evaluation process. We developed a benchmark to measure how well different automated environment setup systems perform. The benchmark focuses on hard cases that cannot be set up with a simple static script and comprises more than 300 repositories for Python and more than 600 repositories for JVM-based languages.
We believe that agents shouldn't replace humans in the software engineering process but should help them automate the boring parts. To help measure this, we established a benchmark of how well agents work with version control systems, covering abilities such as conflict resolution and interactive rebasing.
We studied two popular context management strategies for agents: context compression and observation masking. Surprisingly, we found that simple observation masking often performs on par with the more intricate strategy of summarizing the history. We also proposed a combination of these two strategies that delivers further savings.
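As a rough illustration of the simpler strategy, the sketch below masks all but the most recent tool observations before the history is sent to the model. The message schema and the `keep_last` cut-off are assumptions made for the example, not the exact setup from the study.

```python
OMITTED = "[observation omitted to save context]"

def mask_old_observations(history: list[dict], keep_last: int = 2) -> list[dict]:
    """Observation masking: keep the agent's own messages intact, but replace
    the content of all but the most recent tool observations with a placeholder."""
    # Indices of observation messages (tool/environment outputs).
    obs_indices = [i for i, m in enumerate(history) if m["role"] == "tool"]
    to_keep = set(obs_indices[-keep_last:])  # most recent observations stay verbatim
    masked = []
    for i, message in enumerate(history):
        if message["role"] == "tool" and i not in to_keep:
            masked.append({**message, "content": OMITTED})
        else:
            masked.append(message)
    return masked

# Example agent trajectory: assistant actions interleaved with tool observations.
history = [
    {"role": "assistant", "content": "run tests"},
    {"role": "tool", "content": "...3,000 lines of pytest output..."},
    {"role": "assistant", "content": "open the failing file"},
    {"role": "tool", "content": "...file contents..."},
]
print(mask_old_observations(history, keep_last=1))
```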
With the emergence of code-fluent LLMs, programming practices are changing. At the same time, development environments need to change to provide an optimal human-AI experience (HAX) in the IDE.
Work in this direction includes prototyping the integration of AI functionality into existing programming environments and developer workflows, ensuring intuitive and efficient experiences.
This research area involves exploring how programmers use and perceive AI assistants, identifying the challenges they face and the benefits these tools bring, to better align AI with the real-world needs of developers.
Our efforts in this field include identifying, evaluating, and adjusting critical aspects of AI assistants' output, from correctness to understandability, while also ensuring these tools truly support developers in their tasks.
ML techniques are constrained by the quality and availability of domain-specific data. Our team focuses on creating and delivering privacy-preserving solutions that lift these data constraints, allowing us to train high-quality models for our IDEs.
By shifting the paradigm from centralized to distributed training, the federated platform allows models to be trained on user data without that data ever leaving users' devices, enabling the deployment of user-aligned ML-based features in our IDEs.
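A minimal sketch of the underlying idea, in the spirit of federated averaging, is shown below; the actual platform is more involved, and the model, data loaders, and hyperparameters here are placeholders.

```python
import copy
import torch

def federated_round(global_model: torch.nn.Module, client_loaders,
                    local_steps: int = 10, lr: float = 0.01) -> torch.nn.Module:
    """One round of federated training: each client trains a copy of the global
    model on its own local data, and only the resulting weights (never the data)
    are sent back and averaged into the next global model."""
    client_states = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)
        optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
        for _, (x, y) in zip(range(local_steps), loader):
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(local_model(x), y)
            loss.backward()
            optimizer.step()
        client_states.append(local_model.state_dict())

    # Uniform average for simplicity; FedAvg proper weights clients by data size.
    averaged = {
        name: torch.stack([state[name].float() for state in client_states]).mean(dim=0)
        for name in client_states[0]
    }
    global_model.load_state_dict(averaged)
    return global_model
```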
By employing mathematically proven methods of private data processing, the models we train can learn general patterns from user data while remaining unable to reproduce exact copies of it. This allows users to confidently contribute to improving IDE features, knowing their individual data will remain protected by the strongest privacy standard available in machine learning today.
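The standard alluded to here is generally understood to be differential privacy. As a generic illustration of its core mechanism, the sketch below clips each example's gradient and adds calibrated Gaussian noise before averaging, so no single contribution can dominate or be reconstructed from an update; it is a DP-SGD-style sketch under that assumption, not our production pipeline.

```python
import torch

def dp_noisy_update(per_example_grads: torch.Tensor, clip_norm: float = 1.0,
                    noise_multiplier: float = 1.0) -> torch.Tensor:
    """Differentially private aggregation sketch: bound each example's influence
    by clipping its gradient norm, then add Gaussian noise scaled to that bound.
    The noisy average preserves general trends while masking individual data."""
    # per_example_grads: (batch_size, num_params) flattened per-example gradients.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    scale = (clip_norm / norms).clamp(max=1.0)  # shrink gradients above the bound
    clipped = per_example_grads * scale
    summed = clipped.sum(dim=0)
    noise = torch.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / per_example_grads.shape[0]
```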