Machine Learning Methods in Software Engineering

Large-scale pre-training of graph neural networks for ML4SE tasks

This project studies how graph neural networks (GNNs) can be pre-trained on source code for machine learning for software engineering (ML4SE) tasks. It is an umbrella project consisting of several parts:

  • A tool for mining graph representations from source code written in different programming languages.
  • An implementation of GNNs for various ML4SE tasks and pre-training objectives. We implemented and evaluated 8 types of GNNs on top of the PyTorch Geometric library, designing them with scalability in mind.
  • A framework, built as a configurable pipeline, for convenient experimentation with ML4SE tasks. The framework is already available.
  • Proposals for improving GNN architectures and training objectives.

Participants

Egor Bogomolov
Olga Petrova
Mikhail Evtikheev