This project studies how graph neural networks (GNNs) can be pre-trained on source code. It is an umbrella project consisting of several parts:
- A tool for mining graph representations from source code in different languages.
- Implementation of GNNs for various machine learning for software engineering (ML4SE) tasks and pre-training objectives. We implemented and evaluated 8 types of GNNs on top of the PyTorch Geometric library, with scalability in mind.
- A framework (configurable pipeline) for convenient experimentation with ML4SE tasks. The framework is already available.
- Proposed improvements to the GNN architecture and training objectives.
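
To give a rough idea of what "mining a graph representation from source code" can mean, here is a minimal sketch that turns a Python snippet into a node list and an edge list using only the standard-library `ast` module. It is an illustration under assumptions, not this project's mining tool: the function name `source_to_graph` is hypothetical, only parent-child AST edges are extracted, and only Python input is handled.

```python
import ast


def source_to_graph(source: str):
    """Build a simple AST graph for a Python snippet (hypothetical helper).

    Returns:
        nodes: list of AST node type names; the list index is the node id.
        edges: list of (parent_id, child_id) pairs.
    """
    tree = ast.parse(source)
    nodes = []
    edges = []

    def visit(node, parent_id=None):
        # Assign ids in pre-order so a parent always gets a smaller id
        # than its children.
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree)
    return nodes, edges


nodes, edges = source_to_graph("def add(a, b):\n    return a + b\n")
print(len(nodes), "nodes,", len(edges), "edges")
```

An edge list of this form can be transposed into the `[2, num_edges]` integer tensor that PyTorch Geometric expects as `edge_index`. A real mining tool would likely add further edge types, such as data-flow or control-flow edges.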