Project Overview
Client: Dr. Ali Jannesari, Advisor: Arushi Sharma
We have extracted feature representations from code-trained large language models (LLMs) with the goal of
explaining and interpreting them. Clustering these representations yields groups of latent concepts that
capture patterns the models have learned from code.
This project focuses on auto-labeling code datasets to capture different code properties. We will use Abstract
Syntax Tree (AST) tools like Tree-sitter, regular expressions, and LLM-generated labels to annotate these datasets
automatically. Once the datasets are labeled, we will evaluate the concepts learned by the LLMs by measuring how
closely the latent-concept clusters align with the auto-generated labels. This evaluation will show how well the
machine-learned concepts correspond to human-defined code properties, which in turn makes the models more interpretable.
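As a rough illustration of the labeling and alignment steps described above, the sketch below labels tokens with regular expressions only and then checks how well each cluster matches a single label. Everything in it is an assumption made for illustration: the rule set `LABEL_RULES`, the helper names `auto_label` and `cluster_alignment`, the made-up cluster assignments, and the 0.9 purity threshold are hypothetical and are not taken from the project's actual pipeline, which also draws on Tree-sitter and LLM-generated labels.

```python
import re
from collections import Counter, defaultdict

# Hypothetical labeling rules (illustrative only, not the project's rule set).
# Order matters: the first rule that fully matches a token wins.
LABEL_RULES = [
    ("keyword", re.compile(r"def|return|if|else|for|while|import|class")),
    ("number", re.compile(r"\d+(\.\d+)?")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
]


def auto_label(token):
    """Return the label of the first rule that matches the whole token."""
    for label, pattern in LABEL_RULES:
        if pattern.fullmatch(token):
            return label
    return "other"


def cluster_alignment(cluster_ids, labels, threshold=0.9):
    """Map each cluster id to (majority label, purity, aligned flag).

    Purity is the fraction of a cluster's tokens that carry its most
    common label; a cluster counts as aligned when purity >= threshold.
    The threshold value is an assumption chosen for this example.
    """
    by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        by_cluster[cluster_id][label] += 1

    result = {}
    for cluster_id, counts in by_cluster.items():
        label, count = counts.most_common(1)[0]
        purity = count / sum(counts.values())
        result[cluster_id] = (label, purity, purity >= threshold)
    return result


# Toy data: the cluster assignments are made up for this example.
tokens = ["def", "foo", "return", "42", "bar", "3.14", "if", "x"]
cluster_ids = [0, 1, 0, 2, 1, 2, 0, 0]

labels = [auto_label(t) for t in tokens]
alignment = cluster_alignment(cluster_ids, labels)

for cluster_id in sorted(alignment):
    print(cluster_id, alignment[cluster_id])
```

In this toy run, cluster 0 mixes three keywords with one identifier, so its purity is 0.75 and it falls below the assumed threshold, while clusters 1 and 2 each map cleanly to one label. A real evaluation would likely replace the regex rules with AST-derived labels and may use a different alignment measure.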
Team Members
Manjul Balayar
Engineer
Software Engineering
Rayne Wilde
Engineer
Software Engineering/Data Science
Sam Frost
Engineer
Software Engineering
Akhilesh Nevatia
Engineer
Software Engineering
Ethan Rogers
Engineer
Electrical Engineering
SE 4920
Final Deliverables
Can’t see the video? Watch the demo on YouTube.
Design Document
IRP Presentation
Poster
Status Reports
Report 6
Report 5
Report 4
Report 3
Report 2
Report 1
SE 4910
Weekly Reports
Report 10
Report 9
Report 8
Report 7
Report 6
Report 5
Report 4
Report 3
Report 2
Report 1
Lightning Talks
Lightning Talk 8: Ethics
Lightning Talk 7: Prototyping
Lightning Talk 6: Design Check-In
Lightning Talk 5: Detailed Design
Lightning Talk 4: Project Planning
Lightning Talk 3: User Needs and Requirements
Lightning Talk 2: Problem and Users
Lightning Talk 1: Product Research