Added adapter to test LLMs on a real-world dataset: ManyTypes4Py #9

rashidabhar · 2025-04-29T16:35:24Z

This merge request includes the following:

Updates the current LLM adapter to allow mask-based prompting
Adds a real-world LLM adapter to run data preprocessing, fine-tuning, and the model runners

Copilot

Pull Request Overview

This PR adds support for a new LLM adapter for processing the ManyTypes4Py real-world dataset. Key changes include:

A new dataset preprocessing module that performs type hint stripping, file categorization, and JSON updates.
A new code annotator module that masks type annotations using LibCST.
Updates to runners, prompt templates, and result translation logic for LLM inference, alongside documentation and configuration improvements.
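
To make the masking step in the overview concrete, here is a minimal sketch. The PR's annotator is built on LibCST; this sketch swaps in the standard-library `ast` module so that it runs without extra dependencies, which means it drops comments and formatting that LibCST would preserve. The `MASK` placeholder and the `mask_annotations` helper are hypothetical names, not taken from the PR.

```python
import ast

# Hypothetical placeholder; the mask token actually used in the PR is not shown.
MASK = "MASK"


class AnnotationMasker(ast.NodeTransformer):
    """Replace parameter, return, and variable annotations with a placeholder."""

    @staticmethod
    def _mask() -> ast.expr:
        return ast.Name(id=MASK, ctx=ast.Load())

    def visit_FunctionDef(self, node):
        # Visit children first so nested functions and assignments are masked too.
        self.generic_visit(node)
        args = node.args
        params = args.posonlyargs + args.args + args.kwonlyargs
        if args.vararg is not None:
            params.append(args.vararg)
        if args.kwarg is not None:
            params.append(args.kwarg)
        for param in params:
            if param.annotation is not None:
                param.annotation = self._mask()
        if node.returns is not None:
            node.returns = self._mask()
        return node

    # Async functions share the same structure.
    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node):
        # Annotated assignments such as `x: int = 2`.
        node.annotation = self._mask()
        return node


def mask_annotations(source: str) -> str:
    """Return `source` with every type annotation replaced by MASK (Python 3.9+)."""
    tree = AnnotationMasker().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Calling `mask_annotations` on a function with annotated parameters keeps unannotated parameters untouched and only rewrites the annotation slots, which is the property a masked prompt relies on.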

Reviewed Changes

Copilot reviewed 36 out of 41 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| src/target_tools/real-world-llms/src/dataset-preprocessing/prepare_dataset.py | Introduces dataset preprocessing and JSON update logic with type-cleaning and split categorization. |
| src/target_tools/real-world-llms/src/code_annotator.py | Implements a CST transformer to mask type annotations in function definitions and assignments. |
| src/target_tools/real-world-llms/models_info.md | Adds a comprehensive list of supported LLM models. |
| src/target_tools/real-world-llms/README.md | Updates instructions for dataset preprocessing and model inference. |
| src/target_tools/llms/src/utils.py | Adds handling for masked code-based prompt generation. |
| src/target_tools/llms/src/runner.py | Incorporates a new JSON creation function for code files and integrates result translation. |
| src/target_tools/llms/src/result_translator.py | Provides functionality to translate raw code outputs into JSON-formatted type annotations. |
| src/target_tools/llms/src/prompts.py | Introduces new prompt templates for masked code based tasks. |
| src/target_tools/hityperdl/* | Includes updates for type normalization and runner adjustments for HiTyper integration. |
| src/runner_class.py | Updates Docker container creation with an Nvidia runtime specification. |
| src/result_analyzer/* | Adds new analysis utilities for large-scale result matching and evaluation. |

Files not reviewed (5)

src/target_tools/hityper/Dockerfile: Language not supported
src/target_tools/hityperdl/Dockerfile.cuda: Language not supported
src/target_tools/hityperdl/requirements.txt: Language not supported
src/target_tools/real-world-llms/Dockerfile: Language not supported
src/target_tools/real-world-llms/requirements.txt: Language not supported

Copilot · 2025-04-29T16:36:15Z

src/target_tools/real-world-llms/src/dataset-preprocessing/prepare_dataset.py

+    - Removing JSON file if all files are missing
+    """
+    global non_deleted_json_count
+    global libcst_fail_count


The global variable 'libcst_fail_count' is used without being initialized in the module; consider initializing it at the module level to ensure consistent behavior.
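
A minimal sketch of what this comment asks for, assuming the counters are plain integers. Only the two variable names come from the diff above; `record_result` and `counts` are hypothetical helpers added for illustration, and the real function in `prepare_dataset.py` may update the counters differently.

```python
# Module-level initialization: both names exist as soon as the module is imported,
# so any function that declares them `global` can safely read or increment them.
non_deleted_json_count = 0
libcst_fail_count = 0


def record_result(parsed_ok: bool, json_kept: bool) -> None:
    """Update the counters for one processed file (hypothetical helper)."""
    # `global` is required because `+=` rebinds the module-level names.
    global non_deleted_json_count, libcst_fail_count
    if not parsed_ok:
        libcst_fail_count += 1
    if json_kept:
        non_deleted_json_count += 1


def counts() -> tuple:
    """Return (non_deleted_json_count, libcst_fail_count)."""
    return non_deleted_json_count, libcst_fail_count
```

Without the two assignments at the top, the first `libcst_fail_count += 1` would raise a `NameError`, because the `global` statement only declares the name and does not create it.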

Copilot · 2025-04-29T16:36:16Z

src/target_tools/real-world-llms/README.md

+
```bash
python3.10 runner.py \
--bechmark_path /mnt/hf_cache/rashida_manytype4py/many-types-4-py-dataset/rw-benchmark \
```


The flag '--bechmark_path' appears to be misspelled; consider renaming it to '--benchmark_path' to avoid confusion.

Suggested change

- --bechmark_path /mnt/hf_cache/rashida_manytype4py/many-types-4-py-dataset/rw-benchmark \
+ --benchmark_path /mnt/hf_cache/rashida_manytype4py/many-types-4-py-dataset/rw-benchmark \
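
Note that the suggestion only touches the README. If the runner's argument parser also defines the misspelled flag, the documented command would stop working after the rename. The sketch below shows one way to handle that with `argparse`, under the assumption that the runner uses `argparse`; the PR does not show how `runner.py` actually declares its arguments, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: accepts the corrected flag and the old spelling."""
    parser = argparse.ArgumentParser(description="Illustrative runner CLI")
    parser.add_argument(
        "--benchmark_path",
        "--bechmark_path",  # old spelling kept as an alias (assumption)
        dest="benchmark_path",
        required=True,
        help="Path to the benchmark directory",
    )
    return parser
```

Both spellings populate the same `benchmark_path` attribute, so existing scripts keep working while the README documents only the corrected flag.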

Copilot · 2025-04-29T16:36:16Z

src/target_tools/hityperdl/src/translator.py

+    additional_type_mappings = {
+        "integer": "int",
+        "string": "str",
+        "dictonary": "dict",


The key 'dictonary' appears to be a typo; consider correcting it to 'dictionary' for clarity and consistency.
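
It is not clear from the diff whether the misspelled key was intentional, for example to catch a misspelling that a tool actually emits. A hedged sketch of a mapping that covers both cases follows; only the three key/value pairs come from the diff, while `ADDITIONAL_TYPE_MAPPINGS` as a module constant and the `normalize_type` helper are assumptions made for illustration.

```python
# Keys are lower-case spellings seen in tool output; values are Python type names.
ADDITIONAL_TYPE_MAPPINGS = {
    "integer": "int",
    "string": "str",
    "dictionary": "dict",
    "dictonary": "dict",  # misspelling kept as an extra key (assumption)
}


def normalize_type(name: str) -> str:
    """Map a loosely spelled type name to its Python name, if known."""
    cleaned = name.strip()
    # Unknown names are returned unchanged (apart from surrounding whitespace).
    return ADDITIONAL_TYPE_MAPPINGS.get(cleaned.lower(), cleaned)
```

Keeping the correct spelling as the primary key and the misspelling as an explicit, commented alias makes the intent visible to later readers.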

Suggested change

- "dictonary": "dict",
+ "dictionary": "dict",

ashwinprasadme and others added 30 commits October 18, 2023 09:54

Update README.md

e7baee9

Cleanup

96a5af7

Add Leaderboard

d9b37a1

Dockerfile, Leaderboard generation, Minor fixes

03a6025

Minor README

3678ed7

Leaderboard update

aedffec

Update Readme

1197cad

ollama support initial working script

44eecc7

Updated prompts and refactoring

c770528

OllamaRunner support with multiple log handling

5ba76fc

Moving to ChatOpenAI, handle multi-tool results

9e897cc

Handling ollama server status

5c0b082

Adding timeouts and errors

fc54bfc

WIP Prompt termination

8f8808f

Multiprocess terminate

1f6e0c8

Questions based prompt and response translator

6ccf3dc

LLMs | Training set and jsonl for fine tuning

043cd1a

Fixed results scripts

a4ec08e

Minor path fix

cb1055c

Translation fixes

67eafaf

Adding another questions based prompt

30817d6

Minor fix

3deda69

Translator and added prompts

a026677

Prompt finalized

9cd7a26

Minor fix questions

d2b26be

Refactor finetuning folders

967989d

Fix P&R Calculations in edge cases

26f86e8

Fix P&R Minor

dceec00

Finetuning dataset v1

45707ef

Fine tuning datasets

ec08c67

rashidabhar and others added 27 commits October 29, 2024 17:06

updated annotation script to force already annotated types

bf65d23

translate annotated file into typeeval gt format

3b8cc55

added test script to check the annotations

4fc30b8

update test annotation script

48407d8

update annotator script for keywords only param

49e0032

use masked source code prompt to get the annotated code

6ef201b

updated annotation script and result translator

90933c7

updated prompt template

a209480

update prompt for masking language modeling

e22e88d

updated prompts design, code annotator script and added new LLMs to models.yaml

71e3ef6

updated get prompt method to work on question based as well as masked based prompt id

2fa67c3

setup runner for real world dataset

60ed1be

updated logging for memory caching and batch processing

1bb5cb7

updated pipeline for real-world llm

d83ab32

Result analysis for large files

891f4d7

Add indexing

c832373

Merge pull request #8 from secure-software-engineering/main

ab4788f

merge main to real-world-benchmark

updated pipeline

fee9211

updated runners for real world dataset

b4ca5b2

updated latest version

bf798ca

updated runners

95c847e

added datapreprocessing steps

cfdc8aa

documented code and added readme.md

43c7c72

added readme

15e4781

added finetuning file

378a0d7

Merge branch 'main' of https://github.com/secure-software-engineering/TypeEvalPy into real-world-benchmark-llms

1b22072

delete unnecessary files

903e8bf

rashidabhar requested review from ashwinprasadme and Copilot April 29, 2025 16:35

Copilot AI reviewed Apr 29, 2025
