Information Retrieval in Software Engineering

(IRSE)

@ FIRE 2022

9th-13th December, 2022

Task Description

Comment Classification: a binary classification task that labels a given source code comment as Useful or Not Useful, taking the comment and its associated code as input.

Input: A code comment with associated lines of code (written in C)

Output: A label (Useful or Not Useful) in helping developers comprehend the associated code

The reference paper can be accessed here (the PDF can be downloaded from this link). The master GitHub repository with source code can be consulted for an overall idea of the types of architectures that can be used to solve the problem.
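For participants looking for a starting point, below is a minimal baseline sketch (not the reference architecture): TF-IDF features over the concatenated comment and code, fed to a logistic regression classifier. The file name (train.csv) and column names (comment, code, label) are assumptions, not the official schema, and should be adjusted to the released data.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    df = pd.read_csv("train.csv")             # hypothetical file name
    X = df["comment"] + " " + df["code"]      # assumed column names
    y = df["label"]                           # "Useful" / "Not Useful"

    # Hold out 20% of the training data for validation.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    print("Validation accuracy:", model.score(X_val, y_val))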

Important Dates

30th June – Training data release – Download Link

22nd July – Test data release – Download Link

1st August – Run submission deadline

20th August – Results declared

15th September – Working notes due (4 pages, 2-column format)

15th October – Camera ready copies of working notes


Use the command gpg <file.gpg> to decrypt the data, and email majumdar.srijoni@gmail.com or bandyopadhyay.ayan@gmail.com for the password.

Dataset

A dataset of code and comment pairs, along with tools for comment analysis, will be provided to participants: a set of 9,000 comments (from GitHub), each with the comment text, the surrounding code snippet, and a label that specifies whether the comment is useful or not (a sample is shown below).


  • The development dataset will contain 8,000 rows of comment text, surrounding code snippets, and labels (Useful or Not Useful). Date of release: 1st June 2022.

  • The test dataset will contain 1,000 rows of comment text, surrounding code snippets, and labels (Useful or Not Useful). Date of release: 1st July 2022.
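An illustrative pair (a hypothetical example, not drawn from the released data; a comment that merely restates the adjacent code would typically be labeled Not Useful):

    Comment: /* increment the loop counter */
    Code:    for (i = 0; i < n; i++) { total += a[i]; }
    Label:   Not Useful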

Ground Truth Generation Process:

For every comment, a label (Useful or Not Useful) has been generated by a team of 14 annotators. Every comment has been annotated by 2 annotators, with a kappa (κ) value of 0.734 (Cohen's metric [5]). The annotation process was supervised through weekly meetings, brainstorming sessions, and peer review. Out of the total 16,000 comments, each individual annotator annotated 2,285 comments. A total of 156 man-hours were required to complete the annotation process.
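For reference, agreement between two annotators can be computed with scikit-learn's implementation of Cohen's kappa; the sketch below uses toy labels and hypothetical variable names.

    from sklearn.metrics import cohen_kappa_score

    # Labels assigned to the same comments by two annotators (toy data).
    annotator_a = ["Useful", "Not Useful", "Useful", "Useful"]
    annotator_b = ["Useful", "Not Useful", "Not Useful", "Useful"]
    print("Cohen's kappa:", cohen_kappa_score(annotator_a, annotator_b))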

Submission Format

Participants must submit their runs in the following .csv format:

Each line of the file should contain three (3) comma-separated columns.

Column-1 = comment text

Column-2 = surrounding code snippet

Column-3 = predicted class label (if you are unable to assign a class to a test data point, put "-1")

Furthermore, every .csv file should include a description with the details of the architecture and hyper-parameters of the specific run.

Sample Data Format.csv

Each .csv file will have 1,000 rows, corresponding to the 1,000 comment and code pairs released as part of the test data.

The file name should be as follows:

<team_name>_<run_identifier>.csv. For example, TeamA_run1.csv, where TeamA is the team name and run1 is the identifier for a specific run. Please do not use "_" (underscore) for any other purpose, and do not use blank spaces or tabs in the file name.
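As a sketch of producing a run file in the required three-column format, reusing the trained model from the baseline sketch above (the test file name and column names are assumptions, and whether a header row is expected is not specified, so confirm with the organizers):

    import pandas as pd

    test = pd.read_csv("test.csv")            # hypothetical file name
    # `model` is the trained pipeline from the baseline sketch above.
    preds = model.predict(test["comment"] + " " + test["code"])

    out = pd.DataFrame({"comment": test["comment"],
                        "code": test["code"],
                        "label": preds})
    # Written as <team_name>_<run_identifier>.csv; header omitted on the
    # assumption that every line is a data row.
    out.to_csv("TeamA_run1.csv", index=False, header=False)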


Evaluation Metrics

Evaluation will be performed based on the F1 score and accuracy metrics. The top F1 scores will be published on the leaderboard.
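Assuming the labels are the strings "Useful" and "Not Useful", both metrics can be computed with scikit-learn as in the sketch below (toy data, hypothetical variable names).

    from sklearn.metrics import accuracy_score, f1_score

    gold = ["Useful", "Not Useful", "Useful", "Not Useful"]  # toy ground truth
    pred = ["Useful", "Useful", "Useful", "Not Useful"]      # toy predictions
    print("Accuracy:", accuracy_score(gold, pred))
    print("F1 score:", f1_score(gold, pred, pos_label="Useful"))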

Organizers

Srijoni Majumdar

IIT Kharagpur

TCG CREST


Ayan Bandyopadhyay

TCG CREST


Samiran Chattopadhyay

Jadavpur University

TCG CREST


Partha Pratim Das

IIT Kharagpur


Paul D Clough

Peak Indicators

Sheffield University

Prasenjit Majumder

DA-IICT Gandhinagar

TCG CREST

Contact

E-mail: irse@googlegroups.com