Information Retrieval in Software Engineering
(IRSE)
@ FIRE 2022
9th-13th December, 2022
Task Description
Comment Classification: A binary classification task that takes a comment and its associated code pair as input and classifies the source code comment as Useful or Not Useful.
Input: A code comment with associated lines of code (written in C)
Output: A label (Useful or Not Useful) indicating whether the comment helps developers comprehend the associated code
The reference paper can be accessed here (the PDF can be downloaded from this link). The master GitHub repository with source code can be consulted for an overall idea of the types of architectures that can be used to solve the problem.
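As a rough illustration (not the organizers' reference architecture), a minimal bag-of-words baseline for the task might look like the following Python sketch; the training strings, the "[SEP]" separator, and all variable names are hypothetical.

# Minimal baseline sketch: TF-IDF features over the concatenated
# (comment, code) text, fed to a logistic regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: each text is "comment [SEP] code".
train_texts = [
    "/* increment counter */ [SEP] counter++;",
    "/* code below */ [SEP] int x = compute(y);",
]
train_labels = ["Useful", "Not Useful"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

# Predict a label for a new (comment, code) pair from the test set.
print(baseline.predict(["/* free the buffer */ [SEP] free(buf);"]))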
Important Dates
30th June – Training data release -- Download Link
22nd July – Test data release -- Download Link
1st August – Run submission deadline
20th August – Results declared
15th September – Working notes due (4 pages, 2 columns)
15th October – Camera ready copies of working notes
Use the command gpg <filename>.gpg to decrypt the downloaded files, and email majumdar.srijoni@gmail.com or bandyopadhyay.ayan@gmail.com for the password.
Dataset
A dataset of code and comment pairs, along with tools for comment analysis, will be provided to participants. The dataset consists of 9,000 comments (from GitHub), each with the comment text, the surrounding code snippet, and a label that specifies whether the comment is useful or not (a sample is shown below).
The development dataset will contain 8,000 rows of comment text, surrounding code snippets, and labels (Useful and Not Useful). Date of Release: 1st June 2022.
The test dataset will contain 1,000 rows of comment text, surrounding code snippets, and labels (Useful and Not Useful). Date of Release: 1st July 2022.
Ground Truth Generation Process:
For every comment, a label (Useful or Not Useful) has been generated by a team of 14 annotators. Every comment has been annotated by 2 annotators, with a kappa (κ) value of 0.734 (Cohen's metric [5]). The annotation process has been supervised through weekly meetings, brainstorming sessions, and peer review. Out of the total 16,000 comments, each individual annotator annotated 2,285 comments. A total of 156 man-hours were required to complete the annotation process.
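For reference, inter-annotator agreement of this kind can be computed with scikit-learn's cohen_kappa_score; a minimal sketch, with hypothetical annotator labels:

# Sketch: Cohen's kappa between two annotators (hypothetical labels).
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["Useful", "Not Useful", "Useful", "Useful", "Not Useful"]
annotator_2 = ["Useful", "Not Useful", "Not Useful", "Useful", "Not Useful"]

print(cohen_kappa_score(annotator_1, annotator_2))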
Submission Format
Participants will need to submit their runs in the following .csv format:
Each file should have three (3) comma-separated columns in a line.
Column-1 = comment text
Column-2 = surrounding code snippet
Column-3 = predicted class label (If you are unable to tag any class for any test data point, then put "-1")
Furthermore, every .csv file should include a description of the architecture and hyper-parameters of the specific run.
Each .csv will have 1000 rows corresponding to the 1000 comments and code pairs released as part of the Test Data.
The file name should be as follows:
<team_name>_<run_identifier>.csv. For example, TeamA_run1.csv, where TeamA is the team name and run1 is the identifier for a specific run. Please do not use "_" (underscore) for any other purpose. Also, do not use any blank spaces or tabs in the file name.
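A minimal sketch of producing such a run file with Python's csv module, which quotes commas inside comments or code correctly; the team name, run identifier, and rows are hypothetical:

# Sketch: writing a run file in the required three-column format.
import csv

rows = [
    ("/* increment counter */", "counter++;", "Useful"),
    ("/* code below */", "int x = compute(y);", "Not Useful"),
    # Use "-1" as the label for any test point you cannot tag.
]

with open("TeamA_run1.csv", "w", newline="") as f:
    writer = csv.writer(f)  # quotes fields that contain commas
    for comment, code, label in rows:
        writer.writerow([comment, code, label])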
Evaluation Metrics
Evaluation will be performed based on the F1 score and Accuracy metrics. The top F1 scores will be published on the leaderboard.
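For a local check before submission, both metrics are available in scikit-learn; a minimal sketch with hypothetical labels (the averaging scheme of the official F1 is not stated here, so treating Useful as the positive class is an assumption):

# Sketch: scoring a run with Accuracy and F1 (hypothetical labels).
from sklearn.metrics import accuracy_score, f1_score

gold = ["Useful", "Not Useful", "Useful", "Useful"]
predicted = ["Useful", "Not Useful", "Not Useful", "Useful"]

print("Accuracy:", accuracy_score(gold, predicted))
print("F1:", f1_score(gold, predicted, pos_label="Useful"))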
Organizers
Srijoni Majumdar
IIT Kharagpur
TCG CREST
Ayan Bandyopadhyay
TCG CREST
Samiran Chattopadhyay
Jadavpur University
TCG CREST
Partha Pratim Das
IIT Kharagpur
Paul D Clough
Peak Indicators
Sheffield University
Prasenjit Majumder
DA-IICT Gandhinagar
TCG CREST
Contact
E-mail: irse@googlegroups.com