The tab "Saturated HLP Set" lists the 80 HLPs we derived from Dataset-1. The "Category" column gives the type of each pattern: LEX is a lexical pattern, STR is a structural pattern, SEM is a semantic pattern, and PRF is a profiling pattern. The "Name" column gives a descriptive name for each pattern, and the "Definition" column gives its detailed content.
The following tabs record the intermediate results of the pattern derivation iterations, numbered from Iter. 1 to Iter. 13. They show how each pattern was derived from Dataset-1.
These are the sentence-level manual tagging results for Dataset-1 (for HLP derivation), Dataset-2 (for homologous evaluation), and Dataset-3 (for heterologous evaluation).
The first column, "Key", shows the ID of a Jira issue report. The second column, "Number", gives the position of the sentence within its issue report. The third column contains the text of the sentence. The fourth column, "PerfTag", is the manual tagging decision: "no" means not related to performance, and "yes" means related to performance. The fifth column, "Reason", records the information that led to a sentence being tagged as "yes".
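As a rough illustration of how this column layout could be consumed, the sketch below groups the sentences tagged "yes" by issue key. It assumes the tab has been exported to CSV; the header name "Sentence" for the third column, the function name `perf_sentences_by_issue`, and all sample rows are hypothetical and not taken from the datasets.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the layout described above. The header "Sentence"
# and every value here are made up for illustration only.
SAMPLE = """Key,Number,Sentence,PerfTag,Reason
PROJ-1,1,The query takes 30 seconds to finish.,yes,slow response time
PROJ-1,2,Attached is the stack trace.,no,
PROJ-2,1,The button label is misspelled.,no,
"""

def perf_sentences_by_issue(csv_text):
    """Group sentences tagged "yes" by issue key, ordered by sentence number."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["PerfTag"].strip().lower() == "yes":
            grouped[row["Key"]].append(
                (int(row["Number"]), row["Sentence"], row["Reason"])
            )
    # Sort each issue's sentences by their position in the report.
    return {key: sorted(rows) for key, rows in grouped.items()}

result = perf_sentences_by_issue(SAMPLE)
print(result)
```

Issues with no sentence tagged "yes" (PROJ-2 in the sample) do not appear in the result.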
These two spreadsheets record the issue-level manual tagging of Dataset-2 (for homologous evaluation) and Dataset-3 (for heterologous evaluation). Note that the issues in Dataset-1 are all performance issues, as tagged by the developers. On each tab, the first column, "Key", records the ID of an issue in the Jira tracking system. The second column records the text of the issue description. The third column records the manual tag indicating whether or not the issue is performance-related. The fourth column records comments that explain why the issue should be tagged as performance-related.
NB+HLP is the winner of sentence tagging in the heterologous evaluation.
LR+HLP is the winner of sentence tagging in the heterologous evaluation.
SVM+HLP is the winner of sentence tagging in the heterologous evaluation.
DT+HLP is the winner of sentence tagging in the heterologous evaluation.
RF+HLP is the winner of sentence tagging in the heterologous evaluation.
XGB+HLP is the winner of sentence tagging in the heterologous evaluation.
Deep learning models used with HLP are the winners compared to classic deep learning models used with classic NLP features.
CNN+HLP, RNN+HLP, DT+HLP, and RF+HLP are the winners compared to the two keyword matching methods.