Joint Autoregressive and Graph Models for Software and Developer Social Networks

We introduce two new problems to study the collection of packages, their developers, dependencies among them and bug reports. The first is to identify packages that are most likely to be troubled by bugs in the immediate future, thereby demanding the greatest attention. The second is to recommend developers to packages for the next development cycle. Simple autoregression can be applied to historical data for both problems, but we propose a novel method to integrate network-derived features and demonstrate that our method brings additional benefits. Apart from formalizing these problems and proposing new baseline approaches, we prepare and contribute a substantial dataset connecting multiple attributes built from the long-term history of 20 releases of Ubuntu, growing to over 25,000 packages with their dependency links, maintained by over 3,800 developers, with over 280 thousand bug reports.

Two Software Engineering Problems

Unlike traditional social network tasks of centrality/prestige computation, influence or cascade prediction, the social network of software comes with novel tasks having strong motivation and relevance in the software management community.

Bug urgency ranking:

The task is to rank packages that are likely to be most afflicted by bugs in the immediate future. Since there is no central command, the developer community has to autonomously discover the trouble spots. There are no existing mechanisms to automatically identify bugs and triage them quickly.

Developer recommendation:

For each package, the task is to propose the developers best suited to contribute in the immediate future. Compared to software corporations with top-down management, the developer community shows high levels of churn, making such prediction difficult. At present, no automated system exists to recommend developers for a given package. Various characteristics of the software writers’ community present additional challenges.

Key results

Developer Recommendation

* Majority: For each package, we rank developers by the number of times they appear in the last K distributions (K = 1, 5, or all). In the autoregressive case, for each source package, the developer who appears most often in the last K distributions receives the best rank, the next most frequent developer the second-best rank, and so on. In the autoregressive + dependency approach, for each source package, we extend the candidate developer set with the developers of its in-neighbors (or out-neighbors) in the previous K distributions. We then rank the developers in this extended set by the number of times they worked on the target source package in the last K distributions.

** Upper Bound: We compute an upper bound under the two policies for creating the candidate set discussed in the paper: (i) main list and (ii) main list + dependency network. If the developer of a source package in the test distribution is present in the candidate developer set, that developer's rank is set to 1.
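
The two notes above can be illustrated with a minimal sketch. It is not the project's code: the data layout (one dict per distribution, mapping a package to its set of developers), the function names, the alphabetical tie-break, and the toy packages and developers are all assumptions made for illustration. The sketch also reads "main list" as the developers previously seen on the package itself, and it reports the upper bound as the fraction of test packages whose developer is in the candidate set; the paper may use a different metric.

```python
from collections import Counter

def candidate_set(history, package, neighbors=None):
    """Developers seen on `package` (and, optionally, on its dependency
    neighbors) in the given distributions. `neighbors` is a hypothetical
    dict mapping a package to the packages it is linked to."""
    devs = set()
    for release in history:
        devs.update(release.get(package, ()))
        if neighbors:
            for nb in neighbors.get(package, ()):
                devs.update(release.get(nb, ()))
    return devs

def majority_rank(history, package, k=None, neighbors=None):
    """Rank candidates by how often they worked on `package` in the last
    k distributions (k=None means all). Ties are broken alphabetically,
    which is an assumption of this sketch."""
    window = history if k is None else history[-k:]
    counts = Counter(dev for release in window
                     for dev in release.get(package, ()))
    cands = candidate_set(window, package, neighbors)
    # Neighbor-only developers have a count of 0 and land at the bottom.
    return sorted(cands, key=lambda d: (-counts[d], d))

def oracle_hit_rate(history, test_release, neighbors=None):
    """Upper bound: a package counts as a hit (rank 1) when any of its
    test-time developers is in the candidate set."""
    hits = 0
    for package, true_devs in test_release.items():
        if candidate_set(history, package, neighbors) & set(true_devs):
            hits += 1
    return hits / len(test_release) if test_release else 0.0

# Toy data, oldest distribution first (entirely made up).
history = [
    {"libfoo": {"alice"}, "libbar": {"carol"}},
    {"libfoo": {"alice", "bob"}, "libbar": {"carol"}},
    {"libfoo": {"alice"}, "libbar": {"dave"}},
]
neighbors = {"libfoo": {"libbar"}}
test_release = {"libfoo": {"carol"}, "libbar": {"dave"}}

print(majority_rank(history, "libfoo"))                            # ['alice', 'bob']
print(majority_rank(history, "libfoo", k=1, neighbors=neighbors))  # ['alice', 'dave']
print(oracle_hit_rate(history, test_release))                      # 0.5
print(oracle_hit_rate(history, test_release, neighbors))           # 1.0
```

In this toy example, adding the dependency network raises the upper bound because a developer who moves to a package from one of its neighbors only becomes reachable once the candidate set is extended.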


Contributors

The following people contributed to this project:

  • Rima Hazra

  • Hardik Aggarwal

  • Dr. Pawan Goyal

  • Dr. Animesh Mukherjee

  • Prof. Soumen Chakrabarti

Cite Us

@misc{hazra2021joint,
  title={Joint Autoregressive and Graph Models for Software and Developer Social Networks},
  author={Rima Hazra and Hardik Aggarwal and Pawan Goyal and Animesh Mukherjee and Soumen Chakrabarti},
  year={2021},
  eprint={2101.08729},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}