Since the MC problem has not been studied systematically for Python in existing work, we empirically study MC issues from GitHub and Stack Overflow and classify them into three patterns. We then use ModuleGuard to evaluate all 4.2 million PyPI packages and 3,711 high-star projects collected from GitHub for the presence of MCs and their potential impacts. In summary, we propose the following research questions:
RQ1 (Issue Study). What are the common types of module conflict issues? What potential threats might they have?
RQ2 (PyPI Packages). How many PyPI packages are affected by MCs?
RQ3 (GitHub Projects). How many popular projects on GitHub are affected by MC, and what are their characteristics?
We select popular Python projects from GitHub: the top 3,000 most-starred Python projects and 1,187 popular projects from awesome-Python. After merging and deduplicating the two datasets, we obtain a total of 3,711 projects with 93,487 tags. We analyze their dependencies, resolve dependency graphs with \resolver, and detect module-in-Dep conflicts in them.
We detect 519 (13.93%) projects with 10,850 (11.61%) tags that have module overwriting threats. The results show that module-in-Dep conflicts are more prevalent in GitHub projects than in PyPI packages, as these projects tend to declare more dependencies. Although the conflicting modules may not affect the functionality of the program if the file contents are identical before and after overwriting, they can break the integrity of the package in the local environment and cause errors that are hard to debug. Moreover, 2,569 tags across 108 projects may have functional errors, because the file contents differ before and after overwriting.
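The distinction drawn above, between overwrites that leave file contents unchanged and overwrites that change them, can be illustrated with a minimal sketch. This is not ModuleGuard's implementation: the function `find_overwrites`, the package names, and the file manifests are all hypothetical, and the sketch assumes each package's installed files are already available as a mapping from relative path to file bytes.

```python
import hashlib
from collections import defaultdict


def find_overwrites(manifests):
    """Find install paths claimed by more than one package.

    manifests: {package_name: {relative_path: file_bytes}} (hypothetical input).
    Returns {path: {"packages": [...], "content_differs": bool}}.
    """
    owners = defaultdict(dict)  # path -> {package: sha256 digest of its file}
    for package, files in manifests.items():
        for path, content in files.items():
            owners[path][package] = hashlib.sha256(content).hexdigest()

    conflicts = {}
    for path, by_package in owners.items():
        if len(by_package) > 1:  # two or more packages write the same path
            conflicts[path] = {
                "packages": sorted(by_package),
                # Differing digests mean the surviving file depends on
                # installation order, so behaviour may change.
                "content_differs": len(set(by_package.values())) > 1,
            }
    return conflicts


# Hypothetical dependency set, for illustration only.
manifests = {
    "pkg-a": {"utils/io.py": b"def read(): return 1\n", "pkg_a/__init__.py": b""},
    "pkg-b": {"utils/io.py": b"def read(): return 2\n"},
    "pkg-c": {"pkg_a/__init__.py": b""},
}
conflicts = find_overwrites(manifests)
for path, info in sorted(conflicts.items()):
    print(path, info["packages"], "differs" if info["content_differs"] else "same")
```

In this toy input, `utils/io.py` corresponds to the potentially functional case (contents differ), while `pkg_a/__init__.py` corresponds to the integrity-only case (contents identical).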
Of the 108 projects, 65 are still affected in their latest version, while 43 have fixed the module-in-Dep issues in later versions. This indicates that the module conflict problem is latent, with an average of 23 historical versions affected. It is often only when a user encounters an error that the maintainer becomes aware of the problem and fixes it. We manually analyze the conflicting modules in these 65 projects and report 35 issues to the project developers; 11 projects replied and 12 fixed the MC problems. The others have not responded, but since they have the same conflicting modules as the confirmed issues, we assume that they are also genuinely affected.
We report only 35 issues because the remaining 30 are caused by the presence of `jupyter.py` files in both `jupyter` and `jupyter-core`. This conflict has already been addressed in the PR attached to the corresponding issue, by making `jupyter.py` call the same main function. However, it should be acknowledged that the module conflict itself still exists, and if the content of either file is subsequently updated, the impact of the conflict will be amplified.
We find that module conflicts occur more often in the AI field. This is because developers need to introduce one of the four opencv-python base packages when adding dependencies, along with other related AI projects. However, those related AI projects may themselves introduce a different, incompatible base package. The official documentation states that these base packages cannot coexist because they all use the module name cv2. This situation is beyond the developer's control: developers can only control direct dependencies, while indirect dependencies are a black box to them. Such conflicts result in incompatibility between different AI projects when they are used together.
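As a rough illustration of the cv2 case, the following sketch flags top-level module names that are provided by more than one distribution. It is an assumption-laden example rather than the detection method used in this work: `shared_top_level_modules` and `scan_current_environment` are hypothetical helper names, and the example mapping is made up. The only library call used is `importlib.metadata.packages_distributions()`, which is available in Python 3.10 and later.

```python
def shared_top_level_modules(module_to_dists):
    """Return top-level modules that more than one distribution provides.

    module_to_dists: {module_name: [distribution_name, ...]}, the shape
    returned by importlib.metadata.packages_distributions().
    """
    return {
        module: sorted(set(dists))
        for module, dists in module_to_dists.items()
        if len(set(dists)) > 1
    }


def scan_current_environment():
    """Apply the check to the running environment (requires Python 3.10+)."""
    from importlib.metadata import packages_distributions

    return shared_top_level_modules(packages_distributions())


# Hypothetical mapping: two OpenCV base packages both provide `cv2`.
example = {
    "cv2": ["opencv-python", "opencv-python-headless"],
    "numpy": ["numpy"],
}
print(shared_top_level_modules(example))
```

Note that this heuristic also flags namespace packages that are intentionally shared by several distributions, so its output is only a starting point for manual inspection.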
In addition, we find that some developers even declare two functionally equivalent packages as direct dependencies, although the two have module conflicts. When we talked to these developers, they said that they had added the extra dependency because doing so fixed a strange error, which was actually caused by module overwriting. This means that developers tend to focus on whether the program runs properly and, as a result, introduce functionally redundant dependencies, which increase both the complexity of the project and the difficulty of building its environment.
To make matters worse, project issues reveal that many developers are not aware of module conflicts. They often add or remove dependencies after receiving an error report simply to keep the program working. Of the 12 latest tags that were fixed, 10 were fixed by removing redundant dependencies. Our work therefore reveals the nature and potential impact of module conflicts, and helps developers recognize them and declare dependencies correctly to mitigate conflicts during the debugging phase.