Since no existing work has systematically studied the MC problem in Python, we empirically study MC issues from GitHub and Stack Overflow and classify them into three patterns. We then use ModuleGuard to evaluate all 4.2 million PyPI packages and 3,711 high-star GitHub projects for the presence of MCs and their potential impacts. In summary, we propose the following research questions:
RQ1 (Issue Study). What are the common types of module conflict issues, and what potential threats do they pose?
RQ2 (PyPI Packages). How many PyPI packages are affected by MCs?
RQ3 (GitHub Projects). How many popular GitHub projects are affected by MCs, and what are their characteristics?
We collected 97 MC issues in total, gathering them in two steps.
First, we combined two sets of keywords, (module OR name) AND (clash OR conflict), to search for MC issues on GitHub, restricting the results with the is:issue and language:python qualifiers. Since GitHub shows only the first 100 pages of results for each search, we collected the first 100 pages of results, sorted by best match, for each keyword combination, obtaining 4,000 issues in total.
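The Boolean expression expands into four keyword combinations. The following is a minimal sketch of that expansion, assuming each combination was submitted as a separate GitHub issue search with the same qualifiers appended; the exact query strings and the `build_queries` helper are assumptions, not the authors' tooling.

```python
from itertools import product

# Keyword groups from the Boolean expression (module OR name) AND (clash OR conflict).
SUBJECT_TERMS = ["module", "name"]
CONFLICT_TERMS = ["clash", "conflict"]
# Qualifiers restricting results to Python issues (assumed to be appended to every query).
QUALIFIERS = "is:issue language:python"


def build_queries() -> list[str]:
    """Return one search string per keyword combination (2 x 2 = 4)."""
    return [f"{subject} {term} {QUALIFIERS}"
            for subject, term in product(SUBJECT_TERMS, CONFLICT_TERMS)]


if __name__ == "__main__":
    for query in build_queries():
        print(query)
    # module clash is:issue language:python
    # module conflict is:issue language:python
    # name clash is:issue language:python
    # name conflict is:issue language:python
```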
Second, three co-authors manually reviewed the descriptions and bug reports of these 4,000 issues and identified 55 issues strongly related to MC. We also noticed that maintainers and reporters sometimes cite related issues in their comments, so we applied the snowballing technique to the issues referenced in these 55 issues and checked them manually. In the end, we collected 78 MC issues from GitHub. In addition, we searched Stack Overflow with the query "Python module name (clash OR conflict)", manually reviewed the 200 most relevant questions that have answers, and collected 19 MC-related issues.
We analyzed the problems reported in these issues and classified them into the following three categories. During the analysis, we found that although developers often attribute these problems to conflicts between packages, they are actually caused by module conflicts.
Module-to-Lib conflict.
Module-to-TPL conflict.
Module-in-Dep conflict.
Moreover, the way these issues are reported can be misleading. For example, the title of the Module-to-TPL issue below mentions a dependency problem, but the issue is actually a module-to-TPL conflict. Similarly, the Module-in-Dep example is titled as a package conflict, but it actually describes module overwriting caused by a module-in-Dep conflict during installation.
The package python-hgijson@1.5.0 contains a json module that conflicts with the standard library json module, so the package has a module-to-Lib conflict.
Both python-slugify@8.0.0 and awesome-slugify@1.6.5 contain a slugify module, so they conflict with each other when installed together; the two packages therefore have a module-to-TPL conflict.
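Detecting module-to-TPL conflicts amounts to inverting a package-to-modules mapping and keeping the module names claimed by more than one package. The following is a minimal sketch with a hypothetical `find_tpl_conflicts` helper. The input mapping is written by hand here rather than extracted from package archives.

```python
from collections import defaultdict


def find_tpl_conflicts(pkg_modules: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each top-level module name to the packages that provide it,
    keeping only names provided by more than one package."""
    providers: dict[str, list[str]] = defaultdict(list)
    for pkg, modules in pkg_modules.items():
        for mod in modules:
            providers[mod].append(pkg)
    return {mod: sorted(pkgs) for mod, pkgs in providers.items() if len(pkgs) > 1}


if __name__ == "__main__":
    packages = {
        "python-slugify@8.0.0": {"slugify"},
        "awesome-slugify@1.6.5": {"slugify"},
    }
    print(find_tpl_conflicts(packages))
    # {'slugify': ['awesome-slugify@1.6.5', 'python-slugify@8.0.0']}
```

For packages already installed in the current environment, `importlib.metadata.packages_distributions()` (Python 3.10+) returns a comparable mapping from top-level module names to distribution names, which could feed the same check.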
The dependency graph of emoca@1.0 contains both opencv-python-headless@4.5.5 and opencv-python@4.5.5, which both provide a cv2 module, so the root package emoca@1.0 has a module-in-Dep conflict.
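A module-in-Dep check applies the same idea to every package reachable from a root package. The following is a minimal sketch using breadth-first search. The graph below is a hypothetical simplification: `pkg-x@0.1` is invented to show a transitive edge, and the graph does not reproduce emoca@1.0's real dependency graph.

```python
from collections import defaultdict, deque


def find_dep_conflicts(root: str,
                       deps: dict[str, list[str]],
                       modules: dict[str, set[str]]) -> dict[str, list[str]]:
    """Report top-level modules provided by more than one package
    in the dependency graph rooted at `root` (root included)."""
    # Breadth-first traversal to collect every reachable package.
    seen = {root}
    queue = deque([root])
    while queue:
        for dep in deps.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # Invert package -> modules, restricted to the reachable packages.
    providers: dict[str, list[str]] = defaultdict(list)
    for pkg in sorted(seen):
        for mod in modules.get(pkg, set()):
            providers[mod].append(pkg)
    return {mod: pkgs for mod, pkgs in providers.items() if len(pkgs) > 1}


if __name__ == "__main__":
    deps = {
        "emoca@1.0": ["opencv-python@4.5.5", "pkg-x@0.1"],
        "pkg-x@0.1": ["opencv-python-headless@4.5.5"],  # hypothetical transitive edge
    }
    modules = {
        "opencv-python@4.5.5": {"cv2"},
        "opencv-python-headless@4.5.5": {"cv2"},
    }
    print(find_dep_conflicts("emoca@1.0", deps, modules))
    # {'cv2': ['opencv-python-headless@4.5.5', 'opencv-python@4.5.5']}
```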
Module overwriting.
Installation: During installation, if two modules have the same relative path, they conflict and one may overwrite the other. Moreover, because Windows file systems are case-insensitive, modules whose names differ only in letter case also conflict.
Upgrade: pip upgrades a package in three steps: first, it installs the package's dependencies (module overwriting may occur here); then, it uninstalls the old version of the package (possibly removing modules that a newly installed dependency has just overwritten); finally, it installs the new version of the package.
Importing: Because of module conflicts during installation and upgrade, importing an overwritten module loads the replacement instead of the intended module. On Windows, the name of the module on disk may also have changed (for example, it keeps the letter case of the first copy written), so the module cannot be imported correctly.
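The installation and upgrade scenarios above can be replayed on a toy model of site-packages. The following is a minimal sketch under stated assumptions: site-packages is modeled as a dict from relative file path to the distribution that last wrote it, uninstalling is modeled as deleting every path recorded for the old version, and the packages `pkg-a`, `pkg-b`, `app`, and `dep` are hypothetical. This is a simplified model of the described order, not pip's implementation.

```python
from collections import defaultdict


def find_path_collisions(pkg_files: dict[str, list[str]],
                         case_insensitive: bool = False) -> dict[str, list[str]]:
    """Return paths written by more than one package, with their writers.

    With case_insensitive=True, paths that differ only in letter case are treated
    as the same location, as on a case-insensitive file system (e.g. Windows).
    """
    writers: dict[str, set[str]] = defaultdict(set)
    for pkg, paths in pkg_files.items():
        for path in paths:
            writers[path.casefold() if case_insensitive else path].add(pkg)
    return {path: sorted(pkgs) for path, pkgs in writers.items() if len(pkgs) > 1}


def simulate_upgrade(site: dict[str, str], old_record: list[str],
                     dep_files: dict[str, str], new_files: dict[str, str]) -> dict[str, str]:
    """Replay the three upgrade steps on a copy of `site` (path -> owning distribution)."""
    site = dict(site)
    site.update(dep_files)       # 1. install the dependency; same paths are overwritten
    for path in old_record:      # 2. uninstall the old version by its recorded paths,
        site.pop(path, None)     #    deleting files the dependency just overwrote
    site.update(new_files)       # 3. install the new version
    return site


if __name__ == "__main__":
    files = {"pkg-a": ["Foo/__init__.py"], "pkg-b": ["foo/__init__.py"]}
    print(find_path_collisions(files))                         # {}
    print(find_path_collisions(files, case_insensitive=True))  # {'foo/__init__.py': ['pkg-a', 'pkg-b']}

    before = {"app/__init__.py": "app 1.0", "shared/__init__.py": "app 1.0"}
    after = simulate_upgrade(
        before,
        old_record=["app/__init__.py", "shared/__init__.py"],
        dep_files={"shared/__init__.py": "dep 2.0"},
        new_files={"app/__init__.py": "app 2.0"},
    )
    print(after)  # {'app/__init__.py': 'app 2.0'}: dep 2.0's shared module is gone
```

In the upgrade run, `dep` remains nominally installed, yet its `shared` module has been deleted by the uninstall step, which matches the failure mode described under Upgrade.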
Importing confusion.
For module-to-Lib conflicts, standard library modules are stored separately from installed packages, so installing a package cannot overwrite them. However, a module with the same name as a standard library module can still confuse the interpreter and cause it to import the wrong module.
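A same-named module shadows the standard library one whenever its directory precedes the standard library in the module search path. A common case is the directory of the script being run, which Python places at the front of `sys.path`. The following is a minimal sketch that shows this resolution without importing anything: it uses `importlib.machinery.PathFinder.find_spec` on an explicit search path. The temporary `json.py` stands in for a conflicting module, and the `resolve` helper is hypothetical.

```python
import importlib.machinery
import sys
import tempfile
from pathlib import Path
from typing import Optional


def resolve(name: str, search_path: list[str]) -> Optional[str]:
    """Return the file the path-based finder picks for `name`,
    scanning `search_path` in order and stopping at the first match."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return spec.origin if spec else None


if __name__ == "__main__":
    print(resolve("json", sys.path))  # .../json/__init__.py from the standard library
    with tempfile.TemporaryDirectory() as script_dir:
        Path(script_dir, "json.py").write_text("# a local module named json\n")
        # Putting script_dir first mimics running a script stored in that directory.
        print(resolve("json", [script_dir] + sys.path))  # .../json.py inside script_dir
```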
For module-to-TPL conflicts, when the conflicting modules are installed in different locations, Python searches the directories in sys.path in order and stops at the first match, so whichever copy appears earlier in sys.path shadows the other.
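The first-match rule can be observed directly by placing two copies of a module in two directories and importing it under each directory order. The following is a minimal sketch. The module name `mc_demo_slugify` is hypothetical and chosen so that it does not clash with a real installed `slugify`. The two temporary directories stand in for two installation locations.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name standing in for a conflicting module such as slugify.
MODULE = "mc_demo_slugify"


def import_with_path(dirs: list[str]) -> str:
    """Import MODULE with `dirs` placed, in order, at the front of sys.path
    and return the value of its SOURCE attribute."""
    saved = list(sys.path)
    sys.modules.pop(MODULE, None)  # drop any cached copy so the path search reruns
    sys.path[:0] = dirs
    importlib.invalidate_caches()
    try:
        return importlib.import_module(MODULE).SOURCE
    finally:
        sys.path[:] = saved            # restore the original search path
        sys.modules.pop(MODULE, None)  # leave no trace in the module cache


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        Path(a, MODULE + ".py").write_text('SOURCE = "package A"\n')
        Path(b, MODULE + ".py").write_text('SOURCE = "package B"\n')
        print(import_with_path([a, b]))  # package A
        print(import_with_path([b, a]))  # package B
```

Only the order of the directories changes between the two calls, yet a different copy of the module is loaded each time.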