To investigate the complexity of library dependencies, we count the dependency relations between libraries (i.e., libdeps).
According to the results in Table 3, 65.09% of libraries (746,906 out of 1,147,558) depend on other libraries, and 21.40% (245,650 out of 1,147,558) are depended on by other libraries. Moreover, nearly half of the libraries (49.23%, 564,897 out of 1,147,558) have dependencies but no dependents, which suggests that they are probably designed as standalone applications. Only a small portion of libraries (5.54%, 63,611 out of 1,147,558) have no dependencies and are only used by other libraries; these are more likely to be components designed to be reused as dependencies.
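The categories above follow directly from the in-degree and out-degree of each library in the libdeps graph. The sketch below is illustrative, not the authors' tooling: it assumes libdeps are available as deduplicable (dependent, dependency) name pairs, and the function name and category labels are hypothetical.

```python
from collections import defaultdict

def classify_libraries(libraries, libdeps):
    """Count libraries by whether they have dependencies and/or dependents.

    libraries: all library names in the ecosystem.
    libdeps:   (dependent, dependency) pairs, i.e. `dependent` depends on `dependency`.
    """
    out_degree = defaultdict(int)  # number of direct dependencies per library
    in_degree = defaultdict(int)   # number of direct dependents per library
    for dependent, dependency in set(libdeps):  # drop duplicate relations
        out_degree[dependent] += 1
        in_degree[dependency] += 1

    counts = dict.fromkeys(["has_dependencies", "has_dependents",
                            "only_dependencies", "only_dependents",
                            "isolated"], 0)
    for lib in libraries:
        has_deps = out_degree.get(lib, 0) > 0
        has_dependents = in_degree.get(lib, 0) > 0
        if has_deps:
            counts["has_dependencies"] += 1
        if has_dependents:
            counts["has_dependents"] += 1
        if has_deps and not has_dependents:
            counts["only_dependencies"] += 1
        elif has_dependents and not has_deps:
            counts["only_dependents"] += 1
        elif not has_deps and not has_dependents:
            counts["isolated"] += 1
    return counts
```

Dividing each count by the total number of libraries gives percentages of the kind reported in Table 3.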
Besides, as shown in Figure 3a, most libraries involved in libdeps have only a limited number of dependencies or dependents: only 13.07% of the libraries that have dependents (32,117 out of 245,650) have over 8 dependents, and, as Figure 3b shows, only 19.02% of the libraries that have dependencies (142,090 out of 746,906) have over 8 dependencies.
However, these 13.07% and 19.02% of libraries account for most of the libdeps relations (90.17% and 60.72%, respectively), indicating their central role in the whole NPM ecosystem.
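The concentration figures above (a small share of libraries covering most libdeps relations) can be computed from degree counts. This is a minimal sketch under stated assumptions: "over 8" is read as strictly greater than 8, the library share is taken over libraries with at least one relation in the chosen direction, and `concentration` is a hypothetical helper name.

```python
from collections import Counter

def concentration(libdeps, threshold=8, by="dependents"):
    """Return (share of libraries above `threshold`, share of libdeps they cover).

    by="dependents" counts incoming libdeps (how many libraries depend on it);
    by="dependencies" counts outgoing libdeps. Only libraries with at least one
    relation in that direction form the denominator of the library share.
    """
    edges = set(libdeps)  # (dependent, dependency) pairs, deduplicated
    if not edges:
        return 0.0, 0.0
    side = 1 if by == "dependents" else 0
    degree = Counter(edge[side] for edge in edges)
    heavy = [lib for lib, d in degree.items() if d > threshold]
    library_share = len(heavy) / len(degree)
    relation_share = sum(degree[lib] for lib in heavy) / len(edges)
    return library_share, relation_share
```

Each libdeps relation is counted once on the dependency side (for dependents) or once on the dependent side (for dependencies), so the relation share is a fraction of all relations.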
To capture transitive dependencies, we further compute reachability over libdeps (i.e., by following libdeps over multiple steps).
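Counting transitive dependencies amounts to a graph search from each library over libdeps edges; following the edges backwards yields transitive dependents instead. A minimal sketch using breadth-first search (one reasonable choice; the section does not say which traversal the authors used), with a hypothetical function name:

```python
from collections import defaultdict, deque

def reachability_counts(libdeps, reverse=False):
    """Count, for every library, how many other libraries it reaches via libdeps.

    reverse=False follows dependent -> dependency edges (transitive dependencies);
    reverse=True follows them backwards (transitive dependents).
    """
    graph = defaultdict(set)
    nodes = set()
    for src, dst in libdeps:
        if reverse:
            src, dst = dst, src
        graph[src].add(dst)
        nodes.update((src, dst))

    counts = {}
    for start in nodes:
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            for nxt in graph.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        counts[start] = len(seen) - 1  # exclude the start library itself
    return counts
```

Potential fragile points could then be flagged by thresholding the reverse counts, e.g. `[lib for lib, n in reachability_counts(edges, reverse=True).items() if n > 524_288]`. One search per library is quadratic in the worst case, so at the scale of 1,147,558 libraries one would likely condense strongly connected components first.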
The left figure depicts the distribution of the number of dependencies per library (i.e., counting outgoing libdeps transitively), where the X-axis represents the number of (direct and indirect) dependencies of a library and the Y-axis represents the number of libraries with that number of dependencies.
We find that nearly half of the libraries (47.42%, 544,211 out of 1,147,558) have over 4,096 dependencies.
The left figure depicts the distribution of the number of (direct and indirect) dependents per library, where the X-axis represents the number of dependents of a library and the Y-axis represents the number of libraries with that number of dependents.
In most cases, the number of libraries decreases as the number of dependents grows. A very small portion of libraries (0.38%, 4,370 out of 1,147,558) have extremely high usage: each of them is depended on, directly or indirectly, by over 524,288 libraries via libdeps.
These libraries are the main reason why nearly half of all libraries have over 4,096 dependencies, and they can potentially affect almost half of the libraries in the NPM ecosystem through dependencies, making them potential fragile points.
However, analysis based on libdeps (at the library level) is only a rough estimate of dependency complexity, since it considers relations between libraries rather than between exact versions.
To obtain more accurate insights into the dependencies of actual installations for each version, we compute the dependency trees of all versions (10,939,334 in total) using our dependency resolution algorithm with the latest timestamp (2019-11-21), and measure the complexity of the version dependency trees from two aspects: the number of nodes in the dependency trees and the inner complexity of the dependency relations.
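The authors' resolution algorithm is not reproduced here. As a rough illustration of resolving dependencies "as of" a cutoff date such as 2019-11-21, the sketch below picks, for each constraint, the highest version published no later than the cutoff. It assumes every constraint is a caret range over plain `x.y.z` versions (following node-semver caret semantics) and uses a hypothetical in-memory registry; tilde and hyphen ranges, `||`, prereleases, and npm's node placement are not modeled.

```python
from datetime import date

def parse(version):
    """Turn a plain "x.y.z" version into a comparable integer tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies_caret(version, base):
    """Caret-range check (^base) for plain x.y.z versions without prereleases."""
    v, b = parse(version), parse(base)
    if v < b:
        return False
    if b[0] > 0:                 # ^1.2.3 := >=1.2.3 <2.0.0
        return v[0] == b[0]
    if b[1] > 0:                 # ^0.2.3 := >=0.2.3 <0.3.0
        return v[0] == 0 and v[1] == b[1]
    return v == b                # ^0.0.3 := >=0.0.3 <0.0.4

def resolve(registry, name, base, cutoff):
    """Highest version of `name` satisfying ^base that was published by `cutoff`."""
    candidates = [v for v, (published, _deps) in registry[name].items()
                  if published <= cutoff and satisfies_caret(v, base)]
    return max(candidates, key=parse) if candidates else None

def build_tree(registry, name, version, cutoff, path=()):
    """Nested (node, children) tree; a node already on the current path is a
    circle dependency and is not expanded again."""
    node = f"{name}@{version}"
    if node in path:
        return (node, [])
    _published, deps = registry[name][version]
    children = []
    for dep, base in deps.items():
        resolved = resolve(registry, dep, base, cutoff)
        if resolved is not None:
            children.append(build_tree(registry, dep, resolved, cutoff, path + (node,)))
    return (node, children)

# Hypothetical registry: name -> {version: (publish_date, {dep: caret_base})}
EXAMPLE_REGISTRY = {
    "app": {"1.0.0": (date(2019, 1, 1), {"lib": "1.2.0"})},
    "lib": {"1.2.0": (date(2018, 1, 1), {}),
            "1.5.0": (date(2019, 6, 1), {}),
            "2.0.0": (date(2019, 7, 1), {}),
            "1.9.0": (date(2020, 1, 1), {})},
}
```

With this registry, `build_tree(EXAMPLE_REGISTRY, "app", "1.0.0", date(2019, 11, 21))` returns `("app@1.0.0", [("lib@1.5.0", [])])`: 2.0.0 lies outside ^1.2.0, and 1.9.0 was published after the cutoff.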
The left figure shows the distribution of the number of direct dependencies per version. Most versions (88%, 7,371,257 out of the 8,332,238 versions that have dependencies) have fewer than 16 direct dependencies.
The left figure plots the number of direct dependencies (X-axis) against the average size of the dependency trees of versions with that number of direct dependencies (Y-axis). More direct dependencies usually lead to larger dependency trees, yet the share of indirect dependencies drops as the number of direct dependencies increases. This is probably because more dependency nodes are moved from lower levels to higher levels of the tree, which decreases the share of transitive dependencies. Nevertheless, indirect dependencies still make up a large portion of the dependency tree: on average, each direct dependency introduces 22 indirect dependency nodes, far more than the dependencies that developers can directly control during installation (i.e., direct dependencies).
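The "22 indirect nodes per direct dependency" figure is a ratio over resolved trees. The section does not state how it is averaged, so the sketch below assumes a pooled ratio (total indirect nodes over total direct dependencies across all trees) and counts each distinct installed node once; the function names and the adjacency-dict input are hypothetical.

```python
from collections import deque

def tree_stats(graph, root):
    """Return (#direct, #indirect) distinct nodes reachable from `root`.

    graph: {node: [child nodes]} for one resolved version dependency graph.
    A node that is both a direct and a deeper dependency counts as direct.
    """
    direct = set(graph.get(root, ())) - {root}
    seen = {root}
    queue = deque([root])
    while queue:
        for child in graph.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    indirect = seen - direct - {root}
    return len(direct), len(indirect)

def pooled_indirect_per_direct(trees):
    """Total indirect nodes divided by total direct dependencies over (graph, root) pairs."""
    total_direct = total_indirect = 0
    for graph, root in trees:
        n_direct, n_indirect = tree_stats(graph, root)
        total_direct += n_direct
        total_indirect += n_indirect
    return total_indirect / total_direct if total_direct else 0.0
```

The per-tree share of indirect dependencies discussed above would be `n_indirect / (n_direct + n_indirect)`; averaging per-tree ratios instead of pooling would generally give a different number.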
Motivating Example
Among the dependency trees we obtained, only 21% are spanning trees, i.e., the simplest graphs that connect all nodes with the minimum number of edges. To capture the reasons for the complexity of dependency trees, we further define three structures that increase the complexity of the graph, either by violating the spanning-tree property or by adding more nodes.
Compatible dependencies mean that multiple nodes depend on a single node in the dependency tree, e.g., A@1.0.0 and C@1.0.0 both depend on D@1.1.0 in Figure c.
Conflict dependencies refer to cases where different versions of the same library are installed for different dependents, e.g., both D@1.1.0 and D@2.0.0 are installed to satisfy the dependencies from A, B, and C in Figure c.
Circle dependencies occur when two or more dependencies form a dependency loop, e.g., version A depends on version B, and B also depends on A.
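The three structures, together with the spanning-tree criterion, can be checked mechanically on a resolved version graph. A minimal sketch, assuming each installed node is a `name@version` string and each edge is a (dependent, dependency) pair; the function name and representation are illustrative, and the example below mirrors Figure c.

```python
from collections import defaultdict

def analyze_structures(root, edges):
    """Flag complex structures in one resolved version dependency graph.

    root:  the version being installed, e.g. "R@1.0.0".
    edges: (dependent, dependency) pairs between "name@version" nodes.
    """
    edges = set(edges)
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {root}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Compatible dependencies: one node is shared by several dependents.
    compatible = any(d > 1 for d in indegree.values())

    # Conflict dependencies: one library is installed in several versions.
    versions = defaultdict(set)
    for node in nodes:
        name, _, version = node.rpartition("@")  # keeps "@scope/pkg" intact
        versions[name].add(version)
    conflict = any(len(v) > 1 for v in versions.values())

    # Circle dependencies: iterative DFS; an edge back to a node that is still
    # on the DFS stack (GRAY) closes a loop.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(nodes, WHITE)
    circle = False
    for start in nodes:
        if circle or color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(children[start]))]
        while stack and not circle:
            node, pending = stack[-1]
            nxt = next(pending, None)
            if nxt is None:            # all children explored
                color[node] = BLACK
                stack.pop()
            elif color[nxt] == GRAY:   # back edge
                circle = True
            elif color[nxt] == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(children[nxt])))

    # Spanning tree: every node reachable from root with exactly n - 1 edges.
    reachable, todo = {root}, [root]
    while todo:
        for nxt in children[todo.pop()]:
            if nxt not in reachable:
                reachable.add(nxt)
                todo.append(nxt)
    spanning_tree = reachable == nodes and len(edges) == len(nodes) - 1

    return {"compatible": compatible, "conflict": conflict,
            "circle": circle, "spanning_tree": spanning_tree}
```

On the Figure c graph (R depends on A, B, C; A and C depend on D@1.1.0; B depends on D@2.0.0), this flags compatible and conflict dependencies, no circle, and no spanning tree, since six nodes are linked by six edges.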
Among the versions with dependencies, 72.74%, 52.16%, and 13.94% have compatible dependencies, conflict dependencies, and circle dependencies, respectively, in their dependency trees.
As the left figure shows, most versions have only a few direct dependencies (blue bars), and as the number of direct dependencies increases, the ratios of versions whose dependency trees contain complex structures (green, yellow, and red bars) also rise. The average rates of such complex structures within each dependency tree rise as well, as shown in the right figure.
The reason is that more dependency nodes are raised from lower levels to higher levels, which increases the chance of compatible and conflict dependencies and thus the overall complexity.
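One mechanism consistent with this explanation is the flat `node_modules` layout that npm (v3 and later) produces by installing packages as high in the tree as possible. The sketch below is a deliberately simplified model of that placement, not npm's actual algorithm nor the authors' resolver: it assumes exact versions are already resolved (hypothetical `registry` format), hoists a package to the top level whenever that name is free there, reuses any copy visible through Node-style parent-directory lookup, and otherwise nests the package under its requirer.

```python
from collections import deque

def visible_version(placed, path, name):
    """Version of `name` found by Node-style lookup from directory `path`,
    walking up through parent node_modules directories to the top level."""
    for depth in range(len(path), -1, -1):
        version = placed.get((path[:depth], name))
        if version is not None:
            return version
    return None

def flatten(root_deps, registry, max_depth=32):
    """Place each package at the top level when possible, otherwise nest it.

    root_deps: {name: version} required by the project itself.
    registry:  {(name, version): {dep_name: dep_version}} with exact,
               already-resolved versions (hypothetical input format).
    Returns {(dir_path, name): version}; dir_path () is the top-level
    node_modules, ("a",) is node_modules/a/node_modules, and so on.
    """
    placed = {}
    queue = deque(((), name, ver) for name, ver in root_deps.items())
    while queue:
        req_dir, name, ver = queue.popleft()   # req_dir: requirer's node_modules
        if visible_version(placed, req_dir, name) == ver:
            continue                            # compatible: reuse existing copy
        if ((), name) not in placed:
            target = ()                         # hoist to the top level
        else:
            target = req_dir                    # conflict: nest under requirer
        if len(target) > max_depth:
            raise ValueError("nesting too deep; conflicting cyclic versions?")
        placed[(target, name)] = ver
        for dep, dep_ver in registry.get((name, ver), {}).items():
            queue.append((target + (name,), dep, dep_ver))
    return placed
```

For a Figure c style input (`a`, `b`, `c` at 1.0.0; `a` and `c` need `d@1.1.0`, `b` needs `d@2.0.0`), `d@1.1.0` is hoisted to the top level and shared by `a` and `c`, while `d@2.0.0` is nested at `("b",)`, reproducing one compatible and one conflict dependency.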
Dependencies in the NPM ecosystem are much more complex than previously thought.
Libraries in the NPM ecosystem are widely connected as a forest that links over 70% of libraries together, while only a small portion of libraries (5.54%) are identified as playing significant roles in the ecosystem as leaf nodes that are widely depended on.
A group of 4,370 libraries is identified as having a potentially large impact on about half of the libraries in the entire NPM ecosystem.
On average, each direct dependency introduces 22 indirect packages into the dependency tree, and about 80% of dependency trees contain complex tree structures.
The complexity of dependency trees is also related to the number of direct dependencies. As the number of direct dependencies increases, the number of nodes in the dependency tree rises accordingly, and more nodes originally at deep levels are raised up, which reduces the depth of the tree but also increases the occurrence of complex dependency structures.