Library Comparison

In the first part of the project, we have compared several graph libraries, in order to choose which ones should be deployed with Sagemath. We have restricted our attention to libraries written in Python, C, or C++, because they are easier to integrate in Sagemath (using Cython). Furthermore, we have excluded NetworkX, which is already deployed with Sagemath.

First of all, we have to enforce that no graph library comparison can be completely fair, and also this comparison can be criticized, due to the large amount of available routines, to the constant evolution of libraries, and to many small differences in the outputs (for instance, one library might compute the value of a maximum s-t flow, another library might actually compute the flow, and a third one might compute all maximum flows). Despite this, we have tried to be as fair as possible, through a deeper and more detailed analysis than previous comparisons (,,

The first comparison deals with the number of algorithms implemented. We have chosen a set of 108 possible algorithms, trying to cover all possible tasks that a graph library should perform (we have not put in the list easy tasks that are common to all libraries, like outputting the number of nodes, the number of edges, the neighbors of a node, etc). In some cases, we have collapsed two tasks in one, if the algorithms solving these tasks are very similar (for instance, computing a maximum flow and computing a minimum cut, computing vertex betweenness and edge betweenness, etc).

The number of routines available for each library is plotted in the following chart, and a table containing all features is available in HTML or as a Google Sheet.

We observe that, before the project, Sagemath already had more routines than all competitors (66), closely followed by igraph (62), while after the project Sagemath drastically outperforms all competitors. The third biggest library is graph-tool (45), and all other libraries are very close to each other, having about 30 routines each. Furthermore, we outline that our project improved Sagemath in the fields of neighbor similarity measures (assortativity, bibcoupling, cocitation, etc), community detection, and random graph generators. For instance, the inclusion of igraph added 29 routines to Sagemath.

The second comparison analyzes the running-time of some of the algorithms implemented in the libraries (note that the experiments were performed in June 2015, and some libraries may have improved their algorithms after that). We have chosen 8 of the most common tasks in graph analysis: computing the diameter, computing the maximum flow between two vertices, finding connected components and strongly connected components, computing betweenness centrality, computing the clustering coefficient, computing the clique number, and generating a graph with the preferential attachment model. We have run each of these algorithms on 3 inputs, and we have considered the total execution time (excluding the time needed to load the graph). More details on this experiment are available here, and the results are also available in a Google Sheet.

In order to make the results more readable, we have plotted the ratio between the time needed by a given library and the minimum time needed by any library. If an algorithm was not implemented, or it needed more than 3 hours to complete, the corresponding bar is not shown.

Overall, the results show that NetworKit is the fastest library, or one of the fastest, in all routines that are implemented (apart from the generation of preferential attachment graphs, where it is very slow). Boost graph library is very close to NetworKit, and it also contains more routines. Also Sagemath is quite efficient in all tasks, apart from the computation of strongly connected components and the generation of a preferential attachment graph, where it needed more than 3 hours. However, in the latter case, the main problem was not speed but memory consumption.

In conclusion, Sagemath can highly benefit from the possibility of using algorithms from other libraries. First of all, it might improve the number of algorithms offered, especially by including igraph, and it also might improve its performance, by including Boost, NetworKit, or other fast graph libraries.