Organization: Sage Mathematical Software System
Abstract: In the context of libraries of graph algorithms, the graph module of Sagemath stands out for its simple and well-commented code; however, our preliminary testing outlined that some of its algorithms are not as efficient as other implementations, like Boost graph library or igraph. This project will try to address this issue: the main goal is to wrap more efficient graph libraries (a ticket about Boost is already present), but we might also improve existing algorithms, if time allows.
In the last few years, several libraries
of graph algorithms, like the graph module of Sagemath, Boost, igraph,
NetworkX, webgraph, and so on. Among all these tools, Sagemath stands
out for its simple and well-commented code; however, some of its
algorithms are not as efficient as other graph libraries, like Boost or
igraph, as outlined by our preliminary testing.
Name: Michele Borassi
Contact information (email, instant messaging, ...):
Location/Timezone: Italy / UTC+01:00
University: IMT Institute for Advanced Studies Lucca
Personal web page: https://sites.google.com/a/imtlucca.it/borassi/
Curriculum Vitae: downloadable from my web page
Mentor: David Coudert
What are your technical skills, education, experience, etc. Especially make sure to explain with what level of mathematics you are comfortable with and on what level you would like to program.
I have a Master Degree in Mathematics, and I am a Ph.D. student in Computer Science. I have excellent programming skills in Java and good programming skills in C/C++ (as testified by my project C++ vs Java - A Comparison of Identical Graph Libraries Written in Two Different Languages). I also have skills in Python programming, developed in the Ph.D. courses Computer Programming and Methodologies, and Data Science for Complex Networks (I was graded "A" in both courses).
My general programming skills, my knowledge of complexity theory, of data structures, and graph algorithms are testified by several papers where I improved state-of-the-art algorithms in several graph problems [1-6].
Who are you? What makes you the best person to work on this particular project? Your personal motivation?
Since I am a Ph.D. student in Computer Science, and I have a mathematical background, in my opinion, the efficiency of the graph module of Sagemath can benefit from my programming skills, and from my understanding of real-world graphs from an algorithmic point of view. My reasearch-level experience in graph algorithms will help me understand the main performance bottlenecks and improve them.
My motivations to pursue this work are many. First of all, I think that Sagemath is an excellent tool because its code is well-written and thoroughly commented. An improvement in efficiency would make it one of the best graph analysis tools available, if not the best. Furthermore, I will learn a lot by working in an open-source environment, and by having feedbacks from other people (my mentor, in primis). Finally, the experience I will develop using Sagemath and the final result of the project might become very useful in my academic work.
What platform and operating-system are you using on your computer? (Sage development is done best on Linux and OSX)
I use Linux, Lubuntu 15.04. In the past, I also had experience with Mac OS X and Windows.
I have never been engaged in open-source projects, but I have developed my own code in several works (see next question).
In the past few years, I have coded on several projects. In almost all my academic works I had to code algorithms, and code them efficiently. I also have programmed a software playing Othello (a board game) using Edax engine, which is one of the best engines available. In particular, I have developed an opening book and I have built a user interface. After debugging and making the code more readable, this work might be included in the new version of Edax.
During the last year, I have worked with the hyperbolicity package in the graph module of Sagemath, in order to find possible improvement. The results of this work will be published in .
My preliminary experiments have shown that the performances of the graph module of Sagemath can be improved in many tasks (for instance, computing connected components and betweenness centrality, as shown in the appendix). The best implementations available can be more than 50 times faster than the Sagemath implementations.
With this project, I will try to address this problem. I will wrap the fastest graph libraries available (igraph, Boost) within Sagemath, and I will replace some of the current algorithms with their faster counterparts. The importance of this work is also testified by a ticket asking to include Boost algorithms (http://trac.sagemath.org/ticket/17966). This way, I will try to improve the efficiency of Sagemath without affecting its simplicity.
After this main goal has been reached, if time allows, we will pursue other goals, too: coding new algorithms from scratch, in order to extend the features of the graph module of Sagemath and to speed-up current algorithms (for instance, we might code an algorithm for computing clique separators , and use it in order to speed-up the computation of the hyperbolicity of a graph).
The efficiency of algorithms has always been my field of expertise, as testified by my works and publications. I think that collecting all state-of-the-art algorithms in a single, simple, and efficient library can significantly simplify the work in the field of network analysis, which is becoming more and more important in recent years.
This project will provide such a tool, and I will be very happy to use this tool in my researches. Furthermore, I like Sagemath, and I prefer it to any other available tool: for this reason, I would really like to improve this tool even more.
Details: describe all the details and explain modules or parts of your whole project. Break down the whole project into individual tasks - as good as possible - and describe deliverable and quantifiable results for each of them. It also helps if you have already discussed this with a possible mentor.
I have divided the project in several tasks, each of which contains one or more tickets (small tasks that are submitted to Sage, reviewed, improved, tested, until they get accepted for inclusion). Some tasks do not include tickets because they do not require to write code for Sage: mainly, these tasks deal with testing. I have tried to define as many tickets as possible, so that I can have feedbacks "on the fly", and after a part is reviewed, I can build the others from solid and debugged foundations. The tasks I will try to address in my project follow.
Schedule: A timetable, including special circumstances like exams or holidays, for the individual tasks.
Risk Management: Try to anticipate potential problems and explain, how to mitigate them. Propose alternative scenarios, if a particular milestone isn't reached, to still successfully complete the project.
It is highly probable that Tasks 1 and 2 will be completed as expected, since they only involve testing and discussing with my mentor. Task 3 might involve some more problems, since it is based on the results in Tasks 1 and 2, which are not known, yet.
The first risk is that the graph module of Sagemath is already as efficient as other libraries: this possibility is ruled out by the tests in the Appendix. These tests show that, even if some algorithms are already efficient (for instance, computing the average distance), other algorithms can be improved, like computing connected components and betweenness centrality.
Another risk is that the number of libraries to wrap is too big or too small with respect to time limits. In the first case, I will have to restrict my attention only to some of them, in order of priority, and leave the others for future work. In the second case, I will devote more time to Tasks 4-5.
The last risk I can foresee during Task 3 is that wrappers have a deep impact on the running time of algorithms (in particular, linear-time algorithms): this possibility is ruled out by the tests in the Appendix, that prove that the wrapping of igraph library in Python is efficient, and I can use the same code.
Finally, Task 6 is fundamental for the availability of the code, but since it is the last task to perform, I might "run out of time". I will try to avoid this issue by perform as much on-the-fly testing as possible, and by commenting the code as soon as I write it, in order to make this task easier, and avoid delays.
 MICHELE BORASSI et al., Faster Algorithms for the Computation of the Hyperbolicity of Real-World Graphs. In preparation.
 MICHELE BORASSI et al., Fast Top-k Closeness Centrality Enumeration. In preparation.
 MICHELE BORASSI et al., Fast Diameter and Radius BFS-based Computation in (Weakly Connected) Real-World Graphs - With an Application to the Six Degrees of Kevin Bacon Game. Theoretical Computer Science, Special Issue - Fun With Algorithms, 2014
 MICHELE BORASSI et al., On the Solvability of the Six Degree of Kevin Bacon Game - A Faster Graph Diameter and Radius Computation Method. In Proceedings of the Seventh International Conference on FUN WITH ALGORITHMS, 2014.
 PAULO VIEIRA MILREU et al., Metabolic Stories: Exploring all Possible Scenarios for Metabolomics Data Analysis. Bioinformatics, 2014.
 MICHELE BORASSI et al., Telling Stories Fast Via Linear-Time Delay Pitch Enumeration. In Proceedings of the 12th International Simposium on Experimental Algorithms, 2013.
 ROBERT E. TARJAN, Decomposition by Clique Separators, Discrete Mathematics 55 (2), 1985.
In order to prove that the efficiency of the graph module of Sagemath can be improved, we have compared its performances with other libraries. The competitors are the most efficient existing graph libraries: igraph and Boost. We have also tested the igraph library wrapped in Python, in order to see if the wrapping has an impact on running times.
We have chosen three algorithms: computing the number of connected components, computing the betweenness centrality of all vertices, and computing the all-pairs-shortest path. These tests were performed on a small dataset of real-world networks, taken from the well-known SNAP dataset (http://snap.stanford.edu/). All tests considered deal with undirected, unweighted graphs: the directed inputs have been symmetrized.
The test has been performed on a Dell Latitude E6430, Intel Core I7 Quad-Core, 2.7GHz, running Lubuntu 14.10. The C libraries were compiled with gcc 4.9.1, optimization level 3, while Python version was 2.7, and Sagemath was run using the notebook interface (our preliminary tests showed that it does not slow down the execution). The times shown were averaged on 10 runs.
The results for connected components are plotted in the following table.
The following table shows the same results for the average distance.
Finally, the following table shows the results for the betweenness centrality.
It is clear that in some cases Sagemath is already very efficient (for instance, in computing the average distance). However, there are other tasks where Sagemath is much less efficient than igraph or Boost, like the computation of betweenness centrality and of the number of connected components: these tasks will be addressed by my project.
Unpublished Works >