Organization: Sage Mathematical Software System
## Extended abstractIn the last few years, several libraries
of graph algorithms, like the graph module of Sagemath, Boost, igraph,
NetworkX, webgraph, and so on. Among all these tools, Sagemath stands
out for its simple and well-commented code; however, some of its
algorithms are not as efficient as other graph libraries, like Boost or
igraph, as outlined by our preliminary testing. ## Personal:
## Background:## What are your technical skills, education, experience, etc. Especially make sure to explain with what level of mathematics you are comfortable with and on what level you would like to program.I have a Master Degree in Mathematics,
and I am a Ph.D. student in Computer Science. I have excellent
programming skills in Java and good programming skills in C/C++ (as
testified by my project My general programming skills, my knowledge of complexity theory, of data structures, and graph algorithms are testified by several papers where I improved state-of-the-art algorithms in several graph problems [1-6]. ## Who are you? What makes you the best person to work on this particular project? Your personal motivation?Since I am a Ph.D. student in Computer Science, and I have a mathematical background, in my opinion, the efficiency of the graph module of Sagemath can benefit from my programming skills, and from my understanding of real-world graphs from an algorithmic point of view. My reasearch-level experience in graph algorithms will help me understand the main performance bottlenecks and improve them. My motivations to pursue this work are
many. First of all, I think that Sagemath is an excellent tool because
its code is well-written and thoroughly commented. An improvement in
efficiency would make it one of the best graph analysis tools available,
if not the best. Furthermore, I will learn a lot by working in an
open-source environment, and by having feedbacks from other people (my
mentor, ## What platform and operating-system are you using on your computer? (Sage development is done best on Linux and OSX)I use Linux, Lubuntu 15.04. In the past, I also had experience with Mac OS X and Windows. ## Are you or have you been engaged in other open-source projects?I have never been engaged in open-source projects, but I have developed my own code in several works (see next question). ## Do you code on your own pet projects?In the past few years, I have coded on several projects. In almost all my academic works I had to code algorithms, and code them efficiently. I also have programmed a software playing Othello (a board game) using Edax engine, which is one of the best engines available. In particular, I have developed an opening book and I have built a user interface. After debugging and making the code more readable, this work might be included in the new version of Edax. ## Are you a Sage user, how long do you know Sage?During the last year, I have worked with the hyperbolicity package in the graph module of Sagemath, in order to find possible improvement. The results of this work will be published in [1]. ## Project:## Title, Project Synopsis: a short description and summary of its aim and scope.My preliminary experiments have shown that the performances of the graph module of Sagemath can be improved in many tasks (for instance, computing connected components and betweenness centrality, as shown in the appendix). The best implementations available can be more than 50 times faster than the Sagemath implementations. With this project, I will try to address this problem. I will wrap the fastest graph libraries available (igraph, Boost) within Sagemath, and I will replace some of the current algorithms with their faster counterparts. The importance of this work is also testified by a ticket asking to include Boost algorithms (http://trac.sagemath.org/ticket/17966). This way, I will try to improve the efficiency of Sagemath without affecting its simplicity. After this main goal has been reached, if time allows, we will pursue other goals, too: coding new algorithms from scratch, in order to extend the features of the graph module of Sagemath and to speed-up current algorithms (for instance, we might code an algorithm for computing clique separators [7], and use it in order to speed-up the computation of the hyperbolicity of a graph). ## What is your personal involvement or relationship with your proposed project?The efficiency of algorithms has always been my field of expertise, as testified by my works and publications. I think that collecting all state-of-the-art algorithms in a single, simple, and efficient library can significantly simplify the work in the field of network analysis, which is becoming more and more important in recent years. This project will provide such a tool, and I will be very happy to use this tool in my researches. Furthermore, I like Sagemath, and I prefer it to any other available tool: for this reason, I would really like to improve this tool even more. ## Details: describe all the details and explain modules or parts of your whole project. Break down the whole project into individual tasks - as good as possible - and describe deliverable and quantifiable results for each of them. It also helps if you have already discussed this with a possible mentor.I have divided the project in several tasks, each of which contains one or more tickets (small tasks that are submitted to Sage, reviewed, improved, tested, until they get accepted for inclusion). Some tasks do not include tickets because they do not require to write code for Sage: mainly, these tasks deal with testing. I have tried to define as many tickets as possible, so that I can have feedbacks "on the fly", and after a part is reviewed, I can build the others from solid and debugged foundations. The tasks I will try to address in my project follow. - Perform extensive testing of existing graph libraries on a big dataset of real-world networks, in order to understand their features (I already have a big dataset of networks, used in my previous papers). The goal is to provide a benchmark of comparison of their efficiency and generality, in order to decide which libraries are more suited to be wrapped in Sagemath.
- Design a new "backend" (graph implementation), or improve an existing one, in order to simplify calls to other libraries, possibly avoiding conversions (one ticket).
- Decide with my mentor which graph libraries we should include in Sagemath, and sort them in order of priority. Decide a good compromise between efficiency and generality (possibility to work with directed/undirected, weighted/unweighted, mutable/immutable graphs).
- Perform the following operation for all libraries in Task 2, in
order of priority (we will probably start with Boost, since a ticket is
already present on this topic).
- design modules to perform library-dependent adaptations (one ticket for each module);
- include the library in Sagemath as optional package (my mentor confirmed that this is the standard procedure);
- compare Sagemath algorithms with the algorithms based on other libraries. If the latter is more efficient, and the optional package is present, set the new approach as default.
- perform extensive testing to decide if the wrappers coded in the previous task have an impact on the efficiency of algorithms. In case, try to improve them, by refining the backend defined in Task 3 (optional, if time allows).
- (optional, if time allows): try to implement new algorithms in the graph module of Sagemath, like decomposition by clique separators (one ticket for each algorithm).
## Schedule: A timetable, including special circumstances like exams or holidays, for the individual tasks.
## Risk Management: Try to anticipate potential problems and explain, how to mitigate them. Propose alternative scenarios, if a particular milestone isn't reached, to still successfully complete the project.It is highly probable that Tasks 1 and 2 will be completed as expected, since they only involve testing and discussing with my mentor. Task 3 might involve some more problems, since it is based on the results in Tasks 1 and 2, which are not known, yet. The first risk is that the graph module of Sagemath is already as efficient as other libraries: this possibility is ruled out by the tests in the Appendix. These tests show that, even if some algorithms are already efficient (for instance, computing the average distance), other algorithms can be improved, like computing connected components and betweenness centrality. Another risk is that the number of libraries to wrap is too big or too small with respect to time limits. In the first case, I will have to restrict my attention only to some of them, in order of priority, and leave the others for future work. In the second case, I will devote more time to Tasks 4-5. The last risk I can foresee during Task 3 is that wrappers have a deep impact on the running time of algorithms (in particular, linear-time algorithms): this possibility is ruled out by the tests in the Appendix, that prove that the wrapping of igraph library in Python is efficient, and I can use the same code. Finally, Task 6 is fundamental for the availability of the code, but since it is the last task to perform, I might "run out of time". I will try to avoid this issue by perform as much on-the-fly testing as possible, and by commenting the code as soon as I write it, in order to make this task easier, and avoid delays. ## Additional Material## Bibliography[1] MICHELE BORASSI et al., [2] MICHELE BORASSI et al., [3] MICHELE BORASSI et al., [4] MICHELE BORASSI et al., [5] PAULO VIEIRA MILREU et al., [6] MICHELE BORASSI et al., [7] ROBERT E. TARJAN, ## Appendix: comparison of Sage with other graph librariesIn order to prove that the efficiency of the graph module of Sagemath can be improved, we have compared its performances with other libraries. The competitors are the most efficient existing graph libraries: igraph and Boost. We have also tested the igraph library wrapped in Python, in order to see if the wrapping has an impact on running times. We have chosen three algorithms: computing the number of connected components, computing the betweenness centrality of all vertices, and computing the all-pairs-shortest path. These tests were performed on a small dataset of real-world networks, taken from the well-known SNAP dataset (http://snap.stanford.edu/). All tests considered deal with undirected, unweighted graphs: the directed inputs have been symmetrized. The test has been performed on a Dell Latitude E6430, Intel Core I7 Quad-Core, 2.7GHz, running Lubuntu 14.10. The C libraries were compiled with gcc 4.9.1, optimization level 3, while Python version was 2.7, and Sagemath was run using the notebook interface (our preliminary tests showed that it does not slow down the execution). The times shown were averaged on 10 runs. The results for connected components are plotted in the following table.
The following table shows the same results for the average distance.
Finally, the following table shows the results for the betweenness centrality.
It is clear that in some cases Sagemath is already very efficient (for instance, in computing the average distance). However, there are other tasks where Sagemath is much less efficient than igraph or Boost, like the computation of betweenness centrality and of the number of connected components: these tasks will be addressed by my project. |

Unpublished Works >