Package Manager Repository Server Proposal

 Presented by Alejandro Cadavid to WinLibre

Alejandro Cadavid López
Medellín, Antioquia, Colombia.
Timezone: GMT -5
Usual Nickname: acadavid
Mail/Gtalk: acadavid@gmail.com
Skype: alejandro.cadavid
MSN: ale_cad_lop@hotmail.com

Synopsis

WinLibre Package Manager is an application that serves Open Source software to Windows users. The basic idea is to let Windows users browse a catalog of the available applications and download and install the ones they want. This proposal presents my ideas for the repository server that will hold the package descriptions.

Benefits to the WinLibre Community

Windows users often don't even know about the great open source packages available to them, so an application like WinLibre will make a big difference to open source software on Windows. It's no secret that most open source software for Windows is somewhat hidden in IRC channels, discussion lists and geek websites, and some of those programs are not easy to install. With WinLibre, there is a good chance that open source software will become more reachable for ordinary users.

Deliverables


* Package parser. Depending on the selected package file standard, a parser will be created (or a library reused [1] [2] [3]). This will allow us to get global and individual information about the available packages. In addition to the parser, a way to store those files will be built (database, direct upload to the server, mirroring, etc.).

* Repository server with GET method working. Package manager will be able to download a global description of all the files available in the repository so they can be downloaded.

* Repository with POST method working. Package Creator, or developers through a web page (a simplified version of the release website), will be able to upload an XML file (or JSON, YAML, XPKG or HTML, for example). To do this, developers must be identified by the system (OpenID account, Google Account or our own authentication system), so this deliverable will include the identification system as well.

* Repository with DELETE, PUT (update) and any other "auxiliary" methods needed. This will almost complete the whole system, allowing users to fetch information about the packages and allowing developers to post and modify each package.

* Repository mirroring. As there will be backup servers for the repository, an application to synchronize with these mirrors should be built. It could be written as a Python script and run as a cron job on the backup servers. The application will basically compare the versions of the files on the live server with those on the backup, and will then download all the outdated packages to the backup server. This could be done once or twice a day, at low-traffic hours, to avoid overloading the servers. Another option is to let the live server send HTTP requests to the backup servers to keep the files updated, but this could overload the live server.

* Web Portal. After everything in the backend is implemented, a website will be created so users can browse through package categories, download package descriptors, or download the software itself. It will also allow authenticated developers to create new packages or update existing ones. Users will be able to rank, comment on and download packages. Other features could be added along the way.

Project Details

My development process 


To get my ideas clear, I first write them on paper and try to solve the problem conceptually. I research on my own and ask for recommendations if necessary. I write a possible solution on paper and then discuss it to get feedback. When I'm sure of what is going to be done, I write some pseudo-code and, if no issues arise, I move on to implementation. I'll keep the whole process documented so that useful information remains after the project. I ask for code review if necessary, and then commit. I like documenting a lot; in fact, I prefer to write the documentation for each method first, to get a clearer idea of what it is going to do, and then write its code.

I can work comfortably with SVN and Git, and I have no problem learning any other version control system. I would recommend Review Board to keep track of the code and a wiki to keep all the knowledge gathered during development. If I find another tool of interest, I'll recommend it, and I'm willing to learn any other tool that you consider useful for the project. I like committing often, at least once a day, and of course keeping the promise of a weekly deliverable, to keep myself motivated and keep worries away from the project.

 

Package Parser


A data model is already defined. With this model, a package parser will be easy to build; the details will depend on what kind of XML file we use. The parser could be used in every part of the project (repository server, package manager and package creator). It will give us methods to access each part of the data model from the package descriptor.
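
As a rough sketch of what such accessor methods could look like, assuming the descriptor is XML and using hypothetical element names (name, version, category) and a hypothetical id attribute that are not taken from the agreed data model:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PackageDescriptor:
    """Small subset of a package description; every field name is an assumption."""
    identifier: str
    name: str
    version: str
    category: str


def parse_descriptor(xml_text: str) -> PackageDescriptor:
    """Parse one package descriptor and expose its fields as attributes."""
    root = ET.fromstring(xml_text)
    if root.tag != "package":
        raise ValueError("expected a <package> root element, got <%s>" % root.tag)

    def text_of(tag: str) -> str:
        # Every element used here is treated as required in this sketch.
        value = root.findtext(tag)
        if value is None:
            raise ValueError("missing required element <%s>" % tag)
        return value.strip()

    return PackageDescriptor(
        identifier=root.get("id", ""),
        name=text_of("name"),
        version=text_of("version"),
        category=text_of("category"),
    )


SAMPLE = """<package id="exampleapp">
    <name>Example App</name>
    <version>2.5</version>
    <category>editors</category>
</package>"""

descriptor = parse_descriptor(SAMPLE)
print(descriptor.name, descriptor.version)
```

The same function could be shared by the repository server, the package manager and the package creator, so that all three read descriptors in the same way.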

Data handling on repository server.


REST architecture [4] [5] will be used throughout the project. REST uses the HTTP methods explicitly, so we can GET resources, POST them, PUT (modify) them or DELETE them. Each package will have its own URI. Identifiers for packages will be defined depending on the application standards used. A general idea of this would be something like:

http://repositoryserver/packages/category/appidentifier

This assumes, for example, that packages will be classified under certain categories. Methods like GET can be executed on that URI to retrieve the package descriptor, for example, or just parts of the data model.

Django can handle HTTP requests, so there is no problem with this [6].
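
To illustrate the URI scheme, here is a minimal sketch that maps a request path to a category and an application identifier. It uses only the standard library; the /packages/<category>/<appidentifier> layout follows the example URI above, while the allowed characters and the names PackageRef and resolve_package are assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class PackageRef(NamedTuple):
    category: str
    identifier: str


# Assumed layout: /packages/<category>/<appidentifier>, optional trailing slash.
_PACKAGE_PATH = re.compile(r"/packages/(?P<category>[\w-]+)/(?P<identifier>[\w.-]+)/?")


def resolve_package(path: str) -> Optional[PackageRef]:
    """Return the package a request path refers to, or None if it is not a package URI."""
    match = _PACKAGE_PATH.fullmatch(path)
    if match is None:
        return None
    return PackageRef(match.group("category"), match.group("identifier"))


print(resolve_package("/packages/editors/exampleapp"))
print(resolve_package("/about"))
```

In Django, a pattern like this would normally live in the URL configuration, with the view deciding what to do based on the request method.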

GET Method


There are two different situations here.

- Global descriptions
A compressed XML file could be generated by the server each time a package is added, modified or deleted. This file will have a basic description of each application available in the repository (there is already a data model for this file too). It will include the URI of each application, allowing the package manager to execute HTTP methods on the repository for a selected application. Regenerating the XML file on every change could be inefficient, so a more efficient way to do the global update could be investigated.

- Application descriptor
The package manager will request information about a certain package using the GET method. The repository will answer with the package descriptor of the application. A GET request could even ask for a single piece of information about the package, such as its version, so there is no need to download the full XML file. The URI for each package will allow us to access these resources in an easy and standardized way.
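
Both situations can be sketched with the standard library. The example below builds a gzip-compressed global listing and reads a single field from one descriptor. The element names, the dictionary keys and the function names are assumptions for illustration only; the real layout would follow the existing data model.

```python
import gzip
import xml.etree.ElementTree as ET

# Host name taken from the example URI; the real base URI is not decided yet.
BASE_URI = "http://repositoryserver/packages"


def build_global_index(packages):
    """Return a gzip-compressed XML listing of every package in the repository.

    `packages` is an iterable of dicts with the hypothetical keys
    identifier, category, name and version.
    """
    root = ET.Element("packages")
    for pkg in packages:
        entry = ET.SubElement(root, "package")
        # The URI lets the package manager run HTTP methods on this package later.
        entry.set("uri", "%s/%s/%s" % (BASE_URI, pkg["category"], pkg["identifier"]))
        ET.SubElement(entry, "name").text = pkg["name"]
        ET.SubElement(entry, "version").text = pkg["version"]
    return gzip.compress(ET.tostring(root, encoding="utf-8"))


def get_field(descriptor_xml, field):
    """Return the text of one top-level element of a descriptor, or None if absent."""
    return ET.fromstring(descriptor_xml).findtext(field)


index = build_global_index([
    {"identifier": "exampleapp", "category": "editors",
     "name": "Example App", "version": "2.5"},
])
listing = ET.fromstring(gzip.decompress(index))
print(listing.find("package").get("uri"))
print(get_field("<package><version>2.5</version></package>", "version"))
```

A single-field lookup like get_field is what would let the package manager check a version without downloading the whole descriptor.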

POST, PUT and DELETE Methods


All these methods are standard in the HTTP protocol. How to use them should be defined depending on the kind of application descriptor that will be used. For example, we could modify a package using a request like this:

PUT /application/appidentifier HTTP/1.1
Host: repositoryserver
Content-Type: application/xml

<package>
    <version>2.5</version>
</package>
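
The request above sends only the element that changes, so one way the server could apply it is to merge the body into the stored descriptor. This is a minimal sketch under that assumption; apply_put is a hypothetical helper, and it only handles simple top-level text elements such as <version>.

```python
import xml.etree.ElementTree as ET


def apply_put(stored_xml: str, request_body: str) -> str:
    """Overwrite or add each top-level element of the request body in the stored descriptor."""
    stored = ET.fromstring(stored_xml)
    update = ET.fromstring(request_body)
    if stored.tag != update.tag:
        raise ValueError("root element mismatch: <%s> vs <%s>" % (stored.tag, update.tag))
    for new_child in update:
        existing = stored.find(new_child.tag)
        if existing is None:
            # Element not present yet: add it as sent.
            stored.append(new_child)
        else:
            # Element already present: replace its text only (nested content is not handled).
            existing.text = new_child.text
    return ET.tostring(stored, encoding="unicode")


stored = "<package><name>Example App</name><version>2.4</version></package>"
body = "<package><version>2.5</version></package>"
print(apply_put(stored, body))
```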


A DELETE request would delete the package descriptor from the repository. As there should be some way to authenticate users, we could use django-OpenID [7] and Google APIs [8], or create our own library to do this.

Backup server applications:


Depending on how files are stored on the live server, the backup server will check the versions on the live server and download the new and updated ones. If the live server goes down, the mirror server could detect this and take its place if necessary. This is complex to do, so it will require further research. At the very least, the mirror server could work as a backup server for the data and keep it up to date in case of an emergency.
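
The version comparison at the heart of the synchronization script could be sketched as follows. It assumes that each server can report a mapping from package identifier to version string; how that mapping is obtained (for example, from the global descriptor) is left open, and the function names are hypothetical.

```python
def outdated_packages(live_versions, mirror_versions):
    """Return the identifiers the mirror must download: new packages or changed versions."""
    return sorted(
        identifier
        for identifier, version in live_versions.items()
        if mirror_versions.get(identifier) != version
    )


def removed_packages(live_versions, mirror_versions):
    """Return the identifiers still on the mirror that no longer exist on the live server."""
    return sorted(set(mirror_versions) - set(live_versions))


live = {"exampleapp": "2.5", "otherapp": "1.0", "newapp": "0.1"}
mirror = {"exampleapp": "2.4", "otherapp": "1.0", "oldapp": "3.2"}

print(outdated_packages(live, mirror))
print(removed_packages(live, mirror))
```

A cron job on the backup server would then download only the packages returned by outdated_packages, which keeps the traffic on the live server low.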

Web Portal


Once the repository handles every HTTP method, the web portal will be created. The basic features will be:

* Application category browsing
* Application search (By name, description, etc)
* Application description (Name, version, download URL, etc)
* Application ranking
* Comments on Applications
* Application screenshot upload



Many other features could be added; it's just a matter of deciding which ones will benefit the users. We could ask for user feedback on this.

Project Schedule



This proposal is written to take up the entire GSoC period. I will start working on the project around April 20th, if accepted. From April 20th to May 23rd I'll be working with the WinLibre team (and the other accepted students) to get concrete ideas on the file specifications and to improve my ideas on the repository server. I'll also take some time to relearn Python and to learn Django.

From May 23rd to July 13th I will be working 8 to 10 hours a day, and I expect to work at least 40 hours per week on the project. From July 13th to August 24th I will be working 6 hours a day. That's because my university starts again on July 13th, but this is not a big issue: the first month of classes is not that hard, so I'll have enough time to work on the project, and I expect to have a big part of it done by that date. I would be very pleased to keep working on the project after GSoC is over. It's a very exciting and quite challenging project.

Week 1: XML parser building
Week 2: Parser testing with pre-built packages and bug fixing.
Week 3: Descriptor file storage method
Week 4: Global descriptor generator.
Week 5: Repository with GET Method.
Week 6: Authentication Methods development.
Week 7: Repository with POST Method.
Week 8: Repository with PUT and DELETE methods.
Week 9: Backup servers application
Week 10: Testing repository with Package manager and package creator.
Week 11: WebPortal development (Category browsing, application description)
Week 12: WebPortal development (Searching, ranking, comments and other features)
Week 13: Testing week and bug fixing.

Who am I?


I was born in Medellín, Colombia, on March 26th, 1989, so right now I'm 20 years old. I study Systems Engineering (a curriculum similar to Computer Science) at Universidad EAFIT. This is my first time at GSoC, and my first time working on an Open Source project as well. I'm interested in programming, mathematics and music. I'm part of a group at my university where we develop techniques to solve programming problems (ACM, SPOJ.pl, TopCoder, etc.). I'm a very open-minded person. I like sharing ideas and discussing them. I usually work on my own, and I try to find my own ways to solve issues before asking. If something is getting too hard or taking longer than planned, I'll ask for help. Still, I like reporting on my progress to get feedback and stay motivated. That's why I intend to make a weekly delivery: it keeps me focused and motivated to keep working.

My Experience:


I mainly use Linux as my operating system, and I use Windows as well. My experience with programming languages is not that wide: I have used Java, C++ and Python in my university projects. I made a Python project with a classmate at my university called Rigo [9] (basically a code parser and executor, something like Karel but lighter). I'm showing this code just as proof that I've coded in Python; it's not the best code to show. My coding practices there are really awful, and that's not the kind of code I'd like to develop for WinLibre.

I don't have real experience with Django, but I have done some development with Ruby on Rails, so I think that getting the idea of Django won't be that hard. In any case, I'll do my homework on learning it, and on relearning Python as well. I will research everything that is needed for the project.


[1] http://pyyaml.org/wiki/PyYAML
[2] http://pypi.python.org/pypi/python-json/3.4
[3] http://pyxml.sourceforge.net
[4] http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
[5] http://en.wikipedia.org/wiki/Representational_State_Transfer
[6] http://docs.djangoproject.com/en/dev/ref/request-response/?from=olddocs
[7] http://code.google.com/p/django-openid/
[8] http://code.google.com/apis/accounts/docs/AuthForWebApps.html
[9] http://acadavid.nfshost.com/uploads/rigo.tar.gz