NetflixPrize

介绍 Introduction




我从2009年3月开始参加NetflixPrize。目前的结果是100多个模型融合的结果。其中单个模型的准确率最好的是qRMSE=0.8807。我主要使用以下方法:


RBM
SVD++
KNN
Time dependent models

I am a student from China and I majored in machine learning and data mining.

I have attended Netflix Prize since March 2009. My result is got by blending many models together by linear regression. Four main types of models used in my projects are: KNN, RBM, SVD and time-dependent models.

My Email is xlvector@gmail.com. I am happy to discuss collaborative filtering with someone else.

Now, I am a member of team Vandelay Industries !

You can find some of my methods here!

July 26  I am a member of "The Ensemble", we are the leader of NetflixPrize Now.


Submit History 提交历史


My Story in Netflix Prize

I am a student from China and I graduated from University of Science and Technology of China (USTC). Now, I study in Institute of Automation, Chinese Academy of Sciences (CASIA).

The reason why I choose recommender system as my research field is very accidental. I do research on search engine before and last year, I wrote a paper of how to find similar web pages in Internet and submit it to WWW09. However, it is rejected by WWW09 and the reviewer tell me this have been studied by many other researchers in resys. At that time, I know the existing of collaborative filtering and resys.

After my paper was rejected, I told with another member in our lab and we found a good conference about resys, that is "ACM Conference on Recommender Systems". After I read some papers about resys, I though resys is a good research area and then I downloaded all papers in ACM resys conference. This happened in Feb. 2009 and it was Spring Festival in China. I took all these papers to home and read them in my holiday. When I read papers, I found two data set is often used resys and CF, that is movielens and Netflix.

I tried movielens firstly because it is small. After doing some researches in movielens, I want to try Netflix data and then I attend Netflix Prize (2009-2-21).

I tried SVD firstly in NetflixPrize because I thought my computer can not store item-item matrix in memory. However, I found I was wrong and I used KNN in Netflix after SVD. I read all papers from BPC, GPT and other members in Leaderboard. The last algorithm I implemented was RBM because the paper of RBM is hard to understand. My friend Wang Yuantao from Tsing-Hua University help me realized RBM. He study math in Tsing-Hua, so understand formulas in RBM paper is easy for him. After I implement RBM, I break 0.87 in NetflixPrize. Meanwhile, I wrote a paper about my time-dependent model and this paper is accepted by WI09.

In June 26, when I woke up, I received a message from Wang and he tell me someone have break 10% line. I was very surprised because at the time, PT was in the firstly place (0.858x) and I thought they need at least half a year to break 10% line. I opened the computer and I found PT is merged with BellKor and Bigchaos. At the same time, I received two emails from Greg and Larry(Dace). They want to combine their results with me and I agreed. Then, I joined VI and Larry also joined VI in the same time. The following story everyone in VI knows and I will not write it.

Comments