Large Scale Learning

Random projection

[1]   D. Achlioptas. Database-friendly random projections. In Proc. ACM Symp. on the Principles of Database Systems, pages 274–281, 2001.

[2]   Ella Bingham and Heikki Mannila, Random projection in dimensionality reduction: Applications to image and text data, KDD 2001

[3]   K. Ganchev and M. Dredze. Small statistical models by random feature mixing. In workshop on Mobile NLP at ACL, 2008.

[4]   Fern, X.Z. and Brodley, C.E. Random projection for high dimensional data clustering: A cluster ensemble approach. International Conference on Machine Learning (ICML). 2003


Locality Sensitive Hashing

[5]   Gionis, A., Indyk, P., and Motwani, R. "Similarity Search in High Dimensions via Hashing". Proceedings of the 25th Very Large Database (VLDB) Conference. 1999

[6]   Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and V. Vishwanathan, “Hash kernels”, In International Conference on Artificial Intelligence and Statistics, 2009.

[7]   Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, Josh Attenberg, Feature Hashing for Large Scale Multitask Learning, in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009


Online learning

[8]   Shalev-Shwartz, S. Online learning: Theory, algorithms, and applications. 2007 (Thesis)

[9]   Artač, M., Jogan, M., and Leonardis, A. (2002). Incremental PCA for on-line visual learning and recognition. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002).

[10] Cauwenberghs, G. and Poggio, T. Incremental and decremental support vector machine learning. Advances in neural information processing systems, 2001

[11] Fung, G. and Mangasarian, O.L. Incremental support vector machine classification. Proceedings of the Second SIAM International Conference on Data Mining, Arlington, Virginia. 2002

[12] Chen, R. and Sivakumar, K. and Kargupta, H. An approach to online Bayesian learning from multiple data streams. Workshop on Ubiquitous Data Mining for Mobile and Distributed Environments, Freiburg, Germany, 2001

[13] Prateek Jain, Brian Kulis, Inderjit S. Dhillon, and Kristen Grauman. Online metric learning and fast similarity search. NIPS 2008

[14] Li, Y., & Long, P. M., “The relaxed online maximum margin algorithm”, Mach. Learn., 46, 361–387, 2002

[15] Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y., “Online passive-aggressive algorithms”, J. Mach. Learn. Res., 7, 551–585, 2006

[16] Kivinen, J., & Warmuth, M. K., “Additive versus exponentiated gradient updates for linear prediction”, Information and Computation, 132, 1–64, 1997

[17] Kivinen, J., Smola, A. J., & Williamson, R. C. (2004). Online learning with kernels. IEEE Transactions on Signal Processing, 52, 2165–2176.

[18] Vijayakumar, S. and D'souza, A. and Schaal, S. Incremental online learning in high dimensions. Neural Computation. 2005

Parallel Learning

[19] Chu, Cheng-Tao and Kim, Sang K. and Lin, Yi-An and Yu, Yuanyuan and Bradski, Gary and Ng, Andrew Y. and Olukotun, Kunle. Map-Reduce for Machine Learning on Multicore. Advances in Neural Information Processing Systems 2007.

[20] Graf, H.P. and Cosatto, E. and Bottou, L. and Dourdanovic, I. and Vapnik, V. Parallel support vector machines: The cascade svm. Advances in neural information processing systems. 2005

[21] Yael Ben-Haim and Elad Yom-Tov. A streaming parallel decision tree algorithm. ICML 2008 workshop on PASCAL Large Scale Learning Challenge

[22] Tamir Hazan, Amit Man and Amnon Shashua. A Parallel Decomposition Solver for SVM: Distributed Dual Ascend using Fenchel Duality. CVPR 2008

[23] Catanzaro, Bryan and Sundaram, Narayan and Keutzer, Kurt. Fast Support Vector Machine Training and Classification on Graphics Processors. ICML 2008

[24] Rajat Raina, Anand Madhavan, Andrew Y. Ng. Large-scale Deep Unsupervised Learning using Graphics Processors. ICML 2009

[25] A. Asuncion, P. Smyth, and M. Welling, "Distributed Inference for Latent Dirichlet Allocation", Neural Information Processing Systems (NIPS), 2007

[26] F. Lozano and P. Rangel, "Algorithms for Parallel Boosting", ICMLA International Conference on Machine Learning and Applications, 2005

[27] N. Vasiloglou, A. G. Gray, and D. Anderson, "Scalable Semidefinite Manifold Learning", IEEE International Workshop on Machine Learning For Signal Processing (MLSP), 2009.

[28] L. Zanni, T. Serafini and G. Zanghirati. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems. Journal of Machine Learning Research 7:1467–1492, 2006.

[29] J. Zhang, Z. Li, and J. Yang. A Parallel SVM Training Algorithm on Large-Scale Classification Problems. Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, 3, 2005.

[30] Jian-xiong Dong, Krzyzak, A., Suen, C.Y. Fast SVM training algorithm with decomposition on very large data sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 603–618, April 2005

[31] Jian-Xiong Dong, Adam Krzyzak, and Ching Y. Suen. A fast parallel optimization for training support vector machine. In Proceedings of 3rd International Conference on Machine Learning and Data Mining, volume 17, pages 96–105. Springer Lecture Notes in Artificial Intelligence, Leipzig, Germany, 2003.

[32] Ferri, F.J. and Pudil, P. and Hatef, M. and Kittler, J. Comparative study of techniques for large-scale feature selection. Machine Intelligence and Pattern Recognition. 1994

[33] Lazarevic, A. and Obradovic, Z. Boosting algorithms for parallel and distributed learning. Distributed and parallel databases. 2002

[34] Chang, E.Y. and Bai, H. and Zhu, K. Parallel algorithms for mining large-scale rich-media data. Proceedings of the 17th ACM International Conference on Multimedia. 2009

[35] Oei, C. and Friedland, G. and Janin, A. Parallel Training of a Multi-Layer Perceptron on a GPU. 2009

[36] Bertsekas, D.P. and Tsitsiklis, J.N. Parallel and distributed computation: numerical methods. 2003

[37] Fung, J. and Mann, S. OpenVIDIA: parallel GPU computer vision. Proceedings of the 13th annual ACM international conference on Multimedia. 2005

[38] Sinha, S.N. and Frahm, J.M. and Pollefeys, M. and Genc, Y. GPU-based video feature tracking and matching. EDGE, Workshop on Edge Computing Using New Commodity Architectures. 2006

[39] Kumar, N.S.L. and Satoor, S. and Buck, I. Fast Parallel Expectation Maximization for Gaussian Mixture Models on GPUs Using CUDA. Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications. 2009


Aggregated algorithm

[40] Ruibin Xi, Nan Lin, Yixin Chen, "Compression and Aggregation for Logistic Regression Analysis in Data Cubes," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 4, pp. 479-492, Apr. 2009, doi:10.1109/TKDE.2008.186

[41] Lin, N. and Xi, R., “Aggregated estimating equation estimation”, 2009



[42] Shogun:

[43] IBM Parallel Machine Learning Toolbox:

[44] Mahout:

[45] Vowpal Wabbit:

[46] Hadoop:



[47] Joachims, T. Making large scale SVM learning practical. 1999

[48] S Sonnenburg, G Rätsch. Large scale multiple kernel learning. The Journal of Machine Learning Research. 2006

[49] Bottou, L. and Bousquet, O. The tradeoffs of large scale learning. Advances in neural information processing systems. 2007

[50] Collobert, R. and Bengio, S. SVMTorch: Support vector machines for large-scale regression problems. The Journal of Machine Learning Research. 2001

[51] S Sonnenburg, G Rätsch, K Rieck. Large scale learning with string kernels. Large Scale Kernel Machines. 2007

[52] Enright, AJ and Van Dongen, S. and Ouzounis, CA. An efficient algorithm for large-scale detection of protein families. Nucleic acids research. 2002

[53] Berry, M.W. Large-scale sparse singular value computations. International Journal of Supercomputer Applications. 1992

[54] Crowder, H. and Johnson, E.L. and Padberg, M. Solving large-scale zero-one linear programming problems. Operations Research. 1983

[55] Woodland, PC and Povey, D. Large scale discriminative training for speech recognition. 2000

[56] RK Ahuja, Ö Ergun, JB Orlin, AP Punnen. A survey of very large-scale neighborhood search techniques. Discrete Applied Mathematics. 2002 (Survey)

[57] Collobert, R. and Bengio, S. and Bengio, Y. A parallel mixture of SVMs for very large scale problems. Neural computation. 2002

[58] Neil, M. and Fenton, N. and Nielson, L. Building large-scale Bayesian networks. The Knowledge Engineering Review. 2000

[59] Zhang, Y. Solving large-scale linear programs by interior-point methods under the MATLAB environment. Optimization Methods and Software. 1998

[60] Ivor W. Tsang, James T. Kwok, Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363-392, 2005.

[61] Golub, G.H. and Von Matt, U. Generalized cross-validation for large-scale problems. Journal of Computational and Graphical Statistics. 1997

[62] Fan, W. and Stolfo, S.J. and Zhang, J. The application of AdaBoost for distributed, scalable and on-line learning. Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. 1999

[63] Hall, L. and Bowyer, K. and Kegelmeyer, W. and Moore, T. and Chao, C. Distributed learning on very large data sets. Workshop on Distributed and Parallel Knowledge Discovery. 2000

[64] J Beringer, E Hüllermeier. Online clustering of parallel data streams. Data & Knowledge Engineering. 2006