Ella Bingham and Heikki Mannila, Random projection in dimensionality reduction: Applications to image and text data, KDD 2001
 K. Ganchev and M. Dredze. Small statistical models by random feature mixing. In workshop on Mobile NLP at ACL, 2008.
 Fern, X.Z. and Brodley, C.E. Random projection for high dimensional data clustering: A cluster ensemble approach. International Conference on Machine Learning (ICML). 2003
Locality Sensitive Hashing
 Gionis, A.; Indyk, P.; Motwani, R. "Similarity Search in High Dimensions via Hashing". Proceedings of the 25th Very Large Database (VLDB) Conference. 1999
 Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and V. Vishwanathan, “Hash kernels”, in International Conference on Artificial Intelligence and Statistics, 2009.
 Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, Josh Attenberg, Feature Hashing for Large Scale Multitask Learning, in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009
Shalev-Shwartz, S. Online learning: Theory, algorithms, and applications.
 Artač, M., Jogan, M., and Leonardis, A. (2002). Incremental PCA for on-line visual learning and recognition. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR’2002).
 Cauwenberghs, G. and Poggio, T. Incremental and decremental support vector machine learning. Advances in neural information processing systems, 2001
 Chen, R. and Sivakumar, K. and Kargupta, H. An approach to online Bayesian learning from multiple data streams. Workshop on Ubiquitous Data Mining for Mobile and Distributed Environments, Freiburg, Germany, 2001
 Prateek Jain, Brian Kulis, Inderjit S. Dhillon, and Kristen Grauman. Online Metric Learning and Fast Similarity Search. NIPS 2008
 Li, Y., & Long, P. M., “The relaxed online maximum margin algorithm”, Mach. Learn., 46, 361–387. 2002
 Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y., “Online passive-aggressive algorithms”, J. Mach. Learn. Res., 7, 551–585. 2006
 Kivinen, J., & Warmuth, M. K. “Additive versus exponentiated gradient updates for linear prediction”, Information and Computation, 132, 1–64. 1997
 Kivinen, J., Smola, A. J., & Williamson, R. C. (2002). Online learning with kernels. IEEE Transactions on Signal Processing, 52, 2165–2176.
 Vijayakumar, S. and D'souza, A. and Schaal, S. Incremental online learning in high dimensions. Neural Computation. 2005
 Chu, Cheng-Tao and Kim, Sang K. and Lin, Yi-An and Yu, Yuanyuan and Bradski, Gary and Ng, Andrew Y. and Olukotun, Kunle. Map-Reduce for Machine Learning on Multicore. Advances in Neural Information Processing Systems 2007.
 Graf, H.P. and Cosatto, E. and Bottou, L. and Dourdanovic, I. and Vapnik, V. Parallel support vector machines: The cascade svm. Advances in neural information processing systems. 2005
 Yael Ben-Haim and Elad Yom-Tov. A streaming parallel decision tree algorithm. ICML 2008 workshop on PASCAL Large Scale Learning Challenge
 Tamir Hazan, Amit Man and Amnon Shashua. A Parallel Decomposition Solver for SVM: Distributed Dual Ascend using Fenchel Duality. CVPR 2008
 Catanzaro, Bryan and Sundaram, Narayan and Keutzer, Kurt. Fast Support Vector Machine Training and Classification on Graphics Processors. ICML 2008
 Rajat Raina, Anand Madhavan, Andrew Y. Ng. Large-scale Deep Unsupervised Learning using Graphics Processors. ICML 2009
 A. Asuncion, P. Smyth, and M. Welling, "Distributed Inference for Latent Dirichlet Allocation", Neural Information Processing Systems (NIPS), 2007
 F. Lozano and P. Rangel, "Algorithms for Parallel Boosting", ICMLA International Conference on Machine Learning and Applications, 2005
 N. Vasiloglou, A. G. Gray, and D. Anderson, "Scalable Semidefinite Manifold Learning", IEEE International Workshop on Machine Learning For Signal Processing (MLSP), 2009.
 L. Zanni, T. Serafini and G. Zanghirati. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems. Journal of Machine Learning Research 7:1467-1492, 2006.
 J. Zhang, Z. Li, and J. Yang. A Parallel SVM Training Algorithm on Large-Scale Classification Problems. Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, 3, 2005.
 Jian-xiong Dong, Krzyzak, A., Suen, C.Y. Fast SVM training algorithm with decomposition on very large data sets. IEEE Transactions on Pattern Analysis and Machine Intelligence. Volume 27, Issue 4, Page(s):603 – 618, April 2005
 Jian-Xiong Dong, Adam Krzyzak, and Ching Y. Suen. A fast parallel optimization for training support vector machine. In Proceedings of 3rd International Conference on Machine Learning and Data Mining, volume 17, pages 96–105. Springer Lecture Notes in Artificial Intelligence, Leipzig, Germany, 2003.
 Ferri, FJ and Pudil, P. and Hatef, M. and Kittler, J. Comparative study of techniques for large-scale feature selection. Machine Intelligence and Pattern Recognition. 1994
 Lazarevic, A. and Obradovic, Z. Boosting algorithms for parallel and distributed learning. Distributed and parallel databases. 2002
 Chang, E.Y. and Bai, H. and Zhu, K. Parallel algorithms for mining large-scale rich-media data. Proceedings of the seventeenth ACM international conference on Multimedia. 2009
 Oei, C. and Friedland, G. and Janin, A. Parallel Training of a Multi-Layer Perceptron on a GPU. 2009
 Bertsekas, D.P. and Tsitsiklis, J.N. Parallel and distributed computation: numerical methods. 2003
 Fung, J. and Mann, S. OpenVIDIA: parallel GPU computer vision. Proceedings of the 13th annual ACM international conference on Multimedia. 2005
 Sinha, S.N. and Frahm, J.M. and Pollefeys, M. and Genc, Y. GPU-based video feature tracking and matching. EDGE, Workshop on Edge Computing Using New Commodity Architectures. 2006
 Kumar, NSL and Satoor, S. and Buck, I. Fast Parallel Expectation Maximization for Gaussian Mixture Models on GPUs Using CUDA. Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications. 2009
 Ruibin Xi, Nan Lin, Yixin Chen, "Compression and Aggregation for Logistic Regression Analysis in Data Cubes," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 4, pp. 479-492, Apr. 2009, doi:10.1109/TKDE.2008.186
 Lin, N. and Xi, R., “Aggregated estimating equation estimation”, 2009
Shogun : http://www.shogun-toolbox.org/
IBM Parallel Machine Learning Toolbox : http://www.alphaworks.ibm.com/tech/pml?open&ca=drs-aw-hom&S_TACT=106AH21W&S_CMP=AWRSSHOM
 Mahout: http://lucene.apache.org/mahout/
 Vowpal Wabbit : http://hunch.net/~vw/
 Hadoop: http://hadoop.apache.org/
 Joachims, T. Making large scale SVM learning practical. 1999
 S Sonnenburg, G Rätsch. Large scale multiple kernel learning. The Journal of Machine Learning Research. 2006
 Bottou, L. and Bousquet, O. The tradeoffs of large scale learning. Advances in neural information processing systems. 2007
 Collobert, R. and Bengio, S. SVMTorch: Support vector machines for large-scale regression problems. The Journal of Machine Learning Research. 2001
 S Sonnenburg, G Rätsch, K Rieck. Large scale learning with string kernels. Large Scale Kernel Machines. 2007
 Enright, AJ and Van Dongen, S. and Ouzounis, CA. An efficient algorithm for large-scale detection of protein families. Nucleic acids research. 2002
 Berry, M.W. Large-scale sparse singular value computations. International Journal of Supercomputer Applications. 1992
 Crowder, H. and Johnson, E.L. and Padberg, M. Solving large-scale zero-one linear programming problems. Operations Research. 1983
 Woodland, PC and Povey, D. Large scale discriminative training for speech recognition. 2000
 RK Ahuja, Ö Ergun, JB Orlin, AP Punnen. A survey of very large-scale neighborhood search techniques. Discrete Applied Mathematics. 2002 (Survey)
 Collobert, R. and Bengio, S. and Bengio, Y. A parallel mixture of SVMs for very large scale problems. Neural computation. 2002
 Neil, M. and Fenton, N. and Nielson, L. Building large-scale Bayesian networks. The Knowledge Engineering Review. 2000
 Zhang, Y. Solving large-scale linear programs by interior-point methods under the MATLAB environment. Optimization Methods and Software. 1998
 Ivor W. Tsang, James T. Kwok, Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363-392, 2005.
 Golub, G.H. and Von Matt, U. Generalized cross-validation for large-scale problems. Journal of Computational and Graphical Statistics. 1997
 Fan, W. and Stolfo, S.J. and Zhang, J. The application of AdaBoost for distributed, scalable and on-line learning. Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. 1999
 Hall, L. and Bowyer, K. and Kegelmeyer, W. and Moore, T. and Chao, C. Distributed learning on very large data sets. Workshop on Distributed and Parallel Knowledge Discovery. 2000
 J Beringer, E Hüllermeier. Online clustering of parallel data streams. Data & Knowledge Engineering. 2006