**Random projection**

[1]
D. Achlioptas. Database-friendly random projections. In Proc. ACM Symp.
on the Principles of Database Systems, pages 274–281, 2001.

[2]
Ella Bingham and Heikki Mannila, Random projection in dimensionality reduction:
Applications to image and text data, KDD 2001

[3]
K. Ganchev and M. Dredze. Small statistical models by random feature
mixing. In workshop on Mobile NLP at ACL, 2008.

[4]
Fern, X.Z. and Brodley, C.E. Random projection for high dimensional data
clustering: A cluster ensemble approach. In Proceedings of the 20th International
Conference on Machine Learning (ICML), 2003

**Locality Sensitive Hashing**

[5]
Gionis, A., Indyk, P., and Motwani, R. "Similarity Search in High
Dimensions via Hashing". Proceedings of the 25th Very Large Database (VLDB)
Conference. 1999

[6]
Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and V.
Vishwanathan. “Hash kernels”. In International Conference on Artificial
Intelligence and Statistics, 2009.

[7]
Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, Josh
Attenberg, Feature Hashing for Large Scale Multitask Learning, in Proceedings
of the 26th International Conference on Machine Learning, Montreal, Canada,
2009

**Online learning**

[8]
Shalev-Shwartz, S. Online learning: Theory, algorithms, and applications.
2007 (Thesis)

[9]
Artač, M., Jogan, M., and Leonardis, A. (2002). Incremental PCA for
on-line visual learning and recognition. In Proceedings of the 16th
International Conference on Pattern Recognition (ICPR’2002).

[10] Cauwenberghs, G. and Poggio, T. Incremental
and decremental support vector machine learning. Advances in neural information
processing systems, 2001

[11] Fung, G. and Mangasarian, O.L. Incremental
support vector machine classification. Proceedings of the Second SIAM
International Conference on Data Mining, Arlington, Virginia. 2002

[12]
Chen, R. and Sivakumar, K. and Kargupta, H. An approach to online
Bayesian learning from multiple data streams. Workshop on Ubiquitous Data Mining
for Mobile and Distributed Environments, Freiburg, Germany, 2001

[13] Prateek Jain, Brian Kulis,
Inderjit S. Dhillon, and Kristen Grauman. Online metric learning and fast
similarity search. NIPS 2008

[14] Li, Y., & Long, P. M., “The
relaxed online maximum margin algorithm”, Mach. Learn., 46, 361–387, 2002

[15] Crammer, K., Dekel, O., Keshet,
J., Shalev-Shwartz, S., & Singer, Y., “Online passive-aggressive
algorithms”, J. Mach. Learn. Res., 7, 551–585, 2006

[16] Kivinen, J., & Warmuth, M. K.
“Additive versus exponentiated gradient updates for linear prediction”,
Information and Computation, 132, 1–64. 1997

[17] Kivinen, J., Smola, A. J., &
Williamson, R. C. (2004). Online learning with kernels. IEEE Transactions on
Signal Processing, 52, 2165–2176.

[18]
Vijayakumar, S. and D'souza, A. and Schaal, S. Incremental online
learning in high dimensions. Neural Computation. 2005

**Parallel Learning**

[19] Chu, Cheng-Tao and Kim, Sang K.
and Lin, Yi-An and Yu, Yuanyuan and Bradski, Gary and Ng, Andrew Y. and
Olukotun, Kunle. Map-Reduce for Machine Learning on Multicore. Advances in
Neural Information Processing Systems 2007.

[20] Graf, H.P. and Cosatto, E. and
Bottou, L. and Durdanovic, I. and Vapnik, V. Parallel support vector machines:
The cascade svm. Advances in neural information processing systems. 2005

[21] Yael Ben-Haim and Elad Yom-Tov. A
streaming parallel decision tree algorithm. ICML 2008 workshop on PASCAL Large
Scale Learning Challenge

[22] Tamir Hazan, Amit Man and Amnon
Shashua. A Parallel Decomposition Solver for SVM: Distributed Dual Ascend using
Fenchel Duality. CVPR 2008

[23] Catanzaro, Bryan and Sundaram,
Narayan and Keutzer, Kurt. Fast Support Vector Machine Training and
Classification on Graphics Processors. ICML 2008

[24] Rajat Raina, Anand Madhavan,
Andrew Y. Ng. Large-scale Deep Unsupervised Learning using Graphics Processors.
ICML 2009

[25] A. Asuncion, P. Smyth, and M.
Welling, "Distributed Inference for Latent Dirichlet Allocation",
Neural Information Processing Systems (NIPS) , 2007

[26] F. Lozano, and P. Rangel,
"Algorithms for Parallel Boosting", ICMLA International Conference on
Machine Learning and Applications , 2005

[27] N. Vasiloglou, A. G. Gray,
and D. Anderson, "Scalable Semidefinite Manifold Learning", IEEE
International Workshop on Machine Learning For Signal Processing (MLSP), 2009.

[28] L. Zanni, T. Serafini and G.
Zanghirati. Parallel Software for Training Large Scale Support Vector Machines
on Multiprocessor Systems. Journal of Machine Learning Research 7:1467–1492,
2006.

[29] J. Zhang, Z. Li, and J. Yang. A
Parallel SVM Training Algorithm on Large-Scale Classification Problems. Machine
Learning and Cybernetics, 2005. Proceedings of 2005 International Conference
on, 3, 2005.

[30] Jian-xiong Dong, Krzyzak, A.,
Suen, C.Y. Fast SVM training algorithm with decomposition on very large data
sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 27,
issue 4, pages 603–618, April 2005

[31] Jian-Xiong Dong, Adam Krzyzak,
and Ching Y. Suen. A fast parallel optimization for training support vector
machine. In Proceedings of 3rd International Conference on Machine Learning and
Data Mining, volume 17, pages 96–105. Springer Lecture Notes in Artificial
Intelligence, Leipzig, Germany, 2003.

[32]
Ferri, FJ and Pudil, P. and Hatef, M. and Kittler, J. Comparative study
of techniques for large-scale feature selection. Machine Intelligence and
Pattern Recognition. 1994

[33]
Lazarevic, A. and Obradovic, Z. Boosting algorithms for parallel and
distributed learning. Distributed and parallel databases. 2002

[34]
Chang, E.Y. and Bai, H. and Zhu, K. Parallel algorithms for mining
large-scale rich-media data. Proceedings of the 17th ACM international
conference on Multimedia. 2009

[35]
Oei, C. and Friedland, G. and Janin, A. Parallel Training of a
Multi-Layer Perceptron on a GPU. 2009

[36]
Bertsekas, D.P. and Tsitsiklis, J.N. Parallel and distributed
computation: numerical methods. 2003

[37]
Fung, J. and Mann, S. OpenVIDIA: parallel GPU computer vision. Proceedings
of the 13th annual ACM international conference on Multimedia. 2005

[38]
Sinha, S.N. and Frahm, J.M. and Pollefeys, M. and Genc, Y. GPU-based
video feature tracking and matching. EDGE, Workshop on Edge Computing Using New
Commodity Architectures. 2006

[39]
Kumar, NSL and Satoor, S. and Buck, I. Fast Parallel Expectation
Maximization for Gaussian Mixture Models on GPUs Using CUDA. Proceedings of the
2009 11th IEEE International Conference on High Performance Computing and
Communications. 2009

**Aggregated algorithm**

[40] Ruibin Xi, Nan Lin, Yixin Chen,
"Compression and Aggregation for Logistic Regression Analysis in Data
Cubes," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no.
4, pp. 479-492, Apr. 2009, doi:10.1109/TKDE.2008.186

[41] Lin, N. and Xi, R., “Aggregated
estimating equation estimation”, 2009

**Tools**

[42]
Shogun : http://www.shogun-toolbox.org/

[43]
IBM Parallel Machine Learning Toolbox : http://www.alphaworks.ibm.com/tech/pml?open&ca=drs-aw-hom&S_TACT=106AH21W&S_CMP=AWRSSHOM

[44]
Mahout: http://lucene.apache.org/mahout/

[45] Vowpal Wabbit : http://hunch.net/~vw/

[46]
Hadoop: http://hadoop.apache.org/

**Miscellaneous**

[47]
Joachims, T. Making large scale SVM learning practical. 1999

[48]
S Sonnenburg, G Rätsch. Large scale multiple kernel learning. The
Journal of Machine Learning Research. 2006

[49]
Bottou, L. and Bousquet, O. The tradeoffs of large scale learning. Advances
in neural information processing systems. 2007

[50]
Collobert, R. and Bengio, S. SVMTorch: Support vector machines for
large-scale regression problems. The Journal of Machine Learning Research. 2001

[51]
S Sonnenburg, G Rätsch, K Rieck. Large scale learning with string
kernels. Large Scale Kernel Machines. 2007

[52]
Enright, AJ and Van Dongen, S. and Ouzounis, CA. An efficient algorithm
for large-scale detection of protein families. Nucleic acids research. 2002

[53]
Berry, M.W. Large-scale sparse singular value computations. International Journal of Supercomputer
Applications. 1992

[54]
Crowder, H. and Johnson, E.L. and Padberg, M. Solving large-scale
zero-one linear programming problems. Operations Research. 1983

[55]
Woodland, PC and Povey, D. Large scale discriminative training for
speech recognition. 2000

[56]
RK Ahuja, Ö Ergun, JB Orlin, AP Punnen. A survey of very large-scale
neighborhood search techniques. Discrete Applied Mathematics. 2002 (Survey)

[57]
Collobert, R. and Bengio, S. and Bengio, Y. A parallel mixture of SVMs
for very large scale problems. Neural computation. 2002

[58]
Neil, M. and Fenton, N. and Nielsen, L. Building large-scale Bayesian
networks. The Knowledge Engineering Review. 2000

[59]
Zhang, Y. Solving large-scale linear programs by interior-point methods
under the MATLAB environment. Optimization Methods and Software. 1998

[60] Ivor W. Tsang, James T. Kwok,
Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data
sets. Journal of Machine Learning Research, 6:363-392, 2005.

[61]
Golub, G.H. and Von Matt, U. Generalized cross-validation for large-scale
problems. Journal of Computational and Graphical Statistics. 1997

[62]
Fan, W. and Stolfo, S.J. and Zhang, J. The application of AdaBoost for
distributed, scalable and on-line learning. Proceedings of the fifth ACM SIGKDD
international conference on Knowledge discovery and data mining. 1999

[63]
Hall, L. and Bowyer, K. and Kegelmeyer, W. and Moore, T. and Chao, C. Distributed
learning on very large data sets. Workshop on Distributed and Parallel
Knowledge Discovery. 2000

[64]
J Beringer, E Hüllermeier. Online clustering of parallel data streams. Data
& Knowledge Engineering. 2006