Reading List

Bayesian Decision and Graphical Models

[Bishop07] C. Bishop and J. Lasserre, Generative or discriminative? Getting the best of both worlds, in Bayesian Statistics, Oxford University Press, 2007, pp. 3-24.

[Ratnaparkhi97] A. Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for Natural Language Processing, Institute for Research in Cognitive Science, University of Pennsylvania, 1997.

Kernel Methods

[Kwok04] J. T. Y. Kwok and I. W. H. Tsang, The pre-image problem in kernel methods, IEEE Transactions on Neural Networks, vol. 15, no. 6, pp. 1517-1525, Nov. 2004.

[Quadrianto10] N. Quadrianto, A. J. Smola, L. Song, and T. Tuytelaars, Kernelized sorting, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 10, pp. 1809-1821, Oct. 2010.

[Grauman05] K. Grauman and T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, in Proc. of the Tenth IEEE International Conference on Computer Vision (ICCV’05), 2005, pp. 1458-1465.

Support Vector Learning

[Chalimourda04] A. Chalimourda, B. Schölkopf, and A. J. Smola, Experimentally optimal nu in support vector regression for different noise models and parameter settings, Neural Networks, vol. 17, no. 1, pp. 127-141, Jan. 2004.

[Laskov06] P. Laskov, C. Gehl, S. Krueger, and K.-R. Müller, Incremental Support Vector Learning: Analysis, Implementation and Applications, Journal of Machine Learning Research, vol. 7, pp. 1909-1936, 2006.

[Smola04] A. J. Smola and B. Schölkopf, A tutorial on support vector regression, Statistics and Computing, vol. 14, pp. 199-222, 2004.

Performance Evaluation

[Fawcett06] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, Jun. 2006.

[Demsar06] J. Demšar, Statistical Comparisons of Classifiers over Multiple Data Sets, Journal of Machine Learning Research, vol. 7, pp. 1-30, 2006.

Unsupervised Learning

[Ding08] C. Ding, X. He, H. D. Simon, and R. Jin, On the Equivalence of Nonnegative Matrix Factorization and K-means - Spectral Clustering, Lawrence Berkeley National Laboratory, 2008.

[Dhillon04] I. S. Dhillon, Y. Guan, and B. Kulis, Kernel k-means, Spectral Clustering and Normalized Cuts, in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), 2004, pp. 551-556.

Learning on Complex-Structured Data

[Cabestany05] J. Weston, B. Schölkopf, and O. Bousquet, Joint Kernel Maps, in Proceedings of the 8th International Workshop on Artificial Neural Networks (IWANN 2005), J. Cabestany, A. Prieto, and F. Sandoval, Eds., Springer-Verlag, 2005, pp. 176-191.

[Joachims09] T. Joachims, T. Hofmann, Y. Yue, and C.-N. Yu, Predicting structured objects with support vector machines, Communications of the ACM, vol. 52, no. 11, 2009.

Large-Scale Machine Learning

[Vedaldi12] A. Vedaldi and A. Zisserman, Efficient additive kernels via explicit feature maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 480-492, Mar. 2012.

[Lin12] J. Lin and A. Kolcz, Large-scale machine learning at Twitter, in Proceedings of the 2012 International Conference on Management of Data (SIGMOD '12), 2012, p. 793.