Two qualifications apply to the valid use of a t-test.
1) The populations from which the samples are drawn can be assumed to be normally distributed.
If this assumption is not valid, a distribution-free (nonparametric) test such as the Wilcoxon rank-sum test (also called the Mann-Whitney U test) should be used.
2) Less importantly, the t-test, strictly defined, assumes that the variances of the populations are equal.
If this assumption is not made, a variant called Welch's t-test is used.
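A minimal sketch of how the two variants differ, using only the Python standard library. The sample values and the function names `student_t` and `welch_t` are made up for illustration. Only the test statistic and the degrees of freedom are computed; turning them into a P value needs a t-distribution table or a statistics package.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Student's t statistic and degrees of freedom (pooled variance)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up samples of unequal size (illustrative values only).
drug = [5.1, 4.9, 5.6, 5.8, 6.0]
placebo = [4.1, 4.4, 3.9, 4.8, 4.0, 4.5, 4.2]

print("Student:", student_t(drug, placebo))
print("Welch:  ", welch_t(drug, placebo))
```

When the two samples have the same size, the two statistics coincide and only the degrees of freedom differ.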
Type I error, or a "false positive": the error of rejecting a null hypothesis when it is actually true.
Type II error, or a "false negative": the error of failing to reject a null hypothesis when it is in fact false.
The P value is a probability, with a value ranging from zero to one. If the P value is small, you'll conclude that the difference between sample means is unlikely to be a coincidence. Instead, you'll conclude that the populations have different means.
What is a null hypothesis?
When statisticians discuss P values, they use the term null hypothesis. The null hypothesis simply states that there is no difference between the groups. Using that term, you can define the P value as the probability of observing a difference as large as or larger than the one you observed if the null hypothesis were true.
Common misinterpretation of a P value
Many people misunderstand P values. If the P value is 0.03, there is a 3% chance of observing a difference at least as large as the one you observed even if the two population means are identical (that is, if the null hypothesis is true). It is tempting to conclude, therefore, that there is a 97% chance that the difference you observed reflects a real difference between populations and a 3% chance that the difference is due to chance. However, this would be an incorrect conclusion. What you can say is that random sampling from identical populations would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in 3% of experiments. This distinction may be clearer after you read A Bayesian perspective.
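The "fraction of experiments" reading can be made concrete with a permutation test. This is a different technique from the t-test, used here only because it needs nothing beyond the standard library. The sketch pools the two samples, relabels them at random many times, and reports how often the relabeled difference in means is at least as large as the observed one. The data, the function name, and the number of permutations are assumptions chosen for illustration.

```python
import random


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate a two-sided P value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # tolerance for floating-point noise
            hits += 1
    return hits / n_permutations


# Made-up measurements (illustrative values only).
treated = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
control = [11.2, 10.9, 11.8, 11.5, 12.0, 10.7]
print(permutation_p_value(treated, control))
```

The returned number estimates how often identical populations would produce a difference this large. It is not the probability that the null hypothesis is true.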
https://www.reddit.com/r/Python/comments/4j0xhf/indepth_machine_learning_course_w_python_xpost/
https://www.datacamp.com/courses/kaggle-python-tutorial-on-machine-learning
https://www.youtube.com/watch?v=G4V8owAOqrY&feature=youtu.be
https://news.ycombinator.com/item?id=11555551 time series
https://www.reddit.com/r/programming/comments/4eyyhm/google_has_started_a_new_video_series_teaching/
https://www.youtube.com/watch?v=cKxRvEZd3Mw
http://programmingzen.com/2016/04/19/big-data-university-educating-one-million-data-scientists/
https://studywolf.wordpress.com/2012/11/25/reinforcement-learning-q-learning-and-exploration/
https://habrahabr.ru/post/280766/
https://habrahabr.ru/company/spbifmo/blog/276479/
https://habrahabr.ru/company/spbifmo/blog/277511/
https://habrahabr.ru/company/spbifmo/blog/277593/
https://habrahabr.ru/company/spbifmo/blog/278069/
https://habrahabr.ru/post/276355/
https://github.com/josephmisiti/awesome-machine-learning
https://github.com/ujjwalkarn/Machine-Learning-Tutorials
https://news.ycombinator.com/item?id=10951276
https://www.udacity.com/course/deep-learning--ud730
https://habrahabr.ru/post/279665/ simpson paradox pandas
Markov Chain
http://setosa.io/ev/markov-chains/
https://news.ycombinator.com/item?id=11323122
https://habrahabr.ru/company/wunderfund/blog/279545/ MCMC pandas
http://www.inference.vc/deep-learning-is-easy/
Classes
https://www.coursera.org/specializations/big-data
https://www.dataquest.io/subscribe
https://habrahabr.ru/company/bitrix/blog/275455/
Online classes
https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
https://work.caltech.edu/telecourse.html
https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
https://github.com/donnemartin/data-science-ipython-notebooks
http://blog.cambridgecoding.com/2016/01/03/getting-started-with-regression-and-decision-trees/
Java
https://github.com/h2oai/h2o-3 H2O
http://blog.datumbox.com/datumbox-machine-learning-framework-0-6-1-released/
https://www.youtube.com/watch?v=s-i6nzXQF3g Flask + Machine learning
http://www.john-foreman.com/blog/surviving-data-science-at-the-speed-of-hype
https://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
http://grigory.us/blog/modern-intro-algorithms/
https://github.com/ianozsvald/data_science_delivered
https://iwringer.wordpress.com/2015/10/06/techniques-for-learning-from-large-amounts-of-data/
https://github.com/pbharrin/machinelearninginaction
http://blog.sigopt.com/post/134931842613/sigopt-fundamentals-likelihood-for-gaussian
https://news.ycombinator.com/item?id=10723911
http://habrahabr.ru/post/273363/ XGBoost
https://github.com/dmlc/xgboost XGBoost
https://dato.com/products/create/
Linear regression predicts the price of a house as a linear combination of its width and length. But the price depends primarily on the area of the house, which cannot be expressed as a linear combination of length and width. The quality of the algorithm therefore improves substantially if the length and width are replaced by their product.
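A small synthetic check of this point, as a sketch under stated assumptions: prices are generated as exactly 1000 per unit of area (an invented rule), and the helper `fit_least_squares` is a hypothetical plain-Python solver for the normal equations rather than any library routine. The same linear model is fitted once on width and length and once on their product.

```python
import random


def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    k = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def rmse(rows, targets, weights):
    """Root mean squared error of the linear prediction."""
    errors = [sum(w * x for w, x in zip(weights, r)) - y
              for r, y in zip(rows, targets)]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


rng = random.Random(0)
houses = [(rng.uniform(5, 20), rng.uniform(5, 20)) for _ in range(200)]
prices = [1000.0 * w * l for w, l in houses]  # assumption: price = 1000 * area

raw = [(1.0, w, l) for w, l in houses]    # intercept, width, length
area = [(1.0, w * l) for w, l in houses]  # intercept, width * length

rmse_raw = rmse(raw, prices, fit_least_squares(raw, prices))
rmse_area = rmse(area, prices, fit_least_squares(area, prices))
print(f"RMSE with width and length: {rmse_raw:.1f}")
print(f"RMSE with area:             {rmse_area:.1f}")
```

On this synthetic data the area feature fits almost exactly, while the width-and-length model leaves a large residual. Real prices would of course be noisier.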
Feature Selection
http://techblog.appnexus.com/blog/2016/01/04/ad-viewability-and-feature-selection-for-big-data/
https://habrahabr.ru/company/aligntechnology/blog/303750/
http://habrahabr.ru/post/264915/ feature selection
https://www.youtube.com/watch?v=BW3paX2g3dI&feature=share feature selection
http://habrahabr.ru/post/264241/
http://habrahabr.ru/post/264139/
Feature extraction and feature selection involve a balanced combination of domain expertise, intuition, and mathematical methods. Feature extraction and dimension reduction can be combined in one step using principal component analysis (PCA), linear discriminant analysis (LDA), or canonical correlation analysis (CCA) techniques as a pre-processing step, followed by clustering by k-NN on feature vectors in reduced-dimension space. In machine learning this process is also called low-dimensional embedding.
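A rough sketch of the "reduce, then k-NN" pipeline, assuming labeled points so that k-NN acts as a nearest-neighbour vote. PCA is reduced here to its first component, found by power iteration on the covariance matrix. All names and the toy data are invented for illustration. The fixed starting vector is a simplification: power iteration can fail if that vector happens to be orthogonal to the leading component.

```python
from collections import Counter


def top_component(points, iterations=100):
    """Means and leading principal axis of the points, via power iteration."""
    dim = len(points[0])
    means = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centered = [[p[i] - means[i] for i in range(dim)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (len(points) - 1)
            for j in range(dim)] for i in range(dim)]
    v = [1.0 / (i + 1) for i in range(dim)]  # arbitrary starting vector
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return means, v


def project(point, means, axis):
    """Coordinate of a point along the principal axis."""
    return sum((x - m) * a for x, m, a in zip(point, means, axis))


def knn_predict(query, points, labels, k=3):
    """Majority label among the k nearest points in the 1-D reduced space."""
    means, axis = top_component(points)
    q = project(query, means, axis)
    ranked = sorted(zip(points, labels),
                    key=lambda pl: abs(project(pl[0], means, axis) - q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D data lying close to the diagonal (illustrative values only).
points = [(1, 1.1), (2, 1.9), (3, 3.2), (7, 7.1), (8, 7.8), (9, 9.2)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict((2.5, 2.4), points, labels))
```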
SVD
http://www.ams.org/samplings/feature-column/fcarc-svd
http://danluu.com/linear-hammer/
https://habrahabr.ru/post/275273/ SVD dimension reduction
https://www.youtube.com/watch?v=R9UoFyqJca8
https://www.reddit.com/r/MachineLearning/comments/44qb9p/python_implementation_of_boruta_an_all_relevant/
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
Linear models
http://www.slideshare.net/BradKlingenberg/linear-models-for-data-science
https://habrahabr.ru/post/278513/
https://habrahabr.ru/post/279117/
http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
http://www.slideshare.net/xamat/10-more-lessons-learned-from-building-machine-learning-systems
http://work.caltech.edu/telecourse.html
http://www.cs.huji.ac.il/~shais/IML2014.html
Anomaly Detection
https://iwringer.wordpress.com/2015/11/17/anomaly-detection-concepts-and-techniques/
https://www.mapr.com/blog/better-anomaly-detection-t-digest-whiteboard-walkthrough
https://www.mapr.com/blog/anomaly-detection-poisson-distribution-whiteboard-walkthrough
https://www.datascience.com/blog/intro-to-anomaly-detection-learn-data-science-tutorials
http://multithreaded.stitchfix.com/blog/2015/05/26/significant-sample/
https://github.com/haifengl/smile Java Library for ML
http://blog.brakmic.com/data-science-for-losers-part-4-machine-learning/
http://xyclade.github.io/MachineLearning/
https://news.ycombinator.com/item?id=10190740
https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x
http://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms
http://habrahabr.ru/company/mlclass/blog/266727/
http://www.itshared.org/2015/10/data-science-interview-questions.htm
https://jakevdp.github.io/blog/2015/10/17/analyzing-pronto-cycleshare-data-with-python-and-pandas/
Logistic regression
http://habrahabr.ru/company/io/blog/265007/
http://gormanalysis.com/logistic-regression-fundamentals/
Data exploration tool written in Java, with an emphasis on unsupervised methods in cluster analysis and outlier detection
http://elki.dbs.ifi.lmu.de/
https://github.com/rushter/data-science-blogs
http://www.computervisiontalks.com/?s=Smola
http://alex.smola.org/teaching/cmu2013-10-701/
https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects
http://www.analyticsvidhya.com/blog/2015/07/dimension-reduction-methods/
http://www.edupristine.com/blog/understanding-and-creating-decision-tree
BOOKS
http://www-bcf.usc.edu/~gareth/ISL/
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://greenteapress.com/thinkstats2/index.html
• Supervised Learning:
» kNN (k Nearest Neighbors)
» Naive Bayes
» Logistic Regression
» Support Vector Machines
» Random Forests
• Unsupervised Learning:
» Clustering
» Factor Analysis
» Latent Dirichlet Allocation
http://scholar-vit.livejournal.com/419610.html
"Mr. Brown has two children, and at least one of them is a boy. What is the probability that both children are boys?"
B-B B-G G-B G-G => Answer: 1/3
"Mr. Green has two children. One of them is a boy born on a Wednesday. What is the probability that both children are boys?" Answer => 13/27
http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/
https://news.ycombinator.com/item?id=9562379
https://www.otexts.org/fpp/6/3 forecasting book
http://www.cs.cornell.edu/jeh/book11April2014.pdf
t-SNE
https://beta.oreilly.com/learning/an-illustrated-introduction-to-the-t-sne-algorithm
http://habrahabr.ru/post/267041/
http://quantombone.blogspot.com/2015/03/deep-learning-vs-machine-learning-vs.html
http://quantombone.blogspot.com/2015/04/deep-learning-vs-probabilistic.html
http://www.autonlab.org/tutorials/list.html
Signal Processing
http://www-math.mit.edu/~gs/papers/newsigproc.pdf
http://habrahabr.ru/post/252743/
http://zhengrui.github.io/zerryland/ML-CV-Resource.html
http://www.slideshare.net/dtunkelang/how-to-interview-a-data-scientist
https://github.com/rasbt/pattern_classification
http://tempr.org/54f9f429588f8.html
http://www.automaticstatistician.com/
http://radimrehurek.com/data_science_python/
https://docs.google.com/document/d/1YN6BVdReNAYc8B0fjQ84yzDflqmeEPj7S0Xc-9_26R0/edit
Book
Machine Learning in Python: Essential Techniques for Predictive Analysis by Michael Bowles
ML CheatSheet
https://github.com/soulmachine/machine-learning-cheat-sheet
https://dineshramitc.wordpress.com/2014/12/05/ebook-a-course-in-machine-learning/
https://dineshramitc.wordpress.com/2014/12/05/ebook-introduction-to-machine-learning/
http://www.win-vector.com/blog/2014/12/the-geometry-of-classifiers/
http://www.startup.ml/resources
http://habrahabr.ru/post/247751/ python machine learning
http://www.nature.com/doifinder/10.1038/nature14236
https://news.ycombinator.com/item?id=9130852
Classes
http://igorsubbotin.blogspot.ru/p/data-science.html
http://evelinag.com/blog/2014/12-15-christmas-carol-and-other-eigenvectors/index.html#.VI_HIyvF_y9
Mining Massive Data Sets
http://www-labs.iro.umontreal.ca/~bengioy/dlbook/ BOOK
http://arxiv.org/abs/1412.0291 Bits from Biology for Computational Intelligence
https://news.ycombinator.com/item?id=9131991
http://blogs.technet.com/b/machinelearning/
http://neuralnetworksanddeeplearning.com/ BOOK online
http://www.cognitivealgorithm.info/
Smile (Java)
https://github.com/haifengl/smile
https://news.ycombinator.com/item?id=9131991
https://haifengl.wordpress.com/2014/11/20/smile-is-available-on-github/
http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf
http://www.machinalis.com/blog/support-vector-machines/
https://colah.github.io/posts/2014-10-Visualizing-MNIST/
http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html
http://alexey.radul.name/ideas/2015/how-to-compute-with-a-probability-distribution/
http://blog.explainmydata.com/
https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/
http://bid2.berkeley.edu/bid-data-project/
https://www.youtube.com/watch?v=xeAB10QgDW8
http://numericinsight.blogspot.com/2014/07/a-gentle-introduction-to-backpropagation.html
https://news.ycombinator.com/item?id=8135890
Book: Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data By Byron Ellis
http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html
https://github.com/josephmisiti/awesome-machine-learning
http://datasciencemasters.org/
http://metacademy.org/roadmaps/cjrd/level-up-your-ml
https://news.ycombinator.com/item?id=8061628
http://arxiv.org/abs/1407.5019 NN
https://probmods.org/ cognition
http://www.zipfianacademy.com/
https://news.ycombinator.com/item?id=8120053
PCA, LDA
http://sebastianraschka.com/Articles/2014_python_lda.html
https://habrahabr.ru/post/304214/ PCA explained
http://notmatthancock.github.io/2015/06/14/what-is-pca.html
http://setosa.io/ev/principal-component-analysis/
https://news.ycombinator.com/item?id=9040266
http://liorpachter.wordpress.com/2014/05/26/what-is-principal-component-analysis/
http://georgemdallas.wordpress.com/2013/10/30/principal-component-analysis-4-dummies-eigenvectors-eigenvalues-and-dimension-reduction/ Principal Components Analysis
http://ganeshiyer.net/blog/2013/10/22/machine-learning-with-javascript/
http://habrahabr.ru/post/241527/
ML books
http://www.cs.ubc.ca/~murphyk/MLbook/
https://dl.dropboxusercontent.com/u/31779972/DataMiningForTheMasses.pdf
http://avaxhm.com/ebooks/programming_development/1783284358Clojure.html
http://christonard.com/12-free-data-mining-books/
https://en.wikipedia.org/wiki/Rprop
Clojure for Machine Learning book
Building Machine Learning Systems with Python book
Building Probabilistic Graphical Models with Python book
Random Forest http://orbi.ulg.ac.be/handle/2268/170309
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
http://amitranga.wordpress.com/2014/06/08/feature-selection/
http://twiecki.github.io/bayesian_pymc3_europy_ab.slides.html#/
http://habrahabr.ru/post/238431/
http://habrahabr.ru/post/227293/
http://habrahabr.ru/post/225589/
http://habrahabr.ru/post/226641/
http://habrahabr.ru/company/surfingbird/blog/226677/ Sampling
http://habrahabr.ru/company/surfingbird/blog/228249/
http://habrahabr.ru/company/surfingbird/blog/230103/
http://machinelearningmastery.com/
Fourier for data analysis
http://see.stanford.edu/see/courseInfo.aspx?coll=84d174c2-d74f-493d-92ae-c3f45c0ee091
GENERIC AI
https://dl.dropboxusercontent.com/u/280585369/2013-clark.pdf Prediction Machine
http://gigaom.com/2014/05/23/meet-the-algorithm-that-can-learn-everything-about-anything/
https://news.ycombinator.com/item?id=7795621
http://lispm.de/genera-concepts
http://radar.oreilly.com/tag/intelligence-matters
http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms
https://news.ycombinator.com/item?id=7783550
Quantum computer
http://readwrite.com/2014/05/22/google-quantum-computer-project#awesm=~oF6NzB0ex2JLpE
http://christonard.com/12-free-data-mining-books/
http://www-bcf.usc.edu/~gareth/ISL/ BOOK
http://www.inf.ed.ac.uk/teaching/courses/iaml/
http://www.inf.ed.ac.uk/teaching/courses/mlpr/
http://www.inf.ed.ac.uk/teaching/courses/pmr/
http://machine-learning-course.joachims.org/
BAYES
https://news.ycombinator.com/item?id=10843680
http://homepages.inf.ed.ac.uk/vlavrenk/iaml.html Bayes
http://habrahabr.ru/post/232639/ Bayes
http://blog.claymcleod.io/2016/02/02/Bayes-Theorem-for-Computer-Scientists/
http://habrahabr.ru/company/yandex/blog/208034/ Yandex lectures
http://efytimes.com/e1/fullnews.asp?edid=121516 free ML books
https://news.ycombinator.com/item?id=7120391 free ML books
http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
http://arxiv.org/abs/1405.0126
http://rs.io/2014/02/16/simulated-annealing-intuition.html
http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html
http://en.wikipedia.org/wiki/General_game_playing Stanford annual competition
https://github.com/tadasv/chimp ChimDB
http://www.hutter1.net/ai/uaibook.htm AI BOOK
http://www.cs.toronto.edu/~mackay/itprnn/book.html MACKAY BOOK
http://www.amazon.com/Building-Machine-Learning-Systems-Python/dp/1782161406
https://news.ycombinator.com/item?id=7149913
http://blog.statwing.com/easiest-data-analysis-mistake-to-make/
http://www.data-science.tips/give-me-them-digits/decision-tree-and-random-forest
http://www.win-vector.com/blog/
http://radimrehurek.com/gensim/ Topic modelling in Python
http://precog.com/ analytics engine that natively handles JSON
http://logic.pdmi.ras.ru/~yura/internet.html
http://nlp.stanford.edu/IR-book/ BOOK
http://www.r-bloggers.com/the-guerilla-guide-to-r/ R guide
http://www.refsmmat.com/statistics/data-analysis.html
http://www.kickstarter.com/projects/jeffheaton/artificial-intelligence-for-humans-vol-1-fund-algo
http://www.dataminingblog.com/list-of-blogs/
https://www.siam.org/proceedings/
http://www.amazon.com/PRACTITIONERS-GUIDE-BUSINESS-ANALYTICS-Organization-2019s/dp/0071807594
http://parleys.com/play/51c2e0f3e4b0ed877035684f/chapter0/about ML, Scala, eBay
Online Classes
http://work.caltech.edu/previous.html
http://online.stanford.edu/course/statistical-learning-winter-2014
https://www.youtube.com/watch?v=IxflKHX7aes&list=PLZSO_6-bSqHQCIYxE3ycGLXHMjK3XV7Iz&index=6
http://habrahabr.ru/post/189178/
http://www.machinelearning.ru/
http://nborwankar.github.io/LearnDataScience/
http://videolectures.net/mlss07_teh_dp/ Dirichlet Process Video/PDF
http://videolectures.net/mlss07_rasmussen_bigp/ Bayes and Gaussian Processes
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
https://news.ycombinator.com/item?id=5817713 Bayes
http://www.knime.org/ - a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting.
BI, OLAP and Analytics
http://en.wikipedia.org/wiki/Business_intelligence
http://en.wikipedia.org/wiki/Category:Online_analytical_processing
http://en.wikipedia.org/wiki/Comparison_of_OLAP_Servers
Books:
Theophano Mitsa, "Temporal Data Mining"
Richard T. Snodgrass, "Developing Time-Oriented Database Applications in SQL" 1999
James Wu, Stephen Coggeshall, "Foundations of Predictive Analytics"
Zheng Alan Zhao, "Spectral Feature Selection for Data Mining"
Yunqian Ma, "Manifold Learning Theory and Applications"
Guozhu Dong, James Bailey, "Contrast Data Mining: Concepts, Algorithms, and Applications"
Fon Silvers, "Data Warehouse Designs: Achieving ROI with Market Basket Analysis and Time Variance"
http://blog.cloudera.com/blog/2013/04/hadoop-stratified-randosampling-algorithm/
http://eferm.com/machine-learning-cheat-sheet/ cheat sheet
http://peekaboo-vision.blogspot.com/2013/01/machine-learning-cheat-sheet-for-scikit.html
C++ machine learning
http://sourceforge.net/projects/shark-project/
http://web.mit.edu/newsoffice/2013/machine-learning-algorithm-outperforms-predecessors-0529.html
http://egtheory.wordpress.com/2013/06/05/prediction-vs-understanding/
http://pauloortins.com/resources-to-become-a-ninja-machine-learning/
BOOK
http://www.amazon.com/Understanding-Complex-Datasets-Decompositions-Knowledge/dp/1584888326
https://leanpub.com/javaai Practical AI Intelligence Book
Random forest
http://blog.yhathq.com/posts/random-forests-in-python.html
http://www.rene-pickhardt.de/please-help-me-to-realize-my-web-science-massive-open-online-course/
http://biglearn.org/index.php/Papers
http://alex.smola.org/teaching/cmu2013-10-701/index.html
http://www.bigdatarepublic.com/author.asp?section_id=2809&doc_id=257527&
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
http://www.youtube.com/watch?v=bobeo5kFz1g Bayes
http://www.youtube.com/watch?v=jKBwGlYb13w Big data algorithms
http://probabilistic-programming.org/wiki/Home
http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca
Nikolenko: multi-armed bandits; SVD; Bayesian graphical models
http://habrahabr.ru/company/surfingbird/blog/168611/
http://habrahabr.ru/company/yandex/blog/229555/
http://habrahabr.ru/company/surfingbird/blog/169573/
http://habrahabr.ru/company/surfingbird/blog/176461/
http://habrahabr.ru/company/surfingbird/blog/177889/
http://habrahabr.ru/post/188244/
http://habrahabr.ru/post/175819/ Boltzmann machine
http://habrahabr.ru/company/yandex/blog/175917/
http://conductrics.com/data-science-resources-2
http://conductrics.com/data-science-resources/
Clustering
https://en.wikipedia.org/wiki/Cluster_analysis
http://haifengl.github.io/smile/index.html#clustering
http://andrew.gibiansky.com/blog/machine-learning/k-nearest-neighbors-simplest-machine-learning/
http://varianceexplained.org/r/kmeans-free-lunch/
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
http://www.bigdatanews.com/profiles/blogs/fast-clustering-algorithms-for-massive-datasets
http://grigory.us/blog/mapreduce-clustering/
http://www.galvanize.com/blog/introduction-k-means-cluster-analysis/#.Vk_C0xFViko
http://www.wired.com/design/2013/01/data-viz-ayasdi-iris/
http://www.p-value.info/2012/11/free-datascience-books.html
http://zinkov.com/posts/2012-10-04-ml-book-reviews/
http://habrahabr.ru/post/164211/
http://www.lektorium.tv/course/?id=22851
http://shad.yandex.ru/lectures/machine_learning.xml
http://gigaom.com/data/a-programmers-guide-to-big-data-12-tools-to-know/
http://habrahabr.ru/post/161301/
http://ailev.livejournal.com/1047765.html?style=mine&nc=17#comments
http://blog.videolectures.net/100-most-popular-machine-learning-talks-at-videolectures-net/
http://www.gambit-project.org/doc/index.html C++/Python for Game theory
http://en.wikipedia.org/wiki/Vowpal_Wabbit
Inference Engine
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/default.aspx Infer.NET
http://cs.ru.nl/~jorism/libDAI/
http://courses.cms.caltech.edu/cs155/ Probabilistic Graphical Models
Expectation propagation / Belief propagation
http://arxiv.org/abs/1212.2991 Dimple: open-source API for probabilistic modeling
Rule Engines
http://www.infoq.com/articles/Rule-Engines
http://en.wikipedia.org/wiki/Rete_algorithm
https://speakerdeck.com/peteris/logic-programming Logic programming
http://www.aelag.com/147952673
http://k2company.com/blog/2012/09/06/toolbox-for-learning-machine-learning-and-data-science/
http://www.cs.yale.edu/homes/el327/
Web Analytics
http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
https://github.com/blog/1112-data-at-github
https://developers.google.com/bigquery/
ROOT
http://en.wikipedia.org/wiki/ROOT
Bayes
http://code.google.com/p/ourmine/wiki/LectureNaiveBayes
https://news.ycombinator.com/item?id=10843680
http://habrahabr.ru/post/219721/
http://habrahabr.ru/post/170545/
http://habrahabr.ru/post/170633/
http://www.greenteapress.com/thinkbayes/ Bayes Book
http://www.mimno.org/articles/hdp/ non-parametric bayes
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage Bayes Book
http://www.machinelearning.ru/
http://www.mblondel.org/journal/
http://www.autonlab.org/tutorials/index.html
http://deeplearning.net/tutorials/
http://machinelearningjourney.blogspot.com/
http://news.ycombinator.com/item?id=3199718
http://habrahabr.ru/blogs/data_mining
http://metaoptimize.com/qa/questions/3163/good-machine-learning-blogs
http://en.wikipedia.org/wiki/Data_mining#Methods_and_algorithms
http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/
http://habrahabr.ru/blogs/algorithm/124978/#habracut
http://www.mblondel.org/journal/2010/10/31/kernel-perceptron-in-python/
http://work.caltech.edu/library/ Machine learning Video Library
http://www.shogun-toolbox.org/ SHOGUN
http://www.slideshare.net/dscottbrown/vicarious-systems-at-singularity-summit-2011
Support Vector Machines
http://www.machinelearning.ru/wiki/index.php?title=SVM
https://beta.oreilly.com/learning/intro-to-svm
https://www.youtube.com/watch?v=_PwhiWxHK8o
http://www.machinalis.com/blog/support-vector-machines/
http://habrahabr.ru/post/202486/ kernel trick
http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/
http://www.win-vector.com/blog/2011/10/kernel-methods-and-support-vector-machines-de-mystified/
http://habrahabr.ru/blogs/data_mining/105220/
http://antilamer.livejournal.com/381249.html#comments
http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/
http://hunch.net/~large_scale_survey/
http://habrahabr.ru/blogs/algorithm/125614/#habracut Fuzzy Logic
http://valserb.wordpress.com/2011/08/02/understanding-hidden-markov-models/
http://en.wikipedia.org/wiki/Machine_Learning
http://amundblog.blogspot.com/2008/06/pragmatic-classification-of-classifiers.html
http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/
http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
http://arxiv.org/PS_cache/arxiv/pdf/1105/1105.1951v2.pdf
Web Data Scraping
https://scraperwiki.com/docs/python/python_libraries/
http://getthedata.org/questions/122/what-tools-or-services-are-good-for-scraping-data-from-websites
http://openflights.org/data.html
http://services.faa.gov/docs/services/airport/
http://www.quora.com/What-is-the-best-way-to-learn-data-mining-in-one-month
http://www.reddit.com/r/MachineLearning/comments/hwyus/learning_ml_after_graduating/
Markov Chains and Monte Carlo
http://users.aims.ac.za/~ioana/
http://habrahabr.ru/post/241317/
http://videolectures.net/mlss09uk_murray_mcmc/
http://habrahabr.ru/blogs/algorithm/134954/ Language recognition
http://users.livejournal.com/_winnie/320534.html#comments
http://metaoptimize.com/qa/
DataMining online
http://vis.stanford.edu/wrangler/
http://www.google.com/publicdata/home
http://www.philwhln.com/how-to-get-experience-working-with-large-datasets
http://www.slate.com/id/2285354/pagenum/all/#p2
http://labs.slate.com/
http://habrahabr.ru/blogs/algorithm/134950/#habracut
Parrondo Paradox
http://io9.com/5861287/parrondos-paradox-winning-two-games-youre-guaranteed-to-lose
The supervised learning problem is to find an approximation to an unknown function given a set of labeled examples. For the general linear model (GLM), the usual goal is to minimize the sum of squared deviations of the observed values of the dependent variable from those predicted by the model.
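To make "minimize the sum of squared deviations" concrete, here is a minimal sketch for the simplest case, a straight line through made-up labeled examples. The closed-form slope and intercept are the standard least-squares solution. The loop then checks that nudging either coefficient only increases the sum of squares. The function names are invented for this note.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared deviations of observed y from the line's prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up labeled examples
a, b = fit_line(xs, ys)
best = sum_squared_error(xs, ys, a, b)

# Any other line should give a strictly larger sum of squares.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    assert sum_squared_error(xs, ys, a + da, b + db) > best

print(a, b, best)
```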
- Locality-sensitive hashing (1998) - algorithms that allow fast search for the "most similar" images, sounds, and "situations" in general, in arbitrarily large dictionaries
- Gradient boosting (1999) - the current de facto standard for machine learning
- Viola-Jones framework (2001) - the first real-time recognition of "natural" objects (faces, people, etc.)
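As an illustration of the first item, here is a sketch of one well-known locality-sensitive hashing scheme, random-hyperplane hashing for cosine similarity. It is one member of the family, not necessarily the variant the 1998 work describes. Each vector gets one bit per random hyperplane, and vectors pointing in similar directions tend to share most bits. Function names, the vector dimension, and the bit count are assumptions for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian normal vectors, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(int(sum(h * x for h, x in zip(plane, vector)) >= 0)
                 for plane in hyperplanes)


def hamming(sig_a, sig_b):
    """Number of positions where two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


planes = make_hyperplanes(dim=4, n_bits=16)
base = [1.0, 2.0, 0.5, -1.0]
near = [1.1, 1.9, 0.6, -1.0]    # almost the same direction
far = [-1.0, -2.0, -0.5, 1.0]   # exactly the opposite direction

print(hamming(signature(base, planes), signature(near, planes)))
print(hamming(signature(base, planes), signature(far, planes)))
```

In a real index the signatures would be used as bucket keys, so that only items sharing a bucket are compared in full.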
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
http://scikit-learn.sourceforge.net/
http://waffles.sourceforge.net/
http://metaoptimize.com/qa/questions/3053/python-machine-learning-packages
http://www.gaussianprocess.org/
http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/
http://andriybuday.blogspot.com/2010/03/my-implementation-of-self-organizing.html
http://www.cis.temple.edu/~ingargio/cis587/readings/id3-c45.html
Data Visualization:
http://tulip.labri.fr/TulipDrupal/
R
http://jeromyanglim.blogspot.com/2009/06/learning-r-for-researchers-in.html R
http://www.hselab.org/machinery/content/create-sequence-plots-r-and-ggplot2-and-save-pdfs
Statistics
http://habrahabr.ru/post/208684/ generating a normal distribution from a uniform distribution
http://daithiocrualaoich.github.io/kolmogorov_smirnov/
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/
http://cscs.umich.edu/~crshalizi/weblog/cat_36-402.html
http://www.statsoft.com/textbook/
http://en.wikipedia.org/wiki/Boy_or_Girl_paradox
http://en.wikipedia.org/wiki/Simpson%27s_paradox
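The two-children puzzles recorded earlier in these notes (answers 1/3 and 13/27) are instances of the Boy or Girl paradox linked above. Both answers can be checked by brute-force enumeration. The sketch assumes the usual reading: each child is independently a boy or a girl with equal probability, born on a uniformly random weekday, and the stated fact is all that is known about the family. The function name and weekday index are invented for illustration.

```python
from fractions import Fraction
from itertools import product


def prob_both_boys(condition):
    """P(both boys | condition) over equally likely (sex, weekday) pairs."""
    children = list(product("BG", range(7)))  # 14 kinds of child
    families = [f for f in product(children, repeat=2) if condition(f)]
    both = [f for f in families if all(sex == "B" for sex, _ in f)]
    return Fraction(len(both), len(families))


WEDNESDAY = 2  # arbitrary index for the weekday

brown = prob_both_boys(lambda f: any(sex == "B" for sex, _ in f))
green = prob_both_boys(lambda f: any(c == ("B", WEDNESDAY) for c in f))
print(brown, green)  # 1/3 13/27
```

Under a different reading of how the information was obtained, the answers change, which is the point of the paradox.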
Bayes Classifier
http://commonsenseatheism.com/?p=13156
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
http://ebiquity.umbc.edu/blogger/2010/12/07/naive-bayes-classifier-in-50-lines/
http://bionicspirit.com/blog/2012/02/09/howto-build-naive-bayes-classifier.html
http://davywybiral.blogspot.com/2011/04/naive-bayes-and-author-detection.html
http://habrahabr.ru/blogs/python/120194/
http://cnx.org/content/m10985/latest/
http://sciencehouse.wordpress.com/2010/11/11/bayesian-parameter-estimation/
http://cscs.umich.edu/~crshalizi/weblog/796.html
http://blog.moertel.com/articles/2010/12/20/more-on-the-evidence-of-a-single-coin-toss
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
The t-test is the most commonly used method to evaluate the differences in means between two groups. For example, the t-test can be used to test for a difference in test scores between a group of patients who were given a drug and a control group who received a placebo.
Nonparametric tests are sometimes called distribution-free statistics because they do not require that the data fit a normal distribution. More generally, nonparametric tests require less restrictive assumptions about the data. Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data.
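One common nonparametric counterpart of the two-sample t-test is the Mann-Whitney U test (Wilcoxon rank-sum), which only uses the ordering of the observations. A minimal standard-library sketch follows. The scores are invented, and the P value uses the large-sample normal approximation without tie or continuity correction, so it is rough for samples this small.

```python
from statistics import NormalDist


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a beats b, ties count half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def approx_p_value(a, b):
    """Two-sided P value from the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * NormalDist().cdf(-abs(z))


# Made-up test scores (illustrative values only).
drug_scores = [78, 85, 91, 74, 88, 95, 81, 90]
placebo_scores = [72, 69, 80, 77, 65, 83, 70, 75]

print(mann_whitney_u(drug_scores, placebo_scores))
print(approx_p_value(drug_scores, placebo_scores))
```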
Summary of statistical tests http://www.uwsp.edu/psych/stat/indexTests.htm
Time Series Analysis
http://habrahabr.ru/post/207160/ in Python statsmodels
http://institutiones.com/download/lecture/804-analiz-vremennih-ryadov.html
http://www.stat.pitt.edu/stoffer/tsa3/
http://practicalquant.blogspot.com/2012/10/mining-time-series-with-trillions-of.html
http://www.larkc.eu/ platform for massive distributed incomplete reasoning
http://blip.tv/search?q=machine+learning
http://selfawaresystems.com/
http://books.google.com/books?uid=8640673873589796416
Python
http://datacommunitydc.org/blog/2013/03/getting-started-with-python-for-data-scientists
https://github.com/avelino/mining
http://slendrmeans.wordpress.com/will-it-python/
http://www-ist.massey.ac.nz/smarsland/MLbook.html
http://pyml.sourceforge.net/index.html
http://www.vcasmo.com/video/drewconway/13268
Pandas
http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas
http://stackoverflow.com/questions/16249736/how-to-import-data-from-mongodb-to-pandas
http://shop.oreilly.com/product/0636920023784.do
http://statsmodels.sourceforge.net
http://deeplearning.net/tutorial/contents.html
http://databrewery.org/
http://www.meetup.com/NYC-Predictive-Analytics/
http://www.amazon.com/Machine-Learning-Algorithmic-Perspective-Recognition/dp/1420067184
http://mloss.org/software/ http://mlcomp.org/
http://logic.pdmi.ras.ru/~sergey/index.php?page=ml
http://lucene.apache.org/mahout/
http://www.youtube.com/results?search_query=machine+learning&aq=f
http://www.machinelearning.ru/
http://www.dataminingtools.net
http://www.reddit.com/r/programming/comments/9u23i/what_are_the_good_machine_learning_tools_available/
http://yury.name/modern/ Yury Lifshits
http://nlp.stanford.edu/IR-book/information-retrieval.html