Links

PhD Journey:

PhD thesis research: where do I start? by Don Davis (2001) [PDF]
A guide to writing the dissertation literature review [PDF]
Basics of research paper writing and publishing [PDF]
Ten simple rules for reproducible computational research [PDF]
Crafting a research proposal [PDF]
How to write a good (no, great) PhD dissertation [PDF]
How to write a good PhD thesis and survive the viva by Stefan Rüger
Again, the Role of Conference Papers in Computer Science and Informatics [PDF]
Relative status of journal and conference publications in computer science [PDF]

Math Preliminaries:

Linear Algebra by Prof. Gilbert Strang: ocw.mit.edu (spring 2010)
Calculus: MIT course Fall 2007 (univariate), MIT course Fall 2010 (multivariate)
Probability distributions (comprehensive): Distributions_Handbook

Miscellaneous:

Data visualization catalogue: datavizcatalogue.com
LaTeX for beginners: Book & Presentations
Python resources:
- Scientific programming, mathematical and statistical computing: Pythonidae
R resources:
- Using R for Introductory Statistics by John Verzani: Verzani-SimpleR.pdf
- R packages, tutorials, primers for different statistical methodologies: cmecklin's R web page
- Pointers to various books on R: Books related to R
- R reference cards: Baggott-refcard-v2
- igraph (a network manipulation & visualization library): igraph architecture, Large-scale network analysis, Social Network Analysis
- R & C++ integration: Rcpp, RcppArmadillo-cheatsheet
GCC & make documentation: Small intro to compilation, GCC, make
Git, the visual way: Git cheatsheet

A few key papers in Data Mining and Machine Learning:

A few useful things to know about machine learning
An introduction to ROC analysis
The relationship between precision-recall and ROC curves
Data mining of social networks represented as graphs
Unsupervised learning by probabilistic latent semantic analysis
Probabilistic latent semantic analysis
Latent Dirichlet Allocation
Explaining the Gibbs Sampler
Probabilistic modeling and Bayesian analysis 15.097 by Ben Letham and Cynthia Rudin (MIT)
Clustering by fast search and find of density peaks
Multidimensional scaling
Clustering validation by prediction strength
Discretization: an enabling technique
Feature selection for high-dimensional data: a fast correlation-based filter solution
Review of probability theory: Stanford University
The multivariate gaussian distribution by Chuong Do (2008)
A tutorial on principal component analysis: derivation, discussion and SVD by Jon Shlens
Supervised machine learning: a review of classification techniques
Support vector machine learning for interdependent and structured output spaces
Linear algebra review (CSC2515 - Machine learning - Fall 2003)
Statistical pattern recognition - a review
A tutorial on principal component analysis by Lindsay Smith
Bagging predictors
Understanding convergence concepts: a visual-minded and graphical simulation-based approach
An introduction to exponential random graph models for social networks
The challenges of clustering high dimensional data
On clustering validation techniques
Discrete state Markov processes - Ch 5 in Fundamentals of Applied Probability Theory by Alvin Drake

Knowledge Bases (DBpedia, YAGO, Wikidata, etc.):

Knowledge-Based trust: estimating the trustworthiness of web sources
What is knowledge representation? by Randall Davis (AAAI 1993)
The epistemology of intelligent semantic web systems
Schema.org: Evolution of structured data on the web
How far are we from collecting the knowledge in the world
Quality assessment for Linked Open Data: a survey
SemMedDB: a PubMed-scale repository of biomedical semantic predications
YAGO: a core of semantic knowledge unifying WordNet and Wikipedia
Knowledge Vault: a web-scale approach to probabilistic knowledge fusion
Recovering semantics of tables on the web
Knowledge curation and knowledge fusion: challenges, models and applications
From data fusion to knowledge fusion
Freebase: a collaboratively created graph database for structuring human knowledge
WordNet: a lexical database for English
Introduction to Linked Data and its Lifecycle on the Web
A comparative survey of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO
Introducing Wikidata to the Linked Data Web (Technical: data model)
DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia
Wikidata: A free collaborative knowledge base
How the semantic web is being used: an analysis of FOAF documents

Tensors:

Tensor decompositions and applications
Reducing the rank of relational factorization models by including observable patterns
A latent factor model for highly multi-relational data
Applications of tensor (multiway array) factorizations and decompositions in data mining
Modeling relational data using Bayesian Clustered Tensor Factorization (BCTF)
Future directions in tensor-based computation and modeling
An introduction to tensor products with applications to multiway data analysis
Temporal analysis of semantic graphs using ASALSAN
Unsupervised multiway data analysis: a literature survey
A tensor-based approach for Big Data representation and dimensionality reduction

Tutorial Papers:

Statistical mechanics of complex networks
TensorFlow: A system for large-scale machine learning
Statistical comparisons of classifiers over multiple datasets
Evaluating recommender systems
Appropriate similarity measures for author cocitation analysis
How to normalize co-occurrence data? An analysis of some well-known similarity measures
An introduction to ROC analysis
The relationship between precision-recall and ROC curves
A tutorial on Spectral Clustering
Authoritative sources in a hyperlinked environment
Learning from Imbalanced Data (IEEE transactions 2009)
A unifying view on dataset shift in classification
Graph clustering survey by Satu Elisa Schaeffer
Density estimation for statistics and data analysis by B. W. Silverman
A tutorial on Bayesian nonparametric models
An introduction to MCMC for machine learning
The structure and function of complex networks
A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and Hidden Markov Models
Support Vector Machines for multiple-instance learning
An empirical comparison of supervised learning algorithms
An introduction to variable and feature selection (JMLR 2003)
Loss functions for preference levels: Regression with discrete ordered labels
Extracting the multi-scale backbone of complex weighted networks
The Mythos of model interpretability
