Links

PhD Journey:

  • PhD thesis research: where do I start? by Don Davis (2001) [PDF]

  • A guide to writing the dissertation literature review [PDF]

  • Basics of research paper writing and publishing [PDF]

  • Ten simple rules for reproducible computational research [PDF]

  • Crafting a research proposal [PDF]

  • How to write a good (no, great) PhD dissertation [PDF]

  • How to write a good PhD thesis and survive the viva by Stefan Rüger

  • Again, the Role of Conference Papers in Computer Science and Informatics [PDF]

  • Relative status of journal and conference publications in computer science [PDF]


Math Preliminaries:


Miscellaneous:


A few key papers in Data Mining and Machine Learning:

  • A few useful things to know about machine learning

  • An introduction to ROC analysis

  • The relationship between precision-recall and ROC curves

  • Data mining of social networks represented as graphs

  • Unsupervised learning by probabilistic latent semantic analysis

  • Probabilistic latent semantic analysis

  • Latent Dirichlet Allocation

  • Explaining the Gibbs Sampler

  • Probabilistic modeling and Bayesian analysis 15.097 by Ben Letham and Cynthia Rudin (MIT)

  • Clustering by fast search and find of density peaks

  • Multidimensional scaling

  • Clustering validation by prediction strength

  • Discretization: an enabling technique

  • Feature selection for high-dimensional data: a fast correlation-based filter solution

  • Review of probability theory (Stanford University)

  • The multivariate Gaussian distribution by Chuong Do (2008)

  • A tutorial on principal component analysis: derivation, discussion and SVD by Jon Shlens

  • Supervised machine learning: a review of classification techniques

  • Support vector machine learning for interdependent and structured output spaces

  • Linear algebra review (CSC2515 - Machine learning - Fall 2003)

  • Statistical pattern recognition - a review

  • A tutorial on principal component analysis by Lindsay Smith

  • Bagging predictors

  • Understanding convergence concepts: a visual-minded and graphical simulation-based approach

  • An introduction to exponential random graph models for social networks

  • The challenges of clustering high dimensional data

  • On clustering validation techniques

  • Discrete state Markov processes - Ch 5 in Fundamentals of Applied Probability Theory by Alvin Drake


Knowledge Bases (DBpedia, YAGO, Wikidata, etc.)

  • Knowledge-based trust: estimating the trustworthiness of web sources

  • What is knowledge representation? by Randall Davis (AAAI 1993)

  • The epistemology of intelligent semantic web systems

  • Schema.org: Evolution of structured data on the web

  • How far are we from collecting the knowledge in the world?

  • Quality assessment for Linked Open Data: a survey

  • SemMedDB: a PubMed-scale repository of biomedical semantic predications

  • YAGO: a core of semantic knowledge unifying WordNet and Wikipedia

  • Knowledge Vault: a web-scale approach to probabilistic knowledge fusion

  • Recovering semantics of tables on the web

  • Knowledge curation and knowledge fusion: challenges, models and applications

  • From data fusion to knowledge fusion

  • Freebase: a collaboratively created graph database for structuring human knowledge

  • WordNet: a lexical database for English

  • Introduction to Linked Data and its Lifecycle on the Web

  • A comparative survey of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO

  • Introducing Wikidata to the Linked Data Web (Technical: data model)

  • DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia

  • Wikidata: A free collaborative knowledge base

  • How the semantic web is being used: an analysis of FOAF documents


Tensors

  • Tensor decompositions and applications

  • Reducing the rank of relational factorization models by including observable patterns

  • A latent factor model for highly multi-relational data

  • Applications of tensor (multiway array) factorizations and decompositions in data mining

  • Modeling relational data using Bayesian Clustered Tensor Factorization (BCTF)

  • Future directions in tensor-based computation and modeling

  • An introduction to tensor products with applications to multiway data analysis

  • Temporal analysis of semantic graphs using ASALSAN

  • Unsupervised multiway data analysis: a literature survey

  • A tensor-based approach for Big Data representation and dimensionality reduction


Tutorial Papers

  • Statistical mechanics of complex networks

  • TensorFlow: A system for large-scale machine learning

  • Statistical comparisons of classifiers over multiple datasets

  • Evaluating recommender systems

  • Appropriate similarity measures for author cocitation analysis

  • How to normalize co-occurrence data? An analysis of some well-known similarity measures

  • An introduction to ROC analysis

  • The relationship between precision-recall and ROC curves

  • A tutorial on Spectral Clustering

  • Authoritative sources in a hyperlinked environment

  • Learning from Imbalanced Data (IEEE Transactions 2009)

  • A unifying view on dataset shift in classification

  • Graph clustering survey by Satu Elisa Schaeffer

  • Density estimation for statistics and data analysis by B. W. Silverman

  • A tutorial on Bayesian nonparametric models

  • An introduction to MCMC for machine learning

  • The structure and function of complex networks

  • A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models

  • Support Vector Machines for multiple-instance learning

  • An empirical comparison of supervised learning algorithms

  • An introduction to variable and feature selection (JMLR 2003)

  • Loss functions for preference levels: Regression with discrete ordered labels

  • Extracting the multi-scale backbone of complex weighted networks

  • The Mythos of model interpretability