SYSC 531: Data Mining with Information Theory
Instructors: Joe Fusion
Summary
DMIT is a project-based course that gives you an opportunity to use information-theoretic methods to analyze data. These methods are implemented in OCCAM, a software package developed at PSU that will be the main analytical tool used in the course. The theory underlying these methods is taught in SySc 551/651 Discrete Multivariate Modeling (DMM), but this course (DMIT) is stand-alone and does not have DMM as a prerequisite. Only the theory needed to understand the inputs and outputs of OCCAM will be presented; OCCAM itself will be treated as a black box, so the algorithms that it implements will not be discussed. The point is to make it possible for you to do exploratory modeling on data of interest to you without having to master the underlying theory first. If you want to understand this theory, you can take DMM later, but this is not required.
Prerequisites
It is recommended that all students have a basic background in probability and statistics or machine learning (e.g., Math 105, Stat 243, or equivalent) and access to data that they know something about and want to analyze. (The instructor will provide possible data sources for students who do not have their own, but bringing your own data is preferable.)
Assignments & Grading
Typically, the term research paper is worth approximately 75% of the course's grade.