- Understand the concepts of principal components analysis (PCA) and factor analysis (FA), and be able to use both to facilitate exploratory data analysis;
- Understand how to use the eigenvalues of the variance-covariance matrix and scree plots to choose a fundamental underlying dimension for the multivariate data set;
- Understand how orthogonal rotations like varimax and oblique rotations like promax can be used to make factors more interpretable;
- Be able to use PCA and factor analysis to create latent variables of interest or to partially replicate results promulgated in famous examples in the literature (e.g., the work of Fama and French);
- Define and explain Mahalanobis distance and its connection to both multivariate two-sample Student's t-tests and linear discriminant analysis;
- Be able to conduct in R, and interpret the outputs related to, Bartlett's test for sphericity;
- Be able to solve the classical (i.e., Gaussian) univariate and multivariate linear discriminant analysis problems;
- Understand common extensions of linear discriminant analysis, e.g., quadratic discriminant analysis and nonparametric discriminant analysis;
- Understand the connections between naive Bayes and discriminant analysis;
- Be able to state types of, and uses of, cluster analysis;
- Understand the most common types of hierarchic methods of cluster analysis, as well as limitations of the usefulness of cluster analysis;
- Be able to use principal components analysis (or factor analysis) to augment cluster analysis;
- Understand the challenges associated with performing cluster analysis on categorical data, and the notion of a similarity measure for categorical data;
- Understand the definition of, and basic properties related to, graphs (i.e., vertices and edge relationships between those vertices);
- Be able to perform spectral clustering on graph-type data;
- Be able to convert a data set with categorical variables into a graph and then use spectral clustering to partition the graph into subgraphs; and
- Apply clustering techniques to interesting data sets from business and the social sciences.
analytics and I am affiliated with both the Master of Science in Business Analytics (MSAN) and Master of Science in Financial Analysis (MSFA) programs at the University of San Francisco. Please call me Jeff. My office is located in room 211 of the Masonic (MA) building at the corner of Masonic and Turk. My email address is jhamrick@usfca.edu. My cell phone number is 617/943-4619 and my office number is 415/422-6810. If you're unable to discuss academic issues with me at the Presidio campus before or after class, let me know and we may be able to schedule an appointment (possibly over Google Hangout) at an alternate time.
you are a candidate for the Master of Science in Analytics at the University of San Francisco. Linear Regression Analysis (MSAN 601) is a prerequisite for this course.
until Tuesday, March 14, 2013. We will meet at the Presidio campus. We will primarily use the sixth, seventh, eighth, and ninth chapters of the third edition of Multivariate Statistical Methods: A Primer by Bryan F.J. Manly (ISBN 1-58488-414-2) and chapters 2, 5, 7, 8, and 9 of the second edition of Analysis of Multivariate Social Science Data by David J. Bartholomew, Fiona Steele, Irini Moustaki, and Jane Galbraith (ISBN 978-1-58488-960-1). You are responsible for the material in all readings assigned for this course, regardless of whether that material is covered in my in-class lectures.
computing and graphics. The R language is used by many professional statisticians and is making deep inroads in industry as well. R is equipped with a wide variety of statistical and graphical techniques. It supports linear and nonlinear modeling, classical statistical tests, time series analysis, classification analysis, clustering, and much more. It will be extensively used in the MSAN program. A set of screencast tutorials related to R will be available on my YouTube channel and Blackboard.
Consequently, you may only miss class under the most dire of circumstances. These circumstances should be both unusual and documentable. For example, having a bad cold is documentable but not unusual. On the other hand, being kidnapped by aliens is unusual but is most likely not documentable. A single absence, for any reason, is acceptable and will not be penalized. Each absence in excess of one absence will cause your final letter grade in this class to be lowered by one level (e.g., an A- will become a B+).
that laptop before the course begins. You will be expected to use R on quizzes and on the final examination, and sometimes we will use R in class. I would ask you to be respectful of your classmates and to refrain from surfing the web, checking out Facebook, tweeting people your various tweets, etc. during the middle of my lectures.
including tasks for you to perform in R) that I will make up myself or assign from the Manly or Bartholomew textbooks. You must turn in your own write-up of each assignment, though you may work with colleagues on the homework problems up until 48 hours before the homework assignment is due. To reiterate: during the 48 hours prior to the start of the class during which the homework assignment is due, you may not confer, work with, or write up your homework assignments with anybody from the class. You also may not confer with any third party, in fact. I will grade a random subset of the problems you turn in each week, and sometimes I might grade all of them. To facilitate efficient grading, your weekly homework should have the following properties:
- Each problem should be started on a separate piece of paper.
- Different parts of the same problem do not need to be started on separate pieces of paper.
- Turn in the problems in the same order in which they were assigned.
- Staple your homework assignment in the upper left-hand corner.
- In general, do not print out entire data sets.
- In general, do not print out reams and reams of R outputs.
Everything should be orderly and easy for me to read.
work problems in great detail. Instead, feel free to come to my office hours or to schedule an individual appointment with me to discuss the homework. I will not accept late homework assignments under any circumstances.
day period to schedule an appointment with Kirsten Keihl, the assistant for the M.S. in Analytics program. You will have as much time as you like to take each quiz. While you can use your laptop during the quiz, you are on your honor not to consult any other resources on the Internet. You also may not use your textbook, class notes, cheat sheets, etc. The weekly quiz will focus on material that we have recently discussed in class -- generally, the topics from the past few lectures. The weekly quizzes will be centered on definitions, concepts, and simple computations, as well as interpretation of statistical output. At the end of the course, I will drop your lowest quiz grade.
in this course on May 14, 2013 during the regular course time, with some possibility for extra time (say, 10:00 a.m. - 2:00 p.m.). The final examination will focus on concepts, i.e., you will not be expected to engage in tons of routine calculations, but you will be expected to know certain formulas and relationships and you will be expected to interpret the outputs of various multivariate statistical analyses. In addition, you will be expected to use R to assist you with various statistical analyses.
ment of their own learning. If students put as much effort into actually learning material as they did worrying about their grades, their performance would be much better. Nevertheless, part of my job is to assign grades fairly and in a manner that reflects the high academic standards at the University of San Francisco and in the MSAN program. In this class, we will use the standard ten-point scale. "Plus" or "minus" grades will be assigned to students with grades close to the extremes of each ten-point bracket (plus or minus three points from the boundary of each bracket). Your grade in this course will be computed according to the following weights:
- Homework Sets: 25%
- Quizzes: 35%
- Final Examination: 40%
of the whole person -- the University of San Francisco has an obligation to embody and foster the values of honesty and integrity. The university upholds standards of honesty and integrity for all members of the academic community, including faculty, students, and staff. All students are expected to know and to adhere to the university's honor code. You can find the full text of the code online at http://www.usfca.edu/catalog/policies/honor/. Specifically, while you are required to work in groups with students on the homework assignments, you should not allow your name to be placed on a group write-up if it does not reflect your own understanding of the material and if you have not made an honest, equitable contribution to the group effort. Copying answers from other students or sources during a quiz or examination is a violation of the university's honor code and will be treated as such. You are also, of course, bound to the terms of the MSAN Code of Conduct that you signed prior to matriculating in the analytics program. All incidents of cheating or other academic misconduct will be reported to the director of the MSAN program.
you may have a disability, please contact USF Student Disability Services (SDS) at 415/422-2613 within the first week of class, or immediately upon onset of the disability, to speak with a disability specialist. If you are determined eligible for reasonable accommodations, please meet with your disability specialist so they can arrange to have your accommodation letter sent to me, and we will discuss your needs for this course. For more information, please visit http://www.usfca.edu/sds/ or call 415/422-2613.