Aug 21 – Oct 12, TR 10:00am – 11:45am.
Instructor: Dr. Matthew Dixon (Adjunct Professor)
Location: USF Presidio Building
Office Hours: TBA
Many challenging analytics problems in industry can’t be addressed through statistical techniques alone. This course covers the application of tools and techniques in machine learning and data mining to discover and visualize patterns in complex data, predict results and improve information retrieval.
(CT2) Machine Learning in Action by Peter Harrington, Manning, First Edition, 2012.
(CT3) Mining the Social Web by Matthew Russell, O'Reilly, 2011.
Available for viewing from the USF online library archive (Ignacio). Call: QA76.9.D343
(CT4) MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf.
Available for viewing from the USF online library archive (Ignacio). Call: HF5548.38.O64
On completion of this course the student should be able to accomplish the following:
· Data mining:
o Collect and clean structured and unstructured data from a variety of sources including databases, web services, and text-based data formats
o Understand key techniques for mining association rules from frequent item sets
o Understand and apply the Apriori algorithm to real datasets
o Design and implement the workflow required to solve key analytics problems such as recommendation engines, trading algorithms, fault detection, and event prediction
o Understand the characteristics and limitations of several different classification and clustering techniques and select the one most appropriate for a given task and data set
o Understand and apply key algorithms such as Google’s PageRank, k-Means, SVM, kNN, Naive Bayes, and CART
o Create and manage both SQL and NoSQL databases
o Insert data into databases and perform moderately complex queries against them
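As an informal illustration of the frequent item set and Apriori outcomes listed above, the following is a minimal Python sketch rather than course-provided code. The function name `frequent_itemsets`, the `min_support` threshold (an absolute transaction count), and the toy `baskets` data are all assumptions made up for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions (hypothetical helper, Apriori-style)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets
        # are themselves frequent (the Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Toy data, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = frequent_itemsets(baskets, min_support=3)

# Confidence of the rule {diapers} -> {beer}:
# support({diapers, beer}) / support({diapers}).
confidence = freq[frozenset({"diapers", "beer"})] / freq[frozenset({"diapers"})]
```

The confidence computed on the last line shows how an association rule can be scored once the frequent item sets are known. A production implementation would typically use a dedicated library and a relative support threshold.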
This course will consist of two morning lectures per week. The lecture schedule for this course shall be made available as a separate document listed on the course webpage and is subject to change. You are strongly encouraged to review the schedule and read the suggested background reading material ahead of class.
Each lecture will cover material from the course textbooks and demonstrate computational tools and example problems necessary to complete the graded assignments, quizzes, group projects and final exam. Any lecture material that is used during the class will be made available afterwards. This material is designed to supplement your own notes made during the class and your reading and should not be used in lieu of attending classes.
In the event that you are unable to attend a class, it is important that you contact me ahead of the class to make the necessary arrangements to catch up on the class material.
It is important that you bring your personal laptop to the lectures so that you can best prepare yourself for the assignments. Please mute your cellular phones and laptops before the start of each lecture to avoid distracting others.
Weekly individual lab assignments will be issued during the first lecture of each week and must be turned in at the start of the first lecture of the following week. Late assignments will only be accepted in the event of verifiable extreme circumstances such as a medical emergency.
To avoid getting stuck, it is important that you think through each assignment carefully, seek the advice of your colleagues and contact me well in advance of the due date. You are responsible for submitting your own independent work. Problem solving, organization and communication skills are paramount to your success. Guidelines on how to submit your assignments will be provided.
The occasional short quiz will be held in class and you will be given preparation instructions one week in advance of the quiz date. Failure to show up in class to take the quiz will automatically result in no marks being awarded for the quiz, except in the event of verifiable extreme circumstances.
In addition to the weekly lab assignments, you will be assigned a group project. You will be given the opportunity to choose one problem from a list of analytics problems and form a small team of your colleagues to apply computational analytics techniques to solve the problem and produce a summary research report.
Projects will be assigned during the second or third week of the course and must be completed by the first lecture of week 7. Graded projects will be returned by week 8. All data will be provided, but each team will be responsible for writing any analytics code in R or Python.
You are strongly advised to coordinate closely with your colleagues throughout the project and to distribute the workload fairly, in a way that is commensurate with each team member’s interests, experience and background. Budget for twice the amount of time that you think a task will reasonably take to complete and be proactive about any potential difficulties in synchronizing your work.
Because project grades are assigned to the team and not on an individual basis, it is vital that each team member is accountable for their portion of the project and does not let down their team.
The final exam will be closed book and take place during class time in week 8. Instructions and guidelines for preparation will be provided. Make-up exams will only be permitted in the event of verifiable extreme circumstances.
Individual lab assignments and quizzes 40%
Group project 40%
Overall grades will be assigned as follows:
100 - 90 A
89 - 80 B
79 - 70 C
69 and below Failing
You must attain an overall score of 70% or above to pass.
You must abide by the copyright laws of the United States and the academic honesty policies of USF. If you are told that you may do so for a particular project, you may use code that you find online as long as doing so does not violate the software's license. You may not borrow code from other current or previous students. All suspicious activity will be investigated and, if warranted, passed to the Dean of Sciences for action.
Official text from USF
As a Jesuit institution committed to cura personalis - the care and education of the whole person - USF has an obligation to embody and foster the values of honesty and integrity. USF upholds the standards of honesty and integrity from all members of the academic community. All students are expected to know and adhere to the University's Honor Code. You can find the full text of the code online.
The golden rule: You must never represent another person's work as your own.