General Information
 Course Number: CE5020 @ NCU, Taiwan
 Location: E6A205
 Semester: Spring 2016, Friday 9:00am-11:50am
 Target students: Senior undergraduates, graduates
Instructor / Teaching Assistant
 林圓皓 (luff543@gmail.com), 張國斌 (cheongkuokpan@gmail.com)
 Office: E6A307 ext. 35327 or R2206 ext. 57868
Evaluation and Grading
 Three Assignments: 20%
 Oral presentation: 15%
 Quiz: 10%
 Midterm Exam: 25%
 Project: 30%
Textbook
 Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2012
 Data Mining: Concepts and Techniques, J. Han and M. Kamber, Morgan Kaufmann, 2006.
 Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson International Edition, 2005.
 Data Mining: Practical Machine Learning Tools and Techniques, Ian H. Witten & E. Frank, Morgan Kaufmann Publishers.
Course Topics
 Preliminary (1 week)
 Predictive data mining (2 weeks)
 Association rule mining (2 weeks)
 Cluster analysis (2 weeks)
 Recommendation (2 weeks)
 Finding Similar Items (2 weeks)

Recent Announcements

Assignment #4
1. Use the given GPS data from Flickr to complete a clustering task. Try different clustering algorithms.
2. Show your clustering result on "3D Map", a feature in Excel 2016 (named "Power View" in Excel 2013).
3. Compute the correlation of your clustering result for evaluation and comparison.
Data: http://lms.ncu.edu.tw/files/manage/4/22.200023.102788/flickr.data
3D Map tutorial: http://lms.ncu.edu.tw/files/manage/4/22.200023.102788/3d_map_excel_2016.mp4
Power View tutorial: http://lms.ncu.edu.tw/files/manage/4/22.200023.102788/power_view_excel_2013.mp4
Due date: 2016/4/15 23:59 (1st round), 2016/4/22 23:55 (2nd round)
Submission ...
Posted Apr 8, 2016, 12:25 AM by 林圓皓
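
As a rough illustration of the Assignment #4 workflow, the sketch below clusters a handful of points with a plain k-means and then scores the result with the correlation between pairwise distances and a same-cluster indicator. This is only one common reading of "correlation of your clustering result"; the assignment may intend a different measure. The sample coordinates, the function names, and the use of Euclidean distance on raw latitude/longitude are all assumptions for illustration, and the layout of flickr.data is not reproduced here.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers are k of the input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def distance_incidence_correlation(points, labels):
    """Correlate pairwise distance with a 1/0 'same cluster' indicator.

    A strongly negative value means same-cluster pairs tend to be close together.
    """
    dists, same = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dists.append(math.dist(points[i], points[j]))
            same.append(1.0 if labels[i] == labels[j] else 0.0)
    return pearson(dists, same)


# Hypothetical (latitude, longitude) pairs, not taken from flickr.data.
# Euclidean distance on degrees is a simplification; a great-circle
# distance would be more faithful for widely spread GPS points.
points = [(25.00, 121.50), (25.02, 121.52), (25.01, 121.49),
          (22.60, 120.30), (22.62, 120.31), (22.61, 120.29)]
labels = kmeans(points, k=2)
score = distance_incidence_correlation(points, labels)
```

Running the same scoring function on the labels produced by different algorithms gives one number per algorithm, which is one way to compare them.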

Reference Answer for Assignment #2
Posted Apr 7, 2016, 5:57 PM by Chia-Hui Chang

Assignment #3
1. Video lectures: More Data Mining with Weka: 3.3~3.4
 Exercise: Use data.txt attached below to find the top 10 association rules with the highest support and a minimum confidence greater than 0.8.
2. Use the given grocery shopping dataset D1101.zip released by ACM RecSys (http://recsyswiki.com/wiki/Grocery_shopping_datasets) to find interesting patterns. The dataset contains users' transaction data for 4 months, from November 2000 to February 2001. It holds 817741 transactions in total, belonging to 32266 users and 23812 products. The file D.txt records users' transaction history. Each line in the file corresponds to a transaction in the following format: Transaction date; customerID; Age group; Residence Area ...
Posted Mar 25, 2016, 6:39 PM by Chia-Hui Chang
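
To make the terms in the Assignment #3 exercise concrete, here is a minimal sketch of how support and confidence are computed and how rules can be filtered by confidence and ranked by support. It enumerates itemsets by brute force rather than using Apriori or Weka, so it only suits tiny inputs. The baskets, the function name `mine_rules`, and its parameters are hypothetical; the format of data.txt is not assumed.

```python
from itertools import combinations


def mine_rules(transactions, min_conf=0.8, max_len=3, top_n=10):
    """Return up to top_n rules (support, confidence, lhs, rhs) with confidence > min_conf.

    Brute-force enumeration of itemsets up to max_len items; not Apriori.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            s = support(whole)
            if s == 0:
                continue
            # Split the itemset into every non-empty lhs / rhs pair.
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    conf = s / support(lhs)  # confidence of lhs -> rhs
                    if conf > min_conf:
                        rules.append((s, conf, sorted(lhs), sorted(whole - lhs)))
    # Highest support first; ties broken by confidence, then alphabetically.
    rules.sort(key=lambda rule: (-rule[0], -rule[1], rule[2], rule[3]))
    return rules[:top_n]


# Hypothetical baskets, not taken from data.txt.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["eggs"],
    ["bread", "milk", "butter"],
]
top_rules = mine_rules(baskets)
```

In this toy data the itemset {bread, milk} appears in 4 of 5 baskets, so both rules between bread and milk have support 0.8 and confidence 1.0 and rank first.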

Assignment #2
Problem 1: For the given dataset http://archive.ics.uci.edu/ml/datasets/Adult
 Produce a learning curve (accuracy vs. training size) for the given data set.
 Plot the ROC curve for your model.
 Produce a plot (accuracy on training and testing vs. model size) to observe overfitting and underfitting scenarios by tuning the parameters of your learning algorithms.
Problem 2: Performance evaluation
 For a k-label (k>2) classification problem, propose a metric for performance evaluation.
 For a 5-grade outcome, propose a measure for performance evaluation.
Due date: 2016/3/18 23:59 (1st round), 2016/3/25 23:59 (2nd round)
Submission: upload to LMS
Posted Mar 12, 2016, 5:31 PM by Chia-Hui Chang
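
For the ROC curve requested in Assignment #2, the sketch below shows how the curve's points can be derived from a model's scores by sweeping the decision threshold, and how the area under the curve follows from the trapezoid rule. The labels and scores are made up for illustration and are unrelated to the Adult dataset; the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs for a threshold sweep.

    y_true holds 1 for positive and 0 for negative examples; both classes
    must be present, otherwise the rates are undefined.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples sharing a score cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)
```

The resulting list of points can be handed to any plotting tool, with the false positive rate on the x-axis and the true positive rate on the y-axis.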

Assignment #1
1. Construct a pivot table for the given precipitation data (in file 2015_rainfall.xlsx) to (a) generate a summary of the data and (b) generate a report of each location's average rainfall for each month, i.e. location ("BANQIAO, 板橋", "TAMSUI, 淡水") vs. week.
2. Prepare the ARFF file so that Weka can import the given weather data (2015_weather.csv).
References: Getting started with Weka 1.1~1.5
Due date: 2016/3/3 23:59 (1st round), 2016/3/10 23:59 (2nd round)
Posted Mar 10, 2016, 5:04 PM by Chia-Hui Chang
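
The ARFF preparation step in Assignment #1 can be pictured with the following minimal sketch, which turns CSV text into an ARFF document with a `@relation` line, one `@attribute` line per column, and a `@data` section. The column names and values are hypothetical and do not reflect the actual contents of 2015_weather.csv. The type inference is deliberately simple: a column is numeric if every non-empty value parses as a number, and nominal otherwise.

```python
import csv
import io


def is_number(text):
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text.

    Assumes attribute names and nominal values contain no spaces or commas;
    such names and values would need quoting in a real ARFF file.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data if row[col] != ""]
        if values and all(is_number(v) for v in values):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: list the distinct values in braces.
            nominal = ",".join(sorted(set(values)))
            lines.append("@attribute " + name + " {" + nominal + "}")
    lines += ["", "@data"]
    for row in data:
        # ARFF marks a missing value with a question mark.
        lines.append(",".join(v if v != "" else "?" for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample; the real file's columns may differ.
sample_csv = "station,month,rainfall\nBANQIAO,1,83.5\nTAMSUI,1,\n"
arff_text = csv_to_arff(sample_csv, "weather")
```

Writing `arff_text` to a file with an .arff extension gives something Weka's Explorer can attempt to open, under the assumptions stated above.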
