Data Science for FinTech Application

Many real-world problems can be solved with data science (also known as data-driven science). A well-trained data scientist should be able to determine which problems data science can solve and then develop the corresponding data model or program code to solve them. This class aims to strengthen your problem-solving and analytical skills from a data science perspective, covering the data workflow, analysis algorithms, and data visualization. As a class in a college of commerce, it focuses on FinTech-related data and applications, using the Python programming language and packages such as SciPy, pandas, and scikit-learn.

This course starts from the basics of data science, and students leave with practical experience in extracting value from data. You will learn the theory and skills required for data analytics and gain proficiency with a complex ecosystem of tools and platforms. Students will need to present data and analysis results visually in order to identify financial insights. Data science is an interdisciplinary field concerned with the scientific processes used to extract insight from data. Lectures cover key topics such as data collection, data processing, data cleaning, exploratory data analysis, analysis algorithms, security, and visualization, so that you can solve real-world data problems.

Class Info

Announcements (Spring 2017)

  • 2/1: We are hiring a TA who is proficient in Python. The TA will support learning activities in class and on the online forum, prepare the classroom, help during exams, grade homework and projects, and tutor students. Please contact me directly by email or contact the department's course TA.
  • 6/14: Homework hand-in status can be found here.

Course Objectives & Learning Outcomes

  • Data science 101: what is data science, what is big data.
  • FinTech 101 and FinTech data.
  • Python for data analysis: arrays and vectorized computation.
  • Data handling: data loading, storage, and file formats.
  • Data wrangling: clean, transform, merge, reshape.
  • Exploratory data analysis.
  • Analysis algorithms: statistics, classification, clustering, detection, time series, etc.
  • Data aggregation and group operations.
  • Data visualization and interactive graph visualization.
  • Case study: FinTech.
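To make the objectives on vectorized computation, data wrangling, and group operations more concrete, here is a minimal pandas sketch. The column names (`account`, `amount`, `fx_rate`) and all values are hypothetical toy data. Filling the missing amount with 0 is just one simple cleaning choice, not a method prescribed by the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy transactions (illustrative values only, not course data)
raw = pd.DataFrame({
    "account": ["A", "A", "B", "B", "C"],
    "amount": [100.0, np.nan, 250.0, 50.0, 80.0],
    "fx_rate": [1.0, 1.0, 0.5, 0.5, 2.0],
})

# Data wrangling: replace the missing amount with 0 (one simple cleaning choice)
clean = raw.assign(amount=raw["amount"].fillna(0.0))

# Vectorized computation: convert every row at once, with no explicit Python loop
clean["amount_usd"] = clean["amount"] * clean["fx_rate"]

# Group operations: total converted amount per account
totals = clean.groupby("account")["amount_usd"].sum()
print(totals)
```

Here `groupby(...).sum()` is the split-apply-combine pattern behind "data aggregation and group operations." The per-row multiplication shows why vectorized code is preferred over loops in pandas and NumPy.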

Schedule (Spring 2017)

  1. 2/22: Data Science [00-Syllabus] [01-DataScience] [EDA-NewYorkTimes.ipynb]
  2. 3/1: Python: an interactive execution platform. [02-Python] [PythonLanguageReference.ipynb] [PythonBuiltinTypes.ipynb]
  3. 3/8: Advanced Python [PythonBuiltinFunctions.ipynb] [PythonFileIO&EE.ipynb]
  4. 3/15: Data Handling and Wrangling [03-AdvPython] [PythonDSPackages.ipynb] [Numpy.ipynb] [pandas.ipynb]
  5. 3/22: HTTP and Web Crawler [04-DataCollection] [HTTP.ipynb] [HW01]
  6. 3/29: Lab: Data Collection [Crawler-chinatimes.ipynb] [Crawler-t51sb01.ipynb]
  7. 4/5: Missing Data [EDA-Box.ipynb] [ipyparallel.ipynb] [HW02]
  8. 4/12: FinTech and Blockchain 101 [05-FinTech101] [Bitcoin_APIs.ipynb]
  9. 4/19 (M): Midterm [HW03]
  10. 4/26: Analysis algorithm [06-DM0] [Alg-Regression.ipynb] [Alg-Regression-LendingClub.ipynb] Ref: [Quick Tour of Machine Learning by Prof. Hsuan-Tien Lin]
  11. 5/3: Analysis algorithm [UPGMA] [k-means] Ref: [UPGMA Worked Example by Dr. Richard Edwards], or a pptx backup.
  12. 5/10: Analysis algorithm [07-DM-PCA] [Alg-PCA.ipynb] [Alg-LDA.ipynb] [Alg-knn.ipynb] [HW04]
  13. 5/17: Analysis algorithm [Alg-SVM.ipynb] [08-DM-DT] [Alg-DecisionTrees.ipynb]
  14. 5/24: Analysis algorithm [09-DM-LR] [Alg-LogisticRegression.ipynb] & Time Series [WannaCry.ipynb] [usagov&movie.ipynb] [timeseries.ipynb]
  15. 5/31: Analysis algorithm [10-DM-AR] & Data Visualization [11-Visualization] [pyplot.ipynb]
  16. 6/7: Data Visualization Tools. Ref: Orange (interactive data analysis workflows) and YouTube: Orange Data Mining; alternatives: KNIME/RapidMiner.
    • My class notes for Orange.
  17. 6/14: Term Project Presentation
  18. 6/21 (F): Final

Grading Policy

  • Homework (30%): programming exercises and essays
  • Participation (10%): attendance and discussion
  • Term Project (20%): implement a data analysis application
  • Midterm and Final (40%)
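The grading weights above add up to 100% and combine into a single weighted sum. The sketch below assumes every component is scored on a 0 to 100 scale. Because the policy gives only the joint 40% weight, it also treats Midterm and Final as one combined exam score. The names `WEIGHTS` and `final_grade` are hypothetical.

```python
# Weights from the grading policy; "exams" covers Midterm and Final together (assumption)
WEIGHTS = {"homework": 0.30, "participation": 0.10, "project": 0.20, "exams": 0.40}


def final_grade(scores: dict) -> float:
    """Return the weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"homework": 90, "participation": 100, "project": 85, "exams": 70}
print(round(final_grade(example), 2))
```

For the example scores, the contributions are 27 + 10 + 17 + 28, which gives a total of 82.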