This course introduces mainstream programming and high-performance computing techniques and tools used in data science.
List of Topics
Computer system fundamentals (computer architecture and operating systems)
Program profiling to find bottlenecks
Data structures and algorithms
Matrix and vector computation
Concurrency and multiprocessing modules
High-performance computing using clusters and GPUs
Introduction to big data files, text data, and web scraping
Data manipulation and management via Bash, Pandas, and relational database management systems
Introduction to big data systems: Hadoop, Spark
Introduction to convex optimization
Introduction to (interactive) visualization
Textbooks
· High Performance Python: Practical Performant Programming for Humans, by Micha Gorelick and Ian Ozsvald
· Convex Optimization, by Stephen Boyd and Lieven Vandenberghe
· Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, by Wes McKinney
· Lecture notes
Recommended books
· Effective Python: 59 Specific Ways to Write Better Python, by Brett Slatkin
· Introduction to High Performance Computing for Scientists and Engineers, by Georg Hager and Gerhard Wellein
· Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
· Deep Learning with PyTorch, by Eli Stevens, Luca Antiga, and Thomas Viehmann