CE 764 Hydroinformatics

In God we trust, all others bring data.

–William Edwards Deming (1900-1993)

This postgraduate elective course discusses the integrated use of information and communication technology in solving science problems that involve learning from data. In brief, we will learn and use statistical learning (alternatively, machine learning) techniques, with applications ranging from everyday uses to complex science problems in hydrology. Statistical learning makes the best use of data to learn patterns that give us information, and hence knowledge, about the system under study.

Over the past few terms, Master's and PhD students from Civil Engineering, the Center for Resources Engineering, and the Interdisciplinary Program in Climate Studies have taken this course. It may also benefit students working on applications of statistics and machine learning in general.

The attached slides from the first lecture present an overview of the course.

The syllabus is organized into four modules:

Module 1: Introduction

Topics: Introduction to hydroinformatics - data-driven modeling for water systems, Model classification, Overview of models, Modeling accuracy, Introduction to machine learning and artificial intelligence, Introduction to MATLAB and R programming.

Module 2: Supervised Learning – Classification and Regression

Topics: Linear models; Generalized linear models (GLMs) - logistic regression, Poisson regression, gamma and exponential GLMs; k-Nearest Neighbors (kNN); Polynomial regression and generalized additive models; Kernel-based methods; Decision trees - Classification and Regression Trees (CART), bagging, boosting and random forests; Support Vector Machines (SVM); Artificial Neural Networks (ANN); Resampling methods - bootstrap; Regularization; Machine learning system design.

Module 3: Unsupervised Learning

Topics: Clustering - i) hard (k-means) clustering and ii) fuzzy clustering (fuzzy c-means), with an introduction to fuzzy logic; Multivariate analysis - dimension reduction, singular value decomposition (SVD), principal component analysis (PCA), canonical correlation analysis (CCA).

Module 4: Applications

Topics: Hydroinformatics for climate change impact assessment and regional flood frequency analysis; Example of a hydrologic information system.

References

1) James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: Springer. (Alternatively, for advanced readers: Hastie et al., 2008, The Elements of Statistical Learning.)

2) von Storch and Zwiers, 1999, Statistical Analysis in Climate Research, Cambridge Univ. Press, U.K.

3) Myers, R. H., Montgomery, D. C., Vining, G. G., & Robinson, T. J. (2012). Generalized linear models: with applications in engineering and the sciences (Vol. 791). John Wiley & Sons.

4) Abbott, 1991, Hydroinformatics: Information Technology and the Aquatic Environment, Avebury Technical, Aldershot, U.K.

5) Nielsen, 2016, Neural Networks and Deep Learning, Web-book. http://neuralnetworksanddeeplearning.com/index.html

...and a host of other online and offline resources and research articles that will be discussed in class.

Prerequisites: Basic knowledge of probability, statistics, and optimization is essential. Basic knowledge of hydrology is helpful but not essential. Prior experience with at least one programming language is also helpful but not mandatory.