Curriculum Development

Introductory Multivariate Analysis

Multivariate Analysis

Time Series Analysis

Statistics and Visualisation

Contemporary Issues in Business Analytics

Statistical Modelling in R

Survey Methodology

Conclusion

Introductory Multivariate Analysis

In 2023 the department introduced Introductory Multivariate Analysis (STA739), a renaming of STA734 intended to emphasise that students continue their multivariate analysis journey in the second semester with STA701 (Multivariate Analysis). Most statistical analyses fall under the umbrella of multivariate analysis, and together STA739 and STA701 make up our students' multivariate analysis journey. Given my extensive experience of teaching and supervising in the Data Science Masters programme by that time, I was in a position to review all topics covered by our other Honours courses and identify topics that were missing but relevant. By relevant I mean both the knowledge students should have when moving from Honours into the Masters programme and the techniques currently applied by practising analysts. My aim was thus to address my students' learning needs and make sure they feel prepared when moving from my Honours programme to the next stage of their analytics journey. This content evaluation led to the following topics being covered in STA739 in the first semester:

The remaining topics are covered in the second-semester Multivariate Analysis course (next section). These topics were selected for the first-semester course because my review of the relevant literature suggested that a good foundation in these concepts is necessary before moving on to the other topics.

Traditional multivariate analysis textbooks focus on continuous data, yet in the real world statisticians and data scientists work with mixed data types (numerical and categorical). Survival analysis, moreover, is a separate and extensive set of tools for analysing time-to-event data and is therefore not usually included in multivariate analysis textbooks. After identifying these topics, I authored course readers for each of them, drawing on the following main sources:

Johnson, R. A., & Wichern, D. W. (2007). Applied Multivariate Statistical Analysis (6th ed.). Upper Saddle River, New Jersey: Pearson Prentice Hall.
Rencher, A. C., & Christensen, W. F. (2012). Methods of Multivariate Analysis (3rd ed.). Hoboken, New Jersey: John Wiley & Sons.
Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, New Jersey: John Wiley & Sons.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (2nd ed.). New York: Springer.

I also reviewed hundreds of peer-reviewed articles to identify high-quality, relevant foundational content. The articles I selected, along with the main sources, are listed in the bibliographies of the course readers:

Looking at the contents of the course readers over the years (2023 to 2025), it is evident that I review and update the content annually. I do this for two reasons. Firstly, it ensures that the topics remain relevant; updates of this kind are informed by my research and supervision. For example, point-biserial correlation and Cramér's V were added to the 2025 categorical data analysis course reader (↪ view) following the industry-directed project I supervised in 2024, where effect sizes had to be measured in addition to establishing whether significant associations exist. Secondly, I make these updates to keep the content as clear and easily digestible as possible for the students.
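As an illustration of the two effect-size measures mentioned above, the following minimal Python sketch computes Cramér's V, V = sqrt(χ² / (n(k − 1))) with k the smaller of the number of rows and columns, and the point-biserial correlation, which is the Pearson correlation between a numeric variable and a 0/1 group indicator. It is not taken from the course reader (which works in R); the function names are hypothetical, and χ² is computed without a continuity correction (R's chisq.test() applies Yates' correction to 2 x 2 tables by default, so results can differ there).

```python
import math
from statistics import mean


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts.

    Assumes every row and column total is non-zero; no continuity correction is applied.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


def point_biserial(x, group):
    """Point-biserial correlation: Pearson correlation of numeric x with a 0/1 indicator."""
    mx, mg = mean(x), mean(group)
    cov = sum((a - mx) * (b - mg) for a, b in zip(x, group))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sg = math.sqrt(sum((b - mg) ** 2 for b in group))
    return cov / (sx * sg)
```

A perfectly associated table such as [[10, 0], [0, 10]] gives V = 1, while a table with identical rows gives V = 0.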

Each course reader introduces a concept, follows with its mathematical formulation and/or derivation, and then turns to practical application: first by hand, to cement understanding of the technique, and then in RStudio (↪ view). Part of the reason for compiling my own course readers is to include software application of the techniques throughout. Traditional textbooks do not always follow the structure I am after: either they focus on theoretical and mathematical discussion with hand calculations, or they provide high-level summaries of techniques followed by software application. With the structure of my course readers, students get to know a new topic before going into the mathematics behind the technique, then see how the mathematics is applied by hand, and finally learn how to apply it in a software package. The course readers also link to the help files of the functions used in the analyses; these are the words highlighted in blue (↪ view). I follow this approach in all the course readers I author. A student commented in 2024: "I enjoyed reading all the course materials because they were very self explanatory".

Statistics graduates are now expected to develop a relevant technology stack, so my curriculum development needs to take this into account to ensure that I deliver analytics graduates with the desired attributes. In Statistics and Data Science this stack would include SAS, Python and RStudio. Our postgraduate students receive training in Python and SAS in other modules; I train them in RStudio. This training goes beyond coding in base R. Students are shown how to use commented scripts to keep all the analyses relevant to a particular task in one place (↪ view), and they now also use R Markdown to compile their assignments (↪ view). As explained by Wickham et al. (2016), R Markdown "provides an unified authoring framework, combining your code, its results, and your prose commentary". The analyst can then knit the R Markdown file to a wide selection of output formats, such as Word, PDF and PowerPoint. R Markdown also makes it possible to switch between engines (R, Python, SAS) in the same file, which I also make students aware of. RStudio can thus be used as a general analytics workbench (↪ Python, ↪ SAS). My curriculum development for this module and the follow-on Multivariate Analysis module (next section) takes this into account.

Further curriculum development will include the integration of GitHub Copilot in RStudio in 2026. This technology has the potential to accelerate code development and to give students a practical understanding of how AI tools are used in programming environments, another essential skill for their growing technology stack. It will also require updating how assignments are designed to assess their practical skills development (↪ view).

In addition to the course readers, I also prepare slide decks for lecturing (↪ view) and R scripts for demonstration. Since the students are not necessarily familiar with Python at this stage, I focus only on R. In Multivariate Analysis (next section), however, I have gradually added Python application as my own Python skills have developed.

Multivariate Analysis

This second-semester Multivariate Analysis module follows on from Introductory Multivariate Analysis (previous section). The following topics, drawn from those I identified, are covered in this course:

Overview of matrix algebra required for multivariate analysis
Principal component analysis (PCA)
Principal component regression (PCR)
Discriminant analysis (descriptive/exploratory)
Classification analysis (linear and quadratic discriminant analysis)
Hierarchical cluster analysis
K-means cluster analysis

In addition to designing the course content, I am responsible for demonstrating the application of these techniques in RStudio. In doing so, my focus is on turning my students into analytical problem solvers: I make sure they understand the process to follow from raw data to results through software application. With data-driven decision making at the fore, it is also important that they understand how to turn the results into insights relevant to the problem being solved. To achieve these outcomes, I also authored the following practical course readers:

The sources consulted in compiling these course readers are listed in the texts themselves. The matrix algebra course reader contains Python code alongside the R code, and Python code is currently being added to the remaining course readers.

Here too I incorporate ideas from research and supervision. For example, the PCA and PCR manual includes a section on what students should do when they have mixed-type data, given that these techniques are traditionally applied to numerical data. This is not simply left as self-study; I make a point of thinking it through with them during the lecture (↪ view), since it is likely an issue they will be confronted with in practice.

With cluster analysis, traditional textbooks stop at demonstrating how the technique is applied, but when my students enter industry they need to do more than that. Cluster analysis solutions rest on a series of subjective decisions, such as the choice of distance measure, linkage method and number of clusters. My students understand this and know that they will typically end up with several possible solutions from which the "best" one must be selected, and that they will then need to turn that solution into actionable insights for their client. So, in addition to demonstrating the cluster analysis techniques, I also cover cluster validation and interpretation (↪ view).
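Cluster validation can be made concrete with the average silhouette width, one widely used index (among several) for comparing candidate solutions. The Python sketch below is a minimal, assumed illustration using Euclidean distance; the function name is hypothetical, it requires at least two clusters, and in R the cluster package's silhouette() function serves the same purpose. For each point, a is its mean distance to the rest of its own cluster, b is its smallest mean distance to any other cluster, and its silhouette is (b − a) / max(a, b); higher averages indicate tighter, better-separated clusters.

```python
import math


def average_silhouette(points, labels):
    """Average silhouette width of a clustering, using Euclidean distance.

    points: list of coordinate tuples; labels: cluster labels (at least two distinct).
    """
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, p in enumerate(points):
        # a: mean distance from point i to the other members of its own cluster
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette defined as 0 by convention
            widths.append(0.0)
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance from point i to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)
```

For example, with points [(0, 0), (0, 1), (5, 5), (5, 6)], the labelling [0, 0, 1, 1] scores higher than [0, 1, 0, 1], so on this criterion alone it would be preferred as the "best" candidate; interpretation for the client still has to follow.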

During practical demonstrations the students are given the opportunity to practise the R application in class (see Hands-on Practice for an example) and receive short assignments to practise on their own afterwards (example). Towards the end of the semester they receive a practical project that brings all the topics together, testing their understanding of where each technique fits in and what insights can be drawn from applying it.

When my students pass Multivariate Analysis, I know that I have given them every possible resource they need to be able to apply what I've taught them in a practical and insightful way.

Time Series Analysis

After the first cohort of MSc Statistics (with specialisation in Data Science) students graduated, the Data Science lecturing team reflected on the MSc curriculum that had been taught and the industry projects that had been completed. It was decided to revive the Honours-level Time Series Analysis (COF711) module, given the wide application of this methodology to problems in the financial and retail industries. Because the module had been inactive for a while, I was tasked with redeveloping the time series analysis curriculum. After consulting peers who had taught or were teaching the topic, I identified an appropriate textbook for the course. It came highly recommended because it was free, interleaved software application in RStudio with theory, and covered a range of topics appropriate for a first postgraduate course on the subject.

In a nutshell, time series forecasting using exponential smoothing methods and ARIMA models formed the basis of this module. I covered Chapters 1 - 4 and 7 with the students and, in addition to the R application included in the textbook, I also incorporated equivalent SAS application. For an example of a lecture, ↪ view.
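For readers unfamiliar with exponential smoothing, its simplest member, simple exponential smoothing, updates a level as l_t = α·y_t + (1 − α)·l_(t−1) and uses the final level as a flat forecast for all future periods. The Python sketch below is illustrative only: the function name and the choice of the first observation as the initial level are assumptions, and in practice α and the initial level are usually estimated from the data rather than fixed by hand.

```python
def ses_forecast(y, alpha, level0=None):
    """Simple exponential smoothing: return the flat forecast (the final smoothed level).

    y: observations in time order; alpha: smoothing parameter in (0, 1].
    level0: initial level; defaults to the first observation (an assumption).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = y[0] if level0 is None else level0
    for obs in y:
        # new level is a weighted average of the latest observation and the previous level
        level = alpha * obs + (1 - alpha) * level
    return level
```

With y = [0, 10] and α = 0.5 the forecast is 5.0; setting α = 1 reproduces the naive forecast (the last observation).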

Statistics and Visualisation

In 2017 the department started offering BIA713, Statistics and Visualization, which was developed for the EMS Faculty's Diploma in Software and Media with specialisation in Data Analytics and Business Intelligence. I was asked to develop the curriculum for this course. The student cohort typically consisted of a mix of working professionals and students with varying levels of statistical knowledge and computer skills, which needed to be taken into account. The taught content also had to be applicable in the environments the students work in. I curated the topics for this first course in data analysis with these points in mind and decided on the following (↪ view):

Introduction to data
Probability (self-study; perhaps too advanced)
Distributions of random variables
Foundations for inference
Inference for numerical data
Inference for categorical data
Introduction to linear regression
Logistic regression

I developed all course material for the parts that I lectured (everything up to logistic regression), which included interleaving computer application with theory content.

A crucial aspect of the course was that the students learn how to gain insights from the data analyses we teach them. For an example of my lecture notes, ↪ view. It was also important that they learn to apply the analyses in software packages readily available where they work. I therefore focused on Excel, since most businesses have access to Excel at a minimum. I also showed them how to do data analysis in R, since it is open-source software that can be downloaded and installed for free. Because of this dual software application, and to help them upskill as seamlessly as possible, I compiled a manual with clear instructions on how to carry out the various statistical techniques in Excel and/or R (↪ view).

I thoroughly enjoyed presenting this course, even though navigating different generations with varying levels of analytics ability in the same class could be challenging. The students' appreciation of the effort made, however, was priceless. Below is a photo of me presenting one of my lectures in Lab J of the CAMS building in 2018.

Contemporary Issues in Business Analytics

In 2023 the lecturing and coordination of the Contemporary Issues in Business Analytics (STA842) module became my full responsibility. Having been part of this course since 2018 and actively involved in the curriculum development of our Honours programme, I was already aware of where content should be added to expand the existing curriculum.

One such topic was survival analysis. None of the student cohorts that went from UWC's Honours Statistical Science programme into the Masters programme between 2018 and 2023 had been exposed to even an introductory course on survival analysis. Survival modelling is widely applied in customer relationship management, particularly in the acquisition, retention and churn phases. This is because these models describe the hazard, the instantaneous rate at which an event occurs given that it has not yet occurred; in this context, the event would be a customer leaving a business and not returning. Besides being useful to marketing departments for flagging customers at risk of leaving, these models also provide insight into the drivers of customer attrition. Another use case is to estimate, before acquiring a new customer, how long that customer would remain on book. If the predicted duration does not at least reach the breakeven point at which the business recovers the cost of acquiring the prospect, the business could decide to allocate those resources to other customers.

I thus expanded the material usually covered on survival analysis into a thorough unpacking of the technique. I also used the models to solve relevant contextual problems with the students, demonstrating how model output should be turned into actionable insight. I compiled two short course readers for the students. The first gives a thorough introduction to survival analysis and the descriptive analysis of such data, and introduces the Cox proportional hazards model with application in R and SAS. The second covers the accelerated failure time model as an alternative to the Cox model, also with application in R and SAS (↪ view). These course readers were turned into seminar presentations (↪ view).
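The descriptive analysis of survival data usually starts with the Kaplan-Meier estimator, S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of events at t_i and n_i is the number of subjects still at risk just before t_i. The Python sketch below is a minimal, hypothetical illustration (not the course readers' R or SAS code), with churn as the event and customers still active at the end of follow-up treated as censored. In R, the same estimate comes from survfit(Surv(time, status) ~ 1) in the survival package.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times: follow-up durations; events: 1 if the event (e.g. churn) was observed,
    0 if the observation is censored. Returns (event_time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all subjects (events and censored) sharing the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # censored subjects at t still count as at risk at t
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve
```

For example, kaplan_meier([2, 3, 3, 5], [1, 0, 1, 1]) steps down to 0.75 at time 2, about 0.5 at time 3 (the customer censored at time 3 is still counted as at risk) and 0.0 at time 5.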

Statistical Modelling in R

I was part of the Statistical Modelling (STA737) lecturing team from 2017 to 2018 and again from 2021 to 2023. A reflection on the performance of the 2017 cohort revealed that their R skills were lacking. In 2018 I then became the practical lecturer with the task of developing this necessary skill set.

For this purpose I had to develop my own curriculum. The textbook used for the course includes practical lab demonstrations at the end of each chapter, and the authors also provide excellent course material. However, to address our students' skills shortage, I had to develop course material that spoke directly to it. The students needed guidance on how to install and start using the software, moving from base R in the initial years to RStudio in later years (2018, 2021, 2023). I would then add content to my demo slides to fill any gaps in the textbook's labs (2018, 2021, 2023).

In 2023 the practical component was redesigned to take the students through all the stages of a statistical modelling project. By then the students came into Honours with better-developed R skills, so the focus could shift from basic application to include additional analyses. For example, the textbook did not include any data preparation or data exploration. Each week I would therefore add something extra to bring the students closer to a complete statistical learning application in R. In the Chapter 2 lab I introduced them to pipe operators, data cleaning and data visualisation using ggplot2. This continued in Chapter 3, where I drew their attention to variable classes alongside continued use of ggplot2. Bivariate analysis using correlations and associations was introduced in the Chapter 4 lab.

Finally, I used the following roadmap to help the students navigate the stages of a modelling project:

Robot (traffic light) colours were used to track their progress through the modelling workflow:

Red: not yet started
Orange: in progress
Green: completed

This change to the practical part had a definite impact on the students' understanding of how to approach and solve a statistical modelling project. See student feedback for evidence of this.

Survey Methodology

My research expertise is in survey sampling theory and application. Both my Masters and my PhD were in this area, I have published in this field a few times, and I have delivered conference presentations, workshops and seminars on these topics.

After joining the department in 2012, I was asked to present "Sampling in Practice" over two sessions to postgraduate students in the Honours in Statistics and Honours in Population Studies programmes. For these lectures I had to curate the relevant content and then build the course material to share with the students (2012, 2013, 2014, 2016).

It was my task to ensure that the students understood the importance of sampling in practice, the difference between good and bad sampling practice, the different sampling methods, how to determine the size of a sample and how to correctly conduct statistical analyses using survey samples. Of course, it was important for the students to know how to apply sampling using software. At the time, the students were shown how to carry out the different techniques in Excel and SAS.
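As a worked example of sample size determination, a common starting point under simple random sampling is n0 = z²·p(1 − p) / e² for estimating a proportion p within a margin of error e, followed by the finite population correction n = n0 / (1 + (n0 − 1) / N). The Python sketch below assumes this textbook formula; the function name and defaults are hypothetical, and the conservative choice p = 0.5 maximises p(1 − p). Complex designs (stratified or cluster samples) would also need a design effect, which is not modelled here.

```python
import math
from statistics import NormalDist


def sample_size_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Sample size for estimating a proportion under simple random sampling.

    margin: desired margin of error e; p: anticipated proportion (0.5 is most conservative);
    population: population size N for the finite population correction (None = infinite).
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    # two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)  # round up so the margin of error is not exceeded
```

With a 5% margin at 95% confidence this gives the familiar 385; for a population of 1,000 the correction reduces it to 278.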

Conclusion

This page summarises the curriculum development and renewal I have undertaken. The evidence presented here demonstrates my ability to identify relevant topics and curate course material, and substantial positive feedback from students attests to my ability to lead curriculum development successfully. I have shown how I consider graduate attributes in my curation process to ensure that my students are industry-ready, practical problem solvers once they graduate from my courses. In this field, curriculum development has to consider technology application, and I have demonstrated how I ensure that my Statistics graduates continuously build the technology stack they now need when moving on to the next stage of their analytics careers. More discussion on this can be found here.
