Teaching and Learning Statistics Online

Many of us will be teaching statistics online in the next academic year due to Covid-19. While this poses challenges for statistics teaching and assessment, it is also a chance to reflect on and improve our current offering.

In response to this, the Teaching Statistics SIG held a half-day virtual workshop on Teaching and Learning Statistics Online, in conjunction with TALMO (http://talmo.uk/index.html), on Wednesday 22nd July 2020. The programme consisted of six 20-minute talks (a 15-minute presentation followed by 5 minutes for questions), relevant to teaching both specialist and non-specialist students.

Workshop programme

13:30 - Welcome and introduction

13:40 - Dr Mine Çetinkaya-Rundel (The University of Edinburgh) - Breaking it down, and building it back up again

In this talk we describe the process of moving an introductory data science course online. Specifically, we describe the thought process and the tooling for breaking down course components such as lectures, workshops, weekly assignments and group projects into parts that can be delivered synchronously and asynchronously online.

14:00 - Dr Maria Tackett (Duke) - Making remote lectures active and inclusive in a large undergraduate course

In this talk we’ll share how we adapted face-to-face lectures for remote learning by including both synchronous (“live lecture”) and asynchronous (“lecture content videos”) components. We’ll present ideas for engaging a large class during the live lecture by using technology to facilitate discussion and active learning. We’ll discuss what worked well and what could be improved based on the experience of using this format in Spring 2020. Looking ahead to the fall semester, we conclude with practical considerations for implementing this lecture format in a way that strives to foster equitable learning opportunities for students taking the course in a variety of learning environments.

14:20 - BREAK

14:40 - Dr Matthew Brett (University of Birmingham) - Data science methods for online teaching

Data science is an approach to the practice and understanding of data analysis that is founded in code. It is a rediscovery of methods advocated by John W. Tukey (1962), among others (Donoho 2015).

Thus defined, data science is a fundamental approach to the teaching of statistics. Computation, in the form of simulation and resampling, makes it radically easier to explain the deep ideas of sampling and inference to students without substantial background in mathematics (Cobb 2007). Code is a concise and powerful language to describe data analysis; it gives tools to deal with difficult, messy, real data.
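To make the resampling idea concrete, here is a minimal sketch of a permutation test for a difference in group means, written in plain Python with only the standard library. It is our own illustration, not material from the talk, and the scores in the usage section are hypothetical values invented for the example. The point is that the logic of a hypothesis test can be shown by shuffling labels rather than by deriving a sampling distribution.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so we
    shuffle the pooled values, re-split them into groups of the original
    sizes, and count how often the shuffled difference is at least as
    extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    treatment = [24, 27, 31, 29, 35, 30, 28]
    control = [22, 25, 21, 26, 24, 23, 27]
    diff, p = permutation_test(treatment, control)
    print(f"observed difference = {diff:.2f}, estimated p = {p:.4f}")
```

Students can vary `n_resamples` or the data and watch the estimated p-value change, which is the kind of simulation-first explanation the abstract describes.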

The benefits of teaching data science successfully can be great, but teaching code is famously difficult. How should this be done, and especially, how should it be done online?

Since 2015, the University of California, Berkeley has pioneered data science education for early undergraduates from any subject; many other US institutions are adopting their approach. Berkeley's programme resulted from a deep collaboration between statistics and computer science. Their introductory course now has 1500 students per semester. This is possible because they have developed powerful cloud-based teaching materials and exercises using the Jupyter Notebook. Notebooks weave code, generated figures and text into a document that students can view, edit and execute from their web browser. Notebooks make it relatively simple to develop and distribute exercises via the web. Exercises usually have embedded code for checking and marking, so students can check they are on the right track, and a large proportion of grading can be automated. As an example, this system allowed Berkeley to make an EdX version of their course that is similar to the live version on campus.
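The idea of an exercise with embedded checking code can be sketched in a few lines. The example below is a hypothetical, plain-Python illustration; it is not Berkeley's actual grading machinery, and the function names `check_sample_mean` and `mark` are invented for this sketch. Each check returns a pass/fail flag (which an automated marking script could record) together with a feedback message (which the notebook could show the student).

```python
import math


def check_sample_mean(student_answer, data, tol=1e-6):
    """Hypothetical embedded check for a notebook exercise.

    Returns (passed, feedback): the flag can be collected for automated
    marking, and the message can be displayed to the student.
    """
    expected = sum(data) / len(data)
    if not isinstance(student_answer, (int, float)):
        return False, "Your answer should be a single number."
    if math.isclose(student_answer, expected, abs_tol=tol):
        return True, "Correct: that is the sample mean."
    # Anticipate a common mistake and give a targeted hint.
    if math.isclose(student_answer, sum(data), abs_tol=tol):
        return False, "That looks like the total; remember to divide by the number of values."
    return False, "Not quite; check how you computed the mean."


def mark(results):
    """Fraction of checks passed, as a hypothetical grading script might record it."""
    return sum(passed for passed, _ in results) / len(results)


if __name__ == "__main__":
    data = [2, 4, 6, 8]
    for answer in (5, 20, "5"):
        print(answer, check_sample_mean(answer, data))
```

Targeted hints for anticipated mistakes are what let students "check they are on the right track" without waiting for a human marker.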

Since 2018, I have been teaching courses based on these approaches at the University of Birmingham, using Python (like Berkeley) and R. Preparing for next term, I have also set up a cloud-based system, using the open-source Berkeley machinery, to distribute teaching materials and exercises. I'll describe how this computation-based teaching worked and, if possible, I will show the live system in action with some example teaching materials.

15:00 - Dr Amira Elayouty and Dr Mitchum Bock (University of Glasgow) - Online Statistics Teaching with R using “learnr” interactive lessons and tutorials

Due to Covid-19, statistics teachers need to make unprecedented plans to deliver the highest-quality teaching remotely in the next academic year. A key tool in statistics is the widely used open-source statistical software R, which students need for active learning through data-based problems. However, using R remotely requires students to have their own computer, and R can be quite challenging for students with no background in it. One solution is to design interactive lessons and computing labs with embedded R consoles and grading tools for formative assessment, using the freely available learnr and gradethis packages from RStudio. These packages turn R Markdown files into interactive tutorials, published as Shiny apps, that require only a web browser to access; this avoids the difficulties of installing R/RStudio locally and of working with different package versions. Students can also work remotely and receive informative automated feedback, messages of encouragement, or hints for improving their code and solutions. A further advantage is that quiz-style questions linking the interactive lessons to class material, with instant feedback, help engage students with the course material. In this talk we share our experience of delivering a full course on data analysis skills using interactive lessons, and a lab component of an introductory statistics course using an interactive lab based on an environmental case study.

15:20 - BREAK

15:40 - Dr Eirini Koutoumanou (University College London) - Online teaching of computer-based short statistics courses

The UCL Centre for Applied Statistics Courses (CASC) runs several short courses on a variety of statistical topics, ranging from introductory statistics to Bayesian analysis. They last from a single day to 5 consecutive days, with evening classes also available, and attract between 10 and 40 people depending on the topic. Half of the courses are theory-based with no hands-on element (i.e. no real-time demonstration of statistical software), while the other half either involve live computer demos throughout or offer an optional hands-on element on a separate day. The main software packages used in the computer-based sessions are SPSS and R, with introductory courses also available on Stata and (soon) Matlab. Over the last few years, the computer-based courses have run very smoothly in a face-to-face format, with consistently positive feedback from delegates. The move online was met with some (unsurprising) hesitation from both teaching staff and participants, and the first few virtual classes were noticeably quieter than the pre-Covid-19 norm. It soon became apparent that virtual computer-based classes needed to be longer, to allow more time for questions and for ironing out technical difficulties for delegates (and often for staff too). Virtual breakout groups were also introduced to give participants the chance to ask questions in smaller (and hence possibly friendlier, less distant) groups, and to address their difficulties more effectively. Features such as screen sharing, online chat and session recordings (plus live streaming of code for R classes) have proven particularly useful for student learning, as reflected in the positive feedback from delegates.
Finally, having more teaching staff available and longer teaching hours have been two significant factors in the success of CASC's online computer-based classes. Details and further feedback on CASC's experiences outlined above will be discussed during this presentation.

16:00 - Prof. Alex Bottle and Dr. Victoria Cornelius (Imperial) - Teaching statistics to both online and on-campus Masters of Public Health students: considerations for remote assessment during covid-19 pandemic

Imperial College London’s Global Masters in Public Health (GMPH) is our first fully online degree course, which students take asynchronously over two years. In our first pilot intake, 66 students enrolled in October 2019, with plans to scale up numbers in future years. We have also run an on-campus MPH for several years, with around 60 students per intake. Both degrees have a core module in statistics. The on-campus assessment has used a traditional two-hour written exam (worth 80% of the final mark) and a practical assessment (20%). In contrast, the online module was created with a backwards design: we began by defining the intended learning outcomes, created the assessment around those outcomes, and then developed the main content. An important consideration was the scalability of the assessment, as we prepare to scale up to over 100 students in the next intake. We needed to consider both the relevance to practical skills of what we were asking students to do and the burden on markers. We wanted students to demonstrate their statistical learning through practical skills, planning and undertaking an appropriate analysis to answer a given research question. Instead of the traditional written exam, students were asked to write a 350-word Abstract with a flow diagram and to submit one line of R code showing the final regression model they had used (85%). The other 15% of the module’s marks came from auto-marked MCQs. Students were given two weeks in January 2020 to submit. In practice, despite the R code submitted, we sometimes found it difficult to ascertain from the Abstract whether what a student had done was good or poor statistical practice, for example regarding assumptions around missing data, treatment of variables, and choice of final model covariates. For students resitting in April 2020, we instead provided a long set of R output, some of which was irrelevant, and asked them to construct an Abstract from it.
This allowed us to reduce the number of possible analyses and to identify good and poor choices easily. In addition, we asked several free-text questions to test their understanding of model coefficients and to have them suggest other analyses. The online MPH has since influenced the ongoing revision of the on-campus degree, which, due to Covid-19 restrictions, will make considerable use of blended learning and online assessment.

Recording of the workshop

A recording of the workshop can be found on YouTube and also on the Teaching and Learning Mathematics Online (TALMO) page.
