Practical Classroom Sensing at Scale
Providing university teachers with high-quality opportunities for professional development cannot happen without data about the classroom environment. Currently, the most effective mechanism is for an expert to observe one or more lectures and provide personalized formative feedback to the instructor. Of course, this is expensive and unscalable, and perhaps most critically, it precludes a continuous learning feedback loop for the instructor. EduSense is a comprehensive, open-source sensing system that produces a wide range of theoretically motivated visual and audio features correlated with effective instruction, which can feed professional development tools in much the same way a Fitbit sensor reports step count to an end-user app. Although previous systems have demonstrated some of these features in isolation, EduSense is the first to unify them into a cohesive, real-time system that is evaluated in the wild and practically deployable. Our published studies quantify where contemporary machine learning techniques are robust and where they fall short, illuminating the work that remains to bring the vision of automated classroom analytics to reality. Nonetheless, we believe EduSense has immediate utility for human observers, augmenting their notes with data they cannot easily capture themselves, or providing data at greater frequency. We also envision EduSense as a stepping stone toward a university culture that values professional development for teaching, by reducing barriers such as the time and effort needed to collect, process, and view the fine-grained data that leads to quality feedback on teaching.
Video and audio from classroom cameras first flow into a scene-parsing layer, which provides the raw material for a series of subscriber featurization modules, each responsible for a particular classroom facet of interest. These modules are launched as containers and receive data over an IPC socket using a standard API, which affords modularity. This architecture also allows us to swap in new implementations as they become available, and to toggle individual featurization modules on and off. Please see our publications for an extended discussion of the pedagogical motivation for each of these features.
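To make the subscriber pattern concrete, here is a minimal sketch of what one featurization module might look like. The frame schema (per-person keypoint dictionaries) and function names are assumptions for illustration, not the actual EduSense API; a real module would read frames off the IPC socket rather than from an in-memory iterable.

```python
# Hypothetical sketch of an EduSense-style featurization module: the
# scene parser emits one record per video frame (body keypoints for each
# detected person), and a subscriber module converts those records into a
# single educationally relevant feature. The frame schema here is an
# illustrative assumption, not the real EduSense wire format.

def count_hand_raises(frame):
    """Count people whose wrist keypoint sits above their nose keypoint.

    Keypoints are (x, y) image coordinates; smaller y means higher in
    the image, so wrist_y < nose_y approximates a raised hand.
    """
    raises = 0
    for person in frame["people"]:
        kp = person["keypoints"]
        if "wrist" in kp and "nose" in kp and kp["wrist"][1] < kp["nose"][1]:
            raises += 1
    return raises


def run_module(frames):
    """Consume a stream of scene-parse frames (in deployment, read off
    the IPC socket) and yield one feature record per frame."""
    for frame in frames:
        yield {"ts": frame["ts"], "hand_raises": count_hand_raises(frame)}
```

Because each module only depends on the shared frame schema, a new feature (say, sit/stand detection) can be added as another container subscribing to the same stream, without touching the scene parser.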
We note that our feature set, while diverse, is not exhaustive. There are many other valuable dimensions of data that could be gleaned through video and audio processing; our present implementation is one set of features that we believed were a natural starting point and proof-of-concept. We hope that others contribute new capabilities through the open source project.
EduSense is a full-stack system composed of four key layers, illustrated above. The physical sensors that power our system form the lowest layer, the classrooms layer. This is followed by a processing layer, in which the audio-visual scene is parsed to generate initial data, after which a series of specialized featurization modules convert and classify that data into educationally relevant features. This digested data is then saved for long-term storage and information retrieval in the datastore layer. The final Apps layer comprises end-user applications, which are our focus for the next year of research and development.
Our future goal with EduSense is to power a suite of data-driven instructional aids, for example, tracking the elapsed time of continuous speech to help instructors inject lectures with pauses, as well as opportunities for student questions and discussion. Similarly, pauses in speech by the instructor, for instance after posing a question to the class, should follow the recommended wait time of three seconds (shown to significantly raise student participation). These timers could pop up on a carefully designed, low-visual-complexity instructor tablet. Other simple cues that could be automatically generated include suggestions to increase movement at the front of the class (increasing student attention), and to modify the ratio of facing the board vs. the students. More complex visualizations are possible too, for example, a cumulative heatmap of all student hand raises thus far in a lecture, which could facilitate selecting students who have yet to contribute. Similarly, a histogram of the instructor's gaze could highlight areas of the classroom receiving less visual attention, which has been shown to decrease learning. In addition to in-class, real-time feedback, we also see great value in after-class and end-of-semester reports, which could be sent via email. These could provide instructors opportunities to reflect on their teaching practices, and perhaps even on the efficacy of interventions throughout the semester, such as increasing wait time after posing a question. Prior work (using similar, but manually coded, data) has already shown that such reports change instructor behavior and improve teaching efficacy. EduSense also serves as a springboard for more advanced features, for example, models of student engagement and attention.
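The two speech-timing cues above reduce to simple logic over the instructor-speech feature stream. The sketch below is a hypothetical illustration under assumed parameters (the function name, sample format, and monologue threshold are ours; only the three-second wait time comes from the text):

```python
# Illustrative sketch: scan (timestamp_seconds, instructor_speaking)
# samples and emit two kinds of cues. "pause" fires once when continuous
# instructor speech exceeds max_monologue_s (threshold is an assumed
# example value); "short_wait" fires when speech resumes after a silence
# shorter than the recommended three-second wait time.

def speech_cues(samples, max_monologue_s=120.0, min_wait_s=3.0):
    cues = []
    speech_start = None    # when the current speech run began
    silence_start = None   # when the current silence began
    pause_flagged = False  # "pause" cue already emitted for this run
    for t, speaking in samples:
        if speaking:
            if speech_start is None:
                # Speech resumes: was the preceding wait long enough?
                if silence_start is not None and t - silence_start < min_wait_s:
                    cues.append((t, "short_wait"))
                speech_start = t
                silence_start = None
                pause_flagged = False
            elif not pause_flagged and t - speech_start > max_monologue_s:
                cues.append((t, "pause"))
                pause_flagged = True
        else:
            if silence_start is None:
                silence_start = t
                speech_start = None
    return cues
```

A cue list like this could drive the pop-up timers on the instructor tablet, or be aggregated into the after-class reports.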
Prior work has shown that nonverbal communication on the teacher's side (such as gaze direction, gesticulation, smiling, and moving around the classroom, all features EduSense tracks) can boost student attention and engagement, and has been shown to improve learning outcomes.