iCODE: Adaptive Training of Students’ Code Comprehension Processes

Developing an innovative educational technology to improve code comprehension for CS majors, non-CS majors, and underrepresented subgroups in CS (e.g., women, racial minorities, and first-generation college students)

The iCODE project is funded by the Institute of Education Sciences (U.S. Department of Education), Award #R305A220385.

iCODE's mission is to integrate reading strategies training, animated pedagogical agents, inclusive and culturally responsive instructional design, and the open pro-social learner model to improve code comprehension, learning, student engagement, self-efficacy, computer science (CS) identity, and retention in CS programs.


The three-year project supports CS majors, non-CS majors, and students from underrepresented groups (women, students of color, and first-generation college students) as they engage in code comprehension activities.

The project combines design-based research with randomized controlled trials to pilot the iCODE system and assess its promise for improving student outcomes. Through an iterative process informed by students and faculty from universities and community colleges, the researchers will build on the DeepTutor platform, developed under a previously funded IES grant, and refine it for use in courses that teach Java or Python. In the pilot study, they will compare outcomes for students in courses that use iCODE in their lab sessions with outcomes for students in business-as-usual courses or in courses that use another automated tutor that does not focus on reading comprehension.


Code comprehension is a critical skill for both learners and professionals. By adapting to individual learner characteristics (prior knowledge, self-efficacy, engagement, and socio-cultural factors) and to code characteristics (language, cohesion, and readability), iCODE aims to benefit both CS and non-CS majors, including groups underrepresented in CS and higher education, such as women, students of color, and first-generation college students.

The proposed iCODE system can serve as a supplementary instructional tool in formal or informal, traditional or online education settings, or as a standalone resource for any learner who wants to improve their programming skills. Now more than ever, “high-dosage tutoring” tools such as iCODE are needed, given the effects of pandemic lockdowns on students.


The research team aims to develop and pilot-test an educational technology called iCODE (improve source CODE comprehension). The work takes place at universities and community colleges in urban Tennessee and Minnesota.

For the development, usability, and feasibility work, the research team will collect data from approximately 400 students and at least eight instructors across the participating institutions, primarily at the Tennessee and Minnesota universities. For the pilot study, they will recruit 300 college students from the two universities and from community colleges in Tennessee and Minnesota. Students at the universities will come from intro-to-programming and introductory psychology courses, and students at the community colleges will come from a wide range of backgrounds, including CS and non-CS majors.

iCODE will be a web-based supplemental tool, similar to those commonly used in the lab component of computer science courses, that adaptively monitors, tracks, models, and scaffolds students' source code comprehension processes as they engage in various code comprehension tasks. It will focus on two programming languages, Java and Python. iCODE draws on and integrates theories of reading comprehension and source code comprehension, motivation theory, and frameworks of self-regulated learning with open prosocial learner models (OPLMs) and animated pedagogical agents (APAs), and it adopts culturally responsive teaching models. Together, these address a fundamental difficulty for students in intro-to-programming classes: constructing accurate mental models during source code comprehension. The web-based environment will also include motivational elements, such as mastery framing and self-assessment through an aspirational peer in the OPLM, to increase students' self-regulated, mastery-oriented engagement with iCODE and the assigned instructional tasks.

Furthermore, to help students navigate the social factors of learning computing and the internal factors that affect their computing identity and self-efficacy, iCODE will implement practices that support diversity, equity, inclusion, and fairness. The researchers will design the assignments and the APAs to follow the culturally responsive instruction model. They will also offer professional development workshops on inclusive learning practices for all participating instructors and their support staff, and will assess the fidelity of that training.

The research will occur in three major phases: prototyping, formative evaluation and refinement, and summative evaluation. Iterative development takes place over the first two phases. During these phases, the researchers will engage stakeholders, such as university instructors and students, through interviews, use cases, and walkthroughs, drawing on their knowledge and experience to develop and refine iCODE. This work will also address usability, feasibility, and cost issues to guide revisions. Once iCODE is complete, the researchers will conduct a pilot study (the summative evaluation phase), collecting data from students in introduction-to-programming courses across multiple sites. They will compare outcomes for CS and non-CS majors and examine underrepresented subgroups in CS (e.g., women and racial minorities).

During the pilot study, the researchers will compare iCODE with two control conditions: standard instruction (a business-as-usual control) and another intelligent tutoring system, DeepTutor, which does not include iCODE's core reading strategies but still provides a one-to-one tutoring context.

During the development phase, the research team will use data from interviews, think-aloud protocols, eye tracking, and log files. During the pilot study, they will assess learning using the Foundational CS1 test (or a similar instrument), course completion, course grades, and source code comprehension measures. They will evaluate motivation and beliefs using the Computer Science Cultural Attitude and Identity Survey, the Computer Programming Self-Efficacy Scale, and the Achievement-Goal Orientation Questionnaire-Revised. They will measure engagement with log-file indicators such as time on task, the amount of self-explanation text students produce, and session time. To estimate retention, they will administer the Intentions for CS survey at the end of the semester, asking majors whether they are likely to continue their CS degree and non-CS majors whether they are likely to study further and use programming in their work; they will also track course registration for up to two semesters.

To evaluate the accuracy of the various iCODE components, the researchers will compare the output of the automated methods with expert observations and judgments. For the summative pilot study, the inferential analyses will focus primarily on testing the effectiveness of iCODE relative to the two control conditions. The statistical analyses will comprise descriptive statistics and visualizations, followed by inferential models fitted to each dependent variable separately. Specifically, the researchers will fit a multiple regression model to each dependent variable that includes pretest variables as covariates and two interaction terms testing whether pretest status moderates the effect of iCODE. The models will also include variables for CS major, gender, race/ethnicity, and first-generation status.
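One way to write such a model for a single dependent variable is sketched below. This is an illustrative specification under stated assumptions, not the project's published model: iCODE is taken as the reference condition, BAU_i and DT_i are hypothetical 0/1 indicators for the business-as-usual and DeepTutor conditions, Pre_i is a mean-centered pretest score, and z_i is a hypothetical vector of indicators for CS major, gender, race/ethnicity, and first-generation status.

```latex
\begin{aligned}
y_i = {}& \beta_0 + \beta_1\,\mathrm{BAU}_i + \beta_2\,\mathrm{DT}_i + \beta_3\,\mathrm{Pre}_i \\
        & + \beta_4\,(\mathrm{BAU}_i \times \mathrm{Pre}_i) + \beta_5\,(\mathrm{DT}_i \times \mathrm{Pre}_i) \\
        & + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \varepsilon_i
\end{aligned}
```

Under this coding, β1 and β2 estimate how each control condition differs from iCODE at the average pretest score, while β4 and β5 are the two interaction terms: a nonzero value indicates that the difference between iCODE and that control depends on pretest status.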

The researchers will conduct a cost analysis using the ingredients method, deriving the average per-student cost of delivering each condition by dividing its estimated total cost by the number of students served.
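The unit-cost calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming each condition's ingredients are listed with a quantity and a unit price; the `Ingredient` class, the function names, and all numbers are hypothetical placeholders, not project data.

```python
from dataclasses import dataclass


@dataclass
class Ingredient:
    """One resource used to deliver a condition (hypothetical example)."""
    name: str
    quantity: float    # units consumed, e.g., hours or months
    unit_price: float  # price per unit in dollars


def total_cost(ingredients: list[Ingredient]) -> float:
    """Ingredients method: sum quantity x unit price over every resource."""
    return sum(i.quantity * i.unit_price for i in ingredients)


def cost_per_student(ingredients: list[Ingredient], students_served: int) -> float:
    """Average unit cost: the condition's total cost divided by students served."""
    if students_served <= 0:
        raise ValueError("students_served must be positive")
    return total_cost(ingredients) / students_served


# Illustrative placeholder values only; these are not project estimates.
icode = [
    Ingredient("instructor training hours", 10, 50.0),
    Ingredient("server hosting months", 4, 25.0),
]
print(cost_per_student(icode, 100))  # (10*50 + 4*25) / 100 = 6.0
```

Computing the same figure for iCODE, DeepTutor, and business-as-usual would allow the per-student costs of the three conditions to be compared directly.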


DeepTutor is an advanced intelligent tutoring system that fosters students' deep understanding of complex science topics through high-quality interaction and instruction. More than 300 students currently use it as a supplement to traditional classroom instruction.


The CSEdPad system improves students' mental model construction, learning, engagement, and retention in CS education. In particular, the system targets source code comprehension, a critical skill for both learners and professionals. It monitors and scaffolds source code comprehension processes while students engage in a variety of code comprehension tasks. Key approaches being explored include Animated Pedagogical Agents, self-explanation, and the Open Social Learner Model.