Xiaoyi Yang
Khoury College of Computer Sciences
Northeastern University
Current teaching (Fall 2023)
DS4400 Machine Learning and Data Mining 1
DS5220 Supervised Machine Learning and Learning Theory
Past teaching
MTH 361 Probability and Statistics in the Health Sciences (Fall 2021, Spring 2022, Fall 2022, Spring 2023)
MTH 365 Introduction to Data Science (Fall 2021, Fall 2022, Spring 2023)
MTH 362 Statistical Modeling (Spring 2022, Spring 2023)
MTH 366 Machine Learning (Spring 2022)
MTH 205B Mathematics for the Modern World (Fall 2022)
Pedagogical project: The Effect of Repetition Learning in Data Analysis Projects
In higher-level statistics and data science courses, the data science project is an essential component for testing whether students understand the models and know when and how to apply them to real-life data. As instructors, we want not only to make sure that students learn from their mistakes through the projects but also to see them view problems from different perspectives. We therefore ask: if a student gets a second chance to do the exact same project assignment, can they improve the quality of their work on their own? Between the two submissions, we also add a group reflection activity in which students read their peers’ work and reflect on what their peers have done well and where they themselves need to improve. In this work, we quantitatively measure how this repeated project with a group reflection activity affects students’ learning outcomes in different project settings, through both their project scores and the word use in their final analysis reports.
Pedagogical project: Assessing the resources and requirements of statistics education in forensic science
With the increasing ability to easily collect and analyze data, statistics plays a more critical role in scientific research activities, such as designing experiments, controlling processes, and understanding or validating lab results. As a result, incorporating statistics training into the curriculum is becoming a trend in STEM education across a range of fields. However, assessing the level and focus of statistical skills that each discipline requires is complicated and subjective. Situations vary based on the subject, program level and expectation, and university resources. As part of the Center for Statistics and Applications in Forensic Evidence (CSAFE), we assess the statistics requirements in accredited university programs in forensic science, through reviewing accreditation requirements and analyzing program admission requirements and curricula. We present results for this pilot project characterizing the expectation of the American Academy of Forensic Sciences for statistics skills and their alignment with tasks performed by forensic scientists, statistics teaching resources available to forensics programs, and possible solutions for reducing any identified gaps.
Undergraduate summer projects
To understand which psychological factors may affect a social network. The project consists of survey design and social network analysis. The goal is to understand your own social network and figure out which factors, besides social factors, may determine the intimacy of the relationships around you.
To analyze the information in sports news and identify its main topics and emotional preferences. The project uses text analysis and NLP tools to extract information from sports news and attempts to rank subjects based on media language. Poster presented at the UConn Sports Analytics Symposium (UCSAS) 2022.
To analyze and understand fashion trends through social media and text analysis. The project first chooses a couple of fashion concepts and tracks them through social media, such as Twitter, and then constructs a time series for each concept. Through the project, students learned about different time series trends and how to classify them in order to predict future trends.
Lecturer
Summer 2020: 36-225, Introduction to Probability Theory
Online course with over 140 students, one of the co-instructors
Data Science Initiative (DSI) Fellow:
Spring 2019: Advising 2 students with a client from the Department of English to understand the differences between Shakespeare-related biographies and other biographies in the ODNB, mainly in the use of words and phrases.
Fall 2019: Advising a group of 4 students with clients from Ikos to understand the relation between housing properties, rent value and rent time.
Spring 2020: Advising a group of 5 students with clients from Chain of Demand to analyze and understand fashion trends through social media and text analysis.
CMU Summer Research Program:
Summer 2019 Carnegie Mellon Sports Analytics Camp: Advising a group of 4 students with a client from the Department of Computer Science at the University of Pittsburgh to analyze and modify the penalty area in soccer games to promote fairness.
Curriculum Design
Incorporating e-learning into statistics-related courses to provide in-lecture interaction and data accessibility with ISLE, a browser-based interactive statistics & data analysis platform.
UCI, CRM/LAW C132 Forensic Science, Law, and Society: help to create interactive statistics modules for the course
CMU, ENG 76107 - Writing about Data: help to assess whether the statistics components, such as data structure, are reasonably presented
Teaching Assistant:
Fall 2016: 36-217, Probability Theory and Random Processes
Spring 2017: 36-217, Probability Theory and Random Processes
Summer 2017: 36-201, Statistical Reasoning and Practice
Fall 2017: 36-461/661, Special Topics: Epidemiology
Fall 2018: 36-401, Modern Regression (Head TA)
Spring 2019: 36-303, Sampling, Survey and Society (Head TA)
Fall 2019: 36-311, Statistical Analysis of Networks (Head TA)
Student Service Activities:
Clare Boothe Luce Scholarship committee: The Clare Boothe Luce scholarships are meant to support women in STEM fields. We help to decide the recipients.
Active Member and Officer in Women in Statistics, a group within the Department of Statistics & Data Science that supports networking and career opportunities for female graduate students and promotes community outreach.
Women in Data Science @ Pittsburgh: The Women in Data Science (WiDS) initiative aims to inspire and educate data scientists worldwide, regardless of gender, and to support women in the field. Women in Data Science @ Pittsburgh is one of its regional events and was held for the first time in 2018 at CMU. As a volunteer in 2018 and 2019, I helped prepare conference materials and handle registration. As a member of the executive program committee in 2020, I was involved in inviting speakers, recruiting volunteers, and designing materials, and I served as a panelist on the graduate student panel.
Applying to Grad School Panel: organized a discussion panel to help undergraduates learn the general process and requirements for applying to graduate school. We invited newly admitted Ph.D. and Master's students to share their advice and experience of applying to graduate schools in STEM.
Match pairs: one-on-one mentoring that pairs Ph.D./Master's students with undergraduates to help undergraduates better understand and develop their potential career paths.