Student Project Ideas

Xiaoyi Yang

Khoury College of Computer Sciences

Northeastern University

The following projects are some of my research ideas that may be suitable for undergraduate research. Please contact me if you are interested in any of them.

When we talk about Fashion, what are we talking about?

This idea was motivated by one of the data science projects I supervised in graduate school. It was not well developed at the time, so I am really excited to revisit and redesign it.

Fashion is an abstract and subjective concept that people talk about every day and everywhere. However, catching a fashion trend is hard because trends change all the time. As a data science project, we want to narrow our focus by studying and analyzing the fashion trend in a small area and time period. The project requires some coding ability in web scraping and text analysis, ideally some knowledge of time series, and a passion for fashion and sociology. Here are some potential approaches we may consider:

Use Google Trends or the Twitter API to obtain time series for a list of keywords. Study how the word frequencies change over time and try to cluster the words with a mixture model.
Scrape the titles and comments on TikTok. Analyze the common keywords and how they shift over time. Identify the people and posts with high influence and think about the reasons.
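Both approaches start by turning timestamped text into a frequency series per keyword. A minimal sketch of that step is shown below. It assumes the posts have already been collected as (date, text) pairs; the function name, the sample posts, and the keywords are made up for illustration, and the clustering step is not shown.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_keyword_counts(posts, keywords):
    """Count how many posts mention each keyword in each ISO week.

    `posts` is an iterable of (date, text) pairs and `keywords` is a list
    of lowercase terms. Both are hypothetical inputs for this sketch.
    """
    counts = defaultdict(Counter)
    for day, text in posts:
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        lowered = text.lower()
        for keyword in keywords:
            # Simple substring match; a real project would tokenize the text.
            if keyword in lowered:
                counts[keyword][week] += 1
    return counts


# Made-up sample posts, only to show the shape of the input.
posts = [
    (date(2024, 1, 1), "Oversized blazers are everywhere"),
    (date(2024, 1, 3), "Thrifted an oversized denim jacket"),
    (date(2024, 1, 9), "Denim on denim is back"),
]
series = weekly_keyword_counts(posts, ["oversized", "denim"])
for keyword, by_week in series.items():
    print(keyword, sorted(by_week.items()))
```

Each keyword ends up with one count per week, which is the kind of series that could then be compared across keywords or passed to a clustering method.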

Psychic challenge

Have you ever seen a TV series called Psychic Challenge?

My friend and I came up with this idea while talking about getting a psychic reading on the streets of New York. There is no doubt that a large number of people still believe in psychic power, and sometimes it does comfort and help people in some way. But is it truly reliable?

We have no idea how this project will go, but the general approach is to design a survey to test whether psychics give consistent and reliable suggestions, and whether they can detect our research purpose. Based on their responses to a specific question and how much information they obtain from us and give to us, we will try to rank and cluster the answers to detect similarity and consistency.

This is a really subjective research topic. Students who are interested in psychology are strongly encouraged to join. It also requires strong critical thinking to view things more objectively.
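One simple way to start comparing answers to the same question is a pairwise word-overlap score. The sketch below uses Jaccard similarity on word sets. This is only one possible choice, not a settled method for the project, and the example answers are invented.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of distinct words that two answers have in common (0 to 1)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty answers are treated as identical
    return len(ta & tb) / len(ta | tb)


def pairwise_similarity(answers):
    """Return {(i, j): similarity} for every pair of answers with i < j."""
    return {
        (i, j): jaccard(answers[i], answers[j])
        for i in range(len(answers))
        for j in range(i + 1, len(answers))
    }


# Invented answers to the same question, for illustration only.
answers = [
    "You will travel soon",
    "You will meet someone",
    "A big change is coming",
]
scores = pairwise_similarity(answers)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

Scores like these could feed a ranking or a clustering step, although word overlap ignores meaning and would likely need to be replaced by something richer.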

Named entity recognition in historical documents

This is part of my own research.

When we try to understand relationships between people in historical documents, one approach is to extract personal names from the document and analyze how often two names appear together. However, there is a problem: people like to name their children after themselves. As a result, it becomes harder to tell people apart.
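The co-occurrence counting described above can be sketched in a few lines. The example assumes the names have already been extracted and uses a paragraph as the co-occurrence window; both are assumptions for illustration, and the sample sentences are invented rather than quoted from any book.

```python
from collections import Counter
from itertools import combinations


def name_cooccurrence(paragraphs, names):
    """Count how often each pair of names appears in the same paragraph.

    `names` stands in for the output of a name extraction step, which is
    not shown here.
    """
    counts = Counter()
    for paragraph in paragraphs:
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in names if name in paragraph)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented sentences, for illustration only.
paragraphs = [
    "Aureliano spoke with Ursula before dawn.",
    "Ursula scolded Aureliano and Amaranta.",
    "Aureliano left the house alone.",
]
pairs = name_cooccurrence(paragraphs, ["Aureliano", "Ursula", "Amaranta"])
for pair, count in pairs.most_common():
    print(pair, count)
```

Note that this counter treats every "Aureliano" as the same person, which is exactly the limitation the project is about.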

The goal of this project is to improve current named entity recognition tools so that they can distinguish people who share the same name. We are going to work on fiction that has duplicated names or mimics a true historical document, for example, Gabriel García Márquez's One Hundred Years of Solitude or Leo Tolstoy's War and Peace. Popular fiction like A Song of Ice and Fire can also be considered. You will need to read and be familiar with the text we are going to use before the project starts, and have great patience for dealing with the huge mess in text data. You will also learn how general named entity recognition tools work and think about how to use the information they extract to build a better identifier for classifying names.

Sports strategy analysis

Are you an athlete or someone who is super interested in a sport?

Let's try to learn sports in a statistical way.

Ideally, we want to work with one of the university sports teams to understand how to evaluate players' performance and possibly propose ways to improve. We can also work with historical match data to propose the best game strategy, if applicable.

The student should be interested in and knowledgeable about one of the sports, and we will try to find the connections and data from there. Some potential focuses under this topic are:

Propose a metric to evaluate one player's performance
Propose a metric to evaluate a pair of players' performance
Identify players' strengths and weaknesses
Analyze historical match data to propose the best game strategy

True crime rate analysis

Crime prediction used to be a popular choice for data science projects. A common approach is to take a public police record and fit a model to predict the next month's crime.

However, there is a problem. With a given police record, you are actually predicting the next month's police record, not the true crime. If a crime happens in a place that has not been investigated this month, the place will not be a target next month. As a result, places with more officers assigned tend to have more crimes discovered, and then more officers are sent there the next month. Clearly, this becomes a vicious circle and leads to more conflict between civilians and the police force.

Therefore, here is the question: how do we adjust for the bias in the police record?
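The feedback loop can be illustrated with a toy simulation. Everything in the sketch below is an assumption made for illustration: two districts with identical true crime, a recorded fraction equal to each district's share of officers, and a rule that moves a fixed slice of the force toward the district with the most records each month. It is not a model of any real police department.

```python
def simulate_feedback(true_crimes, officer_share, months, shift=0.05):
    """Return the recorded crimes per district for each simulated month.

    Assumptions (illustrative only): the fraction of crimes recorded in a
    district equals its share of officers, and each month `shift` of the
    total force moves from the district with the fewest records to the
    district with the most.
    """
    shares = list(officer_share)
    history = []
    for _ in range(months):
        recorded = [crimes * share for crimes, share in zip(true_crimes, shares)]
        history.append(recorded)
        hi = recorded.index(max(recorded))
        lo = recorded.index(min(recorded))
        if hi != lo:
            moved = min(shift, shares[lo])  # cannot move more than is left
            shares[hi] += moved
            shares[lo] -= moved
    return history


# Both districts have the same true crime; only the starting allocation differs.
history = simulate_feedback(true_crimes=[100, 100], officer_share=[0.6, 0.4], months=4)
for month, recorded in enumerate(history, start=1):
    print(month, [round(value, 1) for value in recorded])
```

Under these assumptions the recorded gap between the two districts widens every month even though the true crime never differs, so a model trained on the record would learn the allocation rather than the crime.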

Psychological Social Network

Have you ever thought about why you like or dislike some people in your daily life? We are going to study this from an interesting perspective. Try to build a social network around you, including your family, friends, classmates, etc. For each person in the social network, we are going to ask a couple of psychology questions to determine their personality and then study how the structure of the social network is associated with personality.
