"AI can never truly be safe unless we understand it, and as it is becoming more capable, this is becoming more and more important."
I went to UC Davis, where I triple-majored in statistics, linguistics, and Russian and minored in computer science. Right after graduating in 2024, I started my master's in Natural Language Processing at UCSC.
NLP was just starting to take off, and although I had done some work in it in undergrad, I really wanted to learn as much about it as I could!
My main research focus is mechanistic interpretability and AI safety. This essentially means trying to take apart the black box of LLMs to uncover how they actually work, and using that to explore ways to make them safer. In the past, I have also done some work on climate resilience, which involved creating games aimed at teaching people skills for dealing with natural disasters, and then analyzing how people play these games to uncover how the learning and resilience-building process unfolds.
"I am really passionate about making AI safe and understanding more about how it works internally."
I have always been extremely fascinated by mechanistic interpretability, as I feel AI has the potential to teach us a lot about our own consciousness and intelligence in general, and there is so much we don't know about it. I feel like taking apart the black box is the only way to get these insights. I think it is a very human-centered goal too. AI can never truly be safe unless we understand it, and as it is becoming more capable, this is becoming more and more important.
Regarding the climate resilience work, it is something I am really passionate about personally, and I love that there is a community of researchers trying to help with this. It's been a very inspiring project to be a part of and I have learned so much from it! I did some similar work focused on resilience in asylum seekers through my linguistics major in undergrad, and it's always been a topic I have been very interested in and cared a lot about.
Definitely working with fire-affected communities on the climate resilience work! It is amazing to see the strength and grit these communities have. We often visit community events to gather data, and I am always so touched by how the people in these towns come together to support each other, and how supportive they are of our work too. It's really a reminder of how strong communities can be when they stand together.
"AI has the potential to teach us a lot about our own consciousness and intelligence ..."
I would like to get my PhD and continue doing research related to mechanistic interpretability and other areas of NLP. I am really passionate about making AI safe and understanding more about how it works internally.
If there is a project in the lab that is of interest to you, then I think it is a really great environment and opportunity to get involved in research. The lab has so many smart and talented people with diverse skill sets and I think it's really great to learn from them.
I really like papers where you can just tell the researchers are really interested in getting to the true nature of how things work, rather than just chasing some benchmark or getting published. I remember I read the paper "Refusal in Language Models Is Mediated by a Single Direction" a few years ago and just reread it recently, and it's a really inspiring paper for staying motivated by curiosity itself. The paper is so creative and has such compelling findings, but what I find most telling is the appendix. In addition to explaining the main text in more detail, it's also filled with lots of interesting observations, examples, and smaller experiments that don't quite fit in with the main paper, yet the authors still ran them and wrote them up anyway. Appendices rarely get read by reviewers, so there was no strategic reason to add all that. It just reflects how much they actually care about the work and want to share their findings with the community, which is really refreshing. This certainly isn't the only paper that does this, so I'm probably a bit biased toward it since I reread it so recently. But it's a very interesting paper nonetheless.