Interests: Computer Vision, Natural Language Processing, Automatic Speech Recognition
Google Experience: Google Lens on-device and server-side computer vision, end-to-end features, and ranking models. Cloud AI multimodal LLM quality.
Currently working on Gemini multimodal multilingual evaluation and post-training.
Prior to joining Google, I was a research assistant in the VLSI Research Group, working on using computer vision to solve neuroscience problems with Professor Mark Horowitz. I worked closely with the Luo Lab and was co-advised by Professor Liqun Luo.
Ignite - Certificate Program in Innovation and Entrepreneurship, Stanford Graduate School of Business
Ph.D. - Electrical Engineering, Stanford University
M.S. - Electrical Engineering, Stanford University
B.S. - Electrical Engineering, Mathematics Minor, Summa Cum Laude, High Distinction, University of Minnesota, Twin Cities
Member: Association for Computing Machinery, Society for Neuroscience
Reviewer: CVPR Computer Vision for AR/VR Workshop, MICCAI, ICIAP, Artificial Intelligence Review
Program committee: AAAI, Med-NeurIPS
In charge of multimodal multilingual quality.
[manuscript in preparation]
Led a team of four female engineers to win the AGI House Gemini 1.0 Hackathon, where Sergey Brin gave a talk.
Ran a translation channel that added subtitles in multiple languages to K-pop idols' live streams and shows. Created a website for fans to view selected YouTube videos with the subtitles. The service is currently on hiatus.
Worked on general trust and safety evaluation and filtering for the Gemini 1.0 launch on Vertex. Assembled a dataset for multimodal/multilingual self-identification evaluation for the Gemini 1.5 launch.
Form recognizer for documents with fixed layouts [Patent filed, awaiting USPTO review]
Initiated, designed, and implemented a feature that uses historical user click data combined with image similarity to rank Google Lens image search results more effectively, and generated preliminary results. Ran as a live experiment in Q1 2022.
Tech lead of the "Top match" (now "high confidence clusters") feature of Google Lens. The feature differentiates and highlights high confidence image results from all retrieved similar images results for user image queries. Directly and primarily responsible for developing the new end-to-end feature. Created and drove the feature road map. Designed and implemented the algorithm that determines high confidence image answer for image queries. Collaborated with 4 different Google teams and across 2 time zones to deliver the feature from scratch in < 3 months. Resulted in 3pp E2E quality improvement and powers about 350M queries per month.
Tech lead of the image similarity ranking model at Google Lens. Responsible for bringing together the visual intelligence of multiple server-side vision models to rank image results from all Lens verticals effectively. Aligned requirements from 4 Lens teams, unified the image similarity definition across all Lens verticals, and collected ground-truth data for model training. Led engineers from 3 Google teams on training and deploying a unified image similarity scoring model in the Lens backend with neutral latency, better result quality, and a new SW+AI architecture. This model scores and ranks retrieved images for all Lens traffic.
Winner of a Silver Perfy Award at Google for capacity management.
Invented and implemented an end-to-end computer vision cascade that enables trust-and-safety-compliant results for queries containing people or faces, by showing results only from safe, non-people-sensitive regions of the query image.
Point of contact for on-device visual intelligence for Lens on Photos. Responsible for bringing the visual intelligence of server-side vision models to mobile despite stringent compute and power constraints. Built and improved the core on-device computer vision cascade behind a feature's suggested actions: privacy-preserving, compact, and built entirely from distilled on-device models. Collaborated across 3 different organizations to bring the feature from scratch to live experiment in < 4 months. Conceived and implemented a new SW+AI architecture to enable end-to-end user privacy without sacrificing quality and scale.
Histological brain slices are widely used in neuroscience to study the anatomical organization of neural circuits. Systematic and accurate comparisons of anatomical data from multiple brains, especially from different studies, can benefit tremendously from registering histological slices onto a common reference atlas. Most existing methods rely on an initial reconstruction of the volume before registering it to a reference atlas. Because these slices are prone to distortions during the sectioning process and often sectioned with non-standard angles, reconstruction is challenging and often inaccurate. Here we describe a framework that maps each slice to its corresponding plane in the Allen Mouse Brain Atlas (2015) to build a plane-wise mapping and then perform 2D nonrigid registration to build a pixel-wise mapping. We use the L2 norm of the histogram of oriented gradients of two patches as the similarity metric for both steps, and a Markov random field formulation that incorporates tissue coherency to compute the nonrigid registration. To fix significantly distorted regions that are misshaped or much smaller than the control grids, we train a context-aggregation network to segment and warp them to their corresponding regions with thin plate spline. We have shown that our method generates results comparable to an expert neuroscientist and is significantly better than reconstruction-first approaches.
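For illustration, a minimal sketch of the patch-similarity metric used in both registration steps above: the L2 norm of the difference between the histograms of oriented gradients (HOG) of two patches. It uses scikit-image's hog; the function name and HOG parameters are illustrative choices, not the settings from the paper.

```python
# L2 distance between HOG descriptors of two grayscale patches,
# the similarity metric described in the abstract above.
import numpy as np
from skimage.feature import hog

def hog_distance(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Lower is more similar; patches must have identical shapes."""
    assert patch_a.shape == patch_b.shape
    # HOG parameters here are common defaults, not the paper's settings.
    params = dict(orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return float(np.linalg.norm(hog(patch_a, **params) - hog(patch_b, **params)))
```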
Research Projects
The dorsal raphe (DR) constitutes a major serotonergic input to the forebrain and modulates diverse functions and brain states, including mood, anxiety, and sensory and motor functions. Most functional studies to date have treated DR serotonin neurons as a single population. Using viral-genetic methods, we found that subcortical- and cortical-projecting serotonin neurons have distinct cell-body distributions within the DR and differentially co-express a vesicular glutamate transporter. Further, amygdala- and frontal-cortex-projecting DR serotonin neurons have largely complementary whole-brain collateralization patterns, receive biased inputs from presynaptic partners, and exhibit opposite responses to aversive stimuli. Gain- and loss-of-function experiments suggest that amygdala-projecting DR serotonin neurons promote anxiety-like behavior, whereas frontal-cortex-projecting neurons promote active coping in the face of challenge. These results provide compelling evidence that the DR serotonin system contains parallel sub-systems that differ in input and output connectivity, physiological response properties, and behavioral functions.
[paper]
We adopted the fully convolutional network architecture for this segmentation problem. We trained a model to segment an experimental histological image into the main brain regions (grey matter: cerebrum, brainstem, and cerebellum; fiber tracts; and ventricular systems) plus background, achieving 96.1% accuracy on test reference slices and 92.1% accuracy on test experimental datasets. The network is trained mainly on reference images because of the limited availability of segmented experimental data.
[project report]
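As a sketch of the architecture family (not the trained model itself), here is a toy fully convolutional network in PyTorch that maps a one-channel histological image to per-pixel logits over the four classes above (grey matter, fiber tracts, ventricular systems, background); the layer sizes are illustrative.

```python
# Toy fully convolutional segmentation network: every layer is convolutional,
# so per-pixel prediction works at any input size divisible by 4.
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 1/2 resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 1/4 resolution
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)  # 1x1 conv -> class logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.encoder(x))  # (N, num_classes, H/4, W/4)
        # Upsample back to input resolution for per-pixel prediction.
        return nn.functional.interpolate(
            logits, size=x.shape[-2:], mode="bilinear", align_corners=False
        )
```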
The Android app we developed takes video frames of a human face from the camera as input and outputs a fusion image of extracted facial features and contours together with a motion distribution map. The motion distribution map is generated from a micro-expression heat map with color added, where the brightness of the color is scaled by the magnitude of motion in each area of the face. The client, an Android device, obtains the initial locations of the eyes and mouth. Covariance-based image registration is used on the server side to generate the motion distribution of facial features. The fusion image generated from this information is then sent back to the client for display. From this fusion image users can read micro changes in facial features and thus interpret human emotions. Since more than just facial key points are extracted, we expect full utilization of our data to yield precise interpretation, provided a robust scoring system for the motions of different facial features and contours.
[report]
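A minimal sketch of the motion-to-brightness rendering described above, where an overlay color's brightness scales with the motion magnitude in each region of the face; the function and its defaults are ours, not from the app.

```python
# Render per-pixel motion magnitudes as a color map whose brightness
# scales with the amount of motion (brighter = more motion).
import numpy as np

def motion_heatmap(motion: np.ndarray, base_color=(255, 0, 0)) -> np.ndarray:
    """motion: (H, W) nonnegative magnitudes -> (H, W, 3) uint8 overlay."""
    scale = motion / (motion.max() + 1e-9)  # normalize to [0, 1]
    return (scale[..., None] * np.asarray(base_color, dtype=float)).astype(np.uint8)
```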
We implemented a music recommender system based on users' listening history and social network. We used collaborative filtering with both user-based and item-based strategies. For user-based collaborative filtering, we measured user similarity with both the binary information and the actual play counts in their listening history. Our methods significantly improved recommendation accuracy. Furthermore, we modified the user-based collaborative filtering algorithm into a method that combines users' listening history and social relationships for music recommendation.
[report]
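A toy version of the user-based strategy on a binary listening-history matrix, using cosine similarity between users; names and parameters are illustrative, not from the report.

```python
# User-based collaborative filtering: find the k users most similar to `user`
# (cosine similarity over binary play vectors), then recommend the songs they
# played, weighted by similarity, excluding songs `user` has already heard.
import numpy as np

def recommend(plays: np.ndarray, user: int, k: int = 5, top_n: int = 3) -> np.ndarray:
    """plays: (num_users, num_songs) binary listening-history matrix."""
    norms = np.linalg.norm(plays, axis=1) + 1e-9
    sims = plays @ plays[user] / (norms * norms[user])  # cosine similarity
    sims[user] = -1.0                                   # exclude the user themself
    neighbors = np.argsort(sims)[-k:]                   # k nearest users
    scores = sims[neighbors] @ plays[neighbors]         # similarity-weighted votes
    scores[plays[user] > 0] = -1.0                      # drop already-heard songs
    return np.argsort(scores)[::-1][:top_n]             # song indices to recommend
```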
Mapping histological brain images to the Allen Mouse Brain Atlas (PDF)
Stanford University, PhD thesis
Stanford Imaging Symposium 9/17/2018.
Stanford Center for Image Systems Engineering (SCIEN) Industry Affiliates Meeting 2018.
Biomedical Computation at Stanford Symposium 4/4/2016.
Stanford Bio-X IIP Symposium 2/17/2016.
Center for Biomedical Imaging at Stanford Symposium 4/29/2015.
Invited Talks:
Neuroscience Conference 2018
CSIT 2021
AnalytiX2021
Friends of Music Applied Music Scholarship, Stanford University
Stanford Graduate Fellowship, Stanford University
Albert George Oswald Prize, University of Minnesota
KSP & Kumar Scholarship, University of Minnesota
I love scuba diving. I was involved with the Guzheng Community at Stanford when I wasn't that busy with research, and have performed in the Stanford Chinese New Year Spring Gala at the Memorial Auditorium and Stanford Guzheng Ensemble Concert.
I enjoy watching basketball games and am certified as a secondary basketball referee in China.