I build multimodal and vision-language machine learning systems focused on robust perception beyond closed label sets and on practical generalization.
My work spans open-world vision-language perception, multimodal indoor scene understanding that aligns imagery with geometry, layout, and language, and efficient adaptation under distribution shift with limited data. I prioritize rigorous evaluation and evidence-based results to deliver reliable, production-ready perception systems.
Multimodal and Vision-Language Modeling: Learning aligned representations across vision and language for open-world perception.
Open-World Perception and Robust Generalization: Building models that perform reliably beyond closed label sets and under distribution shift.
Multimodal Indoor Scene Understanding: Connecting imagery with geometry, layout, and language for structured scene reasoning.
Data-Efficient Adaptation: Improving generalization with limited data through adaptation and efficient training strategies.
Problem: Video LLMs are powerful but hard to deploy in recommendation systems because of multi-video inputs and serving latency.
Contribution: Co-first author. Proposed grounded knowledge-aware tokens derived from raw frames and a cross-layer knowledge-fusion mixture-of-experts (MoE) for low-latency ranking.
Why it matters: A practical path to leverage Video LLM world knowledge without relying on language-only outputs.
Links: Paper
Problem: Inferring structured home attributes from heterogeneous signals (floor plans, images, and language).
Contribution: Built multimodal perception systems for floor-plan structural reasoning, connectivity/visibility cues, and floor-plan–image matching with explainable comparisons.
Evidence: Granted patents and publications from mentored intern projects in multimodal indoor scene understanding.
Links: Paper | Patent | Project
LinkedIn, Sunnyvale, CA | 08/2024 - Present | Website
Senior Machine Learning Engineer, Notification AI
Work on multimodal video recommendation models for user-facing products under production constraints.
Build and maintain end-to-end ML pipelines (data, training, offline evaluation, and online validation) with an emphasis on reliability.
Collaborate cross-functionally to iterate on model quality, robustness, and deployment readiness.
Zillow, Seattle, WA | 06/2021 - 08/2024 | Website
Senior Applied Scientist / Applied Scientist, AI Media Insights
Developed multimodal and vision-language methods for open-world perception and robust generalization in real-world applications.
Built multimodal systems for indoor scene understanding that align imagery with geometry/layout and language for structured reasoning.
Led data and tooling efforts to enable scalable dataset creation and model iteration, with a focus on quality, efficiency, and reproducibility.
Mentored research interns and drove projects from problem framing to evaluation and delivery, resulting in publications and patents.
Samsung Research America, Mountain View, CA | 06/2020 - 09/2020 | Website
Research Intern, Artificial Intelligence
NEC Laboratories America, Princeton, NJ | 06/2019 - 12/2019 | Website
Research Intern, Machine Learning
Zebra Technologies, Lincolnshire, IL | 06/2017 - 08/2017; 06/2018 - 08/2018 | Website
Research Intern, Computer Vision Algorithm
Selected Publications & Patents
Selected Publications:
TFM2, WACV'25 | Training-free open-vocabulary segmentation for open-world perception | Paper
ZInD-Tell, CVPRW'24 | Multimodal indoor panoramas to structured language descriptions | Paper
MOCO, TKDD'23 | Generative modeling of label dependencies for multi-label prediction | Paper
SentRL, ICDM'20 | Reinforcement learning for aspect-level sentiment modeling | Paper
GM-VAR, ICCV'19 (Oral) | Multi-view generative video action understanding | Paper
Selected Patents:
FloorPlan Understanding, US Pending | Multimodal floor-plan structural reasoning for attribute inference. | PDF
3D-Depth AutoConfig, US Granted | 3D sensing and perception for automated container measurement and configuration. | PDF
Particle Detection, CN Granted | Automated vision inspection for foreign particle detection in infusion bottles. | PDF (CN)
Selected Services & Awards
Service:
Conference Reviewer: CVPR, ICCV, ECCV; NeurIPS, ICLR, ICML; AAAI, IJCAI; and others (multiple years).
Journal Reviewer: TPAMI, TIP, TNNLS, TKDE, TKDD; and others (multiple years).
Awards:
AAAI Student Travel Award (2017, 2020)
Microsoft Imagine Cup, Shaanxi, China (3rd Prize, 2015; top 2%)
National Scholarship, China (2011; top 3%)
See full list: Services & Awards
Ph.D., Electrical & Computer Engineering.
Thesis: Correlation Discovery for Multi-View and Multi-Label Learning | Thesis PDF
M.S., Electronic & Information Engineering.
Thesis: Vision-based PCB Defect Detection Algorithms | Thesis PDF (CN)
B.Eng., Electrical Engineering.
Thesis: Vision-based Infusion Bottle Foreign Matter Inspection | Thesis PDF (CN)
Email: wanglichenxj [at] gmail [dot] com
Links: CV (PDF) | LinkedIn | Google Scholar