We introduce a Chinese vision-language pretrained model for cross-modal understanding, visual embedding, and multimodal retrieval. The model has 0.6B parameters and is trained on a dataset of 0.37B image-text pairs.
The corresponding technology is applied across nearly all businesses on the content platform, including video audit, video tagging and standardization, video retrieval, and video fingerprinting.
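Vision-language pretraining of this kind is typically driven by a symmetric contrastive (InfoNCE) objective over paired image and text embeddings. The sketch below is illustrative only, assuming a CLIP-style setup; the function name and temperature value are our own, not details of the actual model:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (B, B) similarity matrix
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal
    def xent(l):
        # Cross-entropy against the diagonal, with max-subtraction for stability.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    # Average the image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2
```

When every image embedding equals its paired text embedding, the loss approaches zero; for random embeddings it stays strictly positive, which is the signal the encoders are trained to reduce.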
We introduce a Shrinking Temporal Attention Transformer, which achieves state-of-the-art results on multiple action recognition benchmarks, including Kinetics-400 and Something-Something V2, outperforming prior methods with 50% fewer FLOPs and without any pretrained model.
The corresponding technology is applied in WeSee, QQ Browser, and WeChat Channels, and won two second places in the TAAC competition.
We have migrated the CLIP model to video and ranked first in video-to-text and text-to-video cross-modal retrieval on multiple datasets, including MSR-VTT, MSVD, and VATEX.
The corresponding technology is applied in video material retrieval and intelligent editing of Tencent Zenvideo and Tencent Video.
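The simplest way to carry a CLIP image encoder over to video, and a common baseline for this transfer, is to mean-pool the per-frame embeddings into one video embedding and rank videos by cosine similarity to the text embedding. This is a minimal sketch under that assumption; the function names are illustrative and the production system may aggregate frames differently:

```python
import numpy as np

def video_embedding(frame_embs):
    """Aggregate per-frame CLIP image embeddings (T, D) into one video
    embedding by mean pooling, then re-normalize to unit length."""
    v = frame_embs.mean(axis=0)
    return v / np.linalg.norm(v)

def rank_videos(text_emb, video_embs):
    """Return video indices sorted by descending cosine similarity
    to the text query embedding."""
    t = text_emb / np.linalg.norm(text_emb)
    sims = video_embs @ t        # rows of video_embs are unit vectors
    return np.argsort(-sims)
```

Text-to-video retrieval then reduces to a single matrix-vector product over precomputed video embeddings, which is what makes this approach practical for large material libraries.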
We propose a set of innovative designs to tackle the problem of practical stereo matching. Our results not only rank 1st on both the Middlebury and ETH3D benchmarks, outperforming existing state-of-the-art methods by a notable margin, but also exhibit high-quality details on real-life photos.
The corresponding technology is applied in stereo bokeh in Megvii.
We propose a novel Patchmatch-based framework for high-resolution optical flow estimation. Our method shows strong cross-dataset generalization, achieving the best published result on KITTI 2015. It also preserves fine details on the high-resolution DAVIS dataset while consuming 2× less memory than RAFT.
The corresponding technology is applied in low-light shooting and multi-camera smoothing in Megvii.
We propose a Pyramid Attention Network (PAN) that exploits global contextual information for semantic segmentation. It achieves state-of-the-art performance on the PASCAL VOC 2012 and Cityscapes benchmarks.
The corresponding technology is applied in video segmentation and video bokeh in Megvii.
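The core idea behind this family of designs is to let a globally pooled high-level feature gate the low-level feature before fusion, so fine-grained predictions are conditioned on global context. The sketch below is a heavily simplified, assumed version of such a global-attention fusion step (real PAN modules also involve convolutions and upsampling, which are omitted here):

```python
import numpy as np

def global_attention_fuse(low_feat, high_feat):
    """Gate a low-level feature map by global context from a high-level map.
    low_feat, high_feat: (C, H, W) arrays of the same shape."""
    # Global average pooling summarizes each channel into one context value.
    gap = high_feat.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1)
    # A sigmoid turns the context into per-channel attention weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-gap))
    # Re-weight the low-level detail, then add the high-level semantics back.
    return low_feat * weights + high_feat
```

The point of the design is that per-pixel detail features are no longer fused blindly: channels that the global context deems irrelevant are suppressed before they reach the decoder.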
We have designed a real-time HDR algorithm and a RAW-based denoising algorithm, achieving 9× fewer FLOPs, 4× fewer parameters, and 3× faster inference than existing methods while providing comparable accuracy.
The corresponding technology is applied in low-light shooting and multi-camera smoothing in Megvii.
We are the first to introduce a deep-learning-based approach to remove distortion artifacts from freely shot photos. Our approach significantly outperforms the previous state-of-the-art approach both qualitatively and quantitatively.
The corresponding technology is applied in portrait correction in Megvii.
We propose a content-aware layout generation network that takes glyph images and their corresponding text as input and automatically synthesizes aesthetic layouts for them.
The corresponding technology is applied in poster generation and logo generation in Tencent Video.
We adopt a part heatmap regression network that predicts landmarks at a local granularity by generating a heatmap for each 3D landmark point; this approach won first place in 300VW.
The corresponding technology is applied in 3D avatar in Megvii.
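In heatmap-based landmark regression, each predicted heatmap still has to be decoded back into a coordinate. A common decoding choice (shown here as an illustrative sketch, not necessarily the decoder used above) is the soft-argmax: a softmax over the map followed by a probability-weighted average of pixel coordinates, which yields sub-pixel precision and stays differentiable:

```python
import numpy as np

def decode_heatmap(heatmap):
    """Decode one landmark's (x, y) position from its heatmap via soft-argmax."""
    h, w = heatmap.shape
    # Softmax over all pixels (max-subtraction for numerical stability).
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    # Coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:h, 0:w]
    # Expected coordinate under the softmax distribution.
    return float((p * xs).sum()), float((p * ys).sum())
```

Running one such decode per heatmap turns the network's stack of per-landmark maps into the final set of landmark coordinates.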