Homepage

Chengxi Ye (叶承羲)

PhD Candidate, University of Maryland, College Park

Email:  yechengxi@gmail.com

Research interests:  Deep Learning, Algorithmic Trading, Computer Vision

LinkedIn profile

Google Scholar profile


I am a PhD student in the Department of Computer Science at the University of Maryland. I am developing deep learning algorithms that help us understand the black box of deep neural networks. My recent work can be found here: https://github.com/yechengxi/LightNet.

I worked in bioinformatics for several years and developed algorithms that have proved helpful to that community. Compared with peer methods, my approaches reduced computational requirements by orders of magnitude. Specifically, I (co-)developed:

1) A memory-efficient genome assembly algorithm for the prevalent second-generation sequencing technologies. This work (project name: SparseAssembler) drastically reduced the computational cost of this fundamental task and has been widely adopted in industry.

2) A genome assembly algorithm with world-leading computational efficiency for the latest, third-generation sequencing technologies. The first assembly of a human genome from third-generation sequencing data was reported to have taken 405,000 CPU hours. My work (project name: DBG2OLC) reduced this to under 2,000 CPU hours, a more than 200-fold reduction and a dramatic leap at that time (2014). The work received media coverage, including the front page of Yunnan Daily, and the formal version was later published in Scientific Reports.

3) An ultra-efficient base-calling algorithm for the Illumina sequencing platform, based on blind deconvolution. This work (project name: BlindCall) demonstrated for the first time that base-calling in DNA sequencing is a computational problem known as 'blind deconvolution'. If you do not know what 'blind deconvolution' is, please take a look at SmartDeblur (released in 2012) at the bottom of this page :)



EDUCATION

2011.9 - present    Computer Science    University of Maryland, College Park

2007.9 - 2010.3    Computer Science    Zhejiang University 

2003.9 - 2007.7    Mathematics    Sun Yat-sen University


SELECTED PUBLICATIONS

Smith, J. J., Timoshevskaya, N., Ye, C., et al. (2018). The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution. Nature Genetics.

Ye, C., Yang, Y., Fermüller, C., & Aloimonos, Y. (2017). On the Importance of Consistency in Training Deep Neural Networks. arXiv preprint arXiv:1708.00631.

Ye, C., Hill, C. M., Wu, S., Ruan, J., & Ma, Z. S. (2016). DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Scientific Reports, 6, 31900.

Ye, C., Zhao, C., Yang, Y., Fermüller, C., & Aloimonos, Y. (2016, October). LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning. In Proceedings of the 2016 ACM on Multimedia Conference (pp. 1156-1159). ACM.

Ye, C., & Ma, Z. S. (2016). Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads. PeerJ, 4, e2016.

Ye, C., Hsiao, C., & Corrada Bravo, H. (2014). BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution. Bioinformatics, 30(9), 1214-1219.

Ye, C., Ma, Z. S., Cannon, C. H., Pop, M., & Yu, D. W. (2012). Exploiting sparseness in de novo genome assembly. BMC Bioinformatics, 13(Suppl 6), S1.

Ye, C., Tao, D., Song, M., Jacobs, D. W., & Wu, M. (2013). Sparse norm filtering. arXiv preprint arXiv:1305.3971.

Ye, C., Lin, Y., Song, M., Chen, C., & Jacobs, D. W. (2012). Spectral graph cut from a filtering point of view. arXiv preprint arXiv:1205.4450.

 


RESEARCH EXPERIENCE

Deep Learning

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning 

LightNet is a lightweight, versatile and purely Matlab-based deep learning framework. The aim of the design is to provide an easy-to-understand, easy-to-use and efficient computational platform for deep learning research. The framework supports major deep learning architectures such as Multilayer Perceptron Networks (MLP), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). LightNet supports computation on both CPU and GPU, and switching between them is straightforward. Experiments demonstrate applications in computer vision, natural language processing and robotics.


On the Importance of Consistency in Training Deep Neural Networks

We argue that the difficulty of training deep neural networks stems from a syndrome of three consistency issues. This paper describes our efforts to analyze and treat them. The first issue is the inconsistency of training speed across layers. We propose to address it with an intuitive, simple-to-implement, low-footprint second-order method. The second issue is the scale inconsistency between the layer inputs and the layer residuals. We explain how second-order information helps remove this roadblock. The third and most challenging issue is the inconsistency in residual propagation. Based on the fundamental theorem of linear algebra, we provide a mathematical characterization of the well-known vanishing gradient problem. From this characterization, we derive an important design principle for future optimization methods and neural network designs. We conclude the paper with the construction of a novel contractive neural network.
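The linear-algebra fact behind the third issue can be shown with a small numerical experiment. The sketch below is our own simplified illustration and is not the derivation in the paper. It assumes a single linear layer y = W x, where W has shape m x n with m > n. It then splits an output residual into its column-space part and its left-null-space part. By the fundamental theorem of linear algebra, R^m = col(W) ⊕ null(W^T), so back-propagating through W^T discards the null(W^T) component entirely. The function name `split_residual` and the layer sizes are hypothetical.

```python
import numpy as np


def split_residual(W, delta_y):
    """Split an output residual into col(W) and null(W^T) components.

    Illustrative assumption: one linear layer y = W @ x, so backprop
    computes the input residual as delta_x = W.T @ delta_y.
    """
    # Orthogonal projector onto the column space of W.
    P_col = W @ np.linalg.pinv(W)
    delta_col = P_col @ delta_y
    # The remainder lies in the left null space null(W^T).
    delta_null = delta_y - delta_col
    return delta_col, delta_null


rng = np.random.default_rng(0)
m, n = 8, 4                        # hypothetical sizes: layer maps R^4 -> R^8
W = rng.standard_normal((m, n))
delta_y = rng.standard_normal(m)   # residual arriving at the layer output

delta_col, delta_null = split_residual(W, delta_y)
delta_x = W.T @ delta_y            # residual passed back to the layer input

# W^T annihilates the left-null-space part, so only delta_col reaches x.
lost_fraction = np.dot(delta_null, delta_null) / np.dot(delta_y, delta_y)
print(f"fraction of residual energy lost at this layer: {lost_fraction:.2f}")
```

When m > n, at least m - n dimensions of every residual are discarded at such a layer, whatever the loss function is. This is one simple mechanism by which part of the training signal never reaches earlier layers.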


Computational Biology & Bioinformatics

Genome Assembly



Genome assembly is one of the most fundamental tasks in bioinformatics. However, its large memory requirements made it feasible only in supercomputing environments.


In 2010-2011, we developed a new sparse graph structure for genome assembly that uses 1/20 to 1/10 of the memory required by the dense graph structures that had dominated genome assembly for many years. We implemented the idea in SparseAssembler, a fast and ultra-memory-efficient genome assembler. The new assembler can assemble human genomes on a desktop computer rather than on expensive clusters.

By the end of 2012, this work had been adopted by BGI-Shenzhen (the largest genomics center in the world) and was used in the SOAPdenovo2 genome assembler, a key player in the industry.
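Below is a toy Python sketch of the sparse-graph idea. Its simplifying assumptions are ours, not SparseAssembler's:

- The input is a single error-free genome string rather than sequencing reads.
- k-mers are sampled at every g-th position.
- Each kept k-mer stores the bases that lead to the next kept k-mer.

The sketch also ignores sequencing errors, reverse complements and branching. All function names are hypothetical. The sketch compares the number of stored k-mers with a dense k-mer set and checks that the sequence can still be walked back out of the sparse structure.

```python
import random


def dense_kmers(genome, k):
    """All distinct k-mers, i.e. the nodes of a dense de Bruijn-style graph."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def sparse_kmer_graph(genome, k, g):
    """Keep only k-mers starting at positions 0, g, 2g, ...

    Each kept k-mer maps to the g bases that extend it to the end of the
    next kept k-mer; for the last kept k-mer the slice is the remaining suffix.
    """
    graph = {}
    for p in range(0, len(genome) - k + 1, g):
        graph[genome[p:p + k]] = genome[p + k:p + g + k]
    return graph


def walk(graph, start, max_steps):
    """Rebuild a sequence by following stored extensions from `start`."""
    k = len(start)
    seq, cur = start, start
    for _ in range(max_steps):
        ext = graph.get(cur)
        if not ext:          # unknown k-mer or empty suffix: stop
            break
        seq += ext
        cur = seq[-k:]       # the next kept k-mer ends the extended sequence
    return seq


random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(10_000))
k, g = 21, 10

dense = dense_kmers(genome, k)
sparse = sparse_kmer_graph(genome, k, g)
rebuilt = walk(sparse, genome[:k], max_steps=len(genome))
print(len(dense), len(sparse), rebuilt == genome)
```

With g = 10, the toy stores roughly one tenth as many k-mers as the dense set, and the walk still reproduces the genome. The price is that each sparse node carries a short stored extension.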


Our work on genome assembly for third-generation sequencing can be found here:

http://www.nature.com/articles/srep31900


Base Calling

Base-calling of sequencing data is a fundamental process in high-throughput bioinformatics analysis. The major challenge in base-calling is to infer accurate base-calls from blurry and noisy fluorescence intensity measurements. However, existing third-party base-calling methods are impractical for production use because they are computationally inefficient (10x-1000x slower).

In contrast, our work builds on a simple observation: the degraded signals can be modeled as a blurred (convolved) version of the latent signals, and they are denser than the latent signals. To recover the sparse latent signals, we formulate base-calling directly as a blind deconvolution problem and use state-of-the-art sparse optimization techniques to obtain efficient solutions. Our work thus provides a novel inverse-problem point of view on base-calling. To our knowledge, it was also the fastest base-calling algorithm at the time, while still producing high-quality base-calls. The computational complexity of BlindCall scales linearly with read length, making it better suited for new long-read sequencing technologies.
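The inverse-problem view can be illustrated with a simplified, non-blind version of the problem. The sketch below is not BlindCall. It assumes the blur kernel is known, whereas BlindCall also has to estimate it, and it uses a single toy 1D intensity channel. It recovers the sparse, nonnegative latent signal with plain ISTA (iterative shrinkage-thresholding) on an L1-regularized least-squares objective, min_x 0.5 ||H x - y||^2 + lam ||x||_1 subject to x >= 0. The function names, the kernel and all parameter values are hypothetical.

```python
import numpy as np


def conv_matrix(kernel, n):
    """Dense n x n matrix H with H @ x == np.convolve(x, kernel, mode="same")."""
    H = np.zeros((n, n))
    for i in range(n):
        impulse = np.zeros(n)
        impulse[i] = 1.0
        H[:, i] = np.convolve(impulse, kernel, mode="same")
    return H


def sparse_deconvolve(y, kernel, lam=0.01, iters=500):
    """Nonnegative ISTA for min_x 0.5*||H x - y||^2 + lam*||x||_1, x >= 0."""
    H = conv_matrix(kernel, len(y))
    step = 1.0 / np.linalg.norm(H, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(len(y))
    for _ in range(iters):
        grad = H.T @ (H @ x - y)
        # Gradient step, then the proximal step for lam*||x||_1 plus x >= 0.
        x = np.maximum(x - step * grad - step * lam, 0.0)
    return x


# Toy data: three sparse "bases" blurred by a known kernel plus a little noise.
n = 60
x_true = np.zeros(n)
x_true[[10, 25, 40]] = [1.0, 0.8, 1.2]
kernel = np.array([0.2, 0.6, 0.2])          # hypothetical blur kernel
rng = np.random.default_rng(0)
y = np.convolve(x_true, kernel, mode="same") + 0.01 * rng.standard_normal(n)

x_hat = sparse_deconvolve(y, kernel)
print(sorted(np.argsort(x_hat)[-3:].tolist()))
```

The dense matrix H is used here only for readability. Applying the convolution directly makes each iteration cost O(n) for a fixed kernel length, which fits the linear scaling with read length described above.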



Computer Vision

Image Filtering

 

        

We present a new type of image filter, the sparse norm filter (SNF), derived from optimization-based filtering. SNF has a very simple form, introduces a general class of filtering techniques, and explains several classic filters, e.g. the averaging filter and the median filter, as special cases of SNF. It is halo-free and easy to implement, and its time and memory costs are low (comparable to those of the bilateral filter). Because it spans a whole family of filters, it is more generic than a single smoothing operator and can better adapt to different tasks. We validate SNF on a wide variety of applications, including edge-preserving smoothing, outlier-tolerant filtering, detail manipulation, HDR compression, non-blind deconvolution, image segmentation, and colorization.
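A one-dimensional sketch of the idea follows. It assumes an unweighted form of the problem: each output is the value x that minimizes the sum over q of |I_q - x|^p over a box window. The paper's formulation may include weights. With p = 2 this reduces to the averaging filter, and with p = 1 to the median filter, which are the special cases named above. The function names are hypothetical. The brute-force search over window samples is chosen for clarity and is not an efficient implementation.

```python
import numpy as np


def snf_value(window, p):
    """Return x minimizing sum_q |window[q] - x| ** p (0 < p <= 1, or p == 2)."""
    samples = np.asarray(window, dtype=float)
    if p == 2:
        return float(samples.mean())  # least squares -> averaging filter
    if not 0 < p <= 1:
        raise ValueError("this sketch handles 0 < p <= 1 or p == 2")
    # For 0 < p <= 1 the cost is concave between neighbouring samples and
    # grows outside their range, so some sample value is a minimizer.
    costs = [np.sum(np.abs(samples - c) ** p) for c in samples]
    return float(samples[int(np.argmin(costs))])


def sparse_norm_filter(signal, radius, p):
    """Apply snf_value over a sliding box window of half-width `radius`."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out[i] = snf_value(signal[lo:hi], p)
    return out


spike = [0, 0, 0, 0, 10, 0, 0, 0, 0]
step = [0, 0, 0, 1, 1, 1]
print(sparse_norm_filter(spike, 1, p=2))    # averaging smears the outlier
print(sparse_norm_filter(spike, 1, p=1))    # median removes it
print(sparse_norm_filter(step, 1, p=0.5))   # p < 1 keeps the edge sharp
```

Changing only p moves the same filter from smoothing to outlier removal to edge preservation. This is the sense in which one simple form covers several classic filters.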


Image Segmentation 

Image segmentation and filtering are two large fields that have been intensively investigated for decades. We build a connection between two of their building blocks, the normalized cut and the bilateral filter (together, the original papers have over 10k citations), and thereby show that the two fields are deeply connected. Based on this connection, we give a new interpretation and implementation of the normalized cut with a 10-100x speedup. We also show how a new conditional random field model can be introduced for segmentation and solved efficiently.
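The link between the two building blocks can be sketched under our own simplifying assumptions. We use a 1D signal, a dense affinity matrix, and NumPy's dense eigensolver instead of the fast implementation described above. Entry W[i, j] is a bilateral weight, the product of a spatial Gaussian and a range Gaussian. So W @ v divided by the row sums of W is exactly a bilateral filter applied to v. The relaxed two-way normalized cut needs the second leading eigenvector of D^(-1/2) W D^(-1/2), which iterative eigensolvers compute through repeated multiplications by W, i.e. by repeated bilateral-type filtering. Names and parameter values are hypothetical.

```python
import numpy as np


def bilateral_affinity(signal, sigma_s, sigma_r):
    """W[i, j] = spatial Gaussian * range Gaussian, as in the bilateral filter."""
    idx = np.arange(len(signal), dtype=float)
    I = np.asarray(signal, dtype=float)
    spatial = np.exp(-(idx[:, None] - idx[None, :]) ** 2 / (2 * sigma_s ** 2))
    range_term = np.exp(-(I[:, None] - I[None, :]) ** 2 / (2 * sigma_r ** 2))
    return spatial * range_term


def bilateral_filter(signal, W):
    """Bilateral filtering as a row-normalized multiplication by W."""
    return W @ np.asarray(signal, dtype=float) / W.sum(axis=1)


def normalized_cut_2way(W):
    """Relaxed two-way normalized cut (Shi-Malik) via the normalized affinity."""
    d = W.sum(axis=1)
    M = W / np.sqrt(np.outer(d, d))      # D^(-1/2) W D^(-1/2)
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    y = vecs[:, -2] / np.sqrt(d)         # second leading eigenvector, rescaled
    return y > 0


signal = np.array([0.0] * 10 + [1.0] * 10)
W = bilateral_affinity(signal, sigma_s=3.0, sigma_r=0.5)
labels = normalized_cut_2way(W)
smoothed = bilateral_filter(signal, W)
print(labels.astype(int))
```

The same matrix W drives both the edge-preserving smoothing and the two-segment split. This is the shared structure that a filtering-based implementation of the normalized cut can exploit.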


Image Deblurring 

I co-developed SmartDeblur 2.0 with Vladimir Yuzhikov. With this small tool, we hope to give everyone a comprehensive and easy way to handle blurry images.

On the academic side, I worked on the blind deblurring problem in 2008-2009 and developed algorithms similar to an approach that was later introduced at Adobe, although that work appeared a few months before mine. I later developed some novel models and contributed in part to SmartDeblur 2.0.

 


MEDIA COVERAGE (Chinese)

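《三代基因测序组装算法和软件研发获突破》 (Breakthrough in the development of assembly algorithms and software for third-generation gene sequencing)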

Covered by:

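Yunnan Daily 《云南日报》

www.china.com.cn 《中国网》 (China Internet Information Center)

www.biotech.org.cn 《中国生物技术信息网》 (China Biotechnology Information Network)

www.people.com.cn 《人民网》 (People's Daily Online)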

...

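《中美合作Sparc新软件弥补三代基因测序“硬伤”》 (China-US collaboration: new Sparc software remedies a critical flaw of third-generation gene sequencing)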

Covered by:

China Science Daily 《中国科学报》

www.sciencenet.cn 《中国科学网》 (China Science Network)

news.163.com 《网易新闻》 (NetEase News)

www.gmw.cn 《光明网》 (Guangming Online)

finance.qq.com 《腾讯财经》 (Tencent Finance)

www.sohu.com 《搜狐网》 (Sohu)
