Group Meeting 2021

Fall, 2021

  • Xianghong HU. December 16. Invariant causal prediction.

  • Jia ZHAO. December 2. Single-cell data integration. Single-cell ATAC-seq (scATAC-seq) measures chromatin accessibility, while single-cell RNA-seq (scRNA-seq) provides the expression profiles of individual cells. Integrating these different modalities offers opportunities to gain a comprehensive understanding of diverse cellular behaviors. In this group meeting, we first discussed a recently developed integration method, scJoint, which takes advantage of well-annotated scRNA-seq datasets to generate cell type annotations for unlabeled scATAC-seq data with high accuracy. Then, we talked about Portal’s capability of integrating scATAC-seq and scRNA-seq data. Finally, through real-data experiments, we showed that Portal achieved better alignment performance than scJoint, owing to its specific model and algorithm designs.

  • Jiashun XIAO. November 25. Spatial transcriptome and detection of spatial expression patterns. Identifying genes that display spatial expression patterns is a key analytic task in spatial transcriptomic studies. In this group meeting, we first reviewed the development of transcriptomic and spatial transcriptomic technologies. Next, we discussed two advanced methods, SPARK and SPARK-X, which apply a score test and a covariance test, respectively, to detect spatial expression patterns. Finally, we conducted a series of simulation experiments to compare their power and type I error control.

  • Gefei WANG. November 18. Introduction to score-based generative models. Score-based generative models, or diffusion models, are a category of deep generative models that have achieved state-of-the-art generation results on many image benchmarks. In this group meeting, we reviewed several representative score-based methods, including denoising diffusion probabilistic models and SDE-based models, and discussed the connections among these methods. We also talked about existing and potential applications of score-based models.

  • Xiaomeng WAN. November 4. BayesSpace. Spatial transcriptomics is an emerging technology that adds spatial dimensionality and tissue morphology to the genome-wide transcriptional profile of cells in an undissociated tissue. However, existing analysis methods do not address the limited resolution of the technology or use the spatial information efficiently. We discussed BayesSpace, a fully Bayesian statistical method that uses the information from spatial neighborhoods for resolution enhancement of spatial transcriptomic data and for clustering analysis.

  • Mingxuan CAI. October 29. Non-negative matrix factorization. [pdf] The multinomial topic model is widely applied in non-negative matrix factorization (NMF). Compared to the multinomial topic model, the Poisson NMF has simpler constraints, which enables a faster and parallelizable coordinate descent algorithm for parameter estimation. We discussed the connection between the multinomial topic model and the Poisson NMF, as well as a mapping between their parameters. Based on this connection, the topic model parameters can be obtained by first solving the Poisson NMF and then applying the mapping to the Poisson parameters. Ref: Non-negative matrix factorization algorithms greatly improve topic model fits.
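
The parameter mapping mentioned in the October 29 talk can be illustrated with a small sketch. It assumes the Poisson NMF models the count matrix with rates L F^T (L is n x K, F is m x K, both non-negative) and that the topic model needs word distributions whose columns sum to one and topic proportions whose rows sum to one. The function name `poisson_to_topic` and the toy matrices are hypothetical, chosen only for illustration; this is a rescaling sketch under those assumptions, not the implementation from the referenced paper.

```python
def poisson_to_topic(L, F):
    """Rescale Poisson NMF factors into topic-model parameters.

    L: n x K loadings, F: m x K factors (lists of lists, non-negative).
    Assumes every column of F and every row of L has a positive sum.
    """
    K = len(F[0])
    # u[k] is the total mass of topic k across all words (column sum of F).
    u = [sum(row[k] for row in F) for k in range(K)]
    # Word distributions: each column of F rescaled to sum to 1.
    F_star = [[row[k] / u[k] for k in range(K)] for row in F]
    # Topic proportions: absorb u into L, then normalize each row to sum to 1.
    L_star = []
    for row in L:
        scaled = [row[k] * u[k] for k in range(K)]
        total = sum(scaled)
        L_star.append([v / total for v in scaled])
    return L_star, F_star


def product(A, B):
    """Return A B^T for lists of lists A (n x K) and B (m x K)."""
    return [[sum(a[k] * b[k] for k in range(len(a))) for b in B] for a in A]


# Toy Poisson NMF solution (hypothetical numbers): 2 documents, 3 words, 2 topics.
L = [[2.0, 1.0], [0.5, 3.0]]
F = [[1.0, 0.2], [0.5, 0.3], [0.5, 1.5]]

L_star, F_star = poisson_to_topic(L, F)

# Multinomial probabilities implied by the topic model ...
probs = product(L_star, F_star)
# ... match the Poisson rates after normalizing each document's rates.
rates = product(L, F)
normalized_rates = [[r / sum(row) for r in row] for row in rates]
```

In this sketch the multinomial probabilities `probs` coincide with the row-normalized Poisson rates, which is why fitting the Poisson NMF first and rescaling afterwards recovers a valid set of topic model parameters.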

Spring, 2021

  • Xinyi Yu. July 9. Estimating heritability under case-control sampling. Common genetic variants identified by GWASs typically explain only a minority of heritability, a phenomenon known as "missing heritability". Restricted maximum likelihood (REML), which has achieved great success in estimating the total heritability of quantitative traits attributable to common variants, considerably underestimates the true heritability when applied to case-control studies of disease. A general framework for heritability estimation, called phenotype correlation-genotype correlation (PCGC) regression, was developed to avoid ascertainment bias. PCGC regression is shown to yield unbiased estimates under case-control sampling and can correctly account for additional covariates when applied to estimate the heritability of several diseases.

  • Xianghong HU. July 2. Mendelian Randomization: using genetic data to infer causal relationships. Mendelian Randomization (MR) is a method that uses genetic variants as instrumental variables (IVs) for causal inference. Two-sample MR has been widely used to exploit GWAS summary data from the public domain. Due to the complexity of human genetics, MR faces several unique challenges arising from pleiotropy, confounding bias, and sample structure issues (including population stratification, sample overlap, and cryptic relatedness). Our method, MR-APSS, has been proposed to deal with these challenges in a unified framework. We show that MR-APSS identifies plausible causal relationships more reliably than existing methods.

  • Jia ZHAO. June 18. Flexible atlas-level integration of single-cell RNA sequencing data. We present a new algorithm to integrate atlas-level single-cell RNA sequencing data. RNA expressions of cells from two different datasets are mapped into a shared latent space with two encoders. The latent space is designed to preserve biological variation and remove batch effects. We then train two generative adversarial networks to decode the latent variables, simulating the data-generating mechanisms of the different datasets. Our algorithm can scale up to large datasets like the Human Heart Atlas with 10^5 cells and finish training within 10 minutes. Results show that our algorithm provides integration performance comparable to state-of-the-art methods.

  • Zhiyuan Yu. June 11. Implicit Neural Representation and 3D RGB Reconstruction. Implicit neural representation (INR) provides differentiable representations of both geometry and appearance. With the help of INR and differentiable rendering, we can reconstruct a 3D model end-to-end from input images and camera poses. However, current methods only support single-object reconstruction. By adding an encoder and a sparse INR to the pipeline, we obtain prior information about different 3D models and achieve detailed object reconstruction.

  • Xiaomeng WAN. June 4. A review of flow-based generative models. Flow-based generative models have attracted attention due to the tractability of the exact log-likelihood and of latent-variable inference. Glow, one of the most commonly used flow-based generative models, can generate high-quality images. Building on the architecture of Glow, several conditional flow-based methods have emerged to address different real-world applications.

  • Yueqi QIAN. April 30. Statistical arbitrage in the cryptocurrency market. Statistical arbitrage uses algorithms and quantitative methods to uncover price discrepancies based on historical data and takes advantage of inefficient pricing in correlated securities. We apply statistical arbitrage to the cryptocurrency market because this newly formed market exhibits inefficiencies that can be exploited for profit. More specifically, two strategies are designed to use different dimensions of information from K-line data: a pair trading strategy and a dollar-neutral strategy. The former is based on cointegration between two time series. The latter distinguishes relatively strong and weak assets at each cross-section, which can achieve high returns with low risk.

  • Cong ZHENG. April 23. Quantitative trading strategy. We present a complete process of building a CTA trading strategy, from alpha-factor discovery to strategic risk optimization. To cover the execution cost, we often need to apply alpha-factor boosting to improve the sensitivity of the alpha factor. After identifying alpha factors, a remaining question is how to control the risk of the strategy. There is a trade-off between the annual return and the strategic risk. One general solution is mean-variance optimization, which can help us strike a balance between the annual return and the annual volatility.

  • Jiafa HE. April 16. Learning hybrid representations for automatic 3D vessel centerline extraction. We present a hybrid representation learning approach for automatic 3D vessel centerline extraction. The main idea is to use CNNs to learn local appearances of vessels in image crops while using another point-cloud network to learn the global geometry of vessels in the entire image. In inference, the proposed approach extracts local segments of vessels using CNNs, classifies each segment based on global geometry using the point-cloud network, and finally connects all the segments that belong to the same vessel using the shortest-path algorithm. This combination results in an efficient, fully automatic, and template-free approach to centerline extraction from 3D images.

  • Gefei WANG. April 9. Deep generative learning via Schrödinger bridge. We propose to learn a generative model via entropy interpolation with a Schrödinger Bridge. The generative learning task can be formulated as interpolating between a reference distribution and a target distribution based on the Kullback-Leibler divergence. Under some mild smoothness assumptions on the target distribution, we prove the consistency of both the score estimator and the density ratio estimator, and then establish the consistency of the proposed Schrödinger Bridge approach. Experimental results on multimodal synthetic data and benchmark data support our theoretical findings and indicate that the generative model via Schrödinger Bridge is comparable with state-of-the-art GANs, suggesting a new formulation of generative learning. We demonstrate its usefulness in image interpolation and image inpainting.

  • Shunkang ZHANG. March 26. Style transfer and portrait generation. We introduce existing methods in neural style transfer and StyleGAN-based portrait generation. Instead of matching statistical information from only two images, we leverage a large-scale dataset and transfer learning for portrait generation. Based on this generative model, we apply a layer-swapping operation to keep low-level information from the source model and high-level information from the transferred model. As a result, we can not only generate high-quality portraits but also create portraits guided by reference-style images.

  • Jiashun XIAO. March 19. XPXP: Improving polygenic prediction in cross-population and cross-phenotype. Existing PRS methods primarily focus on a single phenotype in a single population at a time. However, genetic correlations estimated from GWASs reveal widespread pleiotropy across phenotypes and populations. In light of this discovery, we extend the recently proposed trans-ancestry PRS construction method, XPASS, to work with an arbitrary number of correlated GWAS summary statistics and to take sample overlap into account.

  • Mingxuan CAI. March 12. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. We present a unified statistical framework (XPA) to improve the prediction accuracy of human traits using multi-ancestry genetic data. Paired with innovations in data structure and algorithm design, our framework is highly scalable, with both computational cost and memory storage linear in the sample size and the number of predictors. In practice, XPA can analyze 3 million variants from 430K samples with only 385 Gb memory usage in 54.5 hours. In a Chinese cohort, our method achieves an accuracy gain of 7.3% to 198.0% for height prediction in terms of the R-squared value compared to existing methods. [XPA software][XPASS software].