Projects

Selected Research Projects:

Machine Learning/Representation Learning

Many applications naturally produce structured inputs such as sequences or sets of objects, which cannot be used directly by most standard machine learning models (e.g., SVMs or MLPs) since those models are designed for inputs with a fixed-length vector representation. In the following research projects, to address the long-standing problem of building a positive-definite (p.d.) kernel for structured objects, Lingfei has been developing a family of alignment-aware kernels and embedding methods that learn unsupervised feature representations for a variety of structured objects such as documents, time series, symbolic sequences, and graphs. These continuous vector-space representations bridge the gap between structured objects and the wide range of machine learning techniques that operate on vectors.

D2KE: From Distance to Kernel and Embedding

Lingfei Wu*, Ian E.H. Yen*, Fangli Xu, Pradeep Ravikumar, and Michael J. Witbrock (*: equal contribution), "D2KE: From Distance to Kernel and Embedding", arXiv preprint 2018.

[PDF] [CODE] [PPT] [Media] [Blog]

In this paper, we present D2KE (Distances to Kernels and Embeddings), a generic framework for learning kernels from structured inputs equipped with a structured distance measure. We propose a general approach for deriving a family of positive-definite kernels from a given dissimilarity measure, which subsumes the widely used representative-set method as a special case and relates to the well-known distance-substitution kernel in a limiting case. Our approach draws on the Random Features literature, but instead of deriving a feature map from an existing kernel, we construct a new kernel from a random feature map specified by the distance measure. D2KE addresses the long-standing problem of building a p.d. kernel for structured objects and has been applied successfully in several sub-domains, including time series, text, strings, and graphs.
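
The recipe can be sketched in a few lines of Python (an illustrative sketch, not the released code; the feature form exp(-gamma * d(x, omega)) and the distribution of random objects are simplifying assumptions here, and the paper's exact choices may differ):

import numpy as np

def d2ke_embedding(objects, distance, sample_random_object, R=128, gamma=1.0, seed=0):
    """Map structured objects to R-dimensional vectors via R random 'landmark' objects.

    objects: list of structured inputs (strings, sequences, graphs, ...)
    distance: callable d(x, omega) returning a non-negative dissimilarity
    sample_random_object: callable rng -> one random structured object omega
    """
    rng = np.random.default_rng(seed)
    omegas = [sample_random_object(rng) for _ in range(R)]        # random structured objects
    Z = np.empty((len(objects), R))
    for i, x in enumerate(objects):
        for j, omega in enumerate(omegas):
            Z[i, j] = np.exp(-gamma * distance(x, omega))         # distance-based random feature
    return Z / np.sqrt(R)                                         # Z @ Z.T then approximates the kernel

Any structured distance (edit distance, DTW, Word Mover's Distance, graph distances) can be plugged in as the distance callable, which is what the projects below instantiate.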

Random Warping Series: A Random Features Method for Time-Series Embedding

[AISTATS'18] Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, and Michael Witbrock, "Random Warping Series: A Random Features Method for Time-Series Embedding", In the 21st International Conference on Artificial Intelligence and Statistics. [oral paper, acceptance rate: 5.9% (38/645)]

[PDF] [CODE] [PPT] [Media] [Blog]

In this paper, inspired by recent advances in learning kernels from distance measures [D2KE (Distances to Kernels and Embeddings)], we study Random Warping Series (RWS), a generic framework for generating vector representations of time series, in which we construct a family of p.d. kernels from an explicit feature map given by the DTW between the original time series and a distribution of random series. RWS is the first dedicated random features method for constructing a truly p.d. time-series kernel and the corresponding embeddings. It can easily be combined with downstream tasks such as time-series classification, clustering, and forecasting.
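
A minimal sketch of this construction (assuming, for illustration, random series drawn from a standard normal with random lengths and the feature value exp(-gamma * DTW); the paper's distribution over random series and its parameters may differ):

import numpy as np

def dtw(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance between two 1-D series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def rws_embedding(series_list, R=64, gamma=1.0, max_len=30, seed=0):
    """Embed each series by its DTW alignments to R random series."""
    rng = np.random.default_rng(seed)
    random_series = [rng.standard_normal(rng.integers(2, max_len + 1)) for _ in range(R)]
    return np.array([[np.exp(-gamma * dtw(x, w)) for w in random_series]
                     for x in series_list]) / np.sqrt(R)

The resulting fixed-length vectors can then be fed to any vector-based classifier, clustering algorithm, or forecaster, which is what makes the framework generic.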

Word Mover’s Embedding: From Word2Vec to Document Embedding

[EMNLP'18] Lingfei Wu, Ian E.H. Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, Michael J. Witbrock, "Word Mover's Embedding: From Word2Vec to Document Embedding", 2018 Conference on Empirical Methods in Natural Language Processing.

[PDF] [CODE] [PPT] [Media] [IBM Research AI Blog]

In this paper, we build on a recent innovation in constructing kernels from distance measures, D2KE (Distances to Kernels and Embeddings), and present the Word Mover's Embedding (WME), an unsupervised generic framework that learns continuous vector representations for text of variable length such as a sentence, paragraph, or document. In particular, we propose a new approach that first constructs a positive-definite Word Mover's Kernel via an infinite-dimensional feature map given by the Word Mover's Distance (WMD) between a document and random documents from a given distribution. WME is extremely easy to implement, fully parallelizable, and highly extensible.
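
A simplified sketch of the feature map (assuming the POT package `ot` for the transport problem, uniform word weights, a word-to-vector lookup `word_vec`, and random documents sampled uniformly from the vocabulary; all of these are illustrative simplifications of the paper's construction):

import numpy as np
import ot                                    # POT: Python Optimal Transport
from scipy.spatial.distance import cdist

def wmd(doc_a, doc_b, word_vec):
    """Word Mover's Distance between two token lists, given a word -> vector lookup."""
    Xa = np.array([word_vec[w] for w in doc_a])
    Xb = np.array([word_vec[w] for w in doc_b])
    a = np.full(len(doc_a), 1.0 / len(doc_a))        # uniform word weights (tf weights in general)
    b = np.full(len(doc_b), 1.0 / len(doc_b))
    M = cdist(Xa, Xb)                                # Euclidean cost between word vectors
    return ot.emd2(a, b, M)                          # minimum transport cost

def wme_embedding(docs, word_vec, vocab, R=128, gamma=1.0, max_words=10, seed=0):
    """Feature j of a document is exp(-gamma * WMD(document, random document j))."""
    rng = np.random.default_rng(seed)
    random_docs = [list(rng.choice(vocab, size=rng.integers(1, max_words + 1)))
                   for _ in range(R)]
    return np.array([[np.exp(-gamma * wmd(d, w, word_vec)) for w in random_docs]
                     for d in docs]) / np.sqrt(R)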

Deep Learning/Natural Language Processing

Many data inputs are naturally represented in the form of a graph. Machine learning on graphs is an important and ubiquitous task with applications ranging from natural language processing to molecular design. The primary challenge in this domain is to develop a framework that not only represents or encodes graph structure but also decodes these latent graph representations into sequences of discrete objects such as texts or graphs. In the following research projects, Lingfei has been working on developing a new paradigm for graph encoding, graph-to-sequence learning, and graph translation.

Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks

Kun Xu*, Lingfei Wu*, Zhiguo Wang, Yansong Feng, Michael Witbrock, and Vadim Sheinin (*: equal contribution), "Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks", arXiv preprint 2018.

[PDF] [CODE] [PPT] [Media] [IBM Research AI Blog]

The celebrated Sequence to Sequence learning (Seq2Seq) technique and its numerous variants achieve excellent performance on many tasks. However, many machine learning tasks have inputs naturally represented as graphs; existing Seq2Seq models face a significant challenge in achieving accurate conversion from graph form to the appropriate sequence. To address this challenge, we introduce a general end-to-end graph-to-sequence neural encoder-decoder architecture that maps an input graph to a sequence of vectors and uses an attention-based LSTM method to decode the target sequence from these vectors.
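
The data flow of the encoder can be sketched as follows (a toy numpy sketch with untrained random weights and mean aggregation, intended only to show how directed neighbors are mixed into node embeddings; the actual model uses trainable aggregators and an attention-based LSTM decoder over the node embeddings, which is omitted here):

import numpy as np

def encode_graph(node_feats, edges, hops=2, seed=0):
    """Simplified Graph2Seq-style encoding: each hop mixes a node's vector with the means
    of its forward (outgoing) and backward (incoming) neighbors.

    node_feats: (N, d) array of initial node features
    edges: list of directed (src, dst) pairs
    Returns per-node embeddings (what the decoder attends over) and a pooled graph vector.
    """
    rng = np.random.default_rng(seed)
    h = node_feats.astype(float)
    N, d = h.shape
    W_fwd = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)   # untrained weights, illustration only
    W_bwd = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
    fwd = {i: [] for i in range(N)}
    bwd = {i: [] for i in range(N)}
    for s, t in edges:
        fwd[s].append(t)
        bwd[t].append(s)
    for _ in range(hops):
        h_new = np.zeros_like(h)
        for i in range(N):
            nf = h[fwd[i]].mean(axis=0) if fwd[i] else np.zeros(d)
            nb = h[bwd[i]].mean(axis=0) if bwd[i] else np.zeros(d)
            h_new[i] = np.tanh(np.concatenate([h[i], nf]) @ W_fwd
                               + np.concatenate([h[i], nb]) @ W_bwd)
        h = h_new
    return h, h.max(axis=0)        # node embeddings and a max-pooled graph embedding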

SQL-to-Text Generation with Graph-to-Sequence Model

[EMNLP'18] Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, and Vadim Sheinin, "SQL-to-Text Generation with Graph-to-Sequence Model", 2018 Conference on Empirical Methods in Natural Language Processing.

[PDF] [CODE] [PPT] [Media] [IBM Research AI Blog]

Previous work approaches the SQL-to-text generation task using vanilla Seq2Seq models, which may not fully capture the inherent graph-structured information in SQL queries. In this paper, we first introduce a strategy to represent the SQL query as a directed graph and then employ a graph-to-sequence model to encode the global structure information into node embeddings. This model can effectively learn the correlation between the SQL query pattern and its interpretation.
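
The idea of the graph construction can be illustrated on a toy query (a hypothetical, simplified construction; the paper's graph representation of SQL is richer):

def sql_to_graph(select_cols, table, conditions):
    """Turn a simple SQL query into directed (head, relation, tail) edges
    that a graph-to-sequence encoder can consume.

    Example query: SELECT name, age FROM employee WHERE salary > 50000
    """
    edges = []
    for col in select_cols:
        edges.append(("SELECT", "selects", col))
    edges.append(("SELECT", "from", table))
    for left, op, right in conditions:
        edges.append((table, "where", left))
        edges.append((left, op, right))
    return edges

edges = sql_to_graph(["name", "age"], "employee", [("salary", ">", "50000")])
# [('SELECT', 'selects', 'name'), ('SELECT', 'selects', 'age'), ('SELECT', 'from', 'employee'),
#  ('employee', 'where', 'salary'), ('salary', '>', '50000')]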

Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model

[EMNLP'18] Kun Xu, Lingfei Wu, Zhiguo Wang, Mo Yu, Liwei Chen, and Vadim Sheinin, "Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model", 2018 Conference on Empirical Methods in Natural Language Processing.

[PDF] [CODE] [PPT] [Media] [IBM Research AI Blog]

Existing neural semantic parsers mainly utilize a sequence encoder, i.e., a sequential LSTM, to extract word order features while neglecting other valuable syntactic information such as dependency or constituent trees. In this paper, we first propose to use a syntactic graph to represent three types of syntactic information, i.e., word order, dependency, and constituency features; we then employ a graph-to-sequence model to encode the syntactic graph and decode a logical form.
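
The syntactic graph can be assembled from the outputs of any off-the-shelf parsers, roughly as below (a toy illustration; node and edge naming is hypothetical and the paper's construction differs in details):

def build_syntactic_graph(tokens, dependencies, constituents):
    """Build a graph mixing three edge types: word-order edges between adjacent tokens,
    dependency edges, and constituent-to-span edges.

    tokens:       e.g. ["the", "dog", "barks"]
    dependencies: [(head_index, label, dep_index), ...] from a dependency parser
    constituents: [(label, start, end), ...] spans from a constituency parser
    """
    edges = []
    for i in range(len(tokens) - 1):
        edges.append((("tok", i), "next", ("tok", i + 1)))                 # word order
    for head, label, dep in dependencies:
        edges.append((("tok", head), label, ("tok", dep)))                 # dependency tree
    for k, (label, start, end) in enumerate(constituents):
        for i in range(start, end):
            edges.append((("phrase", k, label), "contains", ("tok", i)))   # constituency spans
    return edges

g = build_syntactic_graph(["the", "dog", "barks"],
                          [(2, "nsubj", 1), (1, "det", 0)],
                          [("NP", 0, 2), ("S", 0, 3)])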

Machine Learning/Data Mining

Kernel methods hold great promise for learning non-linear models from simple input representations and have been shown to be successful for solving various learning problems such as regression, classification, feature extraction, clustering, and dimensionality reduction. However, they do not scale to large datasets due to their quadratic complexity in the number of samples. In the following research projects, Lingfei has been working on developing Random Features methods for scaling up pairwise-proximity-based methods, such as kernel machines and spectral clustering, to large datasets.

Scalable Spectral Clustering Using Random Binning Features

[KDD'18] Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia and Charu Aggarwal, "Scalable Spectral Clustering Using Random Binning Features", In The 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. [oral paper, acceptance rate: 10.8% (107/983)]

[PDF] [CODE] [PPT] [Media] [Blog]

Spectral clustering is one of the most effective clustering approaches for capturing hidden cluster structures in the data. However, it does not scale well to large-scale problems due to its quadratic complexity in constructing the similarity graph and computing the subsequent eigendecomposition. In this paper, we present a novel scalable spectral clustering method using Random Binning features (RB) and a state-of-the-art SVD solver (PRIMME) to simultaneously accelerate both the similarity graph construction and the eigendecomposition. Using these two building blocks, we reduce the computational cost from quadratic to linear in the number of data points while achieving similar accuracy.
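
The key computational trick can be sketched as follows (assuming a dense random-feature matrix Z for simplicity, with scipy's svds standing in for the PRIMME solver used in the paper): because the similarity graph is implicitly W = Z Z^T, the leading eigenvectors of the normalized graph are the leading left singular vectors of D^{-1/2} Z, so the N x N graph is never formed.

import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def spectral_clustering_from_features(Z, k, seed=0):
    """Spectral clustering with an implicit similarity graph W = Z @ Z.T,
    where Z is an N x D random-feature matrix (e.g., Random Binning features)."""
    degrees = Z @ Z.sum(axis=0)                          # row sums of W, computed without forming W
    Zn = Z / np.sqrt(degrees)[:, None]                   # D^{-1/2} Z
    U, s, Vt = svds(Zn, k=k)                             # k largest singular triplets (PRIMME in the paper)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalized spectral embedding
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)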

Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability

[KDD'16] Lingfei Wu*, Ian En-Hsu Yen*, Jie Chen, and Rui Yan (*: equal contribution), "Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability", In The 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining. [oral paper, acceptance rate: 9% (70/784)]

[PDF] [CODE] [PPT] [Media] [Blog]

Since the original random features work, different random feature functions have been proposed to approximate a variety of kernel functions. Among them, the Random Binning (RB) feature, proposed in the seminal random-features paper alongside the Random Fourier (RF) feature, has drawn much less attention than RF. In this work, we observe that RB features, with the right choice of optimization solver, can be orders of magnitude more efficient than other random features and kernel-approximation methods at the same level of accuracy.
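
For the Laplacian kernel k(x, y) = exp(-||x - y||_1 / sigma), one set of RB features can be generated as below (a simplified sketch: the grid pitch follows the Gamma(2, sigma) distribution from the original construction, and the d-dimensional bin identifiers are hashed into a fixed number of columns here, whereas the exact construction keeps the bin indices themselves in a sparse matrix):

import numpy as np

def random_binning_features(X, R=32, sigma=1.0, n_buckets=1024, seed=0):
    """Random Binning features approximating the Laplacian kernel exp(-||x - y||_1 / sigma)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    Z = np.zeros((N, R * n_buckets))                          # dense here for simplicity; very sparse in practice
    for r in range(R):
        delta = rng.gamma(shape=2.0, scale=sigma, size=d)     # random grid pitch per dimension
        u = rng.uniform(0.0, delta)                           # random grid offset per dimension
        bins = np.floor((X - u) / delta).astype(np.int64)     # joint bin index of each point
        for i in range(N):
            col = hash(tuple(bins[i])) % n_buckets            # hash the d-dimensional bin id to a column
            Z[i, r * n_buckets + col] = 1.0
    return Z / np.sqrt(R)                                     # Z @ Z.T approximates the kernel matrix

Each point activates exactly one bin per grid, so every row of Z has only R nonzeros; exploiting this sparsity in the training solver is what makes RB features so efficient in practice.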

Numerical Linear Algebra

Computing a small number of singular values and vectors of a large (dense or sparse) matrix is a fundamental problem in numerical linear algebra and is in high demand in many applications, including machine learning, data mining, natural language processing, and diverse scientific disciplines. In the following projects, we have developed and deployed a state-of-the-art SVD solver for tackling this problem efficiently. Our software is widely used by national labs, industry, and data science practitioners.

PRIMME_SVDS: A State-of-The-Art SVD Solver in PRIMME Library

[SISC'15] Lingfei Wu and Andreas Stathopoulos, "A Preconditioned Hybrid SVD Method for Accurately Computing Singular Triplets of Large Matrices", SIAM J. Sci. Comput. (2015), 37(5), pp. S365-S388. [SIAM SISC, Flagship Journal in Scientific Computing]

[SISC'17] Lingfei Wu, Eloy Romero, and Andreas Stathopoulos, "PRIMME_SVDS: A High-Performance Preconditioned SVD Solver for Accurate Large-scale Computations", SIAM J. Sci. Comput. (2017), 39(5), pp. S248-S271. [SIAM SISC, Flagship Journal in Scientific Computing]

[PDF-1] [PDF-2] [CODE] [PPT] [Media] [Blog]

We have designed and implemented state-of-the-art, high-performance SVD software, PRIMME_SVDS, built on top of PRIMME for solving large-scale SVD problems. It can be used on a single node or a large cluster and supports computing both the largest and the smallest singular triplets to full accuracy. It provides C, Fortran, MATLAB, Octave, Python, and R interfaces to serve a broad class of users.

Free download at: https://github.com/primme/primme
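
A minimal usage sketch, assuming the primme Python bindings (pip install primme), whose svds call is modeled on scipy.sparse.linalg.svds; the exact keyword arguments may differ across versions:

import scipy.sparse as sp
import primme

A = sp.random(10000, 5000, density=1e-3, format='csr', random_state=0)   # example large sparse matrix
u, s, vt = primme.svds(A, k=5, which='LM', tol=1e-10)                    # 5 largest singular triplets
print(s)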