Notes

This page lists my technical research notes on concepts that I am interested in or believe are important for research and applications; more notes will be added over time.

Notes on Adaptive Online Learning

[Abstract] We discuss adaptive online learning, where the learning rate is scheduled adaptively. Specifically, we cover adaptive Follow-The-Regularized-Leader (FTRL) and give regret bounds for the general FTRL and FTRL-Proximal algorithms. We also discuss adaptive FTRL with an additional regularization term. This note supplements McMahan (2014), where proofs of some claims are not provided.
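
As a quick companion to this abstract, here is a minimal sketch of the per-coordinate FTRL-Proximal update with L1/L2 regularization in the style McMahan describes; the class name, hyperparameter defaults, and variable names are mine, not taken from the note.

```python
import numpy as np

class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal with L1/L2 regularization
    (hypothetical names; hyperparameters follow common conventions)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)  # accumulated adjusted gradients
        self.n = np.zeros(dim)  # accumulated squared gradients

    def weights(self):
        # Closed-form solution of the per-coordinate FTRL-Proximal
        # minimization: coordinates with |z_i| <= l1 are clipped to
        # exactly zero, which is what yields sparse models.
        w = -(self.z - np.sign(self.z) * self.l1) / (
            (self.beta + np.sqrt(self.n)) / self.alpha + self.l2)
        w[np.abs(self.z) <= self.l1] = 0.0
        return w

    def update(self, g):
        # sigma encodes the proximal term so that the effective
        # per-coordinate learning rate is alpha / (beta + sqrt(n_i)).
        w = self.weights()
        sigma = (np.sqrt(self.n + g * g) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g
```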

Aug. 18th, 2016

Notes on Limited Memory BFGS

[Abstract] We first discuss the optimization problem underlying limited-memory BFGS (L-BFGS), then explain the two-loop recursion, and finally give the full L-BFGS algorithm.
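
For reference, a minimal sketch of the two-loop recursion mentioned above, which computes the product of the approximate inverse Hessian with the gradient from the stored curvature pairs; function and variable names are mine.

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """Compute the L-BFGS direction H_k @ grad from the m most recent
    pairs s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i (oldest first)."""
    q = grad.copy()
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * s.dot(q)
        alphas.append(alpha)
        q -= alpha * y
    # Standard initial scaling H_0 = gamma * I from the most recent pair.
    s, y = s_list[-1], y_list[-1]
    gamma = s.dot(y) / y.dot(y)
    r = gamma * q
    # Second loop: oldest pair to newest.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return r  # the search direction is -r
```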

Aug. 15th, 2016

Notes on RNN, LSTM, and GRU

[Abstract] In this note we discuss the architecture design, parameter estimation, and algorithms of traditional recurrent neural networks (RNNs), long short-term memory (LSTM), and the gated recurrent unit (GRU). Once back-propagation is done, adaptive online learning methods can be applied to update the model parameters, which we discuss in detail in another note.
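
As a small illustration of the gated architectures the note covers, here is a sketch of a single GRU forward step in the standard formulation; the parameter dictionary layout and names are my assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, params):
    """One GRU forward step. params holds input weights W_*, recurrent
    weights U_*, and biases b_* for the update gate z, the reset gate r,
    and the candidate state."""
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h_prev + params["b_z"])
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])
    h_tilde = np.tanh(params["W_h"] @ x
                      + params["U_h"] @ (r * h_prev) + params["b_h"])
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_tilde
```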

Aug. 7th, 2016

Notes on Expectation Maximization

[Abstract] In this note, we explain the EM algorithm and show how to design an auxiliary function (which is also a lower bound of the log-likelihood function w.r.t. the model parameters) so that each iteration increases the log-likelihood.
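
The key step behind the auxiliary function is the standard Jensen's-inequality lower bound, which can be written as follows (a textbook derivation, not quoted from the note):

```latex
% For any distribution q(z) over the latent variables,
\log p(x \mid \theta)
  = \log \sum_z q(z)\,\frac{p(x, z \mid \theta)}{q(z)}
  \;\ge\; \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)}
  \;=:\; \mathcal{L}(q, \theta).
% The E-step sets q(z) = p(z \mid x, \theta_t), making the bound tight
% at \theta_t; the M-step maximizes \mathcal{L} over \theta, so the
% log-likelihood cannot decrease from one iteration to the next.
```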

July 12th, 2016

Notes on Semi-continuity

[Abstract] Semi-continuity is an important property that proves useful for many more advanced concepts. For example, in convex analysis, the closure of a proper convex function f is the greatest lower semi-continuous function majorized by f (also called the lower semi-continuous hull of f). In measure theory and probability theory, semi-continuity is used in the study of weak convergence of measures.
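
For concreteness, the standard definition referenced above (stated here from common usage, not quoted from the note):

```latex
% A function f : X \to [-\infty, +\infty] is lower semi-continuous at
% x_0 if
\liminf_{x \to x_0} f(x) \;\ge\; f(x_0);
% f is lower semi-continuous everywhere if and only if its epigraph
% \{(x, t) : f(x) \le t\} is closed. The lower semi-continuous hull
% \operatorname{cl} f is then the greatest lsc function majorized by f.
```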

Sep. 22nd, 2013

Notes on Conditional Random Fields

[Abstract] We give a detailed derivation of the basic conditional random field (CRF) and discuss maximum conditional log-likelihood parameter estimation. The conditional independence property of CRFs is proved, and based on it the Viterbi algorithm for CRFs is designed to predict the single best label sequence for an observation sequence. We also cover the technical details of dynamic programming for CRFs, which ensures that the gradient of the conditional log-likelihood on the training data can be computed efficiently. Scaling, which is essential for implementing the inference and prediction procedures of CRFs, is also studied with full mathematical derivations. An improved Viterbi algorithm using logarithms with scaling is then given, followed by comments and discussion on implementation issues for the basic CRF.
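
As a companion to the Viterbi discussion, here is a minimal log-space Viterbi sketch for a linear-chain CRF; the potential-tensor layout (initial scores stored in the first slice) is an assumption made for this sketch, not the note's notation.

```python
import numpy as np

def viterbi_log(log_psi):
    """Viterbi decoding in log space for a linear-chain CRF.
    log_psi[t, i, j] is the log-potential of moving from label i at
    position t-1 to label j at position t; log_psi[0, 0, :] holds the
    initial scores (an assumed layout). Working with logarithms avoids
    the numerical underflow that scaling addresses in probability space."""
    T, K, _ = log_psi.shape
    delta = log_psi[0, 0, :].copy()      # best score ending in each label
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_psi[t]  # K x K score matrix
        back[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0)
    # Trace back the single best label sequence.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))
```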

Feb. 20th, 2013

Notes on Hidden Markov Models

[Abstract] In this note, we give a detailed derivation of the basic hidden Markov model (HMM) and comment on its technical subtleties. The inference algorithm listed at the end is succinct and easy to implement. Dynamic programming methods, namely the Viterbi algorithm and the general EM algorithm, are discussed as well. Scaling and the Viterbi algorithm using logarithms are covered at the end; both are essential for implementing the inference procedure of HMMs. We hope this note will be helpful for researchers and engineers in both academia and industry.
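
To illustrate the scaling issue the abstract mentions, here is a sketch of the scaled forward recursion for a discrete HMM; the function signature and argument names are mine.

```python
import numpy as np

def forward_scaled(pi, A, B, obs):
    """Scaled forward algorithm for a discrete HMM.
    pi: initial distribution (N,), A: transition matrix (N, N),
    B: emission matrix (N, M), obs: observation indices (T,).
    Each alpha_t is renormalized to sum to one; the log-likelihood is
    recovered as the sum of log scaling factors, avoiding the underflow
    that plagues the unscaled recursion on long sequences."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha /= c
    log_lik = np.log(c)
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        alpha /= c
        log_lik += np.log(c)
    return log_lik
```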

Feb. 15th, 2013

The Equivalence of Logistic Regression and Maximum Entropy Modeling

[Abstract] In this technical note, we show the equivalence of maximum entropy modeling and logistic regression. The equivalence rests on the fact that the optimization problem of logistic regression, maximizing the log-likelihood of the model parameters given the exponential form of the posterior probability functions, is exactly the dual problem of maximum entropy modeling. It is the maximum likelihood estimation (MLE) technique that bridges maximum entropy modeling and logistic regression.
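
For concreteness, the exponential-form posterior and the MLE objective referred to above can be written as follows (standard notation with a feature map phi, not quoted from the note):

```latex
% Multiclass logistic regression posits the exponential-form posterior
p(y \mid x; w) = \frac{\exp\!\big(w^\top \phi(x, y)\big)}
                      {\sum_{y'} \exp\!\big(w^\top \phi(x, y')\big)},
% and fits w by maximizing the conditional log-likelihood
\max_w \; \sum_{i=1}^{n} \log p(y_i \mid x_i; w),
% which is the dual of the maximum entropy program subject to
% empirical feature-matching constraints.
```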

Oct. 29th, 2012