Long Document Classification

  • Joint work with an industry partner on a commercial product, aimed at reducing the high false-positive rate (FPR) of a system that relied only on keyword matching.

  • Weighed context information against computational efficiency, which led to framing the task as a reinforcement learning problem: selecting the most representative sentences in a document.

  • Implemented a two-stage approach combining coarse-grained keyword matching with a fine-grained context-based model, significantly reducing the FPR while maintaining high recall. The model was delivered in Docker and served via a RESTful API.
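The two-stage idea can be sketched as follows. This is a minimal illustration, not the production system: the keyword set, the scoring function, and the threshold are all hypothetical placeholders, and the real fine-grained stage is a learned context-based model rather than a hand-written rule.

```python
# Illustrative two-stage classifier: cheap coarse screen, expensive fine model.
# KEYWORDS and fine_context_score are placeholders, not the real components.

KEYWORDS = {"refund", "chargeback", "dispute"}  # hypothetical coarse-stage keywords


def coarse_keyword_match(document: str) -> bool:
    """Stage 1: keyword screen. Cheap and high-recall, but high FPR on its own."""
    tokens = set(document.lower().split())
    return bool(KEYWORDS & tokens)


def fine_context_score(document: str) -> float:
    """Stage 2 stand-in: the learned context-based model would score here."""
    # In the real system this is the model served behind the RESTful API.
    return 0.9 if "dispute" in document.lower() else 0.1


def classify(document: str, threshold: float = 0.5) -> bool:
    # Only documents flagged by the coarse stage pay the cost of the fine stage.
    if not coarse_keyword_match(document):
        return False
    return fine_context_score(document) >= threshold
```

The design point is that the expensive context model only runs on the small fraction of documents the keyword stage flags, which is what cuts the FPR without sacrificing the keyword stage's recall.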

Since our model uses a hard attention mechanism to selectively extract key words or sentences from a long document, it can be modeled as a typical Partially Observable Markov Decision Process (POMDP) with a reward strategy.

At each step t the Controller receives an observation from the environment, the glimpsed feature g_t, and then executes an action a_t: emitting the next glimpse location loc_{t+1} via the location network. Once the action is taken, the Controller receives a new observation g_{t+1} and a reward signal r_{t+1}. The goal of the Controller is to maximize the cumulative reward R = \sum_{t=1}^{T} r_t. In this work, a positive reward is given only if the classification result is correct after T steps: r_T = 1 for a correct prediction, and r_t = 0 for t = 1, ..., T − 1 (with r_T = 0 on an incorrect prediction).
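The loop above can be sketched with REINFORCE, the standard policy-gradient method for this kind of terminal-reward POMDP. This is a toy, not the actual system: the document is reduced to sentence indices, the "classifier" is a stand-in that succeeds iff the one informative sentence (index 0) was glimpsed, and the Controller is a plain softmax policy over locations with hypothetical sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): 10 sentences per document, T = 3 glimpse steps.
NUM_SENTENCES, T = 10, 3
theta = np.zeros(NUM_SENTENCES)  # logits of the location policy (the "Controller")


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def rollout():
    """Sample T glimpse locations, then collect the single terminal reward r_T."""
    locs, grads = [], []
    for _ in range(T):
        p = softmax(theta)
        loc = rng.choice(NUM_SENTENCES, p=p)   # a_t: emit the next glimpse location
        locs.append(loc)
        grads.append(np.eye(NUM_SENTENCES)[loc] - p)  # grad of log pi(loc | theta)
    # Stand-in classifier: correct iff the informative sentence (index 0) was seen.
    R = 1.0 if 0 in locs else 0.0  # r_T = 1 only on a correct prediction
    return R, grads


lr = 0.5
for _ in range(200):
    R, grads = rollout()
    for g in grads:                 # REINFORCE update: theta += lr * R * grad log pi
        theta += lr * R * g
```

After training, the policy concentrates its probability mass on the informative location, illustrating how a sparse, delayed reward (only r_T is ever nonzero) can still shape where the Controller glimpses.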