André F.T. Martins (Unbabel)
"AD3 and Sparsemax: Structured Inference for Natural Language Processing"
In the first part of the talk, I will present AD3, a new algorithm for approximate inference on factor graphs. AD3 has a modular architecture, where local subproblems are solved independently, and their solutions are gathered to compute a global update. I will show how to solve these AD3 subproblems for dense and structured factors, as well as factors imposing first-order logic constraints, and I will end by describing experiments on dependency parsing.
In the second part of the talk, I will propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, I will show how its Jacobian can be efficiently computed, enabling its use in a neural network trained with backpropagation. I will show promising empirical results in attention-based neural networks for natural language inference.
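As an illustration of the second part, the following is a minimal sketch of sparsemax, assuming its usual formulation as the Euclidean projection of the score vector onto the probability simplex, computed by sorting. The function names `sparsemax` and `sparsemax_jvp` are chosen here for illustration and are not taken from the talk. The second function assumes the Jacobian takes the form diag(s) - s sᵀ/|S|, where s is the indicator vector of the support S of the output; under that assumption a Jacobian-vector product needs only the support and one mean, so the full matrix is never built.

```python
def sparsemax(z):
    """Project the scores z onto the probability simplex (sort-based sketch)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # The support condition holds for a prefix of the sorted scores,
        # so the last k that satisfies it determines the threshold tau.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    # Scores at or below the threshold receive exactly zero probability.
    return [max(z_i - tau, 0.0) for z_i in z]


def sparsemax_jvp(p, v):
    """Multiply the sparsemax Jacobian at output p by the vector v.

    Assumes J = diag(s) - s s^T / |S|, with s the support indicator of p.
    J is symmetric, so the same routine serves as the backward pass.
    """
    support = [p_i > 0.0 for p_i in p]
    size = sum(support)
    # Mean of v restricted to the support.
    v_hat = sum(v_i for v_i, s in zip(v, support) if s) / size
    return [v_i - v_hat if s else 0.0 for v_i, s in zip(v, support)]


p = sparsemax([0.1, 1.1, 0.2])       # first entry falls outside the support
grad = sparsemax_jvp(p, [1.0, 2.0, 3.0])
```

Unlike softmax, which assigns strictly positive probability to every entry, this projection can return exact zeros, which is the sparsity property the abstract refers to.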