Proceedings of NAACL-HLT 2019
    
We present an adaptation of RNN sequence models to the problem of multi-label classification for text, where the target is a set of labels, not a sequence. Previous such RNN models define probabilities for sequences but not for sets; attempts to obtain a set probability are afterthoughts of the network design, including pre-specifying the label order, or relating the sequence probability to the set probability in ad hoc ways.
Our formulation is derived from a principled notion of set probability, defined as the sum of probabilities of the corresponding permutation sequences for the set. We provide a new training objective that maximizes this set probability, and a new prediction objective that finds the most probable set on a test document. These new objectives are theoretically appealing because they give the RNN model freedom to discover the best label order, which is often the natural one (though it may differ across documents).
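The set-probability definition above can be sketched concretely. The snippet below is an illustrative toy, not the paper's implementation: it assumes a sequence model given as a table of conditional probabilities `P(label | prefix)` (the hypothetical `cond_probs` mapping), applies the chain rule to score each label ordering, and sums over all permutations of the set.

```python
from itertools import permutations

def seq_prob(seq, cond_probs):
    """Chain-rule probability of one label sequence.
    cond_probs maps (prefix_tuple, label) -> P(label | prefix)."""
    p, prefix = 1.0, ()
    for label in seq:
        p *= cond_probs[(prefix, label)]
        prefix += (label,)
    return p

def set_prob(label_set, cond_probs):
    # Set probability = sum of the probabilities of all
    # permutation sequences of the label set.
    return sum(seq_prob(perm, cond_probs)
               for perm in permutations(label_set))
```

For a two-label set {a, b}, this sums the probabilities of the sequences (a, b) and (b, a); in practice the number of permutations grows factorially, which is why the paper develops efficient procedures rather than exhaustive enumeration.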
We develop efficient procedures to tackle the computational difficulties involved in training and prediction. Experiments on benchmark datasets demonstrate that we outperform state-of-the-art methods for this task.