Joint European Conference on Machine Learning and Knowledge
Discovery in Databases (ECML PKDD), Würzburg, Germany, 2019.
Contact information: chengli.email@gmail.com
Note that my former school email (chengli@ccs.neu.edu) has been deactivated and should no longer be used.
[Paper] [Supplementary Material] [Slides] [Poster] [Code] [BibTex]
An extended version of this work is presented in Chapter 4 of my
PhD thesis.
A multi-label classifier assigns a set of labels to each data
object. A natural requirement in many end-use applications is that
the classifier also provides a well-calibrated confidence
(probability) to indicate the likelihood of the predicted set
being correct; for example, an application may automate
high-confidence predictions while manually verifying
low-confidence predictions. The simplest multi-label classifier,
called Binary Relevance (BR), applies one binary classifier to
each label independently and takes the product of the individual
label probabilities as the overall label-set probability
(confidence). Despite its many known drawbacks, such as generating
suboptimal predictions and poorly calibrated confidence scores, BR
is widely used in practice due to its speed and simplicity. We
seek in this work to improve both BR’s confidence estimation and
prediction through a post-hoc calibration and reranking procedure. We
take the BR predicted set of labels and its product score as
features, extract more features from the prediction itself to
capture label constraints, and apply Gradient Boosted Trees (GB)
as a calibrator to map these features into a calibrated confidence
score. GB not only produces well-calibrated scores (aligned with
accuracy and sharp), but also models label interactions,
correcting a critical flaw in BR. We further show that reranking
label sets by the new calibrated confidence yields set
predictions on par with state-of-the-art multi-label
classifiers, while remaining calibrated, simpler, and faster.
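The pipeline described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses synthetic per-label probabilities in place of real BR classifiers, a small hand-picked feature set (the BR product score, set cardinality, and an empty-set indicator) standing in for the paper's features, and scikit-learn's `GradientBoostingClassifier` as the GB calibrator. The calibrator is trained to predict whether the whole predicted set is correct, and reranking simply picks the candidate set with the highest calibrated confidence.

```python
import numpy as np
from itertools import product as iterprod
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy setting: 3 labels, with per-label marginals standing in for a BR model.
n, L = 400, 3
P = rng.uniform(0.05, 0.95, size=(n, L))        # BR marginal probabilities
Y = (rng.uniform(size=(n, L)) < P).astype(int)  # ground-truth label sets

def set_score(p, s):
    """BR confidence for set s: product of p_l if s_l=1, else (1 - p_l)."""
    return float(np.prod(np.where(s == 1, p, 1 - p)))

def features(p, s):
    """Calibrator features: BR product score plus simple set statistics
    (a stand-in for the label-constraint features in the paper)."""
    return [set_score(p, s), int(s.sum()), float(s.sum() == 0)]

# BR prediction: threshold each label independently at 0.5.
S = (P >= 0.5).astype(int)
X = np.array([features(P[i], S[i]) for i in range(n)])
correct = (S == Y).all(axis=1).astype(int)      # is the whole set right?

# GB calibrator: maps features to a calibrated set-level confidence.
gb = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
gb.fit(X[:300], correct[:300])

# Calibrated confidence for held-out BR predictions.
conf = gb.predict_proba(X[300:])[:, 1]

# Reranking: score every candidate label set, keep the best-calibrated one.
cands = np.array(list(iterprod([0, 1], repeat=L)))
def rerank(p):
    F = np.array([features(p, s) for s in cands])
    return cands[np.argmax(gb.predict_proba(F)[:, 1])]

best = rerank(P[300])
```

With only 3 labels, exhaustive enumeration of the 2^L candidate sets is feasible; for realistic label spaces the paper reranks a short list of high-scoring candidates rather than all sets.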