The BESS project explores interactions between machine learning and logic. In particular, we consider scenarios where the learning supervision is provided wholly or in part by logical constraints, also referred to as "symbolic supervision".
The project is led by Zsolt Zombori and includes Michael Benedikt, Wolfgang Gatterbauer, Agapi Rissaki, and Kristóf Szabó.
As a first step in the project, we explore Disjunctive Supervision (DS), which is closely related to Partial Label Learning (PLL). In both setups, each input is associated with a set of outputs rather than a single correct one, as in the classical case. The difference is that PLL assumes a single, unknown correct output corrupted by noise, while in DS any of the provided outputs is considered correct. This work is presented in the paper "Towards Unbiased Exploration in Partial Label Learning" (https://www.jmlr.org/papers/volume25/23-0868/23-0868.pdf). Below, we provide links to the code needed to reproduce the experimental results, as well as some supplementary material.
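As an illustration of how disjunctive supervision can be turned into a differentiable objective, the sketch below maximizes the total probability mass the model places on the allowed outputs. This is a minimal PyTorch sketch under our own assumptions: the function name, the 0/1 mask encoding of the allowed set, and the exact loss form are illustrative, not necessarily the implementation used in the paper.

    import torch

    def disjunctive_nll(logits: torch.Tensor, allowed: torch.Tensor) -> torch.Tensor:
        # logits:  [batch, num_classes] raw model outputs
        # allowed: [batch, num_classes] 0/1 mask marking the allowed outputs
        log_probs = torch.log_softmax(logits, dim=-1)
        # logsumexp over the masked log-probabilities equals
        # log(sum of probabilities of the allowed outputs)
        masked = log_probs.masked_fill(allowed == 0, float("-inf"))
        return -torch.logsumexp(masked, dim=-1).mean()

Minimizing this loss pushes probability mass onto the allowed set as a whole, without preferring any particular member of the set.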
All experiments related to rule learning (Tables 4 and 5 in Section 6.4 of the arXiv paper) are reproducible using the code in the public repository:
https://github.com/zsoltzombori/mapping
Code for reproducing the experiments on partial label learning with synthetic datasets, as well as with the CIFAR-10/100 datasets, is provided in the repository:
https://github.com/agapiR/partial-label-learning
In the body of the arXiv paper (Section 6.2) we present experiments with a small synthetic dataset and report average training accuracies over 1000 random network initializations in Figure 6. To give a qualitative comparison of the different loss functions on this dataset, we show 4 randomly selected training curves for each loss function. Each plot shows how the probabilities of the allowed outputs change during training; Total refers to the sum of all allowed probabilities. On this dataset, the only optimal output is o0, corresponding to the blue curve. In each caption, we give the number of runs in which learning assigned the highest probability to the optimal output. A sketch of how such curves can be recorded is given after the plots.
[Training curve plots for each loss function: Libra-loss, NLL-loss, 0.5-merit-loss, uniform-loss, RC-loss, lws-loss, Sag-loss]
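For concreteness, curves like these can be produced with a small tracking hook run after every training step. The sketch below is hypothetical (the function name, signature, and history structure are our own), but it shows how the probabilities of the allowed outputs and their sum (Total) can be recorded.

    import torch

    @torch.no_grad()
    def record_allowed_probs(model, x, allowed_idx, history):
        # x: a single input of shape [1, ...]; allowed_idx: indices of allowed outputs;
        # history: dict mapping curve names to lists of recorded probabilities
        probs = torch.softmax(model(x), dim=-1).squeeze(0)
        total = 0.0
        for i in allowed_idx:
            p = probs[i].item()
            history.setdefault(f"o{i}", []).append(p)  # curve for output o_i
            total += p
        history.setdefault("Total", []).append(total)  # sum of allowed probabilities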
In Subsection 6.3 we describe a model for synthetically adding distractors to a real dataset. Here we provide more detail.
Wen et al. (2021) introduce three PLL noise models for classification with 10 labels. The models are instance-independent, i.e., the noise depends only on the true label. Figure 8 presents results based on 5 such noise matrices: the first three are taken directly from Wen et al. (2021), and the last two are harder variants that we added. Each noise model is represented as a 10 x 10 matrix M, where M_ij is the probability of label j becoming a distractor given true label i. Below, we first illustrate the sampling process with a short sketch and then provide the 5 noise matrices.
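The sketch below (NumPy; the function name and interface are our assumptions) makes the sampling process explicit: the true label is always kept, and every other label j is added to the candidate set independently with probability M_ij.

    import numpy as np

    def sample_candidate_set(true_label, M, rng=None):
        # M: [num_labels, num_labels] noise matrix, where M[i, j] is the
        # probability that label j becomes a distractor given true label i
        rng = rng if rng is not None else np.random.default_rng()
        candidates = [true_label]  # the true label is always in the candidate set
        for j in range(M.shape[0]):
            if j != true_label and rng.random() < M[true_label, j]:
                candidates.append(j)  # label j is added as a distractor
        return candidates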