Budgeted Learning
Learning tasks typically begin with a data sample --- e.g., symptoms and test results for a set of patients, together with their clinical outcomes. By contrast, many real-world studies begin with no actual data, but instead with a budget --- funds that can be used to collect the relevant information. For example, one study has allocated $30 thousand to develop a system for diagnosing cancer, based on a battery of patient tests, each with its own (known) cost and (unknown) discriminative power. Given our goal of identifying the most accurate classifier, what is the best way to spend the $30 thousand? Should we indiscriminately run every test on every patient until the budget is exhausted? Or should we selectively, and dynamically, determine which tests to run on which patients? We call this task budgeted learning.
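At its core, the task is a spending loop: while budget remains, a policy picks which (patient, test) pair to purchase next, pays that test's cost, and records the result. A minimal sketch of this loop, where the function names and the indiscriminate round-robin policy are purely illustrative (not the algorithms from the papers below):

```python
def budgeted_purchases(budget, test_costs, choose, n_patients):
    """Spend a fixed budget by repeatedly asking a policy which test to
    run on which patient; stop when the budget (or the data) runs out."""
    spent = 0.0
    observed = set()      # (patient, test) pairs already purchased
    purchases = []        # purchase order, for inspection
    while True:
        pick = choose(observed, n_patients, test_costs)
        if pick is None:                      # nothing left to buy
            break
        patient, test = pick
        if spent + test_costs[test] > budget:  # cannot afford this test
            break
        spent += test_costs[test]
        observed.add(pick)
        purchases.append(pick)
    return purchases, spent

def round_robin(observed, n_patients, test_costs):
    """The 'indiscriminate' policy: buy every test on every patient,
    in a fixed order, until the budget is gone."""
    for p in range(n_patients):
        for t in range(len(test_costs)):
            if (p, t) not in observed:
                return (p, t)
    return None
```

A selective policy would replace `round_robin` with one that chooses the next purchase dynamically, based on what the purchases so far revealed.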
Our initial work on this task studied the theoretical foundations of budgeted learning and proved several important results, such as NP-hardness. Our subsequent work builds on this theory, providing algorithms for budgeted learning of a passive (Naive Bayes) classifier. Finally, our most recent extensions consider both learning and classifying under a budget, and thus budgeted-learn a bounded active classifier. As budgeted learning is a sequential decision problem, we also provide empirical results demonstrating that the obvious Reinforcement Learning techniques do not perform particularly well on this high-dimensional, complex task, and are typically bested by our simpler, heuristic policies. A list of all these works, together with additional experiments, is given below. (See also Open Problems.)
SEE ALSO:
Other Work on Active Learning, especially Active Learning Literature Survey (B Settles, Jan 2009)
Optimistic Active Learning using Mutual Information (related especially to 1, 2)
Learning to Segment from a Few Well-Selected Training Images (related especially to 1, 2)
Learning Cost-Sensitive Active Classifiers (related especially to 4, 7)
Other results (by Kamesh Munagala, Ashish Goel, Sudipto Guha), especially:
Asking the right questions: Model-driven optimization using probes, Kamesh Munagala, Ashish Goel and Sudipto Guha, PODS 2006
Approximation algorithms for budgeted learning problems, Sudipto Guha, Kamesh Munagala, STOC 2007
The Ratio Index for Budgeted Learning, with Applications, Ashish Goel, Sanjeev Khanna, Brad Null, SODA 2009
BUDGETED LEARNING OF NAIVE BAYES CLASSIFIERS
EMPIRICAL RESULTS
Synthesized data with a single discriminative feature among 10 features: p(+f|+c) = 0.95, p(+f|-c) = 0.05
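This synthetic setting is easy to regenerate. The following sketch assumes binary features and a uniform class prior, neither of which is specified above:

```python
import random

def make_synthetic(n, rng=None):
    """Generate n instances with 10 binary features, where only feature 0
    is discriminative: p(+f|+c) = 0.95, p(+f|-c) = 0.05.  The other nine
    features fire with probability 0.5 regardless of class."""
    rng = rng or random.Random(1)
    data = []
    for _ in range(n):
        c = rng.random() < 0.5                     # uniform class prior (assumption)
        f0 = rng.random() < (0.95 if c else 0.05)  # the one informative feature
        rest = [rng.random() < 0.5 for _ in range(9)]
        data.append(([f0] + rest, c))
    return data
```

Any budgeted-learning policy worth its salt should spend most of its queries on feature 0 in this setting, since the remaining nine features carry no class information.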
Performance on several UCI datasets. Relevant information, such as the class distribution and the discriminative power of each individual feature, can be obtained by clicking on the dataset name. Plots of performance are shown for one or more budgets amounting to fewer than 30 queries per feature on average:
KR-VS-KP (Chess Domain)
LEARNING AND CLASSIFYING UNDER HARD BUDGETS
Utility-Based Data Mining Workshop, KDD 2005