Budgeted Learning
Learning tasks typically begin with a data sample --- e.g., symptoms and test results for a set of patients, together with their clinical outcomes. By contrast, many real-world studies begin with no actual data, but instead with a budget --- funds that can be used to collect the relevant information. For example, one study has allocated $30 thousand to develop a system for diagnosing cancer, based on a battery of patient tests, each with its own (known) cost and (unknown) discriminative power. Given our goal of identifying the most accurate classifier, what is the best way to spend the $30 thousand? Should we indiscriminately run every test on every patient until the budget is exhausted? Or should we selectively, and dynamically, determine which tests to run on which patients? We call this task budgeted learning.
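At its core, the task is a spending loop: while budget remains, a policy picks which (patient, test) pair to purchase next, pays that test's cost, and records the result. A minimal sketch of this loop, where the function names and the indiscriminate round-robin policy are purely illustrative (not the algorithms from the papers below):

```python
def budgeted_purchases(budget, test_costs, choose, n_patients):
    """Spend a fixed budget by repeatedly asking a policy which test to
    run on which patient; stop when the budget (or the data) runs out."""
    spent = 0.0
    observed = set()      # (patient, test) pairs already purchased
    purchases = []        # purchase order, for inspection
    while True:
        pick = choose(observed, n_patients, test_costs)
        if pick is None:                      # nothing left to buy
            break
        patient, test = pick
        if spent + test_costs[test] > budget:  # cannot afford this test
            break
        spent += test_costs[test]
        observed.add(pick)
        purchases.append(pick)
    return purchases, spent

def round_robin(observed, n_patients, test_costs):
    """The 'indiscriminate' policy: buy every test on every patient,
    in a fixed order, until the budget is gone."""
    for p in range(n_patients):
        for t in range(len(test_costs)):
            if (p, t) not in observed:
                return (p, t)
    return None
```

A selective policy would replace `round_robin` with one that chooses the next purchase dynamically, based on what the purchases so far revealed.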
Our initial work on this task studied the theoretical foundations of budgeted learning and proved several important results, such as NP-hardness. Our subsequent work builds on this theory, providing algorithms for budgeted learning of a passive (Naive Bayes) classifier. Finally, our most recent extensions consider both learning and classifying under a budget, and thus budgeted-learn a bounded active classifier. As budgeted learning is a sequential decision problem, we also provide empirical results demonstrating that the obvious Reinforcement Learning techniques do not perform particularly well on this high-dimensional, complex task, and are typically bested by our simpler, heuristic policies. A list of all these works, together with additional experiments, is given below. (See also Open Problems.)
SEE ALSO:
Other Work on Active Learning, especially Active Learning Literature Survey (B Settles, Jan 2009)
Optimistic Active Learning using Mutual Information (related especially to 1, 2)
Learning to Segment from a Few Well-Selected Training Images (related especially to 1, 2)
Learning Cost-Sensitive Active Classifiers (related especially to 4, 7)
Other results (by Kamesh Munagala, Ashish Goel, Sudipto Guha), especially:
Asking the right questions: Model-driven optimization using probes, Kamesh Munagala, Ashish Goel and Sudipto Guha, PODS 2006
Approximation algorithms for budgeted learning problems, Sudipto Guha, Kamesh Munagala, STOC 2007
The Ratio Index for Budgeted Learning, with Applications, Ashish Goel, Sanjeev Khanna, Brad Null, SODA 2009
BUDGETED LEARNING OF NAIVE BAYES CLASSIFIERS
EMPIRICAL RESULTS
Synthesized data with a single discriminative feature among 10 features: p(+f|+c) = 0.95, p(+f|-c) = 0.05
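This synthetic setting is easy to regenerate. The following sketch assumes binary features and a uniform class prior, neither of which is specified above:

```python
import random

def make_synthetic(n, rng=None):
    """Generate n instances with 10 binary features, where only feature 0
    is discriminative: p(+f|+c) = 0.95, p(+f|-c) = 0.05.  The other nine
    features fire with probability 0.5 regardless of class."""
    rng = rng or random.Random(1)
    data = []
    for _ in range(n):
        c = rng.random() < 0.5                     # uniform class prior (assumption)
        f0 = rng.random() < (0.95 if c else 0.05)  # the one informative feature
        rest = [rng.random() < 0.5 for _ in range(9)]
        data.append(([f0] + rest, c))
    return data
```

Any budgeted-learning policy worth its salt should spend most of its queries on feature 0 in this setting, since the remaining nine features carry no class information.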
Performance on several UCI datasets. Relevant information, such as the class distribution and the discriminative power of each individual feature, can be obtained by clicking on the dataset name. Plots of performance are shown for one or more budgets amounting to fewer than 30 queries per feature on average:
KR-VS-KP (Chess Domain)
LEARNING AND CLASSIFYING UNDER HARD BUDGETS
Utility-Based Data Mining Workshop, KDD 2005