Results Related To Belief Nets

Many of today's artificial intelligence tools inherently involve probabilistic reasoning. Bayesian Belief Nets have become the standard tool for these applications. Much of our current work provides effective ways to learn these systems from data, finding both the structure/model and the parameters that are best, where "best" can be generative (the best model of the world) or discriminative (the best classifier).

We have also developed a number of other systems, described in the publications below.


BRETT POULIN, ROMAN EISNER, DUANE SZAFRON, PAUL LU, RUSSELL GREINER, DAVID WISHART, ALONA FYSHE, BRANDON PEARCY, CAM MACDONNELL, JOHN ANVIK

Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naive Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier's decisions and of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.

Appears in IAAI'06

See also Explaining Naive Bayes Classifications.
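As a rough illustration of the additive-evidence idea behind ExplainD (a sketch of ours, not the Proteome Analyst implementation), the decision score of any linear classifier decomposes into one evidence term per feature, and it is exactly these per-feature terms that an ExplainD-style display visualizes:

```python
# Minimal sketch: per-feature evidence terms of a linear classifier.
# Weights and data are made up for illustration.
import numpy as np

def evidence_terms(weights, bias, x):
    """Return each feature's additive contribution w_i * x_i and the total score."""
    contributions = weights * x            # one evidence term per feature
    return contributions, contributions.sum() + bias

weights = np.array([1.2, -0.7, 0.3])       # e.g., from a linear SVM or logistic regression
x = np.array([1.0, 2.0, 0.0])
contribs, score = evidence_terms(weights, bias=-0.1, x=x)
for i, c in enumerate(contribs):
    print(f"feature {i}: evidence {c:+.2f}")
print(f"total score {score:+.2f} -> {'positive' if score > 0 else 'negative'} class")
```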

YUHONG GUO, RUSSELL GREINER

AAAI, Pittsburgh, 2005

Bayesian belief nets (BNs) are often used for classification tasks, typically to return the most likely class label for a specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function --- viz., likelihood, rather than classification accuracy --- typically by first using some model selection criterion to identify an appropriate graphical structure, then finding good parameters for that structure. This paper considers a number of possible criteria for selecting the best structure, both generative (i.e., based on likelihood: BIC, BDe) and discriminative (i.e., Conditional BIC (CBIC), resubstitution Classification Error (CE) and Bias^2+Variance (BV)). We empirically compare these criteria against a variety of different correct BN structures, both real-world and synthetic, over a range of complexities. We also explore different ways to set the parameters, dealing with two issues: (1) Should we seek the parameters that maximize likelihood, versus the ones that maximize conditional likelihood? (2) Should we use (i) the entire training sample both to learn the best parameters and to evaluate the models, versus (ii) one partition for parameter estimation and another partition for evaluation (cross-validation)? Our results show that the discriminative BV model selection criterion is one of the best measures for identifying the optimal structure, while the discriminative CBIC performs poorly; that we should use the parameters that maximize likelihood; and that it is typically better to use cross-validation here.

Read the paper here.

This paper describes the challenge of learning a belief net structure that will produce a good classifier. Unfortunately, due to space limitations, the paper could only include some of our empirical data; this appendix provides a more comprehensive set of results. (Also in postscript.)
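For concreteness, here is a minimal sketch of one of the generative criteria compared in this paper: the BIC score of a discrete BN structure, computed from complete data with maximum-likelihood CPT estimates. The encoding of structures and all names below are ours, not the paper's:

```python
# BIC(G) = log-likelihood of the data under ML parameters for structure G,
# minus (#free parameters / 2) * log N.  Assumes complete discrete data.
import numpy as np
from itertools import product

def bic_score(data, structure, arities):
    """data: (N, n) int array; structure: {node: tuple_of_parents};
    arities[i]: number of states of variable i."""
    N = data.shape[0]
    loglik, n_params = 0.0, 0
    for node, parents in structure.items():
        r = arities[node]
        n_params += (r - 1) * int(np.prod([arities[p] for p in parents], dtype=int))
        for pa in product(*[range(arities[p]) for p in parents]):
            mask = np.ones(N, dtype=bool)
            for p, v in zip(parents, pa):
                mask &= data[:, p] == v
            counts = np.bincount(data[mask, node], minlength=r)
            if counts.sum():               # ML estimates for this parent configuration
                nz = counts > 0
                loglik += (counts[nz] * np.log(counts[nz] / counts.sum())).sum()
    return loglik - 0.5 * n_params * np.log(N)

# Toy usage: score the structure X0 -> X1, X0 -> X2 on random binary data.
data = np.random.randint(0, 2, size=(500, 3))
print(bic_score(data, {0: (), 1: (0,), 2: (0,)}, arities=[2, 2, 2]))
```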


RUSSELL GREINER, XIAOYUAN SU, BIN SHEN, WEI ZHOU

Machine Learning Journal, 59:3, 2005

Bayesian belief nets (BNs) are often used for classification tasks --- typically to return the most likely class label for each specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function --- viz., likelihood, rather than classification accuracy --- typically by first learning an appropriate graphical structure, then finding the parameters for that structure that maximize the likelihood of the data. As these parameters may not maximize the classification accuracy, ``discriminative parameter learners'' follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, shows how it extends standard logistic regression, and analyzes its inherent sample and computational complexity. We then present a general algorithm for this task, ELR, that applies to arbitrary BN structures and that works effectively even when given incomplete training data. Unfortunately, ELR is not guaranteed to find the parameters that optimize conditional likelihood; moreover, even the optimal-CL parameters need not have minimal classification error. This paper therefore presents empirical evidence that ELR produces effective classifiers, often superior to the ones produced by the standard ``generative'' algorithms, especially in common situations where the given BN-structure is incorrect.

See the Empirical data used in our experiments here.

Auxiliary information is available on the ELR webpage.
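To make the discriminative objective concrete: for a fixed naive Bayes structure with a binary class, maximizing conditional likelihood reduces to logistic regression over one-hot encoded features. The sketch below fits that reduced problem by gradient ascent on the conditional log-likelihood; it illustrates the objective only, and is not the ELR algorithm itself, which handles arbitrary structures and incomplete data:

```python
# Gradient ascent on conditional log-likelihood (CLL) for a naive Bayes
# structure, via the equivalent logistic regression over one-hot features.
import numpy as np

def fit_cl_parameters(X, y, arities, lr=0.5, epochs=500):
    """X: (N, n) discrete features; y: (N,) binary labels; arities[i]: #states."""
    N, n = X.shape
    offsets = np.concatenate(([0], np.cumsum(arities)[:-1]))
    H = np.zeros((N, int(sum(arities))))
    H[np.arange(N)[:, None], offsets + X] = 1.0    # one-hot encode each feature
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))     # current P(y=1 | x)
        w += lr * H.T @ (y - p) / N                # CLL gradient step
        b += lr * np.mean(y - p)
    return w, b                                    # play the role of log-CPT ratios

# Toy usage: the label depends only on the first feature.
X = np.random.randint(0, 3, size=(300, 4))
y = (X[:, 0] == 2).astype(float)
w, b = fit_cl_parameters(X, y, arities=[3, 3, 3, 3])
```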

JACK NEWTON, RUSSELL GREINER

This paper applies Probabilistic Relational Models (PRMs) to the Collaborative Filtering task, focusing on the EachMovie data set. We first learn a standard PRM, and show that its performance is competitive with the best known techniques. We then define a hierarchical PRM, which extends standard PRMs by dynamically refining classes into hierarchies, improving both the expressiveness and the context sensitivity of the PRM. Finally, we show that hierarchical PRMs achieve state-of-the-art results on this dataset.

JIE CHENG, RUSSELL GREINER, JONATHAN KELLY, DAVID BELL, WEIRU LIU

Artificial Intelligence Journal, 137:1-2, pp. 43-90, 2002

This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn belief networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.
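The basic primitive these algorithms rely on is the conditional independence test. A rough counts-based version (our simplification; the paper's tests and thresholds differ) estimates the conditional mutual information I(X; Y | Z) and declares independence when it falls below a small threshold:

```python
# Empirical conditional mutual information for discrete variables.
import numpy as np
from collections import Counter

def cond_mutual_info(x, y, z):
    """I(X; Y | Z) in nats, estimated from three parallel sequences."""
    N = len(x)
    nxyz = Counter(zip(x, y, z))
    nxz, nyz, nz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    mi = 0.0
    for (xv, yv, zv), n in nxyz.items():
        # the count ratio below equals p(x,y,z) p(z) / (p(x,z) p(y,z))
        mi += (n / N) * np.log(n * nz[zv] / (nxz[(xv, zv)] * nyz[(yv, zv)]))
    return mi

# Toy usage: X and Y depend on each other only through Z, so I(X; Y | Z) ~ 0.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, 2000)
x = (z + rng.integers(0, 2, 2000)) % 2
y = (z + rng.integers(0, 2, 2000)) % 2
print(cond_mutual_info(x, y, z))          # near zero: accept X independent of Y given Z
```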

BIN SHEN, XIAOYUAN SU, RUSSELL GREINER, PETR MUSILEK, CORRINE CHENG

Proceedings of the Fifteenth IEEE International Conference on Tools with Artificial Intelligence (ICTAI-03), Sacramento, November 2003

Greiner and Zhou [1] presented ELR, a discriminative parameter-learning algorithm that maximizes conditional likelihood (CL) for a fixed Bayesian Belief Network (BN) structure, and demonstrated that it often produces classifiers that are more accurate than the ones produced using the generative approach (OFE), which finds maximal likelihood parameters. This is especially true when learning parameters for incorrect structures, such as Naïve Bayes (NB). In searching for algorithms to learn better BN classifiers, this paper uses ELR to learn parameters of more nearly correct BN structures, e.g., of a general Bayesian network (GBN) produced by a structure-learning algorithm [2]. While OFE typically produces more accurate classifiers with GBN (vs. NB), we show that ELR does not, when the training data is not sufficient for the GBN structure learner to produce a good model. Our empirical studies also suggest that the better the BN structure is, the less advantage ELR has over OFE for classification purposes. ELR learning on NB (i.e., with little structural knowledge) still performs about the same as OFE on GBN in classification accuracy, over a large number of standard benchmark datasets.

See the Empirical data used in our experiments here.

RUSSELL GREINER, WEI ZHOU

Bayesian belief nets (BNs) are often used for classification tasks --- typically to return the most likely ``class label'' for each instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function (viz., likelihood, rather than classification accuracy), typically by first learning an appropriate graphical structure, then finding the maximal likelihood parameters for that structure. As these parameters may not maximize the classification accuracy, ``discriminative learners'' follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, and shows how it relates to logistic regression, which corresponds to finding the optimal CL parameters for a naive Bayes structure. After analyzing its inherent (sample and computational) complexity, we then present a general algorithm for this task, ELR, which applies to arbitrary BN structures and which works effectively even when the training data is incomplete. We analyze this approach, presenting empirical evidence that it works better than the standard ``generative'' approach in a variety of situations, especially in the common situation where the given BN-structure is incorrect.

See the Empirical data used in our experiments here.

See also: Webpage, Poster
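For reference, the two objectives in play, in standard notation (labels c_j, evidence e_j, parameters theta; the notation is ours, not quoted from the paper), together with the naive Bayes log-odds identity that links optimal-CL parameter learning to logistic regression:

```latex
\mathrm{LL}(\theta)  = \sum_{j=1}^{m} \log P_\theta(c_j, e_j)
  \quad\text{(generative)}
\qquad
\mathrm{LCL}(\theta) = \sum_{j=1}^{m} \log P_\theta(c_j \mid e_j)
  \quad\text{(discriminative)}

\log \frac{P_\theta(c{=}1 \mid e)}{P_\theta(c{=}0 \mid e)}
  = \log \frac{P_\theta(c{=}1)}{P_\theta(c{=}0)}
  + \sum_i \log \frac{P_\theta(e_i \mid c{=}1)}{P_\theta(e_i \mid c{=}0)}
```

The last expression is linear in indicator features of the e_i, which is exactly the form of a logistic regression.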

TIM VAN ALLEN, RUSSELL GREINER, PETER HOOPER

Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle, August 2001

A Bayesian Belief Network (BN) is a model of a joint distribution over a finite set of variables, with a DAG structure to represent the immediate dependencies between the variables, and a set of parameters (aka CPTables) to represent the local conditional probabilities of a node, given each assignment to its parents. In many situations, the parameters are themselves treated as random variables --- reflecting the uncertainty remaining after drawing on knowledge of domain experts and/or observing data generated by the network. A distribution over the CPTable parameters induces a distribution for the response the BN will return to any ``What is P(H | E)?'' query. This paper investigates the distribution of this response, shows that it is asymptotically normal, and derives closed-form expressions for its mean and asymptotic variance. We show that this computation has the same complexity as simply computing the (mean value of the) response --- i.e., O(n * exp(w)), where n is the number of variables and w is the effective tree width. We also provide empirical evidence showing that the error-bars computed from our estimates are fairly accurate in practice, over a wide range of belief net structures and queries.
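A quick simulation of this setting (a two-node net A -> B with Dirichlet-distributed CPT rows; all pseudo-counts below are invented for illustration) shows the induced distribution over one query response, whose mean and asymptotic variance the paper computes in closed form:

```python
# Sample CPT parameters from Dirichlet priors and observe the induced
# spread of the query response P(A=1 | B=1); it is approximately normal.
import numpy as np

rng = np.random.default_rng(0)
samples = []
for _ in range(10000):
    pA = rng.dirichlet([20, 10])          # P(A)
    pB_A0 = rng.dirichlet([5, 15])        # P(B | A=0)
    pB_A1 = rng.dirichlet([12, 8])        # P(B | A=1)
    num = pA[1] * pB_A1[1]                # Bayes rule for P(A=1 | B=1)
    samples.append(num / (num + pA[0] * pB_A0[1]))

samples = np.array(samples)
print(f"mean {samples.mean():.3f}, std {samples.std():.3f}")   # basis for error bars
```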

JIE CHENG, RUSSELL GREINER

Proceedings of the Canadian Conference on Artificial Intelligence (CSCSI-01), Ottawa, May 2001

Runner-Up, Best Paper Prize

This paper investigates methods for learning predictive classifiers based on Bayesian belief networks (BN) -- primarily unrestricted Bayesian networks and Bayesian multi-nets. We present our algorithms for learning these classifiers, and discuss how these methods address the overfitting problem and provide a natural method for feature subset selection. Using a set of standard classification problems, we empirically evaluate the performance of various BN-based classifiers. The results show that the proposed BN and Bayes multi-net classifiers are competitive with (or superior to) the best known classifiers, based on both BN and other formalisms, and that the computational time for learning and using these classifiers is relatively small. These results argue that BN-based classifiers deserve more attention in the data mining community.

TIM VAN ALLEN, RUSSELL GREINER

ICML '00

We are interested in the problem of learning the dependency structure of a belief net, which involves a trade-off between simplicity and goodness of fit to the training data. We describe the results of an empirical comparison of three standard model selection criteria --- viz., a Minimum Description Length criterion (MDL), Akaike's Information Criterion (AIC) and a Cross-Validation criterion --- applied to this problem. Our results suggest that AIC and cross-validation are both good criteria for avoiding overfitting, but MDL does not work well in this context.

Short Version

Technical Report

Data Used
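In their usual penalized-log-likelihood forms (one standard convention; formulations vary), the two closed-form criteria differ only in how hard they penalize the k free parameters, which is what drives the difference in overfitting behaviour reported above; cross-validation instead averages held-out log-likelihood over folds:

```python
import numpy as np

def aic(loglik, k):
    return loglik - k                      # constant penalty per parameter

def mdl(loglik, k, N):
    return loglik - 0.5 * k * np.log(N)    # penalty grows with sample size N

# With N = 500, MDL charges ~3.1x more per parameter than AIC,
# so it favours noticeably sparser structures.
print(aic(-1234.5, k=40), mdl(-1234.5, k=40, N=500))
```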

WEI ZHOU, RUSSELL GREINER

Conditional Independence Structures and Graphical Models, Toronto, September 1999

JIE CHENG, RUSSELL GREINER

Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI-99), Sweden, August 1999

In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers -- naïve Bayes, tree-augmented naïve Bayes (TAN), BN-augmented naïve Bayes (BAN) and general BNs (GBN), where the GBNs and BANs are learned using two variants of a conditional-independence based BN-learning algorithm. Based on their performance, we then define a new type of classifier. Experimental results show that the resulting classifiers, learned using the proposed learning algorithms, are competitive with (or superior to) the best classifiers, based on both Bayesian networks and other formalisms, and that the computational time for learning and using these classifiers is relatively small. These results argue that BN classifiers deserve more attention in the machine learning and data mining communities.
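As one concrete instance of these classifiers, a TAN structure can be built by weighting each feature pair with conditional mutual information given the class and keeping a maximum-weight spanning tree. The sketch below (our simplification, using Prim's algorithm and a fixed root; the paper's GBN and BAN learners use CI tests instead) shows the construction:

```python
# Build the tree skeleton of a TAN classifier.
import numpy as np
from collections import Counter

def cmi(a, b, c):
    """Empirical I(A; B | C) for discrete sequences."""
    N, mi = len(a), 0.0
    nabc, nac, nbc, nc = (Counter(zip(a, b, c)), Counter(zip(a, c)),
                          Counter(zip(b, c)), Counter(c))
    for (av, bv, cv), n in nabc.items():
        mi += (n / N) * np.log(n * nc[cv] / (nac[(av, cv)] * nbc[(bv, cv)]))
    return mi

def tan_edges(X, y):
    """Directed (parent, child) tree edges over the features, rooted at 0."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = cmi(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < n:               # Prim's maximum spanning tree
        i, j = max(((i, j) for i in in_tree for j in set(range(n)) - in_tree),
                   key=lambda e: W[e])
        edges.append((i, j))              # edges point away from the root
        in_tree.add(j)
    return edges                          # in a full TAN, the class is also a parent of every feature

# Toy usage on random binary data.
X = np.random.randint(0, 2, size=(400, 5))
y = np.random.randint(0, 2, size=400)
print(tan_edges(X, y))
```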

Paper presented at Learning Relations workshop, Stanford, 1999(?)

RUSSELL GREINER, ADAM GROVE, DALE SCHUURMANS

A Bayesian net (BN) is more than a succinct way to encode a probabilistic distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. Many algorithms for learning BNs, however, attempt to optimize some other criterion (e.g., based on Kullback-Leibler divergence, or the Bayesian Information Criterion), which is independent of the distribution of queries that are posed. This paper takes the ``performance criterion'' seriously, and considers the challenge of computing the BN whose performance --- read ``accuracy over the distribution of queries'' --- is optimal. We show that many aspects of this learning task are more difficult than the corresponding subtasks in the standard model, and then present an important subclass of queries that greatly simplifies our learning task.
