Bayesian Networks and Path Analysis

BNPA R Package

This study proposes a hybrid approach that combines the computational and statistical resources of Bayesian Networks (BNs), used to learn a network structure from a data set, with the robustness of the statistical methods of Structural Equation Modeling (SEM).

Bayesian Networks (BN)

A BN B (left side of the figure above) consists of a directed acyclic graph (DAG) that encodes a Joint Probability Distribution (JPD) over a set of random variables V [1]. A BN for V is defined by the pair B = (G, Θ), where G is the DAG whose nodes X1, X2, ..., Xn represent the random variables and whose edges represent the direct dependencies between these variables. The graph G encodes independence assumptions: each variable Xi is independent of its non-descendants given its parents in G. The second element of the BN, Θ, is the set of parameters of the network. It contains a parameter θ_{xi|Πxi} = P_B(xi | Πxi) for each value xi of Xi and each configuration Πxi of ΠXi, where ΠXi denotes the set of parents of Xi in G. Accordingly, B defines a unique JPD over V, given by:

P_B(X1, X2, ..., Xn) = ∏_{i=1}^{n} P_B(Xi | ΠXi) = ∏_{i=1}^{n} θ_{Xi|ΠXi}
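As a concrete illustration of how a BN factorizes the JPD into one conditional probability per node, the following minimal Python sketch evaluates the product of per-node terms on a small binary network. The network (Rain, Sprinkler, WetGrass), its probabilities, and the function name `joint_probability` are assumptions made up for this example; they are not part of the BNPA package.

```python
from itertools import product

# Hypothetical three-node DAG: Rain -> Sprinkler, Rain -> WetGrass <- Sprinkler.
parents = {"Rain": (), "Sprinkler": ("Rain",), "WetGrass": ("Sprinkler", "Rain")}

# Each table maps a tuple of parent values to P(node = True | parents).
# The numbers are invented for illustration only.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    prob = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


# Summing the factorized JPD over every assignment should give 1.
nodes = list(parents)
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product([True, False], repeat=len(nodes))
)
```

Because every factor is a proper conditional distribution, `total` equals 1 up to floating-point error, which is a quick sanity check that the tables and the graph are consistent.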

Building a BN model requires two steps [2]. The first is structure learning, which can be done from the data with constraint-based, score-based, or hybrid (mixed) algorithms, or with the help of experts. The second is parameter learning: estimating the conditional probability distribution of each node given its parents [2].
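The second step can be sketched in a few lines once a structure is available. The Python example below assumes the DAG A -> B has already been obtained in step one and estimates each conditional probability table by relative frequencies (maximum likelihood). The data, the DAG, and the helper `fit_cpts` are hypothetical; this is not how BNPA or bnlearn implement parameter learning.

```python
from collections import Counter

# Hypothetical DAG A -> B, assumed to be the output of structure learning.
parents = {"A": (), "B": ("A",)}

# Toy observations, invented for illustration only.
data = [
    {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1},
    {"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 0, "B": 0},
]


def fit_cpts(data, parents):
    """Estimate P(node = value | parent configuration) by relative frequency."""
    cpts = {}
    for node, pa in parents.items():
        # Count (parent configuration, node value) pairs and configurations.
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        cpts[node] = {
            (cfg, val): n / marginal[cfg] for (cfg, val), n in joint.items()
        }
    return cpts


cpts = fit_cpts(data, parents)
```

Here `cpts["B"][((1,), 1)]` is the estimated probability that B = 1 given A = 1. Counting is the simplest estimator; real data sets usually call for smoothing or a Bayesian estimate to handle parent configurations with few observations.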

Structural Equation Modeling (SEM)

SEM (right side of the figure above) is a multivariate statistical methodology that includes Factor Analysis and Path Analysis [3]. In SEM, the modeler hypothesizes a structure that expresses existing knowledge; for this reason, SEM is known as an a priori statistical technique. The initial model is formulated as an expected covariance structure, which is tested against the covariance matrix of the observed data [4]. The null hypothesis H0 that formalizes this idea is:

H0 : Σ = Σ(θ)

where Σ is the covariance matrix of the observed variables (for the population or the sample), θ is the vector of model parameters, and Σ(θ) is the covariance matrix implied by the model [5]. Unlike in conventional statistical models, where the goal is to reject the null hypothesis, in SEM the goal is to retain it (that is, not to reject it) [4], which means that the data support the proposed model. The model is fitted by minimizing the differences between the observed covariances and those predicted by the model.
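To make the comparison between observed and model-implied covariances concrete, the Python sketch below uses a one-path model y = b·x + e and the maximum-likelihood discrepancy function log|Σ(θ)| + tr(S Σ(θ)⁻¹) − log|S| − p, which is one common choice of fitting function. The text above does not say which function is minimized, so that choice, the sample matrix `S`, and the names `implied_cov` and `f_ml` are all assumptions for illustration.

```python
import math


def implied_cov(b, phi, psi):
    """Covariance of (x, y) implied by y = b*x + e, Var(x) = phi, Var(e) = psi."""
    return [[phi, b * phi], [b * phi, b * b * phi + psi]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def f_ml(s, sigma):
    """ML discrepancy between observed s and implied sigma (2x2 matrices)."""
    d = det2(sigma)
    inv = [[sigma[1][1] / d, -sigma[0][1] / d],
           [-sigma[1][0] / d, sigma[0][0] / d]]
    # trace(s @ inv) written out for the 2x2 case
    trace = sum(s[i][k] * inv[k][i] for i in range(2) for k in range(2))
    return math.log(d) + trace - math.log(det2(s)) - 2


# Hypothetical observed covariance matrix of (x, y).
S = [[2.0, 1.0], [1.0, 1.5]]

# Parameters that reproduce S exactly: the discrepancy is zero.
fit_free = f_ml(S, implied_cov(b=0.5, phi=2.0, psi=1.0))

# Restricted model with the path fixed to b = 0: the discrepancy is positive.
fit_restricted = f_ml(S, implied_cov(b=0.0, phi=2.0, psi=1.5))
```

With three free parameters and three distinct covariance elements, the first model is just-identified and fits perfectly, so only the restricted model gives the null hypothesis Σ = Σ(θ) something to test. In practice an optimizer searches over θ for the minimum of the discrepancy.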

[1] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Classifiers,” Mach. Learn., vol. 29, pp. 131–163, 1997.

[2] M. Scutari, “Learning Bayesian Networks with the bnlearn R Package,” J. Stat. Softw., vol. VV, no. 3, pp. 1–22, 2009.

[3] B. H. Pugesek, A. Tomer, and A. Von Eye, Structural Equation Modeling. Applications in Ecological and Evolutionary Biology. 2003.

[4] G. B. Arhonditsis, C. A. Stow, L. J. Steinberg, M. A. Kenney, R. C. Lathrop, S. J. McBride, and K. H. Reckhow, “Exploring ecological patterns with structural equation modeling and Bayesian analysis,” Ecol. Modell., vol. 192, no. 3–4, pp. 385–409, 2006.

[5] K. A. Bollen, Structural Equations with Latent Variables. 1989.