Bayesian Networks and Path Analysis

BNPA R Package

This study proposes a hybrid approach that combines the computational and statistical resources of Bayesian Networks (BNs), used to learn a network structure from a data set, with the robustness of the statistical methods available in Structural Equation Modeling (SEM).

Bayesian Networks (BN)

A BN B (left-hand side of the figure above) consists of a DAG that encodes a Joint Probability Distribution (JPD) over a set of random variables V [1]. A BN for V is defined by the pair B = (G, Θ), where G is the DAG whose nodes X₁, X₂, ..., X_n represent the random variables and whose edges represent the direct dependencies between these variables. The graph G encodes independence assumptions: each variable X_i is independent of its non-descendants given its parents in G. The second element of the BN, Θ, is the set of parameters of the network. It contains a parameter θ_{x_i|Π_{x_i}} = P_B(x_i | Π_{x_i}) for each possible value x_i of X_i and each configuration Π_{x_i} of Π_{X_i}, where Π_{X_i} denotes the set of parents of X_i in G. Accordingly, B defines a unique JPD over V, given by:

P_B(X₁, X₂, ..., X_n) = ∏ᵢ P_B(X_i | Π_{X_i}) = ∏ᵢ θ_{X_i|Π_{X_i}},  i = 1, ..., n
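
The product-of-conditionals factorization that this definition implies can be illustrated with a small sketch. The three-node network, its variable names and all probability values below are hypothetical and chosen only for illustration; the code uses plain Python and is not part of the BNPA package.

```python
from itertools import product

# Hypothetical DAG: Rain -> WetGrass <- Sprinkler (parents listed per node).
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# theta[node][parent_values][value] = P(node = value | parents = parent_values)
# All numbers are made up for this example.
theta = {
    "Rain": {(): {True: 0.2, False: 0.8}},
    "Sprinkler": {(): {True: 0.1, False: 0.9}},
    "WetGrass": {
        (True, True): {True: 0.99, False: 0.01},
        (True, False): {True: 0.9, False: 0.1},
        (False, True): {True: 0.8, False: 0.2},
        (False, False): {True: 0.0, False: 1.0},
    },
}


def joint_probability(assignment):
    """Multiply P(x_i | parents of X_i) over all nodes of the network."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= theta[node][parent_values][assignment[node]]
    return prob


# Joint probability of one full assignment.
p = joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True})

# Summing the joint over every assignment should give 1 (up to rounding).
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
```

Here `p` is the product of three local terms, one per node, and `total` checks that the local tables together define a proper distribution.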

Building a BN model requires two steps [2]. The first is structure learning, which can be carried out from the data with constraint-based, score-based or hybrid algorithms, or with the help of experts. The second is parameter learning, that is, estimating the conditional probabilities of each node given its parents [2].
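
As a rough illustration of the second step, the sketch below estimates the conditional probabilities by relative frequencies (a maximum likelihood estimate), assuming the DAG has already been obtained in the first step. The function name `fit_cpts`, the two-node DAG and the toy records are hypothetical; this is one common estimator, not necessarily the one used by BNPA or bnlearn.

```python
from collections import Counter, defaultdict


def fit_cpts(records, parents):
    """Estimate P(node | parents) from data by relative frequencies."""
    cpts = {}
    for node, node_parents in parents.items():
        # counts[parent_values][value] = number of matching records
        counts = defaultdict(Counter)
        for row in records:
            parent_values = tuple(row[p] for p in node_parents)
            counts[parent_values][row[node]] += 1
        # Normalize the counts within each parent configuration.
        cpts[node] = {
            parent_values: {
                value: n / sum(counter.values()) for value, n in counter.items()
            }
            for parent_values, counter in counts.items()
        }
    return cpts


# Hypothetical DAG A -> B and a toy data set of four records.
parents = {"A": (), "B": ("A",)}
records = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
    {"A": 1, "B": 0},
    {"A": 0, "B": 0},
]

cpts = fit_cpts(records, parents)
```

With so few records, some parent configurations have only one observed value; real applications often smooth the counts to avoid zero probabilities.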

Structural Equation Modeling (SEM)

SEM (right-hand side of the figure above) is a multivariate statistical methodology that includes Factor Analysis and Path Analysis [3]. In SEM the modeler hypothesizes a structure that expresses the existing knowledge, which is why SEM is known as an a priori statistical technique. The initial model is formulated as an expected covariance structure, which is then tested against the covariance matrix of the observed data [4]. The null hypothesis H₀ that formalizes the idea of SEM is:

H₀ : Σ = Σ(θ)

where Σ is the covariance matrix of the observed variables for the population or sample, θ is a vector of model parameters and Σ(θ) is the covariance matrix implied by the model [5]. Unlike conventional statistical models, where rejecting the null hypothesis is the target, in SEM the aim is to retain (fail to reject) the null hypothesis [4], which indicates that the observed data are consistent with the proposed model. The model is fitted by minimizing the differences between the observed covariances and those predicted by the model.
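
To make the comparison between the observed covariances and Σ(θ) concrete, the sketch below builds the model-implied covariance matrix for a hypothetical three-variable path model X → M → Y and evaluates an unweighted least squares (ULS) discrepancy, F = ½ tr[(S − Σ(θ))²]. ULS is used here only because it is short to write; maximum likelihood is the more common fitting function in SEM software. All names and numbers are assumptions made for illustration.

```python
def implied_covariance(a, b, var_x, var_em, var_ey):
    """Sigma(theta) for M = a*X + e_m and Y = b*M + e_y (order: X, M, Y)."""
    var_m = a * a * var_x + var_em
    var_y = b * b * var_m + var_ey
    cov_xm = a * var_x
    cov_my = b * var_m
    cov_xy = a * b * var_x
    return [
        [var_x, cov_xm, cov_xy],
        [cov_xm, var_m, cov_my],
        [cov_xy, cov_my, var_y],
    ]


def uls_discrepancy(S, sigma):
    """0.5 * tr[(S - Sigma)^2]; for symmetric matrices this equals
    half the sum of squared element-wise differences."""
    return 0.5 * sum(
        (s - m) ** 2
        for row_s, row_m in zip(S, sigma)
        for s, m in zip(row_s, row_m)
    )


# Hypothetical "observed" covariance matrix, generated from known parameters.
S = implied_covariance(a=0.5, b=2.0, var_x=1.0, var_em=1.0, var_ey=1.0)

# Discrepancy at the generating parameters and at a misspecified value of b.
fit_good = uls_discrepancy(S, implied_covariance(0.5, 2.0, 1.0, 1.0, 1.0))
fit_bad = uls_discrepancy(S, implied_covariance(0.5, 1.0, 1.0, 1.0, 1.0))
```

A fitting routine would search over θ for the values that make this discrepancy as small as possible; the closer it is to zero, the better Σ(θ) reproduces the observed covariances.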

[1] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Classifiers,” Mach. Learn., vol. 29, pp. 131–163, 1997.

[2] M. Scutari, “Learning Bayesian Networks with the bnlearn R Package,” J. Stat. Softw., vol. VV, no. 3, pp. 1–22, 2009.

[3] B. H. Pugesek, A. Tomer, and A. Von Eye, Structural Equation Modeling. Applications in Ecological and Evolutionary Biology. 2003.

[4] G. B. Arhonditsis, C. A. Stow, L. J. Steinberg, M. A. Kenney, R. C. Lathrop, S. J. McBride, and K. H. Reckhow, “Exploring ecological patterns with structural equation modeling and Bayesian analysis,” Ecol. Modell., vol. 192, no. 3–4, pp. 385–409, 2006.

[5] K. A. Bollen, Structural Equations with Latent Variables. 1989.