What can YouTube tell us about dog bites? Analysis of dog bite videos
*n00b needs help
Sara Owczarczak-Garstecka, Institute for Risk and Uncertainty, University of Liverpool, UK
YouTube videos of dog bites present an unexplored opportunity to observe dog bites directly. We recorded the context of the interaction preceding the bite, the location of the bite, the site of the bite on the body, and victim and dog characteristics (such as sex, age and size) for 143 videos. In addition, for the 56 videos whose quality allowed closer inspection, details of human and dog behaviour preceding the bite were noted. Bite severity was also scored for each video using visible characteristics (such as puncture wounds, the duration of a grip, etc.), so that every bite was given an overall severity score. I would like to find out whether any of the demographic variables linked with the dog or victim, or the context variables, are associated with greater bite severity. To do so, I think I need to put together a hierarchical regression model. I know that bite scores fit a gamma distribution better than a Poisson. I would also like to know whether all of those variables are useful in predicting severity, i.e. whether they actually add anything to the model. The model should also somehow reflect that bite contexts are not completely independent of one another but are a subset of possible categorisations.
I think that a Bayesian method is advantageous here because it helps express the uncertainty linked with having an uneven number of observations across contexts (e.g. 2 videos in resting and 43 in play), a relatively small dataset, and 10% missing observations in the 'who initiated the interaction' variable. In addition, given the number of observations, I would like to avoid reporting multiple p-values, which I think is the alternative. There could be an easier way of doing this, and if that's the case I would be keen to find out.
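As a sketch of what such a model could look like, here is a minimal hierarchical gamma regression in pymc3. Everything below (severity scores, context codes, priors) is an illustrative placeholder rather than the study's actual data:

```python
import numpy as np
import pymc3 as pm

# Stand-ins for the real data: positive, continuous severity scores and
# an integer code for the bite context of each of the 143 videos.
rng = np.random.default_rng(1)
n_contexts = 5
context = rng.integers(0, n_contexts, size=143)
severity = rng.gamma(shape=2.0, scale=1.5, size=143)

with pm.Model() as model:
    # Hyperpriors: contexts share a common distribution, so sparsely
    # observed contexts (e.g. 2 'resting' videos) are shrunk towards
    # the overall mean instead of being estimated in isolation.
    mu_ctx = pm.Normal("mu_ctx", 0.0, 1.0)
    sd_ctx = pm.HalfNormal("sd_ctx", 1.0)
    ctx_effect = pm.Normal("ctx_effect", mu_ctx, sd_ctx, shape=n_contexts)

    # Gamma likelihood with a log link on the mean.
    mu = pm.math.exp(ctx_effect[context])
    alpha = pm.HalfNormal("alpha", 5.0)
    pm.Gamma("severity", alpha=alpha, beta=alpha / mu, observed=severity)

    trace = pm.sample(1000, tune=1000, return_inferencedata=True)
```

Whether a given variable actually adds anything can then be assessed by fitting the model with and without it and comparing the fits with cross-validation criteria such as LOO or WAIC (e.g. via arviz.compare), rather than with multiple p-values.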
Software problems in Python
*technical problems
Uchenna Oparaji, Institute for Risk and Uncertainty, University of Liverpool, UK
I am trying to build a Bayesian neural network using theano and pymc3 in Python. I have followed the online tutorials on how to do this; however, I keep getting error messages that I cannot sort out.
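For reference, a minimal Bayesian neural network in pymc3/theano that does run, on a toy dataset invented here, looks roughly as follows. (Version mismatches between theano and pymc3 are a common source of the kind of errors described, so it may be worth pinning compatible versions first.)

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt  # replaced by 'aesara' in the newest releases

# Toy data standing in for the real problem: XOR-like binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

n_hidden = 5
with pm.Model() as bnn:
    # Priors over the weights replace the point estimates of a standard NN.
    w_in = pm.Normal("w_in", 0.0, 1.0, shape=(2, n_hidden))
    w_out = pm.Normal("w_out", 0.0, 1.0, shape=(n_hidden,))

    act = tt.tanh(tt.dot(X, w_in))            # hidden layer
    p = pm.math.sigmoid(tt.dot(act, w_out))   # class probabilities

    pm.Bernoulli("y", p=p, observed=y)

    # ADVI is the usual starting point for BNNs; NUTS also works here.
    approx = pm.fit(10000, method="advi")
    trace = approx.sample(500)
```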
Setting up a Bayesian machine
*software issues
Alfredo Garbuno, Institute for Risk and Uncertainty, University of Liverpool, UK
I am currently trying out some recently released Python modules. I know there are a lot of other software tools that enable Bayesian analyses. How should I pick which software to use?
Bayesian model updating
*technical problems
Roberto Rocchetta, Institute for Risk and Uncertainty, University of Liverpool, UK
A Bayesian model updating framework for damage identification is proposed, and a high-fidelity FE model is used to simulate damaged components. To tackle the computational cost, surrogate models (ANNs) are employed within the Transitional MCMC algorithm. The goal is to find the set of damaged components that most likely explains the experimental evidence and, to do so, selecting an appropriate likelihood function is necessary. Different likelihood functions have been investigated; however, it is not clear how to select the optimal one. Is there a theoretical performance evaluation to help with the selection? In addition, uncertainty-related issues (such as limited and noisy measurements and poorly known material properties of the device) complicate the identification procedure and lead, in some cases, to false detections or missed detections of structural damage.
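A common baseline in such frameworks, and a natural reference point when comparing candidates, is an independent Gaussian error model between measured and predicted responses. A sketch, where the surrogate and the noise level are stand-ins for the trained ANN and the (often poorly known) measurement noise:

```python
import numpy as np

def gaussian_log_likelihood(theta, y_measured, surrogate, sigma):
    """Log-likelihood of candidate damage parameters theta under
    independent Gaussian measurement errors.

    surrogate : callable approximating the FE model (e.g. a trained ANN)
    sigma     : assumed measurement-noise standard deviation; it can
                instead be appended to theta and inferred as well.
    """
    residual = y_measured - surrogate(theta)
    n = residual.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(residual**2) / sigma**2)
```

This is the function TMCMC would evaluate at each sample; alternative likelihoods amount to different assumptions about the residuals.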
Extreme climate events studied with a Bayesian network approach: application to a general hydropower station
*n00b needs help
Hector Diego Estrada-Lugo, Institute for Risk and Uncertainty, University of Liverpool, UK
Technological facilities can be seriously affected by extreme weather conditions (among other causes), which can result in technological disasters triggered by natural threats. The uncertainty associated with global warming also needs to be taken into account in the vulnerability quantification, since it is not negligible. In this project, a probabilistic approach is applied in an example using an Enhanced Bayesian Network to evaluate the effect of extreme precipitation on a dam overtopping event at a hydropower facility. Different time and emission scenarios are considered to study the trend of this event as climate change takes effect. Structural information from the Lianghekou hydropower station project in southwest China is used to construct the network.
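As a toy illustration of the network mechanics (not the Lianghekou model, and with made-up probabilities), a two-node discrete Bayesian network in Python's pgmpy might look like:

```python
from pgmpy.models import BayesianNetwork  # 'BayesianModel' in older releases
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Extreme precipitation -> dam overtopping; states are 0 = no, 1 = yes.
bn = BayesianNetwork([("Precipitation", "Overtopping")])
cpd_rain = TabularCPD("Precipitation", 2, [[0.95], [0.05]])
cpd_top = TabularCPD(
    "Overtopping", 2,
    [[0.999, 0.90],   # P(no overtopping | normal, extreme)
     [0.001, 0.10]],  # P(overtopping    | normal, extreme)
    evidence=["Precipitation"], evidence_card=[2],
)
bn.add_cpds(cpd_rain, cpd_top)

# Climate scenarios enter by shifting the precipitation probabilities;
# conditioning on extreme rain then updates the overtopping risk.
infer = VariableElimination(bn)
print(infer.query(["Overtopping"], evidence={"Precipitation": 1}))
```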
An economic index of neighborhood attractiveness
*technical problems, *nagging concerns, *philosophical puzzles
Daniel Arribas-Bel, Department of Geography and Planning, University of Liverpool, UK
How attractive neighborhoods are is a highly subjective topic, and attractiveness is also a characteristic likely to change over time. However, obtaining an accurate picture of the extent to which people value different areas of a city at a given moment is of relevance to a range of actors, from urban planners to commercial developers to potential new residents. Using underpinnings from urban economics and a Bayesian framework, this paper presents an index based on revealed preferences that can be continuously updated as new data become available. In particular, the method is based on a hedonic model fitted as a Bayesian multilevel model. Because it is a hedonic regression, the model allows one to recover dwellers' willingness to pay for a particular location, backing out the effect of house characteristics. Because of its Bayesian nature, the index features two additional advantages: it is easily "updatable" without the need to re-run a regression on the entire dataset when new data become available; and instead of a single point estimate, an entire distribution is derived, providing a natural measure of uncertainty and making comparisons between neighborhoods very direct. Because of these properties, the approach is well suited to analyzing housing data from new sources such as online listings or frequently released open data on house transactions. The resulting index provides a unique window onto a host of urban phenomena, from neighborhood decline, to gentrification, to early-warning systems of neighborhood change. The presentation will cover the basics of the index, two extensions to accommodate changing boundaries over time and to make statistical comparisons between neighborhoods, and an empirical illustration using UK data.
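A minimal sketch of such a model in pymc3, with synthetic stand-in data: one house characteristic (floor area) and varying neighbourhood intercepts, which play the role of the attractiveness index:

```python
import numpy as np
import pymc3 as pm

# Synthetic transactions: log price, floor area, neighbourhood index.
rng = np.random.default_rng(42)
n_obs, n_hoods = 500, 20
hood = rng.integers(0, n_hoods, n_obs)
area = rng.normal(0.0, 1.0, n_obs)
log_price = (12.0 + 0.3 * area
             + rng.normal(0.0, 0.2, n_hoods)[hood]
             + rng.normal(0.0, 0.1, n_obs))

with pm.Model() as hedonic:
    beta_area = pm.Normal("beta_area", 0.0, 1.0)  # house characteristics
    mu_hood = pm.Normal("mu_hood", 12.0, 5.0)
    sd_hood = pm.HalfNormal("sd_hood", 1.0)
    # Varying intercepts: the neighbourhood premium net of characteristics.
    alpha = pm.Normal("alpha", mu_hood, sd_hood, shape=n_hoods)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", alpha[hood] + beta_area * area, sigma,
              observed=log_price)
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)
```

The posterior over alpha gives each neighbourhood a full distribution rather than a point estimate, and yesterday's posterior can serve as today's prior when new listings arrive.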
Bayes’ rule in medical counseling: implications for kindergarteners’ cooties
*philosophical puzzles
Scott Ferson, Institute for Risk and Uncertainty, University of Liverpool, UK
Medical practitioners commonly diagnose a patient's health condition by employing a medical test which is not by itself definitive, but has some statistical probability of revealing the true health state. Naively interpreting the result from a medical test can therefore lead to an incorrect assessment of a patient's true health condition, because of the possibility of false-positive and false-negative disease detections. Bayes' rule is commonly used to estimate the actual chance a patient is sick given the results from the medical test, from the statistical characteristics of the test used and the underlying prevalence of the disease. However, Winkler and Smith have argued that the traditional application of Bayes' rule in medical counseling is inappropriate and represents a "confusion in the medical decision-making literature". They propose in its place a radically different formulation that makes special use of the information about the test results for new patients. Remarkably, Bayesians do not seem to have a means within their theory to determine whether the traditional approach or the Winkler and Smith approach is correct.
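The traditional calculation the abstract refers to is the familiar textbook one; with illustrative numbers (1% prevalence, 95% sensitivity and specificity) it shows how unintuitive the answer can be:

```python
# P(sick | positive) by Bayes' rule, with made-up test characteristics.
prevalence = 0.01
sensitivity = 0.95   # P(positive | sick)
specificity = 0.95   # P(negative | healthy)

p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))
p_sick_given_positive = sensitivity * prevalence / p_positive
print(f"P(sick | positive) = {p_sick_given_positive:.3f}")  # about 0.161
```

The dispute concerns whether this is the right quantity to report to a new patient, not the arithmetic itself.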
Bayes classifiers for functional data
*my analysis is perfect
Diego Andres Perez Ruiz, The University of Manchester, UK
We consider the problem of supervised classification for functional data. Classifiers for functional data pose a challenge because a probability density function for functional data does not exist. It is therefore common to construct classifiers based on projections of the data, most commonly the functional principal component scores. We propose a new method to estimate the posterior probability based on the density ratios of projections onto a sequence of scores that are common to the groups to be classified. A study of the asymptotic behaviour of the proposed classifiers shows that, under certain conditions, the misclassification rate converges to zero in the large-sample limit. Simulations and real-data examples are also shown.
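A toy sketch of the density-ratio idea, with random stand-in scores in place of genuine functional principal component scores:

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio_classifier(scores0, scores1, prior0=0.5):
    """Estimate class densities of projection scores by KDE and return
    the posterior probability of class 1 via Bayes' rule."""
    f0 = gaussian_kde(scores0.T)   # gaussian_kde wants shape (dim, n)
    f1 = gaussian_kde(scores1.T)

    def posterior1(s):
        s = np.atleast_2d(s).T
        num = (1.0 - prior0) * f1(s)
        return num / (prior0 * f0(s) + num)

    return posterior1

# In practice the scores come from projecting curves onto their first
# few functional principal components; here they are simply simulated.
rng = np.random.default_rng(0)
post = density_ratio_classifier(rng.normal(0, 1, (50, 3)),
                                rng.normal(1, 1, (50, 3)))
print(post(np.array([0.8, 0.9, 1.1])))
```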
Estimating the parameters of dynamical systems from large data sets using sequential Monte Carlo samplers
*technical problems
Peter Green, Institute for Risk and Uncertainty, University of Liverpool, UK
In this presentation, we show a method which facilitates computationally efficient parameter estimation of dynamical systems from a continuously growing set of measurement data. It is shown that the proposed method, which utilises Sequential Monte Carlo samplers, is guaranteed to be fully parallelisable (in contrast to Markov chain Monte Carlo methods) and can be applied to a wide variety of scenarios within structural dynamics. Its ability to let parameter estimates converge as more data are analysed sets it apart from other sequential methods (such as the particle filter).
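A stripped-down sketch of the sequential idea, on a one-parameter Gaussian model with invented data (the MCMC 'move' step of a full SMC sampler is omitted for brevity):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
batches = [rng.normal(2.0, 1.0, 50) for _ in range(10)]  # data in blocks

n = 2000
particles = rng.normal(0.0, 5.0, n)   # draws from a vague prior
logw = np.zeros(n)

for y in batches:
    # Reweight each particle by the likelihood of the new batch; this
    # step is embarrassingly parallel, unlike a single MCMC chain.
    logw += norm.logpdf(y[:, None], particles, 1.0).sum(axis=0)
    w = np.exp(logw - logw.max()); w /= w.sum()

    # Resample when the effective sample size collapses. A full SMC
    # sampler would then also move the particles (e.g. an MCMC step)
    # to restore diversity.
    if 1.0 / np.sum(w**2) < n / 2:
        particles = rng.choice(particles, n, p=w)
        logw = np.zeros(n)

w = np.exp(logw - logw.max()); w /= w.sum()
print("posterior mean estimate:", np.sum(w * particles))
```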
Mechanical equivalent of logical inference
*solved problems
Andrea Verzobio, Department of Civil and Environmental Engineering, University of Strathclyde, UK
Structural Health Monitoring requires engineers to understand the state of a structure from its observed response. When this information is uncertain, Bayesian probability theory provides a consistent framework for making inferences. However, structural engineers are often unenthusiastic about Bayesian logic and prefer to make inference using heuristics. Here, we propose a quantitative method for logical inference based on a formal analogy between linear elastic mechanics and Bayesian inference with linear Gaussian variables. To start, we investigate the case of single parameter estimation, where the analogy is stated as follows: the value of the parameter is represented by the position of a cursor bar with one degree of freedom; uncertain pieces of information on the parameter are modelled as linear elastic springs in series or parallel, connected to the bar and each with stiffness equal to its accuracy; the posterior mean value and the accuracy of the parameter correspond respectively to the position of the bar in equilibrium and to the resulting stiffness of the mechanical system composed of the bar and the set of springs. Similarly, a multi-parameter estimation problem is reproduced by a mechanical system with as many degrees of freedom as the number of unknown parameters. In this case, the inverse covariance matrix of the parameters corresponds to the Hessian of the potential energy, while the posterior mean values of the parameters coincide with the equilibrium – or minimum potential energy – position of the mechanical system. We use the mechanical analogy to estimate, in the Bayesian sense, the drift of elongation of a bridge cable-stay undergoing continuous monitoring. We demonstrate how we can solve this in the same way as any other linear Bayesian inference problem, by simply expressing the potential energy of the equivalent mechanical system. We finally discuss the extension of the method to non-Gaussian estimation problems.
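In the single-parameter Gaussian case the analogy reduces to familiar precision-weighted averaging; a tiny numeric illustration with two made-up 'springs':

```python
import numpy as np

# Two uncertain pieces of information about one parameter, each a spring
# attached to the cursor bar: rest position = value, stiffness = accuracy.
positions = np.array([10.0, 12.0])
stiffness = np.array([1.0, 4.0])   # 1 / variance of each information source

# Springs in parallel: stiffnesses add, and the bar settles at the
# stiffness-weighted average of the rest positions.
k_post = stiffness.sum()                          # posterior accuracy
x_post = (stiffness * positions).sum() / k_post   # equilibrium = post. mean

print(x_post, 1.0 / k_post)   # 11.6, with posterior variance 0.2
```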
TUTORIAL: Prior knowledge, proportions and probabilities of probabilities
Alejandro Diaz, Institute for Risk and Uncertainty, University of Liverpool, UK
Which proportion is higher: 5 out of 10 or 400 out of 1000? The answer seems obvious. But if these proportions were numbers of successes divided by numbers of trials, would you still think the same? Is a baseball player who achieved 5 hits out of 10 chances over his career better than one who achieved 400 hits out of 1000 chances? In this introductory tutorial, we will see how Bayesian inference helps us add context in order to make decisions. The key resides in representing prior knowledge using a probability distribution for probabilities: the famous and elegant beta distribution.
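For a flavour of the calculation, with an assumed illustrative prior of Beta(81, 219), roughly a league-average batting prior:

```python
from scipy.stats import beta

# Assumed illustrative prior: Beta(81, 219), centred near a .270
# batting average, as if informed by league-wide history.
a, b = 81, 219

for hits, trials in [(5, 10), (400, 1000)]:
    post = beta(a + hits, b + trials - hits)
    lo, hi = post.interval(0.95)
    print(f"{hits}/{trials}: posterior mean {post.mean():.3f}, "
          f"95% interval ({lo:.3f}, {hi:.3f})")
```

The 400/1000 player comes out ahead (posterior mean about .370 versus .277) once the prior tempers the noisy 5-for-10 record.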
TUTORIAL: Bayesian linear regression and hierarchical models
Alfredo Garbuno, Institute for Risk and Uncertainty, University of Liverpool, UK
Bayesian data analysis allows researchers to conduct probabilistic inference about non-observable quantities in a statistical model. This introductory workshop is aimed at those interested in applying the Bayesian paradigm in their data analysis tasks. The tutorial will start with Bayesian linear regression models and will provide guidelines for progressively enriching the model's complexity. This enrichment leads to the hierarchical regression model, in which the Bayesian paradigm allows for a more flexible model whilst providing a natural mechanism to prevent over-fitting. The session will present a classical Bayesian regression problem which can be followed through Python notebooks.
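The tutorial's starting point, Bayesian linear regression with a Gaussian prior and known noise, has a closed-form posterior; a plain-numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.uniform(-1, 1, 30)])  # intercept, slope
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.3, 30)

alpha = 1.0            # prior precision on the weights
beta_n = 1.0 / 0.3**2  # noise precision (assumed known here)

# Conjugate Gaussian posterior over the weights.
S = np.linalg.inv(alpha * np.eye(2) + beta_n * X.T @ X)  # covariance
m = beta_n * S @ X.T @ y                                 # mean

print("posterior mean weights:", m)
```

The hierarchical extension puts priors on quantities like the prior precision itself, which is where sampling (and the Python notebooks) take over from closed forms.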
TUTORIAL: Quantifying uncertainty using data and experts
Ullrika Sahlin, Centre for Environmental and Climate Research, Lund University, SE
This tutorial introduces some of the basic principles of quantifying uncertainty by Bayesian probability. I will demonstrate a way to quantify uncertainty by integrating experts' knowledge and data. Participants can follow practical examples in R, using existing R packages for expert elicitation (SHELF) and for sampling from the posterior (rjags, which requires JAGS). The first example is a simple risk classification problem under sparse information, with several experts with differing judgements. The second example is the familiar task of quantifying uncertainty in the input parameters of an assessment model using different sources of information, where uncertainty in the assessment output matters.
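The tutorial's own examples are in R; as a rough Python sketch of what quantile-based elicitation does under the hood (the judgements below are invented), one can fit a parametric distribution to an expert's stated quantiles:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

# An expert judges an uncertain proportion: 5th percentile 0.05,
# median 0.20, 95th percentile 0.40.
probs = np.array([0.05, 0.50, 0.95])
quantiles = np.array([0.05, 0.20, 0.40])

def loss(params):
    a, b = np.exp(params)  # keep both beta parameters positive
    return np.sum((beta.ppf(probs, a, b) - quantiles) ** 2)

res = minimize(loss, x0=[0.0, 1.0], method="Nelder-Mead")
a, b = np.exp(res.x)
print(f"fitted Beta({a:.2f}, {b:.2f})")
```

SHELF provides several such fits plus tools for pooling the judgements of several experts; the fitted distribution then becomes the prior passed to rjags.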
TUTORIAL: Generating samples from strange probability distributions
Peter Green, Institute for Risk and Uncertainty, University of Liverpool, UK
When conducting a probabilistic analysis, we often end up having to generate samples from a probability distribution. This, for example, is a crucial part of Monte Carlo simulations. For better-known probability distributions (Gaussian, uniform etc.), some simple tricks allow us to generate samples without too much difficulty. For the more ‘strange-looking’ distributions – which commonly arise in a Bayesian analysis – the problem becomes more difficult. This tutorial describes methods which can be used to generate samples from generic probability distributions. They often form an essential part of a Bayesian analysis. The tutorial is aimed at beginners, and will cover basic sampling algorithms before describing Markov chain Monte Carlo (MCMC) and importance sampling algorithms. Sample Matlab code will also be provided.
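As a taste of the material (the tutorial's own sample code is in Matlab), a bare-bones random-walk Metropolis sampler in Python:

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis: needs only the unnormalised log density,
    which is exactly what a Bayesian posterior usually provides."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

# A 'strange-looking' bimodal density with no textbook sampling trick.
log_target = lambda x: np.log(np.exp(-0.5 * (x - 2.0)**2)
                              + np.exp(-0.5 * (x + 2.0)**2))
draws = metropolis(log_target, 0.0, 20000)
print(draws.mean(), draws.std())
```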
TUTORIAL: Approximate Bayesian computation (ABC)
Brendan McCabe, Economics, Finance and Accounting, University of Liverpool, UK
This tutorial looks at how to do Bayesian inference when it is too difficult to calculate the true likelihood and hence the exact posterior. (This is really a Bayesian version of frequentist 'indirect inference'.) We use model-based summary statistics to match simulations from the assumed (difficult) model with the actual data at hand. Conventional approaches to ABC emphasise the role of parameter estimation but, in time series problems, forecasting is often the focus of attention, and so it is to this dimension we direct our efforts. The role of Bayesian consistency is highlighted.
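The simplest version of the idea is ABC rejection sampling; a sketch with an invented toy model (whose likelihood is in fact tractable, which lets one check the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(3.0, 1.0, 100)  # data from the 'difficult' model
s_obs = np.array([observed.mean(), observed.std()])  # summary statistics

# Keep prior draws whose simulated summaries land near the observed ones.
accepted = []
for _ in range(100_000):
    theta = rng.uniform(-10.0, 10.0)          # draw from the prior
    sim = rng.normal(theta, 1.0, 100)         # simulate the model
    s_sim = np.array([sim.mean(), sim.std()])
    if np.linalg.norm(s_sim - s_obs) < 0.2:   # tolerance
        accepted.append(theta)

print(len(accepted), "accepted; posterior mean about", np.mean(accepted))
```

For forecasting, the same accepted draws can be pushed forward through the model to produce an approximate predictive distribution.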