Talk 1

One Day Meeting on Advanced Statistics Topics

CIDMA - University of Aveiro

(Joint work with Ricardo Pereira, NOVA Math & NOVA Faculdade de Ciências e Tecnologia, Portugal)



Title of the talk: Statistical modeling and/or machine learning?  


Abstract: In today's data world, “large size”, “complexity” and “rush” go together. Driven by the first two and pushed by the last, machine learning models have been widely used when the aim is to establish relationships in the data and to make predictions about the issues the data address. Machine learning methods are pragmatic tools: they look for the most accurate predictions and aim to find patterns in the data without assuming an underlying structure for it. Traditional statistical modelling, by contrast, is concerned with creating a model for the data-generating process and with doing inference about the relationships between the observed variables. It makes strong assumptions about the probability distribution of the data and the type of relationships to consider between variables, and it assumes additive predictors and parsimonious parametric models. All these assumptions need to be validated so that one can be confident in the results. Such strong assumptions are usually not required by machine learning methods, as they often rely on non-parametric models. However, these methods still need statistical learning theory, which provides prediction functions, probabilistic and statistical learning algorithms, and the assessment of their properties. Both statistical modelling and machine learning are valid approaches to the common aim of establishing patterns and making predictions but, although they may lead to similar conclusions, they are quite different in conception, assumptions and application. The urgency of getting results for big, complex datasets has launched machine learning methods, but the statistical approach is often more robust, establishing interpretable relationships and allowing inference. In this talk the two approaches are compared, based on the most common models of each approach, establishing advantages and disadvantages and detailing when to use one or the other, from a statistician's point of view.

Isabel Natário completed her PhD in Statistics and Operational Research in 2005 at the Faculty of Sciences of the University of Lisbon, a Master's degree in Statistics and Operational Research in 1999 at the same institution, and a Degree in Applied Mathematics and Computing - specialization in Probabilities and Statistics in 1995 at Instituto Superior Técnico. She is an Associate Professor at NOVA School of Science and Technology and one of the coordinators of the Master's Degree in Statistics in Health. She has published articles in specialized journals, book chapters and one book. She has participated as Investigator in some projects and was Co-Principal Investigator in one project. In her curriculum, the most frequent terms in the context of scientific production are: spatial modeling; Bayesian hierarchical models; spatial econometrics.


ORCID: 0000-0001-6020-9373