
Since 2024 I have held a Chair in Computer and Data Science at the University of Birmingham, UK, School of Computer Science. Prior to joining Birmingham, from 2011 to 2023 I was a Lecturer, Reader, and Professor in the School of Computing at Newcastle University. I was a Fellow (2018-2023) of the Alan Turing Institute, the UK's National Institute for Data Science and Artificial Intelligence.
This followed an earlier career in industrial applied research at Bellcore, USA (telecom services), 1994-2001, and a spell (2001-2004) as an independent consultant for the Italian Government on Data Quality in the Public Administration, and for a variety of public and private companies.

My qualifications include first and MSc degrees in Computing from Università di Udine, Italy (1990), a further MSc in Computer Science from the University of Houston (USA), and a PhD in Computer Science from the University of Manchester (2004-2008), with a focus on infrastructure to support data quality in workflow programming for scientific applications.

I have been leading post-graduate teaching on Data Engineering for AI, and an introduction to Predictive Analytics at the undergraduate level.

Since 2016 I have been Sr. Associate Editor for the ACM Journal of Data and Information Quality (JDIQ).


School of Computer Science, University of Birmingham

Edgbaston Campus
Birmingham B15 2TT
United Kingdom

I am currently a Professor of Computer and Data Science at the University of Birmingham, UK, School of Computer Science.

Please also visit my institutional pages here:

My core expertise is in data and knowledge management, data science and engineering, and applied machine learning.

Complete list of Publications (on BibBase) - most recent first

Current research interests:

Please open the Research section below for details on current and past projects.

Teaching: my main interest is in Scalable Distributed Data Processing architectures. Please see below under "teaching".


My background is in databases and data management. Below is an overview of my research interests and associated projects, past and current, with selected publications. Funding has come primarily from EPSRC (UKRI), the National Institute for Health Research (NIHR) UK, Biomedical Research Council (UK), Newton Fund UK, and the Alan Turing Institute.

Health data science:

Curating a longitudinal research resource using linked primary care EHR data—a UK Biobank case study. Darke, P., Cassidy, S., Catt, M., Taylor, R., Missier, P., & Bacardit, J. Journal of the American Medical Informatics Association, December, 2021

Using wearable activity trackers to predict Type-2 Diabetes: A machine learning-based cross-sectional study of the UK Biobank accelerometer cohort. Lam, B., Catt, M., Cassidy, S., Bacardit, J., Darke, P., Butterfield, S., Alshabrawy, O., Trenell, M., & Missier, P. JMIR Diabetes, January, 2021.

Data-driven vs knowledge-driven inference of health outcomes in the ageing population: a case study. Ferrari, D; Guaraldi, G; Mandreoli, F; Martoglia, R; Milic, J; and Missier, P. In DARLI workshop - Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 2020. CEUR-WS

Predicting respiratory failure in patients with COVID-19 pneumonia: a case study from Northern Italy. Ferrari, D.; Mandreoli, F.; Guaraldi, G.; and Missier, P. In The HELPLINE workshop, co-located with the 24th European Conference on AI (ECAI2020), Online!, 2020. CEUR-WS

Provenance of data and processes

Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier, P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507–520, January, 2021

Abstracting PROV provenance graphs: A validity-preserving approach. Missier, P.; Bryans, J.; Gamble, C.; and Curcin, V. Future Generation Computer Systems, 111: 352 - 367. 2020.

The lifecycle of provenance metadata and its associated challenges and opportunities. Missier, P. In Lemieux, V., editor(s), Building Trust in Financial Information - Perspectives on the Frontiers of Provenance. Springer, 2016.

Analyzing Provenance across Heterogeneous Provenance Graphs. Oliveira, W., Missier, P., Ocana, K., de Oliveira, D., & Braganholo, V. In Procs. IPAW 2016, Washington D.C., USA, 2016.

Algorithmic fairness

Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V., Salas, J., Prangle, D., & Missier, P. In Velegrakis, Y., Zeinalipour-Yazti, D., Chrysanthis, P. K., & Guerra, F., editors, Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 445–450, 2021.

ReComp: preserving value from large-scale data analytics over time through selective re-computation

Project web site 

Most recent talk: ReComp: optimising the re-execution of analytics pipelines in response to changes in the data

Efficient Re-computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications. Missier, P. & Cala, J. In Procs. IEEE Big Data Congress, Milano, Italy, 2019. IEEE.

Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study. Cala, J.; and Missier, P. Big Data Research, 13: 76 - 94. 2018. Big Medical/Healthcare Data Analytics

Provenance Annotation and Analysis to Support Process Re-Computation. Cala, J.; and Missier, P. In Procs. IPAW 2018, London, 2018. Springer

Social media analytics: discovering online social activists from the Twitter stream

A customisable pipeline for the semi-automated discovery of online activists and social campaigns on Twitter. Primo, F., Romanovsky, A., de Mello, R., Garcia, A., & Missier, P. WWW Journal, 2021.

Cloud-eGenome: efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (2013-2015)

Scalable and Efficient Whole-exome Data Processing Using Workflows on the Cloud. Cala, J., Marei, E., Yu, Y., Takeda, K., & Missier, P. Future Generation Computer Systems, 2016.

Enabling trust-less and fair marketplaces for data streams using blockchain technology 

Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S., Changyu, D., & Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE.

Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.

Workflows for e-science

Taverna, reloaded. Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., Williams, A., Oinn, T., & Goble, C. In Gertz, M, Hey, T, & Ludaescher, B, editors, Procs. SSDBM 2010, Heidelberg, Germany, 2010.

Distilling structure in Taverna scientific workflows: a refactoring approach. Cohen-Boulakia, S., Chen, J., Missier, P., Goble, C., Williams, A., & Froidevaux, C. BMC Bioinformatics, 15(Suppl 1):S12, 2014.

Data and Information Quality

Managing information quality in e-science: the Qurator workbench. Missier, P.; Embury, S. M.; Greenwood, R. M.; Preece, A. D.; and Jin, B. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1150–1152, New York, NY, USA, 2007. ACM

Quality Views: Capturing and Exploiting the User Perspective on Data Quality. Missier, P., Embury, S. M., Greenwood, M., Preece, A. D., & Jin, B. In Procs. VLDB, pages 977–988, Seoul, Korea, September, 2006.

Modelling and Computing Information Quality in e-science. Missier, P. PhD Thesis, University of Manchester, 2008.


PhD Students


Matt McTeer: AI methods for liver disease diagnosis and prognosis

Naif Alzahrani: On generating synthetic data from accelerometry for Digital Health applications

Philip Darke: Dynamic risk modelling using longitudinal data from Health Records

Curating a longitudinal research resource using linked primary care EHR data—a UK Biobank case study. Darke, P., Cassidy, S., Catt, M., Taylor, R., Missier, P., & Bacardit, J. Journal of the American Medical Informatics Association, December, 2021


2023: Ben Lam: digital phenotypes from accelerometry data [Machine Learning for Health]

Using wearable activity trackers to predict Type-2 Diabetes: A machine learning-based cross-sectional study of the UK Biobank accelerometer cohort. Lam, B., Catt, M., Cassidy, S., Bacardit, J., Darke, P., Butterfield, S., Alshabrawy, O., Trenell, M., & Missier, P. JMIR Diabetes, January, 2021.

2022: Carlos Vladimiro González Zelaya: Algorithmic fairness. [Machine Learning]

Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V.; Salas, J.; Prangle, D.; and Missier, P. In Velegrakis, Y.; Zeinalipour-Yazti, D.; Chrysanthis, P. K.; and Guerra, F., editor(s), Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 445–450, 2021.

Parametrised Data Sampling for Fairness Optimisation. Gonzalez Zelaya, C. V., Missier, P., & Prangle, D. In Proceedings of Explainable AI for Fairness, Accountability & Transparency Workshop (KDD XAI), 2019. ACM.

2021: Shaimaa Bajoudah: data marketplaces without central trust [Blockchain]

Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.

Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S.; Changyu, D.; and Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE

2021: Yulong Gu: Knowledge graph link discovery [Logic Programming]

Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting. Gu, Y.; and Missier, P. In Procs. ILP '17, 27th International Conference on Inductive Logic Programming (late-breaking paper), Orleans, France, 2017. CEUR-WS

Priyaa Thavasimani: differencing provenance graphs

Why-Diff: Exploiting Provenance to Understand Outcome Differences from non-identical Reproduced Workflows. Thavasimani, P.; Cala, J.; and Missier, P. IEEE Access, 1–1. 2019.

Rachel Thompson: phenotype annotations for genomics

Increasing phenotypic annotation improves the diagnostic rate of exome sequencing in a rare neuromuscular disorder. Thompson, R.; Papakonstantinou Ntalis, A.; Beltran, S.; Tapf, A.; de Paula Estephan, E.; Polavarapu, K.; ’t Hoen, P. A. C.; Missier, P.; and Lochmüller, H. Human Mutation. 2019.

Targeted therapies for congenital myasthenic syndromes: systematic review and steps towards a treatabolome. Thompson, R.; Bonne, G.; Missier, P.; and Lochmüller, H. Emerging Topics in Life Sciences, ETLS20180100. January 2019.

Hugo Firth: Query-aware Graph partitioning [graph data management]

Loom: Query-aware Partitioning of Online Graphs. Firth, H.; and Missier, P. In Procs. 21st EDBT, Vienna, Austria, 2018.

TAPER: query-aware, partition-enhancement for large, heterogenous graphs. Firth, H.; and Missier, P. Distributed and Parallel Databases, 1–31. 2017.

Tudor Miu: Activity recognition using accelerometers [Machine Learning]

Bootstrapping Personalised Human Activity Recognition Models Using Online Active Learning. Miu, T.; Missier, P.; and Plötz, T. In Proceedings of the 14th IEEE International Conference on Ubiquitous Computing and Communications, 2015.

On Strategies for Budget-based Online Annotation in Human Activity Recognition. Miu, T.; Plötz, T.; Missier, P.; and Roggen, D. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, of UbiComp '14 Adjunct, pages 767–776, New York, NY, USA, 2014. ACM

Academic Community service

Sr. Associate Editor for the ACM Journal of Data and Information Quality

Member of the EPSRC Peer Review College 

2021-present: grant reviewing: MRC (Medical Research Council), BBSRC, Wellcome Trust, EPSRC

2021-present: Journal reviewing (most recent first): PVLDB, Philosophical Transactions of the Royal Society, Journal of Web Semantics, BMC Bioinformatics, Concurrency and Computation: Practice and Experience, IEEE Transactions on Cloud Computing, DAPD (Distributed and Parallel Databases), IEEE TKDE, Data and Knowledge Engineering (DKE), Knowledge and Information Systems, Information Systems (special issue on data integration), VLDB Journal, SIGMOD Record, Royal Society Open Science, WWW journal, Briefings in Bioinformatics

2021-present: Conference Program Committees: PVLDB, CIKM, EDBT, SSDBM, TAPP (SIGMOD workshop), CMLS workshop, DaWaK, and a number of other specialised workshops