Bio
Since 2024 I have held a Chair in Computer and Data Science at the University of Birmingham, UK, School of Computer Science. Prior to joining Birmingham, from 2011 to 2023 I was a Lecturer, Reader, and Professor in the School of Computing at Newcastle University. I was a Fellow (2018-2023) of the Alan Turing Institute, the UK's National Institute for Data Science and Artificial Intelligence.
This followed a career in industrial applied research at Bellcore, USA (telecom services), 1994-2001, and a spell (2001-2004) as an independent consultant for the Italian Government on Data Quality in the Public Administration, and for a variety of public and private companies.
My qualifications include first and MSc degrees in Computing from the Università di Udine, Italy (1990), a further MSc in Computer Science from the University of Houston (USA), and a PhD in Computer Science from the University of Manchester (2004-2008), with a focus on infrastructure to support data quality in workflow programming for scientific applications.
I have been leading post-graduate teaching on Data Engineering for AI, and an introduction to Predictive Analytics at the undergraduate level.
Since 2016 I have been Sr. Associate Editor for the ACM Journal of Data and Information Quality (JDIQ).
Contacts
School of Computer Science, University of Birmingham
Edgbaston Campus
Birmingham B15 2TT
United Kingdom
I am currently a Professor of Computer and Data Science at the University of Birmingham, UK, School of Computer Science.
Please also visit my institutional pages here: https://www.birmingham.ac.uk/staff/profiles/computer-science/academic-staff/missier-paolo
My core expertise is in data and knowledge management, data science and engineering, and applied machine learning.
Complete list of Publications (on BibBase) - most recent first
Current research interests:
Improving Data Science to improve Science:
Exploring how to make Data Science responsible and trustworthy by instrumenting complex data processing pipelines. This includes exploring new techniques for collecting, enhancing, and making sense of data provenance and audit traces, specifically within the emerging context of Data-Centric AI.
Health Data Science at scale for participatory, preventative, personalised care:
The future of healthcare is not only data-driven, it is also participatory: we explore health technology and models aimed at motivating and empowering individuals to engage with their own health, enabling disease prevention and early onset detection. We envision "AI-enabled", trusted personal advisors who have current knowledge of our health trajectory and are able to track risk factors without the need for periodic clinical visits.
Exploring the potential of integrated care systems, including Health Records but also data from self-monitoring devices, to prevent and better manage multiple long-term conditions. What models and tools can we provide to best support the patient-clinician-carer ecosystem through the life course, and how can we measure and then improve the Quality of Life of multi-morbid chronic patients?
Please open the Research section below for details on current and past projects.
Teaching: my main interest is in Scalable Distributed Data Processing architectures. Please see below under "Teaching".
Research
My background is in databases and data management. Below is an overview of my research interests and associated projects, past and current, with selected publications. Funding has come primarily from EPSRC (UKRI), the National Institute for Health Research (NIHR) UK, Biomedical Research Council (UK), Newton Fund UK, and the Alan Turing Institute.
Health data science:
AI MULTIPLY: AI lens on multimorbidity and polypharmacy in London and the North East, NIHR funding, 2021-2024, PI Prof. Reynolds, Prof. Barnes https://ai-multiply.co.uk/
ADMISSION UK Multimorbidity Research Collaborative on Multiple Long-Term Conditions in Hospital: from burden and inequalities to underlying mechanism (2021-2024, PI: Prof. Sayers)
Digital biomarkers for preventive personalised healthcare, Turing Institute funding, 2019-2021
Curating a longitudinal research resource using linked primary care EHR data—a UK Biobank case study. Darke, P., Cassidy, S., Catt, M., Taylor, R., Missier, P., & Bacardit, J. Journal of the American Medical Informatics Association, December, 2021
Using wearable activity trackers to predict Type-2 Diabetes: A machine learning-based cross-sectional study of the UK Biobank accelerometer cohort. Lam, B, Catt, M, Cassidy, S, Bacardit, J, Darke, P, Butterfield, S, Alshabrawy, O, Trenell, M, & Missier, P JMIR Diabetes, January, 2021.
Data-driven vs knowledge-driven inference of health outcomes in the ageing population: a case study. Ferrari, D; Guaraldi, G; Mandreoli, F; Martoglia, R; Milic, J; and Missier, P. In DARLI workshop - Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 2020. CEUR-WS
Predicting respiratory failure in patients with COVID-19 pneumonia: a case study from Northern Italy. Ferrari, D.; Mandreoli, F.; Guaraldi, G.; and Missier, P. In The HELPLINE workshop, co-located with the 24th European Conference on AI (ECAI2020), Online!, 2020. CEUR-WS
Provenance of data and processes
W3C PROV data model for provenance (2011-2013).
DP4DS: Data Provenance for Data Science
Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier, P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507–520, January, 2021
Abstracting PROV provenance graphs: A validity-preserving approach. Missier, P.; Bryans, J.; Gamble, C.; and Curcin, V. Future Generation Computer Systems, 111: 352 - 367. 2020.
The lifecycle of provenance metadata and its associated challenges and opportunities. Missier, P. In Lemieux, V., editor(s), Building Trust in Financial Information - Perspectives on the Frontiers of Provenance. Springer, 2016.
Analyzing Provenance across Heterogeneous Provenance Graphs. Oliveira, W., Missier, P., Ocana, K., de Oliveira, D., & Braganholo, V. In Procs. IPAW 2016, Washington D.C., USA, 2016.
Algorithmic fairness
Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V., Salas, J., Prangle, D., & Missier, P. In Velegrakis, Y., Zeinalipour-Yazti, D., Chrysanthis, P. K., & Guerra, F., editors, Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 445–450, 2021.
ReComp: preserving value from large-scale data analytics over time through selective re-computation
Efficient Re-computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications. Missier, P. & Cala, J. In Procs. IEEE Big Data Congress, Milano, Italy, 2019. IEEE.
Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study. Cala, J.; and Missier, P. Big Data Research, 13: 76 - 94. 2018. Big Medical/Healthcare Data Analytics
Provenance Annotation and Analysis to Support Process Re-Computation. Cala, J.; and Missier, P. In Procs. IPAW 2018, London, 2018. Springer
Social media analytics: discovering online social activists from the Twitter stream
A customisable pipeline for the semi-automated discovery of online activists and social campaigns on Twitter. Primo, F., Romanovsky, A., de Mello, R., Garcia, A., & Missier, P. WWW Journal, 2021.
Cloud-eGenome: efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (2013-2015)
Scalable and Efficient Whole-exome Data Processing Using Workflows on the Cloud. Cala, J., Marei, E., Yu, Y., Takeda, K., & Missier, P. Future Generation Computer Systems, 2016.
Enabling trust-less and fair marketplaces for data streams using blockchain technology
Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S., Changyu, D., & Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE.
Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.
Workflows for e-science
Taverna, reloaded. Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., Williams, A., Oinn, T., & Goble, C. In Gertz, M, Hey, T, & Ludaescher, B, editors, Procs. SSDBM 2010, Heidelberg, Germany, 2010.
Distilling structure in Taverna scientific workflows: a refactoring approach. Cohen-Boulakia, S., Chen, J., Missier, P., Goble, C., Williams, A., & Froidevaux, C. BMC Bioinformatics, 15(Suppl 1):S12, 2014.
Data and Information Quality
Managing information quality in e-science: the Qurator workbench. Missier, P; Embury, S M; Greenwood, R M; Preece, A D; and Jin, B In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1150–1152, New York, NY, USA, 2007. ACM
Quality Views: Capturing and Exploiting the User Perspective on Data Quality. Missier, P, Embury, S M, Greenwood, M, Preece, A D, & Jin, B In Procs. VLDB, pages 977–988, Seoul, Korea, September, 2006.
P.Missier, Modelling and Computing Information Quality in e-science, PhD Thesis, University of Manchester, 2008
Teaching
CSC8101 Big Data Analytics, module leader, (MSc, 10 credits)
CSC3831 Predictive Analytics, Computer Vision & AI, module leader (UG, 20 credits)
CSC2034, Contemporary Topics in Computing, UG, contributor
Publications
PhD Students
Current
Matt McTeer: AI methods for liver disease diagnosis and prognosis
Naif Alzahrani: On generating synthetic data from accelerometry for Digital Health applications
Philip Darke: Dynamic risk Modelling using longitudinal data from Health Records
Curating a longitudinal research resource using linked primary care EHR data—a UK Biobank case study. Darke, P., Cassidy, S., Catt, M., Taylor, R., Missier, P., & Bacardit, J. Journal of the American Medical Informatics Association, December, 2021
Graduated
2023: Ben Lam: digital phenotypes from accelerometry data [Machine Learning for Health]
Using wearable activity trackers to predict Type-2 Diabetes: A machine learning-based cross-sectional study of the UK Biobank accelerometer cohort. Lam, B, Catt, M, Cassidy, S, Bacardit, J, Darke, P, Butterfield, S, Alshabrawy, O, Trenell, M, & Missier, P JMIR Diabetes, January, 2021.
2022: Carlos Vladimiro Gonzales Zelaya: Algorithmic fairness. [Machine Learning]
Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V.; Salas, J.; Prangle, D.; and Missier, P. In Velegrakis, Y.; Zeinalipour-Yazti, D.; Chrysanthis, P. K.; and Guerra, F., editor(s), Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 445–450, 2021.
Parametrised Data Sampling for Fairness Optimisation. Gonzalez Zelaya, C. V., Missier, P., & Prangle, D. In Proceedings of Explainable AI for Fairness, Accountability & Transparency Workshop (KDD XAI), 2019. ACM.
2021: Shaimaa Bajoudah: data marketplaces without central trust [Blockchain]
Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.
Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S.; Changyu, D.; and Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE
2021: Yulong Gu: Knowledge graph link discovery [Logic Programming]
Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting. Gu, Y.; and Missier, P. In Procs. ILP '17, 27th International Conference on Inductive Logic Programming (late-breaking paper), Orleans, France, 2017. CEUR-WS
Priyaa Thavasimani: differencing provenance graphs
Why-Diff: Exploiting Provenance to Understand Outcome Differences from non-identical Reproduced Workflows. Thavasimani, P.; Cala, J.; and Missier, P. IEEE Access, 1–1. 2019.
Rachel Thompson: phenotype annotations for genomics
Increasing phenotypic annotation improves the diagnostic rate of exome sequencing in a rare neuromuscular disorder. Thompson, R.; Papakonstantinou Ntalis, A.; Beltran, S.; Tapf, A.; de Paula Estephan, E.; Polavarapu, K.; ’t Hoen, P. A. C.; Missier, P.; and Lochmüller, H. Human Mutation. 2019.
Targeted therapies for congenital myasthenic syndromes: systematic review and steps towards a treatabolome. Thompson, R.; Bonne, G.; Missier, P.; and Lochmüller, H. Emerging Topics in Life Sciences, ETLS20180100. January 2019.
Hugo Firth: Query-aware Graph partitioning [graph data management]
Loom: Query-aware Partitioning of Online Graphs. Firth, H; and Missier, P In Procs. 21st EDBT, Vienna, Austria, 2018
TAPER: query-aware, partition-enhancement for large, heterogenous graphs. Firth, H.; and Missier, P. Distributed and Parallel Databases, 1–31. 2017.
Tudor Miu: Activity recognition using accelerometers [Machine Learning]
Bootstrapping Personalised Human Activity Recognition Models Using Online Active Learning. Miu, T.; Missier, P.; and Plötz, T. In Proceedings of the 14th IEEE International Conference on Ubiquitous Computing and Communications, 2015.
On Strategies for Budget-based Online Annotation in Human Activity Recognition. Miu, T.; Plötz, T.; Missier, P.; and Roggen, D. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, of UbiComp '14 Adjunct, pages 767–776, New York, NY, USA, 2014. ACM
Academic Community service
Sr. Associate Editor for the ACM Journal of Data and Information Quality
Member of the EPSRC Peer Review College
2021-present: grant reviewing: MRC (Medical Research Council), BBSRC, Wellcome Trust, EPSRC
2021-present Journal reviewing (most recent first): PVLDB, Philosophical Transactions of the Royal Society, Journal of Web Semantics, BMC Bioinformatics, Concurrency and Computation: Practice and Experience, IEEE Transactions on Cloud Computing, DAPD (Distributed and Parallel Databases), IEEE TKDE, Data and Knowledge Engineering (DKE), Knowledge and Information Systems, Information Systems (special issue on data integration), VLDB Journal, SIGMOD Record, Royal Society Open Science, WWW journal, Briefings in Bioinformatics
2021-present Conference Program Committees: PVLDB, CIKM, EDBT, SSDBM, TAPP (SIGMOD workshop), CMLS workshop, DaWaK, ... and a number of other specialised workshops