Professor Paolo Missier, PhD (Computer Science)

School of Computing, Newcastle University
Urban Sciences Building, Firebrick Avenue
Newcastle upon Tyne, NE4 5TG, United Kingdom

paolo dot missier at newcastle dot ac dot uk
Official staff page @Newcastle
my Google Scholar profile
Scopus WOS page

[self-curated] [Google Scholar] [Official University page] [DBLP] [ResearchGate]

Recent funded projects

  • 2021-22: CO-I, Assessment of well-being state in Post-acute COVID Syndrome: an international multicentre prospective cohort study (GILEAD, PI Prof. Guaraldi, Università di Modena e Reggio Emilia)
  • 2020-21: CO-I, AI for Multiple Long-Term Conditions, NIHR AIM Development Grant
  • 2021-2024: CO-I, ADMISSION: UK Multimorbidity Research Collaborative on Multiple Long-Term Conditions in Hospital, NIHR
  • 2019-2021: PI, P4@NU: Addressing Data Science challenges associated with the development of new data-driven models to enable a vision of Preventive, Predictive, Personalised, and Participatory (P4) medicine. Funded by the Alan Turing Institute, UK.
  • 2016-2019: PI, ReComp: sustained value extraction from analytics by recurring, selective re-computation (EPSRC funding, £585,000). [invited talk] [8,9,3,5]

PhD students and their research

Current students
  • Carlos Vladimiro Gonzales Zelaya: algorithmic fairness [Machine Learning] [7]
  • Ben Lam: digital phenotypes from accelerometry data [Machine Learning for Health]
  • Philip Darke: dynamic risk modelling using longitudinal data from health records
  • Shaimaa Bajoudah: data marketplaces without central trust [Blockchain] [19,20]
Graduated students
  • Yulong Gu: Knowledge graph link discovery [Logic Programming] [18]
  • Hugo Firth: Query-aware Graph partitioning [graph data management]. [21,22]
  • Rachel Thompson: phenotype annotations for genomics [23,24]
  • Tudor Miu: Activity recognition using accelerometers [Machine Learning] [25, 26]

Selected publications referenced on this page

Please see "publications" tab for the whole list

  1. Abstracting PROV provenance graphs: A validity-preserving approach. Missier, P.; Bryans, J.; Gamble, C.; and Curcin, V. Future Generation Computer Systems, 111: 352 - 367. 2020.
  2. Why-Diff: Exploiting Provenance to Understand Outcome Differences from non-identical Reproduced Workflows. Thavasimani, P.; Cala, J.; and Missier, P. IEEE Access, 1–1. 2019.
  3. Provenance Annotation and Analysis to Support Process Re-Computation. Cala, J.; and Missier, P. In Procs. IPAW 2018, London, 2018. Springer
  4. The lifecycle of provenance metadata and its associated challenges and opportunities. Missier, P. In Lemieux, V., editor(s), Building Trust in Financial Information - Perspectives on the Frontiers of Provenance. Springer, 2016.
  5. The data, they are a-changin'. Missier, P.; Cala, J.; and Wijaya, E. In Cohen-Boulakia, S., editor(s), Proc. TAPP'16 (Theory and Practice of Provenance), Washington D.C., USA, 2016. USENIX Association
  6. Analyzing Provenance across Heterogeneous Provenance Graphs. Oliveira, W.; Missier, P.; Ocana, K.; de Oliveira, D.; and Braganholo, V. In Procs. IPAW 2016, Washington D.C., USA, 2016. Springer
  7. Parametrised Data Sampling for Fairness Optimisation. González Zelaya, C. V.; Missier, P.; and Prangle, D. In Proceedings of Explainable AI for Fairness, Accountability & Transparency Workshop (KDD XAI), 2019. ACM
  8. Efficient Re-computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications. Missier, P.; and Cala, J. In Procs. IEEE Big Data Congress, Milano, Italy, 2019. IEEE
  9. Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study. Cala, J.; and Missier, P. Big Data Research, 13: 76 - 94. 2018. Big Medical/Healthcare Data Analytics
  10. A customisable pipeline for continuously harvesting socially-minded Twitter users. Primo, F.; Missier, P.; Romanovsky, A.; Mickael, F.; and Cacho, N. In Procs. ICWE'19, Daejeon, Korea, 2019.
  11. Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S.; Changyu, D.; and Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE
  12. Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.
  13. Scalable and Efficient Whole-exome Data Processing Using Workflows on the Cloud. Cala, J.; Marei, E.; Yu, Y.; Takeda, K.; and Missier, P. Future Generation Computer Systems, In press (Special Issue: Big Data in the Cloud - Best paper award at the FGCS forum 2016). 2016.
  14. Quality Views: Capturing and Exploiting the User Perspective on Data Quality. Missier, P; Embury, S M; Greenwood, M; Preece, A D; and Jin, B. In Procs. VLDB, pages 977–988, Seoul, Korea, September 2006.
  15. Managing information quality in e-science: the Qurator workbench. Missier, P; Embury, S M; Greenwood, R M; Preece, A D; and Jin, B. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1150–1152, New York, NY, USA, 2007. ACM
  16. Data-driven vs knowledge-driven inference of health outcomes in the ageing population: a case study. Ferrari, D; Guaraldi, G; Mandreoli, F; Martoglia, R; Milic, J; and Missier, P. In DARLI workshop - Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 2020. CEUR-WS
  17. Predicting respiratory failure in patients with COVID-19 pneumonia: a case study from Northern Italy. Ferrari, D.; Mandreoli, F.; Guaraldi, G.; and Missier, P. In The HELPLINE workshop, co-located with the 24th European Conference on AI (ECAI2020), Online!, 2020. CEUR-WS
  18. Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting. Gu, Y.; and Missier, P. In Procs. ILP '17, 27th International Conference on Inductive Logic Programming (late-breaking paper), Orleans, France, 2017. CEUR-WS
  19. Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.
  20. Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S.; Changyu, D.; and Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE
  21. Loom: Query-aware Partitioning of Online Graphs. Firth, H.; and Missier, P. In Procs. 21st International Conference on Extending Database Technology (EDBT), Vienna, Austria, 2018. EDBT
  22. TAPER: query-aware, partition-enhancement for large, heterogenous graphs. Firth, H.; and Missier, P. Distributed and Parallel Databases, 1–31. 2017.
  23. Increasing phenotypic annotation improves the diagnostic rate of exome sequencing in a rare neuromuscular disorder. Thompson, R.; Papakonstantinou Ntalis, A.; Beltran, S.; Tapf, A.; de Paula Estephan, E.; Polavarapu, K.; ’t Hoen, P. A. C.; Missier, P.; and Lochmüller, H. Human Mutation. 2019.
  24. Targeted therapies for congenital myasthenic syndromes: systematic review and steps towards a treatabolome. Thompson, R.; Bonne, G.; Missier, P.; and Lochmüller, H. Emerging Topics in Life Sciences, ETLS20180100. Jan 2019.
  25. Bootstrapping Personalised Human Activity Recognition Models Using Online Active Learning. Miu, T.; Missier, P.; and Plötz, T. In Proceedings of the 14th IEEE International Conference on Ubiquitous Computing and Communications, 2015.
  26. On Strategies for Budget-based Online Annotation in Human Activity Recognition. Miu, T.; Plötz, T.; Missier, P.; and Roggen, D. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, of UbiComp '14 Adjunct, pages 767–776, New York, NY, USA, 2014. ACM


I am Professor of Scalable Data Analytics with the School of Computing at Newcastle University and currently a Fellow (2018-2020) of the Alan Turing Institute, the UK's National Institute for Data Science and Artificial Intelligence.

With a background in traditional databases and data management, my research has touched on Data and Information Quality, web semantics, workflow-based infrastructure for e-science, and data provenance, and I have published over 150 peer-reviewed research articles in these areas. I was involved in the specification of the W3C PROV data model for provenance (2011-2013), serving as co-editor of the main recommendation documents.

My work has been funded over the years by EPSRC, NIHR-BRC, Newton Fund UK, Microsoft Azure for Research, and the Turing Institute. Funded projects include Cloud-e-Genome on scalable processing of genomics pipelines on the cloud (NIHR-BRC), ReComp on optimising analytics pipelines in response to changes in data (EPSRC), P4@NU (towards Personalised, Participatory, Preventive, Precision Medicine), and more.
My more recent research portfolio addresses the challenges and opportunities of Applied Data Science and Machine Learning for Health, with contributions on the search for "digital markers" from self-monitoring devices, and recent work on predicting respiratory crises in acute COVID-19 patients.

My interests are also expanding to the management and exploitation of data provenance in data science pipelines, and to the study of algorithmic fairness.

I started my career as a Research Scientist at Bell Communications Research, USA (1994-2001), and was later a Research Fellow at the University of Manchester, School of Computer Science (2004-2011), where I received my PhD in 2008.

At Newcastle I lead the School of Computing's post-graduate academic teaching on Big Data Analytics. 
I am Sr. Associate Editor for the ACM Journal on Data and Information Quality (JDIQ).

Research profile

Current research focus: 
Data Engineering and Data Science for Health [16,17]
- Methods for predicting and preventing age-related diseases through Machine Learning;
- Personalised disease trajectories
- Discovery of digital biomarkers from self-monitoring devices (wearables).

Other research interests, current and past:
  • Provenance of data and processes. [1,2,3,4,5,6]. I have also been involved in the specification of the W3C PROV data model for provenance (2011-2013).
  • Optimisation of algorithmic fairness [7]
  • ReComp: preserving value from large-scale data analytics over time through selective re-computation [invited talk]. [8,9,3,5]
  • Social media analytics (Twitter) to help health authorities combat Zika and Dengue epidemics [10]
  • Enabling trust-less and fair marketplaces for data streams using blockchain technology [11,12]
  • Implementing efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (funded project: Cloud-eGenome, 2013-2015, PI, MRC/NIHR) [13]
  • Data and Information Quality. During my PhD I proposed the notion of Quality Views, a semantics-based method for semi-automatically adding data quality control to scientific workflows [14,15]