Bio

I am currently Professor of Scalable Data Analytics with the School of Computing at Newcastle University and a Fellow (2018-2022) of the Alan Turing Institute, the UK's National Institute for Data Science and Artificial Intelligence.

After first and MSc degrees in Computing at Universita' di Udine, Italy (1990), and a further MSc in Computer Science from University of Houston (USA), my career began as Research Scientist at Bell Communications Research, USA (1994-2001), working on Software and Data Engineering for next-generation telco services.

After a 3-year spell as an independent consultant for the Italian Government and as adjunct professor at Universita' Milano Bicocca, in 2004 I moved to the University of Manchester, School of Computer Science, where I completed my PhD in Computer Science in 2008 and stayed on as a postdoc until 2011, working in Prof. Carole Goble's eScience group.

I then moved to Newcastle in 2011 as a Lecturer, progressing directly to Reader and then, in 2020, Professor.

At Newcastle I lead the School of Computing's post-graduate academic teaching on Big Data Analytics, as well as a UG module on Predictive Analytics.

In our School, I co-lead the Scalable Computing Group.

I am Sr. Associate Editor for the ACM Journal of Data and Information Quality (JDIQ).


Contacts

School of Computing, Newcastle University

Urban Sciences Building, Firebrick Avenue

Newcastle upon Tyne, NE4 5TG

United Kingdom

paolo.missier at ncl dot ac dot uk

Research

My background is in databases and data management. Below is an overview of my research interests and associated projects, past and current, with selected publications. Funding has come primarily from EPSRC (UKRI), the National Institute for Health Research (NIHR) UK, Biomedical Research Council (UK), Newton Fund UK, and the Alan Turing Institute.

Health data science:

  • AI MULTIPLY: AI lens on multimorbidity and polypharmacy in London and the North East, NIHR funding, 2021-2024, PI Prof. Reynolds, Prof. Barnes https://ai-multiply.co.uk/

  • ADMISSION UK Multimorbidity Research Collaborative on Multiple Long-Term Conditions in Hospital: from burden and inequalities to underlying mechanism (2021-2024, PI: Prof. Sayers)

  • Digital biomarkers for preventive personalised healthcare, Turing Institute funding, 2019-2021

Curating a longitudinal research resource using linked primary care EHR data—a UK Biobank case study. Darke, P., Cassidy, S., Catt, M., Taylor, R., Missier, P., & Bacardit, J. Journal of the American Medical Informatics Association, December, 2021

Using wearable activity trackers to predict Type-2 Diabetes: A machine learning-based cross-sectional study of the UK Biobank accelerometer cohort. Lam, B, Catt, M, Cassidy, S, Bacardit, J, Darke, P, Butterfield, S, Alshabrawy, O, Trenell, M, & Missier, P. JMIR Diabetes, January, 2021.

Data-driven vs knowledge-driven inference of health outcomes in the ageing population: a case study. Ferrari, D; Guaraldi, G; Mandreoli, F; Martoglia, R; Milic, J; and Missier, P. In DARLI workshop - Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 2020. CEUR-WS

Predicting respiratory failure in patients with COVID-19 pneumonia: a case study from Northern Italy. Ferrari, D.; Mandreoli, F.; Guaraldi, G.; and Missier, P. In The HELPLINE workshop, co-located with the 24th European Conference on AI (ECAI2020), Online!, 2020. CEUR-WS

Provenance of data and processes

Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier, P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507–520, January, 2021

Abstracting PROV provenance graphs: A validity-preserving approach. Missier, P.; Bryans, J.; Gamble, C.; and Curcin, V. Future Generation Computer Systems, 111: 352 - 367. 2020.

The lifecycle of provenance metadata and its associated challenges and opportunities. Missier, P. In Lemieux, V., editor(s), Building Trust in Financial Information - Perspectives on the Frontiers of Provenance. Springer, 2016.

Analyzing Provenance across Heterogeneous Provenance Graphs. Oliveira, W., Missier, P., Ocana, K., de Oliveira, D., & Braganholo, V. In Procs. IPAW 2016, Washington D.C., USA, 2016.

Algorithmic fairness

Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V., Salas, J., Prangle, D., & Missier, P. In Velegrakis, Y., Zeinalipour-Yazti, D., Chrysanthis, P. K., & Guerra, F., editors, Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 445–450, 2021.

ReComp: preserving value from large-scale data analytics over time through selective re-computation

Project web site

Most recent talk: ReComp: optimising the re-execution of analytics pipelines in response to changes in the data

Efficient Re-computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications. Missier, P. & Cala, J. In Procs. IEEE Big Data Congress, Milano, Italy, 2019. IEEE.

Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study. Cala, J.; and Missier, P. Big Data Research, 13: 76 - 94. 2018. Big Medical/Healthcare Data Analytics

Provenance Annotation and Analysis to Support Process Re-Computation. Cala, J.; and Missier, P. In Procs. IPAW 2018, London, 2018. Springer

Social media analytics: discovering online social activists from the Twitter stream

A customisable pipeline for the semi-automated discovery of online activists and social campaigns on Twitter. Primo, F., Romanovsky, A., de Mello, R., Garcia, A., & Missier, P. WWW Journal, 2021.

Cloud-eGenome: efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (2013-2015)

Scalable and Efficient Whole-exome Data Processing Using Workflows on the Cloud. Cala, J., Marei, E., Yu, Y., Takeda, K., & Missier, P. Future Generation Computer Systems, 2016.

Enabling trust-less and fair marketplaces for data streams using blockchain technology

Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S., Changyu, D., & Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE.

Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.

Workflows for e-science

Taverna, reloaded. Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., Williams, A., Oinn, T., & Goble, C. In Gertz, M, Hey, T, & Ludaescher, B, editors, Procs. SSDBM 2010, Heidelberg, Germany, 2010.

Distilling structure in Taverna scientific workflows: a refactoring approach. Cohen-Boulakia, S., Chen, J., Missier, P., Goble, C., Williams, A., & Froidevaux, C. BMC Bioinformatics, 15(Suppl 1):S12, 2014.

Data and Information Quality

Managing information quality in e-science: the Qurator workbench. Missier, P; Embury, S M; Greenwood, R M; Preece, A D; and Jin, B. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1150–1152, New York, NY, USA, 2007. ACM

Quality Views: Capturing and Exploiting the User Perspective on Data Quality. Missier, P, Embury, S M, Greenwood, M, Preece, A D, & Jin, B. In Procs. VLDB, pages 977–988, Seoul, Korea, September, 2006.

P.Missier, Modelling and Computing Information Quality in e-science, PhD Thesis, University of Manchester, 2008

Teaching

  • CSC8101 Big Data Analytics, module leader, (MSc, 10 credits)

  • CSC3831 Predictive Analytics, Computer Vision & AI, module leader (UG, 20 credits)

  • CSC2034, Contemporary Topics in Computing, UG, contributor

PhD Students

Current

Matt McTeer: AI methods for liver disease diagnosis and prognosis

Naif Alzahrani: On generating synthetic data from accelerometry for Digital Health applications

Philip Darke: Dynamic risk Modelling using longitudinal data from Health Records

Curating a longitudinal research resource using linked primary care EHR data—a UK Biobank case study. Darke, P., Cassidy, S., Catt, M., Taylor, R., Missier, P., & Bacardit, J. Journal of the American Medical Informatics Association, December, 2021

Ben Lam: digital phenotypes from accelerometry data [Machine Learning for Health]

Using wearable activity trackers to predict Type-2 Diabetes: A machine learning-based cross-sectional study of the UK Biobank accelerometer cohort. Lam, B, Catt, M, Cassidy, S, Bacardit, J, Darke, P, Butterfield, S, Alshabrawy, O, Trenell, M, & Missier, P. JMIR Diabetes, January, 2021.


Graduated

2022: Carlos Vladimiro González Zelaya: Algorithmic fairness. [Machine Learning]

Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V.; Salas, J.; Prangle, D.; and Missier, P. In Velegrakis, Y.; Zeinalipour-Yazti, D.; Chrysanthis, P. K.; and Guerra, F., editor(s), Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 445–450, 2021.

Parametrised Data Sampling for Fairness Optimisation. Gonzalez Zelaya, C. V., Missier, P., & Prangle, D. In Proceedings of Explainable AI for Fairness, Accountability & Transparency Workshop (KDD XAI), 2019. ACM.

2021: Shaimaa Bajoudah: data marketplaces without central trust [Blockchain]

Mind My Value: a Decentralized Infrastructure for Fair and Trusted IoT Data Trading. Missier, P.; Bajoudah, S.; Capossele, A.; Gaglione, A.; and Nati, M. In Procs. 7th International Conference on the Internet of Things, Linz, Austria, 2017.

Toward a Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blockchain. Bajoudah, S.; Changyu, D.; and Missier, P. In Procs. 2nd IEEE International Conference on Blockchain (Blockchain 2019), Atlanta, USA, 2019. IEEE

2021: Yulong Gu: Knowledge graph link discovery [Logic Programming]

Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting. Gu, Y.; and Missier, P. In Procs. ILP '17, 27th International Conference on Inductive Logic Programming (late-breaking paper), Orleans, France, 2017. CEUR-WS

Priyaa Thavasimani: differencing provenance graphs

Why-Diff: Exploiting Provenance to Understand Outcome Differences from non-identical Reproduced Workflows. Thavasimani, P.; Cala, J.; and Missier, P. IEEE Access, 1–1. 2019.

Rachel Thompson: phenotype annotations for genomics

Increasing phenotypic annotation improves the diagnostic rate of exome sequencing in a rare neuromuscular disorder. Thompson, R.; Papakonstantinou Ntalis, A.; Beltran, S.; Tapf, A.; de Paula Estephan, E.; Polavarapu, K.; ’t Hoen, P. A. C.; Missier, P.; and Lochmüller, H. Human Mutation. 2019.

Targeted therapies for congenital myasthenic syndromes: systematic review and steps towards a treatabolome. Thompson, R.; Bonne, G.; Missier, P.; and Lochmüller, H. Emerging Topics in Life Sciences, ETLS20180100. January 2019.

Hugo Firth: Query-aware Graph partitioning [graph data management]

Loom: Query-aware Partitioning of Online Graphs. Firth, H; and Missier, P. In Procs. 21st EDBT, Vienna, Austria, 2018

TAPER: query-aware, partition-enhancement for large, heterogenous graphs. Firth, H.; and Missier, P. Distributed and Parallel Databases, 1–31. 2017.

Tudor Miu: Activity recognition using accelerometers [Machine Learning]

Bootstrapping Personalised Human Activity Recognition Models Using Online Active Learning. Miu, T.; Missier, P.; and Plötz, T. In Proceedings of the 14th IEEE International Conference on Ubiquitous Computing and Communications, 2015.

On Strategies for Budget-based Online Annotation in Human Activity Recognition. Miu, T.; Plötz, T.; Missier, P.; and Roggen, D. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, UbiComp '14 Adjunct, pages 767–776, New York, NY, USA, 2014. ACM

Academic Community service

Sr. Associate Editor for the ACM Journal of Data and Information Quality

Member of the EPSRC Peer Review College

2021-present: grant reviewing: MRC (Medical Research Council), BBSRC, Wellcome Trust

2021-present: Journal reviewing (most recent first): PVLDB, Philosophical Transactions of the Royal Society, Journal of Web Semantics, BMC Bioinformatics, Concurrency and Computation: Practice and Experience, IEEE Transactions on Cloud Computing, DAPD (Distributed and Parallel Databases), IEEE TKDE, Data and Knowledge Engineering (DKE), Knowledge and Information Systems, Information Systems (special issue on data integration), VLDB Journal, SIGMOD Record, Royal Society Open Science, WWW journal, Briefings in Bioinformatics

2021-present: Conference Program Committees: PVLDB, CIKM, EDBT, SSDBM, TAPP (SIGMOD workshop), CMLS workshop, DaWaK