Current Projects

  • 2016-2019: PI, ReComp: sustained value extraction from analytics by recurring, selective re-computation (EPSRC funding, £585,000, 2016-2019).
  • 2017-2021: CO-I, Flood-PREPARED: Predicting Rainfall Events by Physical Analytics of REaltime Data (NERC funding, £1.9M, 48 months)
  • 2017-2019: PI for Newcastle, CEM-DIT: Communication and Trust in Emergencies, with Heriot-Watt and Coventry Universities (Office of Naval Research Global, £110,000, 3 years)
  • 2017-present: Enabling a fair IoT data marketplace without central trust. PhD student: Shaimaa Bajoudah
    • follows the 2016-2017 Researcher in Residence programme, Digital Economy Catapult (EPSRC funding, £25,000, Nov. 2016-June 2017).
Recent past projects

Recent publications

Bio and research profile

I am a Reader in Large-Scale Information Management (roughly equivalent to Associate Professor if you are not from the UK) in the School of Computing at Newcastle University, with 20+ years' experience in CS research, development, and research management.

The broad goal of my research is to understand the role of metadata, most notably data provenance [7], in making sense of the underlying (big) data, as well as in improving and optimising the processes that produce and extract added value from the data (i.e. through “big data” analytics).
I call this metadata analytics.
I have been leading (as Principal Investigator) the ReComp project (2016-2019, EPSRC; recomp.org.uk), which focuses on preserving value from large-scale data analytics over time through selective re-computation. The challenge of collecting provenance metadata and extracting value from it through analytics techniques is central to this research. [invited talk]

I am also interested in the role of provenance in making experimental science more reproducible [11,3,13], in helping track scientific data assets as a way to incentivise scientists to share their data in an Open Science setting (Data Trajectories: a research agenda) [4], and in the automatic creation of views over provenance to facilitate limited-trust data exchange [8] (funded projects: Trusted Dynamic Coalitions, PI, 2012-2013, EPSRC; CEM-DIT: Communication and Trust in Emergencies, CO-I, 2017-2019, ONRG).

I was also involved in the specification of the W3C PROV data model for provenance (2011-2013), which follows the Open Provenance Model [15], and I contributed to the main recommendation documents [12,14].

Additional research

My other research interests are centred around (large-scale) information management:
  • Social media analytics (Twitter) to help health authorities combat Zika and Dengue epidemics [5], 
  • Enabling trust-less and fair marketplaces for “personal” IoT data streams using blockchain technology [2], 
  • Real-time multi-source data analytics to predict rainfall events and mitigate their impact (funded project: Flood-PREPARED, 2017-2021, Co-I, NERC)
  • Implementing efficient and cost-effective genomics data processing pipelines using workflow technology on the Cloud (funded project: Cloud-eGenome, 2013-2015, PI, MRC/NIHR) [6]
  • Online active learning for Human Activity Recognition [9]
  • Analysing the effect of cognitive load on car drivers [10]
  • Data and Information Quality: during my PhD I proposed the notion of Quality Views [16,17], a semantics-based method for semi-automatically adding data quality control to scientific workflows. I am currently Sr. Associate Editor for the ACM Journal on Data and Information Quality (JDIQ).


I am responsible for our School’s post-graduate academic teaching on Big Data Analytics and for coordinating the School’s new curriculum on Data Science.
Students interested in projects (UG/PGT/PGR) should look here.

