DataToKnowledge

RSC Molecular Spectroscopy Groups & the British Mass Spectrometry Society:

Data To Knowledge

The creation, management and application of complex data sets

The inaugural Data To Knowledge meeting took place on 29th-30th March 2011 at the Alderley Park Conference Centre, AstraZeneca, in Cheshire, UK. Ninety delegates from across the UK came together to hear about and discuss one of the real challenges currently facing us: what do we do with so much data?

The meeting was opened by Steve Burns, Director of Analytical Sciences at AstraZeneca. Steve spoke about the importance of understanding all aspects of analysis: “to produce knowledge you need to understand both the system you are investigating and the measurement system… data on its own is of no value without interpretation.”

This introduction was followed by an excellent plenary lecture given by Dr Bryan Lawrence of the British Atmospheric Data Centre. Bryan highlighted the importance of storing and accurately citing digital data, the real problem of how and where we store the massive amounts of data we generate and, more importantly, how we ensure that the data will remain accessible (readable) in years to come.

There then followed an interesting talk from Alice Luares, detailing the lengths GlaxoSmithKline go to in order to identify and mitigate the risk posed by counterfeiters, in this case for a number of leading toothpaste brands. Analysis of suspect materials by GC-MS or “e-nose” enables identification of high levels of potentially toxic materials substituted into fake brands. She also showed how PCA can be used to group together samples from counterfeit suppliers.

There was then a change of direction, and the first NMR talk of the meeting was given by Dr Mathias Nilsson of the University of Manchester. Mathias demonstrated how NMR diffusion experiments could be coupled with relaxation or concentration information, leading to trilinear data and allowing the use of powerful multi-way methods such as PARAFAC (Parallel Factor Analysis).

Dr Heather Chassaing of Pfizer then gave a great review of some of the software tools that are available to help us interpret our data, specifically complex LC-MS data sets for metabolite identification. She also covered the importance of storing this precious knowledge and ensuring that it remains accessible for future use.

We then had the first of our presentations from vendors who were supporting the meeting, ACD/Labs and Thermo Fisher Scientific, who both informed us about their latest offerings. Day 1 was completed with a talk by Mark Earll from Syngenta, describing the use of accurate mass UPLC-MS for metabolomic profiling of tomato ripening for different genotypes using time-based Orthogonal PLS modelling, which clearly differentiates genotype, ripening and systematic experimental effects.

The evening entertainment was kindly lubricated by Waters and Mestrelab Research, who provided the fuel for our discussions and debates over dinner, some even related to the meeting. We finished the night off with an embarrassing round of “Happy Birthday” to celebrate John Langley’s impending 50th birthday!

Despite the quality (and quantity) of food and drink on offer the previous evening, a full turnout arrived for the start of Day 2 with Prof. Jeremy Nicholson’s plenary lecture about the analysis and modeling of disease. NMR and mass spectrometric methods have been successfully applied to characterize and quantify a wide range of metabolites in biological fluids and tissues to explore the biochemical nature of human disease processes. Jeremy also emphasized the importance of the bacterial genome to our understanding and treatment of diseases, and that genetically we are more bacteria than human!

There then followed two further vendor presentations from Waters and Bruker before we retired for coffee. Next up was Dr John Langley (University of Southampton), who asked us to think about what knowledge we can get from our data and showed some exciting examples of predicting MS/MS fragmentation patterns and, uniquely, the ability to accurately predict their signal intensities for a class of pharmaceutical compounds.

We then had a talk from a post-grad student, James McKenzie from the University of York, detailing his work on data fusion approaches used to combine information from complementary data sets obtained by 1H NMR and LC-MS in order to maximise the information extracted. The talk was illustrated with the analysis of samples from the recent melamine milk scandal in China.

Our final two vendor presentations were then given by Agilent and Mestrelab Research before lunch and the formal poster session. Kirsten Hobby (AstraZeneca), on home turf, won the longest title prize for his talk about metabolite semi-quantitation. Kirsten’s solution for converting data to knowledge was the use of a custom application that directly interfaces with data output from metabolite mining software and performs the necessary calculation of metabolite AUCs, a metabolite-specific ‘calibration’ and a final tabular summary.

It was down to Prof Richard Brereton (University of Bristol) to close the meeting with his passionate and enthusiastic lecture on pattern recognition methodology. Richard discussed Moore’s law of doubling computer power and pointed out that the majority of chemometric analysis techniques hail from a time when computing power was limited and the size and nature of “complex” problems were much more restricted. Richard illustrated his talk with applications of modern approaches from machine learning, including self-organising maps and support vector machines, to both forensic and metabolomic analysis.

Overall the meeting was a great success. In the words of Robert Slinn: “thank you for organizing a brilliant 2-day RSC/BMSS 'Data to Knowledge' conference... the programme and lectures were exceptional coupled with a superb conference dinner. I really enjoyed the whole experience”.

The organising committee would like to thank all the delegates who attended, as well as the vendors for their kind sponsorship and donations. Further details of the meeting can be found at Data To Knowledge

Steve Coombes, April 2011.