Motivation: The democratization of metabolic analyses has extended the scope of metabolomics to ecological investigations. Chemical ecology interprets the variation and diversity of the chemical signals of non-model organisms in the light of species interactions. Elucidating the biological information within such complex signals with robust statistical analyses requires a large number of replicates and dedicated bioinformatic tools.
Results: To analyse large GC/LC-MS datasets of chemical compounds, we developed an unsupervised pre-processing method. The method detects individual compounds within complex mixtures by clustering their mass spectra. Retention time or retention index is then used, in a second step, to check the quality of the clusters. No profile correction, migration time alignment or normalization is needed. The method was robust to the use of different types of chromatographic support and to shifts in retention times, which are common in large and/or long-term analyses because of column ageing, contamination or replacement. We validated our method on two distinct biological datasets and showed that it compares favorably to other pre-processing methods. We found that the best method for grouping similar mass spectra into molecules was hierarchical clustering analysis with Euclidean distance and Ward linkage. However, we implemented a function that allows users to identify whether other algorithms could be more appropriate for other datasets.
Availability and implementation: An R package, “MSeasy”, implementing our pre-processing method is freely available on demand. For non-R users, a graphical interface called MSeasyTkGUI was created and is also freely available here.
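As an illustration of the two-step idea described in the Results (cluster the mass spectra first, then use retention time to check the clusters), here is a minimal sketch in plain Python. It is not the MSeasy implementation, which is an R package. The function names, the toy spectra, the retention times and the 0.2 tolerance are all hypothetical and chosen only for this example. Ward linkage is written naively as "merge the pair of clusters whose fusion least increases the within-cluster sum of squared Euclidean distances", which is adequate for a handful of spectra but far too slow for real datasets.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = a["size"], b["size"]
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * sq_dist


def ward_cluster(spectra, n_clusters):
    """Agglomerative clustering with Ward linkage on Euclidean distance.

    spectra: list of equal-length intensity vectors (one per detected peak).
    Returns a list of clusters, each a sorted list of spectrum indices.
    """
    clusters = [
        {"centroid": list(s), "size": 1, "members": [i]}
        for i, s in enumerate(spectra)
    ]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge is cheapest under Ward's criterion.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[j]
        n = a["size"] + b["size"]
        merged = {
            "centroid": [
                (a["size"] * x + b["size"] * y) / n
                for x, y in zip(a["centroid"], b["centroid"])
            ],
            "size": n,
            "members": a["members"] + b["members"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]


def suspicious_clusters(clusters, retention_times, tolerance):
    """Second step: return clusters whose retention-time spread exceeds tolerance."""
    flagged = []
    for members in clusters:
        rts = [retention_times[i] for i in members]
        if max(rts) - min(rts) > tolerance:
            flagged.append(members)
    return flagged


# Hypothetical toy data: relative intensities over four m/z bins.
spectra = [
    [100, 50, 0, 0],
    [98, 52, 1, 0],
    [0, 0, 100, 30],
    [1, 0, 97, 33],
    [99, 49, 0, 2],  # spectrum similar to the first two, eluting much later
]
retention_times = [5.10, 5.12, 8.40, 8.43, 9.90]  # minutes, hypothetical

clusters = ward_cluster(spectra, n_clusters=2)
print(clusters)
print(suspicious_clusters(clusters, retention_times, tolerance=0.2))
```

In this toy case the last spectrum is grouped with the first two on spectral similarity alone, and the retention-time check then flags that cluster for inspection. Retention time is used only as a quality control after clustering, not as an input to it.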
mzML, mzData, mzXML and netCDF formats are acceptable.
R can be downloaded from CRAN (http://cran.r-project.org/).
For the mzML, mzXML, mzData and netCDF formats, we use the mzR R package.
Licence: free under GPL-2 http://www.gnu.org/licenses/gpl-2.0.html