This is effectively the supplementary details page corresponding to my Panton Fellowship 2012 application. Comments, encouragement, criticism - all welcome! The brief for the Fellowship application is here. I initially started writing this as my covering letter, but it got far too extensive and detailed for that, so I'm posting it here instead. I'll try to make the covering letter a more concise and targeted distillation of some of the points, achievements and aims espoused here.
UPDATE 16/03/2012: So, I made it through to the next round of the application process. I've now added a couple more proposals to my aims & objectives.
UPDATE 01/04/2012: Thanks to all who commented on / helped with my proposal - it was accepted, so I'm now one of two Panton Fellows for 2012 - 2013. You can read the official announcement of this here.
A couple of years ago, when I first started my PhD research at the University of Bath, I was hugely excited at the prospect of doing integrative, meta-analytical research on the impact of fossils in phylogeny (cf. Cobbett et al. 2007). The prospects for knowledge synthesis in this area were and still are huge. After researching the basic methods and performing a thorough literature search, all I had to do was assemble all the relevant datasets I needed to start work. Armed with my list of relevant papers, I set about trying to get the underlying cladistic matrices analysed in each of them, so I could perform my re-analyses - partitioning the data into cranial and postcranial subsets, for comparative statistical evaluations.
Little did I know, but this turns out to be a rather non-trivial task for most papers, even though I was looking exclusively at 21st century digital papers! I spent literally months obtaining enough machine-readable data. I wasn't the only one either - having asked around friends doing similar research - it's apparently very common for grad students to get tasked with laboriously re-extracting/obtaining data from research papers - in some cases line-by-line re-typing entire datasets, or so I've heard... In my time I've encountered a whole variety of obstacles to getting data including:
and in terms of trying to contact the author(s) for the data files or further information about the data supplied:
and more... there are some detailed examples that can be found here if you're interested.
It was made very clear to me that the process of scientific publishing, whilst reasonably good at disseminating text, can be very poor at communicating data. This is why I started taking great interest in all things Open Science and Open Data: in most cases the process of scholarly publishing just doesn't seem to be taking full advantage of all that modern computation and web technology have to offer - to quote Jason Priem:
"Today's journals are the best scholarly communication system possible using 17th century technology... The Web has revolutionized everything but scholarly communication... Online journals are essentially paper journals, delivered by faster horses." source
The biggest barrier to change appears to be entirely sociological - we could change many things (for the better) starting tomorrow - but it's just not that simple to go against the status quo. For example, TreeBASE has existed for a long time to archive precisely the valuable type of data I require for my research - yet barely any palaeontologists use it. I should also note it's free to deposit data there and free to access from the re-user POV.
Thus in 2010 I decided I needed to do something to raise awareness of this problem, explain *why* it is a problem, and set out what reasonable steps could be taken to ameliorate it. I chose the next appropriate conference, the 12th Young Systematists' Forum, and gave a talk principally about the lack of data availability (below):
Indeed, I helped estimate that fewer than 10% of all phylogenetic analyses performed in 2010 are archived in TreeBASE - this represents a huge and rather wasteful loss of information to science IMO. Sure, there might be a little bit more data scattered about various lab websites - but the permanence and discoverability of these self-archived datasets is highly questionable, amongst other issues.
My YSF12 talk was well-received, and retweeted around Twitter a fair bit thanks to Rod Page in particular.
But this wasn't enough - great as the conference was - there were only so many people there, and only so many people will have seen my Prezi on the Internet - it wasn't going to change anything. So I made my next plan more ambitious - an attempt to notify everyone in palaeontology about how we could make research better: more transparent, more repeatable, more usable (by academics and educators alike) by sharing *all* of the underlying data in digitally-useful forms. Inspired by some OKFN work I'd seen, I organised a draft Open Letter on an EtherPad, and soon we had a statement, and a website with which one could register support for our digital data vision.
Despite coverage in Nature news, we didn't actually get all that much traction. More importantly, some academics publicly raised strong concerns about our proposals (and with hindsight I thank them for doing that: it provided a much-needed sense of balance, and gave me an insight into some of the traditionalist worries about the consequences of Open Data).
We did at least achieve widespread awareness that there is a problem. And we have gained a sufficient number of supportive academics who have made their feelings clear - they do want greater digital accessibility to research data (Open Data); it's just the 'how' that I infer to be the stumbling block. I also gave a few more presentations on Open Science & Open Data later that year, one at OKCon (Berlin) & one at the British Ecological Society annual meeting.
2012 and beyond...
I believe there is ample opportunity to build upon incremental advances that have already been made in my discipline with respect to Open Data and Open Science. Towards this goal, I have been invited to discussions with Dryad and MorphoBank, and have helped out with MIAPA towards establishing a community-agreed Minimum Information About a Phylogenetic Analysis standard. Through reading Peter Murray-Rust's blog and other sources, I've become increasingly aware of the importance of re-use licences in academia, for papers, data and code. To this end, I have helped document that the majority of publishers (except publishers like PLoS, BMC and Pensoft, to name a few) use a rather loose non-OKD, non-BOAI compliant definition of 'open' that would be better termed 'freely accessible' or 'sponsored article' (rare praise for Elsevier - for their honesty in NOT falsely claiming to provide Open Access with this option, although some of their individual journals do still mistake the 'sponsored article option' for being equivalent to open or open access - it is not!).
I am also in the process of co-writing a book chapter with Daniel Janies which will review, at least in part, the importance of data sharing, code sharing, and collaboration in modern science. Whilst some might think this is rather ironic content for a dead-tree format book, for the benefit of a commercial publishing house - I think this is a necessary irony if we are to achieve our goals of disseminating open data views to a wider audience who may not necessarily read Open Access journals, or go online that much. Such academics exist in significant numbers, I suspect.
If I were awarded a Panton Fellowship - my specific aims for the duration of the Fellowship would be:
I think it's worth saying I don't believe in openness for openness' sake. There can be some very valid cases, e.g. sensitive patient data, for which full openness would be neither appropriate nor needed, as long as other data essential to the scientific findings are reported. I'm not anti-establishment, anarchic, or anti-corporate, nor do I consider myself that political (all criticisms sometimes implied of supporters of Openness) - I just find that the goals of Open Science and the Panton Principles for Open Data are sensible, rational, and reasonable aims that I choose to support because I genuinely believe they are what is best for science.
Panton Fellowship application DRAFT by Ross Mounce is licensed under a Creative Commons Attribution 3.0 Unported License.