
My Panton Fellowship 2012 application supplementary information

This is effectively the supplementary details page corresponding to my Panton Fellowship 2012 application. Comments, encouragement, criticism - all welcome! The brief for the Fellowship application is here. I initially started writing this as my covering letter, but it got far too extensive and detailed for that, so I'm posting it here instead. I'll try to make the covering letter a more concise and targeted distillation of some of the points, achievements and aims espoused here.

UPDATE 16/03/2012: So, I made it through to the next round of the application process. I've now added a couple more proposals to my aims & objectives.

UPDATE 01/04/2012: Thanks to all who commented on / helped with my proposal - it was accepted, so I'm now one of two Panton Fellows for 2012 - 2013. You can read the official announcement of this here.
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Background

A couple of years ago, when I first started my PhD research at the University of Bath, I was hugely excited at the prospect of doing integrative, meta-analytical research on the impact of fossils in phylogeny (cf. Cobbett et al. 2007). The prospects for knowledge synthesis in this area were, and still are, huge. After researching the basic methods and performing a thorough literature search, all I had to do was assemble the relevant datasets I needed to start work. Armed with my list of relevant papers, I set about trying to get the underlying cladistic matrices analysed in each of them, so I could perform my re-analyses: partitioning the data into cranial and postcranial subsets for comparative statistical evaluations.

Little did I know that this would turn out to be a rather non-trivial task for most papers, even though I was looking exclusively at 21st century digital papers! I spent literally months obtaining enough machine-readable data. I wasn't the only one either. I asked around friends doing similar research, and it's apparently very common for grad students to get tasked with laboriously re-extracting or obtaining data from research papers, in some cases re-typing entire datasets line by line, or so I've heard... In my time I've encountered a whole variety of obstacles to getting data, including:
  • Data not found anywhere in the paper or the supplementary materials.
  • Data supplied at a URL that no longer existed.
  • Data matrices printed as images.
  • Data only partially given, referring to older non-digital literature for the rest.
  • Data matrices missing vital data that should have been in there - rendering them unusable, until the author could be contacted to supply the missing cell data.
  • Data matrices containing completely unexplained codings (lack of adequate metadata explaining the data).
  • Large matrices inappropriately split between multiple pages, which were not simple to piece together again.
and in terms of trying to contact the author(s) for the data files or further information about the data supplied:
  • Authors who have changed email address since publication and are difficult to re-identify and contact.
  • Authors who refused to give me the data, unless I buy an accompanying hard-copy of the work.
  • Authors who have left academia.
  • Authors who will be in the field for 'the next 4 or 5 months'.
  • Deceased authors.
and more... there are some detailed examples that can be found here if you're interested.


It was made very clear to me that the process of scientific publishing, whilst reasonably good at disseminating text, can be very poor at communicating data. This is why I started taking great interest in all things Open Science and Open Data: in most cases the process of scholarly publishing just doesn't seem to be taking full advantage of all that modern computation and web technology have to offer. To quote Jason Priem:

"Today's journals are the best scholarly communication system possible using 17th century technology... The Web has revolutionized everything but scholarly communication... Online journals are essentially paper journals, delivered by faster horses."  source

The biggest barrier to change appears to be entirely sociological - we could change many things (for the better) starting tomorrow - but it's just not that simple to go against the status quo. For example, TreeBASE has existed for a long time to archive precisely the valuable type of data I require for my research, yet barely any palaeontologists use it. I should also note that it's free to deposit data there and free to access from the re-user's point of view.

Thus in 2010 I decided I needed to do something to raise awareness of this problem, explain *why* it is a problem, and what reasonable steps could be taken to ameliorate it. I chose the next appropriate conference, the 12th Young Systematists' Forum, and gave a talk principally about the lack of data availability (below):


Indeed, I helped estimate that less than 10% of all phylogenetic analyses performed in 2010 are archived in TreeBASE - this represents a huge and rather wasteful loss of information to science, in my opinion. Sure, there might be a little bit more data scattered about various lab websites, but the permanence and discoverability of these self-archived datasets is highly questionable, amongst other issues.
  
My YSF12 talk was well-received, and retweeted around Twitter a fair bit thanks to Rod Page in particular.

But this wasn't enough - great as the conference was - there were only so many people there, and only so many people will have seen my Prezi on the Internet - it wasn't going to change anything. So I made my next plan more ambitious - an attempt to notify everyone in palaeontology about how we could make research better; more transparent, more repeatable, more usable (by academics and educators alike) by sharing *all* of the underlying data in digitally-useful forms. Inspired by some OKFN work I'd seen, I organised a draft Open Letter on an EtherPad, and soon we had a statement, and a website with which one could register support for our digital data vision.

Despite coverage in Nature News, we didn't actually get all that much traction. More importantly, some academics publicly raised strong concerns about our proposals (and with hindsight I thank them for doing that; it provided a much needed sense of balance, and gave me an insight into some of the traditionalist worries about the consequences of Open Data).
 
We did at least achieve widespread awareness that there is a problem. And we have gained a sufficient number of supportive academics who have made their feelings clear: they do want greater digital accessibility to research data (Open Data); it's just the 'how' that I infer to be the stumbling block. I then gave a few more presentations on Open Science & Open Data later that year too, one at OKCon (Berlin) & one at the British Ecological Society annual meeting.

2012 and beyond...

I believe there is ample opportunity to build upon the incremental advances that have already been made in my discipline with respect to Open Data and Open Science. Towards this goal, I have been invited to discussions with Dryad and MorphoBank, and have helped out with MIAPA towards establishing a community-agreed Minimum Information About a Phylogenetic Analysis standard. Through reading Peter Murray-Rust's blog and other sources, I've become increasingly aware of the importance of re-use licences in academia, for papers, data and code. To this end, I have helped document that the majority of publishers (except publishers like PLoS, BMC and Pensoft, to name a few) use a rather loose, non-OKD, non-BOAI-compliant definition of 'open' that would be better termed 'freely accessible' or 'sponsored article' (rare praise for Elsevier for their honesty in NOT falsely claiming to provide Open Access with this option, although some of their individual journals do still mistake the 'sponsored article option' for being equivalent to open or open access - it is not!).

I am also in the process of co-writing a book chapter with Daniel Janies which will review, at least in part, the importance of data sharing, code sharing, and collaboration in modern science. Whilst some might think this is rather ironic content for a dead-tree format book published for the benefit of a commercial publishing house, I think this is a necessary irony if we are to achieve our goals of disseminating open data views to a wider audience who may not necessarily read Open Access journals, or go online that much. Such academics exist in significant numbers, I suspect.


Aims

If I was awarded a Panton Fellowship - my specific aims for the duration of the Fellowship would be:


  • **NEW** I will use my new position on the Systematics Association council to argue for the data underlying all future Systematics Association research publications to be made openly-licensed open data via an appropriate data repository. I was co-opted onto the council to help them modernise, so I'm reasonably confident I can achieve this goal.
  • **NEW** A citizen science proposal, to be developed in conjunction with OKFN, to re-extract otherwise lost/PDF-buried phylogenetic data from the literature. Work I've been involved with (submitted to BMC Research Notes) shows that only ~10% of phylogenetic tree data published each and every year gets archived on databases such as TreeBASE in a machine-readable format. This is despite phylogenetic tree data being eminently re-usable data. I intend to demo this idea at the upcoming Open Science working group hackday on the 31st of March.
  • **UPDATED** To gather more data on re-use licences and their permissiveness (BOAI-Open Access or less than this, and to what extent), both at the journal-publisher level, for which I have already collected a lot of data, and for institutional repositories, for which there is much work still to be done. Perhaps I could liaise with UKOLN, who are based at my institution, towards this end. There is potential to publish a peer-reviewed paper from this investigation, I think.
  • **UPDATED** To re-work, publicise, and encourage the sharing and adaptation (re-use/remix) of an improved version of my Prezi on the Panton Principles - to promote the Principles, and to explain why they should be adopted, and why current (non-open data) systems create otherwise avoidable problems. One version of this could be a Prezi with the words of the Panton Principles, with audio of someone explaining them as the 'slides' go by, an expansion upon a small segment of my Panton Fellowship video.
  • **UPDATED STATS** To continue my social-media engagement in the research community, sharing, linking, networking, educating, promoting, and facilitating discussion of Open Data and Open Science principles via Facebook, Twitter, blogs, and particularly Google+, where I also have the chance to engage with non-academics (I have >8.6k followers here). All part of my civic duties as an informed citizen of the network.
  • To give a talk at the OKfestival, perhaps on the Panton Principles and their practical application in scientific publishing, and to raise further discussion on how we can get the academic community &/or journal editorial boards to adopt them, knowingly or otherwise.
  • To facilitate and encourage formal partnerships between data repositories (e.g. Dryad, MorphoBank, and FigShare) and palaeontology journals, as an enthusiastic but independent third-party endorser.
  • To work on building and strengthening the cohesiveness and collective power of the large open-minded segment of the palaeontology community that have at numerous times, for numerous different but related causes, revealed themselves to be proponents of scholarly reform in some way or another.
  • Finally, I would resolve to keep plugging away at my PhD research, get some papers submitted from my PhD thesis, give talks at relevant conferences, and successfully defend my thesis in 2013 - for this Fellowship would not be warranted if I could not remain an active part of the research community in the future. This is my first and foremost task; all others must necessarily come second to it. However, I feel the above tasks should smoothly complement the integrative syntheses of knowledge my research aims to offer.


Postscript


I think it's worth saying I don't believe in Openness for openness' sake. There can be some very valid cases, e.g. sensitive patient data, for which full openness would be neither appropriate nor needed, as long as other data essential to the scientific findings are reported. I'm not anti-establishment, anarchic, or anti-corporate, nor do I consider myself that political (all criticisms sometimes implied of supporters of Openness) - I just find that the goals of Open Science and the Panton Principles for Open Data are sensible, rational, and reasonable aims that I choose to support because I genuinely believe they are what is best for science.
  



Creative Commons Licence
Panton Fellowship application DRAFT by Ross Mounce is licensed under a Creative Commons Attribution 3.0 Unported License.
