PalAss2011

Welcome!

You've arrived at the landing page for the electronic supplementary materials to my poster at the Palaeontological Association's 55th Annual Meeting.


If you haven't seen the poster, it's embedded below.
I composed it in LibreOffice on Ubuntu (both of which are free, open-source software); you can find the .ppt file here on Figshare. [PS: props to R too, which I used to create figure 5.]

What is this webpage about? Why do I have electronic supplementary materials for a hard-copy poster?

Earlier in the year, I co-wrote an Open Letter, ambitiously addressed to 'all of palaeontology', to highlight an unfortunate difficulty with many scholarly communications. If you missed it the first time around, you can read a summary in Nature here.
The problem is that the data and code used to analyse and summarise research are rarely conveyed in immediately useful, digital formats alongside scholarly communications (e.g. as outlined in this talk). Indeed, as I discovered at the Open Knowledge Conference in Berlin later in the year, this is a problem shared with many other areas of academia (it's certainly not just palaeontology!) - Guo Xu, for instance, expressed how difficult it is to extract and re-analyse data from economics research papers.


So, I thought I'd try it out for myself to see how easy it is to make the underlying data and code behind a research publication (in this case, just a poster) available in readily re-usable digital formats. I can't say it took no time at all, and perhaps I haven't released everything in the most Open way possible (feel free to point out my shortcomings if you so wish), but I feel this is certainly a step in the right direction. At the very least, I hope these bits of code and data will serve as instructive teaching examples.

Props too to Mike Taylor (an example here), Rob Asher (an example here) and others, for providing some shining real examples of how to provide rich, re-usable digital data along with research publications.


As an example of good practice, I've archived both the data I analysed AND the commands I used to analyse it (for figure 5 of my poster) over at Figshare. As these analyses are pre-publication, I think Figshare is probably the best place for them; if this were a research paper, one might prefer Dryad, MorphoBank, TreeBASE or another subject-relevant data repository. Self-hosting data files on one's own personal website or departmental page is better than nothing, but I wouldn't really recommend it: such sites have questionable permanence. Data repositories, however, should provide your data with more permanent URLs.

Code and worked examples of ILD (partition homogeneity) tests in PAUP* and TNT

I'm not going to go over everything - it's up to you which heuristic search methods to use - but the key commands for performing a good ILD test in PAUP* are as follows (assembled into a single block after step 3):

1.) exclude uninf;
Uninformative characters bias measurements of length difference. Therefore these must first be excluded from the analysis.

Lee MSY. Uninformative Characters and Apparent Conflict Between Molecules and Morphology. Molecular Biology and Evolution. 2001 Apr;18(4):676-680. Available from: http://mbe.oxfordjournals.org/cgi/content/abstract/18/4/676.

2.) charset cra= 1-68; charset non= 69-145; charpartition crantest= 1:cra, 2:non;

Charset defines both the name of the partition you're specifying (e.g. 'cra' for cranial) and the characters you're assigning to it (1 to 68).
Charpartition defines the partitions you will compare with the ILD test, so use the names you assigned with your charset commands.
If you have more than 2 partitions to investigate, I'd strongly recommend doing pairwise ILD comparisons.

3.) hompart partition=crantest nreps=1000 search=heuristic; 

Hompart (short for homogeneity partition test, yet another name for the ILD test) is the command that executes the ILD test. Nreps specifies the number of random partition replicates you'll run - statistically, it's important to do at least 1000, as per the advice of Allard et al., among other papers.

Allard MW, Farris JS, Carpenter JM. Congruence among Mammalian Mitochondrial Genes. Cladistics. 1999 Mar;15(1):75-84. Available from: http://dx.doi.org/10.1111/j.1096-0031.1999.tb00398.x.
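
Putting the three steps together, a minimal sketch of the whole thing as a PAUP block in a NEXUS file would be as below (the character ranges are from my example matrix; tune the heuristic search settings to your own needs):

    begin paup;
        exclude uninf;                                            [step 1: drop uninformative characters]
        charset cra = 1-68;                                       [step 2: cranial characters]
        charset non = 69-145;                                     [        postcranial characters]
        charpartition crantest = 1:cra, 2:non;                    [        the two partitions to compare]
        hompart partition=crantest nreps=1000 search=heuristic;   [step 3: the ILD test itself]
    end;

(With more than two partitions, you'd define one charpartition per pair and run hompart on each in turn, as recommended above.)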

RESULTS

I've also put the exact 'raw' log file output from PAUP* for this analysis (figure 5 of my poster) on Figshare. You can find it here.
PAUP* neatly sums up all 1000 replicates into a nice summary table for you, and this is the data I displayed with an R plot in figure 5 of my poster (a sketch of which follows the table below). The length of the cranial + postcranial partition is shorter than that of nearly all the random partitions, and thus, using an alpha of 0.05 (5% significance level), these partitions are judged to be significantly incongruent.
Results of partition-homogeneity test:

        Sum of       Number of
  tree lengths       replicates
  -----------------------------
           271*           4
           272           12
           273           35
           274           59
           275          128
           276          270
           277          492
              * = sum of lengths for original partition

   P value = 1 - (996/1000) = 0.004000
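
Figure 5 is essentially just this summary table plotted in R. I haven't reproduced my exact figure 5 script here, but a minimal sketch of the same idea (the variable names are my own) would be:

    ## replicate counts per sum of tree lengths, from the PAUP* summary table above
    lengths <- 271:277
    reps    <- c(4, 12, 35, 59, 128, 270, 492)
    barplot(reps, names.arg = lengths,
            xlab = "Sum of tree lengths", ylab = "Number of replicates")
    ## the original partition sits at 271, in the extreme left-hand tail,
    ## hence P = 1 - (996/1000) = 0.004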

TNT is much, MUCH quicker at maximum parsimony analyses than PAUP*, so you might well want to save yourself some time and do your ILD analysis in TNT. BUT the ILD test is not built into TNT, so you'll have to use an external script to automate it. Kindly, one such script is provided by Mark Siddall over at the TNT wiki manual here (if you do use this script, please cite the script, not just TNT!).

Unfortunately, I didn't know enough about using TNT when I first started my PhD, so I did all 62 ILD tests in my first paper (in prep) in PAUP*. I hope that by providing these instructions I might save someone else from wasting time with PAUP* ;)

The basics are akin to the PAUP* version I explained above; you'll need to set the heuristic search settings yourself. The three commands are listed below, then assembled into a single block:

1.) xinact; [to exclude uninformative characters]
2.) blocks 0 67; [specifies your partitions to be tested, remember TNT counts from 0 so character 1 is the 0th character]
3.) run ild.run 999; [runs the script and specifies the number of random replications - in this case 999, to be equivalent with 1000 in PAUP*]
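
Note the partition boundary: blocks 0 67; makes characters 0-67 (i.e. characters 1-68 in PAUP* numbering - the cranial set) the first block, with everything after forming the second. Assuming your matrix is already loaded and ild.run sits in your working directory, the whole session is simply:

    xinact;
    blocks 0 67;
    run ild.run 999;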

RESULTS 
999 random partitions out of 999 (+1 for the observed) had a greater sum of lengths, so technically P = 1/(999 + 1) = 0.001000.

You may notice this is not exactly the same result as that generated in PAUP* (above). There are two reasons:
A) It's a randomization-based test, so there will be small variance in p-values between runs.
B) The default branch-collapsing settings in TNT and PAUP* are different [be careful which settings you use, both explicitly and implicitly!].

Either way, the conclusion is that the cranial and postcranial characters are significantly incongruent in this dataset.


FYI, I'm in the process of writing a review paper on the ILD test, the basis for which is this talk I gave at Hennig XXX (Brazil) this year.

Code and worked example of partition Goodman-Bremer support analyses in TNT

I only did this in TNT. It certainly is possible to calculate pGBS in PAUP* too, with the help of other programs/scripts, but I wouldn't recommend it - stick with TNT if you can.

I used the script written by Carlos Peña, described here, to calculate the pGBS values for figure 2 of my poster.
For instructions on how to use this script, see his website - I don't think there's any point in repeating it all here!

Again, as an example (not to be critical in ANY way of the authors - it's just an interesting dataset with which to display this analysis), I used the Ezcurra & Cuny (2007) data matrix. My TNT-formatted version can be found here, with cranial and postcranial data interleaved in the file by use of &[num].

Once you've followed the instructions given and run the script on your data, use

ttag; ttag &pGBS.svg;

to output a nice SVG image file of your strict consensus tree, labelled with pGBS values at each node (which is what I used for figure 2 of my poster).

QR codes, URL shorteners and other miscellanea to help you link to the web

I think using QR codes is a fun way of linking to the web - sometimes it can be a lot quicker than typing in a URL, though URL shorteners also help with this. So, just FYI, I used http://goqr.me/ to create the QR code and https://bitly.com/ as my URL shortener on the poster.
There are, however, plenty of other websites that offer these services for free; I have no particular preference.


Creative Commons and intellectual 'property' rights

My research is funded by the BBSRC - hence I'm publicly, indeed taxpayer, funded. Thus I think my various works should be made as useful and Open as possible, not just freely accessible - there are important differences. I chose to explicitly license the poster under a Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/3.0/ ; CC-BY 3.0). This certainly applies to the digital version of the poster, though I'm not sure whether it applies to the hard-copy (real-life) version. Either way, it's well worth doing, because another big problem I've recently noticed with scholarly communications is that even some supposedly "open access" papers are, at least in the legal sense, not actually Open Access (as per the original Budapest Declaration; BOAI), e.g. Wiley's 'Open Access'-branded papers. [FYI: PLoS, BMC, Springer Open, and other smaller publishers offer fully BOAI-compliant open access publishing options.]

Without explicitly saying "you can do anything you want with this poster, as long as you cite the poster" (which is effectively what the CC-BY licence does), you'd technically have to ask my permission to re-use even small elements of it - which IMO rather defeats the point of a poster. What if I were on holiday for a month? What if my email address had changed and you couldn't contact me? What if I were dead? Without this Open licence, getting permission to re-use this work would be a lot of hassle. I would strongly encourage authors to consider licensing their works in more explicitly Open ways to permit maximum re-use - it'll be a headache for future generations otherwise!

If you're interested in the licensing of academic works, and the problems that poor licence choices can cause for future potential re-users, I strongly recommend you read these papers:

Hagedorn G, Mietchen D, Morris R, Agosti D, Penev L, Berendsohn W, et al. Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys. 2011 Nov;150:127-149. Available from: http://dx.doi.org/10.3897/zookeys.150.2189.

Carroll MW. Why Full Open Access Matters. PLoS Biol. 2011 Nov;9(11):e1001210+. Available from: http://dx.doi.org/10.1371/journal.pbio.1001210.

Murray-Rust P. Open Data in Science. Nature Precedings. 2011;(713). Available from: http://dx.doi.org/10.1038/npre.2008.1526.1.


To the extent that is possible, all content on this page is licensed under a Creative Commons Attribution 3.0 Unported Licence.