Running SNAPP on the cluster - making it work

Post date: Aug 28, 2020 3:11:2 PM

Important things that I have changed

# Changed totalcount="4" to totalcount="3" in the XML file (this is supposed to speed up the analysis)
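A minimal Python sketch of this edit. It assumes the setting is the totalcount attribute on each <sequence> element, as in BEAUti-generated BEAST 2 XML. A targeted regular expression leaves the rest of the file (comments, indentation) untouched, which a full XML parse-and-rewrite would not. The script name and its two arguments (input file, output file) are hypothetical; writing to a new file keeps the original XML as a backup.

```python
import re
import sys


def set_totalcount(xml_text: str, old: int = 4, new: int = 3) -> tuple[str, int]:
    """Replace totalcount="old" with totalcount="new" everywhere in a BEAST XML string.

    Returns the edited text and the number of attributes changed.
    The closing quote must directly follow the number, so totalcount="40" is left alone.
    """
    pattern = re.compile(r'\btotalcount=(["\'])' + str(old) + r'\1')
    return pattern.subn(lambda m: f"totalcount={m.group(1)}{new}{m.group(1)}", xml_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python set_totalcount.py in.xml out.xml
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, encoding="utf-8") as fh:
        edited, n_changed = set_totalcount(fh.read())
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write(edited)
    print(f"changed totalcount on {n_changed} element(s); wrote {dst}")
```

The printed count is a quick sanity check: it should equal the number of taxa in the alignment.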

"WARNING: removed 8826 patterns (8826 sites) because they have one or more branches without data."

https://groups.google.com/g/beast-users/c/-bu2D8e9q9s

Hi Tobias,

SNAPP needs to have at least one taxon with data for each species, otherwise it cannot calculate the tree likelihood for that site and data for all other species is ignored for that site. What you could possibly do to diminish this problem (apart from getting data covering all species) is remove species from the analysis such that more sites have all species covered.

Cheers,

Remco

The earlier versions required all taxa to have data, and removed sites if any taxa had missing data.

The current version can handle missing data, but under the condition that all species have at least some data. So, if a species X consists of lineages A, B and C and there is missing data for A and B, but C has data, then it will not be removed. However, if all three lineages have data missing for a site, this site will be removed from the alignment since SNAPP currently cannot calculate a treelikelihood for that situation.

In summary: SNAPP can handle some missing data, just not when there is a species with no data at all.
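The rule described above can be sketched as a per-site check. This is an illustration of the rule, not SNAPP's own code. It assumes missing data is coded as ? or -, that all sequences are strings of equal length, and that the lineage-to-species mapping is a plain dictionary; the function and variable names are hypothetical.

```python
from typing import Dict, List, Mapping

MISSING = {"?", "-"}  # assumed missing-data codes


def sites_without_species_data(
    alignment: Mapping[str, str], species_of: Mapping[str, str]
) -> Dict[int, List[str]]:
    """Map each site index to the species that have no data at that site.

    A site appears in the result (i.e. would be removed) only if at least one
    species has missing data for every one of its lineages.
    """
    all_species = {species_of[lineage] for lineage in alignment}
    n_sites = len(next(iter(alignment.values())))
    result: Dict[int, List[str]] = {}
    for site in range(n_sites):
        covered = {
            species_of[lineage]
            for lineage, seq in alignment.items()
            if seq[site] not in MISSING
        }
        uncovered = sorted(all_species - covered)
        if uncovered:
            result[site] = uncovered
    return result


# Remco's example: species X = lineages A, B, C; a second species Y = lineage D.
species_of = {"A": "X", "B": "X", "C": "X", "D": "Y"}
alignment = {"A": "?0?", "B": "?1?", "C": "1??", "D": "012"}
print(sites_without_species_data(alignment, species_of))  # {2: ['X']}
```

Site 0 is kept because lineage C still has data for species X; site 2 is dropped because A, B and C are all missing there. Counting how often each species appears in the result is one way to decide which species to remove, as Remco suggests.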

I get the error saying:

chain10/run5_testConvergence.xml.state.new

Things to try:

1 - Does it happen when I run a single chain on the cluster (try with an interactive node)?

2 - Can I group several individuals into the same species?

KINGSPEAK

Kingspeak is operated in a condominium fashion, with some (48) general CHPC nodes on which groups can get an allocation, along with additional nodes owned by different research groups. Kingspeak has 385 nodes in total (8292 cores), with 16, 20, 24, 28, or 32 cores per node and between 32 GB and 1 TB of memory per node.

I can try:

(1) decrease the number of tasks and the number of chains (maybe there is not enough memory to keep the .state file) (suggested here, last message: https://groups.google.com/g/beast-users/c/6TAlYU_Bndc)

What I am trying:

In run5/testConvergence:

Interactive job: ran one chain (beast -threads 3 -overwrite run5_testConvergence.xml) to see whether it works on its own. It does.

Batch job: decreased the number of chains to 10 and the number of tasks to 15 (previously 30).

[u6028866@kp164 testConvergence]$ sbatch RunBeast.sh

Submitted batch job 8550762
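RunBeast.sh itself is not reproduced here. As a hedged sketch of one way to lay out the chains, the helper below builds one BEAST command per chain. It assumes each chain runs in its own chainN directory holding its own copy of the XML (so the .state and log files of different chains never share a directory) and gets an explicit, distinct seed so that each chain can be reproduced. The function name, directory layout and seed scheme are hypothetical; -threads, -seed and -overwrite are standard BEAST 2 command-line options.

```python
from typing import List


def chain_commands(xml_name: str, n_chains: int, threads: int, base_seed: int = 1000) -> List[str]:
    """Build one shell command per chain: chain1 ... chainN, each with its own seed."""
    commands = []
    for i in range(1, n_chains + 1):
        seed = base_seed + i  # distinct seed per chain
        commands.append(
            f"cd chain{i} && beast -threads {threads} -seed {seed} -overwrite {xml_name}"
        )
    return commands


for cmd in chain_commands("run5_testConvergence.xml", n_chains=10, threads=1):
    print(cmd)
```

Each line could be launched as a background job or as one task of a SLURM job array. The total number of threads across chains (n_chains * threads) should fit within the cores requested for the job.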

The job is pending (10h15) - I will let the one on my computer finish so that I have one chain to look at.

In the meantime, let's fully understand each parameter so that we can make educated guesses about their values.

Maybe make a calendar of work days?

RUN 6 - I made a new .xml file! Will try it now...

What I changed: the lambda parameter is really high.

run6/testConvergence

beast -threads 3 -overwrite run6_testConvergence.xml - not happy!

P(prior) = -Infinity (was -Infinity)

P(lambdaPrior.pando_8826_rdnOutgroup) = -Infinity (was -Infinity)

P(HyperPrior.hyperLogNormalDistributionModel-M-uPrior.pando_8826_rdnOutgroup) = 0.0 (was 0.0)

Fatal exception: Could not find a proper state to initialise. Perhaps try another seed. - ok, this did not work well
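The distribution behind lambdaPrior and the starting value of lambda are not recorded above, so this sketch uses made-up numbers (a starting lambda of 5000, a Uniform(0, 1000) prior and a Gamma(shape 2, scale 1) prior, all hypothetical). It shows two generic ways a log-prior term can come out as -Infinity: the value lies outside the prior's support, or the density is so small that it underflows to 0.0 in double precision before the logarithm is taken. If the prior is -Infinity for every candidate starting state, no valid initial state exists, which is consistent with the fatal exception above. The starting value of lambda and the bounds of its prior are the first things to check.

```python
import math


def safe_log(density: float) -> float:
    """Log of a density, reporting log(0) as -inf instead of raising."""
    return math.log(density) if density > 0 else -math.inf


def uniform_density(x: float, lower: float, upper: float) -> float:
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0


def gamma_density(x: float, shape: float, scale: float) -> float:
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def gamma_log_density(x: float, shape: float, scale: float) -> float:
    """Same Gamma density, computed directly in log space."""
    if x <= 0:
        return -math.inf
    return (shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)


lam = 5000.0  # hypothetical, very high starting value for lambda
print(safe_log(uniform_density(lam, 0.0, 1000.0)))  # -inf: outside the support
print(safe_log(gamma_density(lam, 2.0, 1.0)))       # -inf: exp(-5000) underflows to 0.0
print(gamma_log_density(lam, 2.0, 1.0))             # finite (about -4991.48) in log space
```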

After many changes, the run 6 settings are decided. See the list of parameters here:

It seems to be working. Once I see the first 1000 steps logged, I will start the chains.

[u6028866@kp167 testConvergence]$ sbatch RunBeast.sh

Submitted batch job 8550852
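To automate the "wait for the first 1000 steps" check, here is a sketch that reads a BEAST 2 trace log and returns the last logged sample number. It assumes the trace log (not the screen output) is tab-separated, may contain lines starting with #, and has a header row with a Sample column; the function name and the 1000 threshold are illustrative.

```python
from typing import Optional


def last_logged_sample(log_text: str) -> Optional[int]:
    """Return the last value of the Sample column in a tab-separated trace log."""
    sample_col = None
    last = None
    for line in log_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if sample_col is None:
            sample_col = fields.index("Sample")  # first non-comment line is the header
            continue
        last = int(fields[sample_col])
    return last


example = "# comment line\nSample\tposterior\tprior\n0\t-1520.3\t-12.1\n1000\t-1498.7\t-11.4\n"
ready = (last_logged_sample(example) or 0) >= 1000
print(ready)  # True
```

In practice the text would come from reading the chain's .log file; its name depends on the logger settings in the XML.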
