Genome Sciences 561: Problem Set #1

Due Sunday, January 13, at 5 PM, by email or as a hard copy delivered to Foege 420C


Complete the following pen-and-paper problems from Chapter 2 of Graham Coop’s population genetics notes:

https://github.com/cooplab/popgen-notes/blob/master/popgen_notes.pdf


Question 1, parts A and B

Question 3

Question 4

Question 5, parts A, B, and C

Question 10, parts B and C

Question 11, parts A and B


Simulating the life cycle of genetic variation using PopG


Download the program PopG from the website http://evolution.gs.washington.edu/popgen/ and follow the posted instructions to launch the GUI.


Imagine you are running a long-term experiment involving a colony of 100 diploid, sexually reproducing individuals. You CRISPR a new barcode into the genome of each individual, then let the population evolve for 400 generations before checking whether each barcode is lost, fixed, or still segregating. To simulate this scenario, click on the “Run” menu of PopG, select “New Run,” and leave the default settings of 1.0 for fitness values and 0.0 for mutation and migration rates.

a. What population size should you enter? Remember that this is the number of chromosomes in the population.

b. What is the initial frequency of each new CRISPR allele? Enter this for “Initial Frequency of Allele A.”

c. For “Populations evolving simultaneously,” enter 100 (the number of mutations you introduced). To make the real experiment match the simulation as closely as possible, would it be best for you to CRISPR in the barcodes at the same locus, at different loci on the same chromosome, or at different loci on different chromosomes? Explain your choice.

d. Push “OK” to run your simulation. Taking the output as ground truth, if you were to sequence the barcodes in generation 400, at the endpoint of the simulation, how many barcodes would be lost? How many would be fixed? How many would still be segregating?

f. Push “OK” 9 more times to perform a total of 10 independent simulations with identical parameters. For each simulation, count the mutations/barcodes that are lost, fixed, and segregating by the end of the simulation. Averaged over these 10 simulations, what fractions of starting mutations are fixed, lost, or still segregating after 400 generations?

g. In theory, what fractions of starting mutations would be fixed, lost, or still segregating after 400 generations if you ran an infinite number of these simulations and tallied the results?

h. Repeat steps c-g, but for a diploid population of size 10 (not 100). Adjust the number of populations evolving simultaneously, the starting allele frequency, and the population size accordingly. Run 10 simulations, each for 400 generations, and report your data as well as your expectations for the fractions of variants lost, fixed, and still segregating at the end of the simulation.
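
If you would like to cross-check your PopG tallies, the kind of simulation described above can be sketched in a few lines of Python. This is only a minimal sketch of the standard neutral Wright-Fisher model (binomial resampling of gene copies each generation, no selection, mutation, or migration); it is not PopG's own code, and PopG may differ in details. The function names `wright_fisher` and `tally` are hypothetical, and the parameter values in the example are purely illustrative, not the answers to parts a and b.

```python
import random


def wright_fisher(n_copies, p0, generations, rng):
    """Return the final count of allele A after neutral drift.

    n_copies is the total number of gene copies in the population,
    p0 is the starting frequency of allele A.
    """
    count = round(p0 * n_copies)
    for _ in range(generations):
        if count == 0 or count == n_copies:
            break  # allele is lost or fixed; nothing further can change
        freq = count / n_copies
        # Each gene copy in the next generation is drawn independently
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
    return count


def tally(n_copies, p0, generations, n_replicates, seed=None):
    """Run independent replicates and count lost, fixed, and segregating."""
    rng = random.Random(seed)
    result = {"lost": 0, "fixed": 0, "segregating": 0}
    for _ in range(n_replicates):
        final = wright_fisher(n_copies, p0, generations, rng)
        if final == 0:
            result["lost"] += 1
        elif final == n_copies:
            result["fixed"] += 1
        else:
            result["segregating"] += 1
    return result


if __name__ == "__main__":
    # Illustrative values only; substitute your own answers to parts a and b.
    print(tally(n_copies=20, p0=0.25, generations=400, n_replicates=100, seed=1))
```

Each replicate here plays the role of one of the "populations evolving simultaneously" in PopG, so calling `tally` repeatedly with different seeds corresponds to pushing “OK” several times. Because drift is random, your counts will not match PopG's run for run; only the averages over many runs are expected to be comparable.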