Post date: Feb 19, 2014 11:30:39 AM
Jason wrote Scott:
What if you want to 'update' a c-box that does not come directly from data?
Say I have c-box estimates of the genotype at some locus for a mother and a father, but not for the offspring. The c-boxes were constructed using data in the form of counts that support either allele A or B.
For the mother, 50 out of 50 pieces of data support the A allele. Using this data, we can construct a c-box on the proportion of the A allele to the B allele. She is probably homozygous for A.
For the father, only 25 out of 50 pieces of data support the A allele, whereas the other 15 pieces of data support the B allele. We can construct a c-box estimate for the proportion of the A allele to the B allele in the father as well. He is probably heterozygous A/B.
If we assume that the probability of each parent giving either of their two alleles to their offspring is 50%, then we can use p-box arithmetic to multiply the c-boxes by 0.5 (and other arithmetic operations) to generate some estimate for the offspring's genotype, couldn't we? If so, then the resulting c-box (that is, if it is still considered a c-box at this point) would not be constructed directly from data. In this case, how do you update this estimate once you start to generate data for the offspring genotype?
I hope that this was clear. If not, we should maybe discuss over Google Hangouts if you find it to be interesting.
(I guess I could just ask whether it is kosher to multiply a c-box by a scalar, and if so how do you update the resulting c-box. The pooling rule would not work here, would it?)
Scott replied to Jason:
> What if you want to 'update' a c-box that does not come directly from data?
That's the question of the day.
> Say I have c-box estimates of the genotype at some locus for a mother and a father, but not for the offspring. The c-boxes were constructed using data in the form of counts that support either allele A or B.
> For the mother, 50 out of 50 pieces of data support the A allele. Using this data, we can construct a c-box on the proportion of the A allele to the B allele. She is probably homozygous for A.
Would you say her probability of being so is estimated by the c-box kn(50,50)? That is, is it to be considered a Bernoulli probability, and were the "pieces" binary bits of evidence (literally)? If not, how would you construct a c-box? The "proportion of the A allele to the B allele" is usually something we talk about for a population, not a person, unless you're talking about probability, in which case we usually don't use the word 'proportion'. Am I wrong? Am I understanding you?
> For the father, only 25 out of 50 pieces of data support the A allele, whereas the other 15 pieces of data support the B allele. We can construct a c-box estimate for the proportion of the A allele to the B allele in the father as well. He is probably heterozygous A/B.
Okay, well, 25 + 15 is less than 50, which suggests the evidence is not binary, if it can be something other than A or B. Is that what you intended to imply? Or did you mean for them to add up? Assuming they should add up, would his c-box be kn(25,50) or whatever? If not, what then? Note that the standard way to rescue yourself in the multinomial case is to say the evidence is 'A' or 'not A', rather than listing separate genotypes for the latter.
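To make kn(k, n) concrete, here is a minimal Python sketch. It assumes the usual binomial c-box construction in which kn(k, n) is bounded by the distributions Beta(k, n-k+1) and Beta(k+1, n-k), and it uses the identity between integer-parameter beta and binomial tail probabilities so that only the standard library is needed. The helper names (`beta_cdf`, `beta_ppf`, `kn_interval`) are hypothetical. It prints 95% intervals for the mother, kn(50,50), and for the father read two ways: 25 'A' against 25 'not A', kn(25,50), or, alternatively, with the 10 unaccounted-for pieces dropped, kn(25,40).

```python
import math

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integers a, b >= 0, via the binomial-tail identity
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    Beta(0, b) is treated as a point mass at 0 and Beta(a, 0) as a point mass at 1."""
    if a == 0:
        return 1.0
    if b == 0:
        return 1.0 if x >= 1.0 else 0.0
    x = min(max(x, 0.0), 1.0)
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))

def beta_ppf(q, a, b, tol=1e-12):
    """Quantile of Beta(a, b), found by bisection on beta_cdf."""
    if a == 0:
        return 0.0
    if b == 0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kn_interval(k, n, conf=0.95):
    """Two-sided interval from the kn(k, n) c-box (assumed construction):
    lower end from Beta(k, n-k+1), upper end from Beta(k+1, n-k)."""
    alpha = 1.0 - conf
    return (beta_ppf(alpha / 2, k, n - k + 1),
            beta_ppf(1 - alpha / 2, k + 1, n - k))

if __name__ == "__main__":
    for label, k, n in [("mother kn(50,50)", 50, 50),
                        ("father kn(25,50)", 25, 50),
                        ("father kn(25,40)", 25, 40)]:
        lo, hi = kn_interval(k, n)
        print(f"{label}: [{lo:.3f}, {hi:.3f}]")
```

For the mother the interval is [0.929, 1.000]: the lower bound has CDF x^50, so its 2.5% quantile is 0.025^(1/50), and the upper bound is a point mass at 1. The two readings of the father's data give visibly different intervals, which is why the 'A' versus 'not A' bookkeeping matters.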
> If we assume that the probability of each parent giving either of their two alleles to their offspring is 50%,
I guess it's an assumption, but it's maybe a bit stronger since it's a law of Gregor Mendel's, isn't it? What is it called, assortative mating or something? Maybe it has the word 'independent' in it.
> then we can use p-box arithmetic to multiply the c-boxes by 0.5
Yes, without question.
> (and other arithmetic operations)
Like what exactly?
> to generate some estimate for the offspring's genotype, couldn't we?
Yes, absolutely. I think that's exactly the idea.
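A minimal Monte Carlo sketch of that calculation follows, under assumptions that are not stated in the exchange: the quantity of interest is the expected proportion of A in the offspring, 0.5*p_mother + 0.5*p_father under Mendelian segregation; each parent's c-box is kn(k, n) with bounding distributions Beta(k, n-k+1) and Beta(k+1, n-k); and the two parental c-boxes are combined assuming independence. Because 0.5*x + 0.5*y is increasing in both arguments, under independence the lower bound of the result comes from combining the two lower bounds, and the upper bound from combining the two upper bounds. The function names are hypothetical, and sampling stands in for the exact p-box arithmetic a package would do.

```python
import random
import statistics

def sample_beta(a, b, rng):
    """One draw from Beta(a, b); a == 0 or b == 0 are point masses at 0 or 1."""
    if a == 0:
        return 0.0
    if b == 0:
        return 1.0
    return rng.betavariate(a, b)

def kn_bounds(k, n):
    """(a, b) parameters of the lower and upper bounding Betas of kn(k, n)."""
    return (k, n - k + 1), (k + 1, n - k)

def offspring_bounds(mother, father, size=100_000, seed=1):
    """Sorted samples from the lower and upper bounding distributions of
    0.5*p_mother + 0.5*p_father, assuming independent parental c-boxes."""
    rng = random.Random(seed)
    (m_lo, m_hi), (f_lo, f_hi) = kn_bounds(*mother), kn_bounds(*father)
    # Scaling by 0.5 and adding are both increasing, so bounds combine with bounds.
    lower = sorted(0.5 * sample_beta(*m_lo, rng) + 0.5 * sample_beta(*f_lo, rng)
                   for _ in range(size))
    upper = sorted(0.5 * sample_beta(*m_hi, rng) + 0.5 * sample_beta(*f_hi, rng)
                   for _ in range(size))
    return lower, upper

if __name__ == "__main__":
    lower, upper = offspring_bounds(mother=(50, 50), father=(25, 50))
    print("mean bounds:", statistics.fmean(lower), statistics.fmean(upper))
    i, j = int(0.025 * len(lower)), int(0.975 * len(upper)) - 1
    print("approx. 95% interval:", lower[i], upper[j])
```

The result is again a pair of bounding distributions on a fixed parameter, so it can be read as a c-box in the sense discussed here. Dropping the independence assumption would call for dependency bounds, which this sketch does not attempt.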
> If so, then the resulting c-box (that is, if it is still considered a c-box at this point)
It's a c-box if it's still characterizing a constant parameter, rather than a random variable. (It works out nicely, because c in c-box is also mnemonic for 'constant'.)
> would not be constructed directly from data.
No, it's constructed from arithmetic (or logical) operations on the c-boxes, and that's the message of Michael's paper: you can make such calculations and the results will also have the confidence interpretation.
> In this case, how do you update this estimate once you start to generate data for the offspring genotype?
That's a lovely problem. I'd say it is now the prior for a subsequent analysis based on (binary) data sampling from the offspring. Can you suggest an exemplary data set so we can do these calculations? I'd be interested to learn what it says.
We could also construct a simulation in which we know the genotypes of the parents, and see how often the calculation yields an answer that is consistent with the hypothetical truth. This would make a nice little note.
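A first step toward such a simulation is to check the parental c-boxes alone: fix a true allele probability, simulate n binary pieces of evidence, and count how often the 95% kn interval contains the truth. The sketch below assumes the kn(k, n) construction with bounds Beta(k, n-k+1) and Beta(k+1, n-k); those Beta CDFs at p equal binomial tail probabilities, so the coverage check needs no quantiles. It does not yet simulate the offspring step, and `binom_sf`, `covers`, and `coverage` are hypothetical names.

```python
import math
import random

def binom_sf(k, n, p):
    """P(Binomial(n, p) >= k)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(max(k, 0), n + 1))

def covers(k, n, p, conf=0.95):
    """True if p lies in the two-sided kn(k, n) interval at the given confidence.
    The lower-bound Beta CDF at p is P(Bin(n, p) >= k) and the upper-bound
    Beta CDF at p is P(Bin(n, p) >= k + 1)."""
    alpha = 1.0 - conf
    return binom_sf(k, n, p) >= alpha / 2 and binom_sf(k + 1, n, p) <= 1 - alpha / 2

def coverage(p, n=50, trials=2000, seed=2014):
    """Fraction of simulated data sets whose kn interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        k = sum(rng.random() < p for _ in range(n))  # pieces of evidence for 'A'
        hits += covers(k, n, p)
    return hits / trials

if __name__ == "__main__":
    for p in (0.5, 1.0):  # a heterozygous and a homozygous parent
        print(p, coverage(p))
```

For the heterozygous case the coverage should come out at or above 0.95, since under this construction the interval coincides with the Clopper-Pearson interval, which is conservative. A homozygous parent is always covered.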
Um, is there just one offspring, or are there many?
> I hope that this was clear. If not, we should maybe discuss over Google Hangouts if you find it to be interesting.
It was mostly clear, modulo my initial questions, but you can still call anytime you like.
> (I guess I could just ask whether it is kosher to multiply a c-box by a scalar, and if so how do you update the resulting c-box. The pooling rule would not work here, would it?)
Well, only because there's nothing to pool, or, if there is, the sets are measuring different quantities. But, as I understand Michael's paper, what you suggest is kosher. You use the p-box technology, that is, Risk Calc or pbox.r or something, to do the calculations. Of course, I could be misreading Michael's paper, or he could be mistaken, which I think Teddy Seidenfeld tends to expect. And there will generally be questions about when and how to use assumptions about dependence, but I guess those should be figure-out-able.