Chapter 6: evidence for the causal mechanism
Here we will compute evidence from simulated income and gpa data, generated by simulating natural variation in income data (as a percentile of min/max income in a given community), and then computing the associated gpa data when the slope of the relation between the two was either zero, one, or something-in-between..
We begin by defining constants and generating the simulated data values describing the possible relation between income and gpa:
clear %#ok<*UNRCH>
reset=1; plotFLAG=0;
COL=[.55*[1 1 1]; .3 .4 .85; .8 .5 .4];
S=12; N=nan;
sig=20*[1 1.5];
Ns=61; slist=linspace(min(sig)/2,max(sig)*2,Ns); Oh=ones(1,Ns); %noise level for each observation type
slistEN2=1./slist.^2;
logpri=-(S*log(slist')*ones(1,Ns))-(S*ones(Ns,1)*log(slist));
cprime=[55 5]; gprime=[0 1.05];
minI=0; maxI=100; %min/max income (percentile)
minG=0; maxG=110; %min/max gpa ans percentage (allowed to exceed 100 by 10%
Nb=45; blist=linspace(0,100,Nb)';
Nc=21*[1 1]; clist=linspace(0,110,max(Nc));
Ng=[1 31]; glist=linspace(0,1.15,max(Ng));
B1prime=linspace(10,90,S)'; %keep the VP slopes identical across sims
B2prime=cprime(2)+B1prime*gprime(2)-.1; %mixing endpoints (independent vs. causal)
B2now=B2prime;
Ngam=6; %r0=atan(-gam(2));
gamlist=linspace(0,1,Ngam);
%keep noise samples identical across simulations for clearer comparison
d1=nan(S,1); d2prime=nan(S,1); inds=[1:S];
while ~isempty(inds), nnow=length(inds);
d1(inds)=B1prime(inds)+rand(nnow,1)*sig(1);
d2prime(inds)=B2prime(inds)+rand(nnow,1)*sig(2);
inds=find(or(any([d1<min(blist),d1>max(blist)],2),any([d2prime<min(clist),d2prime>max(clist)],2))); end
err=[d2prime-B2prime];
Next we initialize the main loop that varies the true slope of the relation between the two dimensions:
tmp=zeros(Ns,Ns); bss=zeros(Nb,Ns,Ns);
diffs0=[diff(clist(1:2)) diff(slist(1:2)) diff(slist(1:2))];
diffs1=[diff(glist(1:2)) diffs0];
Elist=nan(Ngam,2);
for igam=1:Ngam, gamnow=gamlist(igam); tic
L0all=zeros(Nc(1),Ns,Ns); L1all=zeros(Ng(2),Nc(2),Ns,Ns);
%%%%%%%%R%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% DATA SIM with variable slope %%%
if gamnow~=gprime(2),
B2now=B1prime*gamnow; B2now=B2now-mean(B2now)+mean([minG maxG]);
d2=B2now+err; else d2=d2prime; end
if ~or(any(d2<clist(1)),any(d2>clist(end))),
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% outer param likelihood by condition %%%
for ig=1:Ng(2), gnow=glist(ig);
for ic=1:Nc(2), cnow=clist(ic);
gtg0=and(ig==1,ic<=Nc(1));
%%% Likelihoods [independence, causality models]
likeFNi=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+( phi2- cnow+0).^2/snow.^2); %output is Nb x Ns
likeFNc=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+((phi2-(cnow+gnow*blist)).^2/snow.^2)*Oh); %output is Nb x Ns
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% inner param likelihood by condition %%%
tmp0=tmp; tmp1=tmp;
for s=1:S, %by source
if gtg0, L0bss=bss; end
L1bss=bss;
for is=1:Ns, snow=slist(is); %dispersion (sd)
if gtg0, L0bss(:,:,is)=likeFNi(d1(s),d2(s),snow); end
L1bss(:,:,is)=likeFNc(d1(s),d2(s),snow); end
%integrate over b-vals for each sub, and combine likelihoods
if gtg0, tmp0=tmp0+logsum(L0bss,1,diff(blist(1:2))); end
tmp1=tmp1+logsum(L1bss,1,diff(blist(1:2)));
%figure(20); hold on
%plot(d1(s),logpeakval([str8n(logsum(L0bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','k')
%plot(d1(s),logpeakval([str8n(logsum(L1bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','b')
end
if gtg0, L0all( ic,:,:)=tmp0+logpri; end
L1all(ig,ic,:,:)=tmp1+logpri; end, end
Elist(igam,1)=logsum(L0all,[1 2 3],diffs0);
tmpE=logsum(L1all,[2 3 4],diffs1(2:4));
Elist(igam,2)=logsum(tmpE-log(1+glist.^2)'-log(.5*pi),1,diffs1(1));
%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Model likelihoods %%%
Elist(igam,3)=10*(Elist(igam,2)-Elist(igam,1)); %Elist(:,1,2) are already in log units
disp(''); disp(['iGAM' num2str(igam) ':']); end, toc, end
Notice the ordering of the marginalizations above. First, the innermost integral involving source-specific betas is computed for each pair of dispersions and the unknown values of the model equation (gamma-slope and intercept). After these source-specific variables are marginalized out of our equations, we can combine data across sources in the overall matrices Lall and Elist.
Finally, we plot the evidence and associated model likelihood values:
figure(1); clf; plot([0 1],[0 0],'k--','LineWidth',2); hold on
plot(gamlist,Elist(:,3),'ko-','MarkerFaceColor','w','MarkerSize',11);
plot(gamlist(Elist(:,3)>0),Elist(Elist(:,3)>0,3),'ko','MarkerFaceColor',COL(1,:),'MarkerSize',11);
plot(gamlist(Elist(:,3)<0),Elist(Elist(:,3)<0,3),'ko','MarkerFaceColor','k','MarkerSize',11);
xlabel('true slope','FontName','Arial','FontSize',15);
ylabel('evidence (db)','FontName','Arial','FontSize',15);
figure(2); clf; hold on
plot(gamlist,Elist(:,1),'ko:','MarkerFaceColor','k'); %black
plot(gamlist,Elist(:,2),'ko:','MarkerFaceColor',COL(1,:)) %grey
xlabel('true slope','FontName','Arial','FontSize',15);
ylabel('mod likelihood','FontName','Arial','FontSize',15);
The above can be updated to look at variation in the underlying dispersion values, which yields the evidence and model likelihood plots of Fig. 6.12.
clear %#ok<*UNRCH>
reset=1; plotFLAG=0;
COL=[.55*[1 1 1]; .3 .4 .85; .8 .5 .4];
S=11; N=nan;
sig0=[1 1.5]; sig=5;
Ns=81; slist=linspace(min(sig*sig0)/2,110,Ns); Oh=ones(1,Ns); %noise level for each observation type
slistEN2=1./slist.^2;
logpri=-(S*log(slist')*ones(1,Ns))-(S*ones(Ns,1)*log(slist));
cprime=[55 25]; gprime=[0 .45];
minI=0; maxI=100; %min/max income (percentile)
minG=0; maxG=110; %min/max gpa ans percentage (allowed to exceed 100 by 10%
Nb=45; blist=linspace(0,100,Nb)';
Nc=31*[1 1]; clist=linspace(0,110,max(Nc));
Ng=[1 41]; glist=linspace(0,1.15,max(Ng));
Ndisp=9; displist=linspace(0,75,Ndisp)+sig*sig0(2);
B1prime=linspace(10,90,S)'; %keep the VP slopes identical across sims
B2prime=cprime(2)+B1prime*gprime(2); %mixing endpoints (independent vs. causal)
%keep noise samples identical across simulations for clearer comparison
d1=nan(S,1); d2prime=nan(S,1); inds=[1:S]';
while ~isempty(inds), nnow=length(inds);
d1(inds)=B1prime(inds)+rand(nnow,1)*sig*sig0(1); e1=d1-B1prime; d1=d1-mean(e1);
d2prime(inds)=B2prime(inds)+rand(nnow,1)*sig*sig0(2); e2=d2prime-B2prime; d2prime=d2prime-mean(e2);
inds=find(or(any([d1<min(blist),d1>max(blist)],2),any([d2prime<min(clist),d2prime>max(clist)],2))); end
figure(3); clf; hold on
plot([0 100],cprime(2)+[0 100]*gprime(2),'k--')
plot(d1,d2prime,'ko','MarkerFaceColor','k','MarkerSize',8)
xlabel('income (percentile)','FontName','Arial','FontSize',15);
ylabel('gpa (100 point scale with possible 10% bonus)','FontName','Arial','FontSize',15);
err=[d2prime-B2prime]/(displist(1));
tmp=zeros(Ns,Ns); bss=zeros(Nb,Ns,Ns);
diffs0=[diff(clist(1:2)) diff(slist(1:2)) diff(slist(1:2))];
diffs1=[diff(glist(1:2)) diffs0];
Elist=nan(Ndisp,2);
for idisp=1:Ndisp, dispnow=displist(idisp); tic
L0all=zeros(Nc(1),Ns,Ns); L1all=zeros(Ng(2),Nc(2),Ns,Ns);
%%%%%%%%R%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% DATA SIM using new dispersion %%%
B2now=B2prime+err*dispnow; d2=B2now-mean(B2now)+mean([minG maxG]);
if ~or(any(d2<clist(1)),any(d2>clist(end))),
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% outer param likelihood by condition %%%
for ig=1:Ng(2), gnow=glist(ig);
for ic=1:Nc(2), cnow=clist(ic);
gtg0=and(ig==1,ic<=Nc(1));
%%% Likelihoods [independence, causality models]
likeFNi=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+( phi2- cnow+ 0 ).^2/snow.^2); %output is Nb x Ns
likeFNc=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+((phi2-(cnow+gnow*blist)).^2/snow.^2)*Oh); %output is Nb x Ns
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% inner param likelihood by condition %%%
tmp0=tmp; tmp1=tmp;
for s=1:S, %by source
if gtg0, L0bss=bss; end
L1bss=bss;
for is=1:Ns, snow=slist(is); %dispersion (sd)
if gtg0, L0bss(:,:,is)=likeFNi(d1(s),d2(s),snow); end
L1bss(:,:,is)=likeFNc(d1(s),d2(s),snow); end
%integrate over b-vals for each sub, and combine likelihoods
if gtg0, tmp0=tmp0+logsum(L0bss,1,diff(blist(1:2))); end
tmp1=tmp1+logsum(L1bss,1,diff(blist(1:2)));
%figure(20); hold on
%plot(d1(s),logpeakval([str8n(logsum(L0bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','k')
%plot(d1(s),logpeakval([str8n(logsum(L1bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','b')
end
if gtg0, L0all(ic,:,:)=tmp0+logpri; end
L1all(ig,ic,:,:)=tmp1+logpri; end, end
Elist(idisp,1)=logsum(L0all,[1 2 3],diffs0);
tmpE=logsum(L1all,[2 3 4],diffs1(2:4));
Elist(idisp,2)=logsum(tmpE-log(1+glist.^2)'-log(.5*pi),1,diffs1(1));
%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Model likelihoods %%%
Elist(idisp,3)=10*(Elist(idisp,2)-Elist(idisp,1)); %Elist(:,1,2) are already in log units
disp(''); disp(['iDISP' num2str(idisp) ':']); end, toc, end
figure(1); clf; hold on
plot(displist(1:8),Elist(1:8,3),'ko-','MarkerFaceColor','w','MarkerSize',11);
plot(displist(Elist(1:8,3)>0),Elist(Elist(1:8,3)>0,3),'ko','MarkerFaceColor',COL(1,:),'MarkerSize',11);
plot(displist(Elist(1:8,3)<0),Elist(Elist(1:8,3)<0,3),'ko','MarkerFaceColor','k','MarkerSize',11);
xlabel('dispersion','FontName','Arial','FontSize',15);
ylabel('evidence (db)','FontName','Arial','FontSize',15);
figure(30); clf; hold on
plot(displist,Elist(:,1),'ko:','MarkerFaceColor','k'); %black
plot(displist,Elist(:,2),'ko:','MarkerFaceColor',COL(1,:)) %grey
xlabel('dispersion','FontName','Arial','FontSize',15);
ylabel('mod likelihood','FontName','Arial','FontSize',15);