Chapter 6: evidence for the causal mechanism

Here we will compute evidence from simulated income and gpa data, generated by simulating natural variation in income data (as a percentile of min/max income in a given community), and then computing the associated gpa data when the slope of the relation between the two was either zero, one, or something-in-between..

We begin by defining constants and generating the simulated data values describing the possible relation between income and gpa:

clear %#ok<*UNRCH>

reset=1; plotFLAG=0;

COL=[.55*[1 1 1]; .3 .4 .85; .8 .5 .4];

S=12; N=nan;

sig=20*[1 1.5];

Ns=61; slist=linspace(min(sig)/2,max(sig)*2,Ns); Oh=ones(1,Ns); %noise level for each observation type

slistEN2=1./slist.^2;

logpri=-(S*log(slist')*ones(1,Ns))-(S*ones(Ns,1)*log(slist));

cprime=[55 5]; gprime=[0 1.05];

minI=0; maxI=100; %min/max income (percentile)

minG=0; maxG=110; %min/max gpa ans percentage (allowed to exceed 100 by 10%

Nb=45; blist=linspace(0,100,Nb)';

Nc=21*[1 1]; clist=linspace(0,110,max(Nc));

Ng=[1 31]; glist=linspace(0,1.15,max(Ng));

B1prime=linspace(10,90,S)'; %keep the VP slopes identical across sims

B2prime=cprime(2)+B1prime*gprime(2)-.1; %mixing endpoints (independent vs. causal)

B2now=B2prime;

Ngam=6; %r0=atan(-gam(2));

gamlist=linspace(0,1,Ngam);

%keep noise samples identical across simulations for clearer comparison

d1=nan(S,1); d2prime=nan(S,1); inds=[1:S];

while ~isempty(inds), nnow=length(inds);

d1(inds)=B1prime(inds)+rand(nnow,1)*sig(1);

d2prime(inds)=B2prime(inds)+rand(nnow,1)*sig(2);

inds=find(or(any([d1<min(blist),d1>max(blist)],2),any([d2prime<min(clist),d2prime>max(clist)],2))); end

err=[d2prime-B2prime];

Next we initialize the main loop that varies the true slope of the relation between the two dimensions:

tmp=zeros(Ns,Ns); bss=zeros(Nb,Ns,Ns);

diffs0=[diff(clist(1:2)) diff(slist(1:2)) diff(slist(1:2))];

diffs1=[diff(glist(1:2)) diffs0];

Elist=nan(Ngam,2);

for igam=1:Ngam, gamnow=gamlist(igam); tic

L0all=zeros(Nc(1),Ns,Ns); L1all=zeros(Ng(2),Nc(2),Ns,Ns);

%%%%%%%%R%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% DATA SIM with variable slope %%%

if gamnow~=gprime(2),

B2now=B1prime*gamnow; B2now=B2now-mean(B2now)+mean([minG maxG]);

d2=B2now+err; else d2=d2prime; end

if ~or(any(d2<clist(1)),any(d2>clist(end))),

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% outer param likelihood by condition %%%

for ig=1:Ng(2), gnow=glist(ig);

for ic=1:Nc(2), cnow=clist(ic);

gtg0=and(ig==1,ic<=Nc(1));

%%% Likelihoods [independence, causality models]

likeFNi=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+( phi2- cnow+0).^2/snow.^2); %output is Nb x Ns

likeFNc=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+((phi2-(cnow+gnow*blist)).^2/snow.^2)*Oh); %output is Nb x Ns

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% inner param likelihood by condition %%%

tmp0=tmp; tmp1=tmp;

for s=1:S, %by source

if gtg0, L0bss=bss; end

L1bss=bss;

for is=1:Ns, snow=slist(is); %dispersion (sd)

if gtg0, L0bss(:,:,is)=likeFNi(d1(s),d2(s),snow); end

L1bss(:,:,is)=likeFNc(d1(s),d2(s),snow); end

%integrate over b-vals for each sub, and combine likelihoods

if gtg0, tmp0=tmp0+logsum(L0bss,1,diff(blist(1:2))); end

tmp1=tmp1+logsum(L1bss,1,diff(blist(1:2)));

%figure(20); hold on

%plot(d1(s),logpeakval([str8n(logsum(L0bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','k')

%plot(d1(s),logpeakval([str8n(logsum(L1bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','b')

end

if gtg0, L0all( ic,:,:)=tmp0+logpri; end

L1all(ig,ic,:,:)=tmp1+logpri; end, end

Elist(igam,1)=logsum(L0all,[1 2 3],diffs0);

tmpE=logsum(L1all,[2 3 4],diffs1(2:4));

Elist(igam,2)=logsum(tmpE-log(1+glist.^2)'-log(.5*pi),1,diffs1(1));

%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Model likelihoods %%%

Elist(igam,3)=10*(Elist(igam,2)-Elist(igam,1)); %Elist(:,1,2) are already in log units

disp(''); disp(['iGAM' num2str(igam) ':']); end, toc, end

Notice the ordering of the marginalizations above. First, the innermost integral involving source-specific betas is computed for each pair of dispersions and the unknown values of the model equation (gamma-slope and intercept). After these source-specific variables are marginalized out of our equations, we can combine data across sources in the overall matrices Lall and Elist.

Finally, we plot the evidence and associated model likelihood values:

figure(1); clf; plot([0 1],[0 0],'k--','LineWidth',2); hold on

plot(gamlist,Elist(:,3),'ko-','MarkerFaceColor','w','MarkerSize',11);

plot(gamlist(Elist(:,3)>0),Elist(Elist(:,3)>0,3),'ko','MarkerFaceColor',COL(1,:),'MarkerSize',11);

plot(gamlist(Elist(:,3)<0),Elist(Elist(:,3)<0,3),'ko','MarkerFaceColor','k','MarkerSize',11);

xlabel('true slope','FontName','Arial','FontSize',15);

ylabel('evidence (db)','FontName','Arial','FontSize',15);

figure(2); clf; hold on

plot(gamlist,Elist(:,1),'ko:','MarkerFaceColor','k'); %black

plot(gamlist,Elist(:,2),'ko:','MarkerFaceColor',COL(1,:)) %grey

xlabel('true slope','FontName','Arial','FontSize',15);

ylabel('mod likelihood','FontName','Arial','FontSize',15);

The above can be updated to look at variation in the underlying dispersion values, which yields the evidence and model likelihood plots of Fig. 6.12.

clear %#ok<*UNRCH>

reset=1; plotFLAG=0;

COL=[.55*[1 1 1]; .3 .4 .85; .8 .5 .4];

S=11; N=nan;

sig0=[1 1.5]; sig=5;

Ns=81; slist=linspace(min(sig*sig0)/2,110,Ns); Oh=ones(1,Ns); %noise level for each observation type

slistEN2=1./slist.^2;

logpri=-(S*log(slist')*ones(1,Ns))-(S*ones(Ns,1)*log(slist));

cprime=[55 25]; gprime=[0 .45];

minI=0; maxI=100; %min/max income (percentile)

minG=0; maxG=110; %min/max gpa ans percentage (allowed to exceed 100 by 10%

Nb=45; blist=linspace(0,100,Nb)';

Nc=31*[1 1]; clist=linspace(0,110,max(Nc));

Ng=[1 41]; glist=linspace(0,1.15,max(Ng));

Ndisp=9; displist=linspace(0,75,Ndisp)+sig*sig0(2);

B1prime=linspace(10,90,S)'; %keep the VP slopes identical across sims

B2prime=cprime(2)+B1prime*gprime(2); %mixing endpoints (independent vs. causal)


%keep noise samples identical across simulations for clearer comparison

d1=nan(S,1); d2prime=nan(S,1); inds=[1:S]';

while ~isempty(inds), nnow=length(inds);

d1(inds)=B1prime(inds)+rand(nnow,1)*sig*sig0(1); e1=d1-B1prime; d1=d1-mean(e1);

d2prime(inds)=B2prime(inds)+rand(nnow,1)*sig*sig0(2); e2=d2prime-B2prime; d2prime=d2prime-mean(e2);

inds=find(or(any([d1<min(blist),d1>max(blist)],2),any([d2prime<min(clist),d2prime>max(clist)],2))); end


figure(3); clf; hold on

plot([0 100],cprime(2)+[0 100]*gprime(2),'k--')

plot(d1,d2prime,'ko','MarkerFaceColor','k','MarkerSize',8)

xlabel('income (percentile)','FontName','Arial','FontSize',15);

ylabel('gpa (100 point scale with possible 10% bonus)','FontName','Arial','FontSize',15);

err=[d2prime-B2prime]/(displist(1));

tmp=zeros(Ns,Ns); bss=zeros(Nb,Ns,Ns);

diffs0=[diff(clist(1:2)) diff(slist(1:2)) diff(slist(1:2))];

diffs1=[diff(glist(1:2)) diffs0];

Elist=nan(Ndisp,2);

for idisp=1:Ndisp, dispnow=displist(idisp); tic

L0all=zeros(Nc(1),Ns,Ns); L1all=zeros(Ng(2),Nc(2),Ns,Ns);

%%%%%%%%R%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% DATA SIM using new dispersion %%%

B2now=B2prime+err*dispnow; d2=B2now-mean(B2now)+mean([minG maxG]);

if ~or(any(d2<clist(1)),any(d2>clist(end))),

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% outer param likelihood by condition %%%

for ig=1:Ng(2), gnow=glist(ig);

for ic=1:Nc(2), cnow=clist(ic);

gtg0=and(ig==1,ic<=Nc(1));

%%% Likelihoods [independence, causality models]

likeFNi=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+( phi2- cnow+ 0 ).^2/snow.^2); %output is Nb x Ns

likeFNc=@(phi1,phi2,snow) -(1/2)*(((phi1-blist).^2)*slistEN2+((phi2-(cnow+gnow*blist)).^2/snow.^2)*Oh); %output is Nb x Ns

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% inner param likelihood by condition %%%

tmp0=tmp; tmp1=tmp;

for s=1:S, %by source

if gtg0, L0bss=bss; end

L1bss=bss;

for is=1:Ns, snow=slist(is); %dispersion (sd)

if gtg0, L0bss(:,:,is)=likeFNi(d1(s),d2(s),snow); end

L1bss(:,:,is)=likeFNc(d1(s),d2(s),snow); end

%integrate over b-vals for each sub, and combine likelihoods

if gtg0, tmp0=tmp0+logsum(L0bss,1,diff(blist(1:2))); end

tmp1=tmp1+logsum(L1bss,1,diff(blist(1:2)));

%figure(20); hold on

%plot(d1(s),logpeakval([str8n(logsum(L0bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','k')

%plot(d1(s),logpeakval([str8n(logsum(L1bss,[2 3],diff(slist(1:2))*[1 1])) blist(:)]),'ko','MarkerFaceColor','b')

end

if gtg0, L0all(ic,:,:)=tmp0+logpri; end

L1all(ig,ic,:,:)=tmp1+logpri; end, end

Elist(idisp,1)=logsum(L0all,[1 2 3],diffs0);

tmpE=logsum(L1all,[2 3 4],diffs1(2:4));

Elist(idisp,2)=logsum(tmpE-log(1+glist.^2)'-log(.5*pi),1,diffs1(1));

%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Model likelihoods %%%

Elist(idisp,3)=10*(Elist(idisp,2)-Elist(idisp,1)); %Elist(:,1,2) are already in log units

disp(''); disp(['iDISP' num2str(idisp) ':']); end, toc, end

figure(1); clf; hold on

plot(displist(1:8),Elist(1:8,3),'ko-','MarkerFaceColor','w','MarkerSize',11);

plot(displist(Elist(1:8,3)>0),Elist(Elist(1:8,3)>0,3),'ko','MarkerFaceColor',COL(1,:),'MarkerSize',11);

plot(displist(Elist(1:8,3)<0),Elist(Elist(1:8,3)<0,3),'ko','MarkerFaceColor','k','MarkerSize',11);

xlabel('dispersion','FontName','Arial','FontSize',15);

ylabel('evidence (db)','FontName','Arial','FontSize',15);

figure(30); clf; hold on

plot(displist,Elist(:,1),'ko:','MarkerFaceColor','k'); %black

plot(displist,Elist(:,2),'ko:','MarkerFaceColor',COL(1,:)) %grey

xlabel('dispersion','FontName','Arial','FontSize',15);

ylabel('mod likelihood','FontName','Arial','FontSize',15);