The DTREE procedure (PROC DTREE) is useless for building a predictive model.
You have to specify every stage of the tree yourself, level by level: e.g., the top level is the decision (yes/no), the next stage (level) is the price, the third stage is the quality, and so on.
You also have to pre-calculate the probability of every event at each level, e.g., a 90% chance that the price is high and a 10% chance that it is low.
Finally, you still have to list every possible combination of events and assign a reward (score) to each combination.
DTREE then calculates the aggregated (expected) reward for each branch of the tree from the probabilities and scores you supplied, and draws the tree.
So DTREE doesn't construct the tree for you; it only gives you the aggregated score for each branch to support your decision. You still have to build the tree yourself.
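The aggregation described above is the standard "rollback" evaluation of a decision tree: a chance node is worth the probability-weighted average of its children, and a decision node is worth its best child. The Python sketch below illustrates that calculation on a made-up two-option tree. It is not SAS code and not DTREE's implementation; the classes `Leaf`, `Chance`, and `Decision` are hypothetical names, and it assumes the goal is to maximize expected value.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    value: float  # payoff at the end of a branch


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, child) pairs


@dataclass
class Decision:
    options: Dict[str, "Node"]  # option name -> child node


Node = Union[Leaf, Chance, Decision]


def rollback(node: Node) -> float:
    """Expected value of a node, computed from the leaves back to the root."""
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Chance):
        # Chance node: probability-weighted average of the children.
        return sum(p * rollback(child) for p, child in node.branches)
    # Decision node: assume we maximize expected value.
    return max(rollback(child) for child in node.options.values())


def best_option(node: Decision) -> str:
    """Name of the option a decision node would pick."""
    return max(node.options, key=lambda name: rollback(node.options[name]))


# Hypothetical example: a 50/50 gamble on 100 versus a sure 40.
tree = Decision({
    "gamble": Chance([(0.5, Leaf(100.0)), (0.5, Leaf(0.0))]),
    "sure": Leaf(40.0),
})
print(rollback(tree))     # 50.0
print(best_option(tree))  # gamble
```

All of the inputs (structure, probabilities, payoffs) are supplied by hand; the only work done is the averaging and the max.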
An example:
Suppose you want to decide whether to go fishing, depending on the weather and the market price of fish.
In your area there is a 10% chance of rain and a 90% chance of sun. When it rains, the amount you catch is 100; otherwise it is 1,000.
The market is good 70% of the time and bad 30% of the time. When it is good, the unit price is $10; otherwise it is $5.
First, create the stages (levels of the tree). D marks a decision stage and C marks a chance stage. The tree grows from FISHING -> WEATHER -> MARKET.
data stages;
format _STNAME_ $12. _STTYPE_ $2. _OUTCOM_ $12. _SUCCES_ $12. ;
input _STNAME_ $ _STTYPE_ $ _OUTCOM_ $ _SUCCES_ $ ;
datalines;
FISHING D YES WEATHER
FISHING D NO WEATHER
WEATHER C RAIN MARKET
WEATHER C SUN MARKET
MARKET C GOOD .
MARKET C BAD .
;
Next, specify the probability of each event, assuming the stages are independent of each other.
data events;
/* the colon modifier keeps list input; plain $12. would read 12 fixed columns */
input _EVENT1 :$12. _PROB1
_EVENT2 :$12. _PROB2;
datalines;
RAIN 0.1 SUN 0.9
GOOD 0.7 BAD 0.3
;
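Because these probabilities are typed in by hand, it is worth checking that the outcomes of each chance stage sum to 1 before running the procedure. A minimal Python sketch, with the example's numbers copied into a hypothetical `events` dictionary (it does not read the SAS data set):

```python
import math

# Hypothetical mirror of the EVENTS data set: stage -> {outcome: probability}.
events = {
    "WEATHER": {"RAIN": 0.1, "SUN": 0.9},
    "MARKET": {"GOOD": 0.7, "BAD": 0.3},
}


def bad_stages(events, tol=1e-9):
    """Return the stages whose outcome probabilities do not sum to 1."""
    return [stage for stage, probs in events.items()
            if not math.isclose(sum(probs.values()), 1.0, abs_tol=tol)]


print(bad_stages(events))  # []
```

`math.isclose` is used instead of `==` because sums such as 0.7 + 0.3 are not guaranteed to be exactly 1.0 in floating point.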
Finally, list all possible combinations and the score for each one.
data combinations;
format _STATE1-_STATE3 $12. _VALUE_ dollar12.0;
input _STATE1 $ _STATE2 $ _STATE3 $ ;
/* calculate the net return/score for this scenario */
if _STATE1 ='RAIN' then Amount=100;
else if _STATE1 ='SUN' then Amount=1000;
if _STATE2 ='GOOD' then UnitPrice=10;
else if _STATE2 ='BAD' then UnitPrice=5;
if _STATE3 = 'YES' then _VALUE_ = Amount * UnitPrice;
else _VALUE_ = 0;
datalines;
RAIN GOOD YES
RAIN GOOD NO
RAIN BAD YES
RAIN BAD NO
SUN GOOD YES
SUN GOOD NO
SUN BAD YES
SUN BAD NO
;
run;
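Even this small example needs 2 x 2 x 2 = 8 rows, and the count multiplies with every extra stage or outcome. The Python sketch below generates the same eight rows and payoffs as the COMBINATIONS data step using `itertools.product`; the dictionaries simply hold the example's numbers, and it is only an illustration, not a way to feed PROC DTREE.

```python
from itertools import product

# Numbers from the fishing example.
catch = {"RAIN": 100, "SUN": 1000}  # amount caught per weather outcome
price = {"GOOD": 10, "BAD": 5}      # unit price per market outcome


def payoff(weather, market, go):
    """Net return for one scenario; staying home earns nothing."""
    return catch[weather] * price[market] if go == "YES" else 0


# Same order as the datalines: weather, then market, then the decision.
rows = [(w, m, go, payoff(w, m, go))
        for w, m, go in product(catch, price, ["YES", "NO"])]

for row in rows:
    print(*row)
```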
PROC DTREE then aggregates the scores and draws the tree. EV is the expected value (score/reward) of each node.
OPTIONS LINESIZE=100;
proc dtree stagein=stages
probin=events
payoffs=combinations
nowarning;
evaluate / summary;
treeplot / lineprinter;
run;
quit; /* PROC DTREE is interactive; QUIT ends it */
The resulting plot, redrawn as an outline (D = decision node, C = chance node, E = end node; the double line marks the YES branch):

[D]  EV= $7,735
 |
 +== YES  [C]  EV= $7,735
 |    +-- RAIN  [C]  p=0.1  EV= $850
 |    |     +-- GOOD  [E]  p=0.7  EV= $1,000
 |    |     +-- BAD   [E]  p=0.3  EV= $500
 |    +-- SUN   [C]  p=0.9  EV= $8,500
 |          +-- GOOD  [E]  p=0.7  EV= $10,000
 |          +-- BAD   [E]  p=0.3  EV= $5,000
 |
 +-- NO   [C]  EV= $0
      +-- RAIN  [C]  p=0.1  EV= $0
      |     +-- GOOD  [E]  p=0.7  EV= $0
      |     +-- BAD   [E]  p=0.3  EV= $0
      +-- SUN   [C]  p=0.9  EV= $0
            +-- GOOD  [E]  p=0.7  EV= $0
            +-- BAD   [E]  p=0.3  EV= $0
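The EV figures in the plot can be checked by hand: each MARKET node averages its two payoffs, the WEATHER node under YES averages those MARKET values, and the decision takes the larger of YES and NO. A short Python check using only the example's numbers:

```python
p_weather = {"RAIN": 0.1, "SUN": 0.9}
p_market = {"GOOD": 0.7, "BAD": 0.3}
catch = {"RAIN": 100, "SUN": 1000}
price = {"GOOD": 10, "BAD": 5}

# EV of each MARKET chance node (one per weather outcome) if you go fishing.
ev_market = {w: sum(p * catch[w] * price[m] for m, p in p_market.items())
             for w in p_weather}

# EV of the WEATHER chance node under YES; NO always pays 0.
ev_yes = sum(p * ev_market[w] for w, p in p_weather.items())
ev_no = 0.0

decision = "YES" if ev_yes > ev_no else "NO"
print({w: round(v, 2) for w, v in ev_market.items()})  # {'RAIN': 850.0, 'SUN': 8500.0}
print(round(ev_yes, 2), decision)                       # 7735.0 YES
```

That is 0.7 x 1,000 + 0.3 x 500 = 850 for RAIN, 0.7 x 10,000 + 0.3 x 5,000 = 8,500 for SUN, and 0.1 x 850 + 0.9 x 8,500 = 7,735 for YES.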
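The procedure recommends YES (EV $7,735 versus $0 for NO), but every probability and payoff behind that answer was supplied by hand, so from a machine-learning perspective it is of very little use.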