Parts of the Study to Consider:
1) The Question
-A good question is FINER:
Feasible
Interesting
Novel
Ethical
Relevant
2) Significance
-Check Background- What is already known? Why is this important? What answers will this provide?
-Do background check to see what questions remain unanswered.
3) Design
-Observational - passive, see what's happening
-Cohort Study - follow the diff groups (divided based on R/F e.g. race) over time
-Cross-sectional Study - check status at one point in time
-Case-Control Study - compare two groups (divided based on dz status, one w & one w/o)
-looks for R/F's for a certain dz...
-Prospective or Retrospective?
-Data type:
-Descriptive - get a general sense of things
-Analytical - try to find a cause and effect relationship
-Clinical Trial - active, apply an intervention and see what's different
-Randomized Blinded Trial- best type, but unblinded or time-series designs are an option
4) Subjects
-Selection Criteria - who is the target population?
-How to recruit a representative sample? (consider Generalizability vs Feasibility/Cost)
5) Picking Variables
-In Descriptive study, check one variable at a time
-In Analytical study, check associations between 2 or more variables, to predict outcomes or draw inferences of cause and effect.
-Predictor Variable- (aka Independent variable) - the one that came first/is likely the biological underlying cause
-Outcome Variable- (aka Dependent variable) -the one that came second....
6) Statistical Concerns
-Must make a plan for data analysis
-Specify the Hypothesis
-Use it to estimate Sample Size - the # subjects needed to observe the difference in outcome between the groups with a good degree of probability (Power)
-Make sure not to draw a causal inference from a mere association (association ≠ causation)
-Random Error- wrong result due to chance
-Reduce random error with larger sample size
--> increased Precision- degree to which the observed values are near each other
-Systematic Error- wrong result due to Bias- sources of variation that distort study outcome in 1 direction
-increased sample size won't help
-Accuracy- degree to which it approximates the actual value, Improve with a better study design w less bias
Chapter 2 - A Good Question
-Feasible-
-have enough subjects
-have the technical expertise
-have enough time/money
-it's small enough
-Interesting
-to me, others
-Novel
-it can confirm/refute prev findings
-extends prev findings
-provides new findings
-Ethical
-Relevant
-to clinical and health policy, other research, etc.
Chapter 3 - Choosing Subjects
-Ensure generalizability (Study Sample is similar to the Population)
-hard because respondents may be more healthy/less healthy than nonrespondents
-The generalizability is Valid if the sample truly reps the population
-check the demographics etc of the sample vs population
-start by clearly defining the demographics of the target population, then seek out subjects...
-Selection Criteria
-Inclusion criteria- best to be specific- define the population relevant to the Q & efficient for study
-demographics, clinical characteristics, geography, timeline (bn x and y dates)
-Exclusion criteria - try to limit these as much as possible
-likely to be lost to f/u, cant provide good data, hi risk of side effects, unethical to withhold Tx
-Clinic vs. Community pt?
-selection criteria that includes pt presenting to a certain clinic/hospital skews the data potentially (e.g. sicker pts at tertiary center...)
-Population Based sample- select ppl in community who rep the nonclinical population, but hard to recruit them.
-consider mail/phone data collection to incr size/diversity
-Sampling - take a subset of the population, bc there's too many pts that meet the selection criteria
-Convenience Sampling- easily accessible
-careful about Selection Bias due to volunteerism etc; mitigate by consecutively selecting every accessible person who meets criteria (consecutive sample)
-Probability Sampling- uses a scientific basis to generalize the findings in the study sample to the population, espec important for descriptive research; = gold standard to ensure generalizability
-->random process to guarantee that each unit of the population has a specific chance at selection....
-Simple Random Sample-
-define the population, then randomly select fr each category
-Stratified Random Sample-
-divide the population into subgroups based on their traits (strata), and draw pts from each stratum proportionate to its share of the population...; this allows for samples that are precise for the subgroups too
-Cluster Sample-
-cluster = natural grouping; cluster sample = random sample of them...
-e.g. studying pts who live in a specific area for ease rather than sampling statewide...
-makes data more complex to analyze, bc naturally occurring groups are often homogeneous
-Systematic Sample-
-a Simple Random Sample variant where subjects are selected by a preordained periodic process (e.g. take every 2nd person). Can be susceptible to errors caused by natural periodicities in the population, and allows the investigator to manipulate data. No real advantage over simple random sampling.
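-a minimal Python sketch of the simple random, stratified, and systematic schemes above (the toy population, its "sex" stratum, and the sample size of 100 are invented for illustration):

```python
import random

# Toy population; "sex" stands in for any stratifying trait
population = [{"id": i, "sex": random.choice(["M", "F"])} for i in range(1000)]

# Simple random sample: every person has an equal chance of selection
simple = random.sample(population, k=100)

# Stratified random sample: sample within each stratum, proportionate to its size
strata = {}
for person in population:
    strata.setdefault(person["sex"], []).append(person)
stratified = []
for members in strata.values():
    k = round(100 * len(members) / len(population))  # proportional allocation
    stratified.extend(random.sample(members, k))

# Systematic sample: every k-th person after a random start (beware periodicities)
step = len(population) // 100
start = random.randrange(step)
systematic = population[start::step]
```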
-Recruitment-
-consider feasibility- goals: get an adequate sample that reps the population, and get enough #s
-Response Rate- % eligible subjects who agree to enter the study --> effect on validity of inferring that the sample actually reps the population
-Nonresponse- ppl who are hard to reach/refuse to participate tend to be diff fr ppl who enroll, so if the response rate is low you can't generalize well...; d/o the Q asked and on why they didn't respond
-you can try to deal w this by finding out the traits of the nonresponders, but it's hard
-Getting Enough Subjects
-You will likely enroll far fewer ppl than initially estimated. So do a pretest first, and plan the study with an accessible population that's larger than you think is needed, as a contingency in case you need more subjects than initially thought.
-Monitor progress toward recruitment goals, look at reasons for falling short and at the proportions of potential subjects lost at each stage..., and enhance recruitment by reducing these losses.
Chapter 4 - Precision & Accuracy - Planning Measurements
-Precise = free of random error
-Accurate = free of systematic error
-Measurement Scales
-Continuous Variables- quantified intervals on an infinite scale of values (e.g. weight)
-Discrete Variables- have a finite number of intervals (# of beers pt drinks/day), but if a large # of values is possible they are statistically similar to continuous variables
-Categorical Variables- can't be quantified
-Dichotomous- only 2 possible outcomes
-Polychotomous- >2 outcomes; type:
-Nominal- not ordered (e.g. blood type); qualitative/absolute value so easy to measure
-Ordinal- ordered categories (mild, mod, sev),
-Ordinal is better than Nominal, but not as good as Discrete variables
-Picking the best Measurement:
-best to use a continuous variable bc it allows you to compare smaller variations--> more info--> more Power w a smaller sample size
-Exception: if looking at R/F's, then you just want to categorize pt by some attribute (+/-outcome...) and thus a larger sample size would be more helpful
-If you can pick the number of response categories in ordinal scale (e.g. dislike, like, love), the results can be converted to a dichotomous scale later
-Precision
- = free of random error (chance)
-types of error:
-Observer Variability = variation bc of observers- e.g. word choice of interviewer, skill of mechanical instrument
-Instrument Variability = variation bc of changing env factors- e.g. temp, aging parts...
-Subject Variability = intrinsic biologic variation in subject- e.g. changing mood...
-measurements are reproducible (thus even if not near true value, these values are close to ea other)
-More precise--> more powerful study at each sample size
-Assess Precision
-Within-Observer Reproducibility- single observer checks measures on a set of subjects
-Between-Observer Reproducibility- diff observer checks measures on a set of subjects
-Within-Instrument Reproducibility- single instrument used for repeated measures...
-Between-Instrument Reproducibility- diff instruments used for repeated measures...
-Within-Subject Standard Deviation = reproducibility measure for continuous variables
-Coefficient of Variation = within-subject SD/mean; used for continuous variables, better than the within-subject SD if the Bland-Altman plot of within-subject SD vs subject mean shows a linear assoc.
-Percent Agreement & Kappa used for categorical variables
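-a rough Python sketch of these reproducibility stats (within-subject SD, CV, percent agreement, Cohen's kappa); all measurements below are invented:

```python
import statistics

# Two repeated weight measurements per subject (invented numbers)
pairs = [(70.1, 70.4), (82.0, 81.5), (65.3, 65.9), (90.2, 91.0)]

# Within-subject SD: pool the variance of each subject's repeated measures
within_subject_sd = statistics.mean(statistics.variance(p) for p in pairs) ** 0.5

# Coefficient of variation = within-subject SD / overall mean
grand_mean = statistics.mean(v for p in pairs for v in p)
cv = within_subject_sd / grand_mean

# For categorical measures: percent agreement and kappa between two observers
obs1 = ["mild", "mod", "sev", "mild", "mod", "sev"]
obs2 = ["mild", "mod", "mod", "mild", "mod", "sev"]
po = sum(a == b for a, b in zip(obs1, obs2)) / len(obs1)   # percent agreement
cats = set(obs1) | set(obs2)
pe = sum(obs1.count(c) * obs2.count(c) for c in cats) / len(obs1) ** 2  # chance agreement
kappa = (po - pe) / (1 - pe)   # agreement beyond chance
```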
-How to enhance Precision
-Standardized measurement method-
-Operational definitions- specific instructions for making measurements
-how to prep the env, the subject, how to carry out interview, calibrate instrument, ...
-Train & certify observers- test their skill of getting the data
-Refine instruments- ensure machines to measure are good, write down questionnaires...
-Automate Instruments- removes human observer variability
-Repetition- reduces effect fr random error
-Accuracy-
- = free of systematic error
-the degree to which it actually reps what it's meant to represent (proximity to the truth)
--> affects internal validity (conclusions hold for the study sample) and external validity (conclusions generalize to the population)
-Sources of systematic error:
-Observer bias- distorted perception in reporting the measurements (e.g. tend to round up, leading questions...)
-Subject bias- distorted measurements by the subject (e.g. respondent bias- distorts how they report the event, pt w the dz and think they know the cause may exaggerate how much they had the cause...)
-Instrument bias- faulty machine, poor calibration, etc.
-How to check Accuracy:
-Measure the result against a Gold Standard
-if using a Continuous variable, check the mean diff bn the measure of the investigation and the gold standard
-if using a Categorical variable, assess the Sn and Sp compared to gold standard
-How to enhance Accuracy:
-Standardize measurement method (see above)
-Train and certify observers (see above)
-Refine instruments (see above)
-Automate instruments (see above)
-Make unobtrusive measures- reduce Subject bias by using measures they are unaware of (check number of candy wrappers in trash to check eating habits, instead of asking subject how many eaten)
-Blinding- reduce Differential Bias- bias that affects one group more than another
-Calibrate the instrument- use gold standard to calibrate them...
-consider the importance of the variable, the magnitude of potential impact, the amt of inaccuracy it would cause, and the feasibility and cost of conducting the above strategies to eliminate each bias
-Validity-
-a type of accuracy; = how well the measurement reps the truth
-Content Validity-
-Face validity- subjective judgement about whether the measure makes intuitive sense
-Sampling Validity- whether the measure incorps. all the aspects of the phenomenon (e.g. QOL study- does it include all aspects of QOL- psych, social, emotional...)
-Construct Validity-
-How well a measure conforms to constructs (theoretical concepts) about the entity
-if a trait theoretically is different bn 2 groups, then the measure has construct validity if it does show there's a diff bn the 2 groups
-Criterion-Related Validity- degree to which the measure correlates w an external criterion of the phenomenon
-Predictive Validity- ability of the measurement to predict the future occurrence of the item (it's valid if the measure can predict the outcome)
-Start by searching the Lit and c/s experts to find a good instrument that has already been validated, so you don't have to design one yourself...; it also makes it easier to compare results w earlier studies, and strengthens the ability to get grants/publish results. However, you must make sure it isn't outmoded and is appropriate for the study.
-Sensitivity
-the measure must be Sn enough to detect the diffs bn the 2 groups. The amt of Sn needed d/o the Q
-Specificity
-the measure must only rep the item you're looking for...,
-Appropriate
-ensure it's appropriate to your objective,
-Distribution of Responses
-the measure must yield an adequate distrib of responses of the subjects, thus gives a range; ensure the results don't just cluster at one end of the range
-Objectivity
-reduce observer involvement, increase structure of the instruments and the degree to which it addresses the specific detail..., but at the same time be careful not to cause tunnel vision, whereby the measure is so limited in scope that it misses unanticipated phenomena...
Chapter 5 - Hypothesis & Estimating Sample Size
-Research Hypothesis- the final, more specific version of the original hypothesis
-summarizes study elements: sample, design, predictor & outcome variables
-the purpose of the hypothesis is to establish the basis for tests of statistical significance
-You don't need a hypothesis for a descriptive study, one that describes how characteristics are distributed in a population (e.g. % ppl w lung cancer smoke).
-You do need a hypothesis if the study compares findings among groups, (e.g. % ppl smoke w lung cancer vs w no cancer).
-If you are using words like >, <, causes, leads to, compared to, more likely than, assoc w , related to, similar to, correlated w ==> it is NOT descriptive, and thus needs a hypothesis
-Making a good hypothesis:
-Simple vs Complex:
-Simple- 1 predictor and 1 outcome
-Complex- >1 predictor or >1 outcome, (e.g. EtOH or sedentary life on DM; EtOH on DM or CA)
-can't be tested w a single stat test, better off approaching them as 2 separate simple hypoth's
-Sometimes you can combine the predictor/outcome (e.g. smoke cig or pipe on CA)...
-Specific vs Vague:
-Specific: no ambiguity about the subject, the variable, the stat test,
-use concise operational definitions, summarize nature and source of subjects and how variable will be measured
-It must be obvious fr the hypoth whether the predictor and outcome variables are dichotomous, continuous, or categorical; if it is unclear, then you must specify the variable type
-e.g. EtOH use (>30 g/day) will increase risk of proteinuria (>30mg/dL)....
-In Advance vs After the Fact
-State in writing at the outset. This keeps the research focused on the main objective
& --> stronger basis to interpret results.
-if you make the hypoth after looking @ data, you are more likely to overinterpret a finding's importance
-Types of Hypotheses:
-Null Hypothesis & Alt Hypoth: Null = no assoc bn predictor and outcome, thus no diff bn the groups
Alt Hypoth = there is an assoc bn predictor & outcome, the groups are diff fr each other
-the Alt hypoth is not directly tested; instead it is accepted if the test rejects the Null hypoth
-One- and Two-Sided Alternative Hypotheses
-One-Sided = specify the direction of assoc (x is more common in this group than that group, z causes the item to go up in this group compared to that group)
-Two-Sided = there is an assoc, doesn't specify which direction
-One-Sided Hypoth's good if there's only 1 direction for it to go..., e.g. the htn Rx causes more rash than placebo (we don't care if it causes less rash than placebo)
-ok if there's good evidence fr prior studies that an assoc is not likely to occur in other drctn,
but be careful bc prev assumptions/study outcomes may be wrong...
-It is easier to use a One-Sided hypoth. However, you may have to use a Two-Sided hypoth for sample size planning even if your research hypoth is One-Sided, because grant/manuscript reviewers etc are critical of One-Sided hypoths, and statistical rigor requires you to make a choice bn One- and Two-Sided hypoths before looking at the data. You can't just switch to a One-Sided hypoth to lower the P value (!)
-Underlying Statistical Principles
-Level of Statistical Significance- the standard for the stats test to reject Null hypoth
-Type I Error = False Positive- incorrectly reject Null hypoth (=there's a diff bn groups, really no diff)
-Type II Error = False Negative- incorrectly keep Null hypoth (= there's no diff, really there is a diff)
-incr sample size, change design or measurements to reduce these errors
-Errors can be due to chance (random error) or bias (systematic error)
-Effect Size-
-likelihood to detect a diff bn groups d/o the amt of diff that exists (big diff = easy to see), but you can't tell ahead of time how diff the groups are (that's the pt of the study), so instead you should choose the Effect Size- how much of a diff bn the groups you want to be able to detect.
-Picking an effect size is the hardest part of choosing the sample size
-Review Lit and try to figure out what is a reasonable amt of difference bn the groups that you want to be able to detect, or choose the smallest effect size that will be clinically important, or do a pilot study first
-BOTH the relative diff's and the absolute diff's matter when picking the effect size
-if x group has 20% of dz and y has 30% dz, then you can say y has a 10% absolute incr in risk or a 50% relative incr in risk...
-After collecting the data, you review it to try to reject the Null hypothesis.
4 possible outcomes: Correctly accept Null, Correctly Reject,
Incorrectly Accept Null (Type II), Incorrectly Reject Null (Type I)
-Must establish ahead of time what the max chance of making a Type I or II error will be
-alpha =probability of rejecting Null even though it's true (Type I- you incorrectly said theres a diff)
-aka Level of Significance; alpha of 0.05 = 5% chance of making a Type I error
-beta = probability of accepting Null even though it's false (Type II- you incorrectly said there's no diff)
-Power = [1-beta] = probability of correctly rejecting the Null (ie detecting an actual effect that equals the effect size). If beta is 0.1, the investigator is saying "I am willing to accept a 10% chance of missing an association of a given effect size." Thus the power is 90% (I will pick up on 90% of cases where there is a diff bn the groups that meets the effect size)
-an alpha and beta of 0 would be perfection...; pick your sample size to keep them as low as possible, but in practice we usually settle on an alpha of 0.05 and beta of 0.2 (Power of 0.8), though some use stricter #s
-Better to use a lower alpha if it is important to avoid a false positive (Type I), e.g. a potentially dangerous med where you want to be sure it is really effective; and use a low beta if it is important to avoid a false negative (Type II), e.g. to reassure ppl that something is safe
-P Value
= probability of seeing an effect as big or bigger than that in the study by Chance, if the null hypothesis were actually true (ie probability that if samples are really the same, you'd still get a difference in your results)
-If P < alpha, then you can reject the Null- you prev set a threshold for a false positive (alpha), so you are comparing the probability that your result is due to chance (and thus a false +) (the P value) against the threshold you set at the outset (alpha)
-Result is Nonsignificant if P>alpha
-this doesn't mean there is no assoc in the population, but rather that the observed result is small compared with what might have occurred by chance alone (much of it could be fr chance)
-Important to remember that P values are not all or nothing; even without P<0.05, the data might still show some tendency away fr the Null hypoth, but you can't exclude that it is really due to chance...
-Sides of the Alternate Hypothesis
-With 2 sided tests you can err in either direction, so the P value includes the prob that you made a Type I error (false+) in either direction
-You can convert the P value bn 1 sided and 2 sided: P value of 0.05 for a 1 sided test is 0.1 for a 2 sided test (usually)
-If you're only interested in one side of the outcome, and made the alt hypoth reflect this, then you can plan the sample size around this, but be aware that you will have less power to test a 2 sided hypoth if you decide later on you want to (you would have to ~double your P value as above... to account for the chance of error in either direction)
-You should never use a one sided hypoth just bc you want to reduce the sample size
-Must pick your statistical test before choosing sample size (to be discussed later)
-Variability
-greater variability of the outcome variable --> more likely the groups will overlap, and thus harder to show an overall diff bn the 2 groups, so will need larger sample size. Precise measurements help reduce variability some.
-If using a continuous variable, you must estimate the variability, but not for other types of variables bc the variability is already included in the other parameters entered into the sample size formulas...
-Multiple & Post Hoc Hypotheses
-Post Hoc Hypoth = made after analyzing data
-If looking at multiple hypoth (espec post-hoc ones), you increase the risk that at least one of the outcomes will be bc of chance
-e.g. test 20 diff independent hypoths w an alpha of 0.05, then there's a 64% (1 - 0.95^20) chance that at least 1 will be a false +; so if you do that, you must lower your alpha accordingly...
-Bonferroni approach- divide the significance level by # hypoths tested (0.05/#hypoths); so to test 4 hypoths and keep an overall alpha of 0.05, each test must use an alpha of 0.0125, thus a much bigger sample size is needed
-This might be too stringent, so they rec using it for like >10 hypoths or if there's a high chance of false+ error
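-a quick sketch of the arithmetic above (alpha inflation across k independent tests, and the Bonferroni fix); k = 20 mirrors the example:

```python
# Chance of >=1 false positive when testing k independent hypotheses at alpha
alpha, k = 0.05, 20
family_wise_error = 1 - (1 - alpha) ** k   # ~0.64 for k = 20
bonferroni_alpha = alpha / k               # per-test threshold to keep overall alpha at 0.05
print(f"{family_wise_error:.2f}, {bonferroni_alpha:.4f}")
```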
-It is misleading to test many hypoths and report on just a few
-What is prob more important than # of hypoths tested is the prior probability of a false + (so if high chance, then use lower alpha value)
-Also, if the hypoth makes sense, then you shouldn't have to consider the impact of a bunch of other hypoths you tested that don't make sense (e.g. not likely to be significant anyway)
-Bayesian approach- consider the prior probability of the outcome, ie is there substantial prior reasonableness of the hypoth being tested...
-it is also no good to go fishing for unanticipated associations (hypothesis generation), ie making comparisons during data analysis- it is like testing multiple hypotheses
-Similarly, don't redefine the variables during data analysis, or present data for just some of the subgroups
-Signif P values for data-generated hypoths that weren't considered during the design are often due to chance
-Best to specify one primary hypothesis that will be tested statistically, without plan to adjust for multiple hypoth testing. This focuses the study on its main objective and helps to calc sample size
Chapter 6 - How to Estimate Sample Size & Power
1) State Null hypothesis & the 1 or 2 sided Alt hypoth
2) Pick the right statistical test based on the type of predictor variable & outcome variable
3) Pick a reasonable effect size (and if needed variability (for continuous variables))
4) Set alpha and beta (make it a 2 sided alpha unless the hypoth is def 1 sided)
5) Use the right table/formula to estimate the sample size (in appendix of bk...)
Types of Tests:
Dichotomous predictor
-Dichot outcome --> Chi squared test
-Cont outcome --> t-test
Continuous predictor
-Dichot outcome --> t-test
-Cont outcome --> Correlation Coefficient
Other- some hypoths won't fit neatly, so you may need some other types of tests...
t-Test
-aka Student's t-test
-Does the mean of a continuous variable in one group differ from the mean in the other group?
-Assumptions: the distrib of the continuous variable is bell shaped
-Robust, so few limitations. Limitations: <30-40 subjects, extreme outliers
-Sample Size:
-State Null hypoth, 1 or 2 sided Alt hypoth
-estimate E = effect size - the difference in the mean outcome of ea. group
-estimate S, the variability of the outcome variable (Standard Deviation)
-Calculate E/S = Standardized Effect Size
-Set alpha and beta
-Use prior studies to estimate E and S, or do a pilot study
-Note: if you are looking at a change in a continuous variable as the outcome (e.g. change in weight after intervention), make sure you calculate S for the variability of the change in weight, not the variability of the weights themselves... (the former is a smaller SD, so a smaller sample size is needed)
-Standardization = dividing E by S --> simplifies comparisons bn effect sizes of diff variables; the larger the standardized effect size, the smaller the sample size needed (ie you're looking for a big change with little variability...)
-E/S should usually be bn 0.1-0.5, bc too small an E is hard to detect, too large is obvious...
-For Case-Control Studies using a continuous predictor variable, you use a t-test to compare the mean value of the predictor variable bn the cases and the controls.
-Shortcut to calc Sample Size if >30 subjects studied, w two-sided alpha of 0.05 and power of 0.8 (beta = 0.2)
-Sample size for each group = 16/[(E/S)^2]
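-a sketch of the shortcut, checked against an exact power calculation (statsmodels); E and S are made-up numbers:

```python
from statsmodels.stats.power import TTestIndPower

E, S = 5.0, 10.0       # hypothetical effect size and SD of the outcome
std_effect = E / S     # standardized effect size E/S = 0.5

# Shortcut (two-sided alpha 0.05, power 0.8): n per group ~ 16 / (E/S)^2
n_shortcut = 16 / std_effect ** 2   # 64 per group

# Exact calculation for comparison
n_exact = TTestIndPower().solve_power(effect_size=std_effect, alpha=0.05, power=0.8)
print(round(n_shortcut), round(n_exact))   # both ~64
```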
-Two-sample t-Test = one that compares between 2 diff groups
-Unpaired t-Test = compare a variable in one group vs the other
-Paired t-Test = compare the change in the variable in one group vs the other
-One-sample t-Test = compares the variable/change in variable to zero.
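-in scipy these variants look roughly like this (the BP numbers are invented):

```python
from scipy import stats

before = [140, 152, 138, 145]        # hypothetical systolic BPs at baseline
after = [135, 149, 139, 140]         # same subjects after Tx
other_group = [141, 150, 143, 147]   # a separate comparison group

stats.ttest_ind(after, other_group)  # two-sample (unpaired): one group vs the other
stats.ttest_rel(before, after)       # paired: repeated measures on the same subjects
changes = [a - b for a, b in zip(after, before)]
stats.ttest_1samp(changes, 0)        # one-sample: is the mean change different from zero?
```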
Chi-Squared Test
-Compare proportion of subjects in each of the two groups who have a dichotomous outcome
-e.g. % men with ACS on ASA vs % men with ACS off ASA...
-Always 2 sided (if you do a 1 sided for this type of study it would be a one sided Z test)
-Effect size is the difference in proportions:
-Cohort/Experiment Studies: P1 = % w the outcome in one group, P2 = % w the outcome in the other
-Case-Control Studies: P1 = % of cases w the risk factor, P2 = % of controls w the risk factor
-bc variability d/o the proportions, it's built in and so isn't specified...
-Sample Size:
-State Null hypoth, 1 or 2 sided Alt hypoth
-Estimate Effect Size in terms of P1 and P2 (% w outcome in each group)
-Set alpha and beta
-see appendix for table w values of sample size...
-May specify Effect Size in terms of Relative Risk of the outcomes of ea group
(instead of P1-P2, which would be an absolute difference)
-For Cohort study (comparing group w diff R/F, check % outcome of dz) use P1/P2 or P2/P1
-For Case Control study (compares groups w dz, looks back for R/F) must use an Odds Ratio to describe relative risk of having the outcome/dz:
OR = [P1(1-P2)] / [P2(1-P1)]
-Specify the OR and the P2 (the % of control pts exposed to the predictor variable)
-Then, P1 (% of Case pts (pts w dz) exposed to the predictor variable) is:
P1 = (OR*P2) / [(1-P2) + (OR*P2)]
-e.g. specify that you expect 10% of controls to be exposed to the variable (e.g. they have the risk factor), and you want to detect an OR of 3 (of having the dz) assoc with that exposure...
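-a sketch of that example in Python- converting the OR and P2 into P1, then a normal-approximation sample size (#s will differ slightly fr the book's chi-squared tables):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

P2, OR = 0.10, 3.0                        # 10% of controls exposed; target OR of 3
P1 = (OR * P2) / ((1 - P2) + (OR * P2))   # implied exposure proportion among cases (= 0.25)

# Sample size per group to detect P1 vs P2 (two-sided alpha 0.05, power 0.8)
h = proportion_effectsize(P1, P2)         # Cohen's h for two proportions
n_per_group = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.8)
print(round(P1, 2), round(n_per_group))
```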
Correlation Coefficient (r)
-measures the strength of the linear assoc bn 2 variables
-range fr -1 to 1
-Negative # = if one incr, the other decr
-closer to 0 ==> weaker assoc
-hard to use to estimate sample size bc little intuitive meaning
-r^2 = proportion of the variability of the outcome explained by its linear assoc w the predictor variable. If r = 0.4, then one variable explains 16% (0.16) of the variability of the other
-Instead of using r, you can dichotomize one variable and use a t-test
-Sample Size:
-State Null hypoth, 1 or 2 sided Alt hypoth
-Estimate Effect Size as the absolute value of the smallest correlation coefficient (r) that you would like to be able to detect.
-no need to discuss variability bc it is a fx of r, so it's built in
-Set alpha and beta
-See Appendix C for table w sample size #s...
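-one common approximation (Fisher z transform) if you want a formula instead of the appendix table; the choices of r, alpha, and power below are hypothetical:

```python
import math
from scipy.stats import norm

r, alpha, power = 0.3, 0.05, 0.8        # smallest r worth detecting (invented choice)
z_a = norm.ppf(1 - alpha / 2)           # two-sided alpha
z_b = norm.ppf(power)
C = 0.5 * math.log((1 + r) / (1 - r))   # Fisher z transform of r
n = ((z_a + z_b) / C) ** 2 + 3
print(math.ceil(n))                     # ~85 subjects
```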
Other issues:
-Dropouts
-You should increase the planned sample size to account for the subjects you expect to drop out/be unable to f/u with, so that after dropouts you still have an adequate # for data analysis
-Categorical Outcomes
-Ordinal Variables- can treat as continuous, espec if large # options (>5)
-or, you can change the hypoth to dichotomize the categorical variables
-e.g. estimate sample size based on an all or none outcome possibility
-Survival Analysis
-To compare which of 2 treatments is more effective at prolonging life/reducing Sx phase of a dz, then use Survival Analysis
-the outcome is e.g. weeks of survival- it seems like a continuous variable, but you can't use a t-test bc you're not actually measuring each subject's survival time, but rather the % of pts alive at a given time pt. Instead, use a dichotomous outcome at a set pd of time (e.g. for a 6mo f/u, look at alive/dead at 6mo post intervention), then estimate the sample size using a chi-squared test
-Clustered Samples
-sample subjects by groups
-e.g. compare practice patterns in 20 medical groups doing x vs 20 medical groups doing y, by checking charts of some pts fr each group. You are getting a cluster of pts fr each larger group..., thus you must pick a sample size that accounts for the variability of pts within each practice..., hard to calculate...
-ballpark it by assuming similar numbers of subjects in each cluster, by aggregating the outcomes within a cluster- e.g. give a score to each practice to rep the % pts in that practice who had the outcome, then use a t test to estimate the # of practices needed.
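-a sketch of that ballpark approach- aggregate to one score per practice, then a t-test on the practice-level scores (the proportions are invented):

```python
from scipy import stats

# % of sampled charts with the outcome in each practice (hypothetical cluster-level scores)
practices_x = [0.30, 0.25, 0.40, 0.35]   # practices doing x
practices_y = [0.20, 0.15, 0.25, 0.30]   # practices doing y

# The practice (not the pt) becomes the unit of analysis
stats.ttest_ind(practices_x, practices_y)
```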
-Matching
-may use a Matched design...
-Multivariate Analysis
-if you think some variables will confound the assoc bn predictor and outcome, then must plan to use techniques to adjust for these, so sample size will need to account for these
-the incr in sample size needed d/o
-Prevalence of the confounder
-strength of the assoc bn predictor and confounder
-strength of the assoc bn outcome and confounder
-It's complex, some methods include:
-Linear Regression
-Logistic Regression
-Cox proportional hazards analysis- adjusts for both confounders and for diff in length of f/u
-Other methods can be used to acct for subject's genetics, economic studies, dose-response studies...
Equivalence Studies
-studies to prove the Null hypothesis is correct
-e.g. to test if a new Rx is as good as an old Rx
-hard to pick sample size bc the Effect Size is 0.
-can choose an effect size that if present would not be clinically important
-if the newer Rx has fewer side effects/is cheaper, then you might accept a larger effect size
-The small effect size and large power needed--> need large sample size
-also, things that would normally make it harder to show a diff bn groups (e.g. problems w the Sn of your measurements, losing pts to f/u) now make you more likely to conclude Equivalence (you'd accept the Null hypoth), even if there is a diff you didn't pick up...
Descriptive Studies
-Descriptive studies don't have predictor and outcome variables; they don't compare things, so the Null/alt hypoth don't apply
-Instead, use descriptive stats- mean, proportions
-But, often, after checking descriptors (% elderly with depression), you then ask an analytical Q (what are the predictors (R/F) for the outcome?). So if you think you will ask an analytical Q, you should use a sample size that will be big enough for the analytical analysis.
-Use Confidence Intervals - the range of values about the mean or proportion
-measures the estimate's precision
-a larger CI (e.g. 99% vs 95%) is more likely to include the true value bc it's wider
-Width of CI d/o sample size (bigger size sample--> more narrow CI)
-Start by picking the level of confidence and the width of the CI, then use the table/formula to calc sample size needed
-Continuous Variables
-use the CI around the mean value of the variable reported
-Sample Size:
-Estimate standard deviation of the variable you're looking at
-Choose the precision (CI width)
-Choose the confidence level (95%, 99%, etc)
-then use the table to calc sample size
-Dichotomous Variables
-can express the result as a CI around the % of subjects with one outcome or the other
-e.g. for a study looking at the Sn or Sp of a dx test
-Sample Size:
-Estimate the expected proportion w the variable of interest in the population
-if >1/2 the pop is expected to have it, then plan sample size based on the proportion expected NOT to have the characteristic... (you're using the smaller (thus --> more stringent) option)
-Choose the precision (CI width)
-Choose the level of confidence
-use table/formula to calc sample size needed
-can also use diff methods to estimate sample size if you are looking at ROC (receiver operating characteristic) curves, likelihood ratios, reliability as the outcome
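-a sketch of both CI-based calculations above, using the usual normal-approximation formulas (the SD, widths, and expected proportion are invented):

```python
import math
from scipy.stats import norm

z = norm.ppf(0.975)                   # 95% confidence level

# Continuous variable: CI around a mean, given an SD estimate and desired half-width
SD, half_width = 15.0, 3.0            # hypothetical
n_mean = (z * SD / half_width) ** 2   # ~97 subjects

# Dichotomous variable: CI around a proportion p with half-width d
p, d = 0.30, 0.05                     # expected proportion, desired precision
n_prop = z ** 2 * p * (1 - p) / d ** 2   # ~323 subjects
print(math.ceil(n_mean), math.ceil(n_prop))
```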
Working with Fixed Sample Sizes
-work backwards to calculate the effect size that can be detected at a given power (usually 80%)
-use the sample size formulas/tables to calculate backward
-rule of thumb: need power >80% to detect a reasonable effect size, otherwise you might not demonstrate any diff bn the groups...
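-e.g. w statsmodels, solving backward for the detectable standardized effect size at a fixed n (the 50/group below is a made-up number):

```python
from statsmodels.stats.power import TTestIndPower

# Given a fixed 50 subjects per group: what standardized effect size is
# detectable at 80% power and two-sided alpha 0.05?
detectable = TTestIndPower().solve_power(nobs1=50, alpha=0.05, power=0.8)
print(round(detectable, 2))   # ~0.57
```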
How to Minimize Sample Size and Maximize Power
-If you need more pts than you have:
-Use continuous variables
-allow for smaller sample sizes than dichotomous variables (greater power)
-though it may miss the point, e.g. maternal MVN on birth wt vs MVN on neonatal morbidity (the latter is what really matters, the former is more of a surrogate, but you can parse out smaller diff's)
-Use Paired Measurements
-with Experiments/Cohort studies that use continuous outcome variables, you can take a paired measurement for each subject, one at baseline and another at the conclusion of the study. Now the outcome variable is the change bn the two measurements, and you can use a t-test on the paired measurements to compare the mean change in each group.
-Use More Precise Variables
-this will reduce variability (smaller SD, thus when calculating E/S for effect size, you get greater #)
-Use Unequal Group Sizes
-You may have more ppl in 1 of the groups; using all of them will increase power
-See Appendix 6A and 6B for calculating sample size w groups of unequal size
-Use a More Common Outcome
-for dichotomous outcomes; if it is more freq (up to 1/2), the power will increase
-bc if it is more common, you're more likely to pick it up in the study
-Power actually d/o the # subjects with the outcome more than on the total # subjects in the study
-thus if the outcome is rare, you will need a large sample size for good Power
-Can improve outcome freq by focusing on a group that is more likely to have the outcome, or extend the f/u pd, or loosen the definition of what = +outcome (but do these w caution...)
What if you don't have enough info to calculate sample size?
-search prior literature , ask experts...
-pilot study - help estimate SD, proportions of subjects w the outcome
-dichotomize the variable if you are unsure of the mean and SD: divide everything into 2 groups split at the mean or median, then use a chi-squared test to estimate sample size
-educated guess
Mistakes to Avoid
-Don't mistake a dichotomous outcome for a continuous one:
-alive or dead might be reported as a %alive or %dead, but really it's dichotomous
-survival analysis is dichotomous, though might seem like continuous (year lived after xyz...)
-thus you should use a chi-squared test
-Sample Size # = the # ppl who need to be FOLLOWED to the end, not just enrolled (bc you must acct for dropout...)
-Tables in the chapter assume equal sample sizes in each group. If sample sizes differ, must use the formulas
-If using t-test to calc sample size, what matters is the SD of the outcome variable, thus if the outcome is a change in the continuous variable, then use the SD for the change, not for the variable itself...
-Calculate sample size early on, or you will regret it...
-Be aware of clustered data- if there are "two levels" of sample size (one for physicians and another for pts), then ensure to acct for clustering...
Chapter 7 - Cohort Studies
-follow groups over time
-Descriptive - describe incidence of an outcome over time
-Analytic- look for associations bn predictors and the outcomes you are following over time
-2 types
-Prospective- define the sample and then measure the predictor variables before any outcomes occur
-Retrospective- def the sample & then collect data about predictor vars. after outcomes have occurred
Prospective Cohort Studies
-Strengths: good to define incidence of stuff, and look for potential causes
-prospective can prove that the potential cause came before the outcome
-the time sequence--> strengthens the inference that the factor causes the outcome
-prospective good bc you can measure things that are hard to recall etc, more accurate than trying to reconstruct past exposures after the outcome already occurred
-prospective limits bias from knowing what outcome occurred while looking for the factors...
-Weaknesses:
-expensive, inefficient if the outcome is rare; e.g. you'd need huge #ppl for colon CA r/f...
-more effic if more common outcome, so might be easier to look at progression after you've been treated for colon CA
-usually, the investigator excludes ppl who have already been dx'd w the dz (=inception cohort), this assumes that the predictor variables measured at the beginning of the study are not influenced by the outcomes.
-some of the predictors might actually be caused by the outcome (bc they are really early sx of the outcome)
-so, first use tests to r/o pts w subclinical forms of the dz
-and, extend the time frame so that the duration of following them is longer than the preclinical phase of the disease...
Retrospective Cohort Study
-Strengths: same as prospective
-can establish that predictor variables precede outcomes
-collecting measurements before outcomes are known--> not biased by knowledge of the outcome
-less costly/time consuming than prospective
-Weaknesses:
-limited control over the design of the approach to sampling the population, or over the nature/quality of predictor variables
-may not have all the info you need about the subjects to answer the clinical question
-data may be incomplete, inaccurate, or measured in less than ideal ways.
Nested Case-Control & Case-Cohort Studies
-Nested Case-Control Design- case control study nested in a prospective/retrospective cohort study
-good for predictor variables that are expensive to measure and that can be assessed at end of the study
-get a good cohort of subjects w enough cases to --> good power to answer the research Q
-describe criteria that define the outcome of interest
-Then, ID all subjects who got the outcome (the cases)
-Then, pick a sample of the subjects who were also part of the cohort but didn't get the outcome (ctrls)
-Then, get samples/images/records taken before the outcome occurred, measure the predictor variables for cases and controls, and compare the levels of each R/F in cases vs controls...
-Might improve power by picking control patients that match case pts (e.g. by age, sex, etc), but be cautious about this bc you might be better w an unmatched design w statistical adjustment after the study.
-Might match ctrls to cases on time in the study (select ctrls who were in the cohort at the same time as the case).
-Nested Case-Cohort Design-
-here, you select a random sample of the whole cohort (regardless of outcome) to serve as controls, then look at predictor variables and who gets the outcome.
-advantages: a single random sample of the cohort can serve as the control group for several case-control studies of diff outcomes; and the random sample can provide info on R/F prevalence
-Strengths:
-good if the measurement is costly...
-Weaknesses:
-possible that observed associations are due to effect of confounding variables, and that baseline measurements may be affected by silent preclinical dz
...
Multiple-Cohort Studies & External Controls
-following several cohorts, each with diff levels of exposure to the predictor, then look at the outcomes of each group...
-here the sample groups are chosen based on the level of exposure
-ddx: in a case-control study, subjects are chosen based on outcome presence
-might instead compare outcome of members of the cohort to that of a registry/census, instead of using a second cohort
-Strengths: may be the only way to approach a study of a rare exposure
-Weaknesses: confounding may be a problem, bc the diff cohorts can differ in important ways besides presence/absence of the predictor variable, and you can't always anticipate the differences
A good study:
-With cohorts, it is important to follow up the entire cohort
-exclude ppl who might move away etc
-get all the contact info you can (fr them, their Dr's etc)
Chapter 8 - Cross Sectional & Case-Control Studies
-Cross Sectional Study- make all measurements at once
-looks at distrib of variables in the sample, then infer cause/effect fr associations
-Case-Control Study- (work backward), pick patients with known dz (cases) and known no dz (ctrl)
-then, compare the freq of the predictor variables in the 2 groups to see which ones are assoc w the outcome
CROSS SECTIONAL STUDIES
-like a cohort study, but make all measurements at once, w/o a f/u pd
-good to describe variables and distribution patterns
-good to look for associations, but hard to decide which variable is the predictor vs outcome
-good for Prevalence- % population w dz at one pt in time
(but not Incidence- % pop who get it over a pd of time)
-Relative Prevalence-ratio of prevalence of an outcome in groups classified by level of the predictor variable (similar to a relative risk)
-fast, cheap, no lost to f/u
-easy to use a cross sectional study at the start of a cohort study
-weaknesses: can't establish causal relationship well; not good for rare dz (unless you focus on only pts w the dz...)
-Serial Surveys
-series of cross sectional studies on same pop helps infer about changing patterns
CASE-CONTROL STUDIES
-usually retrospective
-start w pts w known disease (case) and known no dz (ctrl), and look back to see who had what predictor variables that might be a cause
-can't show prevalence or incidence, bc it is the investigator who decides how many dz'd pts to look at
-are good for looking at predictors and assessing strength of assoc bn the predictor and dz outcome
-make these estimates as an Odds Ratio- approximates the relative risk (ok unless high prev.)
-Strengths:
-Efficient for rare outcomes- don't need as many patients to follow, and you can look backwards rather than waiting for a dz to show itself
-Useful for Generating Hypotheses- you can look at a large number of predictor variables, so you can put forth which one you think is the cause of an outbreak of dz...
-Weaknesses:
-limited info available:
-can't directly estimate the incidence or prevalence of the dz
-can only look at one outcome (+/- the dz), unlike cohort & cross-sectional studies which can look at many outcomes at once
-more vulnerable to Bias- bc you sample the cases and controls separately, and you measure the predictor variables retrospectively
-Controlling Sampling Bias
-When sampling Case patients, how to know who has the dz? you can only pick fr pt's who have already been diagnosed, and who are available for study
-dead patients, undiagnosed patients, misdiagnosed pts are less likely to be included
-this is usually a problem w Dx that might not be immediately apparent...
-Usually you take what patients are available
-It's hard to pick Control patients too. Try to get them from general population, with people of similar risk for the disease otherwise
-Hospital or clinic based controls:
-use subjects fr same facility to control for selection bias, though this could be biased if the risk factor assessed also causes the patient to seek care in the clinic (it will falsely increase the prevalence of the risk factor, compared to the gen population)
-because hospital/clinic pts are often sicker, it can introduce bias..., but they are used anyway because of convenience- Ask, is the added convenience worth the potential bias, or will it threaten the validity of the study too much?
-Matching:
-ensures that cases and controls are comparable for other factors (race, sex, etc) that might have an impact on the outcome..., but matching itself can introduce bias and influence the outcome...
-Using a population based sample-
-w diseases where there is a registry of dz, it is easier to find cases this way; then you can find similar controls by looking in the same area of the city.
-One method is Random-Digit Dialing- use the same exchange as a case, then dial random remaining digits till you get someone w similar characteristics...,
-but this does introduce bias, bc all controls must have telephones, and a household w >1 phone # may be overrepresented.
-Use 2 or more control groups
-that are selected in diff ways; if the #s are consistent for both, it's less likely to've been biased
-Controlling Differential Measurement Bias
-e.g. recall bias
-see ch 4 for basic strategies
-Use data recorded b4 the outcome occurred- as long as you search equally vigorously for case & ctrl
-Use blinding- of subjects and of data analyzers
CHOOSING THE RIGHT STUDY DESIGN
See Table 8.3 of bk for Pros and Cons...
Chapter 9 - Enhancing Causal Inference in Observational Studies
-if a study shows what seems like a cause-effect association between 2 variables, you must consider
-whether there really isn't an association in the population (ie bias or chance caused the sample results)
-whether there really is an association, but not a cause-effect association.
-it might be an effect-cause relationship - that the apparent outcome variable is really the cause of the apparent predictor variable
-there might be a confounder- a third factor that causes/is assoc w both variables
Spurious Associations
-Rule out Spurious Associations Due to Chance
-e.g. a false + (Type I error) due to chance. Reduce this risk by:
-increase measurement precision
-increase sample size
-use P value and CI to quantify the magnitude of the observed assoc, compared to what might just be chance...
-Rule out Spurious Associations Due to Bias
-must DDx the research Q and the question actually answered by the study
-the latter is a reflection of what compromises the investigator made to make the study possible
-this bias can make the actual Q diff than the desired Q bc of systematic errors in design/analysis.
-Design Biases
-Do the samples of study subjects represent the population of interest well enough?
-Do the measurements of the predictor variable rep the predictor of interest well enough?
-Do the measurements of the outcome variable rep the outcome of interest well enough?
-if the answer is no or maybe, c/s whether the bias is big enough to cause the study to give the wrong answer to the initial question.
-if you cannot fix the bias, consider changing your question or not doing the study...
-Analysis Biases
-if you find out in analysis phase that one of the above has occurred, try to get more info
-for sampling bias, try to assess the magnitude of the bias, e.g. by finding out how many of the control pts share the characteristic that caused the bias; you may still be able to carry out the study if most of them have the same issue...
-for measurement biases, you could take a subset of cases and controls and give them a better test to see if it correlates with the initial, less optimal test...
-if the outcome measure is doubtful, you can use a more stringent outcome measure and apply it to the subjects to see if it really impacts the result
-You can also look at results of other studies; if yours is consistent w theirs, it's less likely biased...
Real Associations that are NOT Cause-Effect
-Effect-Cause relationship
-outcome has actually caused the predictor
-often a problem w cross-sectional and case-control studies espec if the predictor is a lab test for which there were no prior lab tests avail, (e.g. CRP predicts MI, but really MI might have caused hi CRP)
-less likely w cohort bc the +R/F was measured before (apparently) the dz occurred
-unless dz has a long latent period...
-must make biological sense...
-Confounding
-an extrinsic factor involved in the assoc that is the true cause of the outcome
-Confounding variable = one that is assoc w the predictor and a cause of the outcome
-can be the hardest, but most important, alternative to rule out...
How to reduce Confounding
-Design Phase
-w experiments, you can control confounders w experimentation, w other studies, you must be aware of confounders to control for them.
-List the variables (e.g. age, sex...) that might be assoc w the predictor var and may also --> the outcome
-Choose strategies to control the influence of these
-Specification
-specify a value of the potential confounder, and exclude all pts w a diff value (e.g. exclude smokers)
-thus if you see an assoc bn the predictor and outcome variables, you know it's not bc of smoking.
-Disadvantages-
-Effect Modification aka Interaction- the predictor might cause the outcome only in the group you just excluded--> specification limits generalizability
-If the value is highly prevalent, you might not be able to recruit enough ppl w/o it
-Matching
-match values of the confounders in each group. Here you can still generalize bc u included them...
-Pairwise Matching- done individually - match each case patient to an equivalent control pt
-Frequency Matching- match in groups (may need more control patients than case pts...)
-most commonly used method for case control studies
-Advantages:
-effective prevention against confounding by constitutional factors- age, sex, etc that are not susceptible to intervention, and aren't likely to be in an intermediary causal pathway
-can be used to ctrl confounders that can't be measured or otherwise controlled
-e.g. matching siblings--> ctrl for genetics etc
-Increases precision by balancing # cases and ctrls at each level of the confounder.
-May be a sample of convenience- to narrow down a large # of potential ctrls
- at the risk of overmatching...
-Disadvantages:
-time, money
-may not find a control to match a case, so you must throw out the case
-Must decide to match at start of study, and it's irreversible (bc u do it at sampling stage)
-thus cant do further analysis of the effect of the matched variables on the outcome
-can have a major problem if the confounding var isn't fixed/constitutional (age/sex...), or is an intermediate in the causal pathway bn the predictor & outcome
-e.g. look at EtOH amt on MI risk, w ctrl for [HDL] --> miss the beneficial effect of the HDL incr due to the EtOH.
-Must use special analytic techniques for analysis, to compare each subject with only the individual(s) he/she is paired with, and not others who might have a diff level of the confounder
-using reg stats techniques would likely cause a bias twd no effect, bc they assume the groups are sampled independently fr each other (which isn't true; they are paired deliberately)
-Overmatching- if you match a var that is not a confounder bc it is NOT assoc w the outcome
--> reduce Power in case-ctrl study, so harder to find an assoc that actually does exist
-it will reduce the statistical signif, but NOT the relative risk...
-Analysis Phase
-Stratification- -->only cases and ctrls w similar levels of the confounder are compared
-First, stratify the subjects into subgroups based on level of the potential confounder
-Then, look at relationship bn predictor/outcome in ea group, separately
-e.g. look at smokers and nonsmokers separately
-Advantage: flexible- you can do it w many potential confounders to see which var is a true confounder
-check if it's a true confounder by seeing if there's a true diff bn each stratum of ea group
-Disadvantage: you can control only a limited # of variables at once (you may not have controls that meet the requirements of every stratum, e.g. +smoker, male, >40yr)
-on the other hand, if you broaden the strata too much (e.g. just 2 age groups), you can still have confounding within each group (e.g. within <50yo, there might be a diff for <10yo vs >10yo, but you're not checking at this level of detail...)
-Adjustment-
-you can look separately at relationship bn the confounder and the predictor variables, and then set up a model that accounts for this in the outcome variables' data
-can be done for multiple confounders at once w multivariate analysis software
-Advantages: can control many confounders at once; works with continuous variables that are easily adjusted (e.g. not just 2 or 3 strata but a continuous range...)
-Disadvantages: model might not fit, ie they might not be right for the particular study,
-results of highly derived statistics are hard to understand intuitively (e.g. parental edu squared, or child sex times parent edu)...
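-a toy simulation of adjustment by (logistic) regression, all data invented: the exposure is assoc w the confounder, the outcome is driven by the confounder only, and adjusting for the confounder correctly shows ~no effect of the exposure:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
confounder = rng.normal(size=n)                                   # e.g. standardized age
predictor = (confounder + rng.normal(size=n) > 0).astype(float)   # exposure tied to age
p_outcome = 1 / (1 + np.exp(-(-1 + confounder)))                  # outcome driven by age only
outcome = (rng.random(n) < p_outcome).astype(float)

# Model outcome on predictor + confounder; the predictor coef is ~0 after adjustment
X = sm.add_constant(np.column_stack([predictor, confounder]))
fit = sm.Logit(outcome, X).fit(disp=0)
print(fit.params)   # drop the confounder column from X to see the spurious effect appear
```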
Picking the right strategy
-matching is best when the sample size is small compared w the number of strata necessary to ctrl for known confounders, and when the confounder can more easily be matched than measured; but use sparingly bc it can permanently compromise the ability to observe real associations...
-think ahead of time which variables you might want to use to adjust/stratify data/subjects later on, so you know to collect them ahead of time
-and ensure you are measuring the confounding var w appropriate precision/accuracy or you will not get good results w later adjustment...
Evidence favoring Causality
-results are consistent in studies done according to diff designs- but consistent results can still reflect a real assoc that is not cause-effect (effect-cause or confounding)
-The association is strong- the more signif the P value, the less likely it is bc of chance; a strong assoc also reduces the likelihood of confounding
-Dose-response relationship is shown, but can also be seen w effect-cause assoc or w confounding
-Biologic plausibility- it should make biological sense...., though make sure you're not just making up a suggested mech for an association...
Chapter 10 & 11 - Designing an Experiment: Clinical Trials
-Apply an intervention and observe the effect on the outcome
-Advantage: can show causality
-Randomly assigning intervention--> eliminate confounder influence
-Blinding--> limit bias
-Disadvantage: expensive, time consuming, narrow clinical Q, potential harm
1) Select Patients
-Define Entry Criteria
-Inclusion/Exclusion criteria- goal to ID the population for whom the intervention likely will cause a statistically significant impact
-thus, they should optimize the rate of the outcome studied, the expected effectiveness of the Tx, the generalizability of the findings, the ease of recruitment, and the likelihood of compliance/follow up
-if the outcome is rare, you usually must recruit ppl at high risk for it, but this will limit generalizability and make it harder to recruit subjects, though it does decr #subjects u must recruit
-if the subjects are likely to have the greatest effect fr the Tx, the trial can be smaller/shorter
-try to limit exclusion criteria bc it limits generalizability
-you should exclude ppl w whom you know it might be harmful, or you already know it will (likely) be helpful (bc they should not be put in placebo group)
-you should exclude ppl who cannot/likely will not comply w Tx or f/u (e.g. mental challenge, etc)
-Design an Adequate Sample Size & Plan Recruitment
-very important to have a good sample size estimate
-harder to recruit bc subject must be ok w being randomized and blinded, so plan for a larger # subjects when getting funding etc
2) Measure Baseline Variables
-Collect Tracking Info
-name, address, friends/relatives as alt contact, SS# (to check vital status (death))
-Describe Participants
-Offer enough info to help others judge generalizability
-Also allows to compare pts to their baseline
-First Table of the final report usually compares levels of the baseline characteristics in the 2 groups
-Ensure that diffs bn the 2 groups aren't greater than expected by chance (which might = bias bc of an error in randomization)
-Measure Variables that are Risk Factors for the outcome, to use to define Subgroups
-espec if in small trial, try to measure predictors of the outcome so you can answer a secondary research Q.
-it also allows for statistical adjustment of the primary randomized comparison to reduce the effects of chance maldistributions of the baseline factors bn the 2 groups--> incr efficiency of the study
-it also allows you to examine whether there is effect modification (aka interaction)- the intervention has diff effects in subgroups, which though not common can be important
-Establish Banks of Materials
-Store samples/specimens , and keep for later use in other studies etc.
-Measure the Outcome Variable
-measure it at the start in addition to at the end (if possible). This helps show that the dz wasn't present from the start (w a dichotomous outcome (e.g. +/-CA)), or lets you measure the change in the variable (w a continuous variable (e.g. BP decr)) ==> more power...
-Be parsimonious
-at the same time, you really don't have to measure anything besides the final outcome, bc randomization should have taken care of ensuring the groups are the same; adding measurements adds time/money that might be better spent on recruitment etc.
3) Randomize Subjects
-assign the subjects to each of the intervention groups (e.g. active Tx & placebo, though there may be >2)
-How to do a good job at randomization:
-Ensure that it is truly random allocation and that the assignments are tamper proof so that neither intentional/unintentional factor can influence it.
-usually via computer generated algorithm to assign subject to a group
-must ensure that members that have contact w the subject do not know/cannot influence allocation
-c/s using sealed envelopes prepared by someone who will not participate in the study
-number ea envelope so you know all were used...; make them opaque & tamperproof
-then first record the name of the pt and the envelope number, then open the envelope
-Consider special randomization techniques
-unequal allocation if 3 or more groups w one as a control
-but disproportionate randomization can complicate things, and its effect on power is usually small, so it is usually best to use the same # of subjects in ea group...
-If small/mod size trial, you will incr power if you use randomization procedure to balance the study groups in the numbers of participants they contain, and in the distrib of baseline variables known to predict the outcome:
-Blocked Randomization- ensure the number of subjects is = in ea group
-do randomization in "blocks" of a predetermined size,
-Stratified blocked randomization- to ensure predictor variables of the outcome are = distrib bn both groups.
-at the baseline, divide the subjects by +/- the predictor variable, then do a blocked randomization within each stratum (so ea stratum is split evenly bn the groups)
--> slightly improves power by reducing variability within ea group (smaller SD...), but is of little help in larger (>1000 subjects) studies, bc chance will ~always ensure an even distrib bn the groups...
-however, you can only do this for a few (1-3) baseline variables (see the sketch below)
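-A minimal sketch of blocked + stratified blocked randomization (illustrative only; group labels, block size, and strata are made up). In a real trial the list would be generated and concealed by someone w no subject contact:

```python
import random

def blocked_randomization(n_subjects, block_size=4, groups=("Tx", "placebo")):
    """Assign subjects in shuffled blocks so group sizes stay balanced."""
    assignments, per_group = [], block_size // len(groups)
    while len(assignments) < n_subjects:
        block = list(groups) * per_group   # equal #s of each group per block
        random.shuffle(block)
        assignments.extend(block)
    return assignments[:n_subjects]

def stratified_blocked(n_by_stratum, block_size=4):
    """Run a separate blocked randomization within each stratum."""
    return {stratum: blocked_randomization(n, block_size)
            for stratum, n in n_by_stratum.items()}

print(blocked_randomization(10))
print(stratified_blocked({"predictor+": 6, "predictor-": 8}))
```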
4) Apply the Intervention
-Blinding
-to subject, investigators/staff
-as important as randomization, bc randomization gets rid of initial confounders, but there can be others later on (e.g. co-intervention- when the staff treats a subject differently bc of their treatment group, or when pts seek other Tx bc they know they are getting the placebo)
-if you can't blind it, then c/s restricting the subjects so they don't do anything that might confound...
-prevents biased assessment of outcome, so if it is unblinded, make sure the outcome is "hard" (a lab test, etc)
-may need to make a way to quickly unblind a pt's group if they get acutely ill...
-must ensure the investigators can't tell from the results they collect/side effects which group the subject is in..
-you can ask investigators/subjects to guess their group after the study to see how well it was blinded
-Choose the Intervention
-consider the balance of effectiveness & safety, feasibility of blinding, whether you want to treat one or both of the arms, whether it is generalizable to use in practice (e.g. w a fixed dose or titrate to effect)
-Phase II Clinical trials are smaller RCTs testing a range of doses/combos of Tx- here you should test a wide range to ensure the effective ones are included
-bc they are small, Ph II trials usually can't fully assess the safety of the Rx
-if you're testing a combo of Rx, then you prob won't be able to draw conclusions about a specific Rx...
-Choose the Control
-Can use placebo
-Often, you can't/don't want to withhold treatment proven effective (unethical...)
-These "co-interventions" can confound- you must account for them statistically but that might violate the intention to treat principle. So consider giving the co-intervention to everyone...
-Equivalence Trials - compare intervention to an already standard Tx, and look for advantages of the new Tx- cost, freq of adm, safety..., (here goal may be to accept the Null hypoth...), but thus failing to find a signif diff may be bc you don't have enough power and you need more pts, more outcomes, more precise measurements...
5) Ensure Follow Up & Protocol Adherence
-low f/u rates, poor protocol adherence, or many subjects not getting the intervention --> lost power/biased results
-Thus, choose an intervention that's easy to apply/take and well tolerated
-Include provisions that will enhance adherence (e.g. instructions to take it at certain time, give pill box..)
-Include provisions to measure adherence to intervention- self report, pill count, auto pill dispensers, serum levels.
-Improve adherence by explaining the requirements to subjects before getting consent, minimizing waiting at each visit so it's more convenient, calling them before each visit as a reminder, paying for travel...
-The problem w lost to f/u: if those pts still have dz (ie little Rx effect), then you can't really claim the Rx works even if the ones who did f/u had good results, bc there may be many poor outcomes among the lost-to-f/u pts. Espec if the outcome variable is uncommon, just a few + cases among the lost-to-f/u pts would skew results a lot.
-Intention to Treat: Even if subjects violate the protocol or stop taking the intervention, you must still follow them and count their outcomes in the group they were randomized to. This way, you include pts who in reality might not be compliant bc of side effects etc. If you do not continue to follow pts who stop the intervention, the rate of events in the Tx group will be biased downward...
-Design the study to make it as easy as possible for ppl to complete long term f/u (e.g. by telephone...)
-In addition to regular techniques to improve f/u, you can also:
-Ask subjects to attend a screening visit before randomization, to exclude ppl who can't complete f/u
-Run-in period- give everyone a placebo initially, and after a predetermined time randomize only those who have been compliant..., so you've already excluded noncompliant ppl.
-Could also give everyone the intervention first, then randomize only those who are compliant and belong to a subgroup w a good response/no major side effects. This yields a narrower study, but one more generalizable to the subgroup of pts an MD would feel responds best to the intervention.
-Risk: it will underestimate side effects...
6) Measure the Outcome
-pick an outcome to measure: balance clinical relevance, feasibility, and cost!
-Clinical Outcomes vs. Surrogate Outcomes
-clinical outcomes- death, MI, stroke, fear..., vs surrogate outcomes- cholest level, WBC count, etc; the latter may not be clinically relevant.
-Surrogate outcomes must be biologically plausible. Even if you show causation/assoc w the S.O., you haven't shown caus/assoc w the truly clinically relevant outcome...
-Statistical Characteristics
-must be able to assess the outcome accurately & precisely
-continuous variables will have more power than dichotomous ones
-but consider that sometimes the continuous variable isn't as clinically relevant, e.g. birth weight should probably be categorized into <1,000 g, 1,000-1,500 g, 1,500-2,500 g, etc
-If you use a dichotomous outcome, then the power depends more on the # of events, not the # of subjects.
-Number of outcome variables
-Make sure you have a single primary end point, with the other variables as secondary. Use the primary end point to determine sample size. This gets around the problems of interpreting tests of multiple hypotheses (see above)
-Adjudication of Outcomes
-self reported outcomes are not always accurate and should be confirmed if possible
-Ensure you set specific, clear criteria for the outcome, and have experts review the data to determine if it meets criteria for dx
-Ensure data collectors are blinded
-Adverse Effects
-You should assess for adverse events as well
-Initially ask broad, open-ended questions about all types of events, bc at first you won't know which adverse effects to look out for.
7) Analyze the Results
-Dichotomous outcome--> compare proportions in ea grp w chi squared test
-Continuous outcome--> t-test
-If duration of f/u is different, you must use a survival time method
-Cox proportional hazards analysis- also adjust for chance maldistrib of baseline confounders
-see Chapter 11, ref 7 for details
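-A minimal sketch of both comparisons using scipy (all counts/values made up; survival methods like Cox models need a dedicated library, e.g. the lifelines package):

```python
import numpy as np
from scipy import stats

# Dichotomous outcome: chi-squared test on a 2x2 table (made-up counts)
#            outcome+  outcome-
table = [[30, 70],    # Tx group
         [45, 55]]    # placebo group
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")

# Continuous outcome: t-test comparing group means (made-up follow-up BPs)
rng = np.random.default_rng(0)
tx_bp = rng.normal(120, 10, size=100)
placebo_bp = rng.normal(126, 10, size=100)
t, p = stats.ttest_ind(tx_bp, placebo_bp)
print(f"t = {t:.2f}, p = {p:.3f}")
```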
-Intention to Treat Analysis
-What do you do with cross-overs? -Cross-over- a subject assigned to one grp who went to the other
-ITT--> compare pts in the group they were initially assigned, regardless of whether they got Tx
==> possibly underestimate outcome, but better than having biased results...
-Per Protocol Analysis
= analyze only subjects who complied w their group, (e.g. took >80% of the Rx/placebo)
-BUT, this is problematic bc those who complied differ fr those who didn't --> not a representative sample of all pts (just the ones likely to be compliant (?diff disease class)) --> less valid
-you can do it both ways & compare, if similar you can feel confident in the results, if diff, use ITT
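-A toy sketch of doing it both ways on the same (made-up) data; the record fields are hypothetical:

```python
# Toy records: assigned group, compliance, outcome (1 = event); all made up
subjects = [
    {"assigned": "Tx", "complied": True,  "event": 0},
    {"assigned": "Tx", "complied": False, "event": 1},
    {"assigned": "Tx", "complied": True,  "event": 0},
    {"assigned": "placebo", "complied": True,  "event": 1},
    {"assigned": "placebo", "complied": False, "event": 0},
    {"assigned": "placebo", "complied": True,  "event": 1},
]

def event_rate(rows):
    return sum(r["event"] for r in rows) / len(rows)

for group in ("Tx", "placebo"):
    itt = [s for s in subjects if s["assigned"] == group]   # as randomized
    pp = [s for s in subjects if s["assigned"] == group and s["complied"]]
    print(group, "ITT:", event_rate(itt), "per-protocol:", event_rate(pp))
```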
-Subgroup Analysis
-comparisons between randomized groups in a subset of the trial cohort
-easy to misuse, can --> wrong conclusion
-if used well--> great info
-must define subgroups using baseline measures specified before the trial starts, so the randomized comparison is preserved within ea subgroup
-Problems w subgroups:
-smaller n --> limit power to see diffs
-if you are looking at many subgroups, you are bound to find a stat signif diff, by chance!
-You should pick which subgroups to analyze ahead of time, and report how many subgroups you checked.
-Monitoring Clinical Trials
-ensure subjects are safe, not denied beneficial Tx
-don't continue the trial if the research question can't possibly be answered
-stop if the intervention seems to have harm>benefit
-if there is clear benefit>>harm, it might be unethical to continue
-each time you monitor the data along the way, you add another chance of a false + result, so you must use an alpha-spending plan in which the total across all the repeat tests sums to 0.05 (!) (sketch below)
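-The crudest version is a Bonferroni split of alpha across the planned looks; real trials usually use group-sequential boundaries (e.g. O'Brien-Fleming or Pocock), which spend less alpha at early looks:

```python
# Crudest approach (Bonferroni): split the overall alpha evenly across looks.
# Group-sequential boundaries do better by saving alpha for the final analysis.
n_looks, overall_alpha = 5, 0.05
per_look_alpha = overall_alpha / n_looks
print(per_look_alpha)  # 0.01 at each interim look keeps the total at <= 0.05
```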
-Alternatives to Randomized Blinded Trials
-Factorial Design
-Answer 2 unrelated questions in 1 study (e.g. ASA on MI, b-carotene on cancer)
-randomize to 4 groups (ea combo of the 2 interventions); analyze ea question by comparing the relevant halves of the cohort...
-xx (disadvantage)- possible interaction bn the 2 interventions' effects on the outcomes
-Randomization of Matched Pairs
-to balance baseline confounding variables
-Group or Cluster Randomization
-randomly assign the cluster/group to a study group not the individual (e.g. 1 bball team gets x, another gets y)
-Nonrandomized Between-Group Design
-much less valid; you must acct for baseline diffs but can't do so well...,
-note, assigning groups "every other" pt is not random- it allows the investigator to tamper...
-Within-Group Design
-no randomization
-Time-Series Design- each subject is his own control
-xx- no concurrent control group, so the appearance of efficacy may be bc of:
-learning effects- the subject does better bc he learned fr the baseline test
-regression to the mean- subjects selected for an initially high BP now have Nl BP just bc of normal BP variation...
-secular trends- e.g. less URI at f/u bc it is no longer winter
--> can deal w this by repeated start/stop of intervention, but only good if the effect is rapid and reversible
-Cross-Over Design- subjects switch groups in middle of study
-minimizes confounders bc each pt is his own ctrl, and increases the power of the trial bc fewer subjects are needed
-but you double the duration of the study, add complexity to analysis
-Carryover Effects- residual influences of the initial intervention even after stop Tx
--> use a washout period- no placebo or intervention...
Equipoise- the uncertainty as to whether the intervention will be of benefit or harm, which forms the ethical basis for a clinical trial. If lost, then you can't ethically do the trial.
Chapter 12 - Designing Studies of Medical Tests
-usually use descriptive studies, rather than clinical trials
-Goal: can the test be used in clinical practice?
-You don't just test for statistical signif (ie is the test better than chance alone?) bc that alone isn't enough.
-Instead, use descriptive statistics (e.g. CI, Sn, Sp, etc)
Is it Useful?
-Reproducible + accurate + feasible + affects clinical decisions/outcomes
-Issues in studying Dx Tests:
-Spectrum of Disease Severity
-Spectrum Bias- the spectrum of dz in the sample doesn't rep that of the population. If the pts in the sample have more severe dz than the gen pop you might overestimate the Sn of the test, and if the control pts are healthier than the gen pop you might overestimate the Sp; and vice versa.
--> must ensure subjects have similar dz severity spectrum as gen pop
-Sources of Variation, Generalizability, & the Sampling Scheme
-Some tests d/o the performer for results (e.g. WBC count doesn't matter, a psych test does, mammogram film interpretation does) --> you should sample fr diff interpreters/institutions if there is variability among individual persons/institutions... --> improves generalizability
-Blinding
-If the test requires a judgement (CXR interp)- the person should be blinded to the history/physical etc. And the person doing the gold standard should not know the results of the dx test being evaluated
Is it Reproducible?
-Intraobserver variability- lack of reproducibility bn observer and himself upon repeating the test
(e.g. he sees same CXR now and later today, does he report same result?)
-Interobserver variability- lack of reproducibility bn >1 observer
-Don't need a gold standard to assess reproducibility.
-Note that just bc observers agree, doesn't mean they are correct!
--> Intra-/interobserver variability addresses precision, not accuracy!
-Design
-Cross-sectional design- compare results fr diff observers, or on a sample of pts/specimens at diff times
-ensure you describe exactly how a specimen was obtained, bc this can differ (e.g. a biopsy)
-it might be necessary to focus on one step in the process of doing the test, to see if there's variation...
-Analysis
-Categorical Variables
-Concordance rates- the % of the time observers agree exactly; but if there are >2 categories, or the observations aren't evenly distributed among the categories (e.g. a lot more abNl than Nl results), this is hard to interpret bc it doesn't acct for the agreement expected by chance alone. Thus... K
-Kappa (K)- measures the extent of agreement beyond what chance would produce. Ranges fr -1 to 1, with 0 = exactly the agreement expected by chance and 1 = perfect agreement.
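-A minimal sketch of computing K for 2 observers rating the same specimens (the ratings are made up):

```python
def cohens_kappa(ratings_a, ratings_b, categories):
    """Agreement between 2 observers beyond what chance alone would give."""
    n = len(ratings_a)
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # chance agreement, from each observer's marginal rate for each category
    expected = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

obs1 = ["abNl", "Nl", "abNl", "abNl", "Nl", "Nl"]
obs2 = ["abNl", "Nl", "Nl",   "abNl", "Nl", "abNl"]
print(cohens_kappa(obs1, obs2, ["Nl", "abNl"]))  # ~0.33: fair agreement beyond chance
```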
-Continuous Variables
-If you are comparing measurements fr 2 diff machines (that measure the same thing), look at the differences bn the paired measurements and their distribution.
-You can focus on what's clinically important by saying that x amt of difference bn the 2 items (e.g. 1inch) is clinically important, and then discussing how often they differed by that amt or more
-If you are looking at a large group of diff technicians/labs/machines, then use the Coefficient of Variation- the standard deviation of the results on a single specimen, divided by the mean, expressed as a %. If results are normally distributed (ie bell shaped curve), then about 95% of results on diff machines will be within 2 SDs of the mean.
-e.g. if the coeff of var was 3%, then the SD of a set of measurements with a mean of 100 would be 3 (because 3/100 = 0.03), and so 95% of results will fall within 94 and 106 (mean +/- 2 SDs)
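-A minimal sketch of the same calculation (measurement values made up):

```python
import statistics

def coefficient_of_variation(results):
    """SD of repeated results on a single specimen, as a % of the mean."""
    return 100 * statistics.stdev(results) / statistics.mean(results)

# e.g. the same specimen run on 5 different machines (made-up values)
print(f"CV = {coefficient_of_variation([98.5, 101.2, 99.8, 102.0, 97.9]):.1f}%")
```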
Is it Accurate?
-does the test give the right answer?
-use a gold standard, or a minimum number of signs/sx of a dz
-Design
-Sampling
-Diagnostic tests- cross-sectional or case-control design
-case-control designs are easily biased bc you already know the dx status, so only use them if the dz is rare
-Tandem testing- a variant of the cross-sectional design that compares 2 tests w ea other
-do both tests in question on all pts, then do the gold standard on those that are + on one test but not the other --> lets you decide which test is more accurate w/o doing the gold standard on all
-Prognostic tests- cohort design
-prospective or retrospective cohort
-Prospective- do test, then follow over time to see outcome
-Retrospective- use banked specimens, use new test and then compare to outcomes
-hard to find the right mix of pts- not too obviously + among the positives, not too healthy among the negatives...
-Predictor variables- better to use an ordinal or continuous test result rather than just +/-, bc most tests have a borderline range and results are more predictive when they are very Nl/abNl...
-Outcome Variables
-usually is presence/absence of disease; best to blind evaluator to the gold standard result...
-can't always do gold standard on pts (bc very invasive), so do it on those w +test, then find another way to see if the pts who were negative had any false negatives...
-for prognostic tests, the outcome isn't whether they have the dz- it is mortality, xx, etc; blinding is important...
-Analysis
-Sn & Sp- for dichotomous outcomes compared to gold standard
-ROC Curves (receiver operating characteristic curves)- for ordinal or continuous results- several values of Sn and Sp, depending on the cutoff pt chosen to define a positive test. The curve shows the trade-off bn Sn and Sp
-Pick several cutoff pts and determine Sn and Sp at ea pt
-Graph Sn (= true + rate) on the Y axis, and 1-Sp (= false + rate) on the X axis.
-Ideal test--> reach upper left corner (100% Sn and 100% Sp).
-Worthless test--> follows the 45-degree diagonal (no better than chance); the lower right corner (0% Sn and 0% Sp) = a test that is perfectly wrong
-Area Under ROC Curve- ranges fr 0.5 (useless test) to 1 (perfect test); summarizes overall accuracy, thus good for comparing the accuracy of several tests
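-A minimal sketch of generating ROC points by sweeping the cutoff (test results and dz status made up; assumes higher values are more abNl):

```python
import numpy as np

def sn_sp(values, diseased, cutoff):
    """Sn and Sp when result >= cutoff is called a positive test."""
    values = np.asarray(values)
    diseased = np.asarray(diseased, dtype=bool)
    pos = values >= cutoff
    sn = (pos & diseased).sum() / diseased.sum()        # true-positive rate
    sp = (~pos & ~diseased).sum() / (~diseased).sum()   # true-negative rate
    return sn, sp

# made-up test results with gold-standard disease status (1 = diseased)
values  = [1, 2, 3, 4, 5, 6, 7, 8]
disease = [0, 0, 0, 1, 0, 1, 1, 1]

# each cutoff gives one (1-Sp, Sn) point on the ROC curve
for cutoff in (2, 4, 6, 8):
    sn, sp = sn_sp(values, disease, cutoff)
    print(f"cutoff >= {cutoff}: Sn = {sn:.2f}, 1-Sp = {1 - sp:.2f}")
```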
-Likelihood Ratios- better than Sn, Sp, or ROC to describe outcomes w continuous/ordinal result
--> lets you take advantage of all the info in the test.
= the likelihood of the result if someone has the dz vs. the likelihood of the result if someone doesn't have it
-LR = P(result|Dz) / P(result|no Dz)
-It answers the question: how many times more likely is this result in a pt who actually has the dz than in one who doesn't?
-Higher LR--> better the test result to rule In the Dx (100 very high)
-Lower LR--> better the test result to rule Out the Dx (0 very low)
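-For a dichotomous test, the LRs follow directly fr Sn and Sp; a minimal sketch (the Sn/Sp values are made up):

```python
def likelihood_ratios(sn, sp):
    """LR+ and LR- for a dichotomous test, straight from Sn and Sp."""
    lr_pos = sn / (1 - sp)      # P(+ result | dz) / P(+ result | no dz)
    lr_neg = (1 - sn) / sp      # P(- result | dz) / P(- result | no dz)
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(sn=0.90, sp=0.95)
print(f"LR+ = {lr_pos:.0f}, LR- = {lr_neg:.2f}")  # LR+ = 18, LR- = 0.11
```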
-Relative Risks and Risk Differences
-for studies assessing Px tests or R/F's for dz
-Do the test, then follow over time for actual outcome, then can calculate relative risk & risk diff
-If you vary in how long you f/u each patient, then best to do a survival analysis technique...
-If you only f/u for a short pd of time (e.g. survival to discharge), then use Sn and Sp, but these aren't really good for Px tests bc they describe the test's ability to predict prevalence, not incidence.
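-A minimal sketch of RR and risk difference fr a 2x2 cohort table (counts made up):

```python
# 2x2 cohort table (made-up counts)
#                 outcome+  outcome-
a, b = 30, 70   # test +
c, d = 10, 90   # test -

risk_pos = a / (a + b)   # risk of the outcome if test + : 0.30
risk_neg = c / (c + d)   # risk of the outcome if test - : 0.10
print(f"RR = {risk_pos / risk_neg:.1f}")   # relative risk: 3.0
print(f"RD = {risk_pos - risk_neg:.2f}")   # risk difference: 0.20
```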
Studies on Effect of Test Result on Clinical Decisions
-Diagnostic Yield Studies
-When the MD orders it, how often is the test abNl?
-Can a test result be predicted fr other info avail at the same time?
-What happens to pts w abNl results? Do they benefit fr the results?
--> estimate the %+ tests among pts w a particular indication to get the test
-often ok to assume: a pt who got the test is more likely to have a + result than a pt for whom it wasn't ordered, and pts w a neg test don't benefit fr the test ==> if the observed yield of + results is low, one can conclude that the test is unlikely to be useful
-Before/After Studies of Clinical Decision Making
-compare what physicians do/say they will do before the test is done to what they do after the test...
Analysis
-% positive tests, % that lead to a change, or % that lead to an improved outcome are calculated w 95% CIs (sketch below)
-Estimate effort/yield ratio
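-A minimal sketch of a 95% CI for the % of + tests (counts made up; the simple Wald interval is fine for large n- use Wilson/exact methods for small samples):

```python
import math

# Wald 95% CI for the proportion of positive tests
positives, n = 40, 200           # made-up: 40 of 200 ordered tests were abNl
p = positives / n
se = math.sqrt(p * (1 - p) / n)
print(f"{p:.0%} positive (95% CI {p - 1.96 * se:.0%} to {p + 1.96 * se:.0%})")
```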
Studies of Feasibility, Costs, Risks of Test
-descriptive...
...
Studies of the Effect of Testing on Outcomes
-e.g. does prostate screening affect outcome...
...
Pitfalls of Analysis of Dx Tests
-Small sample size
-Inappropriate exclusion
-Institution specific results
-Dropping borderline/uninterpretable results
...
Chapter 13 - Research Using Existing Data: Secondary Data Analysis, Ancillary Studies, & Systematic Reviews
-Secondary Data Analysis- use existing data to investigate a question other than the Q the data was collected for
-Individual Data Sets
-separate info avail for each pt, collected fr other studies, med records, healthcare bills, etc
-Previous research study data vs national/regional data sets
-Aggregate Data Sets
-info avail for groups of subjects (not individuals) e.g. death rate fr a cancer in ea. state
-studies using this data are called ecologic studies
-ecologic fallacy- assuming a group-level association applies to individuals; such associations are very susceptible to confounding, bc groups differ in many ways
-can start w a Q and find the data, or start w the data and ask a Q...
......
-Ancillary studies- add measurements of a small number of variables to study, often in a subset of the subjects, to answer a separate Q
-....
-Systematic reviews- combine the results of prior studies
-Meta-analysis- the statistical combination of the data fr the studies in a systematic review
-clear question
-comprehensive/unbiased lit review to ID completed studies
-define inclusion/exclusion criteria (of studies)
-uniform/unbiased abstraction of characteristics and findings of ea study
-clear/uniform presentation of data from each study
-calculate the summary estimate of effect and CI based on the findings of all eligible studies (see the sketch after this list)
-assess the heterogeneity of the findings of the individual studies
-assess the potential publication bias
-subgroup and Sn analyses
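-A minimal sketch of the fixed-effect (inverse-variance) summary estimate (the study estimates/SEs are made up; w significant heterogeneity a random-effects model, e.g. DerSimonian-Laird, would be used instead):

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance-weighted summary estimate with a 95% CI.
    Use on a scale where estimates are ~normal, e.g. log relative risks."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

# made-up log relative risks and SEs from 3 eligible studies
pooled, (lo, hi) = fixed_effect_pool([-0.22, -0.41, -0.10], [0.12, 0.20, 0.15])
print(f"summary RR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```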
...