Evidence Based Medicine
Definition:
Evidence Based Medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.
It relies on statistics and other quantitative methods.
It uses mathematical estimates of risks versus benefits, based on high-quality research on population samples, and applies these in making clinical decisions in the investigation, diagnosis, and management of individual patients.
In short, it takes information derived from research on populations and uses it to make decisions about individuals.
Research: It is the focused, systematic inquiry aimed at generating new knowledge.
Reasons why scientific papers get rejected for publication (by referees):
Study did not address an important scientific issue.
Study was not original; someone else had already done the same or a similar study.
Study did not actually test the author's hypothesis.
Study design should have been different.
Study ran into practical difficulties (e.g., recruiting participants), and the authors took shortcuts that compromised the original study protocol.
Study had a small sample size.
Study was not controlled adequately.
Study used incorrect or inappropriate statistical analysis.
Study data led the authors to draw unjustified conclusions.
Study had significant conflict of interest (financial benefit to author or study sponsors), insufficient safeguards against bias.
Study was written in an incomprehensible manner.
Three preliminary questions to ask yourself when looking at a study:
What was the research question and why was the study needed?
Background information regarding the research
Brief review of the published literature on the research question
Hypothetico-deductive approach (setting up a falsifiable hypothesis, which one then proceeds to test). The main research hypothesis is presented in the negative as a null hypothesis, and the authors then try to disprove it. After stating the null hypothesis, they have to demonstrate a difference between the two arms of their study: "let's assume there's no difference; now let's try to disprove that theory." The sketch below illustrates this logic.
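To make this concrete, here is a minimal sketch in Python (all numbers invented, not from any study discussed here) of the null-hypothesis logic: assume no difference between the two arms, then see whether the data let us reject that assumption.

```python
# Minimal sketch of null-hypothesis testing on two hypothetical study arms.
# All numbers are invented for illustration.
from scipy import stats

control = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]  # outcomes, control arm
treated = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0, 5.5, 6.2]  # outcomes, treatment arm

# Null hypothesis: the two arms have the same mean outcome.
t_stat, p_value = stats.ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: the arms appear to differ.")
else:
    print("Fail to reject the null hypothesis: no difference demonstrated.")
```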
Methodology
What was the research design?
Is it a primary or a secondary study?
(see research design, terms, and other descriptors)
Was the research design appropriate to the questions?
Was the study an RCT (randomized controlled trial) or not?
If RCT, why?
If not RCT, why not?
What broad field of research does the study cover?
Was the study appropriate to this question?
5 general questions to ask in evaluating reports:
Who says so?
Has the researcher received a grant from a pharmaceutical company with a financial interest in the result? Medical conferences and papers commonly require a speaker/writer to disclose any pharmaceutical company affiliations, to avoid questions of conflict of interest.
Does the researcher have a financial interest in selling a product related to the research results?
Is the report from a competing lab, which would get better press by disproving a known positive result rather than duplicating it?
Are results published in a reputable, peer-reviewed journal?
Are previous results correctly quoted? Are they anecdotal? Were they published in reputable peer-reviewed journals?
Would publishing a statistically nonsignificant result adversely affect the researcher's reputation or career?
Has the study been confirmed by other labs?
How does the researcher know?
Is there bias in the sampling process? Bias in measurement? Bias in the analysis and interpretation of data?
Is the sample size adequate (often apparent on a quick look at the data)? Even when all samples are randomly chosen, the natural variability of the population may make two small samples markedly different from one another, with different therapeutic results.
Were survey questions biased? Did people answer honestly? Have only certain people responded to the survey?
Is the study double-blind?
Does the study combine data from other experiments that may differ significantly in design from the present one?
Is the effect big enough to be clinically, not just statistically, significant?
Does the study involve only men (or only women)? Then it cannot claim to apply to women (or men) or to children as well.
Does the paper contain the words "most studies show…"? This may not be meaningful, since journals favor publishing positive results over negative results.
Does the paper use the vague and unclear phrase "where appropriate"?
Was the correct statistical test used? Was an unpaired t-test used where a paired t-test was indicated? Were equal sample sizes used for the paired t-test? Were multiple t-tests used where an ANOVA would be more appropriate? Was a chi-square test used when cell counts were less than 5, in which case chi-square is no longer a valid test? Reviewers may not be biostatisticians and may have overlooked an incorrectly applied test. (A code sketch after this list illustrates the paired/unpaired and small-cell pitfalls.)
Has the investigator confused correlation with cause? Heart attacks are correlated with increased blood levels of troponin. But troponin does not cause heart attacks; heart attacks cause elevated troponin, which leaks out of damaged cardiac cells. Good handwriting and shoe size are correlated, but one does not cause the other; they are both products of growing up and maturing. If there is a correlation between A and B, it does not necessarily mean that A causes B. B could cause A, or a third factor C could cause both A and B.
In the Hawthorne effect, patients may strive to perform better when they know they are being observed; the results may then not be valid.
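To illustrate the test-selection pitfalls listed above, here is a minimal sketch in Python with scipy (all measurements invented): a paired versus an unpaired t-test on the same before/after data, and a small-cell 2 x 2 table where Fisher's exact test is preferred over chi-square.

```python
# Sketch: choosing the right statistical test. All data are invented.
from scipy import stats

# Before/after measurements on the SAME patients: a paired design.
before = [140, 152, 138, 147, 160, 155, 149, 151]
after  = [135, 148, 136, 140, 151, 150, 146, 147]

# Correct here: a paired t-test exploits the within-patient pairing.
t_paired, p_paired = stats.ttest_rel(before, after)

# Wrong here: an unpaired test ignores the pairing and loses power.
t_unpaired, p_unpaired = stats.ttest_ind(before, after)

print(f"paired:   p = {p_paired:.4f}")
print(f"unpaired: p = {p_unpaired:.4f}")  # typically larger on paired data

# Chi-square caveat: with cell counts below 5, chi-square is unreliable;
# Fisher's exact test is the usual alternative for a small 2x2 table.
table = [[2, 9], [8, 3]]
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"Fisher's exact: p = {p_fisher:.4f}")
```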
What's missing?
Does the paper indicate the sample size?
Does the paper describe the method of randomization?
By "randomization" is the researcher referring to random selection from the population, or is the researcher referring to random division in two groups after the random selection from the population is done?
Were the observations normally distributed (Gaussian)? Independent of one another?
Were there adequate controls?
Does the paper define what is meant by "average" (mean, median, or mode)?
Does the paper report the p-value? A confidence interval?
Does the paper state whether a one-tailed or a two-tailed t-test was used?
Did the researchers switch from a two-tailed test to a one-tailed test after the experiment was completed without telling anyone, so that the results would appear significant?
Is the power stated?
Is an effect size given?
Did the researcher keep searching, after the project was completed, for a statistical test that would provide his/her preconceived desired p < 0.05?
How many hypotheses did the researchers look at? Is the study "p-ing all over the place"?
Were the hypotheses generated before the study was completed (the correct way) or after (the wrong way)?
Does the paper mention all the statistical tests that were done on the data?
Did the researcher add more samples after finding that the results were not significant? Sample size needs to be established prior to the study.
Does the paper provide raw figures, not just percentages? If 3% of the population got the disease one year and 6% got it the next year, is this a 3-percentage-point rise or a 100% rise? If 20% of adults smoked in 1980 and 10% smoked in 2000, is this a 10-percentage-point decrease or a 50% decrease? Just saying "a 50% decrease" is uninterpretable. If a nasal breathing strip is "60% stronger," stronger than what? The previous version of the strip? A competitor's strip? A Band-Aid? Stronger in adhesion or stronger in spring? Is "60% stronger" compared with something that is already strong, or with something that is incredibly weak, thereby making the claim less impressive?
Has the researcher dismissed negative data, such as trial runs that did not work, or outliers? You may not know.
Has survival rate increased because of the treatment or because of early detection?
Pooled data can overlook important information:
"The average person has one ovary and one testicle."
"There has been little progress in finding a cure for cancer." While there may appear to be little progress when lumping all cancers on a whole, there has been considerable progress for certain cancers.
Simpson's Paradox (Yule-Simpson): data combined from several groups may produce a paradoxical result. Example: two studies show drug B is better than drug A, yet when the results are combined with a third study, drug A now appears to be better than drug B! (A numeric sketch follows at the end of this list.)
Has the paper reported withdrawals from the study? Example: Dr. A's surgical operations have a much higher mortality than those of Dr. B. Should we conclude that Dr. B is superior? Not necessarily. Dr. A may be a far better, more reputable surgeon, which is why the most difficult cases, those with a poor prognosis to begin with, are referred to her. Also, a bad review by one disgruntled patient may not reflect the views of the majority, who have not volunteered an opinion.
A report some years ago stated that over 90% of eyeglass prescriptions were incorrect. Incorrect by how much? A clinically nonsignificant amount? Many ophthalmologists purposefully undercorrect high astigmatism, since perfect correction can sometimes be uncomfortable for the patient.
Are graphics missing baselines? Are graph axes distorted?
Are there possible causes of the patients' improvement other than the one favored by the author? A decrease in pain could also be due to a placebo effect; laying on of hands; the patient's desire to please the therapist with a favorable report; an alternative treatment the patient was taking at the same time; or spontaneous remission (most diseases get better by themselves).
Has a therapist hyped successful cases but said little about bad results (frequent criticism in parapsychology)?
Does the study present only relative risk (or relative risk reduction), rather than the more meaningful absolute risk (or absolute risk reduction)?
Does the study state the important statistics of number needed to treat (NNT) and number needed to harm (NNH)? (Both are computed in the sketch below.)
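The risk measures above are simple arithmetic. Here is a minimal sketch in Python (event rates invented, chosen to mirror the cholesterol-drug example later in these notes) of absolute risk reduction, relative risk reduction, NNT, and NNH:

```python
# Sketch: absolute vs relative risk, NNT, NNH. All rates are invented.
control_event_rate = 0.06  # 6% of untreated patients have the event
treated_event_rate = 0.04  # 4% of treated patients have the event
excess_harm_rate   = 0.01  # 1% excess adverse events on treatment

arr = control_event_rate - treated_event_rate  # absolute risk reduction
rrr = arr / control_event_rate                 # relative risk reduction
nnt = 1 / arr                                  # number needed to treat
nnh = 1 / excess_harm_rate                     # number needed to harm

print(f"ARR = {arr:.1%}, RRR = {rrr:.1%}, NNT = {nnt:.0f}, NNH = {nnh:.0f}")
# A "33% relative reduction" sounds impressive, but the absolute reduction
# is only 2 percentage points: 50 patients treated to prevent one event.
```

And a minimal numeric illustration of Simpson's Paradox (counts invented): drug B wins within every stratum, yet pooling the strata makes drug A look better.

```python
# Sketch: Simpson's paradox with invented (successes, patients) counts.
strata = {
    "mild":   {"A": (81, 90), "B": (10, 10)},
    "severe": {"A": (3, 10),  "B": (36, 90)},
}

totals = {"A": [0, 0], "B": [0, 0]}
for stratum, drugs in strata.items():
    for drug, (ok, n) in drugs.items():
        totals[drug][0] += ok
        totals[drug][1] += n
        print(f"{stratum:>6} {drug}: {ok}/{n} = {ok / n:.0%}")

for drug, (ok, n) in totals.items():
    print(f"pooled {drug}: {ok}/{n} = {ok / n:.0%}")
# Drug B wins in BOTH strata (100% vs 90%, 40% vs 30%), yet drug A "wins"
# after pooling (84% vs 46%) because the strata sizes are unbalanced.
```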
Did someone change the subject?
If the investigator's subject is an increase in the number of cases, does the investigator instead provide data on an increase in reported cases or in diagnosed cases, which are not necessarily the same?
If the subject is improvement in survival with an experimental drug, has the investigator instead provided data for improvement in a surrogate variable (e.g., reduction in cardiac arrhythmia rather than in sudden death)?
Has statistical significance been confused with clinical significance?
Has statistical abnormality been confused with clinical abnormality?
If results are statistically nonsignificant, is this confused with "no effect"? A drug may have a significant effect, but it is just not seen because sample size is too small.
Has a best-fit line been extended beyond the data in a regression analysis? If drinking 750 mL of alcohol each day damages the entire brain, it does not mean that drinking 30 mL/day damages 4% (30/750) of the brain. If doubling the dosage results in twice the improvement, it does not mean that quadrupling the dosage results in quadruple the improvement. As mother used to say, "too much of anything is no good." (See the sketch below.)
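A minimal sketch in Python (dose-response numbers invented) of why a best-fit line should not be extended beyond the observed data:

```python
# Sketch: extrapolating a best-fit line beyond the data. Data are invented.
import numpy as np

dose     = np.array([10, 20, 30, 40, 50])  # observed dose range
response = np.array([12, 22, 31, 38, 44])  # roughly linear in this range

slope, intercept = np.polyfit(dose, response, 1)  # straight-line fit

inside  = slope * 35 + intercept   # interpolation, within observed range
outside = slope * 750 + intercept  # extrapolation, far beyond the data

print(f"predicted response at dose  35: {inside:.0f} (defensible)")
print(f"predicted response at dose 750: {outside:.0f} (unsupported)")
# Nothing in the data says the relationship stays linear at dose 750;
# biological responses often plateau or reverse outside the studied range.
```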
Does it make sense?
Drug A is better than drug B with p = 0.0529673. Really? Can one be that precise about a p-value, or about any other biology-related percentage?
A report states that there are 20 million people in the United States with prostate cancer; that would be about 1 case for every male in the 65-and-older age group.
A report indicates that a patient with multiple sclerosis improved temporarily on a new drug. Shall we prescribe the drug, when we know that multiple sclerosis is a disease marked by remissions and exacerbations?
Is the follow-up long enough? Examples:
In the 1890s a report was published of an eye transplant between a rabbit and a human. The published report indicated that 17 days after the surgery the patient "was doing well," to the acclaim of the press. Is this enough time to evaluate such a procedure? Too bad the investigator did not wait another few weeks. (Four other contemporary surgeons jumped on the bandwagon and tried the same procedure.) Enough time has to be given to evaluate a treatment. Many initially approved treatments are discontinued after adverse effects are noted over time.
Surgical and medical approaches to coronary insufficiency are compared, with a 1-year survival follow-up. The medical approach does better. Should we recommend the medical approach? Not necessarily; long-term follow-up may favor the surgical approach, which initially may have adverse perioperative results.
A cholesterol-lowering drug shows a one-third reduction in heart attacks compared with no treatment over a 5-year period, with an NNT (number needed to treat) of 50 (50 people would have to be treated to prevent a heart attack in 1 person). Is 5 years a long enough time to assess the NNT, or, for that matter, the NNH (number needed to harm)?
Infant mortality is lower in homes where parents use iPads. Should parents go out and buy iPads? Homes with iPads are likely to be relatively prosperous and able to afford good medical care.
A probability cannot be less than 0 or greater than 1.
Format for papers written for medical journals
(IMRAD)
I: Introduction
M: Methods
R: Results
A: and
D: Discussion
Users' Guides to the Medical Literature
Research design, terms and other descriptors
Primary study: reporting research first-hand, a.k.a. empirical studies
4 categories of primary study:
Laboratory experiments: an experiment is performed on subjects in artificial and controlled surroundings.
Clinical trials: an experiment in which an intervention (a drug, an education program) is offered to a group of patients who are then followed up to see what happens to them.
Surveys: something is measured in a group of patients or some other sample of individuals, usually in the form of a questionnaire, to measure people's opinions, attitudes, and self-reported behaviors.
Organisational case studies: the researcher tells a story that tries to capture the complexity of a change effort (e.g., an attempt to implement evidence).
Secondary study: summaries and conclusions drawn from primary studies. Categories:
Overviews, which may be:
Non-systematic reviews, which simply summarize primary studies
Systematic reviews, which summarize primary studies in a rigorous, transparent, and auditable (checkable) fashion
Meta-analyses, which integrate the numerical data from more than one study (a pooling sketch follows this list)
Guidelines: draw conclusions from primary studies about how clinicians should behave
Decision analyses: use the results of primary studies to generate probability trees to be used by both health professionals and patients in making choices about clinical management
Economic analyses: use the results of primary studies to say whether a particular course of action is a good use of resources
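To show what "integrating the numerical data" can mean in practice, here is a minimal fixed-effect (inverse-variance) pooling sketch in Python; the three effect estimates and standard errors are invented:

```python
# Sketch: fixed-effect meta-analysis by inverse-variance weighting.
# Per-study effect estimates (e.g., mean differences) and SEs are invented.
import math

effects = [0.30, 0.45, 0.25]  # effect estimate from each study
ses     = [0.10, 0.15, 0.08]  # standard error from each study

weights   = [1 / se**2 for se in ses]  # inverse-variance weights
pooled    = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval for the pooled effect
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```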
Terms used to describe study design features of clinical research studies
Parallel group comparison is when two groups, entered into the study at the same time, are given different treatments. Results are analysed by comparing the groups.
Paired (matched) comparison is when participants receiving different treatments are matched to balance potential confounding variables such as age and sex. Results are analysed in terms of differences between participant pairs.
Within-participant comparison is when participants are assessed before and after an intervention, and results are analysed in terms of within-participant changes.
Single blind is when participants do not know which treatment they are receiving.
Double blind is when neither participants nor investigators know which treatment each participant is receiving.
Crossover is when each participant receives both the intervention and the control treatment (in random order), often separated by a washout period of no treatment.
Placebo controlled is when control participants receive a placebo (inactive pill), which should look and taste the same as the active pill. Placebo (sham) operations may also be used in trials of surgery.
Factorial design is a study that permits investigation of the effects (both separately and combined) of more than one independent variable on a given outcome (e.g., a 2 x 2 factorial design tested the effects of placebo, aspirin alone, streptokinase alone, and aspirin + streptokinase in acute MI). A sketch of how such a design can be analysed follows.
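Here is a minimal sketch of analyzing a 2 x 2 factorial trial with a two-way ANOVA (Python with pandas/statsmodels; the outcome data are simulated and the effect sizes are invented, not taken from any real trial):

```python
# Sketch: analyzing a 2x2 factorial design (two treatments, alone/combined).
# Simulated data; effect sizes are invented for illustration.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
rows = []
for aspirin in (0, 1):
    for strepto in (0, 1):
        # Invented model: each drug helps, plus a small combined effect.
        mean = 10 + 2.0 * aspirin + 3.0 * strepto + 1.0 * aspirin * strepto
        for outcome in rng.normal(loc=mean, scale=2.0, size=30):
            rows.append(
                {"aspirin": aspirin, "strepto": strepto, "outcome": outcome}
            )

df = pd.DataFrame(rows)
# Two-way ANOVA: main effect of each drug plus their interaction.
model = ols("outcome ~ aspirin * strepto", data=df).fit()
print(anova_lm(model))
```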
Broad fields of research.
Therapy: Is the study testing the efficacy of drug treatments, surgical procedures, other interventions, or methods of service delivery? Preferred study design is RCT.
Diagnosis: Is the study trying to show that a new diagnostic test is valid (can we trust it?) and reliable (would we get the same result every time?)? Preferred study design is cross-sectional survey.
Screening: Is the study trying to demonstrate the value of a screening test? Preferred study design is cross-sectional survey.
Prognosis: Is the study trying to determine what is likely to happen to someone whose disease is picked up at an early stage? Preferred study design is longitudinal survey.
Causation: Is the study trying to determine whether a putative harmful agent, such as an environmental pollutant, is related to the development of illness? Preferred study design is cohort or case-control study, depending on how rare the disease is; case reports may also provide crucial information.
Psychometric studies: measuring attitudes, beliefs, or preferences, often about the nature of illness or its treatment.
PICO-ST format for therapeutic, diagnostic, and prognostic information
P: Patient or population
I: Intervention (index)
C: Comparator intervention (gold standard)
O: Outcome
S: Setting
T: Time frame
Confirmation bias is when a person selectively seeks out information that supports a belief or idea they already hold, thus "confirming" their existing beliefs, while information supporting the contrary is not taken into consideration, is dismissed, or is selectively ignored. These beliefs are largely derived from stereotypes and overgeneralizations combined with faulty deductive logic, most commonly about particular demographic groups.
Anchoring Bias. When people are trying to make a decision, they often use an anchor or focal point as a reference or starting point. Psychologists have found that people have a tendency to rely too heavily on the very first piece of information they learn, which can have a serious impact on the decision they end up making. In psychology, this type of cognitive bias is known as the anchoring bias or anchoring effect.
Clinical Trial Basics
Research starts in the laboratory where scientists investigate the cellular processes of a disease, hoping to better understand it and possibly find targets for treatment. This type of research led to the identification of some of the antibodies that play a role in MG. During this phase, scientists can observe the effects of potential therapies on cells in human tissue samples and then in animals.
If a treatment—which may include a drug, device, or surgical procedure—has promise, the next step is to test it in a clinical trial, which requires approval from the U.S. Food and Drug Administration (FDA), and oversight from the Institutional Review Board, a group that ensures that the rights, welfare, and privacy of clinical trial participants are protected.
Clinical trials happen in three phases. Phase 1 evaluates safety and dosage and involves a small group of about 20 to 80 people. Phase 2 includes more people, typically a few hundred, to determine whether the treatment is effective as well as safe. Most phase 3 trials are randomized, placebo-controlled, and involve a few hundred to thousands of participants. For rare diseases like myasthenia gravis, fewer people are included in phase 2 and 3 clinical trials, which are designed to compare results in people who are given the treatment with those who are given a placebo. In these trials, neither the researchers nor the participants know who is getting the treatment or the placebo.
“We don't know which treatments will fail or succeed until we actually do clinical research,” says Henry Kaminski, MD, FAAN, endowed professor of neurology at George Washington University School of Medicine & Health Sciences in Washington, D.C. “Then we can say with confidence, ‘We understand that this is going to help you, and these are the potential complications.’”
Patient registries
When medications for rare diseases come up for approval, there is often only limited evidence available on their long-term effects and safety, and conducting randomized investigations to deliver such evidence is often impossible. Therefore, the only way to generate additional evidence is to collect and analyze real-world data via high-quality, well-monitored patient registries that attempt to avoid bias, so that they provide meaningful results.