Scientific Studies in Medicine

Clinical and Basic Science

"In theory, there is no difference between theory and practice. But, in practice, there is." Yogi Berra

So far, we have learned that the best explanatory models for nature are scientific theories derived from the scientific method. In medical science, theories are built from ideas inferred from existing knowledge, basic science and new observations. Scientific knowledge is tested through clinical studies.

Clinical research in the medical sciences often differs from the tightly controlled experiments in the basic sciences. The ideas that spawn theories in medical science may start in the laboratories of the basic sciences, but they rarely end there (at least, they shouldn't end there). Human trials are usually carried out in human settings such as medical clinics, hospitals and everyday life. One should be cautious about extrapolating information directly from test tube to the exam room.

In practice, we need clinical research.

The Need for Different Types of Studies

In the Cognitive Bias section, we learned that medical science can make little progress unless attempts are made to minimize biases. Thus, clinical research should focus on controlling for as many confounding variables as possible except for the variable in question. Confounding variables include patient demographics such as age, sex, socioeconomic status, medical comorbidities, lifestyle, culture and others. Confounding variables also include pre-existing beliefs, desires and expectations of both patient and doctor.

An entire population can rarely be studied. Good clinical trials may use randomly selected samples of a given population. Responsible researchers must consider the quality of the randomization process as well as the sample size with respect to the problem being studied.
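To make the point about sample size concrete, here is a minimal sketch of a common textbook approximation for comparing two proportions. The function name, the default significance level (0.05) and the default power (0.80) are assumptions chosen for illustration; a real trial would have its sample size worked out with a statistician.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of participants needed in EACH group.

    Uses the normal approximation for comparing two proportions.
    The default alpha and power are illustrative assumptions.
    """
    if p_control == p_treatment:
        raise ValueError("The two proportions must differ.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)


# Hypothetical response rates: a large difference versus a small one.
large_effect = sample_size_per_group(0.50, 0.30)
small_effect = sample_size_per_group(0.50, 0.45)
print(large_effect, small_effect)
```

The smaller the difference we hope to detect, the more participants each group needs. This is why small studies so often fail to settle a question.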

A hierarchy of reliability among the different kinds of clinical studies has developed based mainly on the ability to control for confounding variables. Some types of studies are inherently better than others at doing this. The randomized controlled trial (RCT) has become known as a gold standard for testing the effectiveness of treatments. Clever ways have been devised for minimizing biases and other confounding variables. However, some types of problems are impractical to test in an RCT. Some problems require observation of people within certain environments or situations, or a careful retrospective review of existing data, with care taken to control for variables as well as possible.

Studying Treatment Effects

RCTs are considered to be the best experimental design for studying the effectiveness (efficacy) and the side effects of a treatment. They are prospective so that confounding variables can be controlled for before the treatment is started. Study participants are selected to represent a population that is relevant to the treatment being studied. The participants are randomly assigned to groups that will either receive the new treatment (treatment group) or receive an established treatment (control group). For problems that have no established treatment, the control group may receive a placebo. Sometimes, people are allocated into three or more groups.
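Random assignment can be sketched in a few lines. This is only an illustration of simple balanced allocation; the function name, the participant labels and the seed are hypothetical, and real trials use concealed allocation sequences generated independently of the people enrolling patients.

```python
import random


def randomize(participants, arms=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out evenly across the study arms."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation = {arm: [] for arm in arms}
    for index, person in enumerate(shuffled):
        allocation[arms[index % len(arms)]].append(person)
    return allocation


# Twelve hypothetical participants, labelled P01 to P12.
people = [f"P{n:02d}" for n in range(1, 13)]
groups = randomize(people, seed=2024)
for arm, members in groups.items():
    print(arm, members)
```

Passing three arm names, for example ("treatment", "established", "placebo"), gives the three-group design mentioned above.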

RCTs work best when they are 'blinded'. Patients should be unaware of whether they are receiving the test treatment, an established treatment or a placebo. Without blinding, the patients' biases may influence their perceptions and expectations of treatment. Unblinded studies cannot control properly for random variables and placebo effects.

In addition to blinding the patients, the best RCTs blind the doctors as well (double-blind randomized trial). Both patients and examiners may have a vested interest in the particular treatment. These interests may bias the results. When neither the patient nor the examiner is aware of the treatment, most of the factors external to the actual treatment may be controlled for.

Doctors can influence the results according to their own biases.

Single-blinded studies only blind the participant. The study administrator is aware of which treatment the patient is receiving as they participate. This introduces a major (yet subconscious) source of bias into the trial. If the study administrator has an interest in the particular treatment, then he/she may give the patient subtle cues or prompts that influence the patient's response to the treatment. Single-blind studies are inherently flawed due to this bias (see The Clever Hans Effect).

Studying Risk Factors and Environmental Effects

Observational studies are done to study the effects of factors other than treatments. It is considered unethical to ask patients to be exposed to a suspected pathogen or risk, so people who are already exposed must be observed instead. For instance, the harmful effects of smoking were discovered through observational studies.

Cohort studies

Cohort studies are observational studies designed to study the effects of particular risk factors. We must observe people who have been or who are being exposed. An exposed study group should be matched as closely as possible to a non-exposed control group to minimize the effects of irrelevant variables. Cohort studies may be prospective, like an RCT, or retrospective. Retrospective studies are passive in that the time period of the study has already passed and information is obtained through written records or patient recall.
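The usual summary of a cohort study is the relative risk: the rate of disease in the exposed group divided by the rate in the non-exposed group. A minimal sketch follows. The function name and the counts are invented for illustration and do not come from any real study.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 1000 exposed people and 10 of 1000
# unexposed people develop the disease during follow-up.
rr = relative_risk(30, 1000, 10, 1000)
print(round(rr, 2))  # about 3.0: the exposed group has roughly triple the risk
```

A relative risk near 1 suggests no association with the exposure. Values well above 1 suggest harm and values well below 1 suggest protection, provided the groups were well matched.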

Case-control studies

Case-control studies are observational studies designed to uncover the risk factors for particular conditions. Patients with a particular condition (the 'cases') are matched with similar people without the condition (the 'controls'). They are mainly retrospective in nature in that the study group already has the condition. By comparing the two groups, differences may emerge that are correlated with the disease. As we know, correlation does not necessarily mean causation. Suspected risk factors should then be tested with a well-designed cohort study.

Cigarette smoking was identified as a potential risk factor for lung cancer through the use of the case-control study. Smoking as a cause of lung cancer was further confirmed through the use of the cohort study.
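Because a case-control study starts from people who already have the condition, it cannot measure risk directly. Instead, it compares the odds of exposure among cases and controls. The sketch below computes an odds ratio with an approximate 95% confidence interval (the standard log-odds method). The function name and the counts are hypothetical and are not data from the smoking studies.

```python
import math


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio and approximate 95% confidence interval for a 2x2 table."""
    a, b, c, d = cases_exposed, cases_unexposed, controls_exposed, controls_unexposed
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - 1.96 * se_log)
    high = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, low, high


# Hypothetical study: 80 of 100 cases were exposed, 50 of 100 controls were exposed.
estimate, low, high = odds_ratio(80, 20, 50, 50)
print(f"OR = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

If the confidence interval excludes 1, the exposure is correlated with the condition in this sample. That is still only a correlation, which is why a follow-up cohort study is needed.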

Determining Prevalence of Risks and Diseases

Cross-sectional studies are done to determine the demographics and prevalence of conditions and risk factors. These studies take a certain population (or a random sample of a population) and simply count up the factors in question. They look at a population at a defined point in time.

Longitudinal studies are similar to cross-sectional studies, but look at a factor in a population at different periods of time. In doing so, we can determine changes in exposures or disease rates over time.

Clinical Trial Phases

• phase I: safety, tolerability, pharmacokinetics, pharmacodynamics. Usually involves ramping doses.

• phase II: expanded version of phase I, IIa for dosing, IIb for efficacy. When drugs fail, they usually fail in phase II.

• phase III: definitive, hence expensive, assessment of efficacy. Typically double-blind. Necessary for a regulatory submission.

• phase IV: post-launch surveillance. Rarer adverse effects detected here.

Hierarchy of Evidence in Medical Science

The U.S. Preventive Services Task Force (USPSTF) defined a hierarchy of reliability for clinical studies and evidence. Of the types of study designs listed above, the rank levels from most to least reliable have been classified as follows:

I Randomized controlled trials (double blinded > single blinded)

II-1 Controlled trials without randomization

II-2 Cohort and Case Control studies

III Expert opinion, and case reports (anecdotes)

Note that the least reliable evidence comes from opinion and anecdotes. Remember from the Promoting Critical Thinking in Medicine section, Mark Crislip, MD identified some basic errors in medical thinking: "a reliance on anecdotes, using sub-optimal studies as evidence, mistaking a gobbet of basic science as a meaningful clinical application, and not realizing the warping effect of confirmation bias". He also identified the three most dangerous words in medicine...'In my experience'.

Similarly, the Center for Evidence Based Medicine ranks evidence as:

1a. Systematic review of randomized controlled trials

1b. Individual randomized controlled trials

2a. Systematic review of cohort studies

2b. Individual cohort studies

3a. Systematic review of case control studies

3b. Individual case control studies

4. Case series

5. Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"

This hierarchy presumes that the studies considered are of decent quality and were well designed. A poor study can be worse than useless. It can be wrong. Edward L. Hannan, PhD concluded in his article, "The design and ultimate conduct of the study is the principal criterion to consider, not the type of study per se".

Quality of Clinical Trials

The outcome of a study is only as good as the methodology of the study design and execution. Poor methodology may include inadequate sample size, inadequate randomization, inadequate blinding, poor control group matching, unwarranted conclusions, and others.

An article in the British Medical Journal, Assessing the Quality of Controlled Clinical Trials, pointed out the following:

"Empirical studies show that inadequate quality of trials may distort the results from systematic reviews and meta-analyses

The influence of the quality of included studies should routinely be examined in systematic reviews and meta-analyses

The use of summary scores from quality scales is problematic—it is preferable to examine the influence of key components of methodological quality individually

Based on empirical evidence and theoretical considerations, the generation and concealment of the allocation sequence, blinding, and handling of patient attrition in the analysis should always be assessed."

Combining Studies

Some questions have been studied numerous times, but definitive answers have not emerged. Sometimes, different studies produce different answers to the same question. If done properly, studies can be combined to create larger sample sizes and to tease out small effects by harnessing the power of large numbers. This is called a meta-analysis.

Generally, a meta-analysis combines data from similar RCTs. This can be a powerful tool when individual studies are too small to reveal a small effect. Meta-analyses can also help to cement a conclusion that has been derived independently by several researchers. The results of many different studies can be compared graphically on a forest plot (also known as a 'blobbogram'). The forest plot to the right shows odds ratios and confidence intervals for a treatment as reported by numerous trials plotted on one graph. The vertical line at the number 1 mark represents the null hypothesis. The squares on the horizontal lines represent the odds ratios. Their sizes are proportional to the weight the individual study holds in the overall meta-analysis. The diamond at the bottom represents the odds ratio of all of the studies combined. Values to the left of the null line show positive effects of the treatment. Values to the right show detrimental effects. Values on the null line imply no effect.

A meta-analysis may be very misleading if studies with poor methodology are included in the analysis. They risk the 'garbage in, garbage out' problem. Studies that employ different methodology may be difficult to meaningfully combine.
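The pooling behind the diamond on a forest plot can be sketched as follows. This is one common approach (a fixed-effect, inverse-variance average of the log odds ratios), not necessarily the method used in any particular review. The function name and the trial results are invented for illustration.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def pool_odds_ratios(studies):
    """Fixed-effect, inverse-variance pooling.

    `studies` is a list of (odds_ratio, ci_low, ci_high) tuples, where the
    interval is each study's reported 95% confidence interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for ratio, ci_low, ci_high in studies:
        log_or = math.log(ratio)
        # Recover the standard error from the width of the interval.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1 / se ** 2  # more precise studies get more weight
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Nine hypothetical small trials, each individually inconclusive
# (every interval crosses the null value of 1).
trials = [(0.8, 0.5, 1.28)] * 9
pooled, low, high = pool_odds_ratios(trials)
print(f"pooled OR = {pooled:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Each trial on its own cannot rule out "no effect", but the pooled interval is much narrower. The same arithmetic will just as happily pool biased or badly run trials, which is the 'garbage in, garbage out' problem.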

Detecting Publication Bias

A funnel plot can illustrate publication bias when half of the funnel appears to be missing. In the funnel plot below, it appears that only studies with generally positive outcomes have been reported. These appear to the right of the graph. Generally negative studies would be expected to appear to the left. Even when the treatment in question is understood to be effective, one would expect there to be some smaller, poorer studies that show a null effect just by chance. These would show up on the opposite side of the funnel. The plot below demonstrates the lack of symmetry around the middle. One would suspect publication bias with such a graph.

Another potential problem is the potential for selection bias on the part of the researchers when determining which studies to include in the meta-analysis. If a researcher is biased toward a particular conclusion, he may be inclined to be more critical of disconfirming studies and less critical of confirming studies. Thus, a researcher may be more likely to exclude disconfirming studies due to poor quality than he would confirming studies.

We learned in the Cognitive Bias section that negative studies are less likely to be published due to publication bias. Thus, a meta-analysis may automatically be biased toward positive studies as it relies on published research. Publication bias is a big problem.

A tool for finding publication bias in the literature is called a 'funnel plot'. This plots the effect reported by each study against a measure of that study's quality or precision, such as its sample size. Generally, the large, high quality studies should gather around the true effect, and the smaller, poorer quality studies should scatter randomly around it, on both the positive side and the negative side.
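Funnel plot asymmetry can also be checked numerically. The sketch below is a simplified version of Egger's regression: each study's effect divided by its standard error is regressed on its precision (one divided by the standard error), and an intercept far from zero hints at asymmetry. The function name and the data are made up for illustration, larger numbers are assumed to mean a more positive result, and a real analysis would also test whether the intercept is statistically significant.

```python
def egger_intercept(effects, standard_errors):
    """Intercept of a least-squares line through (precision, effect / SE)."""
    x = [1 / se for se in standard_errors]
    y = [effect / se for effect, se in zip(effects, standard_errors)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x


TRUE_EFFECT = 0.2  # hypothetical true effect size

# A complete literature: at every level of precision, one study lands
# above the true effect and one lands below it.
complete = [
    (TRUE_EFFECT + sign * se, se)
    for se in (0.1, 0.2, 0.4, 0.5)
    for sign in (1, -1)
]

# A "published" literature: small studies (large standard error) with
# disappointing results stay in the file drawer.
published = [(e, se) for e, se in complete if se < 0.3 or e > TRUE_EFFECT]

symmetric = egger_intercept(*zip(*complete))
lopsided = egger_intercept(*zip(*published))
print(round(symmetric, 3), round(lopsided, 3))
```

With the complete set, the intercept is essentially zero. Once the small negative studies are removed, it moves clearly away from zero, which is the numerical counterpart of a funnel with one side missing.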

Real world examples of publication bias were presented in the Lancet in 2005 by Shang et al, Are the clinical effects of homoeopathy placebo effects? This illustrated a meta-analysis of 110 randomized, placebo-controlled trials (RCTs) of homeopathy. The funnel plot is illustrated to the right.

In this case, the dots to the left of the dotted lines represented positive studies in homeopathy. Surprisingly absent were dots populating the right side of the graph. Given that homeopathy has nearly zero plausibility, one should expect swaths of negative studies to populate the right side of the graph.

One should highly suspect publication bias in homeopathy trials.

Perhaps even more disturbing, Shang et al. also found publication bias to be strongly present in studies in the conventional medical literature. They tried to match the homeopathy studies with studies of conventional medicine. The funnel plot for these studies is illustrated below the homeopathy plot.

Publication bias is present across the board in the medical literature.

Skeptical Medicine gives special thanks to Neuroskeptic for pointing out these examples. Also as pointed out by Neuroskeptic, a potential solution for the problem of publication bias may lie in the mandatory registration of clinical trials before the trials are conducted. Thus, the 'missing' trials can be accounted for rather than being lost to the file drawer effect.

Making Sense of it All

All of the above types of clinical studies lead to a lot of information that needs to be synthesized. Not all types of studies lend themselves to meta-analysis and, as we have seen, meta-analyses have some inherent problems.

Systematic reviews are exhaustive reviews of the literature by experts in relevant fields. They have the potential for establishing definitive medical knowledge. While optimal standards of procedure for systematic reviews may not yet have been established, major organizations have published guidelines. Standards are important for minimizing the harmful influence of bias and poor quality research in the pool of medical knowledge. Hence, systematic reviews have become a standard measure for 'Evidence Based Medicine'.

The Cochrane Collaboration is a multinational and well-recognized organization of thousands of clinicians, scientists and statisticians. They publish systematic reviews on a growing number of medical topics, and also published their guidelines for conducting systematic reviews. The Center for Reviews and Dissemination published their guidelines as well.

Is and Ought Revisited

Scientific studies produce knowledge. How this knowledge is used is another issue.

Over the years, systems for rating recommendations based on clinical studies have been developed. Doctors can access this information when making clinical decisions with their patients. Below is the commonly used grading scale from the USPSTF.

Evidence Based Medicine Vs. Science Based Medicine

The current systematic review process has been criticized for not placing enough emphasis on the Bayesian approach; that is, putting new evidence in perspective with prior probability. Remember from the What is Science? section:

Bayesian probability essentially states the following:

The New Probability of a Theory is proportional to its Prior Probability x the Strength of New Evidence.

Let's take a rather obvious, if not silly, look at a problem from a Bayesian perspective. Suppose Dr. X comes up with the idea that inhaling helium can be used to treat low back pain. Dr. X may design a small RCT in which patients with low back pain are treated by inhaling either helium or air from a tank. The study may show a slightly larger effect in the treatment group than in the control group. The study may be reviewed and even criticized for its relatively small sample size and a potential problem with blinding due to the different effects of helium and air on the voice. However, the study is published in a small journal. Perhaps it is even replicated. A review of the literature would have to conclude that there may be something to this helium treatment. At worst, a review may be equivocal and state that there is not enough evidence for or against the use of helium for low back pain.

However, a Bayesian approach may look at the prior probability of the theory. Since it is a new theory, the prior probability would have to come from basic science. What do we know about helium and human physiology? Well, helium is inert. It does not participate in any chemical reactions. It does not interact with body chemistry. After considering the basic science, we can conclude that the prior probability of Dr. X's new idea is very low. In fact, we could conclude that it approaches zero. This would essentially nullify the probability of the idea even in light of the new "evidence."

The Bayesian approach to considering the probability of a theory would prevent us from wasting time and effort considering ideas that have little or no prior probability. Note that prior probability cannot technically be zero. It would take extraordinary evidence (unambiguous and reproducible results from well designed and executed studies) to overcome the hurdle of a low prior probability based on knowledge of the established basic and clinical sciences.
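The arithmetic behind this can be sketched with the odds form of Bayes' theorem: posterior odds equal prior odds multiplied by the strength of the evidence (the likelihood ratio). The function name and all of the numbers below are assumptions chosen only to illustrate the helium example. They are not measured values.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' theorem."""
    if not 0 < prior < 1:
        raise ValueError("The prior must be strictly between 0 and 1.")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Assume a small positive trial makes the theory 5 times more likely
# than it was before (an illustrative likelihood ratio).
EVIDENCE = 5

plausible = posterior_probability(0.5, EVIDENCE)      # a biologically plausible drug
implausible = posterior_probability(0.001, EVIDENCE)  # helium for low back pain
print(f"{plausible:.3f} versus {implausible:.4f}")
```

The same evidence leaves the plausible treatment looking quite likely to work, while the helium idea remains very improbable. Only much stronger evidence, repeated many times, could move it appreciably.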

We will consider this topic in greater detail in the EBM vs. SBM section.


Doctors are faced with making clinical decisions. These decisions are informed by knowledge. We learned in this section that the knowledge base of medical science has different levels of reliability and a large potential for error. The skeptical doctor realizes that knowledge changes and must use the best knowledge available when making decisions. A firm grasp on the methods and interpretation of medical science is important. Only then can science-based decision-making take place.

The art of medicine is informed by the science of medicine.

John Byrne, M.D.

References and Links

"Guide to Biostatistics - MedPage Today." 2012.


"Glossary - Clinical Trials Terminology." 2011.


"Glossary of Common Site Terms -" 2007.


"Clinical study design - Wikipedia, the free encyclopedia." 2005.


"Meta-analysis - Wikipedia, the free encyclopedia." 2004.


"Systematic review - Wikipedia, the free encyclopedia." 2005.


"CEBM > About > What is EBM? > What is EBM?." 2007.


"Cochrane Handbook for Systematic Reviews of Interventions | The ..." 2010.


"Publication bias - Wikipedia, the free encyclopedia." 2004.


Concato, John, Nirav Shah, and Ralph I Horwitz. "Randomized, controlled trials, observational studies, and the hierarchy of research designs." New England Journal of Medicine 342.25 (2000): 1887-1892.


"4.15.11 RBM Report." 2011.


Stroup, Donna F et al. "Meta-analysis of observational studies in epidemiology." JAMA: the journal of the American Medical Association 283.15 (2000): 2008-2012.


Hannan, Edward L. "Randomized Clinical Trials and Observational Studies: Guidelines for Assessing Respective Strengths and Limitations." JACC: Cardiovascular Interventions 1.3 (2008): 211-217.


"Hierarchy of evidence - Wikipedia, the free encyclopedia." 2006.


Harbour, Robin, and Juliet Miller. "A new system for grading recommendations in evidence based guidelines." BMJ 323.7308 (2001): 334-336.


"Grade Definitions (USPSTF)." 2010.


Lewis, Steff, and Mike Clarke. "Forest plots: trying to see the wood and the trees." BMJ 322.7300 (2001): 1479-1480.


Shang, Aijing et al. "Are the clinical effects of homoeopathy placebo effects? Comparative study of placebo-controlled trials of homoeopathy and allopathy." Lancet 366 (2005): 726-32.


De Angelis, Catherine et al. "Clinical trial registration: a statement from the International Committee of Medical Journal Editors." New England Journal of Medicine 351.12 (2004): 1250-1251.


"Describing and interpreting the methodological ... - Biochemia Medica." 2010.