On this page, you'll find my principles for designing great scientific graphs, based on my critical review of the data visualization literature as well as my years of experience creating data visualizations.
The reason I wrote this guide is simple: Scientific graph design is a science unto itself but one that natural scientists like me have, understandably, not been exposed to or considered very much, even though graphs are an essential science communication tool!
I believe that our research and our graph design deserve comparable rigor! Creating great graphs can be tough at times, but there’s consensus on how to do it well! I believe creating great graphs is worth the effort; great graphs can...
Expand our audience, impact, credibility, and reputation.
Clarify and crystallize our own understanding of our work.
Spark creativity in how we communicate and share our work.
Fuel effective decisions and spark the right questions.
Inspire, astound, and motivate.
Be the most efficient and accessible way to share a finding.
So, below, you’ll find the six principles I believe will ensure every graph you make is great. As you read, just keep in mind the following caveats:
There are exceptions to every “rule” about graph design I could present. However, I believe you must know the rules to know when and how to effectively break them.
Journals, disciplines, and venues may have guidelines and norms that deviate from these principles. That doesn’t make those other conventions “good” (in fact, it rarely does! We do a lot of dumb things for tradition's sake, after all...), but it does mean you may have to operate within them sometimes.
Exploratory graphs needn’t meet nearly the same standards as communicative graphs (this guide only covers the latter).
Graphs are only as good as the data they show. They’re also limited by the software you use to create them and your skill with it. In other words, even great graph design skills can't save lousy data-collection or programming skills, unfortunately.
There’s no shame in getting graph design help! We can’t and shouldn’t all be experts in everything. Leverage your community’s collective skill set when it comes to graphs, if you need to. But learning to design great graphs (or better ones, at least) is doable!
There is no such thing as a “perfect graph.” Everybody’s “yum” is somebody’s “yuck.” While the principles in this guide are well-supported, the effectiveness of any specific implementation of them will always be debatable. Data visualization is still very much a space for experimentation.
As with all things, designing great graphs takes practice and patience—it’s trial-and-error even for pros! Set aside time for it, and plan for it to be iterative from the outset.
The best advice I can give is to seek out someone who is not close to your work to critique your graphs! Have a "beta tester" for your graphs!
Without further ado, let's get to the first principle of great graph design...
Yes, every graph should serve a specific purpose, and there are right and wrong graphs for a given data set and purpose! For example…
Connecting lines (such as those found in line graphs) should only be used to show linkage between discrete groups.
For example, they might link the same individual’s data over time, data of the same property collected along a single transect, different individuals from the same study group, etc.
Don’t confuse these with trendlines; their purposes differ!
Bar graphs should only be used when you have 1 (or more) count- or frequency-based variable(s) and 1 (or more) categorical variable(s). They are inappropriate for continuous data, as they mask important distributional information (despite their still-all-too-common use for showing means).
Distributional graphs/elements (boxplots, error bars, histograms, etc.) should be used only when all (sub)groups have large sample sizes (> ~10).
Pie charts and stacked bar or area charts should be used only to show “parts of a whole” and only when you have about six groups or fewer in total.
3D graphs (those with a z-position channel) should be interactively rotatable or should be avoided because they are too easily biased by the choice of viewpoint.
Point graphs (e.g., scatterplots) should be used only to show trends between two numeric variables and only when point overplotting is modest.
Trendlines should only be used to argue for and emphasize trends of a specific shape.
Use transformed scales (e.g., logarithmic) only to argue that a trend can be fairly evaluated only on the transformed scale.
Yes, there’s a right way to save your graphs too!
Save figures in vector-based formats (e.g., PDF, SVG, EPS), not raster-based formats (e.g., PNG, BMP, JPG, GIF); the former resize better without distorting the contents.
The exception is very voluminous data. For example, if you’re graphing thousands of data points, an SVG file might be impractically large. Vector files are also harder to integrate into Word or Doc files, so they are best used as final files for submission or sharing rather than as working files.
No, a graph isn’t always your best option!
Your goal is to present the most relevant information in the clearest way using the least ink and space. Tables and text can be better sometimes for this!
Want to choose the right graph type every time by answering just three basic questions? Check out my guide to the most common graph types and how to choose between them.
Great graphs communicate data in order to support arguments.
First, identify your graph's argument—the idea it will hopefully convince readers of.
If you’re not crystal clear on what your intended message is, you cannot design a graph that will convey it!
Second, select the data/patterns to graph that’ll best support your argument.
*Don’t present graphs that are contrary or irrelevant to your argument!
*Don’t include contrary or irrelevant data/patterns in your graphs!
Third, design each graph so as to emphasize the data/patterns that support your argument.
It’s perfectly appropriate to make your intended read of a graph stand out!
Remember: Great graphs tell complete, self-contained stories; they "stand alone."
Don’t include multiple graphs showing the same pattern/message. Show the most representative one; save others for supplemental materials or a repository.
In contrast, multiple graphs can be used to collectively convey a single message. No single graph needs to “do it all.”
*Unless your message is that they are contrary, inconclusive, and/or irrelevant! Save graphs purely for transparency or unbiased interpretation of raw data for supplementary materials or data repositories, where space is less limited and where your narrative won’t be muddied by them. However, good scientists do not hide or downplay contrary or inconclusive results—they integrate them into the narrative they tell!
Balance transparency with complexity and neutrality with argument.
Great graphs make your intended interpretation stand out while still permitting readers to draw independent conclusions.
For example, an emphatic trendline can propose that a certain relationship exists while a fainter uncertainty band around it acknowledges the full range of other plausible possibilities.
Overly complex graphs are not transparent but rather the opposite—they’re confusing and obscuring.
Cognitive load: The volume of info one can effectively and efficiently integrate to form a holistic conclusion.
Every element added to a graph increases cognitive load. With too many, a graph becomes impenetrable, no matter your intent.
You can often include, but visually de-emphasize, some elements to balance the need for transparency with the need to keep cognitive load low.
For example, you can bold, thicken, or color a trendline on a scatterplot while instead making the points unfilled, grayed, or desaturated.
To strike a great balance between complexity, message, and transparency, try…
First adding the element(s) most central to your message (e.g., a trendline).
Then adding the element(s) most needed for transparency and unbiased interpretation (e.g., uncertainty bands).
After, add any additional elements only if they enhance transparency, unbiased interpretation, and/or message (e.g., raw data points)...
…But stay within your audience’s capacity for complexity. If you don’t know what that capacity is, either find out or assume that simpler is better.
Universal design: All can access and benefit from a product, regardless of age, language, ability, etc.
When you design graphs universally, you maximize your potential audience!
“First, get it right in black and white.”
Color printing (and even true color viewing!) is expensive/technical—not everyone can or does do it.
~4% of people are color-vision-impaired. Use colorblind-friendly palettes (e.g., viridis) whenever you use color, even for emphasis.
Cultural color associations can muddy interpretation when they aren't universal (e.g., red = "bad" to some but not others).
Humans can't even accurately or precisely interpret colors in many contexts (e.g., the simultaneous contrast illusion)!
So, color should almost never be your first choice of visual channel (see Principle #4 for other ideas).
Even when you do use it, ensure the graph is still fully interpretable in grayscale.
Remember: While graphs can be artful, they are not art. Their primary purpose is to convey info, not "look pretty!" Color, even just for beautification, is easier to use poorly than well.
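One rough way to test the "still interpretable in grayscale" advice above is to estimate the gray level each palette color would collapse to and flag pairs that land too close together. The sketch below is only an approximation: it assumes the common Rec. 601 luma weights, and the function names and the `min_gap` threshold of 25 (on a 0-255 scale) are my own hypothetical choices, not an established standard.

```python
from itertools import combinations


def gray_level(hex_color):
    """Approximate 0-255 gray level of a '#RRGGBB' color (Rec. 601 luma weights)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def grayscale_clashes(palette, min_gap=25):
    """Return color pairs whose gray levels differ by less than min_gap.

    min_gap is an arbitrary, illustrative threshold; tune it for your medium.
    """
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if abs(gray_level(a) - gray_level(b)) < min_gap
    ]


# Hypothetical palette: pure red, a mid green, and pure blue.
palette = ["#FF0000", "#008000", "#0000FF"]
clashes = grayscale_clashes(palette)
print(clashes)  # the red/green pair is flagged; both sit near gray level 75-76
```

Here, red and green look very different in color but become nearly the same gray, which is exactly the situation where a second channel (or a different palette) is worth considering.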
Make text and key elements in the plotting area extra large.
Everything important (text and key elements) should look almost ridiculously big to you to be big enough for those with visual impairments.
Many journals shrink figures to type-set them; extra-large elements can more gracefully bear being shrunk.
Maximize contrast.
For example, yellow is hard to see against a white background. One solution: Surround points filled with bright fill colors with dark outlines.
Using too many colors (> ~6) inevitably leads to lost contrast between pairs of colors (perceptual color space is just not very wide).
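To put a number on the "yellow against white" problem above, one option is the contrast ratio used in web accessibility guidelines (WCAG), which runs from 1 (no contrast) to 21 (black on white). This is a minimal sketch using that formula as I understand it; the function names are my own, and the guideline was written for text on screens, so treat the result as a rough indicator for graph elements rather than a rule.

```python
def relative_luminance(hex_color):
    """WCAG-style relative luminance of a '#RRGGBB' color (0 = black, 1 = white)."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding before weighting the channels.
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black/white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


yellow_on_white = contrast_ratio("#FFFF00", "#FFFFFF")
yellow_on_black = contrast_ratio("#FFFF00", "#000000")
print(round(yellow_on_white, 2), round(yellow_on_black, 2))
```

The yellow-on-white ratio comes out barely above 1, while yellow against a dark outline or background scores close to the maximum, which is why the dark-outline trick works.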
Use white space between elements to enhance readability.
For example, space axis titles away from the axis labels, ensure axis labels aren’t too frequent, and add white space between elements meant to be compared to each other.
Eliminate all non-essential regionalisms, jargon, abbreviations, etc.
Your audience may be global and thus know different things than you, use different abbreviations, think about things differently, be unfamiliar with your country's or region's geography or customs, speak other languages, etc.
Remove extraneous elements to reduce cognitive load.
For example, remove a gray background that reduces contrast while not conveying information.
Every text box, rectangle, and line should serve an essential, unique purpose! If it doesn't, cut it.
Reduce the need for readers to “eye-jump” between many parts of your figure to find essential info—concentrate whenever possible.
For example, label groups inside the plotting area instead of including a legend, or move key statistics out of the caption and into the plotting area.
Channel: A visual attribute of a graph whose variation conveys variation in the data.
While some channels are likely very familiar (e.g., x- and y-position, length, and color hue), there are way more channels available than you might think!
Consider all available channels when designing your graphs.
The most common channels include:
Position (i.e., location along the x/y/z-axes).
Color hue (ratio of red, green, and blue), color luminance (how bright or dark it is), and color saturation (how gray it is). Yes, "color" is (at least) three separate channels!
Also, consider here fill color, transparency (opacity), and outline color.
For example, primary groupings could be blue and red, and subgroupings could be lighter and darker shades of those colors.
Size, length/width, area, and volume.
For example, primary groupings could have lines that vary in width, and subgroupings could have lines that vary in length.
Shape, pattern or style, and angle or orientation.
For example, primary groupings could be squares and triangles, and subgroupings could be these same shapes rotated at different angles.
Less common, but still useful, channels include: Enclosure (e.g., symbols are surrounded in various ways), motion (e.g., some groups move and others don't in an animated graph), curvature (e.g., the bars on a bar graph have variation in corner roundness), end and border type (e.g., lines have blunted or arrow ends, or bars have dashed or dotted outlines), and arrangement (e.g., related elements are clustered together).
The goal: Find the channel(s) that make(s) key differences clearest. Don't always reach for the same old channels you always use.
Favor 0D and 1D channels over 2D and 3D channels.
For example, reading exact values and comparing groups is much more precise on a bar chart (as relative bar length) than on a pie chart (as relative wedge angle). Reading positions along a single scale on a scatterplot or boxplot is even more precise than reading lengths.
Don’t use 2+ channels to convey the same information ("double-mapping").
For example, don’t use size and shape, or x-axis position and color, to differentiate groups.
Why not? It…
Wastes an available channel you could use another way (or omit to simplify the design).
Increases cognitive load by forcing readers to spend extra time clarifying which channels convey which information. The first assumption, for most people, is that channels are not double-mapped, and it can be hard to overcome this assumption.
One (debatable) exception: Double-mapping a second channel along with color can make a graph more captivating while maintaining accessibility for those with color-vision impairment. However, you should explicitly call attention to such double-mappings, consider channels other than color in these instances, and weigh whether your audience will value being captivated in this way enough to justify the hassle.
Creating "subpanels" is using a channel (specifically, it's a form of arrangement).
Rather than distinguishing groups in other ways, divide one graph into several subpanels, one per group.
In this way, a complex figure becomes several much more digestible ones that can share a single caption, design decisions, and axis scales.
Many channels often mean legends. But are they really necessary? Need they be off to the side?
Legends are bulky and often put essential info off to the side and out of view, forcing readers to "eye-jump" between the graph and the legend to integrate information (a taxing cognitive task).
Instead, use x- or y-position to differentiate groups whenever possible.
If that's not possible, directly label groups within the plotting area somehow. The best legend is no legend at all!
If you can't eliminate a legend, try moving it into the plotting area, if there's ample "void space."
If that's not possible, convert the legend into a "stripe" that can sit above the graph so it's the first element encountered by a reader.
In any case, don’t relegate any “legend-level info” to the caption unless absolutely necessary. Doing so creates another place readers must "eye-jump" between!
Great scientific graph designers are ethical and professional. As such, there are some things they just don't do, such as...
Overuse colors (any more than ~6 tends to be overwhelming) or use garish color palettes (like those incorporating "neon" tones).
Generally, color should convey meaning, not only beautify a graph.
Muted, desaturated tones (those with gray in them) often look more professional.
Include gridlines when determining exact values or thresholds is unnecessary.
You can eliminate gridlines entirely, desaturate them, thin them to be less frequent, and/or use them in just one relevant direction.
Remember: Graphs are not good for conveying exact values; they’re more about “vibes,” and gridlines don’t fit with that focus.
Plus, it increases cognitive load to parse data when gridlines in the plotting area must be ignored.
Omit units in axis titles or use cumbersome units when there’s an available alternative (e.g., using 1,000s of meters instead of single kilometers).
Omit capitals (e.g., at the beginnings of legend labels or for proper nouns), use unnecessary capitals, or inconsistently capitalize (when in doubt, use “Sentence case” throughout a figure).
Figures should be proofread for typos and inconsistencies just as thoroughly as the rest of your product's contents!
Include a plot title (or subtitle).
Good captions do everything a title can do but better! The exception is when there’s no space for a full caption, such as in a presentation. However, even in those instances, it’s better when a graph is designed to be so self-evident it doesn’t need a title.
Have truncated axis labels—both ends of axis scales should be labeled.
For example, if your data along the y-axis range from 0 to 35 and your y-axis labels are [0, 10, 20, 30], expand the axis limits to include 40 and add a label there. Otherwise, it’ll look like a label is “missing,” and readers will have a harder time reading your graph.
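The axis-limit example above can be sketched as a small helper that snaps both limits outward to the nearest labeled tick. The function name `padded_axis` and its arguments are hypothetical, and the sketch assumes evenly spaced ticks with a step you choose yourself.

```python
import math


def padded_axis(data_min, data_max, step):
    """Return (lower, upper, ticks) so both axis ends fall on a labeled tick."""
    lower = math.floor(data_min / step) * step  # snap down to a tick
    upper = math.ceil(data_max / step) * step   # snap up to a tick
    n_ticks = round((upper - lower) / step) + 1
    ticks = [lower + i * step for i in range(n_ticks)]
    return lower, upper, ticks


# Data ranging from 0 to 35 with labels every 10 units.
lower, upper, ticks = padded_axis(0, 35, 10)
print(lower, upper, ticks)  # 0 40 [0, 10, 20, 30, 40]
```

Most plotting tools let you pass limits and tick positions like these explicitly, so the top label at 40 no longer looks "missing."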
Make different design choices across figures in the same product.
Have just one set of visual “rules” your reader must learn whenever possible. Plus, figure inconsistency just looks amateurish!
Use pseudo-3D features.
For example, bars are harder to interpret when they’re made to look three-dimensional when this doesn't actually convey information.
Arrange categories randomly within your plot/legend.
Sort categories by message (e.g., largest to smallest) or else alphabetically or in some other logical way for ease of skimming.
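As a small illustration of the sorting advice above, here is one way to order categories before plotting them. The helper name and the tree counts are made up for the example; the idea is simply to decide the order deliberately rather than accept whatever order the data arrived in.

```python
def order_categories(values_by_category, by="value"):
    """Return category names sorted largest-to-smallest or alphabetically."""
    if by == "value":
        return sorted(values_by_category, key=values_by_category.get, reverse=True)
    return sorted(values_by_category)


# Hypothetical counts per tree species.
counts = {"Oak": 30, "Birch": 12, "Maple": 21}

by_value = order_categories(counts)                 # order that supports a "largest first" message
by_name = order_categories(counts, by="alphabet")   # fallback order for easy skimming
print(by_value, by_name)
```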
Use your graph’s design to distort, mislead, hide, etc.
For example, don't distort an axis scale to make a trivial numerical difference seem large.
As a general rule, the "null hypothesis" should be visible in your graph somewhere!
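To see how much an axis choice can inflate a trivial difference, you can compare the ratio readers see (bar lengths measured from wherever the axis starts) with the true ratio of the values. The sketch below assumes a bar-style graph where length encodes value; the function name and the numbers are hypothetical.

```python
def visual_exaggeration(value_a, value_b, axis_min=0.0):
    """How many times larger the drawn ratio is than the true ratio of two values.

    A result of 1.0 means the graph shows the difference honestly.
    """
    true_ratio = value_b / value_a
    drawn_ratio = (value_b - axis_min) / (value_a - axis_min)
    return drawn_ratio / true_ratio


# Two hypothetical group means that differ by about 2%.
honest = visual_exaggeration(98, 100, axis_min=0)
truncated = visual_exaggeration(98, 100, axis_min=97)
print(honest, round(truncated, 2))
```

Starting the axis at 97 makes the second bar look three times as long as the first, even though the values differ by only about 2%.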
Few figures can "stand alone" without a great caption that clarifies everything our reader needs to know to be able to make as much sense of our graph as we can.
A great caption tends to (but doesn't always have to) include:
A brief nod to the graph’s purpose—which argument does this graph accompany?
A brief explanation of how the data shown were gathered—what’s their context?
Answer the basic questions of "when," "where," and "how": Provide dates/times, study sites, site conditions, general methods used, etc.
If it's not obvious, state what data are shown (e.g., are they raw data or group means?).
*A brief call to attention—what’s the message? Where in the graph is that message found?
Clarification of all channels used that are not already clarified by an axis title or legend—what do all lines, symbols, colors, sizes, etc. mean?
Clarification of all units, abbreviations, shorthand, idioms, regionalisms, or jargon not explicitly defined elsewhere in the graph.
Better yet, eliminate the need to clarify something. For example, are all abbrs. used needed? Can any be avoided?
Clarification of what’s different about any subpanels, if it's not obvious.
Mention of any transformations made to the data.
If statistics are shown or implied (e.g., a trendline), an explanation of where they came from and their significance values.
Sample/group size(s), if not discernible from the graph.
A great caption does not include:
Any interpretation.
*The call to attention answers an objective “what” question: What pattern/trend are we seeing? Interpretation, meanwhile, answers a subjective “how” or “why” question: Why is this pattern occurring, or how should we respond to it? Results sections should point out patterns/trends. Interpretation must be saved for the Discussion, however.
Clarification of anything already obvious from the graph itself, or a redundant recap of any basic elements of the graph.
For example, don't start a bar graph's caption with “A bar graph of…” We should assume a reader can figure that out!
A detailed methods recap. Readers need only a brief encapsulation of the relevant context.
References to the text or other figures in lieu of presenting pertinent information in the caption. That’s definitely not standing alone!
Statements about or clarifications of things not actually in the figure (i.e., irrelevant info).
If that feels like a lot that a caption needs to do (and not do), it can be! Rarely is a one- or two-sentence caption even close to sufficient. But you often needn’t use complete sentences to meet any of the needs above; for example, a call to attention can be half a sentence or even a few words, if you're clever!
Want help turning all this theory into practice? Check out my introductory guide to the ggplot2 R graphics package.
While this page has been largely devoted to the themes and "whys" of great graph design, if you're more interested in the specific "whats" of how to design a great graph, check out the Recommendations page, which goes into greater detail about what to do and what to avoid.