WRITING AN ABSTRACT AND A TITLE FOR A SCIENTIFIC RESEARCH PAPER
As noted in the Preface, this page discusses the thought process I've used to mentor junior colleagues through designing their research projects. It's technically "off-topic" here in this guide, but I include it in case it's helpful. After all, you cannot write about a research project that does not exist! We've all gotta start somewhere. Plus, thinking critically and planning ahead at a project's inception will often make the eventual writing process go much more smoothly. If you find my logic compelling, read on!
Now, because science can be so hard to actually do, there’s one asset I tell junior scientists they need to have in abundance: curiosity. If you want to know an answer badly enough, you’ll endure whatever it takes to get it.
So, an early goal of any research project should be to confidently state what you're curious about.
I’m fortunate to have the freedom to be curious about nearly anything (provided I have the skills, tools, and collaborators needed!). But that freedom can be overwhelming, too—there are too many questions I could ask. Most scientists, by contrast, don’t get to choose their questions—their bosses, mentors, or funding agencies dictate their inquiry. That can be frustrating, especially if the assigned question doesn’t spark your curiosity. Either extreme has challenges.
As such, I'll consider a middle ground here—all the examples I give will assume we're trying to build a research project motivated by three broad questions:
Why is species X hard to raise successfully from seeds?
Why does species X have the seed germination requirements it does?
What is the best method for raising species X from seed?
These questions will hopefully constrain our imagination enough that I can introduce and build upon real-feeling examples, but I haven’t constrained us so much that there is nothing in here that would spark your own curiosity!
Once you can state what you're curious about, the next step is to mold that curiosity into a scientific question. While there's no surefire way to form a good scientific question that I know of, good ones share a few characteristics:
Precise. An example of an imprecise scientific question is: “Why do seeds need water?” A better one would be “How often do seeds of species X need to be fully immersed in water for them to germinate more than 80% of the time?” There is a specific species, rate, requirement, and unit of measurement encapsulated by the latter question but not by the former.
Lacks an agreed-upon answer. If 20 studies have all already found that soaking species X’s seeds for 1 hour three times, 3 days apart, yields 100% seed germination, then “How many times do seeds of species X need to be soaked to break their dormancy?” becomes a bad question to investigate further (unless you have doubts about past results!).
Objectively answerable. “How happy are seeds of species X if they're soaked 3 times?” is a bad question unless you have an objective measurement for “seed happiness.” You should be able to express the answer to your question in terms of something empirically measurable: light levels, time, volume, square miles, etc.
This is not to say that so-called subjective words like “happy” or “healthy” or “beautiful” are somehow invalid for scientific inquiry! Just that we must find concrete ways to define and parameterize them and then convince others they align with the properties stated in our questions.
Avoids sticky words like “best.” Best in what specific sense? If you can’t define what you mean by “best” (or whatever similar word you might use), then that word probably doesn’t belong in your question.
Informed. A good research question builds on what’s already known. For example, asking “Do seeds need water to germinate?” isn’t helpful—we’ve known that answer for centuries. A better question would start from that place and then ask something that builds upon it, such as “How much water is needed to break dormancy in species X?”
Scientific questions rarely come into existence perfect—at first, they may possess only some of these qualities and must acquire the rest through revision. That’s ok!
Let’s say our overarching motivation is to understand what’ll happen to plant reproduction as climate change progresses (this was something I explored in my own research!). A good scientific question related to this broader one might be “If early springs (March–April) in southeastern Minnesota get wetter, by how much would germination rates of black raspberry (Rubus occidentalis) seeds be expected to drop?”
That question is specific: I’ve articulated a specific place, time, taxon, and knowledge gap. It’s also something we don’t already know but that could be valuable to know (if you like black raspberries, anyway!). And it can be answered objectively because germination rates can be objectively measured in units we can all (mostly) agree on.
Plus, it builds off of some “knowns,” including that climate change may make springs wetter in Minnesota and that waterlogged soils are not good for some species’ seeds. Maybe most importantly, though, this question would make me curious! I want black raspberries to persist successfully, so if this aspect of climate change threatens them, I’d want to know.
Once we have a good scientific question, we need to craft a possible answer to it: a hypothesis. Recall that a hypothesis is an explanation for how the world might work or why it might work that way.
The hypothesis is one of the trickiest concepts for developing scientists (and even seasoned veterans!) to master. That’s partly because a hypothesis could be either of two related but distinct things. A hypothesis is either...
A pattern we can expect to reliably observe under specific conditions, assuming a known process is operating like it ought to, or
A process that should generate such a pattern.
Which type of hypothesis we need depends on our question and the “unknown” it contains. If our question is “Would black raspberry seeds germinate better in drier spring soils, given that waterlogged seeds can suffocate?”, the unknown is whether a specific pattern would appear under certain conditions, given a process we know exists but have not confirmed operates for this species in this context.
The hypothesis we need in this case should state whether the expected pattern should (or shouldn’t) exist in our context and explain why (or why not). For example: “Black raspberry seeds will germinate better in drier soils, not because of seed suffocation but rather because drier soils foster fewer parasitic molds.”
If our question were instead “Why do black raspberry seeds struggle to germinate in wetter spring soils?”, the unknown is a process that explains where this pattern we’ve already observed is coming from. The hypothesis needed here will logically link the observed pattern to a process—either one known from elsewhere or a new one. For example: “Black raspberry seeds struggle to germinate in wetter spring soils because mortality from seed-predating molds is higher in such soils.”
Can a question be so general that it demands an answer (a hypothesis) that is both a process and a pattern? For example, consider the question “Do black raspberry seeds germinate better in drier spring soils?” It clearly alludes to a pattern, but it doesn’t articulate a process that would create that pattern, nor does it stake a claim as to which pattern we should observe.
In cases like these, where there's ambiguity about both pattern and process, it’s probably better to start by determining whether there's a pattern at all. There’s little sense in trying to explain a pattern that doesn’t exist! If you do observe a pattern, then you can investigate why that pattern might exist.
However, I might argue that a question phrased like the one above probably isn’t as informed as it could be. Scientific inquiry is meant to build upon what we already know—are we really in such “unknown territory” that we cannot speculate whether and why black raspberry seeds would germinate better or worse in drier spring soils? That raises a red flag: Where did this line of inquiry even come from, then?
Maybe necessity, or a breakthrough, has pushed us somewhere we wouldn’t have normally found ourselves! But I suspect this ought to be rare. As a rule of thumb, if your question demands both a new process and a new pattern to answer it, it may be too vague or too speculative to support a strong study design. Consider revising it.
So, how do you fashion a hypothesis?
I'd recommend starting by figuring out the unknown implied by your question—is it a process or a pattern? Once you determine which it is, attempt to fill in that unknown. For example, if your question is: “Why does scraping black raspberry seeds with sand increase germination rates?”, a hypothesis could be “scraping black raspberry seeds with sand creates large pores through which germination inhibitors can easily flow out,” which indicates the "missing process" you could then try to observe and measure.
What we expect to observe—in the specific context of our study—is called our prediction. Articulating predictions is the next vital step of doing research; they make assessing the validity of our hypothesis easier: Did we see what we expected to? Yes? Then we have evidence supporting our hypothesis. No? Then our hypothesis may be flawed, or perhaps our test was (or both).
Once you’ve proposed a hypothesis, how do you form a prediction? Ask yourself: If the world works the way your hypothesis proposes, what outcome(s) should we expect to see when the world is behaving a specific way?
For example, if your hypothesis is “Black raspberry seeds need large pores through which germination inhibitors can leave because most of their inhibitors are large molecules,” then a prediction could be: “The average molecular weight of compounds escaping soaking black raspberry seeds will be significantly higher for seeds scraped with sand than for unscraped seeds.”
Notice that, in that prediction, I had to get really specific in a way I didn’t have to in my hypothesis. When our hypothesis is a pattern, such as “higher germination rates under drier conditions,” it can seem like it is both a hypothesis and a prediction. However, remember that a hypothesis is meant to be as general as possible—it explains how or why the world generally works the way it does.
Meanwhile, a prediction is a specific outcome we expect, complete with all the trappings of the context. In this particular case, it might explicitly define what we mean by “drier conditions” (e.g., only 10 mL of water applied once per day), how much higher we expect germination rates to be (e.g., “statistically significantly higher”), and how we intend to measure “successful” germination (e.g., survival after 3 days sitting on moist towelettes).
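To make “statistically significantly higher” concrete, here is one way a prediction like this might eventually be checked: a two-proportion z-test on germination counts. Everything below is hypothetical, a sketch only; the counts, sample sizes, and the 10 mL treatment label are invented for illustration, not results from any real study.

```python
import math

# Hypothetical counts: seeds germinated out of 100 planted per treatment.
germinated_drier, n_drier = 78, 100    # "drier" = 10 mL water once per day
germinated_wetter, n_wetter = 61, 100  # "wetter" comparison group

p_drier = germinated_drier / n_drier
p_wetter = germinated_wetter / n_wetter
pooled = (germinated_drier + germinated_wetter) / (n_drier + n_wetter)

# Two-proportion z-test: is the drier-condition germination rate
# significantly higher than the wetter-condition rate?
z = (p_drier - p_wetter) / math.sqrt(
    pooled * (1 - pooled) * (1 / n_drier + 1 / n_wetter)
)
print(f"z = {z:.2f}")  # → z = 2.61; |z| > 1.96 is significant at the 5% level
```

With these made-up numbers, the difference would clear the conventional 5% threshold, so the prediction would count as observed. The point of writing the prediction this precisely is exactly that a calculation like this becomes possible at all.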
As such, a prediction is specifically linked to the set of circumstances we create to hopefully yield it. This (sometimes contrived) set of circumstances is called a test. When we test our hypothesis, we do so in a deliberately constructed version of the world—a microcosm—that allows us to observe how things behave under controlled/known conditions.
Sometimes, this microcosm will be designed to “behave normally.” Other times, it may be designed to behave “abnormally.” Whenever we’re comparing an “abnormal” world to a “normal” one, we’re doing an experiment. For example, if we’re investigating how black raspberry seeds might germinate if thrust into a climate-changed world, that might necessitate creating “abnormal” conditions.
Whenever we’re comparing two or more “normal(ish)” worlds to each other, meanwhile (e.g., we’re comparing two different ways black raspberry seeds successfully germinate naturally to each other), we’re doing an observational study. Most scientists regularly do both kinds of tests.
Either way, a good test creates circumstances in which it’s likely we’ll observe our prediction if our hypothesis is correct but unlikely if it isn’t. If scraping seeds for exactly 30 seconds is best, we should see the best results when we scrape for 30 seconds and worse results when we scrape for 1 minute or 3 seconds instead. However, if 30 seconds isn’t ideal, that should be obvious too.
There is no one way to design a good test: it requires tons of creativity! But the general way is to get a large number and variety of items to study (e.g., many trays worth of black raspberry seeds gathered from all over) and then split them into groups or along an existing continuum (e.g., different amounts of scraping) such that there is more variation in the factors you think matter than in all other factors (under your control).
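As a sketch of that “split into groups” step, here is how one might randomly assign seed trays to treatment levels so that every other factor (source site, seed age, etc.) gets spread evenly across the groups. The tray count, tray IDs, and scraping durations below are all hypothetical, chosen only to illustrate the idea:

```python
import random

# Hypothetical setup: 60 trays of black raspberry seeds, gathered from
# many source sites, split evenly across three scraping durations.
trays = [f"tray_{i:02d}" for i in range(60)]
scrape_seconds = [3, 30, 60]  # the factor we think matters

random.seed(42)        # fixed seed so the assignment is reproducible
random.shuffle(trays)  # randomization spreads all other factors across groups

# Deal the shuffled trays round-robin into one group per treatment level.
groups = {
    sec: trays[i::len(scrape_seconds)]
    for i, sec in enumerate(scrape_seconds)
}

for sec, members in groups.items():
    print(f"{sec:>2} s scraping: {len(members)} trays")  # 20 trays each
```

The design principle is the same whatever the study: let the factor you care about vary deliberately, and let randomization (plus sample size) wash out everything else.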
To recap so far:
Most professional scientists (ones able to cope with doing science professionally every day) are fueled by curiosity.
They form their curiosities into good scientific questions. A good question identifies what we don’t know but should know, based on what we do know, and it does so as specifically as possible.
A good hypothesis states a possible answer to the question—it proposes a missing process or pattern whose validity we can then assess.
A good prediction is what you’d specifically observe, under your specifically designed conditions, if your hypothesis is right—and ideally, only if it's right. That is, the predicted outcome shouldn't be just as likely to arise as the result of some alternative explanation.
A good test should produce results aligning with your prediction when your hypothesis is correct but, ideally, not when your hypothesis is flawed.
As you can see, all these parts of the scientific process are linked!
Whenever I get the opportunity to mentor junior scientists, I like to put them through a series of exercises at the beginning of their project, and we meet often to discuss them. These are:
Find a general topic (related to my research agenda, of course! All life’s a cage…) that makes them curious.
Identify a specific aspect of that subject that makes them curious and then form that curiosity into a scientific question.
Do some digging and pondering: What might be going on here??
Form those thoughts into a hypothesis tied to the unknown that sparked the mentee’s curiosity in the first place.
Formulate an outcome (prediction) the mentee could imagine seeing if their hypothesis were right.
Cook up a set of circumstances that should yield results in line with that prediction if our hypothesis is right. The simpler the set of circumstances, the better!
Write this all up (this is often my first chance to really get a feel for a mentee’s science writing skills!).
There’s so much that can be learned and explored during this process, on both sides. What kind of mentorship does the mentee want or need? What does each party expect from the relationship? Is there a line of inquiry that only the one with "fresh eyes" could see? What does the mentee want out of their professional life? All this and more tends to emerge organically from this process.
After we work through those steps, we come to the culminating activity: I ask every mentee to sketch the graph (or graphs) they think they’ll present at the end of the project, the one that will show support for our hypothesis (or not). This is our “target graph”—our visual prediction, our outcome-in-waiting, the thing we’re doing all this to eventually receive.
What kind of graph is it? Why this type and not some other type? What patterns or trends can be seen in it? What groups or ranges are present? What are the units on the axes? What would the caption need to clarify or explain?
I cannot stress this enough: I truly feel that one of the biggest mistakes a scientist can make is to treat the data they ultimately want from their project as an afterthought, as if the project is the “whole” and the data are a “byproduct.”
No—your data are the point! If there are challenges ahead with respect to getting and analyzing the data you really want, we want to catch and fix those problems early! “Later” may be too late! This graphing exercise helps reveal these kinds of issues better than any other one I’ve found so far.
Say your y-axis is labeled “seed health.” What does that mean? How will you measure it? Can you access the right tools to do so? Could someone reasonably argue that your measure of seed health isn’t valid or logical? If these questions make you squirm, that’s a sign you might need to back up and refine your thinking before moving forward.
Ultimately, our “target graph” is just our prediction, but it’s our prediction visualized, with units, scales, variables, relationships, etc. all made explicit. Here’s an example I sketched out:
Germination rate is on the y-axis in sprouts per 10 seeds; frequency of rains is on the x-axis in rain events per week; and the graph is a scatterplot showing a negative relationship with a trend line.
As I wrote the caption above, I realized there’s a lot that would need to be clarified further in the caption for this figure, such as “How were the seeds planted and treated?” That means I should probably start thinking about those details of the test now. Such are the many benefits of this exercise.
As the above example illustrates, these graphs do not have to be all that complex to do their job. In fact, simpler graphs often translate into simpler, more focused tests! If you find yourself trying to draw “3D” graphs or reaching for a whole box of colored pencils to represent all the different sub-groups, it’s a sign you could be overcomplicating your test.
One of the most important things, at this point, is that you can put units on your axes that you feel you could defend—if you can, then you have probably identified a specific enough line of inquiry to be successful!
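Because a good target graph carries defendable units, you can even mock it up with placeholder numbers and fit the trend line you hope to see. The points below are invented purely to mirror the sketch described above; only the axes and units come from it:

```python
# A "target graph" expressed as data. These are hypothetical placeholder
# points, not measurements: germination rate (y) vs. rain frequency (x).
rain_per_week = [0, 1, 2, 3, 4, 5]   # x: rain events per week
sprouts_per_10 = [9, 8, 7, 5, 4, 2]  # y: sprouts per 10 seeds

# Least-squares slope of the hoped-for trend line (negative slope =
# germination rates drop as springs get wetter, matching the hypothesis).
n = len(rain_per_week)
mean_x = sum(rain_per_week) / n
mean_y = sum(sprouts_per_10) / n
slope = sum(
    (x - mean_x) * (y - mean_y)
    for x, y in zip(rain_per_week, sprouts_per_10)
) / sum((x - mean_x) ** 2 for x in rain_per_week)

print(f"trend slope: {slope:.2f} sprouts per extra rain event")  # → -1.40
```

Even a mock-up this crude forces the useful questions: Are "rain events per week" something I can actually control or measure? Is "sprouts per 10 seeds" the rate I can defend? If the placeholder numbers feel impossible to ever replace with real ones, that is worth knowing now.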
So, while this step of making a “target graph” isn’t a requirement, I highly recommend it. Doing it will help ensure your project is more defensible, easier to plan, and more focused, but there is another major benefit: it gives your project a clear direction. You are largely “done” with your project once you have your “target graphs” in hand! After you’ve built them, a research project becomes just a series of things you need to do to get closer to making your “target graphs.” Given how quagmire-y research projects can become, the clarity your “target graph” provides isn’t a luxury—it’s a guiding light. When your project starts to wander (and it will), your target graph can point you back toward what really matters.