Specs vs. Tests

There’s something to this BDD kool-aid that people have been drinking lately…

As part of the Rails project I’ve been working on for the last few weeks, I’ve been using RSpec. RSpec is a unit testing tool similar in spirit to JUnit or Test::Unit. However, RSpec uses an alternative syntax that reads more like a specification than like a test. Let me show you what I mean.

In Java, using JUnit, we might write the following unit test:
    import junit.framework.TestCase;

    public class BowlingGameTest extends TestCase {
        private Game g;

        // Runs before every test method, so each test starts with a fresh Game.
        protected void setUp() throws Exception {
            g = new Game();
        }

        // Rolls the same number of pins n times.
        private void rollMany(int n, int pins) {
            for (int i = 0; i < n; i++) {
                g.roll(pins);
            }
        }

        public void testGutterGame() throws Exception {
            rollMany(20, 0);
            assertEquals(0, g.score());
            assertTrue(g.isComplete());
        }

        public void testAllOnes() throws Exception {
            rollMany(20, 1);
            assertEquals(20, g.score());
            assertTrue(g.isComplete());
        }
    }
This is pretty typical for a Java unit test. The setUp function builds the Game object, and then the various test functions make sure that it works in each different scenario. In Ruby, however, this might be expressed using RSpec as:
    require 'rubygems'
    require_gem "rspec"
    require 'game'

    context "When a gutter game is rolled" do
      setup do
        @g = Game.new
        20.times { @g.roll 0 }
      end

      specify "score should be zero" do
        @g.score.should == 0
      end

      specify "game should be complete" do
        @g.complete?.should_be true
      end
    end

    context "When all ones are rolled" do
      setup do
        @g = Game.new
        20.times { @g.roll 1 }
      end

      specify "score should be 20" do
        @g.score.should == 20
      end

      specify "game should be complete" do
        @g.complete?.should_be true
      end
    end
At first blush the difference seems small. Indeed, the RSpec code might seem too verbose and fine-grained. At least that was my first impression when I first saw RSpec. However, having used it now for several months I have a different reaction.

First, let’s look at the semantic differences. In JUnit you have TestCase derivatives and test functions. Each TestCase derivative has a setUp and tearDown function, and a suite of test functions. In RSpec you have what appears to be an extra layer. You have the test script, which is composed of context blocks. The contexts have setup, teardown, and specify blocks.

At first you might think that the RSpec context block corresponds to the Java TestCase derivative, since they are semantically equivalent. However, Java throws something of a curve at us by only allowing one public class per file. So from an organizational point of view there is a stronger equivalence between the TestCase derivative and the whole RSpec test script.

This might seem petty. After all, I can write Java code that is semantically equivalent to the RSpec code simply by creating two TestCase derivatives in two different files. But separating those two test cases into two different files makes a big difference to me. It breaks apart things that otherwise want to stay together.

Now it’s true that I could keep the TestCase derivatives in the same file by making them package scope, and manually put them into a public TestSuite class. But who wants to do that? After all, my IDE is nice enough to find and execute all the public TestCase derivatives, which completely eliminates the need for me to build suites—at least at first.

Note: The JDave tool provides BDD syntax for Java.

Again, this might seem petty; and if that were the only benefit to the RSpec syntax I would agree. But it’s not the only benefit.

Strange though it may seem, the next benefit is the strings that describe the context and specify blocks. At first I thought these strings were just noise, like the strings in the JUnit assert functions. I seldom, if ever, use the JUnit assert strings, so why would I use the context and specify strings? But over the last few weeks I have come to find that, unlike the JUnit assert strings, the RSpec strings put a subtle force on me to create better test designs.

Stable State: An Emergent Rule.

When a spec fails, the message that gets printed is the concatenation of the context string and the specify string. For example: 'When a gutter game is rolled game should be complete' FAILED. If you word the context and specify strings properly, these error messages make nice sentences. Since, in TDD, we almost always start out with our tests failing, I see these error messages a lot. So there is a pressure on me to word them well.
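
To make the concatenation concrete, here is a minimal sketch in plain Ruby. It is not how RSpec is implemented; MiniContext, mini_context, and the deliberately incomplete Game class are hypothetical names invented for this illustration. It only shows how a context string and a specify string could be joined into the kind of failure message quoted above.

```ruby
# Hypothetical mini-harness for illustration only; not RSpec's implementation.
class MiniContext
  attr_reader :failures

  def initialize(description)
    @description = description
    @setup = nil
    @failures = []
  end

  # Remember the block that drives the system into the state under test.
  def setup(&block)
    @setup = block
  end

  # Run setup on a fresh scope object, then evaluate one specification in it.
  # The specification block passes when it returns a truthy value.
  def specify(description, &block)
    scope = Object.new
    scope.instance_eval(&@setup) if @setup
    passed = scope.instance_eval(&block)
    @failures << "'#{@description} #{description}' FAILED" unless passed
  end
end

# Evaluate a context block and return the list of failure messages.
def mini_context(description, &block)
  ctx = MiniContext.new(description)
  ctx.instance_eval(&block)
  ctx.failures
end

# Assumed, deliberately incomplete Game: it totals pins but never
# reports completion, so one specification below fails.
class Game
  def initialize
    @rolls = []
  end

  def roll(pins)
    @rolls << pins
  end

  def score
    @rolls.sum
  end

  def complete?
    false
  end
end

failures = mini_context "When a gutter game is rolled" do
  setup { @g = Game.new; 20.times { @g.roll 0 } }
  specify("score should be zero") { @g.score == 0 }
  specify("game should be complete") { @g.complete? }
end

puts failures
# prints: 'When a gutter game is rolled game should be complete' FAILED
```

Note that the setup block is the only place in the sketch where the Game is changed; each specify block merely asks a question of the object it was handed.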

But by wording them well, I am constrained to obey a rule that JUnit never put pressure on me to obey. Indeed, I didn’t know it was a rule until I started using RSpec. I call this rule Stable State. It is:

Tests don’t change the state.

In other words, the functions that make assertions about the state of the system do not also change the state of the system. The state of the system is set up once in the setUp function, and then only interrogated by the test functions.

If you look carefully at the specification of the Bowling Game you will see that the state of the Game is changed only by the setup block within the context blocks. The specify blocks simply interrogate and verify state. This is in stark contrast to the JUnit tests in which the test methods both change and verify the state of the Game.

If you don’t follow this rule it is hard to get the strings on the context and specify blocks to create error messages that read well. On the other hand, if you make sure that the specify blocks don’t change the state, then you can find simple sentences that describe each context and specify block. And so the subtle pressure of the strings has a significant impact on the structure of the tests.

I can’t claim to have discovered the pressure of these strings. Indeed, Dan North’s original article on the topic is captivating. However, I felt the pressure and came to the same conclusion he did well before I read his article, simply by using a tool inspired by his work.

The benefit of Stable State is that for each set of assertions there is one, and only one, place where the state of the system is changed. Moreover, the three-level structure provides natural places for groups of states, states, and asserts.

The demise of the One Assert rule.

There have been other rules like this before. One that circulated a few years back was:

One assert per test.

I never bought into this rule, and I still don’t. It seems arbitrary and inefficient. Why should I put each assert statement into its own test method when I can just as well put all the assert statements into a single test method?

In other words, why prefer this:
    public void testGutterGameScoreIsZero() throws Exception {
        rollMany(20, 0);
        assertEquals(0, g.score());
    }

    public void testGutterGameIsComplete() throws Exception {
        rollMany(20, 0);
        assertTrue(g.isComplete());
    }
over this:
    public void testGutterGame() throws Exception {
        rollMany(20, 0);
        assertEquals(0, g.score());
        assertTrue(g.isComplete());
    }

I think the authors of the One Assert rule were trying to achieve the benefits of Stable State, but missed the mark. It’s as though they could smell the rule out there, but couldn’t quite pinpoint it.

The State Machine metaphor

When you follow the Stable State rule, your specifications (tests) become a description of a Finite State Machine. Each context block describes how to drive the system under test (SUT) to a given state, and then the specify blocks describe the attributes of that state.

Dan North calls this the Given-When-Then metaphor. Consider the following triplet:

Given a Bowling Game: When 20 gutter balls are rolled, Then the score should be zero and the game should be complete.

This triplet corresponds nicely to a row in a state transition table. Consider, for example, the subway turnstile state machine:

    Current State   Event   New State
    -------------   -----   ---------
    Locked          coin    Unlocked
    Unlocked        pass    Locked
    Locked          pass    Alarm
    Unlocked        coin    Unlocked

We can read this as follows:

GIVEN we are in the Locked state, WHEN we get a coin event, THEN we should be in the Unlocked state. GIVEN we are in the Unlocked state, WHEN we get a pass event, THEN we should be in the Locked state. etc.
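
As a rough sketch of how that table can be written down in code, the four rows of the turnstile table map directly onto a Ruby hash keyed by state and event. The names TRANSITIONS and next_state are hypothetical, chosen only for this illustration.

```ruby
# Transition table from the turnstile example:
# [current state, event] => new state.
TRANSITIONS = {
  [:locked,   :coin] => :unlocked,
  [:unlocked, :pass] => :locked,
  [:locked,   :pass] => :alarm,
  [:unlocked, :coin] => :unlocked
}.freeze

# Hypothetical helper: look up the next state, failing loudly when the
# table has no row for the given state and event.
def next_state(state, event)
  TRANSITIONS.fetch([state, event]) do
    raise ArgumentError, "no transition for #{state} on #{event}"
  end
end

# GIVEN the Locked state, WHEN a coin event arrives, THEN we are Unlocked.
puts next_state(:locked, :coin)    # prints "unlocked"

# GIVEN the Unlocked state, WHEN a pass event arrives, THEN we are Locked.
puts next_state(:unlocked, :pass)  # prints "locked"
```

Each Given-When-Then sentence becomes one lookup: the Given and When form the key, and the Then is the value.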

Describing a system as a finite state machine has certain benefits.

  1. We can enumerate the states and the events, and then make sure that every combination of state and event is handled properly.
  2. We can formalize the behavior of the system into a well known tabular format that can be read and interpreted by machines.
    • I am, of course, thinking about FitNesse
  3. There are well known mechanisms for implementing finite state machines.
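
The first benefit can be sketched mechanically. Assuming the turnstile table is held in a hash (TURNSTILE_TABLE is a hypothetical name used only here), a few lines of Ruby can enumerate every state and event and list the combinations the table does not cover.

```ruby
# The turnstile table again: [current state, event] => new state.
TURNSTILE_TABLE = {
  [:locked,   :coin] => :unlocked,
  [:unlocked, :pass] => :locked,
  [:locked,   :pass] => :alarm,
  [:unlocked, :coin] => :unlocked
}.freeze

# Every state that appears as a starting state or as a destination.
states = (TURNSTILE_TABLE.keys.map(&:first) + TURNSTILE_TABLE.values).uniq

# Every event that appears in the table.
events = TURNSTILE_TABLE.keys.map(&:last).uniq

# All state/event combinations that have no row in the table.
unhandled = states.product(events).reject { |pair| TURNSTILE_TABLE.key?(pair) }

unhandled.each { |state, event| puts "unhandled: #{state} on #{event}" }
# prints:
#   unhandled: alarm on coin
#   unhandled: alarm on pass
```

For the four rows shown above, the enumeration points out that nothing says what happens once the turnstile is in the Alarm state, which is exactly the kind of gap this style of description is meant to expose.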

The point is that organizing the system description in terms of a finite state machine can have a profound impact on the system design and implementation.

The Butterfly Effect.

I find it remarkable that two dumb annoying little strings put a subtle pressure on me to adjust the style of my tests. That change in style eventually caused me to see the design and implementation of the system I was writing in a very new and interesting light.


Comments

  1. Chris Hedgate about 16 hours later:

    I really like the Stable State idea, and I think you are probably right that One Assert Per Test really was about this but did not quite get there. However, if you are using BDD and following Stable State, does that not more or less make One Assert Per Test a given as well? In your example above you have several specify blocks, each with a single something.should specification. If they were all lumped together in one specify, writing the string for that would again be impossible. And I think TDD can work the same way, which is why I have always tried to abide by One Assert Per Test. I recently wrote about this in One Assert Per Test should come natural.

  2. Michael Feathers about 2 hours later:

    One nit I have with BDD style is the fact that the test comment string so closely reflects the code in the case. It feels like duplication.

    When you’ve worked to be able to say @g.score.should == 0, it feels weird to have to write “score should be zero.” Granted, you write the string before the code, but it still looks odd after the fact. Makes you wonder whether a framework written in fluent style could generate the string.

  3. Chris Hedgate 20 minutes later:

    Michael: I do not know about RSpec (have just tried it once), but Specter (a similar framework for .Net) does that. Here is an example (Specter specs are written in Boo):

    context "When input is a phrase marked with asterisks":
      output as string
    
      setup:
        transformer = ContextileTransformer()
        output = transformer.ToHtml("*foo*")
    
      specify output.Must.Not.BeEmpty()
    
      specify "Phrase is given strength in output":
        output.Must.Equal("<p><strong>foo</strong></p>")
    

    When this is run in a test runner the output is something like:

    WhenInputIsAPhraseMarkedWithAsterisks
      outputMustNotBeEmpty
      PhraseIsGivenStrength
    
  4. David Chelimsky 4 minutes later:

    Chris – that looks pretty interesting.

    Michael – while the duplication that you are describing is easy to end up with, there really is plenty of flexibility in what you write. You could, for example, say:

    A bowling game
    - should score an all gutter game correctly
    - should score an all ones game correctly
    - should score an all spares game correctly
    - should score an all strikes game correctly
    

    or

    A bowling game should produce the correct score
    - given an all gutter game
    - etc
    

    One thing we’re working on is some means of nesting contexts and/or specifications. So you could do something like this:

    Bowling game behaviour
    - should score correctly
      - when game is all gutters
      - when game is all ones
      - when game is all spares
      - when game is all strikes
    - should consider the game complete
      - after 20 0s
      - after 20 1s
      - after 21 5s
      - after 12 Xs
    

    Not saying these are “right”, just that they become possible. Note how the last example begins to feel more like a description of behaviour – not just because of the word, but because of the nesting structure. An x should do y under conditions a,b,c and d.

    Food for thought.

  5. David Chelimsky 1 minute later:

    More thoughts – the process through which specs evolve is going to have an impact on what goes in the name and what goes in the code. “should score 0” might have come from a customer who said “this is what the score should be for an all gutter game”. That name then serves as definition up front, documentation later.

  6. Tobias Grimm 20 minutes later:

    Recently I started using RSpec for the first time. Being used to Assert.*, I’ve always been sceptical about the “pseudo-natural-language-style” expressions. I still don’t see that much of a difference (from a programmer’s point of view) whether I write “Assert.Equals(expected, actual)” or “actual.should == expected”.

    But what I absolutely love about RSpec is the way it makes me think about what I want to code. In the “normal” TDD way I’ve always been kinda more focused on the design of the class currently under test, making me lose focus on what I really wanted to do. BDD forces me to focus more on application behaviour and helps me to stay on track. And just like you, I think this is mainly because of the context/specification style of writing tests.

    With TDD the test method names often technically describe the class under test and how it is to be used. In BDD style the test methods (=specifications) describe what this part of my code is good for after all. The technical stuff then moves to the body of the specification.

    I’ve now even started to write my NUnit tests in BDD style, which works pretty well. Each TestFixture is a context (defined in SetUp) and each Test is a specification. Of course this doesn’t give me the nice failure messages that RSpec produces, but it seems to work too. I think David Chelimsky somewhere said something like “BDD is doing TDD the right way.”

    Regarding the “Stable State” thing – I try to follow this rule, but sometimes it just doesn’t seem to fit and I break it. Maybe a sign, that I haven’t fully adopted the BDD-style yet.

  7. YAChris about 2 hours later:

    Hmmmmmmmmmmm…

    I’m with you on the “One Assert rule”. Recently, I’ve been working on some code which is state-based, so I wind up with:

    State state = State.START;
    state = stateMachine.step(state, "Foo");
    assertEquals(State.PAST_FOO, state);
    state = stateMachine.step(state, "Bar");
    assertEquals(State.SKIP, state);
    state = stateMachine.step(state, "Baz");
    assertEquals(State.PAST_FOO_AND_BAZ, state);
    

    where it’s necessary to have changing state, because the interesting bit is that there can be many SKIP situations interleaved, BUT we have to remember where we were before the SKIP, to wind up in the right state at the end.

    So, the question is, in Specs-land would this have to become three complete Contexts? That seems unfortunate. Maybe there is a better way to do it?

  8. Michael Feathers about 3 hours later:

    State is a slippery thing. Uncle Bob’s bowling game, as presented, is stateful, but imagine a different problem: you need to create a command object which accepts an array of throws and returns the score. The object doesn’t have mutable state, so technically no spec would alter state, but without a need for setup you could easily end up without all of the contexts that Bob found.

    Seems that the benefit that BDD provides in this context lessens to the degree that you move toward less stateful objects, and that seems to happen among people who do a lot of interaction-style TDD.

  9. Paul Holser 1 day later:

    I think the intent behind the “One Assert Per Test” rule was to get you thinking of “TestCases” as “fixtures” instead - wherein setUp() puts the system or a slice thereof in a specific state, and the testX() methods each contain one assertion about the state of the system that should hold if the system is working correctly. So in the first code listing under “The Demise of the ‘One-Assert’ Rule”, you’d factor out the multiple rollMany(20,0) calls into a setUp() - just like the RSpec specification for the gutter game does. It may very well be that the “tests” for a particular class, then, get spread out across many fixtures, instead of feeling like you have to place all tests for a given class in the same TestCase derivative. The fixture names corresponding to system states, not classes. At least that’s how I’ve read it.

    It’s interesting to see that RSpec facilitates thinking about aspects of the system being developed in terms of those system states, and not so much a one-spec-per-class mindset.

  11. Steven Baker 3 days later:

    Michael: RSpec includes a great mocking and stubbing framework (derived from SchMock, but now more closely resembling Mocha).

    Myself, and many of the others working with RSpec hardly use state-based specifications at all. I work almost exclusively with mocks in RSpec in most of the projects I use it.

  12. Jason Gorman 7 days later:

    Uncle Bob: In an interview I did with you for the (now defunct) objectmonkey.com site, didn’t you tell me that you didn’t care if TDD was like formal specification? Have you changed your mind since then?

  13. Paul Davis 7 days later:

    After reading this article, I’m wondering…. Could we use jRuby to write RSpec tests against java code? Damn Uncle Bob, I’m going to lose another weekend because of you. ;-)

  14. Uncle Bob 8 days later:

    rspec in jRuby… now that’s an interesting thought…

  15. Uncle Bob 8 days later:

    Jason, You’ll have to refresh my memory about that interview and the context of it. I’ve been making the “formal document” argument for at least five years.

  16. Jason Gorman 13 days later:

    Alas, I don’t have the original manuscript to hand, but from memory I think I asked you if TDD was formal specification by the back door, and I distinctly recall you saying you didn’t care if it was. The title of the interview was “Getting Sh*t Done”, if that helps establish the context :-)

  17. Jason Gorman 13 days later:

    That’s not to say I don’t totally agree with what you’re saying now. I think any movement towards higher integrity specs – executable specs – is progress. It all sounds very familiar to me – I think I’ve been doing BDD right from the get go since I started doing TDD – so I’m bound to draw the comparison now.

  18. Jason Gorman 13 days later:

    Courtesy of the WaybackWhen web cache (interview from 2003, I think):

    ObjectMonkey: Here’s a hot potato for you – is Test-driven Development really Formal Methods in disguise?

    Uncle Bob: Test-driven development is the most profound and auspicious thing to happen to the software industry since I’ve been a programmer. I think it’s even more important than OO.

    ObjectMonkey: I’m inclined to agree.

    Uncle Bob: Nothing has had such a profound effect upon the way I write code than TDD. Nothing. When I write code now, I run tests every few minutes. My stuff is always working. I never have windows all over my screen with modules torn apart, hoping I can one day piece them back together again. Every minute or two I run tests, and get my stuff working. I don’t use debuggers anymore. Debuggers are a drug. You get addicted to them. They drag you down a rat hole. You spin and spin, trying to set your breakpoints, trying to follow the logic, trying to figure out what the hell is going on. With TDD, that all but goes away. I haven’t used a debugger in anger in over three years. And I chide anyone I see who is using one. So I don’t care whether there is a link between TDD and FM. TDD is a great boon to me, and to software in general.
