http://smc.sourceforge.net/
http://www.embedded.com/2000/0012/0012feat1.htm#ref
http://johnsantic.com/comp/state.html
http://www.complang.org/ragel/
http://snippets.dzone.com/posts/show/3793
http://www.drdobbs.com/184401236;jsessionid=P1E04N3XOT31HQE1GHOSKHWATMY32JVN?pgno=1
Ragel State Machine Compiler
Ragel compiles executable finite state machines from regular languages. Ragel targets C, C++, Objective-C, D, Java and Ruby. Ragel state machines can not only recognize byte sequences as regular expression machines do, but can also execute code at arbitrary points in the recognition of a regular language. Code embedding is done using inline operators that do not disrupt the regular language syntax.
The core language consists of standard regular expression operators (such as union, concatenation and Kleene star) and action embedding operators. The user's regular expressions are compiled to a deterministic state machine and the embedded actions are associated with the transitions of the machine. Understanding the formal relationship between regular expressions and deterministic finite automata is key to using Ragel effectively.
Ragel also provides operators that let you control any non-determinism that you create, construct scanners, and build state machines using a statechart model. It is also possible to influence the execution of a state machine from inside an embedded action by jumping or calling to other parts of the machine, or reprocessing input.
Ragel provides a very flexible interface to the host language that attempts to place minimal restrictions on how the generated code is integrated into the application. The generated code has no dependencies.
action dgt { printf("DGT: %c\n", fc); }
action dec { printf("DEC: .\n"); }
action exp { printf("EXP: %c\n", fc); }
action exp_sign { printf("SGN: %c\n", fc); }
action number { /*NUMBER*/ }
number = (
[0-9]+ $dgt ( '.' @dec [0-9]+ $dgt )?
( [eE] ( [+\-] $exp_sign )? [0-9]+ $exp )?
) %number;
main := ( number '\n' )*;
||
\/
=>
st0:
if ( ++p == pe )
goto out0;
if ( 48 <= (*p) && (*p) <= 57 )
goto tr0;
goto st_err;
tr0:
{ printf("DGT: %c\n", (*p)); }
st1:
if ( ++p == pe )
goto out1;
switch ( (*p) ) {
case 10: goto tr5;
case 46: goto tr7;
case 69: goto st4;
case 101: goto st4;
}
if ( 48 <= (*p) && (*p) <= 57 )
goto tr0;
goto st_err;
Writing robust protocol implementations.
Parsing data formats.
Lexical analysis of programming languages.
Validating user input.
Construct finite state machines using:
regular language operators
state chart operators
a scanner operator
some mix of the above
Embed actions into machines in arbitrary places.
Control non-determinism using guarded operators.
Minimize state machines using Hopcroft's algorithm.
Visualize output with Graphviz.
Use byte, double byte or word-sized alphabets.
Generate C, C++, Objective-C, D, Java or Ruby code with no dependencies.
Choose from table or control flow driven state machines.
[1]
Adrian D. Thurston. "Parsing Computer Languages with an Automaton Compiled from a Single Regular Expression." In 11th International Conference on Implementation and Application of Automata (CIAA 2006), Lecture Notes in Computer Science, volume 4094, pp. 285-286, Taipei, Taiwan, August 2006. pdf.
Ragel has a user guide available in PDF format as well as a man page. Major version number releases contain language changes. See the ChangeLog and Release Notes for details.
If you use Vim, there is a syntax file ragel.vim for your editing pleasure. If you use TextMate there is a Ragel bundle Ragel.tmbundle.
The Ragel mailing list is available here: ragel-users. Ask for help, post parsing problems, or tell us what you think of Ragel.
Using Ragel and XCode. link
Mongrel is an HTTP library and server for Ruby.
Hpricot is an HTML parsing and manipulation library for Ruby.
RFuzz is an HTTP destroyer.
Zed Shaw on Ragel State Charts. link
SuperRedCloth is a snappy implementation of Textile for Ruby.
json A JSON parser and generator for Ruby.
A JSON parser for Pike. link
Utu: internet communication with cryptographically enforced identity, reputation and retribution.
appid: single-pass application protocol identification.
RaSPF is an SPF library in C.
Layout Expression Language (part of Profligacy) is for building Swing GUIs with JRuby.
Qfsm is a graphical tool for designing state machines. It includes a Ragel export feature.
ABNF Generator is a tool which accepts grammars in ABNF and outputs Ragel definitions.
Perian is a QuickTime component that adds native support for many popular video formats.
devChix article: A Hello World for Ruby on Ragel. (updated)
A Brazilian Portuguese translation of the above: Um Hello World para o Ruby em Ragel 6.0
Using Ragel for scanning wikitext. link.
An article in Japanese on using Ragel. link.
A little assembler that uses Ragel for scanning and Lemon for parsing. link.
An ESI server derived from Mongrel. link.
CroMo: Morphological analysis of Croatian and other languages. link.
An include file dependency scanner, mostly for C files. link.
EaRing, an assembler using Ragel and Lemon. link.
Screenplay typesetting. link.
Source code line counting. link.
An implementation of W3C Selectors in Java. link.
There are Ragel lexers for the Pygments syntax highlighting system. link.
Lighttpd sandbox (which will become 2.0) uses Ragel. link.
Article on use of ragel in embedded systems.
Calculating the mean selector specificity, using a Ragel-generated CSS3 parser. project, blog post.
Clang: a scanner for a simple C like language. clang.rl
Mailbox: parses unix mailbox files. It breaks files into separate messages, the headers within messages and the bodies of messages. mailbox.rl
AwkEmu: performs the basic parsing that the awk program performs on input. awkemu.rl
Atoi: converts a string to an integer. atoi.rl
Concurrent: performs multiple independent tasks concurrently. concurrent.rl
StateChart: the Atoi machine built with the named state and transition list paradigm. statechart.rl
GotoCallRet: demonstrates the use of fgoto, fcall, fret and fhold. gotocallret.rl
Params: parses command line arguments. params.rl
RagelScan: scans ragel input files. rlscan.rl
CppScan: A c++ scanner that uses the longest-match method of scanning cppscan.rl
Development Version: Please use this for creating patches.
http://svn.complang.org/ragel/trunk/
Tar.Gz: The latest release is version is ragel-6.6.tar.gz (sig).
Older: Previous versions are available here.
Debian: The homepage for the Debian package of Ragel is here. It is by Robert Lemmen.
OpenPKG: Ragel has been included in the OpenPKG project.
FreeBSD: A port for Ragel is available in the FreeBSD ports system.
NetBSD: There is a package for Ragel in the pkgsrc database.
Mac OS X: A port is available in the MacPorts repository.
Crux: A port for the Crux Linux distribution is available here.
Gentoo: A Gentoo port is available.
Suse: Packages for Suse can be found here.
Windows: Ragel can be compiled using Cygwin or MinGW. Binaries compiled with visual studio are here (6.6), provided by Joseph Goettgens.
Redhat/Fedora: A package for Ragel is available in Fedora Extras.
Slackware: A package is available at LinuxPackages.net
The public key for package signing is here.
Ragel is released under the GNU General Public License. A copy of the license is included in the distribution. It is also available from GNU.
Note: parts of Ragel output are copied from Ragel source covered by the GNU GPL. As a special exception to the GPL, you may use the parts of Ragel output copied from Ragel source without restriction. The remainder of Ragel output is derived from the input and inherits the copyright status of the input file. Use of Ragel makes no requirements about the license of generated code.
Ragel was written by Adrian Thurston. It was originally developed in early 2000 and was first released January 2002. Many people have contributed feedback, ideas and code. Please have a look at the CREDITS file.
I'm vaguely plotting a finite state machine -> C compiler (as toy code, not for a serious project - I know there are plenty already out there), so thought I'd write a sample one by hand to see how it should look. Here's the code for it.
This checks if stdin contains the phrase 'foo' or 'bar'
#include <stdio.h>
main()
{
int c;
START:
switch(c = getchar()){
case 'f' : goto F;
case 'b' : goto B;
case EOF : goto FAIL;
default: goto START; }
F:
switch(c = getchar()){
case 'o' : goto FO;
case EOF : goto FAIL;
default : goto START;}
FO:
switch(c = getchar()){
case 'o' : goto SUCCESS;
case EOF : goto FAIL;
default : goto START;}
B:
switch(c = getchar()){
case 'a' : goto BA;
case EOF : goto FAIL;
default : goto START;}
BA:
switch(c = getchar()){
case 'r' : goto SUCCESS;
case EOF : goto FAIL;
default : goto START;}
FAIL:
printf("Does not match.\n");
return;
SUCCESS:
printf("Matches.\n");
return;
}
One common way of conquering difficult software design problems is to use a state machine. First you figure out all the states the software can be in. Then you determine all the inputs to the state machine—all the events that can cause the state machine to take some action or to change states. Finally you determine the state machine outputs—all the actions that the state machine can perform.
When your state machine design is done, you'll have a list of states, a list of events (inputs), and a set of action procedures for each state that describe what the state machine does for each event (outputs).
There are two ways to code a state machine in C. One way uses a set of nested switch statements. The outer switch has a case for each possible state. Each of these outer cases has an inner switch with a case for each possible event. The actual code that gets selected performs the actions for that state/event. Alternately, the outer switch could have a case for each event, and the inner switch could have a case for each state.
Another more concise way of coding is to use a lookup table. First, number all your states consecutively, starting with 0—an enum is a convenient way to do this. Do the same for your events. Then make up a set of tables, one table per state. Each table has one entry per event, in the same order as the event enum. Then the entire set of tables is arranged in the same order as the state enum. Each item in a table is the function to execute to perform the action for that particular event in that particular state.
The listing below is an example with three states and two events, and therefore six action procedures.
/* Define the states and events. If your state machine program has multiple
source files, you would probably want to put these definitions in an "include"
file and #include it in each source file. This is because the action
procedures need to update current_state, and so need access to the state
definitions. */
enum states { STATE_1, STATE_2, STATE_3, MAX_STATES } current_state;
enum events { EVENT_1, EVENT_2, MAX_EVENTS } new_event;
/* Provide the fuction prototypes for each action procedure. In a real
program, you might have a separate source file for the action procedures of
each state. Then you could create a .h file for each of the source files,
and put the function prototypes for the source file in the .h file. Instead
of listing the prototypes here, you would just #include the .h files. */
void action_s1_e1 (void);
void action_s1_e2 (void);
void action_s2_e1 (void);
void action_s2_e2 (void);
void action_s3_e1 (void);
void action_s3_e2 (void);
enum events get_new_event (void);
/* Define the state/event lookup table. The state/event order must be the
same as the enum definitions. Also, the arrays must be completely filled -
don't leave out any events/states. If a particular event should be ignored in
a particular state, just call a "do-nothing" function. */
void (*const state_table [MAX_STATES][MAX_EVENTS]) (void) = {
{ action_s1_e1, action_s1_e2 }, /* procedures for state 1 */
{ action_s2_e1, action_s2_e2 }, /* procedures for state 2 */
{ action_s3_e1, action_s3_e2 } /* procedures for state 3 */
};
/* This is the heart of the state machine - where you execute the proper
action procedure based on the new event you have to process and your current
state. It's important to make sure the new event and current state are
valid, because unlike "switch" statements, the lookup table method has no
"default" case to catch out-of-range values. With a lookup table,
out-of-range values cause the program to crash! */
void main (void)
{
new_event = get_new_event (); /* get the next event to process */
if (((new_event >= 0) && (new_event < MAX_EVENTS))
&& ((current_state >= 0) && (current_state < MAX_STATES))) {
state_table [current_state][new_event] (); /* call the action procedure */
} else {
/* invalid event/state - handle appropriately */
}
}
/* In an action procedure, you do whatever processing is required for the
particular event in the particular state. Among other things, you might have
to set a new state. */
void action_s1_e1 (void)
{
/* do some processing here */
current_state = STATE_2; /* set new state, if necessary */
}
void action_s1_e2 (void) {} /* other action procedures */
void action_s2_e1 (void) {}
void action_s2_e2 (void) {}
void action_s3_e1 (void) {}
void action_s3_e2 (void) {}
/* Return the next event to process - how this works depends on your
application. */
enum events get_new_event (void)
{
return EVENT_1;
}
Embedded State Machine Implementation
Turning a state machine into a program can be pretty straightforward if you follow the advice of a skilled practitioner.
By: Martin Gomez
Many embedded software applications are natural candidates for mechanization as a state machine. A program that must sequence a series of actions, or handle inputs differently depending on what mode it's in, is often best implemented as a state machine.
This article describes a simple approach to implementing a state machine for an embedded system. Over the last 15 years, I have used this approach to design dozens of systems, including a softkey-based user interface, several communications protocols, a silicon-wafer transport mechanism, an unmanned air vehicle's lost-uplink handler, and an orbital mechanics simulator.
State Machines
For purposes of this article, a state machine is defined as an algorithm that can be in one of a small number of states. A state is a condition that causes a prescribed relationship of inputs to outputs, and of inputs to next states. A savvy reader will quickly note that the state machines described in this article are Mealy machines. A Mealy machine is a state machine where the outputs are a function of both present state and input, as opposed to a Moore machine, in which the outputs are a function only of state. [1] In both cases, the next state is a function of both present state and input. Pressman has several examples of state transition diagrams used to document the design of a software product.
Figure 1 shows a state machine to do that. In this example, the first occurrence of a slash produces no output, but causes the machine to advance to the second state. If it encounters a non-slash while in the second state, then it will go back to the first state, because the two slashes must be adjacent. If it finds a second slash, however, then it produces the "we're done" output.
The state machine approach I recommend proceeds as follows:
Learn what the user wants
Sketch the state transition diagram
Code the skeleton of the state machine, without filling in the details of the transition actions
Make sure the transitions work properly
Flesh out the transition details
Test
An example
A more illustrative example is a program that controls the retraction and extension of an airplane's landing gear. While in most airplanes this is done with an electrohydraulic control mechanism (simply because they don't have a computer on board), cases exist-such as unmanned air vehicles-where one would implement the control mechanism in software.
Let's describe the hardware in our example so that we can later define the software that controls it. The landing gear on this airplane consists of a nose gear, a left main gear, and a right main gear. These are hydraulically actuated. An electrically driven hydraulic pump supplies pressure to the hydraulic actuators. Our software can turn the pump on and off. A direction valve is set by the computer to either "up" or "down," to allow the hydraulic pressure to either raise or lower the landing gear. Each leg of the gear has two limit switches: one that closes if the gear is up, and another that closes when it's locked in the down position. To determine if the airplane is on the ground, a limit switch on the nose gear strut will close if the weight of the airplane is on the nose gear (commonly referred to as a "squat switch"). The pilot's controls consist of a landing gear up/down lever and three lights (one per leg) that can either be off, glow green (for down), or glow red (for in transit).
Let us now design the state machine. The first step, and the hardest, is to figure out what the user really wants the software to do. One of the advantages of a state machine is that it forces the programmer to think of all the cases and, therefore, to extract all the required information from the user. Why do I describe this as the hardest step? How many times have you been given a one-line problem description similar to this one: don't retract the gear if the airplane is on the ground.
Clearly, that's important, but the user thinks he's done. What about all the other cases? Is it okay to retract the gear the instant the airplane leaves the ground? What if it simply bounced a bit due to a bump in the runway? What if the pilot moved the gear lever into the "up" position while he was parked, and subequently takes off? Should the landing gear then come up?
One of the advantages of thinking in state machine terms is that you can quickly draw a state transition diagram on a whiteboard, in front of the user, and walk him through it. A common notation designates state transitions as follows: < event that caused the transition >/< output as a result of the transition >. [2] If we simply designed what the user initially asked us for ("don't retract the gear if the airplane is on the ground"), what he'd get would look a bit like Figure 2. It would exhibit the "bad" behavior mentioned previously.
Keep the following in mind when designing the state transition diagram (or indeed any embedded algorithm):
Computers are very fast compared to mechanical hardware-you may have to wait
The mechanical engineer who's describing what he wants probably doesn't know as much about computers or algorithms as you do. Good thing, too-otherwise you would be unnecessary!
How will your program behave if a mechanical or electrical part breaks? Provide for timeouts, sanity checks, and so on
We can now suggest the following state machine to the user, building upon his requirements by adding a few states and transitions at a time. The result is shown in Figure 3. Here, we want to preclude gear retraction until the airplane is definitely airborne, by waiting a couple of seconds after the squat switch opens. We also want to respond to a rising edge of the pilot's lever, rather than a level, so that we rule out the "someone moved the lever while the airplane was parked" problem. Also, we take into account that the pilot might change his mind. Remember, the landing gear takes a few seconds to retract or extend, and we have to handle the case where the pilot reversed the lever during the process. Note, too, that if the airplane touches down again while we're in the "Waiting for takeoff" state, the timer restarts-the airplane has to be airborne for two seconds before we'll retract the gear.
Implementation
This is a good point to introduce a clean way to code a finite state machine. Listing 1 is my implementation of the state machine in Figure 3.
Let's discuss a few features of the example code. First, you'll notice that the functionality of each individual state is implemented by its own C function. You could just as easily implement it as a switch statement, with a separate case for each state. However, this can lead to a very long function (imagine 10 or 20 lines of code per state for each of 20 or 30 states.) It can also lead you astray when you change the code late in the testing phase-perhaps you've never forgotten a break statement at the end of a case, but I sure have. Having one state's code "fall into" the next state's code is usually a no-no.
To avoid the switch statement, you can use an array of pointers to the individual state functions. The index into the array is the curr_state, which is declared as an enumerated type to help our tools enforce correctness.
Table 1, to which I've added an explicit listing of all the inputs and outputs. [3] Here, we'll fill out the state transitions. code is a very direct mapping from the state transition table. This, as you can imagine, is not always the case.
In coding a state machine, try to preserve its greatest strength, namely, the eloquently visible match between the user's requirements and the code. It may be necessary to hide hardware details in another layer of functions, for instance, to keep the state machine's code looking as much as possible like the state transition table and the state transition diagram. That symmetry helps prevent mistakes, and is the reason why state machines are such an important part of the embedded software engineer's arsenal. Sure, you could do the same thing by setting flags and having countless nested if statements, but it would be much harder to look at the code and compare it to what the user wants. The code fragment in Listing 2 fleshes out the RaisingGear() function.
Notice that the code for RaisingGear() attempts to mirror the two rows in the state transition table for the Raising Gear state.
As an exercise, you may want to expand the state machine we've described to add a timeout to the extension or retraction cycle, because our mechanical engineer doesn't want the hydraulic pump to run for more than 60 seconds. If the cycle times out, the pilot should be alerted by alternating green and red lights, and he should be able to cycle the lever to try again. Another feature to exercise your skills would be to ask our hypothetical mechanical engineer "Does the pump suffer from having the direction reversed while it's running? We do it in the two cases where the pilot changes his mind." He'll say "yes," of course. How would you modify the state machine to stop the pump briefly when the direction is forced to reverse?
Testing
The beauty of coding even simple algorithms as state machines is that the test plan almost writes itself. All you have to do is to go through every state transition. I usually do it with a highlighter in hand, crossing off the arrows on the state transition diagram as they successfully pass their tests. This is a good reason to avoid "hidden states"-they're more likely to escape testing than explicit states. Until you can use the "real" hardware to induce state changes, either do it with a source-level debugger, or build an "input poker" utility that lets you write the values of the inputs into your application.
This requires a fair amount of patience and coffee, because even a mid-size state machine can have 100 different transitions. However, the number of transitions is an excellent measure of the system's complexity. The complexity is driven by the user's requirements: the state machine makes it blindingly obvious how much you have to test. With a less-organized approach, the amount of testing required might be equally large-you just won't know it.
It is very handy to include print statements that output the current state, the value of the inputs, and the value of the outputs each time through the loop. This lets you easily observe what ought to be the Golden Rule of Software Testing: don't just check that it does what you want-also check that it doesn't do what you don't want. In other words, are you getting only the outputs that you expect? It's easy to verify that you get the outputs that you expected, but what else is happening? Are there "glitch" state transitions, that is, states that are passed through inadvertently, for only one cycle of the loop? Are any outputs changing when you didn't expect them to? Ideally, the output of your printfs would look a lot like the state transition table.
Finally, and this applies to all embedded software and not just to that based on state machines, be suspicious when you connect your software to the actual hardware for the first time. It's very easy to get the polarity wrong-"Oh, I thought a '1' meant raise the gear and a '0' meant lower the gear." On many occasions, my hardware counterpart inserted a temporary "chicken switch" to protect his precious components until he was sure my software wasn't going to move things the wrong way.
Crank it
Once the user's requirements are fleshed out, I can crank out a state machine of this complexity in a couple of days. They almost always do what I want them to do. The hard part, of course, is making sure that I understand what the user wants, and ensuring that the user knows what he wants-that takes considerably longer!
Martin Gomez is a software engineer at the Johns Hopkins University Applied Physics Lab, where he is presently developing flight software for a solar research spacecraft. He has been working in the field of embedded software development for 17 years. Martin has a BS in aerospace engineering and an MS in electrical engineering, both from Cornell University. He may be reached at martin.gomez@jhuapl.edu.
References:
Hurst, S.L. The Logical Processing of Digital Signals. New York: Crane, Russak, 1978.
Pressman, Roger A. Software Engineering: A Practitioner's Approach, 3rd Edition. New York: McGraw-Hill, 1992.
Shumate, Kenneth C. and Marilyn M. Keller. Software Specification and Design: A Disciplined Approach for Real-Time Systems. New York: John Wiley & Sons, 1992.