The first step, then, is to develop a mental picture of what reverse engineering is. To do this, it would be useful to start with a simplistic model of engineering itself.
Engineering is the process of turning a specification into a product that performs to that specification.
A full specification may not only specify what the product has to do, but also specify performance criteria it needs to meet along the way. As the complexity of IT systems grows, so does the importance of a good specification.
In between the specification and the product is the engineering or implementation process, which involves some human creativity, and an increasing degree of automation.
It is convenient to think in terms of "layers" or "levels" of abstraction. Thus:
- A written functional specification for human consumption is a "top level" description;
- An abstracted structural description (for example source code or circuit diagrams), for human consumption (but usually machine readable), is an "intermediate level" description;
- A detailed structural description (e.g. assembler, or a node list) for machine consumption, but possibly comprehensible by humans, is a "low level" description; and
- The product (a chip storing machine code, an integrated circuit or a circuit board), which is usually incomprehensible to humans, constitutes the lowest level.
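As a loose software analogy for the levels listed above, the following Python sketch holds one small function at three levels at once: its source text (intermediate level), a disassembly listing (low level), and the raw bytes the interpreter actually executes (standing in for the product). The function `area` and its source are hypothetical, and Python bytecode is only an analogy for machine code stored in a chip.

```python
import dis

# Intermediate level: source code, written for humans but machine readable.
SOURCE = "def area(width, height):\n    return width * height\n"

# "Engineering" step: an automated tool (here the Python compiler) turns
# the intermediate description into something the machine can execute.
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
area = namespace["area"]

# Low level: an instruction listing, aimed at the machine but still
# readable by a patient human. The exact opcodes depend on the Python version.
low_level = [instruction.opname for instruction in dis.get_instructions(area)]

# Lowest level: the raw bytes, which mean little to a human reader.
product = area.__code__.co_code

if __name__ == "__main__":
    print(SOURCE)
    print(low_level)
    print(product.hex())
```

Going down the levels is fully automatic here. Going back up from the raw bytes to the listing is also mechanical, but nothing in either form says why the function exists or what requirement it was written to meet.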
Nowadays, in many processes the lower level descriptions are unnecessary and, if produced at all, are produced only for documentation purposes rather than during the design process.
Often, the same specification can be implemented in many different ways, for example by using different target platforms (e.g. microprocessors) or different engineering tools and techniques (e.g. CMOS or NMOS). There is thus a one-to-many relationship between a top level specification and the products produced from it.
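The one-to-many relationship can be illustrated with a minimal Python sketch. It assumes a hypothetical top level specification ("return the input values in ascending order") and gives two structurally different implementations that both satisfy it. All function names are invented for illustration.

```python
def sort_by_insertion(values):
    """Implementation A: build the result by inserting one value at a time."""
    result = []
    for value in values:
        position = len(result)
        # Walk left until the value to the left is not larger.
        while position > 0 and result[position - 1] > value:
            position -= 1
        result.insert(position, value)
    return result


def sort_by_merging(values):
    """Implementation B: split the input, sort each half, merge the halves."""
    if len(values) <= 1:
        return list(values)
    middle = len(values) // 2
    left = sort_by_merging(values[:middle])
    right = sort_by_merging(values[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def meets_specification(implementation, sample):
    """Check the hypothetical spec: output is the input in ascending order."""
    return implementation(sample) == sorted(sample)
```

Both functions pass `meets_specification` for any list of comparable values, yet their internal structure has little in common. An observer who sees only that the outputs are sorted cannot tell which of them, or which other implementation, produced the result.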
Reverse engineering, as the name implies, is the reverse of this; in other words, the attempt to recapture the top level specification by analysing the product - "attempt" because it is not possible in practice, or even in theory, to recover everything in the original specification purely by studying the product.
Reverse engineering is difficult and time consuming, but it is getting easier all the time thanks to IT, for two reasons:
- Firstly, as engineering techniques themselves become more computerised, more of the design is due to the computer. Thus, recognisable blocks of code, or groups of circuit elements on a substrate, often occur in many different designs produced by the same computer program. These are easier to recognise and interpret than a customised product would be.
- Secondly, artificial intelligence techniques for pattern recognition, and for parsing and interpretation, have advanced to the point where these and other structures within a product can be recognised automatically.
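The first point above, that tool-generated blocks recur across designs and are therefore easy to spot, can be sketched in Python as a naive signature scan. The signature table is an illustrative assumption: the two byte patterns are of the kind a 32-bit x86 compiler commonly emits at the start and end of a function, and the names `SIGNATURES` and `find_signatures` are hypothetical. Real analysis tools are far more sophisticated than this.

```python
# Hypothetical library of byte patterns that a code generator emits repeatedly.
SIGNATURES = {
    "function prologue": bytes.fromhex("5589e5"),  # push ebp; mov ebp, esp
    "function epilogue": bytes.fromhex("c9c3"),    # leave; ret
}


def find_signatures(image, signatures=SIGNATURES):
    """Return sorted (offset, name) pairs for every known pattern in image."""
    hits = []
    for name, pattern in signatures.items():
        start = image.find(pattern)
        while start != -1:
            hits.append((start, name))
            # Resume one byte further on so overlapping matches are found too.
            start = image.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # A made-up code image: one padding byte, then a tiny function.
    image = bytes.fromhex("90" "5589e5" "b801000000" "c9c3")
    for offset, name in find_signatures(image):
        print(f"offset {offset}: {name}")
```

A scan like this only marks where familiar structures begin and end. The same byte patterns can also occur by chance inside data, so the hits are candidates for a human or a more capable tool to confirm, not conclusions.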
However, whilst it is often possible to automate the generation of a higher level structural description of a product, recognising what it is doing is difficult and still requires human skills, and it may simply not be possible to recapture, by studying the product, some parts of the original specification to which the product was made.
Since reverse engineering still needs human input, at some stage the reverse engineering process needs to produce a complete system description of the product, to allow a human to work out how the product functions; it is only after this human analysis that the product can be split into its component parts.
Thus, reverse engineering generally consists of the following stages:
- Analysis of the product
- Generation of an intermediate level product description
- Human analysis of the product description to produce a specification
- Generation of a new product using the specification.
There is thus a chain of events between the underlying design specification and any intermediate level design documents lying behind the product, through the product itself, through the reverse engineered product description, through the reverse engineered specification, and into the new product itself. This raises at least the risk of infringement of copyright or similar design or chip protection rights.
If the same person both reverse engineers the old product and designs the new product, and there are similarities, it is hard to avoid an assumption that some copying has taken place, and so reverse engineering "best practice" involves breaking the chain, so far as possible, at the specification stage. The specification is made as abstract and functional as possible by the reverse engineers, and is then handed over to a "clean room" design team who have no other contact with the old product, or the team who analysed it, and who will then design the new product using as little low-level information as possible from the old product.
One group of reverse engineers are users of old products, who want to maintain them, but find that the original supplier may no longer exist, or may have dropped support for the product (1). Most users will want a subcontractor to re-engineer the software, rather than doing so themselves. To maintain an old product, it will often be necessary to understand how it works.
Recently, the "Year 2000" problem has stimulated interest in maintaining old software. Reverse engineering may be needed if an old product is to work alongside, or within, a new system. There is also a desire, amongst some software users, to re-engineer old software to make it more modular, reusable, accessible or reliable.
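To make the "Year 2000" problem concrete, here is a minimal Python sketch. It assumes the old software stores years as two digits, and it shows one common style of repair, a fixed "window" that maps two-digit years onto a century. The function names and the pivot value of 50 are illustrative assumptions, not taken from any particular system.

```python
def naive_years_between(start_yy, end_yy):
    """The legacy behaviour: subtract two-digit years directly."""
    return end_yy - start_yy


def expand_two_digit_year(yy, pivot=50):
    """Map a two-digit year to four digits using a fixed window.

    Years at or above the pivot are read as 19xx, the rest as 20xx.
    The pivot of 50 is an assumption chosen for illustration.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 1900 + yy if yy >= pivot else 2000 + yy


def windowed_years_between(start_yy, end_yy, pivot=50):
    """The repaired behaviour: expand both years before subtracting."""
    return expand_two_digit_year(end_yy, pivot) - expand_two_digit_year(start_yy, pivot)


if __name__ == "__main__":
    # From 1999 ("99") to 2000 ("00").
    print(naive_years_between(99, 0))     # -99
    print(windowed_years_between(99, 0))  # 1
```

Applying a repair of this kind means first finding every place where the old product stores or compares a year. Where source code and documentation are no longer available, that search is itself a reverse engineering task.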
Genuine competitors may wish to use reverse engineering techniques, for one of two reasons:
- Firstly, they may wish to produce a product which operates with the product to be analysed (but does not compete with it), or
- Secondly, they may wish to produce a product which competes with the product to be analysed.
Examples of interoperable products are applications which need to interoperate with operating systems; software controlled exchanges which need to operate with others; and programs which are locked by hardware "dongles".
Finally, pirates may occasionally use reverse engineering techniques. Pirates do not usually need to understand how a product works merely to copy it, but occasionally a product may include security features which the pirate needs to defeat.
There are thus legitimate and illegitimate reasons for wanting to reverse engineer software products, chips, printed circuit boards and other hardware products, and firmware (i.e. mask-programmed or similar embedded software products).
A genuine competitor will try to reduce their use of the intermediate description generated by reverse engineering to the absolute minimum necessary, to avoid copying. A pirate, by contrast, will attempt to use the reverse engineered code to the maximum extent possible and add the minimum of independently created material, both to make the product as close a "lookalike" as possible and to reduce the effort required.