Documentation

Types of Documentation

When we talk about documentation, the discussion is often nebulous. We say we should have good documentation for our software, but we rarely say specifically what that looks like. There are many guides for particular languages covering the exact format of different types of documentation, but here we will present and discuss some general principles of documentation.

We will separate documentation into four main types:

Comments - in the code to describe sections and lines of code, clarify programming decisions and make notes on expansions or modifications that are needed.
Documentation comments (docstrings) - in the code at the start of functions, objects and files to describe their uses.
API style - similar to docstrings but extracted so they can be searched and read separately from the code itself.
README - a file in the software's directory that provides key information on the software.

We will address each of these types in detail and discuss why, how, when and for whom we want each type of documentation.

If you're interested in more information about producing documentation, please visit writethedocs.org, a network of people who care about good documentation and communication surrounding software.

Comments

Comments are for you and other developers who need to understand the body of the code. Because this audience is so close to the code, it is easy to forget comments, but they are important reminders of decisions made while writing the code. They allow us to look back at code days, weeks or months after we wrote it and still understand what we wrote and why we wrote it the way we did. When you copy code from Stack Overflow to solve a problem, it helps to add a comment that explains it or links to where you originally found it.

There are different types of comments we can use for different purposes:

Structural
Labelling
Clarification
Reminders

This list is not exhaustive by any means: if a comment is useful to you when you write it or for future reference, then it is a good comment. We will explore the list above in more detail to discuss how to use these comments. A key feature of comments is that they should be clear, concise and add information.

Structural

Structural comments enable developers to scan the code quickly. They help you find the relevant part of the code when debugging or updating it.

These comments can also be a useful starting point even before you start writing code. They indicate the purpose of different parts of the code within functions and objects. Examples:

# For loop running over population to apply update
# Input checking
# Loading and tidying up data
# Test outputs

For a more extended example, if we want to write a bubble sort algorithm we can structure it first with:

# Repeat over the list until there are no swaps

# Iterate over the list (in pairs so stop one item before end of list)

# If left of pair is larger swap items.

# Reduce checked list by one pair each iteration as the end of the list will be correct

This can then be filled in with code to provide the finished article. The comments do not have to explain every function call or every choice about how the code is written; they just provide an idea of what is happening in broad strokes.

# Repeat over the list until there are no swaps
n = len(A)
swapped = True
while swapped:
    swapped = False
    # Iterate over the list (in pairs so stop one item before end of list)
    for i in range(n - 1):
        # If left of pair is larger swap items.
        if A[i] > A[i + 1]:
            A[i], A[i + 1] = A[i + 1], A[i]
            swapped = True
    # Reduce checked list by one pair each iteration as the end of the list will be correct
    n = n - 1

Labelling

Comments can be very useful when we don't want to use long variable or function names but we want to make it clear what those variables and functions are for. Reusing the bubble sort example above, we would add:

# Repeat over the list until there are no swaps
n = len(A)  # A is the list being sorted
swapped = True  # boolean flag indicating if a swap happened in the last pass
while swapped:
    swapped = False
    # Iterate over the list (in pairs so stop one item before end of list)
    for i in range(n - 1):
        # If left of pair is larger swap items.
        if A[i] > A[i + 1]:
            A[i], A[i + 1] = A[i + 1], A[i]  # tuple assignment exchanges the two items in the list
            swapped = True
    # Reduce checked list by one pair each iteration as the end of the list will be correct
    n = n - 1

Clarification

Comments can be important for clarifying why the code works the way it does. We make decisions when writing code for a particular reason at the time, and it can be useful to note those reasons as we write so that in future we know why. This can be very important when we change the code to repair it or if we use an unusual solution which we find online, since we're unlikely to remember the reasoning behind the code.

We already have a couple of examples of this in our bubble sort code, at the end of our structural comments about iterating over the list and reducing the list:

# Repeat over the list until there are no swaps
n = len(A)  # A is the list being sorted
swapped = True  # boolean flag indicating if a swap happened in the last pass
while swapped:
    swapped = False
    # Iterate over the list (in pairs so stop one item before end of list)
    for i in range(n - 1):
        # If left of pair is larger swap items.
        if A[i] > A[i + 1]:
            A[i], A[i + 1] = A[i + 1], A[i]  # tuple assignment exchanges the two items in the list
            swapped = True
    # Reduce checked list by one pair each iteration as the end of the list will be correct
    n = n - 1

Reminders

This form of comment is less about explaining the code and more about providing developers with notes. This can be information like a todo list, where an idea or line of code came from or things you might want to change later.

For our example, we can add a reminder to implement a better version of the algorithm later, with a link to a resource describing the improved algorithm.

# Repeat over the list until there are no swaps
n = len(A)  # A is the list being sorted
swapped = True  # boolean flag indicating if a swap happened in the last pass
while swapped:
    swapped = False
    # Iterate over the list (in pairs so stop one item before end of list)
    for i in range(n - 1):
        # If left of pair is larger swap items.
        if A[i] > A[i + 1]:
            A[i], A[i + 1] = A[i + 1], A[i]  # tuple assignment exchanges the two items in the list
            swapped = True
    # Reduce checked list by one pair each iteration as the end of the list will be correct
    n = n - 1
# TODO: implement the optimised bubble sort described at https://en.wikipedia.org/wiki/Bubble_sort

Our example is probably now over-commented for its size and length. However, these commenting principles hold for longer and more complicated code, where they provide a clear explanation for anyone trying to understand its inner workings.

Documentation Comments

This form of documentation is normally more standardised than comments. It is used at the start of files, objects and functions to explain the inputs, outputs, errors, warnings and intended function of that part of the software. Documentation comments give developers and users a guide to the code without them needing to read it line by line, and remind you how you designed your functions.

With modern IDEs this can go further. Some IDEs can detect documentation comments and use them to provide type and input descriptions as your objects and functions are used elsewhere in your project.

Different languages have their own formats for this type of documentation; in Python and R these are called docstrings.

The general format of documentation comments, which appear as a block comment just under the function/object definition, is:

<description of the intended function>
<inputs with descriptions>
<outputs with descriptions>
<warnings and errors with descriptions>
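As a minimal sketch of this format in Python, the bubble sort from the earlier sections could be wrapped in a function with a docstring. The function name bubble_sort, the section labels inside the docstring and the decision to return the list are assumptions made for illustration, not a required convention.

```python
def bubble_sort(A):
    """Sorts a list into ascending order using bubble sort.

    Inputs:
        A (list): items to sort. The list is sorted in place and the
            items must be comparable with ">".

    Outputs:
        list: the same list object, now in ascending order.

    Warnings and errors:
        TypeError: raised by Python if two items cannot be compared,
            for example a number and a string.
    """
    n = len(A)
    swapped = True
    # Repeat over the list until there are no swaps
    while swapped:
        swapped = False
        # Iterate over the list in pairs
        for i in range(n - 1):
            if A[i] > A[i + 1]:
                A[i], A[i + 1] = A[i + 1], A[i]
                swapped = True
        # The end of the list is now correct, so check one pair fewer
        n = n - 1
    return A
```

Someone reading only the docstring knows what to pass in, what comes back and what can go wrong, without reading the loop.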

A docstring written this way allows someone to use a function without needing a detailed understanding of the code. It is important that all descriptions are written in plain English where possible.

For example:

Provides the sum of two signed integers and iteratively outputs the string "word" to the console that number of times before returning the resultant integer

This is a bad example of a description: it contains too much jargon. A better description is:

Returns the value a+b and prints word to screen a+b times.

Concise, plain English is the goal.
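To make the better description concrete, here is a minimal sketch of a function that carries it as a Python docstring. The function name add_and_print and the parameter notes are assumptions made for illustration; the text above gives only the one-line description.

```python
def add_and_print(a, b, word):
    """Returns the value a+b and prints word to screen a+b times.

    Inputs:
        a (int): first number to add.
        b (int): second number to add.
        word (str): text to print.

    Outputs:
        int: the value a+b.

    Warnings and errors:
        Nothing is printed if a+b is zero or negative.
    """
    total = a + b
    for _ in range(total):
        print(word)
    return total
```

Calling help(add_and_print) in a Python session shows this text, so the plain English description is the first thing a user sees.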

API style

API style documentation hosted online can provide a searchable record of how to use software. You've probably used it to check how functions or packages in a language work. It gives an overview of the available functions and objects and how to use them.

For many languages there are tools to automatically generate API style documentation from documentation comments in the code. This makes documentation comments particularly useful.
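The idea behind such tools can be sketched in a few lines of Python using the standard inspect module. The helper name describe and the two small functions are hypothetical, written only for this illustration; real generators such as Sphinx do far more (cross-references, HTML output, search).

```python
import inspect


def describe(namespace):
    """Returns a plain-text reference built from the docstrings in namespace."""
    lines = []
    for name, obj in sorted(namespace.items()):
        # Skip private names and anything that is not a plain function
        if name.startswith("_") or not inspect.isfunction(obj):
            continue
        doc = inspect.getdoc(obj) or "(no documentation comment)"
        # One line for the signature, one for the first line of the docstring
        lines.append(f"{name}{inspect.signature(obj)}")
        lines.append(f"    {doc.splitlines()[0]}")
    return "\n".join(lines)


def double(x):
    """Returns x multiplied by two."""
    return 2 * x


def undocumented(x):
    return x


reference = describe({"double": double, "undocumented": undocumented})
print(reference)
```

Functions without a documentation comment show up with nothing useful to say, which is one reason to write the docstring at the same time as the function.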

NumPy

The API reference for Python's NumPy package is a useful searchable resource for looking up functions. It includes additional information on the algorithms and methods used to implement the mathematics.

NumPy API reference

C++

C++ has API documentation available for its standard libraries. It lists the functions in each library and can be searched by programmers.

Standard C++ Library Reference

GitHub Pages

You can generate your own API style documentation as a website which can be hosted through GitHub. Tools exist to do this for most major languages, and the result gives anyone using your software a searchable way to find what they need.

GitHub Pages

JSON APIs

Sphinx APIs

These sorts of documentation webpages can also be expanded to include pages about installation, context and other resources relating to the software. For instance, for research software we can include a page describing which versions of the software were used for different papers and how to replicate the results of each paper.

README

All software should include a README file. It should address the 6 Qs: Who, What, Why, When, Where and How.

Who

Your README should identify any developers and provide contact information for at least one developer if you are making the software publicly available. You should also identify the copyright and intellectual property owners of the code. For research software, identify the institution and funders of the research.

What and Why

Name the software and provide a brief description of the software and why it was written. This will place the software in context and help users to identify if this is the correct software for their purposes.

When

Provide information on when the project was last updated and on its versioning. This allows users to tell if this is the correct version of the software they were looking for.

It is also worth including information on dependencies and their versions. This will tell the user if the software will work with their existing installs or if they need to install other software before using it.

Where

Provide links to where the software can be downloaded and where documentation and other resources, such as research papers and tutorials, can be found.

How

Information on how to install and use the software is vital. This will assist users in getting started and will be a first reference point for anyone who tries to install your software.
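One way to make sure none of these questions is forgotten is to start every project from a skeleton. The sketch below builds a plain-text README outline from the headings above; the function name readme_skeleton, the SECTIONS table and the TODO prompts are assumptions made for illustration, and the project name is a placeholder.

```python
# Headings from this section, each with a prompt for what to write under it
SECTIONS = {
    "Who": "developers, contact information, copyright owners, institution and funders",
    "What and Why": "name of the software, a brief description and why it was written",
    "When": "date of last update, version, dependencies and their versions",
    "Where": "links to downloads, documentation, research papers and tutorials",
    "How": "how to install and use the software",
}


def readme_skeleton(name):
    """Returns a plain-text README outline for the software called name."""
    parts = [name, ""]
    for heading, prompt in SECTIONS.items():
        parts.extend([heading, "", "TODO: " + prompt, ""])
    return "\n".join(parts)


text = readme_skeleton("example-software")
print(text)
```

Each TODO line is then replaced with real content before the software is shared.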