How to Design Data Definitions (HtDDD)

Acknowledgement: the content of these design recipe pages has been adapted from UBC CPSC 110.

A data definition consists of:
  1. A types comment that describes how information in the problem domain is represented as data in the program
  2. An interpretation that describes how the data should be interpreted as information in the problem domain
  3. One or more examples of the data
  4. A template for a one-argument function operating on the data

We now examine some common forms of information and data.

Simple Atomic Data

Use simple atomic data when the information that you are trying to represent is itself atomic.  For example, the temperature or pressure of an object, or the x-coordinate of a particle.

# Temperature is float
# interp. the air temperature in degrees Celsius

T1 = 0.0
T2 = 36.812
T3 = -24.5

#def fn_for_temp(t):
#    return ...t
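
As an illustration, the template can be copied, uncommented, and filled in to write a function that consumes a Temperature. The function name to_fahrenheit is hypothetical and not part of the original recipe. The tests reuse the examples T1 and T3 and are written as plain assert statements, which is one possible convention among several.

```python
import math

# Temperature is float
# interp. the air temperature in degrees Celsius
T1 = 0.0
T3 = -24.5

def to_fahrenheit(t: float) -> float:
    """Produce the Fahrenheit equivalent of Celsius Temperature t.

    Hypothetical example built from fn_for_temp's template: the
    '...t' placeholder is replaced by an expression involving t.
    """
    return t * 9 / 5 + 32

# Compare floats with a tolerance rather than exact equality.
assert math.isclose(to_fahrenheit(T1), 32.0)
assert math.isclose(to_fahrenheit(T3), -12.1)
```

Because -24.5 * 9 / 5 is not exactly representable in binary floating point, the second test uses math.isclose instead of ==.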


Use an interval when the information you are trying to represent is numeric and lies within a certain range. Intervals can be closed (e.g. int[0, 5] includes 0 and 5), open (e.g. float(0.0, 5.0) excludes 0.0 and 5.0), or half-open (e.g. float[0.0, 5.0) includes 0.0 but not 5.0, and float(0.0, 5.0] includes 5.0 but not 0.0).
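
In Python, each kind of interval maps onto a chained comparison: a square bracket becomes <= and a round bracket becomes <. The helper names in_closed, in_open and in_half_open below are hypothetical and exist only to illustrate that mapping.

```python
def in_closed(x, lo, hi):
    """True if x is in [lo, hi]: both endpoints included."""
    return lo <= x <= hi

def in_open(x, lo, hi):
    """True if x is in (lo, hi): both endpoints excluded."""
    return lo < x < hi

def in_half_open(x, lo, hi):
    """True if x is in [lo, hi): lo included, hi excluded."""
    return lo <= x < hi

# int[0, 5] includes both 0 and 5
assert in_closed(0, 0, 5) and in_closed(5, 0, 5)
# float(0.0, 5.0) excludes both endpoints
assert not in_open(0.0, 0.0, 5.0) and not in_open(5.0, 0.0, 5.0)
# float[0.0, 5.0) includes 0.0 but not 5.0
assert in_half_open(0.0, 0.0, 5.0) and not in_half_open(5.0, 0.0, 5.0)
```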

# Time is int[0, 86400)
# interp. the time in seconds since midnight

ONEAM = 3600
MIDDAY = 43200
END = 86399

#def fn_for_time(t):
#    return ...t

Testing: be sure to test the closed endpoint(s) of any interval as well as a point somewhere in the middle of the interval.
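
For example, a function consuming a Time could convert seconds since midnight into the hour of the day. The name hour_of_day is hypothetical. Following the testing advice, the tests cover the closed endpoint 0, a point in the middle, and the largest valid Time, 86399 (the interval is open at 86400, so 86400 itself is not a Time).

```python
# Time is int[0, 86400)
# interp. the time in seconds since midnight

def hour_of_day(t: int) -> int:
    """Produce the hour (0-23) in which Time t falls (hypothetical example)."""
    return t // 3600

assert hour_of_day(0) == 0        # closed endpoint: midnight
assert hour_of_day(43200) == 12   # middle of the interval: midday
assert hour_of_day(86399) == 23   # largest Time, one second before midnight
```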


Use an enumeration when the information consists of a fixed number of distinct values.  Note that in this case, examples are usually considered redundant as the types comment includes all possible values for the data.

# Opinion is one of:
# - 'Agree'
# - 'Neutral'
# - 'Disagree'
# interp. an opinion expressed in response to a survey question

# <Examples are redundant>

#def fn_for_opinion(o):
#    if o == 'Agree':
#        return ...
#    elif o == 'Neutral':
#        return ...
#    elif o == 'Disagree':
#        return ...

Testing: you should have at least as many tests as there are cases in the enumeration.
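
As a sketch, filling in the enumeration template gives one branch per case. The function opinion_score and the scores it assigns are hypothetical choices for illustration; note the three tests, one per case.

```python
# Opinion is one of: 'Agree', 'Neutral', 'Disagree'

def opinion_score(o: str) -> int:
    """Produce a numeric score for Opinion o (hypothetical example).

    Built from fn_for_opinion's template: one branch per case.
    """
    if o == 'Agree':
        return 1
    elif o == 'Neutral':
        return 0
    elif o == 'Disagree':
        return -1

# One test per case in the enumeration
assert opinion_score('Agree') == 1
assert opinion_score('Neutral') == 0
assert opinion_score('Disagree') == -1
```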

Compound Data

Use compound data when two or more values naturally belong together.  For example, the x and y coordinates of a point, or the air temperature, air pressure and time of a weather report.  

# A Posn is dict(x=int, y=int)
# interp. the x and y coordinates of a point on the screen

P0 = dict(x=0, y=0)
P1 = dict(x=4, y=5)
P2 = dict(x=15, y=7)

#def fn_for_posn(p):
#    return ...p['x'] ...p['y']
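
A filled-in version of the Posn template uses both fields of the dict. The function manhattan_distance (the "taxicab" distance from the origin) is a hypothetical example; the tests reuse the examples P0, P1 and P2.

```python
# A Posn is dict(x=int, y=int)
# interp. the x and y coordinates of a point on the screen
P0 = dict(x=0, y=0)
P1 = dict(x=4, y=5)
P2 = dict(x=15, y=7)

def manhattan_distance(p: dict) -> int:
    """Produce the taxicab distance from the origin to Posn p (hypothetical)."""
    return abs(p['x']) + abs(p['y'])

assert manhattan_distance(P0) == 0
assert manhattan_distance(P1) == 9
assert manhattan_distance(P2) == 22
```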


Use an itemization when information comprises two or more subclasses, at least one of which is not a distinct data item. (Note that if all cases are distinct data items, you should use an enumeration.) Itemizations often consist of two or more intervals.

# Rainfall is one of:
# - int[0, 5)
# - int[5, 10]
# interp. rainfall accumulation in mm

R1 = 0
R2 = 3
R3 = 5
R4 = 7
R5 = 10

#def fn_for_rainfall(r):
#    if 0 <= r < 5:
#        return ...r
#    elif 5 <= r <= 10:
#        return ...r

Testing: you should have at least as many tests as there are items in the itemization. In this particular example, as each item is an interval, we test the closed endpoint(s) of each interval as well as a point in the middle of each interval.
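
Filling in the Rainfall template keeps one branch per item. The function describe_rainfall and its labels 'light' and 'heavy' are hypothetical. The five tests correspond to R1 through R5: the closed endpoints 0, 5 and 10, plus the midpoints 3 and 7.

```python
# Rainfall is one of:
# - int[0, 5)
# - int[5, 10]

def describe_rainfall(r: int) -> str:
    """Produce a label for Rainfall r (labels are hypothetical)."""
    if 0 <= r < 5:
        return 'light'
    elif 5 <= r <= 10:
        return 'heavy'

assert describe_rainfall(0) == 'light'    # closed end of [0, 5)
assert describe_rainfall(3) == 'light'    # middle of [0, 5)
assert describe_rainfall(5) == 'heavy'    # closed end of [5, 10]
assert describe_rainfall(7) == 'heavy'    # middle of [5, 10]
assert describe_rainfall(10) == 'heavy'   # closed end of [5, 10]
```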


Use a self-referential data definition when information in the problem domain is of an arbitrary size.

# ListOfX is one of:
# - []
# - [X] + ListOfX
# interp. a list of items of type X

L0 = []
L1 = [4]
L2 = ['a', 'ab', 'abc']

#def fn_for_lox(lox):
#    if lox == []:
#        return ...
#    else:
#        return ...lox[0]  ...fn_for_lox(lox[1:])

Note that the above data definition is used so commonly that we represent it with the special notation (listof X).

Testing: you must test the base case and the case where the list has at least 2 elements.
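
As a sketch of the self-referential template in use, the function below specializes X to str and sums the lengths of the strings in a list. The name total_length is hypothetical. It keeps the template's shape: a base case for [], and a combination of lox[0] with the natural recursion on lox[1:]. The tests cover the base case (L0) and a list with at least 2 elements (L2).

```python
# ListOfString is one of:
# - []
# - [str] + ListOfString

def total_length(los: list) -> int:
    """Produce the sum of the lengths of the strings in los (hypothetical)."""
    if los == []:
        return 0                                   # base case: empty list
    else:
        return len(los[0]) + total_length(los[1:]) # first item + natural recursion

assert total_length([]) == 0                     # base case
assert total_length(['a', 'ab', 'abc']) == 6     # at least 2 elements
```

This shape mirrors the template rather than aiming for efficiency: each lox[1:] copies the rest of the list, and Python's default recursion limit (about 1000 frames) means very long lists would need a loop or a built-in such as sum instead.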