Records are stored and accessed in key sequence order.
Advantages of Sequential Access
● Easier to program / fewer overheads than indexed sequential or random.
● Particularly suitable (and faster) if access only ever needs to be sequential.
Disadvantages of Sequential Access
The biggest drawback of sequential access is that it is very slow, since every record before the one required must be read first. The method is acceptable where the speed of access is unimportant, e.g. backup tapes.
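As a rough illustration, the sketch below shows a sequential search in Python. The file name and the one-record-per-line "key,data" format are assumptions for the example; the point is that every record before the target must be read.

```python
# Minimal sketch of a sequential search. Every record before the
# target must be read, which is why this is slow on large files.
def find_record(path, wanted_key):
    with open(path) as f:
        for line in f:                       # records arrive in key order
            key, _, data = line.partition(",")
            if int(key) == wanted_key:
                return data.strip()          # found the record
            if int(key) > wanted_key:
                break                        # passed where it would be stored
    return None                              # key is not in the file
```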
The same copy-based process is used to add and to delete a record (apart from step 2); a sketch of both is given after the steps below.

To add a record:
1. Make a new copy of the records until the correct place to add the new record is reached
2. Add the new record to the new copy
3. Continue copying until the end of the file
If multiple records are to be added, they should preferably be sorted before the above process to avoid multiple updates.

To delete a record:
1. Make a new copy of the records until the correct place for the deletion is reached
2. Do not copy the record to be deleted
3. Continue copying until the end of the file
If multiple records are to be deleted, they should preferably be sorted before the above process to avoid multiple updates.
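A minimal sketch of the copy-based process above, assuming records are held one per line as "key,data" in key order; all names here are illustrative, not a standard API.

```python
# Adding: copy until the insertion point (step 2 differs), then copy the rest.
def add_record(old_path, new_path, new_key, new_data):
    written = False
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            key = int(line.partition(",")[0])
            if not written and key > new_key:
                new.write(f"{new_key},{new_data}\n")  # step 2: insert the new record
                written = True
            new.write(line)                           # copy the existing record
        if not written:
            new.write(f"{new_key},{new_data}\n")      # new record belongs at the end

# Deleting: identical loop, but step 2 skips the record instead.
def delete_record(old_path, new_path, dead_key):
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            if int(line.partition(",")[0]) != dead_key:
                new.write(line)
```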
Records are stored in key sequence order
An index allows data to be accessed directly: the index contains the key field and the disc address of each record, and the key field and index are used to locate the correct position.
A multilevel index is usually used (a sketch of a two-level lookup is given below):
● There is a main index which contains the location of the next index
● This process may extend to several levels; the last index contains the physical address of the record
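As an illustration, the sketch below shows a two-level lookup. It assumes each index is a sorted list of (highest key covered, location) pairs held in memory; a real implementation would read the indexes from disc.

```python
import bisect

# Level 1 (main index) maps a key range to a lower-level index;
# level 2 (the last index) holds the physical address of each record.
def lookup(main_index, block_indexes, key):
    highs = [high for high, _ in main_index]
    i = bisect.bisect_left(highs, key)       # first block whose range covers key
    if i == len(main_index):
        return None                          # key is beyond the last entry
    _, block_no = main_index[i]
    for k, address in block_indexes[block_no]:
        if k == key:
            return address                   # physical address of the record
    return None

# e.g. main_index = [(100, 0), (200, 1)] and
# block_indexes = {0: [(17, 0x2A0), (100, 0x2B0)], 1: [(150, 0x3F0)]}
```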
Advantages of Indexed Sequential File Access
● Faster access than sequential – the index can be used to access the required data/records directly
● Avoids the overheads of random access – if only sequential access is required for one application, it should be faster than random
Disadvantages of Indexed Sequential File Access
● It requires more storage space
● Expensive, as it requires special software
Exam Tip: (Indexed Sequential Files)
An indexed sequential file stores records in key sequence order, and an index allows data to be accessed directly. The key field and index are used to locate the correct position. This has the advantage of allowing faster access, as the data can be accessed directly.
The physical location of a record is calculated using a hashing algorithm
This calculation is carried out on data in the key field
A data collision occurs when two data items are hashed to the same location
In this case there needs to be an overflow area where the latest data/record is stored, usually in a linear structure
When there are many items in the overflow area, access may become slow as the data in the overflow area is normally stored and searched in a linear manner
The file may need reorganising (with a new hashing algorithm) if the overflow area becomes too large; if a new hashing algorithm is required, a larger file may be needed.
Existing records are accessed in the same way: the key field is hashed to find the location to read.
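A minimal sketch of this storage process, assuming a simple "key mod table size" hash and in-memory structures; a real system would map to disc addresses, but the collision-and-overflow idea is the same.

```python
TABLE_SIZE = 101                      # number of home storage locations
home = [None] * TABLE_SIZE            # the main file area
overflow = []                         # linear overflow area

def store(key, data):
    address = key % TABLE_SIZE        # hashing algorithm on the key field
    if home[address] is None:
        home[address] = (key, data)
    else:
        overflow.append((key, data))  # collision: divert to the overflow area

# e.g. store(150, "A") fills home[49]; store(251, "B") also hashes
# to 49, so it collides and goes to the overflow area instead.
```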
Exam Tip: (Random Access File)
A random access file is one where the physical location of the record is calculated (using a hashing algorithm) from the data in the key field. Sometimes a data collision occurs (i.e. two data items are hashed to the same location). In these circumstances there needs to be an overflow area where the latest data is stored. When the file begins to get quite full, there may be many items in the overflow area and access may become slow. A solution to this problem is to create a new hashing algorithm, and a larger file may be needed.
A situation in which it would be more suitable to use a direct (random) access file rather than an indexed sequential file is customer/sales records, because there is a need to update them in an unpredictable order as sales come in (particularly where the file is large and there is a large number of transactions).
A direct (random) access file normally uses a hashing algorithm to allocate a storage location for each record. When a hashing algorithm allocates a record to a storage location which is already occupied the new data is diverted to an overflow area instead of the original.
When an attempt is made to access this record/data later, the same process occurs: the original location is accessed first and, if the data is not found there, the overflow area is searched.
A hashing algorithm provides access to a record in a random access file by:
● taking the key field as the input
● using an algorithm to calculate/determine the disc address of the record
An overflow area is often used when a hashing algorithm provides access to a record in a random access file. How the overflow area operates is outlined below, followed by a short sketch:
● An overflow area is necessary if the calculated address is already occupied by data
● Data in the overflow area is usually stored and searched in linear order
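A companion sketch of retrieval, under the same assumptions (and with the same structures) as the storage sketch above: hash the key, check the home location, then fall back to a serial search of the overflow area.

```python
TABLE_SIZE = 101
home = [None] * TABLE_SIZE          # as in the storage sketch above
overflow = []

def fetch(key):
    address = key % TABLE_SIZE      # same hashing algorithm as when storing
    entry = home[address]
    if entry is not None and entry[0] == key:
        return entry[1]             # found at the calculated home location
    for k, data in overflow:        # linear search: slow when overflow is large
        if k == key:
            return data
    return None                     # record does not exist
```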
Exam Tip: (Hashing Algorithm and Overflow Areas)
Consider and expand on these 5 points:
1. Hashing algorithm (access to all storage locations, minimise collisions)
2. Calculated on the key field
3. Gives a storage address
4. Collision (if data already present)
5. Overflow area used (in serial order)
A hashing algorithm takes the key field as an input to an algorithm which calculates / determines the disc address. An overflow area is necessary if this address is already occupied by data. When this happens, the hashing algorithm points to a separate overflow area, where the data is normally stored and searched in linear order.
Advantages of Random Access
● Allows very fast access irrespective of position in the file – very suitable for large files which need this sort of access.
Disadvantages of Random Access
● High complexity in programming
● Updating the file is more difficult compared to a sequential method