Access Method is defined as
the steps involved in storing and retrieving records from a file
File Organisation is defined as
the physical arrangement of data in a file into records and pages on secondary storage
Serial files are used as transaction files to store day-to-day data (e.g. overtime, sales)
The data is not organised in any particular order in terms of the information held (e.g. Surname)
Data is stored in chronological order (i.e. as written to disk: first entry at the start, last entry at the end)
A pointer is used to find the position in a file.
New data appended to end of file.
Quick to add new data (Just add another record to the end of the file)
Good for writing lots of data that does not need to be searched
Searching for information is slow
Poor for high Hit rate applications
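The points above can be sketched in Python. This is only a minimal illustration: the file name sales.txt, the sample records and the helper names append_record and serial_search are made up for the example, and each record is assumed to be one line of text.

```python
import os
import tempfile

def append_record(path, record):
    """Add one record to the end of a serial file (one record per line)."""
    with open(path, "a") as f:   # mode 'a' appends, so existing data is untouched
        f.write(record + "\n")

def serial_search(path, wanted):
    """Read records from the start until one matches; return its position."""
    with open(path) as f:
        for position, line in enumerate(f):
            if line.rstrip("\n") == wanted:
                return position
    return None  # every record was read and none matched

# Hypothetical transaction file holding day-to-day sales entries.
path = os.path.join(tempfile.mkdtemp(), "sales.txt")
for entry in ["Smith,120", "Jones,75", "Adams,310"]:
    append_record(path, entry)

# The wanted record was written last, so every earlier record is read first.
print(serial_search(path, "Adams,310"))
```

Adding is quick because nothing already in the file has to move. Searching is slow because there is no order to exploit, so the search may have to read the whole file.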
Records are stored in key value order
A Binary search is done for retrieval of data by using the key value
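A binary search on records held in key value order might look like the sketch below. The records are kept in a Python list rather than a real file, and the keys and names are invented for the example.

```python
def binary_search(records, key):
    """Find a record by key in a list of (key, data) pairs sorted by key."""
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2          # look at the middle record
        mid_key = records[mid][0]
        if mid_key == key:
            return records[mid]
        if mid_key < key:
            low = mid + 1                # wanted key is in the upper half
        else:
            high = mid - 1               # wanted key is in the lower half
    return None                          # key is not in the file

# Hypothetical records, already in key value order.
records = [(101, "Adams"), (205, "Jones"), (318, "Patel"), (422, "Smith")]

print(binary_search(records, 318))
print(binary_search(records, 999))
```

Each comparison halves the number of records left to check, which only works because the records are in key value order.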
Records stored in indexed key value order
An index is a file with 2 fields:
the key value
the page address.
The Index is held in main memory.
If the index is too large, a multi-level index is used
Specialised data structures that facilitate search are used for indexes
Advantages of Indexed Sequential File Access
Sequential file organisation (in key value order) supports high hit-rate applications.
Index allows fast retrieval of individual records
Supports low hit-rate applications.
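One way to picture an index lookup is the sketch below. It assumes each index entry holds the lowest key value on a page together with that page's address; the notes only say the index has a key value and a page address, so that detail, the sample data and the name lookup are all assumptions.

```python
from bisect import bisect_right

# Hypothetical pages on disk: page address -> records in key value order.
pages = {
    0: [(101, "Adams"), (150, "Brown")],
    1: [(205, "Jones"), (260, "Khan")],
    2: [(318, "Patel"), (422, "Smith")],
}

# The index: (lowest key value on the page, page address), held in memory.
index = [(101, 0), (205, 1), (318, 2)]

def lookup(key):
    """Use the index to pick one page, then search only that page."""
    index_keys = [k for k, _ in index]
    slot = bisect_right(index_keys, key) - 1   # last entry with key <= wanted key
    if slot < 0:
        return None                            # smaller than every key in the file
    page_address = index[slot][1]
    for record_key, data in pages[page_address]:
        if record_key == key:
            return data
    return None

print(lookup(260))
```

Only one page is read for a single-record lookup (low hit rate), while reading the pages in address order still gives every record in key value order (high hit rate).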
What is a hashing algorithm and how is it used to access data
•Records in seemingly random order
•Storage address calculated from key value
–Page Address = Hash (Key Value)
–If necessary convert a non-numerical key value to an integer
–For example, HBZ 3459 to 0802263459.
•Number of pages is usually far smaller than the number of key values: #Pages < #Keys
Collision occurs when the same page address is computed for 2 or more records.
An overflow occurs when the storage capacity of a page is exceeded.
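The ideas above can be tried out with the sketch below. The notes do not say which hash function is used, so taking the remainder after dividing by the number of pages is an assumption, as are the page count, the page capacity and the function names. Letters are converted to their two-digit position in the alphabet, which matches the HBZ 3459 example.

```python
NUM_PAGES = 7       # assumed number of pages
PAGE_CAPACITY = 2   # assumed number of records a page can hold

def key_to_digits(key):
    """Convert a key such as 'HBZ 3459' to a string of digits."""
    digits = ""
    for ch in key.upper():
        if "A" <= ch <= "Z":
            digits += "%02d" % (ord(ch) - ord("A") + 1)  # A -> 01 ... Z -> 26
        elif "0" <= ch <= "9":
            digits += ch
        # anything else (e.g. a space) is ignored
    return digits

def page_address(key):
    """Page Address = Hash(Key Value); here the hash is 'mod NUM_PAGES'."""
    return int(key_to_digits(key)) % NUM_PAGES

pages = {n: [] for n in range(NUM_PAGES)}
overflow_area = []

def store(key):
    """Store a key on its page, or in the overflow area if the page is full."""
    page = pages[page_address(key)]
    if len(page) < PAGE_CAPACITY:
        page.append(key)
        return "stored"
    overflow_area.append(key)
    return "overflow"

print(key_to_digits("HBZ 3459"))
# Keys 7, 14 and 21 all hash to the same page, so they collide.
results = [store(k) for k in ["7", "14", "21"]]
print(results)
```

The second key collides with the first but still fits on the page. The third key collides as well and exceeds the page capacity, so it causes an overflow.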
Words.txt is a list of words that are allowable in crossword puzzles.
NB Each line only has a single word on it.
You do not need an IMPORT statement to read the file: Python's built-in OPEN function is enough.
You will need to open the file in read-only mode to access the data.
OPEN is a built-in FUNCTION that takes the <file name> as a parameter
and RETURNS a file object you can use to read the file.
fin = OPEN('words.txt') #Fin is a commonly used name for file objects used for input.
PRINT fin #RESERVED words in capitals need to be lower case in Python
open file 'words.txt', mode 'r' # mode 'r' indicates the file is open for read only
READLINE is a method that reads characters from the file until it gets to a new line
and RETURNS the result as a STRING
fin.readline()
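The calls above can be tried out with the short sketch below, written for Python 3 (where print is a function). So that it runs anywhere, it first writes a tiny stand-in for words.txt to a temporary folder; the three words in it are made up for the example.

```python
import os
import tempfile

# A tiny stand-in for words.txt (hypothetical contents), one word per line.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as out:
    out.write("aa\naah\naahed\n")

fin = open(path)                  # mode defaults to 'r' (read only)
first = fin.readline()            # reads up to and including the new line
second = fin.readline().strip()   # strip() removes the new line character
remaining = [line.strip() for line in fin]  # a file object can also be looped over
fin.close()

print(repr(first))
print(second)
print(remaining)
```

The file object remembers its position, so each call to readline carries on from where the last one stopped.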
Task 1
Write a program that reads each line of words.txt and prints any words with more than 20 characters.
save as <filename>_v1
Task 2
Modify your program to return all the words beginning with 'q'
save as <filename>_v2
Task 3
Now, Modify your program to return all the words ending with 'ually'
save as <filename>_v3
Task 4
Modify your program to return only the words without an 'e'
save as <filename>_v4
Task 5
Write a function called avoids that takes a word and a string of forbidden letters
and RETURNS TRUE if the word uses none of the forbidden letters.
Extension: Find a combination of 5 letters that excludes the smallest number of words
Task 6
Write a function called anagram, that takes a word and a string of letters
and RETURNS TRUE if the word uses only those letters. (ie an anagram)
Task 7
Modify the function and save as uses_all, so that it takes a word and a string of letters
and RETURNS TRUE if the word uses all of the letters at least once.
Extension
Write a function called abecedarian that returns TRUE if the letters of a word appear in alphabetical order. (Double letters OK)
How many abecedarian words are there?
Extension: What other methods of file access are there?
Exercises: