Master Address File

When not to use B-trees

B-trees are powerful not just because they allow any file item to be immediately located using any attribute as a key, but because they work even when the file is very dynamic, with items constantly being inserted, changed, or deleted.

But imagine a parts file where new items are infrequently added or changed (say, once every few months) and rarely deleted. It may be tempting to use a B-tree to allow instant lookup of a part number for a given description, but a B-tree is probably overkill in this situation, since the data file being indexed rarely changes.

Whenever a data file is relatively static and unchanging, it may be better to use an indexing method simpler than B-trees, such as a minimum file of pointers that is completely rebuilt any time the target data file is changed.

Let's say we have a rarely-changing PARTS file with part numbers for the item identifiers and descriptions in attribute one. If every part has a unique description, then we can use a command like REFORMAT PARTS AMC1 AMC0 to create an index file where the part descriptions are the item identifiers and corresponding part numbers are in attribute one, making it easy to look up a part by description.

Unfortunately, inverting the file in that manner has a number of disadvantages: (1) some descriptions may be too long for an allowable item identifier, (2) the exact description must be known to find a part number, (3) once a part number is found for a given description, the previous or next description cannot be easily determined, (4) storage is wasted by keeping descriptions in both the data and index files, and (5) a more complicated and possibly huge item structure is necessary (typically a multivalued list) if multiple part numbers share the same descriptions.

A better way to build an index file that avoids all of the above problems is to first SSELECT the PARTS file by description (and by any other desired subsorts), then use a program like

    FILL.INDEX

001 OPEN "INDEX" ELSE STOP

002 CLEARFILE

003 I = 1

004 100 READNEXT P ELSE

005  WRITE I ON "MAXI"

006  STOP

007  END

008 WRITE P ON I

009 I = I+1

010 GOTO 100

011 END

The FILL.INDEX program creates a file of pointers that can be easily searched to find a part number for a given description. Careful study will show that, just as B-tree files avoid indexing problems, the five problems with plain inversion disappear when an index file is created with the FILL.INDEX program.

Once the INDEX file has been built by FILL.INDEX, a browser program can be used to find part numbers for a given description. For example,

    FIND.PART

001 EQU desc.amc TO 1 ;* AMC number in PARTS file

002 EQU middle TO 11 ;* Middle screen row

003 EQU around TO 10 ;* Number of rows around middle

004 OPEN "INDEX" TO I.FILE ELSE STOP

005 OPEN "PARTS" TO P.FILE ELSE STOP

006 CRT @(-1): ;* Clear screen

007 READ MAX.ID FROM I.FILE, "MAXI" ELSE MAX.ID = 0

008 INDEX.ID = INT(MAX.ID/2) ;* Start at midpoint

009 100 CRT @(0,middle-around): ;* Move to middle row

010 MIDDLE.ID = INDEX.ID ;* Remember middle id

011 FOR INDEX.ID = MIDDLE.ID-around TO MIDDLE.ID+around

012   GOSUB get.desc ;* Get PART.ID and DESC

013   CRT PART.ID "L#20" : DESC "L#50" ;* Show item

014 NEXT INDEX.ID

015 CRT @(0,23):@(-3):"U or X or  or ":

016 INPUT COMMAND:

017 BEGIN CASE

018 CASE COMMAND = "U" ; INDEX.ID = MIDDLE.ID-around

019 CASE COMMAND = "X" ; STOP

020 CASE COMMAND = "" ; INDEX.ID = MIDDLE.ID+around

021 CASE 1 ;* Assume COMMAND is a description

022  LEFT = 0 ; RIGHT = MAX.ID ; LAST.ID = ""

023  GOSUB search ;* Perform binary search

024 END CASE

025 GOTO 100

026 *

027 get.desc: * Get PART.ID and DESC for INDEX.ID

028 READ PART.ID FROM I.FILE,INDEX.ID ELSE PART.ID=""

029 READV DESC FROM P.FILE,PART.ID,desc.amc ELSE DESC=""

030 RETURN

031 *

032 search: * Perform binary search

033 INDEX.ID = INT((LEFT+RIGHT)/2)

034 IF INDEX.ID = LAST.ID THEN RETURN ;* No change

035 LAST.ID = INDEX.ID

036 GOSUB get.desc ;* Get DESC for INDEX.ID

037 IF DESC = COMMAND THEN RETURN ;* Matches

038 IF DESC > COMMAND THEN ;* We're too far right

039   RIGHT = INDEX.ID

040   END ELSE LEFT = INDEX.ID ;* Too far left

041 GOTO search

042 *

043 END

Just like other B-TREE-P browsers, the FIND.PART program does a nice job of allowing the operator to locate part numbers by description, but it uses a smaller and simpler index file. Enhancements such as highlighting the middle row or stopping scrolling at the beginning and end of the file are left as exercises.

It's easy to adapt FIND.PART for browsing other indices and attributes, or to create B-tree-style lookup/previous/next subroutines for accessing the index file. But subroutines to allow unlimited insert/delete operations for this type of index file are not possible.

Of course, any time the data file keys change, the index file must be rebuilt. But rebuilding is faster than building a B-tree, and shouldn't have to be done often for stagnant data files.

Note that if your data entry programs are immediately and correctly updating your B-trees the moment a data file item is created, changed, or deleted, then your B-trees are always up-to-date, and would only have to be rebuilt after a disastrous crash. But if you're not immediately updating your B-trees and instead periodically rebuilding them from scratch, then the B-trees are not any more valuable than the simpler type of index files presented here.