When not to use B-trees

B-trees are powerful not just because they allow any file item to be immediately located using any attribute as a key, but because they work even when the file is very dynamic, with items constantly being inserted, changed, or deleted.

But imagine a parts file where new items are infrequently added or changed (say, once every few months) and rarely deleted. It may be tempting to use a B-tree to allow instant lookup of a part number for a given description, but a B-tree is probably overkill in this situation, since the data file being indexed rarely changes.

Whenever a data file is relatively static and unchanging, it may be better to use an indexing method simpler than B-trees, such as a minimum file of pointers that is completely rebuilt any time the target data file is changed.

Let's say we have a rarely-changing PARTS file with part numbers for the item identifiers and descriptions in attribute one. If every part has a unique description, then we can use a command like REFORMAT PARTS AMC1 AMC0 to create an index file where the part descriptions are the item identifiers and corresponding part numbers are in attribute one, making it easy to look up a part by description.

Unfortunately, inverting the file in that manner has a number of disadvantages: (1) some descriptions may be too long for an allowable item identifier, (2) the exact description must be known to find a part number, (3) once a part number is found for a given description, the previous or next description cannot be easily determined, (4) storage is wasted by keeping descriptions in both the data and index files, and (5) a more complicated and possibly huge item structure is necessary (typically a multivalued list) if multiple part numbers share the same descriptions.

A better way to build an index file that avoids all of the above problems is to first SSELECT the PARTS file by description (and by any other desired subsorts), then use a program like

    FILL.INDEX
001 OPEN "INDEX" ELSE STOP
002 CLEARFILE
003 I = 1
004 100 READNEXT P ELSE
005  WRITE I ON "MAXI"
006  STOP
007  END
008 WRITE P ON I
009 I = I+1
010 GOTO 100
011 END

The FILL.INDEX program creates a file of pointers that can be easily searched to find a part number for a given description. Careful study will show that, just as B-tree files avoid indexing problems, the five problems with plain inversion disappear when an index file is created with the FILL.INDEX program.

Once the INDEX file has been built by FILL.INDEX, a browser program can be used to find part numbers for a given description. For example,

    FIND.PART
001 EQU desc.amc TO 1 ;* AMC number in PARTS file
002 EQU middle TO 11 ;* Middle screen row
003 EQU around TO 10 ;* Number of rows around middle
004 OPEN "INDEX" TO I.FILE ELSE STOP
005 OPEN "PARTS" TO P.FILE ELSE STOP
006 CRT @(-1): ;* Clear screen
007 READ MAX.ID FROM I.FILE, "MAXI" ELSE MAX.ID = 0
008 INDEX.ID = INT(MAX.ID/2) ;* Start at midpoint
009 100 CRT @(0,middle-around): ;* Move to middle row
010 MIDDLE.ID = INDEX.ID ;* Remember middle id
011 FOR INDEX.ID = MIDDLE.ID-around TO MIDDLE.ID+around
012   GOSUB get.desc ;* Get PART.ID and DESC
013   CRT PART.ID "L#20" : DESC "L#50" ;* Show item
014 NEXT INDEX.ID
015 CRT @(0,23):@(-3):"U or X or  or ":
016 INPUT COMMAND:
017 BEGIN CASE
018 CASE COMMAND = "U" ; INDEX.ID = MIDDLE.ID-around
019 CASE COMMAND = "X" ; STOP
020 CASE COMMAND = "" ; INDEX.ID = MIDDLE.ID+around
021 CASE 1 ;* Assume COMMAND is a description
022  LEFT = 0 ; RIGHT = MAX.ID ; LAST.ID = ""
023  GOSUB search ;* Perform binary search
024 END CASE
025 GOTO 100
026 *
027 get.desc: * Get PART.ID and DESC for INDEX.ID
028 READ PART.ID FROM I.FILE,INDEX.ID ELSE PART.ID=""
029 READV DESC FROM P.FILE,PART.ID,desc.amc ELSE DESC=""
030 RETURN
031 *
032 search: * Perform binary search
033 INDEX.ID = INT((LEFT+RIGHT)/2)
034 IF INDEX.ID = LAST.ID THEN RETURN ;* No change
035 LAST.ID = INDEX.ID
036 GOSUB get.desc ;* Get DESC for INDEX.ID
037 IF DESC = COMMAND THEN RETURN ;* Matches
038 IF DESC > COMMAND THEN ;* We're too far right
039   RIGHT = INDEX.ID
040   END ELSE LEFT = INDEX.ID ;* Too far left
041 GOTO search
042 *
043 END

Just like other B-TREE-P browsers, the FIND.PART program does a nice job of allowing the operator to locate part numbers by description, but it uses a smaller and simpler index file. Enhancements such as highlighting the middle row or stopping scrolling at the beginning and end of the file are left as exercises.

It's easy to adapt FIND.PART for browsing other indices and attributes, or to create B-tree-style lookup/previous/next subroutines for accessing the index file. But subroutines to allow unlimited insert/delete operations for this type of index file are not possible.

Of course, any time the data file keys change, the index file must be rebuilt. But rebuilding is faster than building a B-tree, and shouldn't have to be done often for stagnant data files.

Note that if your data entry programs are immediately and correctly updating your B-trees the moment a data file item is created, changed, or deleted, then your B-trees are always up-to-date, and would only have to be rebuilt after a disastrous crash. But if you're not immediately updating your B-trees and instead periodically rebuilding them from scratch, then the B-trees are not any more valuable than the simpler type of index files presented here.