How to BUILD fast

The original BUILD utility can be very slow when creating an initial B-tree for a large file. That's because large numbers of nodes and pointers are written and rewritten to disk as the nodes are created while the B-tree is grown from scratch. The data file being indexed is also accessed numerous times for comparisons as new keys are inserted in the B-tree.

The following FAST.BUILD utility program provides an alternative way to create the initial B-tree to match an existing data file. FAST.BUILD relies on a SSELECT list that already has all identifiers sorted in the order in which they will be saved in the B-tree. That means FAST.BUILD never has to access the actual data file being indexed, and B-tree nodes can be constructed in their final form without having to grow from scratch, one insert at a time. As a result, FAST.BUILD is fast. For example, a benchmark to create a B-tree for 1,330 items with BUILD took over nine and a half minutes, while creating an equivalent B-tree with FAST.BUILD took only twelve seconds!

To use FAST.BUILD, edit, compile, and catalog the following source code:

    FAST.BUILD
001 EQU levels TO 3
002 OPEN "B-TREE" TO BFILE ELSE STOP "No B-TREE"
003 CRT "What B-tree name do you want to create":
004 INPUT BTNAME ; IF BTNAME = "" THEN STOP
005 CRT "Store how many items in each B-tree node":
006 INPUT SIZE ; SIZE = ICONV(SIZE,"MD0")
007 IF (SIZE<1) OR (SIZE>2400) THEN STOP "Bad size"
008 CHILDREN = SIZE+1
009 KEYS = PWR(CHILDREN,levels)-1
010 CRT "Before running this program,"
011 CRT "did you SSELECT at least ":KEYS
012 CRT "items in the desired sort order":
013 INPUT RESPONSE ; IF RESPONSE # "YES" THEN STOP
014 READ NID FROM BFILE, "NEXT.ID" ELSE NID=0
015 ROOT = SIZE
016 ROOT.ID = NID ; NID = NID+1
017 FOR I = 1 TO CHILDREN
018   NODE = SIZE
019   NODE.ID = NID ; NID = NID+1
020   FOR J = 1 TO CHILDREN
021     LEAF = SIZE
022     FOR K = 1 TO SIZE
023       READNEXT ITEM.ID ELSE ABORT
024       LEAF<2,-1> = ITEM.ID
025     NEXT K
026     LEAF<4> = NODE.ID
027     WRITE LEAF ON BFILE, NID
028     NODE<3,-1> = NID ; NID = NID+1
029     IF J # CHILDREN THEN
030       READNEXT ITEM.ID ELSE ABORT
031       NODE<2,-1> = ITEM.ID
032       END
033   NEXT J
034   NODE<4> = ROOT.ID
035   WRITE NODE ON BFILE, NODE.ID
036   ROOT<3,-1> = NODE.ID
037   IF I # CHILDREN THEN
038     READNEXT ITEM.ID ELSE ABORT
039     ROOT<2,-1> = ITEM.ID
040     END
041   NEXT I
042 WRITE NID ON BFILE, "NEXT.ID"
043 WRITE ROOT ON BFILE, ROOT.ID
044 WRITE ROOT.ID ON BFILE, BTNAME
045 CRT KEYS:" items have been inserted"
046 STOP
047 END

To actually use FAST.BUILD to create the initial B-tree that matches an existing data file, you must first SSELECT the data file in the same order you want the data items ordered in the B-tree. Then give the FAST.BUILD command. For example, to create the LNAME B-tree for the original NAMES file presented in BTP/Branches #1, use a command like SSELECT NAMES BY LNAME BY FNAME BY ID, where LNAME, FNAME, and ID are dictionary definitions for attributes 2, 1, and 0, respectively, with ID defined as right justified. Then give the FAST.BUILD command. The SSELECT will cause the items to be selected in the same order the BTPKEY subroutine would sort them, and the FAST.BUILD program will save the list of item identifiers as a finished B-tree.

It is extremely important that the SSELECT sorts the data items for the B-tree in the same order specified by the BTPKEY subroutine. Watch out for correlatives and justifications in the SSELECT that don't match how the attributes are used in the BTPKEY routine. If an attribute is justified or modified by a correlative in BTPKEY, for example, ITEM<3> "R#10" or ICONV(ITEM<2>,"G1*1"), then the same justification and correlatives must occur in the dictionary words used by SSELECT. If BTPKEY uses simple attributes and performs no conversions, then the SSELECT must also use simple, unconverted attributes. Otherwise, FAST.BUILD will create a B-tree that is incompatible with subsequent BTPINS and BTPDEL calls because the items will be sorted differently than the order specified by BTPKEY.

As another example, the command SSELECT NAMES BY COMP BY LNAME BY FNAME BY ID would be the correct selection for FAST.BUILD if the COMP B-tree were being built for the NAMES file, and if COMP, LNAME, FNAME, and ID refer to attributes 3, 2, 1, and 0, respectively, without any correlatives, and only ID is right justified.

Once FAST.BUILD is executed, it prompts for the name of the B-tree you wish to create. Note that FAST.BUILD accepts any non-null name, and doesn't bother checking if the B-tree name is already in use. If you specify the name of a B-tree that already exists, its root pointer is eventually overwritten by FAST.BUILD, thereby causing that B-tree's previous nodes to become inaccessible.

After prompting for the name of the B-tree to create, FAST.BUILD requests the number of item identifiers to be initially stored in each B-tree node. This determines the total number of identifiers that FAST.BUILD can store in the whole B-tree, since FAST.BUILD is designed to only create a B-tree of exactly three levels using nodes of all the same size: a root node at level 1, intermediate nodes at level 2, and final "leaf" nodes at level 3. The total number of identifiers that FAST.BUILD can place in a B-tree is given by the formula PWR(nodesize+1, 3) - 1, where nodesize is the number of item identifiers per node that FAST.BUILD requests after the B-tree name. The following table uses the formula to show some sample node sizes and the resulting number of items FAST.BUILD must place in the B-tree:

NODE ITEMS IN SIZE B-TREE 5 215 10 1330 15 4095 20 9260 25 17575 30 29790 35 46655 40 68920 45 97335 50 132650 75 438975 100 1030300 200 8120600

For example, if you plan to specify a node size of 20, you must SSELECT at least 9,260 items before giving the FAST.BUILD command.

After you input your desired number of items per node, FAST.BUILD asks if you have SSELECTed at least as many items as the node size allows FAST.BUILD to put in the B-tree. If your SSELECT has selected at least as many items as FAST.BUILD requires, answer YES and FAST.BUILD will proceed to create the B-tree. If your SSELECT has not selected at least the number of items requested by FAST.BUILD, answer NO so FAST.BUILD will stop. Attempting a FAST.BUILD without selecting at least as many items as FAST.BUILD requests will cause FAST.BUILD to prematurely abort at one of its READNEXT statements, and the partially built B-tree left behind will be invalid.

Because FAST.BUILD reads the NEXT.ID item from the B-TREE file at the start of execution, creates all necessary nodes, then saves the final NEXT.ID node number just before stopping, FAST.BUILD should not be used at the same time other programs are trying to create nodes in the B-TREE file.

When it completes execution, FAST.BUILD reports the number of item identifiers taken from the SSELECT list and inserted in the new B-tree. Any extra identifiers in the SSELECT list not processed by FAST.BUILD will have to be inserted separately into the B-tree using BTPINS calls, just like the original BUILD does. When FAST.BUILD finishes execution, the new B-tree is immediately available for browsing and for updating with standard BTPINS and BTPDEL calls.

Remember that all BTPINS and BTPDEL calls must specify a node size, which equals both the minimum and half the maximum number of identifiers that can be in each node. After using FAST.BUILD, the node size you use in subsequent BTPINS and BTPDEL calls must be less than or equal to the node size used by FAST.BUILD, and greater than or equal to half that number. For example, if FAST.BUILD is used to create a B-tree with 20 identifiers in every node, then the B-tree can be thought of as size 20 (and every node has the minimum number of identifiers), or the B-tree can be thought of as being size 10 (and every node is currently full), or the B-tree can be considered any size between 10 and 20. Therefore, BTPINS and BTPDEL can use any node size parameter from 10 to 20. Whichever size you decide to use, you must pick one size and stick to it from then on. If you're not sure what size to use in BTPINS and BTPDEL after using FAST.BUILD, just use the same number you gave FAST.BUILD for the number of items per node.

To summarize how to use FAST.BUILD to create an initial B-tree: SSELECT your data items, then give the FAST.BUILD command. Be sure your SSELECT sorts by the exact same criteria as the BTPKEY statement that will be used for the B-tree. FAST.BUILD will only insert a certain number of selected items into the B-tree, depending on the node size you pick. All remaining data items must be inserted into the B-tree with BTPINS, as usual. Once FAST.BUILD is done, the B-tree is immediately available for browsing and updating.