Master Address File

Ten Questions Most Frequently Asked About B-TREE-P

1. What is B-TREE-P?

B-TREE-P is software that allows data to be instantly located and output, without having to wait for lengthy file sorts or searches. B-TREE-P enhances the Pick operating system and makes it easier to use.

2. How can B-TREE-P help me?

There are two major weaknesses in your present Pick software. First, data records in Pick files can only have one key by which the data can be immediately retrieved. To find values in other fields, files must be sequentially searched, record by record. For example, a customer file typically uses the customer number as a record key, so any customer can be instantly looked up by customer number. But Pick must search sequentially through the customers, one by one, to find a customer with a given ZIP code or name or phone number.

A second major weakness is that an operator can only view each report page once, after a Pick file is searched and sorted and the resulting report is displayed page by page on a terminal. The operator can't back up and re-display a previous page without re-sorting and re-listing the entire report.

B-TREE-P overcomes both problems. B-TREE-P allows any data in any Pick file to be instantly located and output in any sort order, without having to wait for Pick's own SORT or SELECT commands. Users can immediately search, display, and browse through their data, forward or backward, quickly and conveniently.

3. How does B-TREE-P work?

B-TREE-P enhances the Pick system by maintaining special files called B-trees. B-trees are special data structures for quickly manipulating your data files. A B-tree (which is computer science jargon for a "balanced tree") is similar to a SELECT list, but with some additional "pointer" information to allow very fast lookups, insertions, and deletions anywhere in the list. If you want to be able to find a customer with a given ZIP code, then B-TREE-P automatically searches a B-tree that keeps your customer numbers in ZIP code order. But unlike a SELECT list, any ZIP can immediately be found in the B-tree without having to search sequentially through the whole list.

4. What software comes in the B-TREE-P package?

The B-TREE-P package includes all necessary source code and documentation for a B-tree system that works with any Pick file. There are four BASIC subroutines that allow B-tree insertion, deletion, lookup, and previous/next operations. Even though you get complete source code, those routines are "building blocks" that should never have to be modified by you, regardless of your application or the data you will be indexing.

Complete, tutorial-style instructions are intertwined with the source code listings and explain every part of using B-TREE-P. Also included are BASIC utilities and demonstration programs, along with step-by-step instructions for creating a demonstration system that uses B-TREE-P to maintain a name and address file. There's an editor program for creating and changing name and address records, a browser program for lookups and scrolling through files, and a printer program for listing name and address records without having to wait for a SORT.

All B-TREE-P software is "vanilla-flavored" BASIC that can easily compile and execute on any Pick implementation. There is no assembler code at all.

5. How do I make B-TREE-P work with my existing files?

Say you have a customer file and you want to be able to access it by ZIP code. For one time only, a short utility program reads each customer record and calls our insert subroutine to create a complete B-tree from scratch. From then on, you can use our browser program at any time to look up and immediately display any customer or range of customers by ZIP code. Since you have the source code, you can modify the browser to display or print customer data in any format you want, if you don't like the way our browser happens to paint the screen.

As customers are added or deleted from your data file, your customer update program can call our insert and delete subroutines so that the B-tree is always up to date and exactly reflects the customer file. Or, you can simply periodically rebuild the B-tree completely from scratch, if you want to avoid adding a few CALL statements to your update programs. Note that absolutely no modifications to your existing customer file are necessary.

If you later decide you want to also look up customers by company name instead of by ZIP, just create another B-tree. From then on, you'll be able to always locate and output customers by company name or ZIP. For example, you might keep three B-trees for your customer file: one B-tree sorts and indexes all customers by ZIP by address by company by last name by first name by ID, the second B-tree sorts customers with a corporate name by company by last name by first name by ID, and the third B-tree sorts customers with a personal name by last name by first name by ID. You'll no longer need to wait for SORTs or SELECTs for inquiries, reports, mailing labels, and so on.

6. How can I use B-trees from ENGLISH or ACCESS?

The item identifiers in B-trees can be extracted by QSELECT or FORM-LIST commands for use by ACCESS or ENGLISH. BASIC subroutine calls can also extract specific ranges of identifiers from a B-tree, and a program can then generate output directly, or the identifiers can be saved as an item list or scratch file for input to a QSELECT or FORM-LIST.

7. How does the browser program work?

The browser presents a spreadsheet-like display showing the contents of any Pick file you specify. Normally, fields are columns on the display and records are rows, so a mailing list file might show a column of names, a column of addresses, a column of ZIPs, and so on, with each row being a complete mailing list record. With a single-keystroke command, the operator can specify which column should be sorted, and the screen instantly reorganizes the display to show all records in the desired order. The browser accesses the appropriate B-tree so that the operator can locate and display any record in the file just by typing one or more characters that match some field in the record. Previous and next records, limited only by the size of the screen, are also displayed. The operator can jump to any record in a file at any time, then browse through the file by scrolling up or down, a record at a time, or a page at a time. Screen refresh and response are immediate, no matter what the size of the data file is.

8. How much disk space do B-trees require?

Each B-tree is only a list of item identifiers, along with a few extra pointers, so a B-tree only takes about 10% more than the number of frames used by a SELECT list of the items being indexed. For example, if a SELECT list of a customer file takes about 100 frames, than a B-tree for the same file will probably take about 110 frames.

Each of your data files can have any number of B-trees. B-trees are stored in their own file space, any number of B-trees can be saved in one file, and B-trees and indexed data files can grow to any size.

Since only item identifiers are saved, a B-tree takes the same amount of space regardless of how the data file is indexed and sorted. For example, a B-tree for customer lookups by ZIP codes takes the same amount of space as a B-tree for lookups by company name or by phone number or by invoice date.

9. Why are B-trees better than inverted files?

B-trees can do many things that inverted files can't. First, inverted files usually fail if the inverted list of identifiers grows beyond the Pick item limit of 32K, and can require slow sequential searches of those large items. B-tree items always stay very small and never run into any Pick size limitations.

Second, inverted files typically can only find actual keys known to already exist in the data. If you try to look up ZIP code 95003, and it's not in the file, an inverted file can do nothing more. But a B-tree can tell you the next closest match, such as ZIP 95005, and even identify any size "neighborhood" of adjacent records before and after the match.

Third, inverted files typically only allow searches for a complete key. With B-trees, you can search for the name JOHNSON, or just JOH, or just the letter J, or as many characters as you want, and B-TREE-P will immediately find the closest match. If an inverted file tries to store all those substrings, it requires enormous amounts of disk space.

Fourth, inverted files typically cannot step sequentially through the index file. With B-trees, you can start at any key, then step sequentially through all preceding or following keys.

Fifth, inverted files typically only allow searches for a primary key. With B-trees, you not only can immediately retrieve the list of items all having ZIP 10020, you can also immediately find the one item with ZIP 10020 and an address of 207 BROADWAY.

Sixth, software for inverted files is usually tied closely to the data being indexed, and must often be modified if a new type of data file or field is to be indexed. The B-TREE-P "building block" routines never have to be modified, regardless of the application or type of indexing.

B-trees are much more efficient, flexible, and user-friendly than inverted files.

10. How were B-trees invented?

B-tree algorithms were developed by various computer manufacturers and independent research groups beginning in about the late 1960s, and are now the basis for many commercial database systems, although not the Pick system.

After abandoning the use of inverted files on a Pick system in late 1985, B-TREE-P developers wrote their own B-tree software to simplify and improve various applications. Articles published about those efforts attracted numerous requests from other Pick users asking to acquire the software, so it was packaged as B-TREE-P beginning in April 1986.