# TREES

### Trees are effective structures for organizing, retrieving, and presenting information.

__DEFINITION__: "Tree" structures are ways of organizing information, where each element of information connects to a limited (typically, but not necessarily, 2) number of other elements (Thareja, 2014). The connections follow systematic rules that contribute to organizing the structure.

Tree structures are very __powerful__. For example, consider the task of finding a telephone number based on a name (e.g. "Kepler") in an alphabetically-organized contact list. The contact list contains names and their corresponding telephone numbers in alphabetical order. For simplicity, imagine that our contact list has only 26 names (one for each letter).

If we were to look for a name to find the corresponding telephone number, we could start at the beginning of the contact list ("A") and read each name until we got to the name we were looking for. To find "Kepler," we would have to read through 11 letters (A through K) including "K" for "Kepler."

Imagine that we needed to search for the phone numbers of __every__ name in the contact list. If we started each search with "A," half the time the target letter would be in the first half of the alphabet (A-M) and half the time it would be in the second half (N-Z). The __average__ number of letters that we would have to read to find the desired name would be (1+2+3+4+...+26)/26 = 351/26 = 13.5, about half of the 26 letters in the English alphabet.
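The 13.5 average can be checked with a few lines of Python. This minimal sketch assumes, as the text does, that each of the 26 names is searched for exactly once and that every search is a linear scan starting at "A". The helper name `reads_to_find` is hypothetical.

```python
from string import ascii_uppercase

# Simplified contact list: one name per letter, already in alphabetical order.
contact_list = list(ascii_uppercase)

def reads_to_find(names, target):
    """Count how many entries a linear scan from the start reads before reaching target."""
    for reads, name in enumerate(names, start=1):
        if name == target:
            return reads
    raise KeyError(target)

# Search for every name once, then average the number of entries read.
total_reads = sum(reads_to_find(contact_list, letter) for letter in contact_list)
average_reads = total_reads / len(contact_list)

print(reads_to_find(contact_list, "K"))  # 11 reads to reach "Kepler"
print(average_reads)                     # 13.5
```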

However, imagine that we organized the contact list in a tree structure (Figure 1). The tree structure has a simple rule: start at "P." At each letter, if the letter you are looking for comes before the current letter, follow the left branch; if it comes after the current letter, follow the right branch. (You may already use a similar procedure to find a name in a telephone book: open the book halfway, check which letter you have reached, then continue in the left or right half, and so on.)

If you examine the tree structure, you can see that the most you have to read to get to __any__ letter of the alphabet is 5. By using a tree structure, we have cut our work by more than half!
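The search rule can be sketched in Python. The text specifies only that the tree starts at "P"; the exact placement of the other letters is an assumption here: each side of "P" is filled by repeatedly placing the middle remaining letter at the top of its branch. Under that assumption, no letter takes more than 5 reads and the average is 4. The class and function names are hypothetical.

```python
from string import ascii_uppercase

class Node:
    """One letter in the tree, with a left branch (earlier letters) and a right branch (later letters)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def build_balanced(keys):
    """Build a balanced branch from sorted keys by placing the middle key at the top."""
    if not keys:
        return None
    mid = len(keys) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys[:mid])
    node.right = build_balanced(keys[mid + 1:])
    return node

def build_rooted_at(keys, root_key):
    """Start the tree at root_key (here "P") and balance each side separately (an assumption)."""
    split = keys.index(root_key)
    root = Node(root_key)
    root.left = build_balanced(keys[:split])
    root.right = build_balanced(keys[split + 1:])
    return root

def letters_read(root, target):
    """Apply the rule from the text: go left if target comes earlier, right if it comes later."""
    reads = 0
    node = root
    while node is not None:
        reads += 1
        if target == node.key:
            return reads
        node = node.left if target < node.key else node.right
    raise KeyError(target)

letters = list(ascii_uppercase)
tree = build_rooted_at(letters, "P")
reads = [letters_read(tree, letter) for letter in letters]

print(letters_read(tree, "K"))  # 5 (P, H, L, J, K)
print(max(reads))               # 5
print(sum(reads) / len(reads))  # 4.0
```

Compared with the 13.5-letter average of reading from "A," this assumed tree needs 4 reads on average.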

Using a tree structure may seem like too much trouble just to read at least eight fewer elements per search on average. However, the true power of tree structures lies in how they *scale*: how they perform as the tree gets bigger and bigger.

Imagine a contact list with one *million* names (N). If you start reading from "A" each time, the number of names you have to read before reaching the name you are looking for will __average__ about half of N, or about 500,000. That's a lot of names. However, if you build a balanced binary tree with a million names, the number of names you have to read before finding the one you are looking for is at most approximately log2(N), or about 20! Instead of cutting our work in half, building the tree cut our work by a factor of about 25,000 (500,000 / 20). Using a "binary tree" substantially reduced the amount of *information* that we needed to process to search the contact list (Shannon, 1948).
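The scaling argument reduces to two formulas: a linear scan reads (N + 1) / 2 entries on average, while a balanced binary tree of N entries is never deeper than ceil(log2(N + 1)) levels. This Python sketch evaluates both for a few list sizes; the function names are hypothetical.

```python
import math

def linear_average_reads(n):
    """Average entries read by a linear scan when each of n names is searched once: (1 + ... + n) / n."""
    return (n + 1) / 2

def balanced_tree_max_reads(n):
    """Most entries read in a balanced binary search tree holding n names (its height)."""
    return math.ceil(math.log2(n + 1))

for n in (26, 1_000, 1_000_000):
    linear = linear_average_reads(n)
    tree = balanced_tree_max_reads(n)
    print(f"N={n:>9,}: linear average {linear:>11,.1f}, tree at most {tree}")
```

For N = 26 this gives 13.5 versus 5, matching the alphabet example; for N = 1,000,000 it gives about 500,000 versus 20.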

Clearly, tree structures are __powerful__. Similar ideas allow you to do things like search the Internet (!).

Although clearly we do not have to deal with as much information as a million-name contact list (or the Internet) in scientific communication, we can still take advantage of the power of tree structures to reduce the amount of information necessary to understand any aspect of a paper or presentation (Dumont, 2009; Wigmore, 1913).

For example, suppose we were conducting research on childhood obesity (a major public health issue; Centers for Disease Control, 2018). Childhood obesity is a complex issue. However, a tree hierarchy could help us organize our approach to conducting research and constructing arguments about obesity.

We could first consider the contributors to obesity. Obesity can occur if people take in more energy (calories) than they expend, resulting in the body storing excess calories as fat. There are two general contributors to energy balance: energy input (how much someone eats) and energy output (basal metabolism and how much someone exercises). We could visualize our two factors using a tree:

Creating the dichotomy between energy *input* and energy *output* allows us to consider each contributor separately (although there are some connections between the two; Astorino et al., 2018). Suppose we first focus on energy *input*. Three categories of consumption that affect energy intake might be snacks, drinks, and meals.

Again, creating a tree structure can help us consider each contributor to energy intake separately. Note that we do not have to use a binary (two-branch) tree! In this case, we have three categories of energy input: Snacks, Drinks, and Meals (although limiting ourselves to three or fewer categories is helpful). Snacks, Drinks, and Meals are still fairly large categories, so we could focus even more closely on one category like __Drinks__.

Within Drinks, we might reasonably hypothesize that __soda consumption__ contributes to obesity. Our general hypothesis could be supported by two arguments: (1) that there is a *correlation* between soda consumption and obesity; and (2) that there are plausible physiological __mechanisms__ that link soda consumption and obesity. We could express the two arguments as part of our evolving tree:

Now that we have two arguments, we can organize our evidence. In this case, we could use *inductive* reasoning to support each argument based on specific premises supported by published studies:

Once we have our findings organized into a tree structure, we can then use the tree structure to help us express our arguments as paragraphs using logical transitions. Each branch of the tree supports ONE main idea. Therefore, each branch of the tree can become ONE paragraph:
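The branch-to-paragraph mapping can be sketched in Python. In this minimal sketch, each node is assumed to be a `(claim, children)` pair; the claims paraphrase the example paragraphs, and the representation and the function `branches_to_paragraphs` are illustrative choices rather than part of the method itself.

```python
# Each node is a (claim, children) pair; leaves have an empty children list.
# Claims paraphrase the example text; the structure itself is an illustrative choice.
argument_tree = (
    "Soda consumption may contribute to obesity in children",
    [
        ("Soda consumption and obesity increased together", [
            ("Soda consumption among children rose 123% from 1970 to 1990 (Hu and Malik, 2010)", []),
            ("The percentage of obese children doubled (Hedley et al., 2004)", []),
        ]),
        ("Plausible physiological mechanisms link soda consumption and obesity", [
            ("Sugar increases calorie intake (Ludwig et al., 2001)", []),
            ("Sugar shifts metabolism toward fat storage (Brand-Miller et al., 2002)", []),
            ("Liquid sugar is less satiating (DiMeglio and Mattes, 2000)", []),
            ("Soda displaces milk and its calcium (Borrud et al., 2006)", []),
        ]),
    ],
)

def branches_to_paragraphs(tree):
    """Map each top-level branch to one paragraph outline: its main idea, then its evidence."""
    thesis, branches = tree
    paragraphs = []
    for claim, evidence in branches:
        lines = [f"Main idea: {claim}."]
        lines += [f"  - {finding}" for finding, _children in evidence]
        paragraphs.append("\n".join(lines))
    return thesis, paragraphs

thesis, paragraphs = branches_to_paragraphs(argument_tree)
print(f"Thesis: {thesis}.")
for paragraph in paragraphs:
    print()
    print(paragraph)
```

Each outline holds exactly one main idea, so it maps onto exactly one paragraph, and its evidence lines become that paragraph's supporting sentences.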

"*Both correlational and physiological evidence suggest that soda consumption may contribute to obesity in children*.

First, both soda consumption and obesity increased over the past 40 years. From 1970 to 1990 there was a 123% increase in soda consumption among children (Hu and Malik, 2010). Similarly, from 1986 to 2006, the percentage of obese children doubled (Hedley et al., 2004). Therefore, increased obesity occurred concurrently with increased soda consumption.

Second, several plausible physiological mechanisms link soda consumption and obesity. Sugar consumption increases calorie intake (Ludwig et al., 2001). Sugar consumption also changes metabolism to favor fat storage (Brand-Miller et al., 2002). Moreover, sugar consumed in liquid form is less satiating, which increases total calorie intake (DiMeglio and Mattes, 2000). Finally, soda consumption displaces milk consumption among children, reducing the obesity-preventing role of calcium (Borrud et al., 2006). Therefore, the association between soda consumption and obesity is physiologically plausible.

Because soda consumption is both associated with obesity and plausibly linked to it through physiological mechanisms, we hypothesize that soda consumption contributes to obesity."

If we wish to visualize the entire tree that we have constructed (so far), it would look like:
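As a plain-text stand-in for that diagram, a minimal Python sketch can store the tree as nested `(label, children)` pairs, print it as an indented outline, and list only the nodes on the path to one branch. The root label "Childhood obesity" and the short paraphrased labels are assumptions, and the helpers `print_outline` and `path_to` are hypothetical. Energy output is left unexpanded, as in the text so far.

```python
# Each node is a (label, children) pair; leaves have an empty children list.
obesity_tree = (
    "Childhood obesity",
    [
        ("Energy input", [
            ("Snacks", []),
            ("Drinks", [
                ("Soda consumption", [
                    ("Correlation", [
                        ("Soda consumption rose (Hu and Malik, 2010)", []),
                        ("Childhood obesity doubled (Hedley et al., 2004)", []),
                    ]),
                    ("Mechanisms", [
                        ("Sugar raises calorie intake (Ludwig et al., 2001)", []),
                        ("Sugar favors fat storage (Brand-Miller et al., 2002)", []),
                        ("Liquid sugar is less satiating (DiMeglio and Mattes, 2000)", []),
                        ("Soda displaces milk and calcium (Borrud et al., 2006)", []),
                    ]),
                ]),
            ]),
            ("Meals", []),
        ]),
        ("Energy output", []),
    ],
)

def print_outline(node, depth=0):
    """Print the tree as an indented outline, one node per line."""
    label, children = node
    print("  " * depth + "- " + label)
    for child in children:
        print_outline(child, depth + 1)

def path_to(node, target):
    """Return the labels from the root down to target, or None if target is not in the tree."""
    label, children = node
    if label == target:
        return [label]
    for child in children:
        rest = path_to(child, target)
        if rest is not None:
            return [label] + rest
    return None

print_outline(obesity_tree)
print(" -> ".join(path_to(obesity_tree, "Mechanisms")))
```

The path printed for "Mechanisms" passes through only five nodes, which illustrates how a tree lets us focus on one branch without processing the rest.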

If we need to consider other aspects of energy input (e.g. Snacks or Meals), then we could expand those branches of the tree in the same way we expanded the branch for Drinks. Likewise, we could organize the energy output branch of the tree using a similar procedure. In every case, we can __compartmentalize__ our thinking to specific branches of the tree so we do not become overwhelmed with information.

Just as in the contact list example, a tree structure __limits__ the amount of information that we need to process to find (or put into context) a specific finding.