String functions

There is a matching Excel file that you can use to follow along and practice, which you can download from here.

String functions

These are extremely handy tools that you can use for data cleanup (particularly splitting names) or during analysis. They allow you to grab only a piece of the information in a field based on certain criteria. These functions are also available in other software, including most SQL database programs, and coding languages like R and Python. They often work in a similar fashion but have slight variations in syntax.

LEFT: This tells the computer to start at the first byte on the left side of the field. Then we have to tell it how many bytes (or characters) to take.

Syntax: LEFT(celladdress, number of bytes to take)

Example: LEFT(B5, 5) --- this will extract the first 5 characters of the contents of cell B5

MID: To use this function, you have to tell the computer which cell to work on, where to start and where to stop. If you want to take everything that remains in the field, just put a really big number in that will likely encompass all possibilities.

Syntax: MID(celladdress, byte number to start at, number of bytes to take)

Example: MID(B5,10,4) --- this will start at the 10th byte and take 4 bytes.

SEARCH: This works as a sort of search tool to tell the computer to either start or stop taking a “string” at a certain character (or space). This is how we can tell the program to split a name field at the comma, for example. For this type of work, it is used in conjunction with the MID function. The character you what to find should be enclosed in quotes.

Syntax: SEARCH(“character we want to find”, celladdress)

Example: SEARCH(“,”,B5)

You can combine this with Mid to explain that you either want to start or stop at a certain character (even if the character isn’t located at the same byte in every record).

EXAMPLE: MID(b5, search(“,”, b5), 100)

**the above example uses the search function to find the “start” position, then tells the computer to take 100 bytes from there.

EXAMPLE: MID(b5, 10, search(“,”, b5))

**the above example uses search to find the “end” position.

**Note: If you don’t want to include the character that you searched for in your result, use a –1 or +1 just after the search phrase to either go back a space (-1) or move forward and start a space farther (+1). Here’s an example that will start at the comma, then move one space forward and take 100 bytes from there:

=mid(b5, search(“,”,b5)+1, 100)

There is also a RIGHT function, which starts at the first byte on the right side of the field and then you can tell it how many bytes to take. (it isn’t as useful as the others, however)

Intro to string functions

Use worksheet: Basic strings

When I teach people how to use string functions, usually the first response I get is: “Why can’t you just use Text-to-Columns?” This great feature (found under the Data menu) is very handy if the column you are trying to split is delimited by something. For example, names such as: “Smith, John”

But sometimes it won’t work and sometimes your data won’t be that neat and tidy. And when you hit that day, you’ll thank me for teaching you string functions.

The data in this worksheet is an example of that very situation. This is school district expenditure data that had been reported to the state through their financial accounting database called UFARS. Each row is a “bucket” of expenditures, coded by the revenue source used (“finance”), by the “program” it was spent on and by the “object” for the expenditure (i.e. “food”, “postage”, “salaries”, “health insurance”)

In the first column you can see those codes are all strung together, separated neatly by dashes. (Yes, the first two rows only have one of the three types of codes)

I wanted to separate these out into their own columns – finance, program and object.

I tried to do text-to-columns, first by making a copy of that column and dropping it in a blank column. Then highlighting that column and selecting “Text-to-Columns” from the data menu. I told Excel, it was delimited by a dash (“other”) and to treat consecutive delimiters as one. It looked like everything would work perfectly. But then the new columns no longer had the leading zeros (which are necessary)!

That’s when I realized string functions would save the day.

To get the finance code (the first three digits), I used the LEFT() function.

=LEFT(a6, 3)

To get the program code (the middle code), I used the MID() function. You tell it which cell to work on, which byte to start on, and then how many bytes to take. (Note: you have to look carefully to see how many dashes are in there. The second code starts at byte 9)

=MID(a6, 9, 3)

And to get the last code, you can either use MID in the same fashion….

=MID(a6, 16,3)

OR….

You can use the RIGHT function

=RIGHT(a6, 3)

Splitting names

This tutorial shows you how to split a name into first, last and middle initial, even when it doesn't always have a middle initial. It uses string functions plus an IF statement. If you haven't learned how to use IF statements, first check out the first video.

A tipsheet that goes with this string functions video is here.

Splitting city state

Use worksheet called “citystate”

This trick is only going to work in specific circumstances, but it’s one you might encounter with some frequency. Here’s the deal…you’ve got a spreadsheet that has a column containing both the city name (or perhaps a county name) and a two-digit state abbreviation but there isn’t a comma separating the two items, so it’s not easy to parse.

You can use the LEN function to determine how long the full string is and then subtract 2 digits to find out what byte position that last space is at. (since that’s the byte position you want to use for splitting the info).

So in column B, put in this formula and copy it down:

=len(a2)-2

A2 is the first cell where our data (city-state column) is located. The first part of the formula – LEN(A2) – is calculating the number of bytes there are in the cell. And then -2 is just subtracting two bytes.

Check your numbers on a few examples to make sure it’s hitting the right position. Then you can use that number you just created (in the B column)

To grab the city name:

=LEFT(a2,b2)

See how I substituted “b2” instead of using SEARCH() like we did in an earlier example?

Then you can grab the state abbreviation either by using:

=RIGHT(A2,2)

OR

=MID(A2,B2,2)