Strings

We have already worked with strings a little: we have seen how to display them (the print function), concatenate them (the + operator), repeat them (the * operator), and convert between strings and integers (the str and int functions). Python also provides the float function to convert a string like “2.6” to a floating point number, and the bool function to convert a string to a boolean value. Be careful with bool, though: it evaluates to True for any non-empty string (even “False”) and to False only for the empty string.
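Here is a small sketch of these conversion functions in action. The variable names are just for illustration.

```python
# float parses the text of a number with a decimal point
price: float = float("2.6")
print(price)              # 2.6

# int and str convert back and forth between strings and integers
count: int = int("42")
label: str = str(count + 1)
print(label)              # 43

# bool only checks whether the string is empty or not
print(bool("True"))       # True
print(bool("False"))      # True, because the string is not empty
print(bool(""))           # False
```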

A string is a sequence of characters. That may leave you wondering what exactly a character is.

Unicode and ASCII

Every character on your keyboard has a unique Unicode value. Internally, a computer stores everything as numbers. Unicode defines the mapping between characters and the numbers that the computer stores. For example, an uppercase A has the Unicode value 65 (written 0041 in hexadecimal), while a lowercase a has the value 97 (hex 0061). The more common letters, digits and symbols use smaller-valued numbers; these "ASCII" codes (0 through 127) are a subset of the Unicode values. There are many more Unicode values to explore. You might be surprised by the vast range of things that can be stored as a single Unicode character, including emojis and characters from many alphabets!

To work with a Unicode character, you need to know its code; then you can construct a string encoding it. It turns out there are different ways of doing this encoding, but Python allows us to use \U followed by the Unicode value (represented in hexadecimal), padded out to 8 characters long. Try this out:

print("\U0001F602")

To find the Unicode value for a character, use the built-in Python function ord:

letter:str = "A"

code:int = ord( letter )

print( code )
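As a small sketch of how the decimal and hexadecimal forms of a code relate, the example below uses the built-in format function (not covered in these notes) to write a code as four hexadecimal digits. The variable names are just for illustration.

```python
letter: str = "A"
code: int = ord(letter)               # 65 in decimal
hexCode: str = format(code, "04X")    # the same number as 4 hex digits

print(code)                  # 65
print(hexCode)               # 0041
print("\u0041" == letter)    # True: \u takes exactly 4 hex digits
print(ord("a") < 128)        # True: "a" is one of the ASCII codes
```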

To convert from the code back to the letter, you can use the \U string format or, perhaps more conveniently, the chr function:

code:int = 76

letter:str = chr( code )

print( letter )

What's neat about this is that you can shift a letter up or down in the alphabet by doing a little math on the codes!

letter:str = "L"

code:int = ord( letter )

previousCode:int = code - 1

previousLetter:str = chr(previousCode)

print( previousLetter )
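One thing to watch for: shifting "A" down or "Z" up lands on a character that is not a letter. The sketch below shows one possible way to wrap around the alphabet instead, using the % (remainder) operator. The function name shiftUppercase is made up for this example, and it assumes its input is a single uppercase letter from A to Z.

```python
def shiftUppercase(letter: str, shiftAmount: int) -> str:
    """ Assumes letter is a single uppercase letter A-Z and
        shifts it by shiftAmount, wrapping around the alphabet. """
    position: int = ord(letter) - ord("A")             # 0 for A, 25 for Z
    newPosition: int = (position + shiftAmount) % 26   # stay within 0..25
    return chr(ord("A") + newPosition)


print(shiftUppercase("L", -1))   # K
print(shiftUppercase("Z", 1))    # A
print(shiftUppercase("A", -1))   # Z
```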

There are some pretty fun characters way "up" in the Unicode values!

def shiftCharacter( unicodeChar:str, shiftAmount:int ) -> str:
    """ Shift a character up using its Unicode. """
    return chr(shiftAmount+ord(unicodeChar))


# the input function prints the passed argument to the console,
# then waits for the user to type something and hit return;
# whatever characters they typed are returned as a string
userChar:str = input( "Enter a character: " )

print( "You entered: " + userChar )

# shift the character the user entered (assumes they typed exactly one character)
print( "It shifts up to: " + shiftCharacter( userChar, 128000 ) )

String operators

We already know that we can use the + operator to concatenate strings and the * operator to repeat. Here are a few more built-in operators for strings!

String comparison operators

Relational operators (<, <=, ==, etc.) can be used to compare strings. These operators take two operands and evaluate to a single boolean (True or False). Comparison is done using lexicographical ordering. This is similar to, but not the same as, alphabetical ordering: strings are compared character by character, based on the Unicode values of the characters.

There are a few important implications of this:

  • It allows us to compare strings that include things other than just letters.

  • Since uppercase letters and lowercase letters have different Unicode values, ‘A’ does not equal ‘a’. For the English (ASCII) letters, all uppercase letters have lower values than all lowercase letters. As a result, ‘Z’ < ‘a’. All words that begin with an uppercase English letter are < all words that begin with a lowercase English letter, regardless of what the letters are.

  • The space character, as in "Life is good", is a character. So "Life is good" is not equal to "Lifeisgood". The space character comes before all letters in the Unicode ordering, so "Life is good" < "Lifeisgood". Accented and other non-ASCII letters, like 'é' or 'þ', come after all of the unaccented English letters; characters from other writing systems, as well as emojis, generally have higher values still. For a list of all characters in order, look here: https://unicode-table.com/en/#002
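The implications above can be checked directly. This is just a sketch with example strings of our own choosing. The last line shows one common way to get a comparison that ignores case, by converting both strings with the lower method first.

```python
print("Z" < "a")                       # True: 90 < 97
print("Zebra" < "apple")               # True: uppercase comes first
print("Life is good" < "Lifeisgood")   # True: space (32) < "i" (105)
print("\u00e9" > "z")                  # True: "é" (233) > "z" (122)
print("10" < "9")                      # True: "1" (49) < "9" (57)

# compare without regard to case by lowercasing both sides
print("Zebra".lower() < "apple".lower())   # False: "z" comes after "a"
```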

The in operator

The in operator checks whether a character or substring appears in a string, evaluating to either True or False.

print( "e" in "hello" ) # prints True

print( "E" in "hello" ) # prints False

print( "leg" in "college" ) # prints True
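A few more cases are worth trying. In this sketch (the variable name sentence is just for illustration), notice that a substring must appear as one unbroken run of characters, and that lowercasing first is one way to ignore case.

```python
sentence: str = "Hello, World"

print("world" in sentence)            # False: case matters
print("world" in sentence.lower())    # True: compare in lowercase
print("lo, W" in sentence)            # True: these characters are adjacent
print("Hlo" in sentence)              # False: present, but not adjacent
```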

Basic string functionality

fruit:str = "banana" # assign the variable fruit the string literal "banana"

fruitLength:int = len( fruit ) # use the len function to obtain the number of characters

print( fruitLength ) # print the value of the fruitLength variable

firstChar:str = fruit[0] # access the character at index 0 and assign it to a variable

lastChar:str = fruit[len(fruit)-1] # access last character and assign it to a variable

print( firstChar + lastChar ) # glue the first and last characters together and print the result

shortFruit:str = fruit[2:len(fruit)] # slice the fruit to keep only the letters starting at index 2

print( shortFruit ) # print it out

The len function

There are a number of useful functions that can be applied to strings. To find the length of a string, we use the len function, which returns an int:

nameLength:int = len('Jean Sammet')


or

name:str = 'Jean Sammet'

nameLength:int = len(name)


It’s important to keep in mind that the space is considered a character, and an empty string (a string of length 0) is a string of no characters. We can see this by using the len function to look at a string containing a space and a string of no characters.

Getting a character from a string

If we have a string, we might want to extract a character or characters from a particular position, or index. An index is a position within a sequential collection, like the sequence of characters that make up a string. We will use square brackets [] to "index into" or "access" one or more characters of a string.

Python considers the first character in the string to be at index 0! While this may be surprising, in fact, most programming languages behave this way. When we have a sequence of things, whether it is characters or something else, the first one will be at index 0.

Also, notice how we get the last character out of the string. If we have the string “MHC”, len("MHC") evaluates to 3. Since the first index is 0, "MHC"[0] is ‘M’, "MHC"[1] is ‘H’, and "MHC"[2] is ‘C’. The index for the last character of the string will always be one less than the length of the string.

If we try to get a character using an index that is too large (equal to or greater than the length of the string), we'll get an error:
IndexError: string index out of range

spaceChar:str = " " # the space character

print( spaceChar ) # you can't really see it, but it's there...

lenOfSpaceChar:int = len(spaceChar) # get the number of characters

print( lenOfSpaceChar ) # should print 1

emptyString:str = "" # this is a string literal with no characters

print( len(emptyString) ) # prints 0, since there are no characters

color:str = "blue" # set up for trying out a bad index

badChar:str = color[4] # this will cause an error!
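One way to avoid this error is to check the index against the length first. The following is only a sketch: safeCharAt is a hypothetical helper name, and returning an empty string for an out-of-range index is our own assumption about what a caller might want.

```python
def safeCharAt(text: str, index: int) -> str:
    """ Return the character at index, or the empty string
        if index is not between 0 and len(text) - 1. """
    if 0 <= index < len(text):
        return text[index]
    return ""


color: str = "blue"
print(safeCharAt(color, 3))          # e
print(safeCharAt(color, 4) == "")    # True: index 4 is out of range
```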

String slices

We can slice a string to extract a substring that may contain more than 1 character using square brackets with a colon specifying a range, as in
string_object[ start : end ]

  • Python will include the character at the start index and go up to but not include the character at the end index.

fruit:str = "banana"

slicedFruit:str = fruit[2:6] # evaluates to "nana"

print( slicedFruit )

  • If we leave out the start, it defaults to 0.

fruit:str = "banana"

slicedFruit:str = fruit[:2] # evaluates to "ba"

print( slicedFruit )

  • If we leave out the end, it defaults to the string object's length.

fruit:str = "banana"

slicedFruit:str = fruit[2:] # evaluates to "nana"

print( slicedFruit )
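Here is a short sketch that puts these rules together. The variable name middle is just for illustration. Notice that the number of characters in a slice is end minus start, and that, unlike indexing a single character, a slice whose end goes past the end of the string does not cause an error.

```python
fruit: str = "banana"

middle: str = fruit[1:4]
print(middle)                            # ana
print(len(middle) == 4 - 1)              # True: length is end - start

# splitting at an index and gluing the pieces back gives the original
print(fruit[:2] + fruit[2:] == fruit)    # True

# an end past the length just stops at the end of the string
print(fruit[2:100])                      # nana
```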

Formatted strings

Starting with Python 3.6, we can use formatted strings (often called f-strings). These are convenient for embedding expressions in a string instead of converting and concatenating.

The syntax for a formatted string is

f"...{expression to be evaluated and embedded}..."

x:int = 3

whatIsX:str = f"The value of x is {x}."

print( whatIsX )

print( f"The value of x*x is {x*x}.")
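To see what a formatted string saves us, this sketch builds the same string both ways. The variable names are just for illustration.

```python
x: int = 3

# converting and concatenating by hand
concatenated: str = "The value of x is " + str(x) + "."

# the same result with a formatted string
formatted: str = f"The value of x is {x}."

print(concatenated == formatted)    # True

# any expression can go inside the braces, including function calls
name: str = "Jean Sammet"
print(f"{name} has {len(name)} characters.")   # Jean Sammet has 11 characters.
```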

Filters and Maps

Since strings are sequences of characters, we often want to apply some action to every character in a string, one after another. Python provides some convenient functions for this. We won't dig into what is going on under the hood here, but they'll be helpful for letting us manipulate strings.

Filtering a string

The filter function takes in a sieve function along with a string and keeps only the characters in the string that pass the sieve function (that is, the characters for which the sieve function returns True).

    • the filtering function must take in a string and return a boolean

    • we can use this pattern to convert what filter returns into a string

"".join( filter( _sieveFunctionName_, _stringToFilter_) )

def isVowel( letter:str ) -> bool:
    """ Assumes letter is a single character and returns
        True if it is a vowel """
    return letter.lower() in "aeiou"


color:str = "green" # create a string variable


# the following will filter to keep only the characters in the
# string color for which isVowel returns True
# filter returns a special object which we won't talk about now;
# for now, we just take it as a pattern to use
# "".join( filter( ____, ____) )
# to get back a string with the filter applied

colorVowels:str = "".join(filter( isVowel, color ))

print(colorVowels)
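The same pattern works with any sieve function that takes a character and returns a boolean. As one more sketch, here is a made-up sieve, isNotSpace, that keeps everything except spaces.

```python
def isNotSpace(character: str) -> bool:
    """ Assumes character is a single character and returns
        True unless it is the space character. """
    return character != " "


phrase: str = "Life is good"

# keep only the characters for which isNotSpace returns True
squeezed: str = "".join(filter(isNotSpace, phrase))
print(squeezed)    # Lifeisgood
```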

Mapping a function over a string

The map function takes in an apply function along with a string and applies that function to each character.

    • the apply function must take in a string and return a string

    • we can use this pattern to convert what map returns into a string

"".join( map( _applyFunctionName_, _stringToMapOver_) )

def doubleLetter( letter:str ) -> str:
    """ Assumes letter is a single character and returns
        2 copies of it. """
    return letter*2


color:str = "green" # create a string variable


# the following will map a function over a string, applying
# it to each character
# map returns a special object which we won't talk about now;
# for now, we just take it as a pattern to use
# "".join( map( ____, ____) )
# to get back a string with the function applied to each character

doubledColor:str = "".join(map( doubleLetter, color ))

print(doubledColor)
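Mapping combines nicely with the ord and chr functions from earlier. In this sketch, nextLetter is a made-up apply function that shifts a single character up by one Unicode value, so mapping it over a string shifts every character.

```python
def nextLetter(letter: str) -> str:
    """ Assumes letter is a single character and returns the
        character with the next Unicode value. """
    return chr(ord(letter) + 1)


word: str = "HAL"

# apply nextLetter to each character, then join the results
shiftedWord: str = "".join(map(nextLetter, word))
print(shiftedWord)    # IBM
```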