Strings are covered in Lesson 7 / Chapter 6. The following is complementary to the information there, not a replacement or summary!
Strings are one form of collection- because a single string can be a collection of many characters. A fellow pythonista has put together a cheat sheet which is worth printing out for reference, or just making some notes from.
Strings are an IMMUTABLE Data Structure.
Any letter, symbol, space or return is a character (often referred to as a char).
On a computer, a character is a number under the hood. By that, we mean that each character has a number associated with it. This mapping is called ASCII, and was developed in the 1960s to standardise character encoding across the electronics industry. Up to then, individual manufacturers would encode data differently, making sharing files (even of text) very difficult across different platforms.
ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.
The ASCII table shows the characters and their underlying numbers in both decimal and hex.
We can find the number value, or the ordinal value, of a character using a built-in Python function:
>>> ord('a')
97
This will only work with a single character, as each char has its own underlying value.
We can do the reverse using another builtin function in Python to take an int, and give us the corresponding char.
>>> chr(97)
'a'
This is why a string can be considered a collection of one, many or no chars. Because it is a collection, we can index into it and iterate through it with a loop.
You might think looking at the numbers representing a letter is cool, but even cooler is when we look at the binary code for characters and see what the difference between uppercase and lowercase really is.
You could even check the binary representation of the chars for "0... 9" and you would find a relationship between the binary representation and the number it represents. Just an interesting bit of "by the way".
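As a small sketch of the two observations above, the following prints the bit patterns for a couple of letters and checks the digit chars. The variable names (case_gap, digit_values) are only illustrative.

```python
# Compare the 8-bit patterns of uppercase and lowercase letters.
for upper, lower in [('A', 'a'), ('Z', 'z')]:
    print(upper, format(ord(upper), '08b'), lower, format(ord(lower), '08b'))

# The two cases differ by 32 (binary 00100000), i.e. by a single bit.
case_gap = ord('a') - ord('A')
print(case_gap)

# For the digit chars, the lowest four bits hold the value of the digit.
digit_values = []
for ch in '0123456789':
    print(ch, format(ord(ch), '08b'))
    digit_values.append(ord(ch) & 0b1111)
print(digit_values)
```

So 'A' is 01000001 and 'a' is 01100001: only one bit changes between the cases.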
Each char in a string with a len of >= 1 has an index that points to that particular char.
For the string s shown, we can index forward into it or use the backward direction index:
>>> s[3]
'h'
>>> s[-5]
'y'
If we go beyond the valid indices in either direction, we get an error:
IndexError: string index out of range
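Here is a minimal sketch of forward and backward indexing. It assumes s = 'Python', which is only a guess that happens to match the outputs shown above; any string would behave the same way.

```python
s = 'Python'  # assumed value, chosen to match the outputs shown above

forward = s[3]     # counting from the front, starting at 0
backward = s[-5]   # counting from the back, starting at -1
print(forward, backward)

# Valid indices run from -len(s) up to len(s) - 1.
first_valid, last_valid = -len(s), len(s) - 1
print(first_valid, last_valid)

# One step past the end raises IndexError.
try:
    s[len(s)]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```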
We know we can use the + and * operators with strings, and we know- deep in our minds- that when we use these operators with strings, they are different from the same-looking ones we use with numbers. They are indeed different.
Under the hood, the plus and multiply operators are implemented for strings because the str class has methods that tell Python what to do when one of these operators is used with a string. The str class has a special __add__ and a special __mul__ method that allow it to use the + and *. We'll learn more about these dunder methods (double underscore methods) later in OOP.
We may know from experience that the - (minus) and the / (divide) operators will not work.
>>> "hello" - 'o' # does not result in "hell"
Traceback (most recent call last):
  File "<pyshell>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'str'
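To make the link between the operators and the dunder methods concrete, here is a short sketch that calls the methods directly. You would not normally write code this way; it is only to show what the operators map to.

```python
# The + operator and __add__ give the same result.
joined = 'ab' + 'cd'
joined_dunder = 'ab'.__add__('cd')
print(joined, joined_dunder)

# The * operator and __mul__ give the same result.
repeated = 'ab' * 3
repeated_dunder = 'ab'.__mul__(3)
print(repeated, repeated_dunder)

# The str class defines no __sub__ or __truediv__, which is why - and / fail.
has_sub = hasattr(str, '__sub__')
has_div = hasattr(str, '__truediv__')
print(has_sub, has_div)
```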
What about the comparison operators, such as > and <? So very glad you asked! Look at the following code and then look at the ASCII table to discover why the code is doing what it is doing:
>>> 'a' > 'Z'
True
>>> 'a' > 'z'
False
These operators are also implemented under the hood for strings, with their own defined dunder methods to implement them.
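A quick sketch that ties the comparisons above back to the ordinal values. The names used here are only for illustration.

```python
# Look up the ordinal value of each char used in the comparisons.
codes = {}
for ch in 'Zaz':
    codes[ch] = ord(ch)
print(codes)

# Comparing single chars gives the same answer as comparing their ordinals.
a_vs_upper_z = 'a' > 'Z'
a_vs_lower_z = 'a' > 'z'
print(a_vs_upper_z, ord('a') > ord('Z'))
print(a_vs_lower_z, ord('a') > ord('z'))

# The > operator maps to the __gt__ dunder method.
dunder_result = 'a'.__gt__('Z')
print(dunder_result)
```

All the lowercase letters come after all the uppercase letters in the table, which is why 'a' is "greater than" 'Z'.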
So, one of the most commonly used functions with strings is the len function.
>>> len('hello')
5
But what exactly is a method then? All methods are functions, but not all functions are methods. Confused?
With a function, we pass an argument within the parens:
>>> max("hello") # single str
'o'
>>> max(1, 2, 3, 4) # many ints
4
>>> max('a', 'D', 'z', 'S') # many strs
'z'
So the max function can accept a wide variety of args... but with a method we use dot notation.
>>> 'hello'.upper() # returns a copy- does not change the original
'HELLO'
>>> str.upper('hello')
'HELLO'
You can see from this example that calling the .upper() method on the string effectively passes the string to str.upper as an argument. Strings can be changed to upper, lower, title and so on. If we tried to use the .upper method on an int, float or bool, we would get an error. This is because methods belong to the class. The str class has these methods associated with it, but the int, float, bool and other classes do not have these methods.
This means that, while methods are functions, they are functions specific to a class.
To distinguish between a method and a function, methods will use dot notation which implicitly passes the string, while functions pass args explicitly.
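The difference can be sketched as follows: the method call and the class-level call give the same result, while an int simply has no upper method to call.

```python
text = 'hello'

via_method = text.upper()      # dot notation passes text implicitly
via_class = str.upper(text)    # the same function, with text passed explicitly
print(via_method, via_class)
print(text)                    # the original string is unchanged

# The int class has no upper method, so this raises AttributeError.
try:
    (42).upper()
    int_error = None
except AttributeError as exc:
    int_error = str(exc)
print(int_error)
```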
These are some of the commonly used string methods.
Python's help function will let you look up any of these, as long as you tell it where the method lives, using dot notation.
>>> help(str.upper)
Help on method_descriptor:
upper(self, /)
    Return a copy of the string converted to uppercase.
So, this tells us that the upper method lives inside the str class.
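isnumeric() is a useful method that checks whether every character in a string is a numeric character (and that the string is not empty). This can be used to check whole-number input such as '42' before casting it. Be careful, though: it returns False for strings like '3.14' and '-5', which float() accepts, and True for characters like '½', which float() rejects. It may allow you to avoid a try/except for simple whole numbers, but for floats a try/except is still the more reliable check.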
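To see where isnumeric() and float() agree and disagree, here is a small sketch run over a few sample strings. The sample values and the results dictionary are just for illustration.

```python
samples = ['42', '3.14', '-5', '½', '']

results = {}
for text in samples:
    # Record whether float() accepts the string.
    try:
        float(text)
        float_ok = True
    except ValueError:
        float_ok = False
    # Store (isnumeric result, float result) for each sample.
    results[text] = (text.isnumeric(), float_ok)
    print(repr(text), text.isnumeric(), float_ok)
```

Only '42' passes both checks. The decimal point and the minus sign are not numeric characters, and '½' is numeric but cannot be cast.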
Strings are immutable. If we want to change one character inside a word, we cannot mutate that char.
>>> word = 'hello'
>>> word[1] = 'a'
Traceback (most recent call last):
  File "<pyshell>", line 1, in <module>
TypeError: 'str' object does not support item assignment
If we want to effectively do that job, we need to splice together a new string using copied parts of the existing string. We can then assign that new string to be stored in the original variable, and "throw away" the original string.
>>> new_word = word[0] + 'a' + word[2:]
>>> word = new_word
>>> word
'hallo'
Or, more succinctly, we could do all this in one line:
>>> word = word[0] + 'a' + word[2:]
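The same splicing idea can be wrapped in a small helper so it works for any position. The function name replace_at is hypothetical, and the sketch assumes a non-negative index that is within the string.

```python
def replace_at(text, index, new_char):
    """Return a new string with the char at index swapped for new_char.

    Assumes 0 <= index < len(text); negative indices are not handled.
    """
    # Everything before index + the new char + everything after index.
    return text[:index] + new_char + text[index + 1:]


original = 'hello'
word = replace_at(original, 1, 'a')
print(word)      # the newly built string
print(original)  # the original string is untouched
```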
Strings are iterable, meaning we can use a for loop to iterate, or step through a string. We could also use a while loop, but for loops are a convenient construct created for iterating through finite sequences.
word = 'hello'
for letter in word:
    print(ord(letter), end='|')
Output is:
104|101|108|108|111|
word = 'HELLO'
index = 0
while index < len(word):
    print(ord(word[index]), end='|')
    index += 1
Output is
72|69|76|76|79|
You can see that the for loop is simply an easy way to traverse an iterable object, allowing us to use a variable name to describe each object as we traverse the string. There are instances, however, where working with an index (as the while loop does) is more convenient, such as when comparing adjacent chars in a string.
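As an example of comparing adjacent chars, this sketch uses a while loop to find doubled letters. Note that the loop stops one short of the end, so that index + 1 is always a valid index. The list name doubles is only illustrative.

```python
word = 'hello'
index = 0
doubles = []

# Stop at len(word) - 1 so that word[index + 1] never goes out of range.
while index < len(word) - 1:
    if word[index] == word[index + 1]:
        doubles.append(word[index] + word[index + 1])
    index += 1

print(doubles)
```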
The range function is a very useful builtin that is often used when processing strings and other iterables.
The range function can take 1, 2 or 3 args. If we pass it one arg, it generates the numbers from 0 up to but not including that arg (we wrap it in list() here so that we can see them):
>>> print(list(range(4)))
[0, 1, 2, 3]
If we pass it two args, it generates the numbers from arg1 up to but not including arg2:
>>> print(list(range(3, 14)))
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
If we pass it three args, it generates the numbers from arg1 up to but not including arg2, in increments of arg3:
>>> print(list(range(3, 14, 3)))
[3, 6, 9, 12]
So the general syntax is:
range(start_num, stop_num, increment)
To use the range function to help us iterate through a string, we would need to get the length of the str and then iterate using the indices of the chars in the string.
>>> s = 'hello'
>>> limit = len(s)
>>> for idx in range(limit):
...     print(s[idx])
...
h
e
l
l
o
where idx represents the index of the char. Remember that range(limit) in this case gives us something like:
[0, 1, 2, 3, 4]
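The start, stop and increment args can also be combined with string indexing. The sketch below picks out every second char and then walks the string backwards using a negative increment. The list names are only illustrative.

```python
s = 'hello'

# Indices 0, 2, 4: every second char.
every_second = []
for idx in range(0, len(s), 2):
    every_second.append(s[idx])
print(every_second)

# Indices 4, 3, 2, 1, 0: start at the last index, stop before -1, step by -1.
backwards = []
for idx in range(len(s) - 1, -1, -1):
    backwards.append(s[idx])
print(backwards)
```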
This may seem trivial, but is in fact very important to understand: when a str is passed to a function, the function cannot change the caller's string. Python hands the function a reference to the same object rather than a copy, but because strings are immutable, any "change" made inside the function builds a new string and leaves the original untouched. This is important because mutable data types do allow a function to mutate the object passed.
str: immutable, so it behaves as if passed by value (a copy)
By contrast:
list: mutable, so a function can change the actual object it was passed
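A minimal sketch of the difference in behaviour follows. The function names (shout, add_item, is_same_object) are hypothetical and exist only for this illustration.

```python
def shout(text):
    # Builds a new string and rebinds the local name only.
    text = text.upper()
    return text


def add_item(items):
    # Mutates the list object that the caller passed in.
    items.append('new')


def is_same_object(obj, expected_id):
    # True if the function received the very same object, not a copy.
    return id(obj) == expected_id


greeting = 'hello'
shouted = shout(greeting)
print(greeting, shouted)   # the caller's string is unchanged

basket = ['old']
add_item(basket)
print(basket)              # the caller's list has been changed

# Both calls receive the same object; only the list can be mutated.
print(is_same_object(greeting, id(greeting)))
print(is_same_object(basket, id(basket)))
```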