5.1. Reading from a File
A lot of data is available in text files. Reading from a file is particularly useful in data analysis applications, but it's also applicable to any situation in which you want to analyze or modify information stored in a file. When you want to work with the information in a text file, the first step is to read the file into memory. You can read the entire contents of a file, or you can work through the file one line at a time.
Reading an Entire File
To begin, we need a file with a few lines of text in it. Let's start with a file that contains pi to 30 decimal places, with 10 decimal places per line:
3.1415926535
8979323846
2643383279
Here's a program that opens this file, reads it, and prints the contents of the file to the screen.
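Here is a minimal sketch of such a program. It assumes the file is saved as pi_digits.txt in the same folder as the script. That file name is our choice; the text does not specify one. The first lines create the file if it is missing, so the sketch runs on its own.

```python
from pathlib import Path

# Assumption: the sample file is named pi_digits.txt and sits next to this script.
filename = 'pi_digits.txt'

# Setup only: create the sample file if it doesn't exist yet.
if not Path(filename).exists():
    Path(filename).write_text('3.1415926535\n8979323846\n2643383279\n')

# open() returns a file object; the with block closes it automatically.
with open(filename) as file_object:
    contents = file_object.read()  # the entire file as a single string

print(contents)
```

Because the file is opened with `with`, you never call close() yourself. Python closes the file when the block ends, even if an error occurs inside it.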
The only difference between this output and the original file is the extra blank line at the end of the output. The blank line appears because read() returns the file's entire contents, including the newline character at the end of the last line, and print() then adds a newline of its own. To remove the extra blank line, use print(contents.rstrip()), which strips trailing whitespace (including that final newline) from the string before printing it. Try it out.
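To see where the extra blank line comes from, the sketch below inspects the string that read() returns. It assumes the same pi_digits.txt file as before. It prints repr() of the last character read and shows what a second read() returns once the end of the file has been reached.

```python
from pathlib import Path

# Assumption: same sample file as before, named pi_digits.txt.
filename = 'pi_digits.txt'
if not Path(filename).exists():
    Path(filename).write_text('3.1415926535\n8979323846\n2643383279\n')

with open(filename) as file_object:
    contents = file_object.read()
    leftover = file_object.read()  # a second read() at end of file returns ''

print(repr(contents[-1]))  # '\n': the string ends with the file's final newline
print(repr(leftover))      # ''
print(contents.rstrip())   # trailing newline removed, so no blank line follows
```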
Reading Line by Line
When you're reading a file, you'll often want to examine each line of the file. You can use a for loop on the file object to examine each line from a file one at a time.
When we print each line, we find even more blank lines. These blank lines appear because an invisible newline character is at the end of each line, and print() adds its own newline as well. Calling rstrip() on each line inside the print() call eliminates these extra blank lines: print(line.rstrip()). Try it out.
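Here is a minimal sketch of the line-by-line loop, assuming the same pi_digits.txt file. It applies rstrip() inside print(), so each line appears once with no blank line between lines.

```python
from pathlib import Path

filename = 'pi_digits.txt'  # assumed file name
if not Path(filename).exists():
    Path(filename).write_text('3.1415926535\n8979323846\n2643383279\n')

with open(filename) as file_object:
    # Iterating over a file object yields one line at a time, newline included.
    for line in file_object:
        print(line.rstrip())  # rstrip() drops the trailing '\n' before printing
```

The loop handles one line at a time rather than reading the whole file into a single string first. For that reason, this pattern also suits files that are too large to read comfortably with read().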
Making a List of Lines from a File
When you use with, the file object returned by open() is only available inside the with block that contains it. If you want to retain access to a file's contents outside the with block, you can store the file's lines in a list inside the block and then work with that list.
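Here is a sketch of this pattern, again assuming pi_digits.txt. The readlines() method returns a list with one string per line, and that list is still usable after the with block has closed the file.

```python
from pathlib import Path

filename = 'pi_digits.txt'  # assumed file name
if not Path(filename).exists():
    Path(filename).write_text('3.1415926535\n8979323846\n2643383279\n')

with open(filename) as file_object:
    lines = file_object.readlines()  # list of strings, each ending in '\n'

# The file is closed here, but the list of lines is still available.
for line in lines:
    print(line.rstrip())
```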
Working with a File's Contents
After you've read a file into memory, you can do whatever you want with that data. Let's start by building a single string containing all the digits in the file, with no whitespace in it.
We first create an empty string, pi_string, to hold the digits. We then loop over the file's lines, removing the newline character from each line and adding its digits to pi_string. Finally, we print this string and also show how long it is.
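Here is a minimal sketch of this program, under the same pi_digits.txt assumption. It uses strip() rather than rstrip(), so any leading spaces on a line would be removed as well as the newline.

```python
from pathlib import Path

filename = 'pi_digits.txt'  # assumed file name
if not Path(filename).exists():
    Path(filename).write_text('3.1415926535\n8979323846\n2643383279\n')

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''  # will hold every digit, with no whitespace
for line in lines:
    pi_string += line.strip()  # strip() removes the newline and any surrounding spaces

print(pi_string)       # 3.141592653589793238462643383279
print(len(pi_string))  # 32
```

The length is 32 because the string holds the leading 3, the decimal point, and 30 decimal places.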
Large Files: One Million Digits
So far we've focused on analyzing a text file that contains only three lines, but the code in these examples would work just as well on much larger files. If we start with a text file that contains pi to 1,000,000 decimal places instead of just 30, we can create a single string containing all these digits.
We'll also print just the first 50 decimal places, so we don't have to watch a million digits scroll by.
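The same loop, sketched for the larger file. The file name pi_million_digits.txt is hypothetical. If no such file exists, the sketch falls back to the 30-digit sample lines so that it still runs. In that case its output reflects only the sample, not a million digits.

```python
from pathlib import Path

# Hypothetical file name for a file holding pi to 1,000,000 decimal places.
path = Path('pi_million_digits.txt')

if path.exists():
    with open(path) as file_object:
        lines = file_object.readlines()
else:
    # Fallback sample so the sketch runs: pi to 30 decimal places, 10 per line.
    lines = ['3.1415926535\n', '8979323846\n', '2643383279\n']

pi_string = ''
for line in lines:
    pi_string += line.strip()

# Show "3." plus only the first 50 decimal places.
print(f"{pi_string[:52]}...")
print(len(pi_string))
```

With a real file holding pi to 1,000,000 decimal places, len(pi_string) should come out as 1000002: the leading 3, the decimal point, and the million digits after it.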
Is Your Birthday Contained in Pi?
Are you curious to know if your birthday appears anywhere in the digits of pi? Let's use the program to find out if your birthday appears anywhere in the first million digits of pi. We can do this by expressing each birthday as a string of digits and seeing if that string appears anywhere in pi_string:
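Here is a minimal sketch of the check. It assumes birthdays are written as six digits in mmddyy form, which is our choice of format. It wraps the test in a hypothetical helper, birthday_in_pi(), while the `in` operator does the actual substring search. To keep it self-contained, it uses the 30-digit sample string in place of the million-digit pi_string.

```python
def birthday_in_pi(birthday, pi_string):
    """Hypothetical helper: True if the digit string `birthday` occurs in pi_string."""
    return birthday in pi_string  # substring test

# Sample value: pi to 30 decimal places. In practice, use the million-digit pi_string.
pi_string = '3.141592653589793238462643383279'

birthday = '120372'  # hypothetical birthday in mmddyy form
if birthday_in_pi(birthday, pi_string):
    print("Your birthday appears in these digits of pi!")
else:
    print("Your birthday does not appear in these digits of pi.")
```

To ask the user instead of hard-coding a date, you could set birthday = input("Enter your birthday, in the form mmddyy: ").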
Exercise 5.1
Learning Python
Make a blank file in your text editor and write a few lines summarizing what you've learned about Python so far. Start each line with the phrase "In Python you can...".
Save the file as learning_python.txt. Write a program that reads the file and prints what you wrote three times. Print the contents once by reading in the entire file, once by looping over the file object, and once by storing the lines in a list and then working with them outside the with block.
Learning C
You can use the replace() method to replace any word in a string with a different word. Here's a quick example showing how to replace 'dog' with 'cat' in a sentence:
>>> message = "I really like dogs."
>>> message.replace('dog', 'cat')
'I really like cats.'
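One detail the example above leaves implicit is that strings are immutable. As a result, replace() returns a new string and leaves the original unchanged. An optional third argument limits how many occurrences are replaced. Here is a short sketch:

```python
message = "I really like dogs."
new_message = message.replace('dog', 'cat')  # returns a new string

print(message)      # I really like dogs.
print(new_message)  # I really like cats.

# The optional count argument replaces only the first n occurrences.
print("dog and dog".replace('dog', 'cat', 1))  # cat and dog
```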
Read in each line from the file you just created, learning_python.txt, and replace the word Python with the name of another language, such as C. Print each modified line to the screen.