5.1. Reading from a File

A lot of data is available in text files. Reading from a file is particularly useful in data analysis applications, but it's also applicable to any situation in which you want to analyze or modify information stored in a file. When you want to work with the information in a text file, the first step is to read the file into memory. You can read the entire contents of a file, or you can work through the file one line at a time.

Reading an Entire File

To begin, we need a file with a few lines of text in it. Let's start with a file that contains pi to 30 decimal places, with 10 decimal places per line:

Here's a program that opens this file, reads it, and prints the contents of the file to the screen.
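
A minimal sketch of such a program follows. The file name pi_digits.txt, the two-space indent on the second and third lines, and the temporary folder are assumptions made so the example runs on its own; they are not taken from the original listing.

```python
import tempfile
from pathlib import Path

# Assumption: the sample file is called pi_digits.txt (hypothetical name).
# It is created in a temporary folder so the sketch runs anywhere.
filename = Path(tempfile.mkdtemp()) / "pi_digits.txt"
filename.write_text("3.1415926535\n  8979323846\n  2643383279\n")

# open() returns a file object; read() returns the whole file as one string.
with open(filename) as file_object:
    contents = file_object.read()

print(contents)
```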

> The open() function needs one argument: the name of the file you want to open. Python assigns the object that open() returns to file_object, which we'll work with later in the program. We then use the read() method to read the entire contents of the file and store it as one long string in contents. When we print the value of contents, we get the entire text file back.
> The only difference between this output and the original file is the extra blank line at the end of the output. The blank line appears because the last line of the file ends with a newline character, which read() returns as part of the string, and print() then adds a newline of its own. If you want to remove the extra blank line, use print(contents.rstrip()). Try it out.
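
To try the rstrip() suggestion without a file on disk, the sketch below starts from the string that read() would return for the three-line sample. The exact contents, including the two-space indent, are an assumption.

```python
# Assumption: this is what read() returns for the three-line sample file.
contents = "3.1415926535\n  8979323846\n  2643383279\n"

# rstrip() removes trailing whitespace, including the final newline,
# so print() no longer produces a blank line after the digits.
stripped = contents.rstrip()
print(stripped)
```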

Reading Line by Line

When you're reading a file, you'll often want to examine each line of the file. You can use a for loop on the file object to examine each line from a file one at a time.
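
The loop can be sketched as follows, again assuming a sample file named pi_digits.txt that is written to a temporary folder purely so the example is self-contained.

```python
import tempfile
from pathlib import Path

# Assumption: hypothetical sample file, created here so the sketch can run.
filename = Path(tempfile.mkdtemp()) / "pi_digits.txt"
filename.write_text("3.1415926535\n  8979323846\n  2643383279\n")

with open(filename) as file_object:
    # Iterating over a file object yields one line at a time,
    # each still ending with its newline character.
    for line in file_object:
        print(line)
```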

> At ❶, we assign the name of the file we're reading from to the variable filename. This is a common convention when working with files. At ❷, after we call open(), an object representing the file and its contents is assigned to the variable file_object. We again use the with syntax to let Python open and close the file properly.
> To examine the file's contents, we work through each line in the file by looping over the file object at ❸.

When we print each line, we find even more blank lines. These blank lines appear because an invisible newline character is at the end of each line, and print() adds its own newline each time we call it. Using rstrip() on each line in the print() call eliminates these extra blank lines: print(line.rstrip()). Try it out.
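
The rstrip() variation of the loop might look like the sketch below. The file name and its contents are the same assumptions as before, and the cleaned list is a hypothetical addition that keeps the stripped lines around for inspection.

```python
import tempfile
from pathlib import Path

# Assumption: hypothetical sample file, created here so the sketch can run.
filename = Path(tempfile.mkdtemp()) / "pi_digits.txt"
filename.write_text("3.1415926535\n  8979323846\n  2643383279\n")

cleaned = []  # hypothetical helper list, not part of the original program
with open(filename) as file_object:
    for line in file_object:
        # rstrip() drops the trailing newline, so print() adds only one.
        print(line.rstrip())
        cleaned.append(line.rstrip())
```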

Making a List of Lines from a File

When you use with, the file object returned by open() is only available inside the with block that contains it. If you want to retain access to a file's contents outside the with block, you can store the file's lines in a list inside the block and then work with that list.
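
A short sketch of this approach, under the same assumptions about the sample file (hypothetical name pi_digits.txt, written to a temporary folder):

```python
import tempfile
from pathlib import Path

# Assumption: hypothetical sample file, created here so the sketch can run.
filename = Path(tempfile.mkdtemp()) / "pi_digits.txt"
filename.write_text("3.1415926535\n  8979323846\n  2643383279\n")

with open(filename) as file_object:
    # readlines() returns a list with one string per line.
    lines = file_object.readlines()

# The file is closed here, but the list is still available.
for line in lines:
    print(line.rstrip())
```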

> At ❶, the readlines() method takes each line from the file and stores it in a list. This list is then assigned to lines, which we can continue to work with after the with block ends.
> At ❷, we use a simple for loop to print each line from lines. Because each item in lines corresponds to a line in the file, the output matches the contents of the file exactly.

Working with a File's Contents

After you've read a file into memory, you can do whatever you want with that data. Let's start by building a single string containing all the digits in the file with no whitespace in it.
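
A first attempt could be sketched like this. It assumes the same hypothetical three-line file, with a two-space indent on the second and third lines, and uses rstrip() so only the trailing newline is removed.

```python
import tempfile
from pathlib import Path

# Assumption: hypothetical sample file, created here so the sketch can run.
filename = Path(tempfile.mkdtemp()) / "pi_digits.txt"
filename.write_text("3.1415926535\n  8979323846\n  2643383279\n")

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    # rstrip() removes the newline but keeps any leading spaces.
    pi_string += line.rstrip()

print(pi_string)
print(len(pi_string))
```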

> At ❶, we create a variable, pi_string, to hold the digits of pi.
> We then create a loop at ❷ that adds each line of digits to pi_string and removes the newline character from each line.
> At ❸, we print this string and also show how long the string is.
> The variable pi_string still contains the whitespace that was on the left side of the digits in each line, but we can get rid of that by using strip() instead of rstrip().
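
The effect of switching to strip() can be checked without touching the file. The list below stands in for what readlines() would return for the sample, which is an assumption about the file's exact layout.

```python
# Assumption: the list readlines() would return for the sample file.
lines = ["3.1415926535\n", "  8979323846\n", "  2643383279\n"]

pi_string = ''
for line in lines:
    # strip() removes whitespace on both sides: the leading spaces
    # and the trailing newline.
    pi_string += line.strip()

print(pi_string)
print(len(pi_string))
```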

Large Files: One Million Digits

So far we've focused on analyzing a text file that contains only three lines, but the code in these examples would work just as well on much larger files. If we start with a text file that contains pi to 1,000,000 decimal places instead of just 30, we can create a single string containing all these digits.

We'll also print just the first 50 decimal places, so we don't have to watch a million digits scroll by.
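
The sketch below shows the slicing step. The million-digit file is not reproduced here, so the code writes a short stand-in that holds only the first 50 decimal places; the name pi_million_digits.txt is hypothetical. With a real million-digit file the same code would apply unchanged, and only the reported length would differ.

```python
import tempfile
from pathlib import Path

# Assumption: stand-in for a million-digit file (hypothetical name).
# It contains only the first 50 decimal places of pi.
filename = Path(tempfile.mkdtemp()) / "pi_million_digits.txt"
filename.write_text(
    "3.1415926535\n  8979323846\n  2643383279\n"
    "  5028841971\n  6939937510\n"
)

with open(filename) as file_object:
    lines = file_object.readlines()

pi_string = ''
for line in lines:
    pi_string += line.strip()

# "3." takes two characters, so 52 characters cover 50 decimal places.
print(f"{pi_string[:52]}...")
print(len(pi_string))
```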

> Python has no inherent limit to how much data you can work with; you can work with as much data as your system's memory can handle.

Is Your Birthday Contained in Pi?

Are you curious whether your birthday appears anywhere in the digits of pi? Let's extend the program to find out whether it appears in the first million digits. We can do this by expressing the birthday as a string of digits and checking whether that string appears anywhere in pi_string:
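
The check itself is a single membership test with the in operator. The sketch below uses only the first 30 decimal places and a fixed, hypothetical birthday so it can run without user input; in an interactive program the birthday would come from input() instead.

```python
# Assumption: pi_string holds only the first 30 decimal places here,
# not the full million digits.
pi_string = "3.141592653589793238462643383279"

# Hypothetical birthday in mmddyy form. Interactively this would be:
#   birthday = input("Enter your birthday, in the form mmddyy: ")
birthday = "120372"

# The in operator checks whether one string occurs inside another.
found = birthday in pi_string

if found:
    print("Your birthday appears in these digits of pi!")
else:
    print("Your birthday does not appear in these digits of pi.")
```

A birthday that is missing from this short sample may still turn up once pi_string holds all one million digits.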


> At ❶, we prompt for the user's birthday, and then at ❷ we check if that string is in pi_string. Once you've read from a file, you can analyze its contents in any way you want.

Exercise 5.1

  1. Learning Python
    Make a blank file in your text editor and write a few lines summarizing what you've learned about Python so far. Start each line with the phrase In Python you can... Save the file as learning_python.txt. Write a program that reads the file and prints what you wrote three times. Print the contents once by reading in the entire file, once by looping over the file object, and once by storing the lines in a list and then working with them outside the with block.

  2. Learning C
    You can use the replace() method to replace any word in a string with a different word. Here's a quick example showing how to replace 'dog' with 'cat' in a sentence:

    >>> message = "I really like dogs."
    >>> message.replace('dog', 'cat')
    'I really like cats.'

    Read in each line from the file you just created, learning_python.txt, and replace the word Python with the name of another language, such as C. Print each modified line to the screen.