Intro
We have seen how we can write a "for" loop to work with different types of objects.
>>> for x1 in range( 3 ) :
...     print ( x1 )
...
0
1
2
>>> for x1 in "Test" :
...     print( x1 )
...
T
e
s
t
>>> list1 = [ 2 , 3 , 4 ]
>>> for x1 in list1 :
...     print( x1 )
...
2
3
4
We know that "range" objects, strings and lists are sequence objects. The "for" loop can work on any sequence object.
However, the "for" loop is not limited to sequences: it works with any iterable object. An iterable object is something that can produce either a physical sequence or a virtual sequence. An example of a virtual sequence is the lines in a file. An iterable object can also be an instance of a custom class.
Ex:
class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        # The Counter object acts as its own iterator
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

for c1 in Counter(3, 8):
    print( c1 )
Output:
[amittal@hills chapter14]$ python3 iter1.py
3
4
5
6
7
8
Besides "__init__", the class above defines two methods: "__iter__" and "__next__". The "__iter__" method returns an iterator object. Once we have the iterator object, "__next__" is called on it repeatedly to obtain values until there are none left, at which point the method raises an exception of type "StopIteration". In our example "__iter__" returns the "Counter" object itself, but it is quite possible for a different object to be returned. This is how the iteration protocol works.
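As a rough illustration of "__iter__" returning a different object, the sketch below splits the work across two classes. The names "CounterRange" and "CounterIterator" are hypothetical and do not appear in the original example. Because each call to "__iter__" hands out a fresh iterator, the same object can be looped over more than once.

```python
class CounterRange:
    """Hypothetical iterable: stores the bounds, not the position."""
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __iter__(self):
        # Return a new, separate iterator object on every call.
        return CounterIterator(self.low, self.high)


class CounterIterator:
    """Hypothetical iterator: tracks the current position."""
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        # An iterator conventionally returns itself here.
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


numbers = CounterRange(3, 5)
first_pass = list(numbers)
second_pass = list(numbers)   # works again: a fresh iterator is created
print(first_pass)
print(second_pass)
```

With the single-class "Counter" a second pass would produce nothing, because the object itself is the iterator and is already exhausted.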
File Iterators
Our data file "data.txt" consists of 4 lines:
Line 1
Line 2
Line 3
Line 4
File: file1.py
fileObj = open( "data.txt" )
print( fileObj.read() )
Output:
$ python3 file1.py
Line 1
Line 2
Line 3
Line 4
The function "read()" reads the whole file and returns its contents as a single string, which we then print. We know the function "readline()" reads a single line. If we keep calling it, it returns the next line each time and returns an empty string when there are no more lines to be read. Our next program illustrates this:
File: file2.py
fileObj = open( "data.txt" )
line1 = fileObj.readline()
print( "Length of line:" , len( line1) , ":" , line1 , ":" )
line1 = fileObj.readline()
print( "Length of line:" , len( line1) , ":" , line1 , ":" )
line1 = fileObj.readline()
print( "Length of line:" , len( line1) , ":" , line1 , ":" )
line1 = fileObj.readline()
print( "Length of line:" , len( line1) , ":" , line1 , ":" )
line1 = fileObj.readline()
print( "Length of line:" , len( line1) , ":" , line1 , ":" )
Output:
[amittal@hills chapter14]$ python3 file2.py
Length of line: 7 : Line 1
:
Length of line: 7 : Line 2
:
Length of line: 7 : Line 3
:
Length of line: 6 : Line 4 :
Length of line: 0 : :
We keep reading each line, and when there are no more lines to be read "readline()" returns an empty string, whose length is 0. The file object also supports the "__next__" call.
File: file3.py
fileObj = open( "data.txt" )
print( fileObj.__next__() )
print( fileObj.__next__() )
print( fileObj.__next__() )
print( fileObj.__next__() )
#This will throw the StopIteration error
print( fileObj.__next__() )
Output:
Line 1
Line 2
Line 3
Line 4
Traceback (most recent call last):
File "file3.py", line 13, in <module>
print( fileObj.__next__() )
StopIteration
Calling "__next__" once more after the 4th line raises a "StopIteration" exception. This is exactly the iteration protocol, so we can use the file object in a for loop to read the file line by line.
File: file4.py
for line in open( "data.txt" ) :
    print( line )
Output:
python3 file4.py
Line 1
Line 2
Line 3
Line 4
An open question is whether the file object created behind the scenes gets closed automatically. We can experiment by opening the same file a second time.
#Does the file get closed
#
for line in open( "data.txt" ) :
    print( line )
#This does not create a problem:
# the second open() succeeds
for line in open( "data.txt" ) :
    print( line )
The above works, and in CPython the first file object is closed when it is garbage collected once the first for loop no longer references it. Note that this experiment is not conclusive on its own, since a file can be opened for reading more than once even if an earlier handle is still open, and other Python implementations may delay garbage collection. We can also write the above without relying on the iterator.
File:
f1 = open( "data.txt" )
while True:
    line = f1.readline()
    if not line: break
    print(line.upper(), end='')
f1.close()
This does the same job of reading the file line by line and printing each line (here converted to upper case). However, it is not as concise. The iterator approach is also recommended because it may be more efficient.
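To combine the iterator approach with a guaranteed close, the for loop can be wrapped in a "with" block. The sketch below is only illustrative: it writes its own temporary copy of "data.txt" so that it runs anywhere, and the names "path" and "lines_seen" are assumptions made for this example.

```python
import os
import tempfile

# Build a throwaway copy of data.txt so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.txt")
with open(path, "w") as out:
    out.write("Line 1\nLine 2\nLine 3\nLine 4")

lines_seen = []
with open(path) as f1:
    for line in f1:               # the file object is its own iterator
        lines_seen.append(line.rstrip())
        print(line, end="")
print()

# The with block has closed the file by the time we get here.
print(f1.closed)
```

The file is closed when the "with" block ends, whether the loop finishes normally or raises an exception, so there is no reliance on garbage collection.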
We can use the "iter" function to obtain the iterator for an object and then call "next" on it to get the values.
File: iter2.py
list1 = [ 10, 11 , 12 ]
iter1 = iter( list1 )
print( iter1.__next__() )
print( iter1.__next__() )
print( iter1.__next__() )
print( iter1.__next__() )
Output:
[amittal@hills chapter14]$ python3 iter2.py
10
11
12
Traceback (most recent call last):
File "iter2.py", line 14, in <module>
print( iter1.__next__() )
StopIteration
We obtained the iterator object and then called "next" on the iterator object to obtain the values. We cannot just use the list object as an iterator.
list1 = [ 10, 11 , 12 ]
print( list1.__next__() )
The call fails with an "AttributeError" because a list by itself is not an iterator object: it has no "__next__" method. We may have several different iterator objects for the same list, each pointing to a different position in the list. However, in some cases, such as the file object returned by "open", the object is also its own iterator.
File: iter4.py
list1 = [ 10, 11 , 12 ]
iter1 = iter( list1 )
iter2 = iter( list1 )
print( iter1.__next__() )
print( iter1.__next__() )
print( iter2.__next__() )
Output:
[amittal@hills chapter14]$ python3 iter4.py
10
11
10
We can see from the above example that our 2 iterator objects are pointing to different positions in the list.
There is another way to check whether the object returned by the "iter" call is the same as the original object: the "is" operator.
File: iter_check1.py
list1 = [ 10, 11 , 12 ]
print( iter(list1 ) is list1 )
f1 = open( "data.txt" )
print( iter( f1 ) is f1 )
Output:
[amittal@hills chapter14]$ python3 iter_check1.py
False
True
The iterator for a list object is not the same as the list object, so we get "False". For a file object we get "True" because the file object is also its own iterator.
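The same distinction can also be checked with the abstract base classes "Iterable" and "Iterator" from the standard "collections.abc" module. This is a small sketch, not part of the original examples. It uses "io.StringIO" as an in-memory stand-in for a file so that no "data.txt" is needed.

```python
import io
from collections.abc import Iterable, Iterator

list1 = [10, 11, 12]
f1 = io.StringIO("Line 1\nLine 2\n")   # in-memory stand-in for a file

list_is_iterable = isinstance(list1, Iterable)
list_is_iterator = isinstance(list1, Iterator)
list_iter_is_iterator = isinstance(iter(list1), Iterator)
file_is_iterator = isinstance(f1, Iterator)

print(list_is_iterable)        # a list can be looped over
print(list_is_iterator)        # but it has no __next__ of its own
print(list_iter_is_iterator)   # iter(list1) does
print(file_is_iterator)        # a file-like object is its own iterator
```

A list is iterable but not an iterator, while the object returned by "iter(list1)" and the file-like object are both iterators.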
Manual Iteration
File: "man1.py"
list1 = [ 10, 11 , 12 ]
#Automatic Iteration
for x1 in list1 :
    print( x1 , end = " " )
print()
#Manual Iteration
iter1 = iter( list1 )
while True:
    try:
        x1 = next( iter1 )
    except StopIteration:
        break
    print( x1 , end=" " )
print()
Output:
10 11 12
10 11 12
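The built-in "next" function also accepts a second argument that is returned instead of raising "StopIteration" when the iterator is exhausted. The sketch below uses that form for manual iteration. The sentinel name "DONE" and the list "collected" are assumptions made for this example.

```python
list1 = [10, 11, 12]
iter1 = iter(list1)

DONE = object()        # hypothetical sentinel, distinct from any list item
collected = []
while True:
    x1 = next(iter1, DONE)    # returns DONE instead of raising StopIteration
    if x1 is DONE:
        break
    collected.append(x1)
    print(x1, end=" ")
print()
```

A unique sentinel object is used rather than "None" so that the loop still works for lists that contain "None" as a value.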
Other Iterables
Let's see some other examples of iterables :
>>> dict1 = { 1:"one" , 2:"two" }
>>> for x1 in dict1 :
...     print( x1 )
...
1
2
The dictionary iteration goes through the keys in the dictionary.
>>> for x1 in enumerate( "test" ) :
...     print( x1 )
...
(0, 't')
(1, 'e')
(2, 's')
(3, 't')
The "enumerate" function produces tuples with each tuple containing the position as well as the content.
We can combine the dictionary with the enumerate function to iterate through the keys and their positions.
>>> dict1 = { 1:"one" , 2:"two" }
>>>
>>> for x1 in enumerate( dict1 ) :
...     print( x1 )
...
(0, 1)
(1, 2)
>>>
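Iterating over a dictionary directly yields only the keys. When the values are needed as well, the dictionary methods "values()" and "items()" return iterable views. The sketch below reuses "dict1" from the examples above. The names "keys", "values" and "pairs" are chosen for illustration.

```python
dict1 = {1: "one", 2: "two"}

keys = list(dict1)              # plain iteration yields the keys
values = list(dict1.values())   # the values only
pairs = list(dict1.items())     # (key, value) tuples

for key, value in dict1.items():
    print(key, value)
```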
The object returned by "map" is also iterable.
>>> m1 = map( abs, (-1, 0, 1 ) )
>>> for x1 in m1 :
...     print( x1 )
...
1
0
1
The "zip" function pairs up the items of several iterables, and its result is also iterable.
>>> L1 = [ 1 , 2, 3, 4 ]
>>> L2 = [ 5 ,6 ,7, 8 ]
>>> for x1 in zip( L1 , L2 ) :
...     print( x1 )
...
(1, 5)
(2, 6)
(3, 7)
(4, 8)
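The "filter" function keeps only the items for which the given function returns a true value, and its result is also iterable.
>>> list( filter (bool, ["test" , "" , "class" ] ) )
['test', 'class']
>>> for x1 in filter (bool, ["test" , "" , "class" ] ) :
...     print( x1 )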
...
test
class
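In Python 3 the objects returned by "map", "zip" and "filter" are their own iterators, in the same way as the file object seen earlier. One consequence, sketched below with "map", is that they can only be traversed once. The names "first" and "second" are chosen for this illustration.

```python
m1 = map(abs, (-1, 0, 1))

same_object = iter(m1) is m1    # the map object is its own iterator
first = list(m1)                # consumes every value
second = list(m1)               # nothing is left for a second pass

print(same_object)
print(first)
print(second)
```

To go through the results more than once, they can be stored in a list first, or the "map" call can be repeated.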
List Comprehension
A list comprehension follows the same rules as the "for" loop and works with any iterable object.
File: "data.txt"
Line 1
Line 2
Line 3
Line 4
>>> f1 = open( "data.txt" )
>>> lines = f1.readlines()
>>> lines
['Line 1\n', 'Line 2\n', 'Line 3\n', 'Line 4']
>>>
To take out the end of line character "\n" we can use comprehension.
>>> lines = [ line.rstrip() for line in lines ]
>>> lines
['Line 1', 'Line 2', 'Line 3', 'Line 4']
We can do the same thing in a single line:
>>> lines = [ line.rstrip() for line in open("data.txt") ]
>>> lines
['Line 1', 'Line 2', 'Line 3', 'Line 4']
The above works because the comprehension works on iterable objects.
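Since a comprehension accepts any iterable, it should also work with a custom class such as the "Counter" from the introduction. The sketch below repeats that class so it runs on its own, and uses "io.StringIO" as an in-memory stand-in for "data.txt". The names "doubled", "fake_file" and "stripped" are assumptions made for this example.

```python
import io


class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        self.current += 1
        return self.current - 1


# Comprehension over a custom iterable
doubled = [c1 * 2 for c1 in Counter(3, 8)]
print(doubled)

# Comprehension over a file-like object, one line at a time
fake_file = io.StringIO("Line 1\nLine 2\nLine 3\nLine 4")
stripped = [line.rstrip() for line in fake_file]
print(stripped)
```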