16-Functional Approach

Python provides several functions which enable a functional approach to programming. These functions are all convenience features in that they can be written in Python fairly easily.

Map

One of the most common things we do with lists and other sequences is apply an operation to each item and collect the results.

map(aFunction, aSequence)

All the iterable arguments are unpacked together and passed into the given function. That's a little cryptic, so let's look at an example. Imagine we have two lists of numbers, say prices from two different stores for exactly the same items, and we want to find the minimum we would have to pay if we bought each item from whichever store sells it cheaper. To do this, we could iterate through each list, comparing items and choosing the cheapest. With map, we can do this comparison in a single statement.

But when we print out the result, we see an odd reference value instead of the list of items we were expecting. This is called lazy evaluation. In Python 3, the map function returns a map object. It doesn't actually run the function min on any pair of items until you ask for a value. This is a common design pattern in the language, and it's often used when dealing with big data: because values are produced one at a time rather than all at once, memory use stays low even when the computation itself is expensive.

Maps are iterable, just like lists and tuples, so we can use a for loop to look at all of the values in the map.

store1 = [10.00, 11.00, 12.34, 2.34]

store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)

cheapest

<map at 0x7f226012c630>

Now let's iterate through the map object to see the values.

for item in cheapest: print(item)

9.0
11.0
12.34
2.01
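One consequence of lazy evaluation is easier to see in code: a map object can be consumed only once. The sketch below reuses the store prices from the example; the names first_pass and second_pass are introduced here purely for illustration.

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)

# The first pass pulls every value out of the map object.
first_pass = list(cheapest)
print(first_pass)   # [9.0, 11.0, 12.34, 2.01]

# The map object is now exhausted, so a second pass finds nothing.
second_pass = list(cheapest)
print(second_pass)  # []
```

If the values are needed more than once, storing them in a list on the first pass (as first_pass does here) is the usual approach.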

This passing around of functions, together with the data structures they should be applied to, is a hallmark of functional programming. It's very common in data analysis and cleaning.

For example, updating all the items in a list can be done easily with a for loop:

>>> items = [1, 2, 3, 4, 5]

>>> squared = []

>>> for x in items:
...     squared.append(x ** 2)
...


>>> squared

[1, 4, 9, 16, 25]

>>>

Since this is such a common operation, Python has a built-in function that does most of the work for us.

The map(aFunction, aSequence) function applies a passed-in function to each item in an iterable object. In Python 3 it returns an iterator over the function call results; wrap it in list() to get a list.

>>> items = [1, 2, 3, 4, 5]

>>>

>>> def sqr(x): return x ** 2

>>> list(map(sqr, items))

[1, 4, 9, 16, 25]

>>>

We passed in a user-defined function to be applied to each item in the list. map calls sqr on each list item, and list() collects all the return values into a new list.

Because map expects a function to be passed in, it also happens to be one of the places where lambda routinely appears:

>>> list(map((lambda x: x **2), items))

[1, 4, 9, 16, 25]

>>>

In the short example above, the lambda function squares each item in the items list.

As shown earlier, map is defined like this:

map(aFunction, aSequence)

While we still use a lambda as aFunction, aSequence can itself be a list of functions:

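def square(x):
    return x ** 2

def cube(x):
    return x ** 3

funcs = [square, cube]

for r in range(5):
    # Call each function in funcs with the current value of r
    value = map(lambda f: f(r), funcs)
    # In Python 3, map is lazy, so build a list before printing
    print(list(value))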

Output:

[0, 0]

[1, 1]

[4, 8]

[9, 27]

[16, 64]

Because map is equivalent to a for loop, we can always write a general mapping utility ourselves with a little extra code:

>>> def mymap(aFunc, aSeq):
...     result = []
...     for x in aSeq: result.append(aFunc(x))
...     return result
...

>>> list(map(sqr, [1, 2, 3]))

[1, 4, 9]

>>> mymap(sqr, [1, 2, 3])

[1, 4, 9]

>>>

Since it's a built-in, map is always available and always works the same way. It can also have a performance benefit, because it is often faster than a manually coded for loop. On top of that, map can be used in more advanced ways. For example, given multiple sequence arguments, it sends items taken from the sequences in parallel as distinct arguments to the function:

>>> pow(3,5)

243

>>> pow(2,10)

1024

>>> pow(3,11)

177147

>>> pow(4,12)

16777216

>>>

>>> list(map(pow, [2, 3, 4], [10, 11, 12]))

[1024, 177147, 16777216]

>>>

As the example above shows, with multiple sequences, map() expects an N-argument function for N sequences. Here, the pow function takes two arguments on each call.

Here is another example of map() doing element-wise addition with two lists:

x = [1,2,3]

y = [4,5,6]

from operator import add

print(list(map(add, x, y)))  # output: [5, 7, 9]

The map call is similar to a list comprehension expression, but map applies a function call to each item instead of an arbitrary expression. Because of this limitation, it is a somewhat less general tool. In some cases, however, map may be faster to run than a list comprehension, such as when mapping a built-in function, and it can take less code when the function already exists.
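To make the comparison concrete, here is a small sketch that computes the same squares both ways and then adds a filter condition, which a comprehension can express directly. The names via_map, via_comprehension and odd_squares are illustrative only.

```python
items = [1, 2, 3, 4, 5]

def sqr(x):
    return x ** 2

# map needs a function to call on each item...
via_map = list(map(sqr, items))

# ...while a comprehension accepts any expression, with an optional condition.
via_comprehension = [x ** 2 for x in items]
odd_squares = [x ** 2 for x in items if x % 2 == 1]

print(via_map == via_comprehension)  # True
print(odd_squares)                   # [1, 9, 25]
```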

In Python 2, if function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be sequences or any iterable objects, and in Python 2 the result is always a list. (Python 3's map does not support None as the function.) The following session is Python 2:

>>> m = [1,2,3]

>>> n = [1,4,9]

>>> new_tuple = map(None, m, n)

>>> new_tuple

[(1, 1), (2, 4), (3, 9)]

For Python 3, we may want to use itertools.zip_longest instead:

>>> m = [1,2,3]

>>> n = [1,4,9]

>>> from itertools import zip_longest

>>> for i,j in zip_longest(m,n):

...     print(i, j)

...

1 1

2 4

3 9

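zip_longest() makes an iterator that aggregates elements from the two iterables (m and n). If one iterable is shorter than the other, the missing values are filled in with None, or with the value given as fillvalue.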
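The difference between zip_longest and the plain built-in zip only shows up when the inputs have different lengths. The following sketch shortens n by one item (an assumption made for the demonstration) to show both behaviours side by side.

```python
from itertools import zip_longest

m = [1, 2, 3]
n = [1, 4]  # deliberately one item shorter than m

# zip stops as soon as the shortest input runs out.
pairs = list(zip(m, n))
print(pairs)        # [(1, 1), (2, 4)]

# zip_longest keeps going and pads the shorter input (None by default).
padded = list(zip_longest(m, n))
print(padded)       # [(1, 1), (2, 4), (3, None)]

# fillvalue replaces None as the padding value.
padded_zero = list(zip_longest(m, n, fillvalue=0))
print(padded_zero)  # [(1, 1), (2, 4), (3, 0)]
```

When both lists are known to have the same length, as in the m and n example, zip alone gives the same pairs.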

Filter and Reduce

As the name suggests, filter extracts each element in the sequence for which the function returns True. The reduce function is a little less obvious in its intent: it reduces a list to a single value by combining elements via a supplied function. The map function is the simplest one among the Python built-ins used for functional programming.

These tools apply functions to sequences and other iterables: filter selects items based on a test function, and reduce applies a function to pairs made up of the running result and the next item.

Because they return iterables, range and filter both require list calls to display all their results in Python 3.

As an example, the following filter call picks out items in a sequence that are less than zero:

>>> list(range(-5,5))

[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

>>>

>>> list( filter((lambda x: x < 0), range(-5,5)))

[-5, -4, -3, -2, -1]

>>>

Items in the sequence or iterable for which the function returns a true value are added to the result. Like map, filter is roughly equivalent to a for loop, but it is built-in and fast:

>>>

>>> result = []

>>> for x in range(-5, 5):
...     if x < 0:
...         result.append(x)
...


>>> result

[-5, -4, -3, -2, -1]

>>>

Here is another use case for filter(): finding the intersection of two lists:

a = [1,2,3,5,7,9]

b = [2,3,5,6,7,8]

print(list(filter(lambda x: x in a, b)))  # prints out [2, 3, 5, 7]

Note that we can do the same with a list comprehension:

a = [1,2,3,5,7,9]

b = [2,3,5,6,7,8]

print([x for x in a if x in b])  # prints out [2, 3, 5, 7]
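Both versions test membership in a list, which means scanning that list once for every item. For longer inputs, a common alternative is to look items up in a set instead. The sketch below assumes the items are hashable; a_set and common are names introduced for illustration.

```python
a = [1, 2, 3, 5, 7, 9]
b = [2, 3, 5, 6, 7, 8]

# Build the set once, then filter b against it, keeping b's order.
a_set = set(a)
common = list(filter(lambda x: x in a_set, b))
print(common)  # [2, 3, 5, 7]

# If the original order doesn't matter, set intersection does the same job.
common_sorted = sorted(set(a) & set(b))
print(common_sorted)  # [2, 3, 5, 7]
```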

In Python 3, reduce lives in the functools module. It is more complex: it accepts an iterable to process, but it is not an iterator itself, and it returns a single result:

>>>

>>> from functools import reduce

>>> reduce( (lambda x, y: x * y), [1, 2, 3, 4] )

24

>>> reduce( (lambda x, y: x / y), [1, 2, 3, 4] )

0.041666666666666664

>>>

At each step, reduce passes the current product or quotient, along with the next item from the list, to the passed-in lambda function. By default, the first item in the sequence initializes the starting value.

Here's the for loop version of the first of these calls, with the multiplication hardcoded inside the loop:

>>> L = [1, 2, 3, 4]

>>> result = L[0]

>>> for x in L[1:]:
...     result = result * x
...

>>> result

24

>>>
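To watch the running result that reduce carries from one step to the next, one option is itertools.accumulate, which yields every intermediate value instead of only the last one. This is a side-by-side sketch rather than a description of how reduce is implemented; the names running and total are illustrative.

```python
from functools import reduce
from itertools import accumulate
from operator import mul

numbers = [1, 2, 3, 4]

# accumulate yields the running product after each item.
running = list(accumulate(numbers, mul))
print(running)  # [1, 2, 6, 24]

# reduce returns only the final value of that running product.
total = reduce(mul, numbers)
print(total)    # 24
```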

Let's make our own version of reduce.

>>> def myreduce(fnc, seq):
...     tally = seq[0]
...     for item in seq[1:]:
...         tally = fnc(tally, item)
...     return tally
...

>>> myreduce( (lambda x, y: x * y), [1, 2, 3, 4])

24

>>> myreduce( (lambda x, y: x / y), [1, 2, 3, 4])

0.041666666666666664

>>>

We can concatenate a list of strings to make a sentence, using Dijkstra's famous quote on bugs:

>>> import functools

>>> L = ['Testing ', 'shows ', 'the ', 'presence', ', ','not ', 'the ', 'absence ', 'of ', 'bugs']

>>> functools.reduce( (lambda x,y:x+y), L)

'Testing shows the presence, not the absence of bugs'

>>>

We can get the same result by using join :

>>> ''.join(L)

'Testing shows the presence, not the absence of bugs'

We can also use operator to produce the same result:

>>> import functools, operator

>>> functools.reduce(operator.add, L)

'Testing shows the presence, not the absence of bugs'

>>>

reduce also accepts an optional third argument, an initial value that is placed before the items in the sequence and serves as the default result when the sequence is empty.
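A minimal sketch of the third argument in use follows. The helper name product is an assumption made for this example; it passes 1 as the initial value so that an empty input still produces a result.

```python
from functools import reduce

def product(numbers):
    """Multiply all the numbers together; an empty input gives 1."""
    # The third argument, 1, is the initial value. It is combined with the
    # first item, and it is the result when numbers is empty.
    return reduce(lambda x, y: x * y, numbers, 1)

print(product([1, 2, 3, 4]))  # 24
print(product([]))            # 1

# Without an initial value, an empty sequence raises a TypeError.
try:
    reduce(lambda x, y: x * y, [])
except TypeError:
    print("reduce of an empty sequence needs an initial value")
```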

Lambda

Lambdas are one-line functions. They are also known as anonymous functions in some other languages. You might want to use a lambda when you don't need to use a function twice in a program. They behave just like normal functions.

Lambdas are Python's way of creating anonymous functions. These are the same as other functions, but they have no name. The intent is that they're simple or short-lived, and it's easier to write the function out in one line than to go to the trouble of creating a named function.

The lambda syntax is fairly simple. But it might take a bit of time to get used to.

You declare a lambda function with the word lambda, followed by a list of arguments, followed by a colon and then a single expression. This is key: there's only one expression to be evaluated in a lambda. The value of that expression is returned when the lambda is executed.

The return of a lambda is a function reference. For example, if a three-argument lambda is assigned to the name my_function, you would execute my_function and pass in three arguments.

Note that lambda parameters can have default values, just like those of a def, but you can't have statements or complex logic inside the lambda itself because you're limited to a single expression.

So lambdas are really much more limited than full function definitions. But I think they're very useful for simple little data cleaning tasks.
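As a small sketch of the syntax just described: the body of my_function below is hypothetical (the text names the function but does not give its definition), and sign is an extra illustrative name showing that the single expression may still be a conditional expression.

```python
# Hypothetical three-parameter lambda; the body is an assumption for illustration.
my_function = lambda a, b, c: a + b + c
print(my_function(1, 2, 3))  # 6

# The equivalent named function, for comparison.
def my_named_function(a, b, c):
    return a + b + c

print(my_named_function(1, 2, 3))  # 6

# A lambda holds one expression, but that expression can be conditional.
sign = lambda n: "negative" if n < 0 else "non-negative"
print(sign(-4))  # negative
print(sign(7))   # non-negative
```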

Blueprint

lambda argument: manipulate(argument)

Example

add = lambda x, y: x + y

print(add(3, 5))

# Output: 8

Here are a few useful use cases for lambdas, and a few ways in which they are used in the wild:

List sorting

a = [(1, 2), (4, 1), (9, 10), (13, -3)]

a.sort(key=lambda x: x[1])

print(a)

# Output: [(13, -3), (4, 1), (1, 2), (9, 10)]

Parallel sorting of lists

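# list1 and list2 are two equal-length lists defined elsewhere
# In Python 3, zip returns an iterator with no sort method, so use sorted()
data = sorted(zip(list1, list2))
list1, list2 = map(lambda t: list(t), zip(*data))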
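Here is a worked sketch of the same idea with hypothetical sample data (the lists prices and names are not from the original). It uses sorted() because in Python 3 zip returns an iterator, which has no sort method.

```python
# Hypothetical sample data: each price belongs to the name at the same index.
prices = [3, 1, 2]
names = ["pear", "apple", "fig"]

# Pair the items up and sort the pairs; tuples compare by their first element
# first, so this orders everything by price.
pairs = sorted(zip(prices, names))
print(pairs)  # [(1, 'apple'), (2, 'fig'), (3, 'pear')]

# zip(*pairs) splits the pairs back into two sequences, still in step.
prices_sorted, names_sorted = map(list, zip(*pairs))
print(prices_sorted)  # [1, 2, 3]
print(names_sorted)   # ['apple', 'fig', 'pear']
```

Passing list directly to map does the same job as lambda t: list(t); either form works.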

List Comprehensions

Just like with lambdas, list comprehensions are a condensed format which may offer readability and performance benefits and you'll often find them being used in data science tutorials.

Let's iterate from 0 to 999 and return the even numbers.

my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list

Now the same thing but with list comprehension.

my_list = [number for number in range(0,1000) if number % 2 == 0]

my_list
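For comparison with the earlier sections, the same even numbers can be produced with filter, or by giving range a step. The sketch below builds all three and checks that they agree; the variable names are illustrative.

```python
# List comprehension with a condition.
evens_comprehension = [n for n in range(0, 1000) if n % 2 == 0]

# The same selection written with filter and a lambda.
evens_filter = list(filter(lambda n: n % 2 == 0, range(0, 1000)))

# For this particular case, range can step by 2 directly.
evens_range = list(range(0, 1000, 2))

print(evens_comprehension == evens_filter == evens_range)  # True
print(len(evens_comprehension))                            # 500
```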

Convert a function into a list comprehension

def times_tables():
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i * j)
    return lst

times_tables() == [i * j for i in range(10) for j in range(10)]

Create all user IDs made of 2 lowercase letters followed by 2 digits (e.g. aa12)

lowercase = 'abcdefghijklmnopqrstuvwxyz'

digits = '0123456789'

answer = [i+j+k+l for i in lowercase for j in lowercase for k in digits for l in digits]

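correct_answer == answer  # correct_answer is not defined in this section; it is assumed to come from the exercise's answer key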
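When a comprehension nests this many loops, itertools.product is one alternative worth knowing. The sketch below is a different technique from the nested comprehension, not a replacement prescribed by the exercise. It builds the same user IDs and checks them against the four-loop version; user_ids and nested are names chosen here.

```python
from itertools import product
from string import ascii_lowercase, digits

# product yields every combination, with the rightmost input varying fastest,
# which matches the order of the nested comprehension.
user_ids = ["".join(parts)
            for parts in product(ascii_lowercase, ascii_lowercase, digits, digits)]

# The four-loop comprehension, for comparison.
nested = [i + j + k + l
          for i in ascii_lowercase for j in ascii_lowercase
          for k in digits for l in digits]

print(user_ids == nested)         # True
print(len(user_ids))              # 67600
print(user_ids[0], user_ids[-1])  # aa00 zz99
```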