Go Vs Python

This post compares the runtime performance of Go and Python, and how simple it is to write concurrent code in each.

Synchronous Execution of requests

Python ~17,000 req/s

PyPy ~22,000 req/s

Go ~21,000 req/s

Concurrent Execution of requests

3 clients running perf2.py

Python: 275 req/s per perf2.py instance -- concurrency.py takes 188MB of RAM

Go (GOMAXPROCS=3): 12,500 req/s per perf2.py instance -- concurrency.go takes 120MB of RAM

Go is significantly faster than Python; this is fine and expected. What I find more disturbing is how much easier it is in Go to morph a synchronous program into its concurrent equivalent. In addition, the resulting Go code is more readable and easier to reason about. Not all problems require a concurrent solution, but for the ones that do, Go has a lot to offer.

At PyCon in Montreal a few weeks ago, David Beazley gave a talk titled "Python Concurrency From the Ground Up: LIVE!". The video is available on YouTube.

The gist of the talk is that going from a synchronous to a concurrent program in Python requires a significant amount of legwork. The talk takes a simple socket server that computes Fibonacci numbers synchronously and tries to make it concurrent. It compares and contrasts various approaches: threads, multiple processes, and coroutines.

There are a zillion ways of doing it in Python, but none of them is great at taking advantage of multiple cores. While typing out the code used in his demo, I decided, for the fun of it, to port it to Go.

The first surprise for me was how similar the synchronous versions are in both languages. As always, the code and the micro-benchmarks that follow should be taken with a grain of salt.

Synchronous

The Go version requires a bit more typing and type ceremony, but the structure is very similar.

Synchronous Micro-Benchmark

The benchmark consists of running one instance of perf2.py, which simulates a client hammering our microservice.

    # synchronous.py
    from socket import *

    def fib(n):
        if n <= 2:
            return 1
        else:
            return fib(n-1) + fib(n-2)

    def fib_server(address):
        sock = socket(AF_INET, SOCK_STREAM)
        sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        sock.bind(address)
        sock.listen(5)
        while True:
            conn, addr = sock.accept()  # blocking
            print("connection", addr)
            fib_handler(conn)

    def fib_handler(conn):
        while True:
            req = conn.recv(100)  # blocking
            if not req:
                break
            n = int(req)
            result = fib(n)
            resp = str(result).encode('ascii') + b'\n'
            conn.send(resp)  # blocking
        print('closed')

    if __name__ == "__main__":
        fib_server(('', 25000))

    // synchronous.go
    package main

    import (
    	"bytes"
    	"flag"
    	"fmt"
    	"log"
    	"net"
    	"strconv"
    )

    func fib(n int64) int64 {
    	if n <= 2 {
    		return 1
    	}
    	return fib(n-1) + fib(n-2)
    }

    func fibServer(addr string) {
    	ln, err := net.Listen("tcp", addr)
    	if err != nil {
    		log.Fatalf("An error occurred while listening to: %s -- %s", addr, err)
    	}
    	for {
    		conn, err := ln.Accept()
    		if err != nil {
    			log.Println(err)
    			continue
    		}
    		// Log the client address only once Accept has succeeded.
    		log.Println("connection", conn.RemoteAddr())
    		fibHandler(conn) // prefix with `go` to get concurrency.go
    	}
    }

    func fibHandler(conn net.Conn) {
    	defer conn.Close()
    	buf := make([]byte, 100)
    	for {
    		n, err := conn.Read(buf)
    		if err != nil || n == 0 {
    			break
    		}
    		reqStr := string(bytes.Trim(buf[:n], "\n"))
    		req, err := strconv.Atoi(reqStr)
    		if err != nil {
    			// Drop the connection instead of answering with fib(0).
    			log.Println("The request must be a number", reqStr, err)
    			break
    		}
    		result := fmt.Sprintf("%v\n", fib(int64(req)))
    		if _, err = conn.Write([]byte(result)); err != nil {
    			log.Println("Error while writing to the socket")
    			break
    		}
    	}
    	log.Println("closed")
    }

    func main() {
    	flag.Parse()
    	args := flag.Args()
    	if len(args) != 1 {
    		fmt.Println("You must provide an addr (127.0.0.1:25000)")
    		return
    	}
    	fibServer(args[0])
    }

    # perf2.py
    from socket import *
    import time
    from threading import Thread

    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect(('localhost', 25000))

    n = 0

    def monitor():
        global n
        while True:
            time.sleep(1)
            print(n, 'reqs/s')
            n = 0

    Thread(target=monitor).start()

    while True:
        sock.send(b'1')
        resp = sock.recv(100)
        n += 1

- Python ~17,000 req/s
- PyPy ~22,000 req/s
- Go ~21,000 req/s

PyPy is faster than Go by a small margin, but as far as I am concerned the three solutions are within the same order of magnitude.

Concurrency

The beauty of Go is that it takes only two letters to move from the synchronous to the concurrent version: add the keyword go in front of the call to fibHandler(conn). Not only is it simple but, unlike in Python, there is one obvious way to do it.

The Python equivalent is much harder to pull off; one could argue that it is out of reach for a large portion of experienced Python developers. David Beazley illustrates very well the phenomenal diversity of approaches that could be taken, all broken to some extent. I am sure other candidates come to mind: asyncio, Twisted, Tornado, etc.

Below you can see the coroutines version with a zest of ProcessPoolExecutor.

    from socket import *
    from collections import deque
    from concurrent.futures import ProcessPoolExecutor as Pool
    from select import select

    pool = Pool(4)

    def fib(n):
        if n <= 2:
            return 1
        else:
            return fib(n-1) + fib(n-2)

    def fib_server(address):
        sock = socket(AF_INET, SOCK_STREAM)
        sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        sock.bind(address)
        sock.listen(5)
        while True:
            yield 'recv', sock
            conn, addr = sock.accept()  # blocking
            print("connection", addr)
            tasks.append(fib_handler(conn))

    def fib_handler(conn):
        while True:
            yield 'recv', conn
            req = conn.recv(100)  # blocking
            if not req:
                break
            n = int(req)
            future = pool.submit(fib, n)
            yield 'future', future
            result = future.result()  # blocking
            resp = str(result).encode('ascii') + b'\n'
            yield 'send', conn
            conn.send(resp)  # blocking
        print('closed')

    tasks = deque()
    recv_wait = {}
    send_wait = {}
    future_wait = {}

    future_notify, future_event = socketpair()

    def future_done(future):
        tasks.append(future_wait.pop(future))
        future_notify.send(b'x')

    def future_monitor():
        while True:
            yield 'recv', future_event
            future_event.recv(100)

    tasks.append(future_monitor())

    def run():
        while any([tasks, recv_wait, send_wait]):
            while not tasks:
                # no active task to run, wait for IO
                can_recv, can_send, _ = select(recv_wait, send_wait, [])
                for s in can_recv:
                    tasks.append(recv_wait.pop(s))
                for s in can_send:
                    tasks.append(send_wait.pop(s))
            task = tasks.popleft()
            try:
                why, what = next(task)
                if why == 'recv':
                    recv_wait[what] = task
                elif why == 'send':
                    send_wait[what] = task
                elif why == 'future':
                    future_wait[what] = task
                    what.add_done_callback(future_done)
                else:
                    raise RuntimeError("We don't know what to do with :", why)
            except StopIteration:
                print('task done')

    if __name__ == "__main__":
        tasks.append(fib_server(('localhost', 25000)))
        run()

The interesting part is that even with all this work the Python version can't take advantage of all the cores. In the Go version, by contrast, the number of cores allocated to the program is controlled by an environment variable called GOMAXPROCS. The performance characteristics also differ by an order of magnitude:
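The same setting can also be read and changed from inside a program through the standard runtime package. The sketch below only illustrates that API; the helper name setProcs and the value 3 (matching the benchmark setup) are choices made for this example. Note that since Go 1.5 the default is no longer a single core, so on a recent toolchain the variable is mostly useful for restricting a program.

```go
package main

import (
	"fmt"
	"runtime"
)

// setProcs changes the GOMAXPROCS limit and returns the previous and the
// new value. (Hypothetical helper for this illustration.)
func setProcs(n int) (prev, cur int) {
	prev = runtime.GOMAXPROCS(n) // n > 0 sets the limit, returns the old one
	cur = runtime.GOMAXPROCS(0)  // 0 only queries the current limit
	return prev, cur
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS at start:", runtime.GOMAXPROCS(0))

	prev, cur := setProcs(3)
	fmt.Printf("GOMAXPROCS changed from %d to %d\n", prev, cur)
}
```

Running the unmodified server as GOMAXPROCS=3 ./concurrency 127.0.0.1:25000 (binary name assumed) has the same effect without touching the code.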

Concurrent Micro-Benchmark

This micro-benchmark does not include PyPy because some of the features used in concurrency.py are not currently supported, specifically the concurrent.futures module.

fib(30)

A single iteration with 30 as the argument to fib.

- Python 231ms
- Go 5ms
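A single-call timing like the one above can be reproduced with a few lines of Go. This is only a sketch of one way to measure it, not the harness used for the figures in this post; timeFib is a name made up for the example, and the elapsed time will depend on the machine and the Go version.

```go
package main

import (
	"fmt"
	"time"
)

func fib(n int64) int64 {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}

// timeFib runs fib(n) once and returns the result together with the
// wall-clock time the call took. (Hypothetical helper.)
func timeFib(n int64) (int64, time.Duration) {
	start := time.Now()
	result := fib(n)
	return result, time.Since(start)
}

func main() {
	result, elapsed := timeFib(30)
	fmt.Printf("fib(30) = %d in %v\n", result, elapsed)
}
```

A single run is noisy; repeating the call a few times and keeping the best time gives a steadier number.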

Requests per second

3 clients running perf2.py

- Python: 275 req/s per perf2.py instance -- concurrency.py takes 188MB of RAM
- Go (GOMAXPROCS=3): 12,500 req/s per perf2.py instance -- concurrency.go takes 120MB of RAM

Go is significantly faster than Python; this is fine and expected. What I find more disturbing is how much easier it is in Go to morph a synchronous program into its concurrent equivalent. In addition, the resulting Go code is more readable and easier to reason about. Not all problems require a concurrent solution, but for the ones that do, Go has a lot to offer.

Reference: Concurrency in Python vs GO