Working with Unicode, ASCII and UTF-8

Post date: Sep 12, 2016 9:10:02 PM

When you work with non-English strings in Python, the following pattern usually avoids a lot of encoding problems:

http://stackoverflow.com/a/13093911/1613297

The idea is to

1) decode the byte string to unicode using the command

> s.decode('utf-8') # assume string s is utf-8 encoded

2) Say you want to replace '?' in the string with '!'. Do the replacement on the unicode object, using unicode literals:

> s.decode('utf-8').replace(u'?', u'!')

3) Then you might want to encode it back to its original encoding

> s.decode('utf-8').replace(u'?', u'!').encode('utf-8')
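The three steps above can be wrapped in a small helper. This is only a sketch: the function name replace_in_utf8 and the sample byte string are made up for illustration and do not come from the linked answer. It assumes the input is a byte string (a plain str in Python 2, a bytes object in Python 3) and that it really is encoded in the given encoding.

```python
# -*- coding: utf-8 -*-

def replace_in_utf8(raw, old, new, encoding='utf-8'):
    """Decode bytes, replace text on the unicode object, encode back.

    Hypothetical helper for illustration; `raw` must be a byte string.
    """
    text = raw.decode(encoding)    # step 1: bytes -> unicode
    text = text.replace(old, new)  # step 2: edit the unicode text
    return text.encode(encoding)   # step 3: unicode -> bytes


# Sample input: the UTF-8 bytes of a Spanish question with accented characters.
raw = b'\xc2\xbfQu\xc3\xa9 tal?'
result = replace_in_utf8(raw, u'?', u'!')
```

Only the ASCII '?' is replaced. The multi-byte characters survive untouched because the replacement happens on decoded text rather than on raw bytes.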