Understanding Unicode strings (Figure 2.1)

In [10]:
unicode_string = u'\u3042\u308a\u304c\u3068\u3046'
In [11]:
print(unicode_string)
ありがとう
In [12]:
print(len(unicode_string))
5
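The 5 reported by len() counts characters (code points), not bytes. As a quick check, encoding the same string to UTF-8 (an encoding chosen here for illustration; Figure 2.1 does not name one) shows that each of these hiragana characters takes three bytes:

```python
# The same five characters as unicode_string above.
unicode_string = u'\u3042\u308a\u304c\u3068\u3046'

# Encoding maps code points to bytes; UTF-8 uses three bytes
# for each of these characters.
utf8_bytes = unicode_string.encode('utf-8')

print(len(unicode_string))  # 5 code points
print(len(utf8_bytes))      # 15 bytes

# Decoding the bytes with the same encoding restores the original string.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == unicode_string)  # True
```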

Reading content from a file (Listing 2.2)

In [13]:
file_path = r'/home/johann/Desktop'  # Adjust to the folder that holds myfile.txt (e.g. r'C:\documents')
In [14]:
import os
In [15]:
os.chdir(file_path)
In [16]:
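with open('myfile.txt', 'rb') as f:  # 'rb' reads raw, undecoded bytes
    my_string = f.read().rstrip()    # rstrip() removes trailing whitespace such as a final newline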
In [17]:
print(len(my_string))
15
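The listing does not show what myfile.txt contains; a count of 15 is consistent with the five characters from Figure 2.1 stored as UTF-8. The sketch below is a hypothetical setup (the file contents, the trailing newline, and the temporary folder are all assumptions) that writes such a file and repeats the raw read, so the byte count can be reproduced without the original file:

```python
import os
import tempfile

# Hypothetical contents: the five characters from Figure 2.1,
# encoded as UTF-8, followed by a newline.
text = u'\u3042\u308a\u304c\u3068\u3046'
folder = tempfile.mkdtemp()  # stand-in for file_path
path = os.path.join(folder, 'myfile.txt')

with open(path, 'wb') as f:
    f.write(text.encode('utf-8') + b'\n')

# Read the raw bytes back; rstrip() drops the trailing newline.
with open(path, 'rb') as f:
    my_string = f.read().rstrip()

print(len(my_string))  # 15: three bytes per character
```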

Decoding strings (Listing 2.3)

In [18]:
my_unicode_string = my_string.decode('utf-8')
In [19]:
print(len(my_unicode_string))
5
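.decode('utf-8') only succeeds when the bytes really are valid UTF-8. As a contrived illustration (the truncation below is not part of the listing), cutting the 15-byte sequence inside its last character makes the default strict decoding raise UnicodeDecodeError, while the 'replace' error handler substitutes U+FFFD REPLACEMENT CHARACTER:

```python
# The five characters as UTF-8 bytes (15 bytes in total).
raw = u'\u3042\u308a\u304c\u3068\u3046'.encode('utf-8')

# Drop the final byte so the last character is incomplete.
truncated = raw[:-1]

try:
    truncated.decode('utf-8')  # 'strict' is the default error handler
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' swaps the undecodable tail for U+FFFD instead of raising.
lenient = truncated.decode('utf-8', 'replace')

print(strict_failed)  # True
```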
In [20]:
import codecs
In [21]:
with codecs.open('myfile.txt', 'rb', 'utf-8') as f:  # decodes UTF-8 while reading
    my_new_string = f.read().rstrip()
In [22]:
print(len(my_new_string))
5
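codecs.open is one way to decode while reading. A sketch of an alternative uses io.open with an encoding argument; io.open is available from Python 2.6 onward and in Python 3 it is the same object as the built-in open. The file setup here is hypothetical, recreating a myfile.txt with the five characters as UTF-8 plus a newline in a temporary folder:

```python
import io
import os
import tempfile

# Hypothetical setup: a myfile.txt holding the five characters as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with io.open(path, 'wb') as f:
    f.write(u'\u3042\u308a\u304c\u3068\u3046\n'.encode('utf-8'))

# Text mode with an explicit encoding decodes while reading,
# so no separate .decode() call is needed.
with io.open(path, 'r', encoding='utf-8') as f:
    my_new_string = f.read().rstrip()

print(len(my_new_string))  # 5
```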

Manipulating strings (Listing 2.4)

In [4]:
print('this is a string'[0])
t
In [6]:
print('this is a string'[0:4])
this
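Indexing and slicing behave the same way on Unicode strings, but on an undecoded byte string they count bytes rather than characters. A short comparison, reusing the characters from Figure 2.1 (the variable names are introduced here for illustration):

```python
text = u'\u3042\u308a\u304c\u3068\u3046'
raw = text.encode('utf-8')

# On the Unicode string, a slice counts characters.
first_two_chars = text[0:2]

# On the byte string, the same slice counts bytes and
# cuts the first character in half.
first_two_bytes = raw[0:2]

# Slicing on character boundaries (multiples of 3 bytes here) is safe.
first_char_bytes = raw[0:3]

# Negative indices count from the end.
last_word = 'this is a string'[-6:]

print(first_two_chars == u'\u3042\u308a')           # True
print(first_char_bytes.decode('utf-8') == text[0])  # True
print(last_word)                                    # string
```

Slices are used rather than single indexes because indexing a byte string returns an int in Python 3 but a one-character str in Python 2.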