All the usual announcements...
The midterm and final will be designed so that you will do fine as long as you:
Read all lecture notes carefully before lectures.
Gave an honest attempt (on your own!) at all non-bonus problems.
Went over the feedback and understood what you got wrong (use OH if needed).
Used the sections and Piazza to solidify your understanding.
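import random
import lzma
import lz4.block  # third-party; only needed for the commented-out lz4 alternative below

def printnum(a):
    # Print the integer a with thousands separators.
    print("{:,d}".format(a))

def compress(b):
    # Compress the bytes b with LZMA at the highest preset.
    return lzma.compress(b, preset=9)
    # return lz4.block.compress(b, mode='high_compression', compression=12)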
letters = bytes("abcdefghijklmnopqrstuvwxyz","ascii")
longtext = bytearray(1000000)
for i in range(len(longtext)):
    longtext[i] = random.choice(letters)
longtext[50:60]
bytearray(b'vptswbixml')
shorttext = compress(longtext) # use built-in Python3 lzma library
shorttext is the compression of 1,000,000 random letters.
Approximately what is the length in bits of shorttext?
a. 1,000,000
b. 125,000
c. 4,700,000
d. 8,000,000
printnum(len(shorttext)*8)
4,847,872
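As a sanity check on the measured size, here is a minimal sketch of the information-theoretic estimate, assuming each of the 1,000,000 letters is drawn uniformly and independently from the 26 lowercase letters (as in the loop that fills longtext). The names ALPHABET_SIZE, NUM_LETTERS, bits_per_letter and min_bits are introduced here for illustration only. No lossless compressor can do better than this estimate on average, which is why the LZMA output lands slightly above it.

```python
import math

ALPHABET_SIZE = 26        # lowercase letters a-z (assumption: uniform, independent draws)
NUM_LETTERS = 1_000_000   # length of longtext

# Entropy of one uniformly random letter, in bits.
bits_per_letter = math.log2(ALPHABET_SIZE)

# Expected lower bound on the compressed size of the whole text, in bits.
min_bits = bits_per_letter * NUM_LETTERS

print("{:.2f} bits per letter".format(bits_per_letter))
print("about {:,d} bits in total".format(round(min_bits)))
```

The result is roughly 4.7 bits per letter, so about 4.7 million bits overall, compared with the 8 million bits of the uncompressed one-byte-per-letter encoding.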
English text is about 40% vowels. If we chose longtext2 to be 1M random letters from such a distribution and set shorttext2 = compress(longtext2), then shorttext2 is going to be:
a. Shorter than shorttext
b. Longer than shorttext
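One way to reason about this question is to compare the entropy per letter of the two distributions. The sketch below is illustrative and rests on assumptions not stated in the notes: the vowels are taken to be a, e, i, o, u, each equally likely within the 40%, and the 21 remaining letters share the other 60% equally. The helper entropy_bits and the variable names are hypothetical.

```python
import math

VOWELS = "aeiou"                        # assumption: y is treated as a consonant
CONSONANTS = "bcdfghjklmnpqrstvwxyz"    # the remaining 21 letters

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform distribution over all 26 letters (the longtext setting).
uniform = [1 / 26] * 26

# 40% of the mass spread evenly over the vowels, 60% over the consonants (the longtext2 setting).
skewed = [0.4 / len(VOWELS)] * len(VOWELS) + [0.6 / len(CONSONANTS)] * len(CONSONANTS)

h_uniform = entropy_bits(uniform)
h_skewed = entropy_bits(skewed)

print("uniform: {:.3f} bits per letter".format(h_uniform))
print("skewed:  {:.3f} bits per letter".format(h_skewed))
```

Under these assumptions the skewed distribution carries fewer bits per letter than the uniform one, so a good compressor has less information to store for longtext2 than for longtext.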
120480684*8 / 10**9 # Compression of 10^9 characters of English Wikipedia XML dump
0.963845472
The best compression algorithms compress English text to less than one bit per character (e.g., see the Large Text Compression Benchmark).