07/01/2018
Python has many wonderfully useful datatypes baked in. They are easy to use, and the garbage collector easily lets us forget what we have allocated. But of course there is a cost to everything. Here is a little finger exercise with sys.getsizeof, which reports a Python object's memory size in bytes.
First, let's look at the minimal size in memory that each type requires.
import sys
sys.getsizeof('')
49
sys.getsizeof(42)
28
sys.getsizeof(1.3)
24
sys.getsizeof(True)
28
sys.getsizeof(tuple())
48
sys.getsizeof([])
64
sys.getsizeof(set())
224
sys.getsizeof(dict())
240
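Note that sys.getsizeof is shallow: it reports only the object itself, not anything the object references. A quick sketch to illustrate (the byte counts are indicative for a 64-bit CPython):
payload = 'x' * 1000
sys.getsizeof(payload)    # ~1049 bytes: the string stores its characters inline
sys.getsizeof([payload])  # ~72 bytes: just the list header plus one pointer
So the sizes recorded below count each container's own footprint, not the objects stored in it.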
Second, let's examine how memory grows as we add more and more elements.
import time
from matplotlib import pyplot as plt

# one instance of each container, plus a trace of its size after every insert
d = dict(); size_d = []
m = set(); size_m = []
l = list(); size_l = []
s = ''; size_s = []
t = tuple(); size_t = []
times = []
t0 = time.time()
for i in range(1, 100001):
    d[i] = i
    size_d.append(sys.getsizeof(d))
    m.add(i)
    size_m.append(sys.getsizeof(m))
    l.append(i)
    size_l.append(sys.getsizeof(l))
    s += '.'
    size_s.append(sys.getsizeof(s))
    # tuples are immutable, so rebuild one from the dict's keys each step
    t = tuple(d)
    size_t.append(sys.getsizeof(t))
    # collect timing information
    t1 = time.time()
    times.append(t1 - t0)
    t0 = t1
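As an aside, because sys.getsizeof only counts the container itself, the totals above exclude the int objects stored in it. To capture everything that actually gets allocated, the standard library's tracemalloc module could be used instead; a minimal, independent sketch:
import tracemalloc

tracemalloc.start()
d2 = {j: j for j in range(100000)}               # a comparable dict, keys and values included
current, peak = tracemalloc.get_traced_memory()  # (bytes allocated now, peak bytes)
print('current: %.1f MB, peak: %.1f MB' % (current / 1e6, peak / 1e6))
tracemalloc.stop()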
fig = plt.figure(figsize=(10, 10))
plt.loglog(size_d, label='dict')
plt.loglog(size_m, label='set')
plt.loglog(size_l, label='list')
plt.loglog(size_s, label='string')
plt.loglog(size_t, label='tuple')
plt.xlabel('#elements')
plt.ylabel('memory size [bytes]')
plt.legend()
plt.title(i)
plt.show()
fig = plt.figure(figsize=(10, 10))
plt.title(i)
plt.plot(size_d, label='dict')
plt.plot(size_m, label='set')
plt.plot(size_l, label='list')
plt.plot(size_s, label='string')
plt.plot(size_t, label='tuple')
plt.xlabel('#elements')
plt.ylabel('memory size [bytes]')
plt.legend()
plt.show()
N = 100000
size_t[-1] / N, size_s[-1] / N, size_d[-1] / N, size_l[-1] / N, size_m[-1] / N
(8.00048, 1.00049, 52.42976, 8.24464, 41.94528)
The plots above show the memory size for each type, first as a log-log plot and second as a linear plot.
As expected, all data structures grow linearly with the number of stored elements. However, there are some interesting details. The string grows the slowest and most reliably, at 1 byte per element. Next come tuple and list, which also grow quite smoothly at 8 bytes per element (one 8-byte pointer per slot on a 64-bit build). The list, however, over-allocates memory in larger chunks, presumably to amortize the cost of repeated appends. This leads to visible steps in its memory size.
Finally, set() and dict() show the most complex behavior, growing in larger steps and on average at 40-55 bytes per element. The dict() resizes approximately twice as often as the set().
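The list's steps can be made explicit by recording exactly where sys.getsizeof jumps; a small sketch:
grow = []
last = sys.getsizeof(grow)
for j in range(64):
    grow.append(j)
    size = sys.getsizeof(grow)
    if size != last:       # the buffer was re-allocated
        print(len(grow), size)
        last = size
Only a handful of lengths trigger a re-allocation; between them, appends reuse the over-allocated slack.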
fig = plt.figure(figsize=(10, 10))
plt.title('runtime ' + str(i))
plt.ylabel('runtime [s]')
plt.xlabel('#elements')
plt.plot(times)
plt.show()
Finally, we turn to the runtime. Due to the way the timing information was collected, this plot only shows the combined per-step time across all data structures. The main takeaway is that, especially for fewer than 10,000 elements, a step typically takes on the order of one millisecond or less. For 40,000 or more elements, a single step may occasionally spike to a full 0.15 seconds, presumably when one of the containers has to resize and copy its underlying storage.
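To attribute runtime to the individual data structures, each operation would need its own timer. A minimal sketch of that idea, using time.perf_counter for better resolution (the helper name is illustrative):
from time import perf_counter

def time_step(container, op, n=100000):
    # time n single-element operations on one container in isolation
    t0 = perf_counter()
    for j in range(n):
        op(container, j)
    return perf_counter() - t0

print('list :', time_step([], lambda c, j: c.append(j)))
print('set  :', time_step(set(), lambda c, j: c.add(j)))
print('dict :', time_step({}, lambda c, j: c.__setitem__(j, j)))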