%pylab inline
import numpy as np
import time
import flops_benchmark
import pandas as pd
import cPickle as pickle
import seaborn as sns
sns.set_style('whitegrid')
Populating the interactive namespace from numpy and matplotlib
We are going to benchmark the simple function below, which generates two matrices and computes their matrix (dot) product. The matrices are of size MAT_N x MAT_N, and we compute the product loopcount times.
def compute_flops(loopcount, MAT_N):
    A = np.arange(MAT_N**2, dtype=np.float64).reshape(MAT_N, MAT_N)
    B = np.arange(MAT_N**2, dtype=np.float64).reshape(MAT_N, MAT_N)

    t1 = time.time()
    for i in range(loopcount):
        c = np.sum(np.dot(A, B))

    FLOPS = 2 * MAT_N**3 * loopcount
    t2 = time.time()
    return FLOPS / (t2 - t1)
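As a quick sanity check on the FLOP count used above (not part of the benchmark itself): a matrix-matrix multiply of two N x N matrices takes roughly 2*N^3 floating-point operations, since each of the N^2 output elements needs N multiplies and about N adds. For the small run below (MAT_N=1024, loopcount=10), that works out to:

```python
# Back-of-the-envelope FLOP count for one worker in the small run:
# 2 * N^3 flops per matrix multiply, repeated loopcount times.
MAT_N = 1024
loopcount = 10
flops = 2 * MAT_N**3 * loopcount
print(flops / 1e9)  # ~21.5 GFLOP of work per worker
```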
All of the actual benchmark code is in a stand-alone Python file, which you can invoke as follows. It writes its output to small.pickle.
!python flops_benchmark.py --workers=10 --loopcount=10 --matn=1024 --outfile="small.pickle"
('invocation done, dur=', 2.776315927505493) ('callset id: ', '0c7bfb9e-a511-47db-a204-aa312584b6fd') ('total time', 27.661000967025757) 7.76357894843 GFLOPS
We can plot a histogram of the results:
exp_results = pickle.load(open("small.pickle", 'rb'))
results_df = flops_benchmark.results_to_dataframe(exp_results)
sns.distplot(results_df.intra_func_flops/1e9, bins=np.arange(10, 30), kde=False)
<matplotlib.axes._subplots.AxesSubplot at 0x7f522df832d0>
Now we will run a very large number of Lambdas simultaneously. The result depends heavily on the maximum number of concurrent Lambda executions AWS has enabled for your account; you can e-mail them and ask for a limit increase. Note that due to stragglers this can also take a while.
# My account has a lambda limit of 3000 simultaneous lambdas, so I'm using 2800 to give us some headroom
!python flops_benchmark.py --workers=2800 --loopcount=10 --matn=4096 --outfile="big.pickle"
('invocation done, dur=', 27.324092149734497) ('callset id: ', '3f39c719-f8fc-4c5b-b218-9ea6ad4f5ba3') ('total time', 230.5793092250824) 16689.6618354 GFLOPS
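A back-of-the-envelope check (using the numbers printed above) confirms the reported aggregate rate: 2800 workers each performing 10 multiplies of 4096 x 4096 matrices, divided by the total wall-clock time.

```python
# Reproduce the aggregate GFLOPS figure from the big run's printed output.
workers, loopcount, MAT_N = 2800, 10, 4096
total_flops = workers * 2 * MAT_N**3 * loopcount
total_time = 230.5793092250824  # seconds, from the output above
print(total_flops / total_time / 1e9)  # ~16689.66 GFLOPS, i.e. ~16.7 TFLOPS
```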
big_exp_results = pickle.load(open("big.pickle", 'rb'))
big_results_df = flops_benchmark.results_to_dataframe(big_exp_results)
sns.distplot(big_results_df.intra_func_flops/1e9, bins=np.arange(10, 36), kde=False)
<matplotlib.axes._subplots.AxesSubplot at 0x7f522c71af10>
est_total_flops = big_results_df['est_flops']
total_jobs = len(big_results_df)
JOB_GFLOPS = est_total_flops / 1e9 / total_jobs
# grid job running times
time_offset = np.min(big_results_df.host_submit_time)
max_time = np.max(big_results_df.download_output_timestamp) - time_offset

# one-second bins spanning the whole run (linspace needs an integer count)
runtime_bins = np.linspace(0, max_time, int(max_time), endpoint=False)

runtime_flops_hist = np.zeros((len(big_results_df), len(runtime_bins)))
for i in range(len(big_results_df)):
    row = big_results_df.iloc[i]
    s = (row.start_time + row.setup_time) - time_offset
    e = row.end_time - time_offset
    a, b = np.searchsorted(runtime_bins, [s, e])
    if b - a > 0:
        runtime_flops_hist[i, a:b] = row.est_flops / float(b - a)
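The binning spreads each job's FLOPs uniformly across the one-second bins during which it was running. A toy illustration with made-up numbers shows the mechanics:

```python
import numpy as np

# Made-up example: one job ran from t=2.3s to t=5.7s and did 1e9 FLOPs;
# spread those FLOPs evenly over the one-second bins it overlaps.
runtime_bins = np.linspace(0, 10, 10, endpoint=False)  # bin edges at 0,1,...,9 s
hist = np.zeros(len(runtime_bins))
s, e = 2.3, 5.7
a, b = np.searchsorted(runtime_bins, [s, e])  # a=3, b=6: bins 3, 4, 5
if b - a > 0:
    hist[a:b] = 1e9 / float(b - a)  # 1/3 of the FLOPs per active bin
print(hist)
```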
results_by_endtime = big_results_df.sort_values('download_output_timestamp')
results_by_endtime['job_endtime_zeroed'] = results_by_endtime.download_output_timestamp - time_offset
results_by_endtime['flops_done'] = results_by_endtime.est_flops.cumsum()
results_by_endtime['rolling_flops_rate'] = results_by_endtime.flops_done/results_by_endtime.job_endtime_zeroed
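The "effective GFLOPS" curve is cumulative FLOPs completed divided by elapsed time at each job completion, so a late straggler drags the rate down. A small sketch with made-up numbers:

```python
import pandas as pd

# Made-up example: three jobs of 1 TFLOP each; the third finishes late.
df = pd.DataFrame({'job_endtime_zeroed': [10.0, 20.0, 40.0],
                   'est_flops': [1e12, 1e12, 1e12]})
df['flops_done'] = df.est_flops.cumsum()
df['rolling_flops_rate'] = df.flops_done / df.job_endtime_zeroed
print(df.rolling_flops_rate / 1e9)  # 100, 100, 75 GFLOPS: the straggler lowers the rate
```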
fig = pylab.figure(figsize=(4, 3))
ax = fig.add_subplot(1, 1, 1)
ax.plot(runtime_flops_hist.sum(axis=0)/1e9, label='peak GFLOPS')
ax.plot(results_by_endtime.job_endtime_zeroed,
results_by_endtime.rolling_flops_rate/1e9, label='effective GFLOPS')
ax.set_xlabel('time (sec)')
ax.set_ylabel("GFLOPS")
pylab.legend()
ax.grid(False)
sns.despine()
fig.tight_layout()
fig.savefig("flops_benchmark.gflops.png")
fig.savefig("flops_benchmark.gflops.pdf")
The plot shows two things. "Peak GFLOPS" peaks in the middle of the job, when all 2800 Lambdas are running at once. "Effective GFLOPS" climbs quickly as results return, but stragglers mean that the total effective rate for the whole job is only ~16.7 TFLOPS (the 16689 GFLOPS printed above). Still not bad for pure Python!