Let's check the implementation of an AVl Tree (https://bitbucket.org/snippets/lion137/88nLq)
import AVL_tree as tr
As a tool I will use python cProfiler; creating and make an instance of a profiler class:
import cProfile
import pstats
profiler = cProfile.Profile()
Test file now: creating symbole table with milion ints.
%%writefile tree_tests.py
import random
def test():
t1 = tr.BinarySearchTree()
for _ in range(int(1e6)):
t1[str(_)] = _
#for _ in range(int(1e5)):
#t1.get(random.randint(-10000, 10000))
if __name__ == '__main__':
test()
Overwriting tree_tests.py
# %load test file now
import random
def test():
t1 = tr.BinarySearchTree()
for _ in range(int(1e6)):
t1[str(_)] = _
#for _ in range(int(1e5)):
#t1.get(random.randint(-10000, 10000))
if __name__ == '__main__':
test()
profiler = cProfile.Profile() # make an instance
profiler.runcall(test) # profiler's doing the job...
profiler.print_stats() # and printing stats
61854400 function calls (38691689 primitive calls) in 40.765 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 1.360 1.360 40.765 40.765 <ipython-input-20-917dc017f4b6>:4(test) 1000000 0.797 0.000 0.797 0.000 AVL_tree.py:10(__init__) 1000000 0.478 0.000 39.405 0.000 AVL_tree.py:111(__setitem__) 1000000 0.722 0.000 38.927 0.000 AVL_tree.py:114(put) 21140998/999999 18.283 0.000 38.205 0.000 AVL_tree.py:122(_put) 898233 2.148 0.000 7.959 0.000 AVL_tree.py:142(rotateLeft) 352954 0.831 0.000 1.201 0.000 AVL_tree.py:170(rotateRight) 4021711/999999 5.019 0.000 16.360 0.000 AVL_tree.py:197(updateBalance) 6218045 0.849 0.000 0.849 0.000 AVL_tree.py:21(hasLeftChild) 919456 0.755 0.000 9.914 0.000 AVL_tree.py:211(rebalance) 14922953 1.915 0.000 1.915 0.000 AVL_tree.py:24(hasRightChild) 4353400 1.194 0.000 1.194 0.000 AVL_tree.py:27(isLeftChild) 2273086 0.569 0.000 0.569 0.000 AVL_tree.py:30(isRightChild) 1251187 0.214 0.000 0.214 0.000 AVL_tree.py:33(isRoot) 1 0.000 0.000 0.000 0.000 AVL_tree.py:98(__init__) 1251187 0.366 0.000 0.366 0.000 {built-in method builtins.max} 1251187 5.266 0.000 5.266 0.000 {built-in method builtins.min} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
There is lots calls to updatebalance, also, built in min consumes 4.915.
We can, also, make nice visualization via snakeviz (pip install snakeviz)
profiler2 = cProfile.Profile()
profiler2.runcall(test)
stats = pstats.Stats('tree_tests.stats') # stats in file for later
Snakeviz is starting in new tab and it's look like (in real is interactive):
! snakeviz tree_tests.stats
snakeviz web server started on 127.0.0.1:8080; enter Ctrl-C to exit http://127.0.0.1:8080/snakeviz/%2Fhome%2Flion%2Fprojects%2Fpython%2Fnotebooks%2Fblog_and_notes%2Fprofiling%2Ftree_tests.stats ^C Bye!
from IPython.display import HTML, display
display(HTML('/home/lion/projects/python/notebooks/blog_and_notes/profiling/tree_tests.html'))
An error occurred processing your profile. You can try a lower depth, a larger cutoff, or try profiling a smaller portion of your code. If you continue to have problems you can contact us on GitHub.
Now, I'll try to replace calls to min and max with local functions.
import AVL as tr # import local copy, to not mess with the file in the path
# %load tree_tests.py
import random
def test():
t1 = tr.BinarySearchTree()
for _ in range(int(1e6)):
t1[str(_)] = _
#for _ in range(int(1e5)):
#t1.get(random.randint(-10000, 10000))
if __name__ == '__main__':
test()
profiler = cProfile.Profile()
profiler.runcall(test)
profiler.print_stats()
61854400 function calls (38691689 primitive calls) in 38.852 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 1.338 1.338 38.852 38.852 <ipython-input-10-2f1ccca03190>:4(test) 1 0.000 0.000 0.000 0.000 AVL.py:102(__init__) 1000000 0.479 0.000 37.514 0.000 AVL.py:115(__setitem__) 1000000 0.717 0.000 37.035 0.000 AVL.py:118(put) 21140998/999999 19.022 0.000 36.318 0.000 AVL.py:126(_put) 1000000 3.849 0.000 3.849 0.000 AVL.py:14(__init__) 898233 2.082 0.000 2.743 0.000 AVL.py:146(rotateLeft) 352954 0.802 0.000 1.053 0.000 AVL.py:174(rotateRight) 4021711/999999 4.924 0.000 10.783 0.000 AVL.py:201(updateBalance) 919456 0.713 0.000 4.508 0.000 AVL.py:215(rebalance) 6218045 0.825 0.000 0.825 0.000 AVL.py:25(hasLeftChild) 14922953 1.838 0.000 1.838 0.000 AVL.py:28(hasRightChild) 4353400 1.122 0.000 1.122 0.000 AVL.py:31(isLeftChild) 2273086 0.540 0.000 0.540 0.000 AVL.py:34(isRightChild) 1251187 0.189 0.000 0.189 0.000 AVL.py:37(isRoot) 1251187 0.217 0.000 0.217 0.000 AVL.py:5(my_min) 1251187 0.194 0.000 0.194 0.000 AVL.py:9(my_max) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
As is seen, in general is not a big deal, but calls to my_max and my_min take now 0.411 secs, before was: 5.632 - so there is an improvement.
There is also line by line profilling, another method of visualization called pycallgraph and more. You can check here: http://blog.thehumangeo.com/2015/07/28/profiling-in-python/ and also there is great video from pycon here: https://www.youtube.com/watch?v=JDSGVvMwNM8 .