FProfile.jl provides an alternative interface for Julia's sampling profiler (@profile
). If you've never used a sampling profiler before, please read the introduction of this document before proceeding .
You can build a profile by calling @fprofile(code, delay=0.001, n_samples=1000000)
:
using FProfile, Calculus
pd = @fprofile second_derivative(sin, 1.0)
ProfileData(49 backtraces)
@fprofile(N, ...)
is shorthand for @fprofile(for _ in 1:N ... end)
:
pd = @fprofile 1000000 second_derivative(sin, 1.0)
ProfileData(910 backtraces)
Do not forget that Julia compiles code the first time a function is run; if you do not want to measure compilation time, execute your code once before profiling.
using DataFrames
df = flat(pd)
head(df, 15) # show only the first 15 rows (the 15 rows with the highest counts)
count_pct | stackframe | |
---|---|---|
1 | 99.89 | execute_request(::ZMQ.Socket, ::IJulia.Msg) at execute_request.jl:154 |
2 | 99.89 | anonymous at <missing>:? |
3 | 99.89 | macro expansion at FProfile.jl:52 [inlined] |
4 | 99.89 | (::IJulia.##14#17)() at task.jl:335 |
5 | 99.89 | eventloop(::ZMQ.Socket) at eventloop.jl:8 |
6 | 99.89 | macro expansion at profile.jl:23 [inlined] |
7 | 99.89 | include_string(::Module, ::String, ::String) at Compat.jl:464 |
8 | 99.89 | include_string(::String, ::String) at loading.jl:515 |
9 | 99.67 | macro expansion at FProfile.jl:67 [inlined] |
10 | 99.23 | second_derivative(::Function, ::Float64) at derivative.jl:71 |
11 | 53.63 | derivative(::Function, ::Symbol, ::Symbol) at derivative.jl:3 |
12 | 18.35 | finite_difference_hessian(::Function, ::Function, ::Float64, ::Symbol) at finite_difference.jl:224 |
13 | 11.43 | finite_difference(::Calculus.##1#3{Base.#sin,Symbol}, ::Float64, ::Symbol) at finite_difference.jl:55 |
14 | 6.26 | sin at math.jl:419 [inlined] |
15 | 6.15 | finite_difference(::Calculus.##1#3{Base.#sin,Symbol}, ::Float64, ::Symbol) at finite_difference.jl:27 |
(REPL note: if the output of flat
is incomplete, try showall(flat(pd))
or Matrix(flat(pd))
)
The first column shows what fraction of backtraces (in %) go through the method at file:line_number
in the stackframe
column. It's the same quantity as in Base.Profile.print()
, except for recursive calls: if factorial(2)
calls factorial(1)
, that's 2 counts in Base's report, but only 1 count in FProfile.
You can select a subset of the dataframe by using one of the five accessors: get_specialization, get_method, get_file, get_function
and get_module
.
df[get_function.(df[:stackframe]) .== derivative, :] # select the `derivative` rows
count_pct | stackframe | |
---|---|---|
1 | 53.63 | derivative(::Function, ::Symbol, ::Symbol) at derivative.jl:3 |
2 | 0.66 | derivative(::Function, ::Symbol, ::Symbol) at derivative.jl:0 |
It is common to focus optimization efforts on one or more modules at a time (... the ones you're developing). flat(pd, MyModule)
filters out other modules and adds a useful column: self_pct
measures how much MyModule
-specific work is done on that line.
For instance, in the code below, while the do_computation()
call takes a long time (it has a high count_percent
), it merely calls another Main
function, so it has a low self_pct
. sum_of_sin
has self_pct = 100%
because while it calls sum
and sin
, those are defined in another module (Base
), and counted as external to Main
.
flat(pd, (Module1, Module2, ...))
is also accepted.
@noinline do_computation(n) = sum_of_sin(n)
@noinline sum_of_sin(n) = sum(sin, 1:n)
pd2 = @fprofile do_computation(10000000)
flat(pd2, Main)
count_pct | self_pct | stackframe | |
---|---|---|---|
1 | 100.0 | 0.0 | do_computation(::Int64) at In[5]:1 |
2 | 100.0 | 100.0 | sum_of_sin(::Int64) at In[5]:2 |
It pays to make sure that functions with a high self_pct
are well optimized.
Another way to reduce the level of detail is to aggregate by :specialization, :method, :file, :function
, or :module
.
df_by_fn = flat(pd, combineby=:function)
count_pct | function | |
---|---|---|
1 | 99.89 | IJulia.execute_request |
2 | 99.89 | include_string |
3 | 99.89 | IJulia.eventloop |
4 | 99.89 | IJulia.#14 |
5 | 99.89 | FProfile.MissingInfo.missing_info |
6 | 99.45 | Calculus.second_derivative |
7 | 54.29 | Calculus.derivative |
8 | 18.9 | Calculus.finite_difference_hessian |
9 | 14.95 | Calculus.finite_difference |
You can see the context (caller/called functions) around each of these rows by passing it to tree
:
tree(pd, df_by_fn, 9) # show the context of the 9th row of `df_by_method`
136 .../finite_difference.jl:224; finite_difference_hessian(::Funct... 104 .../finite_difference.jl:55; finite_difference(::Calculus.##1#... 65 .../finite_difference.jl:55; finite_difference(::Base.#sin, ::... 57 ./math.jl:419; FProfile.MissingInfo.missing_info 27 .../finite_difference.jl:27; finite_difference(::Base.#sin, ::... 3 .../finite_difference.jl:0; finite_difference(::Base.#sin, ::... 1 .../finite_difference.jl:54; finite_difference(::Base.#sin, ::... 29 .../finite_difference.jl:27; finite_difference(::Calculus.##1#... 3 .../finite_difference.jl:0; finite_difference(::Calculus.##1#...
Other useful dataframe commands:
sort(df, :self_pct, rev=true) # sort by self_pct
showall(df) # show the whole dataframe
See ?flat
for more options.
Pass two ProfileData
objects to flat
to compare them. The results are sorted with the biggest regressions (in absolute terms) at the top and the biggest improvements at the bottom (see ?DataFrames.tail
).
pd2 = @fprofile 1000000 second_derivative(sin, 1.0)
flat(pd, pd2, combineby=:function)
count_pct_1 | count_pct_2 | count_pct_diff | function | |
---|---|---|---|---|
1 | 14.95 | 16.21 | 1.26 | Calculus.finite_difference |
2 | 18.9 | 19.96 | 1.06 | Calculus.finite_difference_hessian |
3 | 99.45 | 99.56 | 0.11 | Calculus.second_derivative |
4 | 99.89 | 99.78 | -0.11 | IJulia.execute_request |
5 | 99.89 | 99.78 | -0.11 | include_string |
6 | 99.89 | 99.78 | -0.11 | IJulia.eventloop |
7 | 99.89 | 99.78 | -0.11 | IJulia.#14 |
8 | 99.89 | 99.78 | -0.11 | FProfile.MissingInfo.missing_info |
9 | 54.29 | 52.48 | -1.81 | Calculus.derivative |
Of course, this is most useful when comparing different algorithms or commits (use reload
or Revise.jl to update your code). The differences in the above table are just noise.
FProfile's tree view looks the same as Base.Profile.print(format=:tree)
. The numbers represent raw counts. (If some branches seem out of place, see this issue)
tr = tree(pd)
909 ./task.jl:335; (::IJulia.##14#17)() 909 ...Julia/src/eventloop.jl:8; eventloop(::ZMQ.Socket) 909 ...rc/execute_request.jl:154; execute_request(::ZMQ.Socket, ::... 909 ...Compat/src/Compat.jl:464; include_string(::Module, ::Stri... 909 ./loading.jl:515; include_string(::String, ::String) 909 ./<missing>:?; FProfile.MissingInfo.missing_info 909 ...ile/src/FProfile.jl:52; FProfile.MissingInfo.missing_info 909 ./profile.jl:23; FProfile.MissingInfo.missing_info 907 ...ile/src/FProfile.jl:67; FProfile.MissingInfo.missing_info 903 .../src/derivative.jl:71; second_derivative(::Function... 488 ...src/derivative.jl:3; derivative(::Function, ::Sy... 167 ...ite_difference.jl:224; finite_difference_hessian(:... 104 ...ite_difference.jl:55; finite_difference(::Calculu... 65 ...ite_difference.jl:55; finite_difference(::Base.#s... 57 ./math.jl:419; FProfile.MissingInfo.missi... 1 ./math.jl:300; FProfile.MissingInfo.missi... 27 ...ite_difference.jl:27; finite_difference(::Base.#s... 3 ...ite_difference.jl:0; finite_difference(::Base.#s... 1 ...ite_difference.jl:54; finite_difference(::Base.#s... 29 ...ite_difference.jl:27; finite_difference(::Calculu... 3 ...ite_difference.jl:0; finite_difference(::Calculu... 5 ...ite_difference.jl:0; finite_difference_hessian(:... 5 ...src/derivative.jl:0; derivative(::Function, ::Sy... 2 .../src/derivative.jl:0; second_derivative(::Function... 1 ...lus/src/derivative.jl:0; derivative(::Function, ::Symbol, ...
Like flat
reports, trees can be aggregated by :specialization, :method, :file, :function
, or :module
:
tree(pd, combineby=:module)
909 IJulia 909 Compat 909 Base 909 FProfile.MissingInfo 905 Calculus 57 FProfile.MissingInfo 1 Calculus
If you're only interested in a particular module/file/method/function, you can pass it to tree
, along with an optional neighborhood range.
tr_deriv = tree(pd, second_derivative, -1:1) # -1:1 = show one level of callers and one level of called functions
905 ...rofile/src/FProfile.jl:67; FProfile.MissingInfo.missing_info 903 ...lus/src/derivative.jl:71; second_derivative(::Function, ::F... 488 ...lus/src/derivative.jl:3; derivative(::Function, ::Symbol,... 167 .../finite_difference.jl:224; finite_difference_hessian(::Func... 5 .../finite_difference.jl:0; finite_difference_hessian(::Func... 5 ...lus/src/derivative.jl:0; derivative(::Function, ::Symbol,... 2 ...lus/src/derivative.jl:0; second_derivative(::Function, ::F...
Trees are an indexable, prunable (use prune(tree, depth)
) and filterable datastructure. Use the accessors (see above) and is_inline/is_C_call
in your filter
predicate.
ProfileData
objects can be passed to ProfileView.view
. This is purely a convenience; it's equivalent to normal ProfileView usage. See ProfileView.jl for details.
using ProfileView
pd = @fprofile ...
ProfileView.view(pd)
(if you want to build your own analysis)
The raw profiler data is available either through Base.Profile.retrieve()
, or through pd.data
and pd.lidict
. However, you might find FProfile.backtraces(::ProfileData)
more immediately useful.
count, trace = backtraces(pd)[1] # get the first unique backtrace
@show count # the number of times that trace occurs in the raw data
trace
count = 2
8-element Array{StackFrame,1}: (::IJulia.##14#17)() at task.jl:335 eventloop(::ZMQ.Socket) at eventloop.jl:8 execute_request(::ZMQ.Socket, ::IJulia.Msg) at execute_request.jl:154 include_string(::Module, ::String, ::String) at Compat.jl:464 include_string(::String, ::String) at loading.jl:515 anonymous at <missing>:? macro expansion at FProfile.jl:52 [inlined] macro expansion at profile.jl:23 [inlined]
Use the get_method, get_file, ...
functions on StackFrame
objects (see above). tree(pd::ProfileData)
is defined as tree(backtraces(pd))
, and similarly for flat
, so you can modify the backtraces and get a tree/flat view of the results.