TraceCalls.jl is a functional tracing package for debugging and exploring Julia code. It records and displays a tree of function calls. For example, here is how Calculus.jl computes the second derivative of sin(x) + sqrt(x):
using Calculus, TraceCalls
@traceable f(x) = sin(x) + sqrt(x)
trace_derivative = @trace Calculus second_derivative(f, 1.0)
#1() => -1.0914708926553454
Calculus.second_derivative(f, 1.0) => -1.0914708926553454
Calculus.derivative(f, :central) => Calculus.#20
Calculus.derivative(f, :scalar, :central) => Calculus.#20
Calculus.finite_difference_hessian(f, Calculus.#20, 1.0, :central) => -1.0914708926553454
Calculus.finite_difference(Calculus.#20, 1.0, :central) => -1.0914708926553454
Calculus.finite_difference(f, 1.0000060554544523, :central) => 1.0402956965076036
f(1.0000121109455733) => 1.8414835837724568
f(0.9999999999633314) => 1.8414709847697501
Calculus.finite_difference(f, 0.9999939445455476, :central) => 1.0403089152121567
f(1.0) => 1.8414709848078965
f(0.9999878890910952) => 1.8414583857213915
The output of @trace is a Trace object --- an explorable tree-like data structure:
trace_derivative[1][1] # get the first call of the first call.
Calculus.derivative(f, :central) => Calculus.#20
Calculus.derivative(f, :scalar, :central) => Calculus.#20
trace_derivative[1][1].args[2] # get its second argument
:central
It can work as a more informative stack-trace (which contains values rather than just types - see @stacktrace):
@trace Calculus second_derivative(f, 0.0)
#3() => DomainError()
Calculus.second_derivative(f, 0.0) => DomainError()
Calculus.derivative(f, :central) => Calculus.#58
Calculus.derivative(f, :scalar, :central) => Calculus.#58
Calculus.finite_difference_hessian(f, Calculus.#58, 0.0, :central) => DomainError()
Calculus.finite_difference(Calculus.#58, 0.0, :central) => DomainError()
Calculus.finite_difference(f, 6.0554544523933395e-6, :central) => 288.35028436717505
f(1.2110908904786679e-5) => 0.0034921840266401915
f(0.0) => 0.0
Calculus.finite_difference(f, -6.0554544523933395e-6, :central) => DomainError()
f(0.0) => 0.0
f(-1.2110908904786679e-5) => DomainError()
greenred(map(:@allocated, trace_derivative)) # compute how many bytes were allocated in each function call
#1() => 256
Calculus.second_derivative(f, 1.0) => 256
Calculus.derivative(f, :central) => 208
Calculus.derivative(f, :scalar, :central) => 208
Calculus.finite_difference_hessian(f, Calculus.#20, 1.0, :central) => 16
Calculus.finite_difference(Calculus.#20, 1.0, :central) => 16
Calculus.finite_difference(f, 1.0000060554544523, :central) => 0
f(1.0000121109455733) => 0
f(0.9999999999633314) => 0
Calculus.finite_difference(f, 0.9999939445455476, :central) => 0
f(1.0) => 0
f(0.9999878890910952) => 0
@traceable function foo()
...
end
@trace (some_function, SomeModule, "some_interactively_included_file.jl") bar(1)
This call to @trace will execute bar(1) while tracing all methods of some_function, all methods defined in SomeModule / "some_interactively_included_file.jl", and all methods defined with @traceable (in this case, foo()).
Tracing Julia code involves a fair amount of code analysis; TraceCalls.jl will error, warn, or silently skip (when tracing a module) the following:
- functions that use eval
- functor definitions ((func::Functor)(x) = ...)

The @traceable macro (to be used primarily for interactively-defined functions) remembers the function definition without modifying it. It has no impact on the performance of your code.
It might be tempting to run @trace MyModule some_long_computation(), but a) there is a small cost to every traced call, and b) a trace with a million calls is too large and unwieldy to be of much use anyway. If you are only interested in the details of one particular my_fun call that happens during some_long_computation(), then it's usually better to do this:
trace_my_fun = @trace my_fun some_long_computation() # fast, because we're only tracing one function
trace = @trace MyModule trace_my_fun[3]() # select the third `my_fun()` call, and trace it fully
This strategy also works with @stacktrace.
@trace some_fn foo(10) performs these operations:
1. remember the current definition of some_fn
2. replace some_fn with a tracing version (using eval)
3. execute foo(10)
4. restore the original definition of some_fn (using eval)

The downside of this scheme is that @trace can trigger significant JIT compiling time. Tracing large modules can be slow the first time around, but caching is used to avoid repeated computations.
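The method-swapping idea behind @trace can be sketched in plain Julia. This is a toy illustration only, not TraceCalls' internals; g, h and call_log are hypothetical names made up for the example.

```julia
# Toy sketch: remember a definition, swap in a tracing version with eval,
# run the computation, then restore the original definition.
const call_log = Any[]

g(x) = x + 1                                     # the function we want to trace
h(x) = g(x) * 2                                  # the computation to run

orig_def = :(g(x) = x + 1)                       # 1. remember the original definition
@eval g(x) = (push!(call_log, (:g, x)); x + 1)   # 2. swap in a tracing version
result = Base.invokelatest(h, 3)                 # 3. execute the computation
@eval $orig_def                                  # 4. restore the original definition
```

Base.invokelatest is used so the call sees the freshly eval'd method despite Julia's world-age rules, which is also why eval-based tracing incurs recompilation cost.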
Consider computing a random walk on a small graph, using LightGraphs.jl:
using LightGraphs, TraceCalls
graph = Graph(3) # build an undirected graph with three vertices
add_edge!(graph, 1, 2); add_edge!(graph, 2, 3)
trace_walk = @trace LightGraphs randomwalk(graph, 2, 5)
#1() => [2, 1, 2, 3, 2]
LightGraphs.randomwalk({3, 2} undirected simple Int64 graph, 2, 5) => [2, 1, 2, 3, 2]
eltype({3, 2} undirected simple Int64 graph) => Int64
LightGraphs.vertices({3, 2} undirected simple Int64 graph) => 1:3
LightGraphs.nv({3, 2} undirected simple Int64 graph) => 3
eltype({3, 2} undirected simple Int64 graph) => Int64
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph) => Array{Int64,1}[[2], [1, 3], [2]]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 1) => [2]
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 1) => [2]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 3) => [2]
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 3) => [2]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]
The trace can be indexed:
trace_walk[1][3] # can also be written trace_walk[1,3]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]
trace_walk[1][3][:v] # get the second argument by name. To get it by position, use trace_walk[1][3].args[2]
2
Or called:
trace_walk[1,3]() # call `LightGraphs.out_neighbors(...)`
2-element Array{Int64,1}:
 1
 3
Or pruned (useful for exploring large traces):
pruned_trace = prune(trace_walk,
2, # maximum depth
4) # maximum length of each trace (eg. if foo() calls bar() 100 times, keep the first 4 times)
#1() => [2, 1, 2, 3, 2]
LightGraphs.randomwalk({3, 2} undirected simple Int64 graph, 2, 5) => [2, 1, 2, 3, 2]
eltype({3, 2} undirected simple Int64 graph) => Int64
LightGraphs.vertices({3, 2} undirected simple Int64 graph) => 1:3
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]
LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 1) => [2]
A subcall of a pruned/filtered trace can be reexpanded by retracing it:
@trace LightGraphs pruned_trace[1, 2]() # don't forget the (); it triggers the computation for that call
#7() => 1:3
LightGraphs.vertices({3, 2} undirected simple Int64 graph) => 1:3
LightGraphs.nv({3, 2} undirected simple Int64 graph) => 3
eltype({3, 2} undirected simple Int64 graph) => Int64
LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph) => Array{Int64,1}[[2], [1, 3], [2]]
TraceCalls.jl calls TraceCalls.show_val(io::IO, mime, x) to display each argument and return value. It defaults to show(io, x), but can be customized, either to highlight certain values, or to shorten objects that are not important for the task at hand. Implement TraceCalls.show_val(::IO, ::MIME"text/html", value) for Atom/IJulia, and TraceCalls.show_val(::IO, ::MIME"text/plain", value) for the REPL and other non-HTML environments. See the Julia manual on custom pretty-printing for more information, and print_with_color / Crayons.jl for colorful REPL display.
TraceCalls.show_val(io::IO, ::MIME"text/html", v::Vector{Int}) =
write(io, string("[", join([x==2 ? "<font color=red>2</font>" : x for x in v], ","), "]"))
TraceCalls.show_val(io::IO, ::MIME"text/html", ::Graph) = # could also be done with `@show_val_only_type Graph`
write(io, "AnyOldGraph")
trace_walk
#17() => [2,1,2,1,2]
LightGraphs.randomwalk(AnyOldGraph, 2, 5) => [2,1,2,1,2]
eltype(AnyOldGraph) => Int64
LightGraphs.vertices(AnyOldGraph) => 1:3
LightGraphs.nv(AnyOldGraph) => 3
eltype(AnyOldGraph) => Int64
LightGraphs.SimpleGraphs.fadj(AnyOldGraph) => Array{Int64,1}[[2], [1, 3], [2]]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]
LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 1) => [2]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]
LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 1) => [2]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]
See also ?@show_val_only_type, ?TraceCalls.show_call, ?TraceCalls.show_return_val and ?highlight.
Each function call in a trace is represented by an instance of the Trace structure:
struct Trace
func::Function # the function called
args::Tuple # the positional arguments
kwargs::Tuple # the keyword arguments
called::Vector{Trace} # the functions called within the execution of this function call
value # This is the return value of the func(args...; kwargs...) call, but it's also where
# the result of `map(f, ::Trace)` will be stored.
end
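To see how such a call tree supports recursive operations like filtering, here is a toy re-implementation. Node and keep are hypothetical names invented for this sketch; they are not TraceCalls' API.

```julia
# Toy call tree mirroring the Trace structure above (illustrative only).
struct Node
    func::Symbol
    args::Tuple
    called::Vector{Node}
    value
end

# A filter-like operation: drop nodes failing the predicate, splicing their
# surviving children into the parent so the tree stays connected.
function keep(f, node::Node)
    kept = Node[]
    for child in node.called
        filtered = keep(f, child)
        if f(child)
            push!(kept, filtered)
        else
            append!(kept, filtered.called)  # splice grandchildren upwards
        end
    end
    Node(node.func, node.args, kept, node.value)
end

leaf = Node(:fadj, (2,), Node[], [1, 3])
mid  = Node(:out_neighbors, (2,), [leaf], [1, 3])
root = Node(:randomwalk, (2, 5), [mid], [2, 1, 2, 3, 2])
pruned = keep(n -> n.func != :fadj, root)
```

Splicing (rather than discarding whole subtrees) matches the behaviour described for filter below; cutting the subtree is the filter_cutting variant.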
The following filtering functions are provided:
- filter(f, trace) keeps all function calls for which f(::Trace) is true. It's useful to cut out uninteresting intermediate functions.
- filter_cutting(f, trace) is like filter, but will remove all descendants of the traces for which f(trace) is false.
- filter_lineage(f, trace) keeps all function calls for which f(::Trace) is true of at least one of its descendants or ancestors. Use it to focus on some interesting part of the trace.

# Get rid of the `fadj` calls
filter(trace->trace.func != LightGraphs.SimpleGraphs.fadj, trace_walk)
#17() => [2,1,2,1,2]
LightGraphs.randomwalk(AnyOldGraph, 2, 5) => [2,1,2,1,2]
eltype(AnyOldGraph) => Int64
LightGraphs.vertices(AnyOldGraph) => 1:3
LightGraphs.nv(AnyOldGraph) => 3
eltype(AnyOldGraph) => Int64
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
# Focus on the callers and callees of `LightGraphs.out_neighbors(AnyOldGraph, 2)`
filter_lineage(trace -> trace.func==out_neighbors && trace[:v]==2,
trace_walk; highlight=true)
#17() => [2,1,2,1,2]
LightGraphs.randomwalk(AnyOldGraph, 2, 5) => [2,1,2,1,2]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]
LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]
LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]
map(f, trace) applies f (whether a function or a quoted macro such as :@which) to each Trace, and stores the result in the Trace's value field.
# Take the second argument of every call to LightGraphs.out_neighbors
map(trace->trace.func == LightGraphs.out_neighbors ? trace.args[2] : nothing,
filter(trace->trace.func == LightGraphs.out_neighbors, trace_walk))
#21() => nothing
LightGraphs.out_neighbors(AnyOldGraph, 2) => 2
LightGraphs.out_neighbors(AnyOldGraph, 3) => 3
LightGraphs.out_neighbors(AnyOldGraph, 2) => 2
LightGraphs.out_neighbors(AnyOldGraph, 1) => 1
LightGraphs.out_neighbors(AnyOldGraph, 2) => 2
objects_in(trace) returns all passed arguments and the return value as a vector. For example, filter(tr->any(x==3.1 for x in objects_in(tr)), trace) keeps all calls that contain the value 3.1. collect(trace) returns a Vector of all Trace objects in trace.
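The flattening that collect performs can be sketched on a toy tree (CallNode and flatten are hypothetical names for this example, not TraceCalls' implementation):

```julia
# Toy sketch of flattening a call tree into a Vector, as `collect(trace)` does.
struct CallNode
    func::Symbol
    called::Vector{CallNode}
end

# depth-first: the node itself, then everything it called
flatten(n::CallNode) =
    pushfirst!(reduce(vcat, map(flatten, n.called); init=CallNode[]), n)

tree = CallNode(:a, [CallNode(:b, CallNode[]),
                     CallNode(:c, [CallNode(:d, CallNode[])])])
funcs = map(n -> n.func, flatten(tree))
```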
Traces can be saved and reloaded with JLD2.jl:
using TraceCalls, JLD2, FileIO
save("test.jld2", "tracing_foo", some_trace)
reloaded_trace = load("test.jld2", "tracing_foo")
Notes:
- If some functions or types cannot be found when the trace is reloaded (they show up as ReconstructedTypes instances), the corresponding calls will usually not be callable.
- Make sure to import/using all the modules defining the types and functions that the saved trace refers to.

By default, @trace stores the function call's arguments without copying them. This can yield surprising results when tracing functions that modify their arguments. Consider generating a vector of n 5s using push!:
@traceable push5!(vec::Vector) = push!(vec, 5)
@traceable function many_5s(n)
    vec = Int[]
    for i in 1:n
        push5!(vec)
    end
    return vec
end
@trace many_5s(3)
#31() => [5,5,5]
many_5s(3) => [5,5,5]
push5!([5,5,5]) => [5,5,5]
push5!([5,5,5]) => [5,5,5]
push5!([5,5,5]) => [5,5,5]
When push5! was first called, vec was empty, but this trace makes it look like it already had three 5s in it. This is because all the vectors in that trace are the same object. You can fix this (if you care) by telling TraceCalls to make a copy of every vector argument and return value:
TraceCalls.store(x::Vector) = copy(x) # the default is `store(x) = x`
@trace many_5s(3)
#33() => [5,5,5]
many_5s(3) => [5,5,5]
push5!([]) => [5]
push5!([5]) => [5,5]
push5!([5,5]) => [5,5,5]
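The aliasing behind the first, misleading trace can be reproduced in plain Julia, with no TraceCalls involved (log_by_ref and log_by_copy are names invented for this example):

```julia
# Storing a reference logs the same mutable object each time; taking a copy
# snapshots the current contents, like the `store(x::Vector) = copy(x)` override.
log_by_ref  = Any[]
log_by_copy = Any[]
v = Int[]
for i in 1:3
    push!(log_by_ref, v)         # same object every time
    push!(log_by_copy, copy(v))  # snapshot of the current contents
    push!(v, 5)
end
```

After the loop, every entry of log_by_ref is the same three-element vector, while log_by_copy records the empty, one-element and two-element states.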
Alternatively, it's sometimes simpler to filter out all mutating functions:
tr = @trace many_5s(3)
filter(!is_mutating, tr) # filter out every function that ends with a ! (see https://docs.julialang.org/en/stable/manual/style-guide/#Append-!-to-names-of-functions-that-modify-their-arguments-1)
#35() => [5,5,5]
many_5s(3) => [5,5,5]
The most drastic solution is to have TraceCalls store the text/HTML representation of each value:
TraceCalls.store(x) = REPR(x)
This essentially turns TraceCalls.jl into a traditional, non-inspectable tracing package, but it guarantees that each value is shown as it was when the call was made.
The starting point for debugging exceptions is @stacktrace. It works like @trace, but uses filter to keep only the part of the trace involved in the exception.
using Optim, TraceCalls
@traceable logplus10(x) = log(x[1]+10)
TraceCalls.store(v::Vector) = copy(v)
# Minimize the function x -> log(x[1]+10) starting at x = 0
strace = @stacktrace (Optim, Calculus) optimize(logplus10, [0.0], BFGS())
#37() => DomainError()
Optim.optimize(logplus10, [0.0], Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.Options{Void}(1.0e-32, 1.0e-32, 1.0e-8, false, 1000, false, false, false, false, 1, nothing, NaN)) => DomainError()
Optim.optimize(Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!), [0.0], Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.Options{Void}(1.0e-32, 1.0e-32, 1.0e-8, false, 1000, false, false, false, false, 1, nothing, NaN)) => DomainError()
Optim.update_state!(Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!), Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true)) => DomainError()
Optim.perform_linesearch!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => DomainError()
Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()
logplus10([-12.4999]) => DomainError()
Then we can zoom in on the most interesting part of the stack-trace:
strace[1,1,1,1] # could also use `strace[bottom-2]`, or `strace[top+4]` --- bottom/top are special values like `end`
Optim.perform_linesearch!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => DomainError()
Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()
logplus10([-12.4999]) => DomainError()
Calling a Trace object performs its computation. By tracing strace[1,1,1,1](), we get the full trace for this part of the computation:
linesearch_trace = @trace (Optim, Calculus) strace[1,1,1,1]()
#39() => DomainError()
Optim.perform_linesearch!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => DomainError()
Optim.checked_dphi0!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true)) => -0.009999999996070853
Optim.alphaguess!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), -0.009999999996070853, Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => 1.0
Calculus.finite_difference!(logplus10, [-0.1], [0.133333], :central) => nothing
logplus10([-0.0999939]) => 2.2925353688044074
logplus10([-0.100006]) => 2.2925341454802757
logplus10([-0.1]) => 2.2925347571425285
Calculus.finite_difference!(logplus10, [-0.5], [0.10101], :central) => nothing
logplus10([-0.499994]) => 2.25129243603289
logplus10([-0.500006]) => 2.2512911612003736
logplus10([-0.5]) => 2.251291798616835
Calculus.finite_difference!(logplus10, [-2.5], [0.105263], :central) => nothing
logplus10([-2.49998]) => 2.01490503909053
logplus10([-2.50002]) => 2.0149010021208964
logplus10([-2.5]) => 2.0149030206077505
Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()
logplus10([-12.4999]) => DomainError()
Focusing on just finite_difference! shows what happened: the line search kept trying lower and lower values of x until it reached log(-12.5+10) == log(-2.5), which raises a DomainError:
filter(tr->tr.func==Calculus.finite_difference!, linesearch_trace)
#39() => DomainError()
Calculus.finite_difference!(logplus10, [-0.1], [0.133333], :central) => nothing
Calculus.finite_difference!(logplus10, [-0.5], [0.10101], :central) => nothing
Calculus.finite_difference!(logplus10, [-2.5], [0.105263], :central) => nothing
Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()
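The failing step can be checked directly in plain Julia: Base's log throws a DomainError for negative real arguments.

```julia
# Reproduce the failure outside of Optim/Calculus: log of a negative number.
x = -12.5
err = try
    log(x + 10)   # log(-2.5)
catch e
    e             # capture the exception instead of crashing
end
```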
We could fix the error (for instance, by changing logplus10's definition) and call linesearch_trace() to check that this part of the computation now returns a sensible result.
filter is useful even when there is no exception to debug. By selecting only certain parts of the trace, it's the TraceCalls equivalent of setting a breakpoint.
TraceCalls.jl implements which(::Trace) and edit(::Trace); they point to the source location of that function call.
When a test used to pass, but now fails after a code change, compare_past_trace can be used to highlight the differences.
?compare_past_trace
search: compare_past_trace
compare_past_trace(old_trace::Trace; filter_out_equal=true)
reruns every function call in old_trace, and shows in red where the new result differs from the old. If filter_out_equal==true, only the non-equal results are shown.
To use it:
1. Get a trace of the code while it still works.
2. git checkout the bad code (or make the changes manually - I recommend git stash for switching back and forth). TraceCalls imports Revise.jl, so the code changes will take effect automatically.
3. Call compare_past_trace on the trace from step 1.

You may also call compare_past_trace on an old trace loaded from disk.
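The core idea, rerunning recorded calls against the current definitions and flagging changed results, can be sketched in plain Julia (old_calls and double are hypothetical names; this is not compare_past_trace's implementation):

```julia
# Each recorded call is (name, argument, old value). `double` is the "new",
# buggy definition; we rerun each call and keep the ones whose result changed.
old_calls = [(:double, 2, 4), (:double, 5, 10)]
double(x) = 2x + 1

differences = [(arg, old, double(arg)) for (name, arg, old) in old_calls
               if double(arg) != old]
```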
To display intermediate results in your traces, use trace_log instead of println/@show:
using TraceCalls
@traceable function hypothenuse_length(a, b)
    a2 = a * a
    b2 = b * b
    trace_log(a2=a2, b2=FontColor(:blue, b2)) # FontColor is also useful in TraceCalls.show_val or map
    return sqrt(a2+b2)
end
@trace hypothenuse_length(3, 4)
#18() => 5.0
hypothenuse_length(3, 4) => 5.0
TraceCalls.trace_log(; a2=9, b2=16) => nothing
measure(:@measuring_macro, trace) applies the given quoted macro (put a : in front of the @) to every function call in trace. Useful macros/functions are @allocated and @elapsed, as well as BenchmarkTools' @belapsed, median∘benchmark and minimum∘benchmark. Because measuring involves rerunning every function call in the trace, it can be slow. A good rule of thumb is to use the more accurate @belapsed when the code to profile takes less than half a second and the trace is relatively short, and @elapsed when it takes less than a minute. When the trace is very large, try the explore_worst=true option, which will only profile the worst callee of each call (so the runtime is $O(wd)$ instead of $O(w^d)$). You can also prune or filter the trace first.
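To get a feel for the scale difference explore_worst addresses, compare w^d with w*d for a modest call tree (w and d here stand for the trace's branching width and depth, as in the bound above):

```julia
# Rerunning every call in a tree of width w and depth d touches on the order
# of w^d calls; following only the worst callee per level touches about w*d.
w, d = 5, 6
full_cost  = w^d    # order of magnitude for a full measure()
worst_cost = w * d  # with explore_worst=true
```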
The output of measure is a normal, explorable and callable Trace object. Remember that, due to timing fluctuations, even if f(x) calls g(x), it does not imply that @elapsed(f(x)) >= @elapsed(g(x)).
## Profile the PyCall code for accessing Python's `math.pi`
using PyCall, TraceCalls
math = pyimport(:math)
trace_pycall = @trace PyCall math[:pi];
using BenchmarkTools: @belapsed
m_trace = measure(:@belapsed, trace_pycall; normalize=true, threshold=0.005) # only show/explore calls that take >0.5% of total runtime
#8() => 1.0
getindex(PyObject, :pi) => 0.9699
getindex(PyObject, "pi") => 0.114
convert(PyCall.PyAny, PyObject 3.141592653589793) => 0.7366
PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => 0.4832
PyCall.pyint_query(PyObject 3.141592653589793) => 0.1354
PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x000000011ef9cc98) => 0.0663
PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x000000011ef9eaa8) => 0.06704
convert(Float64, PyObject 3.141592653589793) => 0.009888
PyCall.asscalar(PyObject 3.141592653589793) => 0.005534
normalize(m_trace[1,2,1]) can be used to make the numbers in a subtrace more comparable.
code_warntype(trace_pycall[1,2]) returns the type-annotated code for that function call. Similarly for code_llvm, code_lowered, code_native and code_typed. is_inferred(::Trace) uses the same code as Base.Test.@inferred, but returns true/false, which is useful with map:
redgreen(map(is_inferred, trace_pycall)) # ignore the top-most call - it will always be `false`
#1() => false
getindex(PyObject, :pi) => false
getindex(PyObject, "pi") => true
Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject) => true
convert(PyCall.PyAny, PyObject 3.141592653589793) => false
PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => false
PyCall.pyint_query(PyObject 3.141592653589793) => false
PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124956c98) => true
Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true
PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124958aa8) => true
Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true
PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => true
PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => true
PyCall.pyfloat_query(PyObject 3.141592653589793) => false
PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124954998) => true
Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true
convert(Float64, PyObject 3.141592653589793) => true
PyCall.asscalar(PyObject 3.141592653589793) => true
PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => true
Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true
PyCall.pyerr_check("ccall(@pysym(:PyFloat_AsDouble), Cdouble, (PyPtr,), asscalar(po))", 3.141592653589793) => true
TraceCalls.jl also provides a flat, more efficient profiling tool: trace_benchmark. It groups the calls by their signature (for example, sin(1.3) and sin(2.3) will be in the same group, but sin(1) will be in a different one) and only profiles one of the calls in each group.
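The grouping rule can be sketched in plain Julia (signature is a hypothetical helper invented for this example, not trace_benchmark's code): two calls share a group when the function and the argument types match.

```julia
# Group calls by (function, argument types); only one representative per
# group would need profiling.
calls = [(sin, (1.3,)), (sin, (2.3,)), (sin, (1,)), (cos, (0.5,))]
signature(func, args) = (func, map(typeof, args))

groups = Dict{Any,Vector{Any}}()
for (func, args) in calls
    push!(get!(groups, signature(func, args), Any[]), (func, args))
end
```

Here sin(1.3) and sin(2.3) end up in one group, while sin(1) and cos(0.5) each get their own, giving three groups in total.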
using BenchmarkTools # necessary for trace_benchmark
bench = trace_benchmark(trace_pycall)
2 calls like convert(PyCall.PyAny, PyObject 3.141592653589793) => TrialEstimate(900.426 ns, 16 bytes)
1 call like #1() => TrialEstimate(1.355 μs, 224 bytes)
1 call like getindex(PyObject, :pi) => TrialEstimate(1.350 μs, 224 bytes)
1 call like PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => TrialEstimate(605.153 ns, 0 bytes)
3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124956c98) => TrialEstimate(86.650 ns, 0 bytes)
1 call like getindex(PyObject, "pi") => TrialEstimate(194.738 ns, 16 bytes)
1 call like PyCall.pyint_query(PyObject 3.141592653589793) => TrialEstimate(177.301 ns, 0 bytes)
3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => TrialEstimate(3.368 ns, 0 bytes)
5 calls like Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject) => TrialEstimate(1.843 ns, 0 bytes)
1 call like PyCall.asscalar(PyObject 3.141592653589793) => TrialEstimate(6.723 ns, 0 bytes)
1 call like PyCall.pyerr_check("ccall(@pysym(:PyFloat_AsDouble), Cdouble, (PyPtr,), asscalar(po))", 3.141592653589793) => TrialEstimate(5.806 ns, 0 bytes)
1 call like PyCall.pyfloat_query(PyObject 3.141592653589793) => TrialEstimate(5.797 ns, 0 bytes)
After making some code changes, you can rerun the same benchmark with run (technical note: it will use the same tuning parameters). Then call ratio or judge to get a report on regressions and improvements.
bench_new = run(bench);
judge(bench_new, bench; sort=true)
1 call like PyCall.pyerr_check("ccall(@pysym(:PyFloat_AsDouble), Cdouble, (PyPtr,), asscalar(po))", 3.141592653589793) => TrialJudgement(+4.89% => invariant)
2 calls like convert(PyCall.PyAny, PyObject 3.141592653589793) => TrialJudgement(+0.97% => invariant)
1 call like PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => TrialJudgement(+0.90% => invariant)
1 call like PyCall.pyint_query(PyObject 3.141592653589793) => TrialJudgement(+0.34% => invariant)
5 calls like Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject) => TrialJudgement(+0.00% => invariant)
1 call like PyCall.pyfloat_query(PyObject 3.141592653589793) => TrialJudgement(+0.00% => invariant)
3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => TrialJudgement(-0.03% => invariant)
1 call like PyCall.asscalar(PyObject 3.141592653589793) => TrialJudgement(-0.18% => invariant)
1 call like getindex(PyObject, "pi") => TrialJudgement(-0.49% => invariant)
1 call like #1() => TrialJudgement(-1.75% => invariant)
3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124956c98) => TrialJudgement(-0.97% => invariant)
1 call like getindex(PyObject, :pi) => TrialJudgement(-5.19% => improvement)
It's possible to save the benchmark to disk, but please read the notes above about saving and reloading traces carefully. The reloaded benchmark is likely to have one or more "broken" closures, and those will show up in the run/judge outputs as red exceptions (which can usually be ignored).
Parametric types sometimes trigger very long compilation times. You can use @compilation_times to_trace expr to estimate each method's total compilation time (in seconds). This is very, very approximate and experimental. The macro has the same syntax and limitations as @trace; it will estimate the compilation time of every method specialization that appears in @trace to_trace expr. It is based on precompile and is subject to precompile's limitations (notably, keyword-argument functions are essentially ignored).
@compilation_times PyCall math[:pi]
DataStructures.OrderedDict{Any,Float64} with 13 entries:
  convert(::Type{T}, po::PyCall.PyObject) where T<:Real in PyCall… => 0.0822036
  getindex(o::PyCall.PyObject, s::Symbol) in PyCall at /Users/ced… => 0.00703522
  pyerr_check(msg::AbstractString, val) in PyCall at /Users/cedri… => 9.4938e-5
  pyisinstance(o::PyCall.PyObject, t::Union{Ptr{PyCall.PyObject_s… => 9.1195e-5
  asscalar(o::PyCall.PyObject) in PyCall at /Users/cedric/.julia/… => 8.8809e-5
  convert(::Type{PyCall.PyAny}, o::PyCall.PyObject) in PyCall at … => 8.6389e-5
  unsafe_convert(::Type{Ptr{PyCall.PyObject_struct}}, po::PyCall.… => 6.4736e-5
  pytype_query(o::PyCall.PyObject, default::Union{Tuple{Vararg{Ty… => 4.3544e-5
  (::##29#30)() in Main at /Users/cedric/.julia/v0.6/TraceCalls/s… => 1.2242e-5
  getindex(o::PyCall.PyObject, s::AbstractString) in PyCall at /U… => 8.009e-6
  pyint_query(o::PyCall.PyObject) in PyCall at /Users/cedric/.jul… => 4.477e-6
  pyfloat_query(o::PyCall.PyObject) in PyCall at /Users/cedric/.j… => 3.236e-6
  pyisinstance(o::PyCall.PyObject, t::PyCall.PyObject) in PyCall … => 3.118e-6
To improve compilation time, consider ::ANY, SnoopCompile.jl, and using parametric types less.
Technical note: consider the two function definitions traced_1() = traced_2() + untraced() and traced_3() = untraced(). The compilation time of traced_2 won't be part of the compilation time of traced_1 (by design), but untraced() may be counted in either traced_1 or traced_3, depending on which one is profiled first. You can either accept this as measurement noise, or make sure that untraced is traced.
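The underlying idea can be sketched in plain Julia: precompile compiles a specific method specialization, and timing it gives a rough estimate of compilation cost (double_it and times are names invented for this example; this is not @compilation_times' implementation).

```julia
# Time how long `precompile` takes for each specialization of a function.
times = Dict{Type,Float64}()
double_it(x) = x + x                       # hypothetical example function
for T in (Int, Float64)
    times[T] = @elapsed precompile(double_it, (T,))
end
```

As the note above suggests, such timings are noisy: a second precompile of the same specialization is essentially free, and shared callees get attributed to whichever caller is compiled first.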