Notebook

Contents¶

Overview

Tracing code

Syntax
Tracing tips
Implementation

Manipulating traces

Custom text/HTML
Map, filter, collect

Saving traces to disk

Working with mutable state

Debugging with traces

Debugging test failures
Displaying extra information

Profiling

Type stability
Grouping and benchmarking
Profiling the compiler

Overview¶

TraceCalls.jl is a functional tracing package for debugging and exploring Julia code. It records and displays a tree of function calls. For example, here is how Calculus.jl computes the second-derivative of sin(x) + sqrt(x):

In [1]:

using Calculus, TraceCalls

@traceable f(x) = sin(x) + sqrt(x)
trace_derivative = @trace Calculus second_derivative(f, 1.0)

Out[1]:

#1() => -1.0914708926553454

Calculus.second_derivative(f, 1.0) => -1.0914708926553454

Calculus.derivative(f, :central) => Calculus.#20

Calculus.derivative(f, :scalar, :central) => Calculus.#20

Calculus.finite_difference_hessian(f, Calculus.#20, 1.0, :central) => -1.0914708926553454

Calculus.finite_difference(Calculus.#20, 1.0, :central) => -1.0914708926553454

Calculus.finite_difference(f, 1.0000060554544523, :central) => 1.0402956965076036

f(1.0000121109455733) => 1.8414835837724568

f(0.9999999999633314) => 1.8414709847697501

Calculus.finite_difference(f, 0.9999939445455476, :central) => 1.0403089152121567

```
f(1.0) => 1.8414709848078965
```

f(0.9999878890910952) => 1.8414583857213915

The output of @trace is a Trace object --- an explorable tree-like datastructure:

In [2]:

trace_derivative[1][1]          # get the first call of the first call.

Out[2]:

Calculus.derivative(f, :central) => Calculus.#20

Calculus.derivative(f, :scalar, :central) => Calculus.#20

In [3]:

trace_derivative[1][1].args[2]  # get its second argument

Out[3]:

:central

It can work as a more-informative stack-trace (which contains values rather than just types - see @stacktrace):

In [4]:

@trace Calculus second_derivative(f, 0.0)

Out[4]:

#3() => DomainError()

Calculus.second_derivative(f, 0.0) => DomainError()

Calculus.derivative(f, :central) => Calculus.#58

Calculus.derivative(f, :scalar, :central) => Calculus.#58

Calculus.finite_difference_hessian(f, Calculus.#58, 0.0, :central) => DomainError()

Calculus.finite_difference(Calculus.#58, 0.0, :central) => DomainError()

Calculus.finite_difference(f, 6.0554544523933395e-6, :central) => 288.35028436717505

f(1.2110908904786679e-5) => 0.0034921840266401915

```
f(0.0) => 0.0
```

Calculus.finite_difference(f, -6.0554544523933395e-6, :central) => DomainError()

```
f(0.0) => 0.0
```

f(-1.2110908904786679e-5) => DomainError()

And finally, since full call data is recorded, we can rerun every part of the trace for profiling, debugging, and testing.

In [18]:

greenred(map(:@allocated, trace_derivative))    # compute how many bytes were allocated in each function call

Out[18]:

#1() => 256

Calculus.second_derivative(f, 1.0) => 256

Calculus.derivative(f, :central) => 208

Calculus.derivative(f, :scalar, :central) => 208

Calculus.finite_difference_hessian(f, Calculus.#20, 1.0, :central) => 16

Calculus.finite_difference(Calculus.#20, 1.0, :central) => 16

Calculus.finite_difference(f, 1.0000060554544523, :central) => 0

```
f(1.0000121109455733) => 0
```
```
f(0.9999999999633314) => 0
```

Calculus.finite_difference(f, 0.9999939445455476, :central) => 0

```
f(1.0) => 0
```
```
f(0.9999878890910952) => 0
```

Tracing code¶

Syntax¶

@traceable function foo()
...
end

@trace (some_function, SomeModule, "some_interactively_included_file.jl") bar(1)

This call to @trace will execute bar(1) while tracing all methods of some_function, all methods defined in SomeModule / "some_interactively_included_file.jl", and all methods defined with @traceable (in this case, foo()).

Tracing Julia code involves a fair amount of code analysis; TraceCalls.jl will error, warn, or silently skip (when tracing a module) the following:

Inner constructors
Functions that are defined by eval
Callable objects (eg. (func::Functor)(x) = ...)

The @traceable macro (to be used primarily for interactively-defined functions) remembers the function definition without modifying it. It has no impact on the performance of your code.

Tracing tips¶

It might be tempting to run @trace MyModule some_long_computation(), but a) there is a small cost to every traced call, and b) a trace with a million calls is too large and unwieldy to be of much use anyway. If you are only interested in the details of one particular my_fun call that happens during some_long_computation(), then it's usually better to do this:

trace_my_fun = @trace my_fun some_long_computation()     # fast, because we're only tracing one function
trace = @trace MyModule trace_my_fun[3]()                # select the third `my_fun()` call, and trace it fully

This strategy also works with @stacktrace.

Implementation of @trace¶

@trace some_fn foo(10) performs these operations:

Find the source code of some_fn
Replace the definition(s) of some_fn with a tracing version (using eval)
Run foo(10)
Restore the original definition(s) of some_fn (using eval)

The downside of this scheme is that @trace can trigger significant JIT compiling time. Tracing large modules can be slow the first time around, but caching is used to avoid repeated computations.

Manipulating traces¶

Consider computing a random walk on a small graph, using LightGraphs.jl

In [1]:

using LightGraphs, TraceCalls

graph = Graph(3)                                      # build an undirected graph with three connected vertices
add_edge!(graph, 1, 2); add_edge!(graph, 2, 3) 

trace_walk = @trace LightGraphs randomwalk(graph, 2, 5)

Out[1]:

#1() => [2, 1, 2, 3, 2]

LightGraphs.randomwalk({3, 2} undirected simple Int64 graph, 2, 5) => [2, 1, 2, 3, 2]

eltype({3, 2} undirected simple Int64 graph) => Int64

LightGraphs.vertices({3, 2} undirected simple Int64 graph) => 1:3

LightGraphs.nv({3, 2} undirected simple Int64 graph) => 3

eltype({3, 2} undirected simple Int64 graph) => Int64

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph) => Array{Int64,1}[[2], [1, 3], [2]]

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 1) => [2]

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 1) => [2]

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 3) => [2]

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 3) => [2]

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]

The trace can be indexed:

In [2]:

trace_walk[1][3]    # can also be written trace_walk[1,3]

Out[2]:

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph, 2) => [1, 3]

In [3]:

trace_walk[1][3][:v]   # get the second argument by name. To get it by position, use trace_walk[1][3].args[2]

Out[3]:

Called:

In [4]:

trace_walk[1,3]()   # call `LightGraphs.out_neighbors(...)`

Out[4]:

2-element Array{Int64,1}:
 1
 3

Or pruned (useful for exploring large traces):

In [5]:

pruned_trace = prune(trace_walk, 
                     2,  # maximum depth
                     4)  # maximum length of each trace (eg. if foo() calls bar() 100 times, keep the first 4 times)

Out[5]:

#1() => [2, 1, 2, 3, 2]

LightGraphs.randomwalk({3, 2} undirected simple Int64 graph, 2, 5) => [2, 1, 2, 3, 2]

eltype({3, 2} undirected simple Int64 graph) => Int64

LightGraphs.vertices({3, 2} undirected simple Int64 graph) => 1:3

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 2) => [1, 3]

LightGraphs.out_neighbors({3, 2} undirected simple Int64 graph, 1) => [2]

A subcall of a pruned/filtered trace can be reexpanded by retracing it:

In [8]:

@trace LightGraphs pruned_trace[1, 2]()   # don't forget the (); it triggers the computation for that call

Out[8]:

#7() => 1:3

LightGraphs.vertices({3, 2} undirected simple Int64 graph) => 1:3

LightGraphs.nv({3, 2} undirected simple Int64 graph) => 3

eltype({3, 2} undirected simple Int64 graph) => Int64

LightGraphs.SimpleGraphs.fadj({3, 2} undirected simple Int64 graph) => Array{Int64,1}[[2], [1, 3], [2]]

Custom text/HTML¶

TraceCalls.jl calls TraceCalls.show_val(io::IO, mime, x) to display each argument and return value. It defaults to show(io, x), but can be customized, either to highlight certain values, or to shorten objects that are not important for the task at hand. Implement TraceCalls.show_val(::IO, ::MIME"text/html", value) for Atom/IJulia, and TraceCalls.show_val(::IO, ::MIME"text/plain", value) for the REPL and other non-HTML environments. See the Julia manual on custom pretty-printing for more information, and print_with_color / Crayons.jl for colorful REPL display.

In [10]:

TraceCalls.show_val(io::IO, ::MIME"text/html", v::Vector{Int}) = 
    write(io, string("[", join([x==2 ? "<font color=red>2</font>" : x for x in v], ","), "]"))
TraceCalls.show_val(io::IO, ::MIME"text/html", ::Graph) =  # could also be done with `@show_val_only_type Graph`
    write(io, "AnyOldGraph")   
trace_walk

Out[10]:

#17() => [2,1,2,1,2]

LightGraphs.randomwalk(AnyOldGraph, 2, 5) => [2,1,2,1,2]

```
eltype(AnyOldGraph) => Int64
```

LightGraphs.vertices(AnyOldGraph) => 1:3

LightGraphs.nv(AnyOldGraph) => 3

```
eltype(AnyOldGraph) => Int64
```

LightGraphs.SimpleGraphs.fadj(AnyOldGraph) => Array{Int64,1}[[2], [1, 3], [2]]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]

LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 1) => [2]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]

LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 1) => [2]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]

See also ?@show_val_only_type, ?TraceCalls.show_call, ?TraceCalls.show_return_val and ?highlight.

Map, filter, collect¶

Each function call in a trace is represented by an instance of the Trace structure:

struct Trace
    func::Function        # the function called
    args::Tuple           # the positional arguments
    kwargs::Tuple         # the keyword arguments
    called::Vector{Trace} # the functions called within the execution of this function call 
    value                 # This is the return value of the func(args...; kwargs...) call, but it's also where
                          # the result of `map(f, ::Trace)` will be stored.
end

The following filtering functions are provided:

filter(f, trace) keeps all function calls for which f(::Trace) is true. It's useful to cut out uninteresting intermediate functions.
filter_cutting(f, trace) is like filter, but will remove all descendents of the traces for which f(trace) is false.
filter_lineage(f, trace) keeps all function calls for which f(::Trace) is true of at least one of its descendents or ancestors. Use it to focus on some interesting part of the trace.

In [11]:

# Get rid of the `fadj` calls
filter(trace->trace.func != LightGraphs.SimpleGraphs.fadj, trace_walk)

Out[11]:

#17() => [2,1,2,1,2]

LightGraphs.randomwalk(AnyOldGraph, 2, 5) => [2,1,2,1,2]

```
eltype(AnyOldGraph) => Int64
```

LightGraphs.vertices(AnyOldGraph) => 1:3

LightGraphs.nv(AnyOldGraph) => 3

```
eltype(AnyOldGraph) => Int64
```

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.out_neighbors(AnyOldGraph, 1) => [2]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

In [55]:

# Focus on the callers and callees of `LightGraphs.out_neighbors(AnyOldGraph, 2)`
filter_lineage(trace -> trace.func==out_neighbors && trace[:v]==2, 
               trace_walk; highlight=true)

Out[55]:

#17() => [2,1,2,1,2]

LightGraphs.randomwalk(AnyOldGraph, 2, 5) => [2,1,2,1,2]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]

LightGraphs.out_neighbors(AnyOldGraph, 2) => [1,3]

LightGraphs.SimpleGraphs.fadj(AnyOldGraph, 2) => [1,3]

map(f, trace) applies f (whether a function or quoted macro such as :@which) to each Trace, and stores the result in Trace's value field.

In [14]:

# Take the second argument of every call to LightGraphs.out_neighbors
map(trace->trace.func == LightGraphs.out_neighbors ? trace.args[2] : nothing,
    filter(trace->trace.func == LightGraphs.out_neighbors, trace_walk))

Out[14]:

#21() => nothing

LightGraphs.out_neighbors(AnyOldGraph, 2) => 2

LightGraphs.out_neighbors(AnyOldGraph, 3) => 3

LightGraphs.out_neighbors(AnyOldGraph, 2) => 2

LightGraphs.out_neighbors(AnyOldGraph, 1) => 1

LightGraphs.out_neighbors(AnyOldGraph, 2) => 2

objects_in(trace) returns all passed arguments and the return value as a vector. For example, filter(tr->any(x==3.1 for x in objects_in(tr)), trace) keeps all calls that contain the value 3.1. collect(trace) returns a Vector of all Trace objects in trace.

Saving traces to disk¶

Traces can be saved and reloaded with JLD2.jl:

using TraceCalls, JLD2, FileIO
save("test.jld2", "tracing_foo", some_trace)
reloaded_trace = load("test.jld2", "tracing_foo")

Notes:

JLD2.jl will save closures without error, but when they are reloaded (as ReconstructedTypes instances), they will usually not be callable.
Before loading a trace, make sure to import/using all the modules defining the types and functions that the saved trace refers to.

Working with mutable state¶

By default, @trace stores the function call's arguments without copying them. This can yield surprising results when tracing functions that modify their arguments. Consider generating a vector of n 5s using push!:

In [15]:

@traceable push5!(vec::Vector) = push!(vec, 5)
@traceable function many_5s(n)
    vec = Int[]
    for i in 1:n
        push5!(vec)
    end
    return vec
end

@trace many_5s(3)

Out[15]:

#31() => [5,5,5]

many_5s(3) => [5,5,5]

```
push5!([5,5,5]) => [5,5,5]
```
```
push5!([5,5,5]) => [5,5,5]
```
```
push5!([5,5,5]) => [5,5,5]
```

When push5! was first called, vec was empty, but this trace makes it look like it already had three 5s in it. This is because all vectors in that trace are the same object. You can fix this (if you care) by telling TraceCalls to make a copy of every vector argument and return value:

In [16]:

TraceCalls.store(x::Vector) = copy(x)    # the default is `store(x) = x`
@trace many_5s(3)

Out[16]:

#33() => [5,5,5]

many_5s(3) => [5,5,5]

```
push5!([]) => [5]
```
```
push5!([5]) => [5,5]
```
```
push5!([5,5]) => [5,5,5]
```

Alternatively, it's sometimes simpler to filter out all mutating functions:

In [17]:

tr = @trace many_5s(3)
filter(!is_mutating, tr)    # filter out every function that ends with a ! (see https://docs.julialang.org/en/stable/manual/style-guide/#Append-!-to-names-of-functions-that-modify-their-arguments-1)

Out[17]:

#35() => [5,5,5]

```
many_5s(3) => [5,5,5]
```

The most drastic solution is to have TraceCalls store the text/HTML representation of each value:

TraceCalls.store(x) = REPR(x)

This essentially turns TraceCalls.jl into a traditional, non-inspectable tracing package, but it guarantees that each value is shown as it was when the call was made.

Debugging with traces¶

The starting point for debugging exceptions is @stacktrace. It works like @trace, but uses filter to keep only the part of the trace involved in the exception.

In [18]:

using Optim, TraceCalls
@traceable logplus10(x) = log(x[1]+10)
TraceCalls.store(v::Vector) = copy(v)

# Minimize the function x -> log(x[1]+10) starting at x = 0
strace = @stacktrace (Optim, Calculus) optimize(logplus10, [0.0], BFGS())

Out[18]:

#37() => DomainError()

Optim.optimize(logplus10, [0.0], Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.Options{Void}(1.0e-32, 1.0e-32, 1.0e-8, false, 1000, false, false, false, false, 1, nothing, NaN)) => DomainError()

Optim.optimize(Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!), [0.0], Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.Options{Void}(1.0e-32, 1.0e-32, 1.0e-8, false, 1000, false, false, false, false, 1, nothing, NaN)) => DomainError()

Optim.update_state!(Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!), Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true)) => DomainError()

Optim.perform_linesearch!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => DomainError()

Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()

```
logplus10([-12.4999]) => DomainError()
```

Then we can zoom in on the most interesting part of the stack-trace:

In [19]:

strace[1,1,1,1]   # could also use `strace[bottom-2]`, or `strace[top+4]` --- bottom/top are special values like `end`

Out[19]:

Optim.perform_linesearch!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => DomainError()

Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()

```
logplus10([-12.4999]) => DomainError()
```

Calling a Trace object performs its computation. By tracing strace[1,1,1,1](), we get the full trace for this part of the computation:

In [20]:

linesearch_trace = @trace (Optim, Calculus) strace[1,1,1,1]()

Out[20]:

#39() => DomainError()

Optim.perform_linesearch!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => DomainError()

Optim.checked_dphi0!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true)) => -0.009999999996070853

Optim.alphaguess!(Optim.BFGSState{Float64,1,Array{Float64,1}}("BFGS", 1, [0.0], 2.302585092994046, 1, 1, 0, [NaN], [0.1], [0.1], NaN, [2.37575e-314], [1.69069e-320], [NaN], [1.0], [-0.1], [-12.4999], [0.133333], 1.0, false, LineSearches.LineSearchResults{Float64}([0.0, 1.0, 5.0, 25.0], [2.30259, 2.29253, 2.25129, 2.0149], [-0.01, -0.010101, -0.0105263, -0.0133333], 0)), Optim.BFGS{LineSearches.#hagerzhang!,Optim.##115#119}(LineSearches.hagerzhang!, Optim.#115, true), -0.009999999996070853, Optim.OnceDifferentiable(logplus10, Optim.g!, Optim.fg!)) => 1.0

Calculus.finite_difference!(logplus10, [-0.1], [0.133333], :central) => nothing

logplus10([-0.0999939]) => 2.2925353688044074

logplus10([-0.100006]) => 2.2925341454802757

```
logplus10([-0.1]) => 2.2925347571425285
```

Calculus.finite_difference!(logplus10, [-0.5], [0.10101], :central) => nothing

logplus10([-0.499994]) => 2.25129243603289

logplus10([-0.500006]) => 2.2512911612003736

```
logplus10([-0.5]) => 2.251291798616835
```

Calculus.finite_difference!(logplus10, [-2.5], [0.105263], :central) => nothing

logplus10([-2.49998]) => 2.01490503909053

logplus10([-2.50002]) => 2.0149010021208964

```
logplus10([-2.5]) => 2.0149030206077505
```

Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()

```
logplus10([-12.4999]) => DomainError()
```

Focusing on just finite_difference! shows what happened: the line search keeps going for lower and lower values of x, until it tries log(-12.5+10) == log(-2.5), which is a DomainError.

In [21]:

filter(tr->tr.func==Calculus.finite_difference!, linesearch_trace)

Out[21]:

#39() => DomainError()

Calculus.finite_difference!(logplus10, [-0.1], [0.133333], :central) => nothing

Calculus.finite_difference!(logplus10, [-0.5], [0.10101], :central) => nothing

Calculus.finite_difference!(logplus10, [-2.5], [0.105263], :central) => nothing

Calculus.finite_difference!(logplus10, [-12.5], [0.133333], :central) => DomainError()

We could fix the error (for instance, by changing logplus10's definition) and call linesearch_trace() to check that this part of the computation now returns a sensible result.

filter is useful even when there is no exception to debug. By selecting only certain parts of the trace, it's the TraceCalls equivalent of setting a breakpoint.

TraceCalls.jl implements which(::Trace) and edit(::Trace). They point to the source location for that function call.

Debugging test failures¶

When a test used to pass, but now fails after a code change, compare_past_trace can be used to highlight the differences.

In [22]:

?compare_past_trace

search: compare_past_trace

Out[22]:

compare_past_trace(old_trace::Trace; filter_out_equal=true)) reruns every function call in old_trace, and shows in red where the new result differs from the old. If filter_out_equal==true, only show the non-equal results.

To use it:

Trace the failing test/computation under the correct code (it won't fail, of course)
git checkout the bad code (or make the changes manually - I recommend git stash for switching back and forth). TraceCalls imports Revise.jl, so the code changes will take effect automatically.
Call compare_past_trace on the trace from step 1

You may also call compare_past_trace on an old trace loaded from disk.

Displaying extra information¶

To display intermediate results in your traces, use trace_log instead of println/@show:

In [26]:

using TraceCalls

@traceable function hypothenuse_length(a, b)
    a2 = a * a
    b2 = b * b
    trace_log(a2=a2, b2=FontColor(:blue, b2))   # FontColor is also useful in Base.show_val or Base.map
    return sqrt(a2+b2)
end

@trace hypothenuse_length(3, 4)

Out[26]:

#18() => 5.0

hypothenuse_length(3, 4) => 5.0

TraceCalls.trace_log(; a2=9, b2=16) => nothing

Profiling¶

measure(:@measuring_macro, trace) applies the given quoted macro (put a : in front of the @) to every function call in trace. Useful macros/functions are @allocated and @elapsed, as well as BenchmarkTools' @belapsed, median∘benchmark and minimum∘benchmark (see here). Because measuring involves rerunning every function call in the trace, it can be slow. A good rule-of-thumb is to use the more accurate @belapsed when the code to profile takes less than half a second and the trace is relatively short, and @elapsed when it takes less than a minute. When the trace is very large, try the explore_worst=true option, which will only profile the worst callee of each call (so the runtime is $O(wd)$ instead of $O(w^d)$). You can also prune or filter the trace first.

The output of measure is a normal, explorable and callable Trace object. Remember that due to timing fluctuations, even if f(x) calls g(x), it does not imply that @elapsed(f(x)) >= @elapsed(g(x)).

In [3]:

## Profile the PyCall code for accessing Python's `math.pi` 
using PyCall, TraceCalls

math = pyimport(:math)
trace_pycall = @trace PyCall math[:pi];

using BenchmarkTools: @belapsed
m_trace = measure(:@belapsed, trace_pycall; normalize=true, threshold=0.005)  # only show/explore calls that take >0.5% of total runtime

Out[3]:

#8() => 1.0

getindex(PyObject , :pi) => 0.9699

```
getindex(PyObject , "pi") => 0.114
```

convert(PyCall.PyAny, PyObject 3.141592653589793) => 0.7366

PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => 0.4832

PyCall.pyint_query(PyObject 3.141592653589793) => 0.1354

PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x000000011ef9cc98) => 0.0663

PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x000000011ef9eaa8) => 0.06704

convert(Float64, PyObject 3.141592653589793) => 0.009888

PyCall.asscalar(PyObject 3.141592653589793) => 0.005534

normalize(m_trace[1,2,1]) can be used to make the numbers in a subtrace more comparable.

Type stability¶

code_warntype(trace_pycall[1,2]) returns the type-annotated code for that function call. Similarly for code_llvm, code_lowered, code_native and code_typed. is_inferred(::Trace) uses the same code as Base.Test.@inferred, but returns true/false, which is useful with map:

In [2]:

redgreen(map(is_inferred, trace_pycall))   # ignore the top-most call - it will always be `false`

Out[2]:

#1() => false

getindex(PyObject , :pi) => false

getindex(PyObject , "pi") => true

Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject ) => true

convert(PyCall.PyAny, PyObject 3.141592653589793) => false

PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => false

PyCall.pyint_query(PyObject 3.141592653589793) => false

PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124956c98) => true

Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true

PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124958aa8) => true

Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true

PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => true

PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => true

PyCall.pyfloat_query(PyObject 3.141592653589793) => false

PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124954998) => true

Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true

convert(Float64, PyObject 3.141592653589793) => true

PyCall.asscalar(PyObject 3.141592653589793) => true

PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => true

Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject 3.141592653589793) => true

PyCall.pyerr_check("ccall(@pysym(:PyFloat_AsDouble), Cdouble, (PyPtr,), asscalar(po))", 3.141592653589793) => true

Grouping and benchmarking¶

TraceCalls.jl also provides a flat, more efficient profiling tool: trace_benchmark. It groups the calls by their signature (for example, sin(1.3) and sin(2.3) will be in the same group, but sin(1) will be in a different one) and only profiles one of the calls in each group.

In [3]:

using BenchmarkTools    # necessary for trace_benchmark
bench = trace_benchmark(trace_pycall)

Out[3]:

Benchmark

2 calls like convert(PyCall.PyAny, PyObject 3.141592653589793) => TrialEstimate(900.426 ns, 16 bytes)

1 call like #1() => TrialEstimate(1.355 μs, 224 bytes)

1 call like getindex(PyObject , :pi) => TrialEstimate(1.350 μs, 224 bytes)

1 call like PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => TrialEstimate(605.153 ns, 0 bytes)

3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124956c98) => TrialEstimate(86.650 ns, 0 bytes)

1 call like getindex(PyObject , "pi") => TrialEstimate(194.738 ns, 16 bytes)

1 call like PyCall.pyint_query(PyObject 3.141592653589793) => TrialEstimate(177.301 ns, 0 bytes)

3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => TrialEstimate(3.368 ns, 0 bytes)

5 calls like Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject ) => TrialEstimate(1.843 ns, 0 bytes)

1 call like PyCall.asscalar(PyObject 3.141592653589793) => TrialEstimate(6.723 ns, 0 bytes)

1 call like PyCall.pyerr_check("ccall(@pysym(:PyFloat_AsDouble), Cdouble, (PyPtr,), asscalar(po))", 3.141592653589793) => TrialEstimate(5.806 ns, 0 bytes)

1 call like PyCall.pyfloat_query(PyObject 3.141592653589793) => TrialEstimate(5.797 ns, 0 bytes)

After doing some code changes, you can rerun the same benchmark with run (technical note: it will use the same tuning parameters). Then call ratio or judge to get a report on regressions and improvements.

In [4]:

bench_new = run(bench);
judge(bench_new, bench; sort=true)

Out[4]:

Benchmark

1 call like PyCall.pyerr_check("ccall(@pysym(:PyFloat_AsDouble), Cdouble, (PyPtr,), asscalar(po))", 3.141592653589793) => TrialJudgement(+4.89% => invariant)

2 calls like convert(PyCall.PyAny, PyObject 3.141592653589793) => TrialJudgement(+0.97% => invariant)

1 call like PyCall.pytype_query(PyObject 3.141592653589793, PyCall.PyObject) => TrialJudgement(+0.90% => invariant)

1 call like PyCall.pyint_query(PyObject 3.141592653589793) => TrialJudgement(+0.34% => invariant)

5 calls like Base.unsafe_convert(Ptr{PyCall.PyObject_struct}, PyObject ) => TrialJudgement(+0.00% => invariant)

1 call like PyCall.pyfloat_query(PyObject 3.141592653589793) => TrialJudgement(+0.00% => invariant)

3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, PyObject NULL) => TrialJudgement(-0.03% => invariant)

1 call like PyCall.asscalar(PyObject 3.141592653589793) => TrialJudgement(-0.18% => invariant)

1 call like getindex(PyObject , "pi") => TrialJudgement(-0.49% => invariant)

1 call like #1() => TrialJudgement(-1.75% => invariant)

3 calls like PyCall.pyisinstance(PyObject 3.141592653589793, Ptr{PyCall.PyObject_struct} @0x0000000124956c98) => TrialJudgement(-0.97% => invariant)

1 call like getindex(PyObject , :pi) => TrialJudgement(-5.19% => improvement)

It's possible to save the benchmark to disk, but please read these notes carefully. The reloaded benchmark is likely to have one or more "broken" closure, and those will show up in the run/judge outputs as red exceptions (which can usually be ignored).

Profiling the compiler¶

Parametric types sometimes trigger very long compilation times. You can use @compilation_times to_trace expr to estimate each method's total compilation time (in seconds). This is very, very approximate and experimental. This macro has the same syntax and limitations as @trace. It will estimate the compilation time of every method specialization that appears in @trace to_trace expr. It is based on precompile and is subject to precompile's limitations (notably, keyword-argument functions are essentially ignored).

In [4]:

@compilation_times PyCall math[:pi]

Out[4]:

DataStructures.OrderedDict{Any,Float64} with 13 entries:
  convert(::Type{T}, po::PyCall.PyObject) where T<:Real in PyCall… => 0.0822036
  getindex(o::PyCall.PyObject, s::Symbol) in PyCall at /Users/ced… => 0.00703522
  pyerr_check(msg::AbstractString, val) in PyCall at /Users/cedri… => 9.4938e-5
  pyisinstance(o::PyCall.PyObject, t::Union{Ptr{PyCall.PyObject_s… => 9.1195e-5
  asscalar(o::PyCall.PyObject) in PyCall at /Users/cedric/.julia/… => 8.8809e-5
  convert(::Type{PyCall.PyAny}, o::PyCall.PyObject) in PyCall at … => 8.6389e-5
  unsafe_convert(::Type{Ptr{PyCall.PyObject_struct}}, po::PyCall.… => 6.4736e-5
  pytype_query(o::PyCall.PyObject, default::Union{Tuple{Vararg{Ty… => 4.3544e-5
  (::##29#30)() in Main at /Users/cedric/.julia/v0.6/TraceCalls/s… => 1.2242e-5
  getindex(o::PyCall.PyObject, s::AbstractString) in PyCall at /U… => 8.009e-6
  pyint_query(o::PyCall.PyObject) in PyCall at /Users/cedric/.jul… => 4.477e-6
  pyfloat_query(o::PyCall.PyObject) in PyCall at /Users/cedric/.j… => 3.236e-6
  pyisinstance(o::PyCall.PyObject, t::PyCall.PyObject) in PyCall … => 3.118e-6

To improve compilation time, consider ::ANY, SnoopCompile.jl, and using parametric types less.

Technical note: consider the two function definitions traced_1() = traced_2() + untraced() and traced_3() = untraced(). The compilation time of traced_2 won't be part of the compilation time of traced_1 (by design), but untraced() may be counted in either traced_1 or traced_3, depending on which one is profiled first. You can either accept this as measurement noise, or make sure that untraced is traced.