Presented to the Polyglot Programming DC Meetup, August 7th, 2014.
GitHub repo: https://github.com/HarlanH/JuliaPolygotPresentation
Huge parts of this presentation are cribbed from:
And see also:
Stefan Karpinski, Jeff Bezanson, Viral Shah, Alan Edelman (and the Father of Floating Point):
Pros: fast matrix algebra, REPL, easy to start using
Cons: commercial software, slow or impractical for many non-numeric tasks, syntax issues
Pros: multiple dispatch, macros
Cons: obscure syntax, hard to learn
Pros: dynamic, great ecosystem, easy to start using, elegant OO design
Cons: have to extend in C for performance, version and package issues
Pros: blazing fast, best-of-class algorithms
Cons: hard to do anything but number crunching
Pros: syntax matters, dynamic, great ecosystem
Cons: very slow, especially for numerical work
Pros: domain-specific but not domain-limited, huge package ecosystem, great interactivity
Cons: forces vectorization for speed, hard to contribute to core, quirky
Pros: blazing fast JIT compilers, forces asynchronous design thinking
Cons: everything else
Pros: simple syntax, very fast
Cons: no REPL, error-prone memory handling, static typing means hard to start
Common pattern: Outer scripting language wraps inner systems language (JCL + Asm, Matlab + Fortran, R/Python + C, Javascript + C++)
Object oriented programming: Easy to add new types!
Web-centric languages! Yay!
Julia was designed from the beginning to have certain features that are necessary for high performance:
Additionally, certain oft-requested features that make high performance much more difficult, or even impossible, to achieve were not included. These include:
function collatz(n) # unproven: always terminates
    k = 0
    while n > 1
        n = isodd(n) ? 3n+1 : n>>1
        k += 1
    end
    return k
end
collatz (generic function with 1 method)
3n+1 works like God intended
collatz(89)
30
for i = 2:2:20
    α = collatz(i)
    println("$i = $α")
end
2 = 1
4 = 2
6 = 8
8 = 3
10 = 6
12 = 9
14 = 17
16 = 4
18 = 20
20 = 7
@time for i = 1:1e6 collatz(18) end
elapsed time: 0.041812339 seconds (96 bytes allocated)
@code_native collatz(123)
    .section __TEXT,__text,regular,pure_instructions
Filename: In[1]
Source line: 4
    push RBP
    mov RBP, RSP
    xor EAX, EAX
    cmp RDI, 2
    jl 36
Source line: 4
    test DIL, 1
    jne 8
    sar RDI
    jmpq 5
    lea RDI, QWORD PTR [RDI + 2*RDI + 1]
Source line: 5
    inc RAX
    cmp RDI, 1
    jg -36
Source line: 7
    pop RBP
    ret
@code_llvm collatz(123)
define i64 @"julia_collatz;20213"(i64) {
top:
  %1 = icmp slt i64 %0, 2, !dbg !1800
  br i1 %1, label %L6, label %L, !dbg !1800
L:  ; preds = %top, %L3
  %k.0 = phi i64 [ %7, %L3 ], [ 0, %top ]
  %n.0 = phi i64 [ %n.1, %L3 ], [ %0, %top ]
  %2 = and i64 %n.0, 1, !dbg !1801
  %3 = icmp eq i64 %2, 0, !dbg !1801
  br i1 %3, label %L2, label %if1, !dbg !1801
if1:  ; preds = %L
  %4 = mul i64 %n.0, 3, !dbg !1801
  %5 = add i64 %4, 1, !dbg !1801
  br label %L3, !dbg !1801
L2:  ; preds = %L
  %6 = ashr i64 %n.0, 1, !dbg !1801
  br label %L3, !dbg !1801
L3:  ; preds = %L2, %if1
  %n.1 = phi i64 [ %6, %L2 ], [ %5, %if1 ]
  %7 = add i64 %k.0, 1, !dbg !1802
  %8 = icmp sgt i64 %n.1, 1, !dbg !1802
  br i1 %8, label %L, label %L6, !dbg !1802
L6:  ; preds = %L3, %top
  %k.1 = phi i64 [ 0, %top ], [ %7, %L3 ]
  ret i64 %k.1, !dbg !1803
}
From Stefan Karpinski, The Design Impact of Multiple Dispatch, and Jiahao Chen, Julia Compiler and Community.
Tends to be written as function application:
f(a,b,c)  ⟸ LIKE THIS
a.f(b,c)  ⟸ NOT THIS
f(a::Any, b) = "fallback"
f(a::Number, b::Number) = "a and b are both numbers"
f(a::Number, b) = "a is a number"
f(a, b::Number) = "b is a number"
f(a::Integer, b::Integer) = "a and b are both integers"
f (generic function with 5 methods)
a::Any is unnecessary
Float64, Int64, etc., inferred
f(1.5, 2)
"a and b are both numbers"
print(typeof(1.5), ", ", typeof(2))
Float64, Int64
f(1, "bar")
"a is a number"
f(1, 2)
"a and b are both integers"
f("foo", [1,2])
"fallback"
f{T<:Number}(a::T, b::T) = "a and b are both $(T)s"
f (generic function with 6 methods)
methods(f)
f(big(1.5), big(2.5))
"a and b are both BigFloats"
f("foo", "bar") #<== still doesn't apply to non-numbers
"fallback"
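The `f{T<:Number}(...)` form above is the syntax of the Julia release used for this talk. A minimal sketch of the same idea in current Julia, assuming 1.x, where the type parameter moves to a `where` clause. The name `g` is hypothetical (chosen so as not to clobber `f`), and only a subset of the methods is repeated:

```julia
g(a, b) = "fallback"
g(a::Number, b::Number) = "a and b are both numbers"
# Parametric method: applies only when both arguments share one concrete type T
g(a::T, b::T) where {T<:Number} = "a and b are both $(T)s"

r1 = g(1.5, 2)               # Float64 vs Int64: falls to the Number/Number method
r2 = g(big(1.5), big(2.5))   # both BigFloat: the parametric method is chosen
r3 = g("foo", "bar")         # same type, but not a Number: fallback
```

The dispatch behaviour is meant to mirror the outputs shown above; only the spelling of the parametric method differs.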
immutable Interval{T<:Real} <: Number
    lo::T
    hi::T
end
(a::Real)..(b::Real) = Interval(a,b)
Base.show(io::IO, iv::Interval) = print(io, "($(iv.lo))..($(iv.hi))")
show (generic function with 85 methods)
(1..2) + 3 # tries but fails to find a way to reconcile two Numbers
no promotion exists for Interval{Int64} and Int64
while loading In[18], in expression starting on line 1
 in + at promotion.jl:158
.. is a binary operator/function with syntax but no built-in methods
show() in Base module
1..2
(1)..(2)
typeof(ans)
Interval{Int64} (constructor with 1 method)
sizeof(1..2) # two 64-bit/8-byte ints
16
(1//2)..(2//3)
(1//2)..(2//3)
a::Interval + b::Interval = (a.lo + b.lo)..(a.hi + b.hi)
a::Interval - b::Interval = (a.lo - b.hi)..(a.hi - b.lo)
- (generic function with 138 methods)
(2..3) + (-1..1)
(1)..(4)
(2..3) + (1.0..3.14159) # autoconverts
(3.0)..(6.14159)
@code_native (2..3) + (-1..1)
    .section __TEXT,__text,regular,pure_instructions
Filename: In[23]
Source line: 1
    push RBP
    mov RBP, RSP
Source line: 1
    add RDI, RDX
    add RSI, RCX
    mov RAX, RDI
    mov RDX, RSI
    pop RBP
    ret
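For readers on a current Julia, here is a rough sketch of the interval arithmetic above, assuming Julia 1.x: `struct` replaces `immutable`, and methods for operators owned by `Base` are added under a qualified name. The `..` constructor is left out and `Interval(lo, hi)` is called directly; the `foldl` line is an added illustration, not part of the original talk.

```julia
struct Interval{T<:Real} <: Number   # `struct` is the newer spelling of `immutable`
    lo::T
    hi::T
end

# Extending operators that belong to Base requires qualifying them
Base.:+(a::Interval, b::Interval) = Interval(a.lo + b.lo, a.hi + b.hi)
Base.:-(a::Interval, b::Interval) = Interval(a.lo - b.hi, a.hi - b.lo)

s = Interval(2, 3) + Interval(-1, 1)   # lo = 1, hi = 4
d = Interval(2, 3) - Interval(-1, 1)   # lo = 1, hi = 4

# Generic code written against + picks up the new method with no further work
total = foldl(+, [Interval(0, 1), Interval(2, 3), Interval(10, 20)])   # lo = 12, hi = 24
```

Nothing in `foldl` knows about `Interval`; it calls `+`, and dispatch finds the method defined here.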
(below quoted from Stefan Karpinski)
Generic functions in Julia aren't special – they're the default
Operators like + are generic functions
This means you're free to extend everything
Since generic functions are open:
We're forced to think much harder about the meaning of operations
Results, if done well, are abstractions, defined generically, that extend easily
methods(round) # click through...
round(123.321)
123.0
round(123.321, 2)
123.32
round(123.321, -1)
120.0
round(123.321, 1, 2)
123.5
for i = 1:10 println(round(123.321, i, 2)) end
123.5
123.25
123.375
123.3125
123.3125
123.328125
123.3203125
123.3203125
123.3203125
123.3212890625
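The positional `round(x, digits, base)` calls above reflect the Julia version used in the talk. In current Julia (assuming 1.x) the same arguments are passed as the keywords `digits` and `base`. A short sketch of the equivalent calls:

```julia
r0 = round(123.321)                          # nearest integer, returned as a Float64
r2 = round(123.321, digits = 2)              # two decimal digits
rm = round(123.321, digits = -1)             # negative digits round to the left of the point
rb = round(123.321, digits = 1, base = 2)    # one binary digit, i.e. nearest 1/2: 123.5

# Successively finer binary approximations of the same number
for i in 1:10
    println(round(123.321, digits = i, base = 2))
end
```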
Pkg.installed()
Dict{ASCIIString,VersionNumber} with 22 entries:
  "Homebrew" => v"0.1.8"
  "Nettle" => v"0.1.4"
  "REPLCompletions" => v"0.0.1"
  "SortingAlgorithms" => v"0.0.1"
  "NLopt" => v"0.1.1"
  "Color" => v"0.2.11"
  "ZMQ" => v"0.1.13"
  "ArrayViews" => v"0.4.6"
  "JSON" => v"0.3.7"
  "StatsBase" => v"0.6.3"
  "DataArrays" => v"0.2.0"
  "Iterators" => v"0.1.6"
  "IJulia" => v"0.1.12"
  "RDatasets" => v"0.1.1"
  "GZip" => v"0.2.13"
  "MathProgBase" => v"0.2.5"
  "BinDeps" => v"0.2.14"
  "Cairo" => v"0.2.15"
  "DataFrames" => v"0.5.7"
  "Reexport" => v"0.0.1"
  "VennEuler" => v"0.0.0-"
  "URIParser" => v"0.0.2"
using RDatasets
iris = dataset("datasets", "iris")
Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
14 | 4.3 | 3.0 | 1.1 | 0.1 | setosa |
15 | 5.8 | 4.0 | 1.2 | 0.2 | setosa |
16 | 5.7 | 4.4 | 1.5 | 0.4 | setosa |
17 | 5.4 | 3.9 | 1.3 | 0.4 | setosa |
18 | 5.1 | 3.5 | 1.4 | 0.3 | setosa |
19 | 5.7 | 3.8 | 1.7 | 0.3 | setosa |
20 | 5.1 | 3.8 | 1.5 | 0.3 | setosa |
21 | 5.4 | 3.4 | 1.7 | 0.2 | setosa |
22 | 5.1 | 3.7 | 1.5 | 0.4 | setosa |
23 | 4.6 | 3.6 | 1.0 | 0.2 | setosa |
24 | 5.1 | 3.3 | 1.7 | 0.5 | setosa |
25 | 4.8 | 3.4 | 1.9 | 0.2 | setosa |
26 | 5.0 | 3.0 | 1.6 | 0.2 | setosa |
27 | 5.0 | 3.4 | 1.6 | 0.4 | setosa |
28 | 5.2 | 3.5 | 1.5 | 0.2 | setosa |
29 | 5.2 | 3.4 | 1.4 | 0.2 | setosa |
30 | 4.7 | 3.2 | 1.6 | 0.2 | setosa |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
typeof(iris)
DataFrame (constructor with 22 methods)
by(iris, :Species, df -> DataFrame(mean_length = mean(df[:PetalLength])))
Row | Species | mean_length |
---|---|---|
1 | setosa | 1.462 |
2 | versicolor | 4.26 |
3 | virginica | 5.552 |
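The `by(...)` call above is one instance of the split-apply-combine pattern. As a rough illustration of the three steps without any package, here is a sketch in plain Julia (assuming 1.x). The `species` and `lengths` vectors are hypothetical toy data, not the iris values:

```julia
# Hypothetical toy data standing in for (Species, PetalLength) columns
species = ["setosa", "setosa", "virginica", "virginica"]
lengths = [1.4, 1.6, 5.0, 6.0]

# Split: collect the lengths belonging to each species
groups = Dict{String,Vector{Float64}}()
for (s, x) in zip(species, lengths)
    push!(get!(groups, s, Float64[]), x)
end

# Apply and combine: one mean per group, gathered into a single result
means = Dict(s => sum(xs) / length(xs) for (s, xs) in groups)
```

A data frame library does the same work but returns the combined result as a table rather than a dictionary.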
using pulls dependencies too, in this case including DataFrames
show(DataFrame) outputs Markdown-compatible tables
by is part of the split-apply-combine idiom for DFs
:Species is a symbol, à la Lisp, used frequently instead of Enums
df -> ... is an anonymous function
DataFrame constructor gets the new column name by tricky use of named arguments
using GLM
lm1 = fit(LinearModel, SepalLength ~ SepalWidth + PetalLength, iris)
DataFrameRegressionModel{LinearModel{DensePredQR{Float64}},Float64}:
Coefficients:
              Estimate  Std.Error  t value  Pr(>|t|)
(Intercept)    2.24914    0.24797  9.07022    <1e-15
SepalWidth    0.595525  0.0693282  8.58994    <1e-13
PetalLength    0.47192  0.0171177  27.5692    <1e-59
Warning: could not import Base.add! into NumericExtensions
a ~ b + c
Formula: a ~ b + c
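`a ~ b + c` can be entered with `a`, `b`, and `c` undefined because the expression is captured rather than evaluated. A minimal sketch of that mechanism, assuming current Julia (1.x), where `~` parses as an ordinary call and the capture has to be requested with an explicit macro. The macro name `@capture_formula` is hypothetical and is not the DataFrames or GLM API:

```julia
# Hypothetical macro: hands back the expression it received, unevaluated
macro capture_formula(ex)
    return QuoteNode(ex)
end

ex = @capture_formula(a ~ b + c)   # a, b and c need not exist as variables

op  = ex.args[1]   # the operator symbol, :~
lhs = ex.args[2]   # the response, :a
rhs = ex.args[3]   # the predictors, still an expression: :(b + c)
```

A formula type can then be built from `lhs` and `rhs` and matched against column names later, when the data arrive.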
fit is a generic function, specialized here on a model type, a Formula, and data
lm1 is interesting
~ is syntactic sugar that calls a macro that captures the expression in a Formula
This work is licensed under a Creative Commons Attribution 4.0 International License.