# Environment setup
const DEPENDENCIES = ["Images", "BenchmarkTools", "TiledIteration", "CairoMakie"];
import Pkg
Pkg.activate(temp=true)
Pkg.add(name="DataFlowTasks", rev="0e708676eedca1e3ec825246f68b2de4c0906d86")
foreach(Pkg.add, DEPENDENCIES)
Activating new project at `/tmp/jl_cYFG9I`
Updating `/tmp/jl_cYFG9I/Project.toml`
  [d1549cb6] + DataFlowTasks v0.2.0 `https://github.com/maltezfaria/DataFlowTasks.jl.git#0e70867`
  [916415d5] + Images v0.26.0
  [6e4b80f9] + BenchmarkTools v1.4.0
  [06e1c1a7] + TiledIteration v0.5.0
⌃ [13f3f980] + CairoMakie v0.10.12
...
This example illustrates the use of DataFlowTasks.jl to parallelize the tiled application of two kernels used in image processing. The application first applies a blur filter to each pixel of the image; in a second step, the Roberts cross operator is applied to detect edges in the image.
Let us first load a test image:
using Images
url = "https://upload.wikimedia.org/wikipedia/commons/c/c3/Equus_zebra_hartmannae_-_Etosha_2015.jpg"
ispath("test-image.jpg") || download(url, "test-image.jpg")
img = Gray.(load("test-image.jpg"))
We start by defining a few helper functions: the `contract` and `expand` functions manipulate ranges of indices in order to respectively contract or expand them by a few pixels; the `img2mat` and `mat2img` functions convert between a grayscale image and a matrix of floating-point pixel intensities. The filters will work on this latter representation, which may need a renormalization to be converted back to a grayscale image.
contract(range,n) = range[begin+n:end-n]        # drop n indices at each end
expand(range,n) = range[begin]-n:range[end]+n   # add n indices at each end
function img2mat(img)
    PixelType = eltype(img)
    mat = Float64.(img)
    return (PixelType, mat)
end
function mat2img(PixelType, mat)
    m1, m2 = extrema(mat)
    PixelType.((mat .- m1) ./ (m2-m1))
end
PixelType, mat = img2mat(img);
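As a small illustration of these helpers, the sketch below applies them to an arbitrary index range and reproduces the renormalization step of `mat2img` on a plain matrix. It assumes that `expand` widens the range by `n` on both sides, mirroring `contract`; the range `5:12` and the 2×2 matrix are made-up inputs chosen only for the example.

```julia
# Helper definitions repeated so that the snippet runs on its own.
contract(range, n) = range[begin+n:end-n]
expand(range, n)   = range[begin]-n:range[end]+n   # assumed to widen both ends

r = 5:12
@assert contract(r, 2) == 7:10             # two indices dropped at each end
@assert expand(r, 2) == 3:14               # two indices added at each end
@assert expand(contract(r, 2), 2) == r     # the two operations undo each other

# Renormalization as done in mat2img, without the pixel-type conversion:
# intensities are mapped linearly onto [0, 1].
vals = [0.5 1.0; 2.0 4.5]                  # hypothetical filter output
m1, m2 = extrema(vals)
scaled = (vals .- m1) ./ (m2 - m1)
@assert extrema(scaled) == (0.0, 1.0)
```

Note that the renormalization divides by `m2 - m1`, so it is only meaningful when the matrix is not constant.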
The `blur!` function averages the value of each pixel with the values of all pixels in a square window extending `width` pixels in each direction (that is, all pixels at most `width` pixels away in Chebyshev distance). In order to simplify the implementation, the filter is applied only to pixels that are sufficiently far from the boundary to have all their neighbors correctly defined.
Results are written in-place in a pre-allocated `dest` array. Unless otherwise specified, the filter is applied to the whole image, but can be restricted to a tile if a smaller `range` argument is provided.
function blur!(dest, src; range=axes(src), width)
    ri, rj = intersect.(range, contract.(axes(src), width))
    weight = 1/(2*width+1)^2
    @inbounds for i in ri, j in rj
        dest[i,j] = 0
        for δi in -width:width, δj in -width:width
            dest[i,j] += src[i+δi, j+δj]
        end
        dest[i,j] *= weight
    end
end
blur! (generic function with 1 method)
In the following, we'll use a filter width of 5 pixels, which produces the following results on the test image:
width = 5
blurred = zero(mat)
blur!(blurred, mat; width)
mat2img(PixelType, blurred)
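The boundary handling of the blur filter can be checked on a tiny input. The sketch below repeats the definitions of `contract` and `blur!` (without `@inbounds`) so that it runs on its own; the constant 5×5 matrix and `width = 1` are arbitrary choices made for illustration.

```julia
contract(range, n) = range[begin+n:end-n]

function blur!(dest, src; range=axes(src), width)
    ri, rj = intersect.(range, contract.(axes(src), width))
    weight = 1/(2*width+1)^2
    for i in ri, j in rj
        dest[i,j] = 0
        for δi in -width:width, δj in -width:width
            dest[i,j] += src[i+δi, j+δj]
        end
        dest[i,j] *= weight
    end
end

src  = ones(5, 5)
dest = zero(src)
blur!(dest, src; width=1)

# Interior pixels hold the average over their 3×3 window,
# which is 1 for a constant image ...
@assert all(x -> isapprox(x, 1), dest[2:4, 2:4])
# ... while the one-pixel border is never written and keeps its initial value.
@assert all(iszero, dest[1, :]) && all(iszero, dest[:, 5])
```

With `width = 5`, as used on the test image, the untouched border is correspondingly five pixels wide.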
The `roberts!` function applies the Roberts cross operator to the provided image. As above, it operates by default on all pixels in the image (provided they are sufficiently far from the boundaries), but can be restricted to work on a tile if the `range` argument is provided.
function roberts!(dest, src; range=axes(src))
    ri, rj = intersect.(range, contract.(axes(src), 1))
    for i in ri, j in rj
        dest[i,j] = (
            + (sqrt(src[i, j]) - sqrt(src[i+1,j+1]))^2
            + (sqrt(src[i+1,j]) - sqrt(src[i ,j+1]))^2
        )^(0.25)
    end
end
roberts! (generic function with 1 method)
Applying this edge detection filter on the original image produces the following results:
contour = zero(mat)
roberts!(contour, mat)
mat2img(PixelType, contour)
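To see what this operator responds to, it can be applied to a synthetic image containing a single vertical step edge. The sketch below repeats `contract` and `roberts!` so that it is self-contained; the 6×6 test matrix is a made-up input, not part of the original example.

```julia
contract(range, n) = range[begin+n:end-n]

function roberts!(dest, src; range=axes(src))
    ri, rj = intersect.(range, contract.(axes(src), 1))
    for i in ri, j in rj
        dest[i,j] = ((sqrt(src[i,j])   - sqrt(src[i+1,j+1]))^2 +
                     (sqrt(src[i+1,j]) - sqrt(src[i,j+1]))^2)^0.25
    end
end

# Columns 1 to 3 are black (0), columns 4 to 6 are white (1).
src  = [j > 3 ? 1.0 : 0.0 for i in 1:6, j in 1:6]
dest = zero(src)
roberts!(dest, src)

# Both diagonal differences equal 1 where the 2×2 stencil straddles the edge,
# so the response there is (1 + 1)^(1/4).
@assert all(x -> isapprox(x, 2^0.25), dest[2:5, 3])
# In uniform regions both differences vanish and the response is zero.
@assert all(iszero, dest[2:5, [2, 4, 5]])
```

The response is localized in column 3 because the stencil at `(i, j)` reads columns `j` and `j+1`.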
Chaining the blur and roberts filters may make edge detection less noisy:
function blur_roberts!(img; width, tmp=zero(img))
    blur!(tmp, img; width)
    roberts!(img, tmp)
end
mat1 = copy(mat)
tmp = zero(mat)
blur_roberts!(mat1; width, tmp)
mat2img(PixelType, mat1)
The elapsed time of this sequential version will serve as a reference to evaluate the performance of the other implementations:
using BenchmarkTools
t_seq = @belapsed blur_roberts!(x, width=$width, tmp=$tmp) setup=(x=copy(mat)) evals=1
2.01966651
The TiledIteration.jl package implements various tools for defining and iterating over disjoint tiles of a larger array. We'll use it to apply the filters tile by tile.
The `map_tiled!` higher-order function automates the application of a filter `fun!` to all pixels of an image `src` decomposed into tiles of size `ts`. This higher-order function is then used to define tiled versions of the blur and roberts filters.
using TiledIteration
function map_tiled!(fun!, dest, src, ts)
    for tile in TileIterator(axes(src), (ts, ts))
        fun!(dest, src, tile)
    end
end
blur_tiled!(dest, src, ts; width) = map_tiled!(dest, src, ts) do dest, src, tile
    blur!(dest, src; width, range=tile)
end
roberts_tiled!(dest, src, ts) = map_tiled!(dest, src, ts) do dest, src, tile
    roberts!(dest, src; range=tile)
end
function blur_roberts_tiled!(img, ts; width, tmp=zero(img))
    blur_tiled!(tmp, img, ts; width)
    roberts_tiled!(img, tmp, ts)
end
blur_roberts_tiled! (generic function with 1 method)
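The tile decomposition itself can be inspected on a small array. The following sketch relies only on the `TileIterator(axes, tilesize)` call used above and checks that the tiles partition the index space; the 10×7 array and the tile size of 4 are arbitrary values picked for illustration, and the exact shape of the tiles near the edges is left to the package.

```julia
using TiledIteration

A = zeros(Int, 10, 7)

# Each tile is a tuple of index ranges, one per dimension, so it can be
# splatted into an indexing expression.
for tile in TileIterator(axes(A), (4, 4))
    A[tile...] .+= 1      # mark every index visited through this tile
end

# Every index was visited exactly once: the tiles are disjoint and
# together cover the whole array.
@assert all(==(1), A)
```

This property is what allows a filter restricted to each tile in turn to write every output pixel exactly once.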
Decomposing the original image into tiles of size $512\times 512$, the tiled application of the filters yields the same result as above:
ts = 512
mat1 .= mat
blur_roberts_tiled!(mat1, ts; width, tmp)
mat2img(PixelType, mat1)
Depending on the system, the fact that memory is now accessed in blocks may (or may not) have a significant impact on the performance, due to cache effects.
t_tiled = @belapsed blur_roberts_tiled!(x, ts; width=$width, tmp=$tmp) setup=(x=copy(mat)) evals=1
1.898831304
Parallelizing the tiled filter application is relatively straightforward using DataFlowTasks.jl. As usual, it involves specifying which data is accessed by each task.
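The access-declaration pattern can be sketched on a toy problem first. The snippet below is an illustration only: the arrays `A` and `B` and the task labels are made up, and it uses just the `@dspawn`, `@R` and `@W` macros together with `wait`, in the same forms as in the rest of this example.

```julia
using DataFlowTasks

A = zeros(4)
B = zeros(4)

# First task: declares that it writes to A.
@dspawn begin
    @W A
    A .= 1
end label="fill A"

# Second task: declares that it reads A and writes B. Since it reads data
# written by the first task, it is expected to run after it.
@dspawn begin
    @R A
    @W B
    B .= 2 .* A
end label="B = 2A"

# A last task that only reads B gives us something to wait on.
result = @dspawn @R(B) label="result"
wait(result)

@assert B == fill(2.0, 4)
```

No explicit synchronization is written between the tasks; their ordering follows from the declared reads and writes.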
using DataFlowTasks
function blur_dft!(dest, src, ts; width)
    map_tiled!(dest, src, ts) do dest, src, tile
        outer = intersect.(expand.(tile, width), axes(src))
        @dspawn begin
            @R view(src, outer...)
            @W view(dest, tile...)
            blur!(dest, src; width, range=tile)
        end label="blur ($tile)"
    end
    @dspawn @R(dest) label="blur (result)"
end
function roberts_dft!(dest, src, ts)
    map_tiled!(dest, src, ts) do dest, src, tile
        outer = intersect.(expand.(tile, 1), axes(src))
        @dspawn begin
            @R view(src, outer...)
            @W view(dest, tile...)
            roberts!(dest, src; range=tile)
        end label="roberts ($tile)"
    end
    @dspawn @R(dest) label="roberts (result)"
end
roberts_dft! (generic function with 1 method)
Note how each filter spawns one task for each tile, and an extra task to get the results in the end. This allows applying a given filter independently of the other.
However, the filters remain composable: when applying both filters one after the other, no implicit synchronization is enforced at the end of the blurring stage, and the runtime may decide to intersperse blurring and roberts tasks (as long as the blurring of a tile and all its neighbors is performed before the application of the roberts filter on this tile).
function blur_roberts_dft!(img, ts; width, tmp=zero(img))
    blur_dft!(tmp, img, ts; width)
    roberts_dft!(img, tmp, ts)
    @dspawn @R(img) label="result"
end
blur_roberts_dft! (generic function with 1 method)
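The ordering constraint between the two stages (a roberts task needs the blurred values of its own tile and of the neighboring tiles) can be made concrete with a little range arithmetic. The sketch below is not how DataFlowTasks.jl tracks dependencies internally; it only enumerates, for a hypothetical 12×12 image cut into 4×4 tiles, which blur tiles overlap the region read by a given roberts task. The names `overlaps` and `blur_deps` are introduced here for illustration.

```julia
expand(range, n) = range[begin]-n:range[end]+n

# Two index boxes overlap when their ranges intersect along every dimension.
overlaps(a, b) = all(!isempty, intersect.(a, b))

ax    = (1:12, 1:12)                                    # image axes
tiles = [(i:i+3, j:j+3) for i in 1:4:12, j in 1:4:12]   # 3×3 grid of 4×4 tiles

# Blur tiles whose output is read by the roberts task working on `tile`:
# that task reads `tile` expanded by one pixel and clipped to the image.
function blur_deps(tile)
    halo = intersect.(expand.(tile, 1), ax)
    return [t for t in tiles if overlaps(halo, t)]
end

@assert length(blur_deps(tiles[1, 1])) == 4   # corner tile: itself and 3 neighbors
@assert length(blur_deps(tiles[2, 2])) == 9   # central tile: itself and 8 neighbors
```

A roberts task on a corner tile therefore does not have to wait for blur tasks on the opposite side of the image, which is what leaves room for interleaving the two stages.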
Again, this yields the same results on the test image:
mat1 .= mat;
blur_roberts_dft!(mat1, ts; width, tmp) |> wait
mat2img(PixelType, mat1)
t_dft = @belapsed wait(blur_roberts_dft!(x, ts; width=$width, tmp=$tmp)) setup=(x = copy(mat)) evals=1
0.296088332
DataFlowTasks.stack_weakdeps_env!()
using CairoMakie
barplot([t_seq, t_tiled, t_dft],
        axis = (; title = "Elapsed time [s]",
                xticks=(1:3, ["sequential", "tiled", "DataFlowTasks"])))
Status `~/.julia/scratchspaces/d1549cb6-e9f4-42f8-98cc-ffc8d067ff5b/weakdeps-1.10/Project.toml`
⌃ [13f3f980] CairoMakie v0.11.5
⌃ [e9467ef8] GLMakie v0.9.5
  [f526b714] GraphViz v0.2.0
⌅ [ee78f7c6] Makie v0.20.4
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
A comparison of the performance of all implementations shows that the DataFlowTasks-based implementation achieves a good speedup over the sequential version:
(;
nthreads = Threads.nthreads(),
speedup = t_seq / t_dft)
(nthreads = 8, speedup = 6.821162105097745)
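One way to put this number in perspective is to relate it to the number of threads. The short calculation below reuses the timings printed earlier in this run and computes the parallel efficiency, i.e. the fraction of the ideal linear speedup that is achieved; this metric is an addition for illustration and is not reported by the original notebook.

```julia
# Timings reported above for this particular run, in seconds.
t_seq, t_dft = 2.01966651, 0.296088332
nthreads = 8

speedup    = t_seq / t_dft         # how much faster than the sequential version
efficiency = speedup / nthreads    # 1.0 would mean perfect linear scaling

println((; speedup, efficiency))
```

For these timings the efficiency comes out at roughly 85%; the figures are specific to the machine and run shown here.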
We can gain more insight by collecting profiling data:
GC.gc()
mat1 .= mat;
log_info = DataFlowTasks.@log wait(blur_roberts_dft!(mat1, ts; width, tmp))
DataFlowTasks.describe(log_info)
• Elapsed time           : 0.297
  ├─ Critical Path       : 0.083
  ╰─ No-Wait             : 0.284
• Run time               : 2.373
  ├─ Computing           : 2.269
  │  ╰─ unlabeled        : 2.269
  ├─ Task Insertion      : 0.001
  ╰─ Other (idle)        : 0.104
The parallel trace shows how blur and roberts tasks are interspersed in the timeline:
trace = plot(log_info, categories=["blur", "roberts"])
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/z2T2o/src/scenes.jl:220
[ Info: Computing : 2.268793593999999
[ Info: Inserting : 0.0005881950000000001
[ Info: Other : 0.103786724178825
This notebook was generated using Literate.jl.