by Amit Ramon amit@riseup.net, 2019
This Jupyter notebook contains a brief introduction to Transducers in Clojure. The purpose of it is to present the topic mainly from a user's perspective, trying not to get too deep into the gory implementation details.
Assume we want to process the following data (sample of the Iris flower data set)
(def iris-data
[{:sepal_length 5.1 :sepal_width 3.5 :petal_length 1.4 :petal_width 0.2 :species "setosa"}
{:sepal_length 6.9 :sepal_width 3.1 :petal_length 4.9 :petal_width 1.5 :species "versicolor"}
{:sepal_length 4.7 :sepal_width 3.2 :petal_length 1.3 :petal_width 0.2 :species "setosa"}
{:sepal_length 7.1 :sepal_width 3.0 :petal_length 5.9 :petal_width 2.1 :species "virginica"}
{:sepal_length 4.6 :sepal_width 3.1 :petal_length 1.5 :petal_width 0.2 :species "setosa"}
{:sepal_length 5.0 :sepal_width 3.6 :petal_length 1.4 :petal_width 0.2 :species "setosa"}
{:sepal_length 7.0 :sepal_width 3.2 :petal_length 4.7 :petal_width 1.4 :species "versicolor"}
{:sepal_length 6.5 :sepal_width 2.8 :petal_length 4.6 :petal_width 1.5 :species "versicolor"}
{:sepal_length 4.9 :sepal_width 3.0 :petal_length 1.4 :petal_width 0.2 :species "setosa"}
{:sepal_length 5.7 :sepal_width 2.8 :petal_length 4.5 :petal_width 1.3 :species "versicolor"}
{:sepal_length 6.3 :sepal_width 3.3 :petal_length 6.0 :petal_width 2.5 :species "virginica"}
{:sepal_length 6.4 :sepal_width 3.2 :petal_length 4.5 :petal_width 1.5 :species "versicolor"}
{:sepal_length 5.8 :sepal_width 2.7 :petal_length 5.1 :petal_width 1.9 :species "virginica"}
{:sepal_length 5.5 :sepal_width 2.3 :petal_length 4.0 :petal_width 1.3 :species "versicolor"}
{:sepal_length 6.3 :sepal_width 2.9 :petal_length 5.6 :petal_width 1.8 :species "virginica"}])
#'user/iris-data
We are seeking a method that is:
Before presenting a candidate solution, we need some background...
In Clojure's context, Transducing is a technique for processing sequential data by applying to it a series of one or more transformations.
The term "Transduce" is a combination of the terms "Transform" and "Reduce".
[
transducers
were added in Clojure 1.7]
Think of shell pipe in Unix:
cat some-file | grep hello | cut -f 2 | sort | uniq | wc -l
Before diving into transducers
, let's see some examples—we'll start by explaining "Transformation" and "Reduction".
Apply a function to items of a collection, get a new, transformed collection.
map
function¶map
[f coll]
f
¶[item] --> new-item
Examples:
(map inc [1 2 3 4])
(2 3 4 5)
(map #(* % %) [1 2 3 4])
(1 4 9 16)
(map (fn [x] {x (count x)}) ["hello" "world" "Clojure" "is" "great"])
({"hello" 5} {"world" 5} {"Clojure" 7} {"is" 2} {"great" 5})
map
:¶(partition 3 (range 12))
((0 1 2) (3 4 5) (6 7 8) (9 10 11))
(filter odd? (range 10))
(1 3 5 7 9)
;; no initial value supplied
(reduce * (range 1 10)) ; 9!
362880
;; supplying an initial value
(reduce * 1 (range 1 10)) ; 9!
362880
Sometimes you must provide an initial value. Look at the following example—what would happen if you do not provide an initial value?
;; sum of squares - notice the significance of the initial value
(reduce
(fn [acc item] (+ acc (* item item))) ; reducing function
0 ; initial value
[3 4 5]) ; collection
50
;; converting a vector to a set
(reduce conj #{} [:a :b :c])
#{:c :b :a}
Almost there :-)
(def simple-data [2 -3 7 27 -8 2 5 9 -21 11])
(def step-1 (map inc simple-data))
(def step-2 (filter even? step-1))
(reduce + step-2)
42
(reduce + (filter even? (map inc simple-data)))
42
(->> simple-data (map inc) (filter even?) (reduce +))
42
How many passes over the data have we made in each of the above methods?
Cons:
The transduce function
transduce
[xform f coll]
[xform f init coll]
Where:
xform
- a transducerf
- a reducing functioninit
- an initial value for the reducing functioncoll
- a collection to processtransducer
do?¶A transducer
(or xform
) is a function that
reducing function
reducing function
, with the transformation 'built-in'Note: the reducing
operation is not changed, but a transformation is added 'on top of it'
This enables chaining of xforms
Example: transduce
vs. reduce
(transduce (map inc) + (range 5))
15
(reduce + (map inc (range 5)))
15
transducer
examples¶;; no need to provide an initial value!
(transduce (map #(* % % %)) conj [1 2 3 4])
[1 8 27 64]
;; here you must provide an initial value
(reduce conj [] (map #(* % % %) [1 2 3 4]))
[1 8 27 64]
The following Clojure's built-in functions create a transducer
when called without a collection:
map
cat
mapcat
filter
remove
take
take-while
take-nth
drop
drop-while
replace
partition-by
partition-all
keep
keep-indexed
map-indexed
distinct
interpose
dedupe
random-sample
simple-data ;; defined above, just a reminder
[2 -3 7 27 -8 2 5 9 -21 11]
(transduce (comp (map inc) (filter even?)) conj simple-data)
[-2 8 28 6 10 -20 12]
The same as above, in a (maybe) cleaner and more modular syntax
(def xform (comp (map inc) (filter even?)))
(transduce xform conj simple-data)
[-2 8 28 6 10 -20 12]
We can easily plug-in different reducing functions
(transduce xform * simple-data)
6451200
xform
Examples¶A 'complex' transducer. Notice that the type of the output of each transformation should be compatible with the type of input expected by the transformation following it.
For example, (random-sample 0.42)
produces (in this example) an int
, which is compatible with what (filter odd?)
expects. As another example, (partition-all 2)
produces pairs of integers, which are compatible with the input type expected by (map #(apply * %))
.
(def test-data (range 100))
(def test-xform-1
(comp
(random-sample 0.42)
(filter odd?)
(remove #(> % 23))
(map inc)
(take 5)
(partition-all 2)
(map #(apply * %))))
(transduce test-xform-1 + test-data)
198
(transduce test-xform-1 conj test-data)
[80 22]
(transduce (comp test-xform-1 (map #(/ % 2))) conj test-data)
[24 10]
(def xf-prepare-data (comp
(filter #(= (:species %) "virginica"))
(random-sample 0.7)))
(def xf-calculate (comp
(map (fn [item] [(:petal_length item) (:petal_width item)]))
(map (fn [[l w]] {:petal_ratio (/ l w) :petal_area (* l w)}))))
(def xf (comp xf-prepare-data xf-calculate))
(transduce xf conj iris-data)
[{:petal_ratio 2.8095238095238098, :petal_area 12.39} {:petal_ratio 2.4, :petal_area 15.0} {:petal_ratio 2.6842105263157894, :petal_area 9.69} {:petal_ratio 3.1111111111111107, :petal_area 10.08}]
comp
from built-in
functions and/or other xformsUsing the transduce
function is perhaps the most common way for
using transducers, but transducers can be used via other methods:
[xform coll]
[xform coll & colls]
The sequence
function is called with a transducer
and one or more
collections, and return a lazy sequence of applications of the
transform to the items of the collection(s).
(def my-seq (sequence (map inc) (range 100)))
(take 5 my-seq)
(1 2 3 4 5)
[to-coll xform from-coll]
The into
function adds the items of a given collection to a target
collection. When it is given a transducer
, it applies the transform
to the items of the given collection and then adds the transformed
items to the target collection.
(into #{} (map inc) (range 10))
#{7 1 4 6 3 2 9 5 10 8}
[xform* coll]
The eduction
function takes one or more transducers and a
collection, and returns a reducible/iterable application of the
transducers to the items of the collection.
Notice that no new collection is created when calling eduction
- the
transform is applied only when iterating over the object returned by
eduction
. Also note that you can call reduce or iterate the eduction
object as many times as you want.
(def ed (eduction (map inc) (filter odd?) [1 2 3 4 5 6 7 8 9 10]))
(reduce + 0 ed)
35
(take 3 ed)
(3 5 7)
eduction
is useful for bundling together a transform and a collection
in advance. You will later be able to apply reduce
to the
bundled object at different times and locations, possibly with different
reducing functions.
Some stuff worth looking at: