Introduction to Transducers

by Amit Ramon [email protected], 2019

This Jupyter notebook contains a brief introduction to Transducers in Clojure. The purpose of it is to present the topic mainly from a user's perspective, trying not to get too deep into the gory implementation details.

A Sample Task

Assume we want to process the following data (sample of the Iris flower data set)

In [1]:
(def iris-data
  [{:sepal_length 5.1 :sepal_width 3.5 :petal_length 1.4 :petal_width 0.2 :species "setosa"}
   {:sepal_length 6.9 :sepal_width 3.1 :petal_length 4.9 :petal_width 1.5 :species "versicolor"}
   {:sepal_length 4.7 :sepal_width 3.2 :petal_length 1.3 :petal_width 0.2 :species "setosa"}
   {:sepal_length 7.1 :sepal_width 3.0 :petal_length 5.9 :petal_width 2.1 :species "virginica"}
   {:sepal_length 4.6 :sepal_width 3.1 :petal_length 1.5 :petal_width 0.2 :species "setosa"}
   {:sepal_length 5.0 :sepal_width 3.6 :petal_length 1.4 :petal_width 0.2 :species "setosa"}
   {:sepal_length 7.0 :sepal_width 3.2 :petal_length 4.7 :petal_width 1.4 :species "versicolor"}
   {:sepal_length 6.5 :sepal_width 2.8 :petal_length 4.6 :petal_width 1.5 :species "versicolor"}
   {:sepal_length 4.9 :sepal_width 3.0 :petal_length 1.4 :petal_width 0.2 :species "setosa"}
   {:sepal_length 5.7 :sepal_width 2.8 :petal_length 4.5 :petal_width 1.3 :species "versicolor"}
   {:sepal_length 6.3 :sepal_width 3.3 :petal_length 6.0 :petal_width 2.5 :species "virginica"}
   {:sepal_length 6.4 :sepal_width 3.2 :petal_length 4.5 :petal_width 1.5 :species "versicolor"}
   {:sepal_length 5.8 :sepal_width 2.7 :petal_length 5.1 :petal_width 1.9 :species "virginica"}
   {:sepal_length 5.5 :sepal_width 2.3 :petal_length 4.0 :petal_width 1.3 :species "versicolor"}
   {:sepal_length 6.3 :sepal_width 2.9 :petal_length 5.6 :petal_width 1.8 :species "virginica"}])
Out[1]:
#'user/iris-data

Task Definition

  1. Analyze "virginica" samples
  2. Take a random sample
  3. Calculate petal's length-to-width ratio and area
  4. Return the result as a vector of maps

What Is Required From A Good Solution?

We are seeking a method that is:

  • modular
  • efficient
  • clean & legible

Before presenting a candidate solution, we need some background...

Transducers To The Rescue

What Are "Transducers"?

In Clojure's context, Transducing is a technique for processing sequential data by applying to it a series of one or more transformations.

The term "Transduce" is a combination of the terms "Transform" and "Reduce".

[transducers were added in Clojure 1.7]

Think of shell pipe in Unix:

cat some-file | grep hello | cut -f 2 | sort | uniq | wc -l

Before diving into transducers, let's see some examples—we'll start by explaining "Transformation" and "Reduction".

Transformation

Apply a function to items of a collection, get a new, transformed collection.

The map function

map
[f coll]

The mapping function f

[item] --> new-item

Examples:

In [2]:
(map inc [1 2 3 4])
Out[2]:
(2 3 4 5)
In [3]:
(map #(* % %) [1 2 3 4])
Out[3]:
(1 4 9 16)
In [4]:
(map (fn [x] {x (count x)}) ["hello" "world" "Clojure" "is" "great"])
Out[4]:
({"hello" 5} {"world" 5} {"Clojure" 7} {"is" 2} {"great" 5})

Transformation can be done with functions other than map:

In [5]:
(partition 3 (range 12))
Out[5]:
((0 1 2) (3 4 5) (6 7 8) (9 10 11))
In [6]:
(filter odd? (range 10))
Out[6]:
(1 3 5 7 9)

Reduction

Reduce a collection to a scalar value (a somewhat oversimplified definition)

The reduce function

 reduce
 [f coll]
 [f val coll]

The reducing function f

[acc item] --> new-acc

Examples:

In [7]:
;; no initial value supplied
(reduce * (range 1 10)) ; 9!
Out[7]:
362880
In [8]:
;; supplying an initial value
(reduce * 1 (range 1 10)) ; 9!
Out[8]:
362880

Sometimes you must provide an initial value. Look at the following example—what would happen if you do not provide an initial value?

In [9]:
;; sum of squares - notice the significance of the initial value
(reduce
    (fn [acc item] (+ acc (* item item))) ; reducing function
    0 ; initial value
    [3 4 5]) ; collection
Out[9]:
50

The result of reduction need not be a scalar! (But you'll have to supply an appropriate initial value.)

In [10]:
;; converting a vector to a set
(reduce conj #{} [:a :b :c])
Out[10]:
#{:c :b :a}

Combining Transformations "pre-transducers" era

Almost there :-)

In [11]:
(def simple-data [2 -3 7 27 -8 2 5 9 -21 11])
(def step-1 (map inc simple-data))
(def step-2 (filter even? step-1))
(reduce + step-2)
Out[11]:
42
In [12]:
(reduce + (filter even? (map inc simple-data)))
Out[12]:
42
In [13]:
(->> simple-data (map inc) (filter even?) (reduce +))
Out[13]:
42

How many passes over the data have we made in each of the above methods?

Combining Transformations "pre-transducers" era (cont.)

Cons:

  • intermediate sequences
  • memory usage
  • non-modular

Transducers Are Here

The transduce function

transduce
[xform f coll]
[xform f init coll]

Where:

  • xform - a transducer
  • f - a reducing function
  • init - an initial value for the reducing function
  • coll - a collection to process

What does a transducer do?

A transducer (or xform) is a function that

  • takes a reducing function
  • wraps it with some transformation
  • returns a new reducing function, with the transformation 'built-in'

Note: the reducing operation is not changed, but a transformation is added 'on top of it'

This enables chaining of xforms

Example: transduce vs. reduce

In [14]:
(transduce (map inc) + (range 5))
Out[14]:
15
In [15]:
(reduce + (map inc (range 5)))
Out[15]:
15

Some more transducer examples

In [16]:
;; no need to provide an initial value!
(transduce (map #(* % % %)) conj [1 2 3 4])
Out[16]:
[1 8 27 64]
In [17]:
;; here you must provide an initial value
(reduce conj [] (map #(* % % %) [1 2 3 4]))
Out[17]:
[1 8 27 64]

Functions That Create Transducers

The following Clojure's built-in functions create a transducer when called without a collection:

map cat mapcat filter remove take take-while take-nth drop drop-while replace partition-by partition-all keep keep-indexed map-indexed distinct interpose dedupe random-sample

defining transformations with transducers

xforms

In [18]:
simple-data ;; defined above, just a reminder
Out[18]:
[2 -3 7 27 -8 2 5 9 -21 11]
In [19]:
(transduce (comp (map inc) (filter even?)) conj simple-data)
Out[19]:
[-2 8 28 6 10 -20 12]

The same as above, in a (maybe) cleaner and more modular syntax

In [20]:
(def xform (comp (map inc) (filter even?)))

(transduce xform conj simple-data)
Out[20]:
[-2 8 28 6 10 -20 12]

We can easily plug-in different reducing functions

In [21]:
(transduce xform * simple-data)
Out[21]:
6451200

More xform Examples

A 'complex' transducer. Notice that the type of the output of each transformation should be compatible with the type of input expected by the transformation following it.

For example, (random-sample 0.42) produces (in this example) an int, which is compatible with what (filter odd?) expects. As another example, (partition-all 2) produces pairs of integers, which are compatible with the input type expected by (map #(apply * %)).

In [22]:
(def test-data (range 100))

(def test-xform-1
    (comp
        (random-sample 0.42)
        (filter odd?)
        (remove #(> % 23))
        (map inc)
        (take 5)
        (partition-all 2)
        (map #(apply * %))))

(transduce test-xform-1 + test-data)
Out[22]:
198
In [23]:
(transduce test-xform-1 conj test-data)
Out[23]:
[80 22]
In [24]:
(transduce (comp  test-xform-1 (map #(/ % 2))) conj test-data)
Out[24]:
[24 10]

Back To The Iris Data

Iris data

Here is our solution. To make it more modular, separate xforms are defined for intermediate tasks, and are later combined for making the final xform.

In [25]:
(def xf-prepare-data (comp
                       (filter #(= (:species %) "virginica"))
                       (random-sample 0.7)))

(def xf-calculate (comp
                    (map (fn [item] [(:petal_length item) (:petal_width item)]))
                    (map (fn [[l w]] {:petal_ratio (/ l w) :petal_area (* l w)}))))
    
(def xf (comp xf-prepare-data xf-calculate))    

(transduce xf conj iris-data)
Out[25]:
[{:petal_ratio 2.8095238095238098, :petal_area 12.39} {:petal_ratio 2.4, :petal_area 15.0} {:petal_ratio 2.6842105263157894, :petal_area 9.69} {:petal_ratio 3.1111111111111107, :petal_area 10.08}]

Benefits Of Transduers

  • Processing is done in one pass
  • Initial value is generated automatically (when not provided)
  • Modularity: xforms can be created using comp from built-in functions and/or other xforms
  • The "transformation" (described by an xform) is separated from the "reduction" (described by a reducing function); thus the same xform can be used with different reducing functions and vice versa

Additional Usages

Using the transduce function is perhaps the most common way for using transducers, but transducers can be used via other methods:

  • sequence - lazy pass
  • into
  • eduction
  • core.async channels

sequence

[xform coll]
[xform coll & colls]

The sequence function is called with a transducer and one or more collections, and return a lazy sequence of applications of the transform to the items of the collection(s).

In [26]:
(def my-seq (sequence (map inc) (range 100)))
(take 5 my-seq)
Out[26]:
(1 2 3 4 5)

into

[to-coll xform from-coll]

The into function adds the items of a given collection to a target collection. When it is given a transducer, it applies the transform to the items of the given collection and then adds the transformed items to the target collection.

In [27]:
(into #{} (map inc) (range 10))
Out[27]:
#{7 1 4 6 3 2 9 5 10 8}

eduction

[xform* coll]

The eduction function takes one or more transducers and a collection, and returns a reducible/iterable application of the transducers to the items of the collection.

Notice that no new collection is created when calling eduction - the transform is applied only when iterating over the object returned by eduction. Also note that you can call reduce or iterate the eduction object as many times as you want.

In [28]:
(def ed (eduction (map inc) (filter odd?) [1 2 3 4 5 6 7 8 9 10]))

(reduce + 0 ed)
Out[28]:
35
In [29]:
(take 3 ed)
Out[29]:
(3 5 7)

eduction is useful for bundling together a transform and a collection in advance. You will later be able to apply reduce to the bundled object at different times and locations, possibly with different reducing functions.