{ "value": { "nested": [ 1 ] } } ⟶ ( None, { "value": { "nested": [ 1 ] } } )
|
yield ( [1, 2, 3 ], { "value" : 10 } ) ⟶ { "value" : 10 }
</td>
</tr>
</table>
Overwriting singleReducerSortLocal.py
====================================================================================================
Single Reducer Local Sorted Output - MRJob
====================================================================================================
30 "do"
28 "dataset"
27 "creating"
27 "driver"
27 "experiements"
26 "def"
26 "descent"
25 "compute"
24 "code"
24 "done"
23 "descent"
22 "corresponding"
19 "consists"
19 "evaluate"
17 "drivers"
15 "computational"
15 "computing"
15 "document"
14 "center"
13 "efficient"
10 "clustering"
9 "change"
9 "during"
7 "contour"
5 "distributed"
4 "develop"
3 "different"
2 "cluster"
1 "cell"
0 "current"
Overwriting singleReducerSort.py
====================================================================================================
Single Reducer Sorted Output - MRJob
====================================================================================================
30 "do"
28 "dataset"
27 "creating"
27 "driver"
27 "experiements"
26 "def"
26 "descent"
25 "compute"
24 "code"
24 "done"
23 "descent"
22 "corresponding"
19 "consists"
19 "evaluate"
17 "drivers"
15 "computational"
15 "computing"
15 "document"
14 "center"
13 "efficient"
10 "clustering"
9 "change"
9 "during"
7 "contour"
5 "distributed"
4 "develop"
3 "different"
2 "cluster"
1 "cell"
0 "current"
Overwriting MRJob_unorderedTotalOrderSort.py
Found 4 items
-rw-r--r-- 1 koza supergroup 0 2016-08-20 19:28 /user/koza/sort/un_output/_SUCCESS
-rw-r--r-- 1 koza supergroup 116 2016-08-20 19:28 /user/koza/sort/un_output/part-00000
-rw-r--r-- 1 koza supergroup 125 2016-08-20 19:28 /user/koza/sort/un_output/part-00001
-rw-r--r-- 1 koza supergroup 152 2016-08-20 19:28 /user/koza/sort/un_output/part-00002
----/part-00000-----
B 19 consists
B 19 evaluate
B 17 drivers
B 15 computational
B 15 computing
B 15 document
B 14 center
B 13 efficient
----/part-00001-----
C 10 clustering
C 9 change
C 9 during
C 7 contour
C 5 distributed
C 4 develop
C 3 different
C 2 cluster
C 1 cell
C 0 current
----/part-00002-----
A 30 do
A 28 dataset
A 27 creating
A 27 driver
A 27 experiements
A 26 def
A 26 descent
A 25 compute
A 24 code
A 24 done
A 23 descent
A 22 corresponding
Overwriting MRJob_multipleReducerTotalOrderSort.py
====================================================================================================
Total Order Sort with multiple reducers - notice that the part files are also in order.
====================================================================================================
/part-00000
----------------------------------------------------------------------------------------------------
B 30 do
B 28 dataset
B 27 creating
B 27 driver
B 27 experiements
B 26 def
B 26 descent
B 25 compute
B 24 code
B 24 done
B 23 descent
B 22 corresponding
----------------------------------------------------------------------------------------------------
/part-00001
----------------------------------------------------------------------------------------------------
C 19 consists
C 19 evaluate
C 17 drivers
C 15 computational
C 15 computing
C 15 document
C 14 center
C 13 efficient
----------------------------------------------------------------------------------------------------
/part-00002
----------------------------------------------------------------------------------------------------
A 10 clustering
A 9 change
A 9 during
A 7 contour
A 5 distributed
A 4 develop
A 3 different
A 2 cluster
A 1 cell
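Here the prefixes were evidently chosen so that each one lands on the reducer whose index matches its range's rank: B on part-00000 for the highest counts, C on part-00001, A on part-00002. Concatenating the part files then yields a total order. A pure-Python simulation of that idea; the boundary values and the prefix-to-reducer mapping are read off the output above, not taken from MRJob_multipleReducerTotalOrderSort.py, which isn't shown:

```python
counts = [30, 28, 27, 26, 25, 19, 17, 15, 14, 13, 10, 9, 7, 5, 1]

# Illustrative range-to-prefix assignment, matching the part files above.
prefix_of = lambda c: "B" if c >= 22 else ("C" if c >= 13 else "A")
reducer_of = {"B": 0, "C": 1, "A": 2}  # fixed prefix -> part-file index

parts = {0: [], 1: [], 2: []}
for c in counts:
    parts[reducer_of[prefix_of(c)]].append(c)
for p in parts:
    parts[p].sort(reverse=True)  # each reducer sorts only its own input

# Concatenating part-00000, part-00001, part-00002 gives a total order.
total = parts[0] + parts[1] + parts[2]
```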
Overwriting MRJob_RandomSample.py
Out[88]:
(array([ 0.00986008, 0.04131517, 0.06350346, 0.09810477]),
array([ 0., 25., 50., 75.]))
Sample Data min 0.00986007565667
Sample Data max 0.514811335171
[0.009860075656665268, 0.04131516964443623, 0.06350346288221721, 0.0981047679070244]
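The output above shows the 0th, 25th, 50th, and 75th percentiles of a random sample of the key space. In a total-order sort, those sampled percentiles become the partition boundaries, so each reducer receives a roughly equal share of keys instead of hand-picked ranges. A sketch of boundary selection from a sample, in pure Python, with a synthetic uniform sample standing in for the job's actual data:

```python
import random

random.seed(42)  # synthetic stand-in for the job's sampled keys
sample = sorted(random.random() for _ in range(200))

# The sample's 25th/50th/75th percentiles become the cut points,
# analogous to the percentile array printed above.
boundaries = [sample[len(sample) * q // 100] for q in (25, 50, 75)]

def reducer_for(key):
    # Keys below the first boundary go to reducer 0, and so on;
    # keys at or above the last boundary go to the final reducer.
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)
```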
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Python version 2.7.11 (default, Dec 6 2015 18:57:58)
SparkSession available as 'spark'.
3 Partitions
==================================================
partition 0
==================================================
30 do
28 dataset
27 creating
27 driver
27 experiements
26 def
26 descent
25 compute
24 code
24 done
23 descent
22 corresponding
==================================================
partition 1
==================================================
19 consists
19 evaluate
17 drivers
15 computational
15 computing
15 document
14 center
13 efficient
==================================================
partition 2
==================================================
10 clustering
9 change
9 during
7 contour
5 distributed
4 develop
3 different
2 cluster
1 cell
0 current
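In Spark, sorting an RDD into three partitions range-partitions the data before sorting, so each partition holds one contiguous slice of the global order, exactly as the three partitions above show. A hypothetical pure-Python stand-in for that contract; this is not Spark's actual RangePartitioner, which samples keys to pick balanced boundaries:

```python
def sort_into_partitions(pairs, num_partitions=3):
    # Mimics the contract of a descending sortBy with numPartitions=3:
    # a global sort whose output keeps each partition's keys contiguous,
    # with partition i's keys all preceding partition i+1's.
    ordered = sorted(pairs, key=lambda kv: kv[0], reverse=True)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(num_partitions)]

parts = sort_into_partitions([(9, "change"), (30, "do"), (19, "consists"),
                              (1, "cell"), (26, "def"), (15, "document")])
```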