Our client wants to prove that our dataset is nicely distributed around the mean value of 100.
They asked us to run some tests on several subsections of it to make sure they won't get a non-descriptive section of our data.
Look at the mean value of each subtask.
# importing the necessary dependencies
import numpy as np
# loading the Dataset
dataset = np.genfromtxt('../../Datasets/normal_distribution_splittable.csv', delimiter=',')
Since we need several rows of our dataset to complete the given task, we have to use indexing to get the right rows.
To recap, index:
# indexing the second row of the dataset (2nd row)
second_row = dataset[1]
np.mean(second_row)
96.90038836444445
# indexing the last element of the dataset (last row)
last_row = dataset[-1]
np.mean(last_row)
100.18096645222221
# indexing the first value of the first row (1st row, 1st value)
first_val_first_row = dataset[0][0]
np.mean(first_val_first_row)
99.14931546
# indexing the last value of the second to last row (we want to use the combined access syntax here)
last_val_second_last_row = dataset[-2, -1]
np.mean(last_val_second_last_row)
101.2226037
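The indexing forms used above can be tried out on a small stand-in array. The sketch below uses a hypothetical 3x4 array named `toy` (an assumption for illustration, not the CSV data) and shows that chained access `toy[0][0]` and the combined access syntax `toy[0, 0]` select the same element.

```python
import numpy as np

# hypothetical 3x4 stand-in array, not the client's dataset
toy = np.arange(12).reshape(3, 4)

second_row = toy[1]                      # row with index 1 -> [4, 5, 6, 7]
last_row = toy[-1]                       # negative indices count from the end
first_val = toy[0][0]                    # chained access: first row, then first value
same_val = toy[0, 0]                     # combined access syntax, same element
last_val_second_last_row = toy[-2, -1]   # second to last row, last value

print(second_row.tolist())               # [4, 5, 6, 7]
print(last_row.tolist())                 # [8, 9, 10, 11]
print(first_val == same_val)             # True
print(last_val_second_last_row)          # 7
```

The combined syntax does the lookup in one step, whereas the chained form first selects a whole row and then indexes into it.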
Other than the single rows and values we also need to get some subsets of the dataset.
Use slicing for:
# slicing a 2x2 block (4 elements) at the intersection of the second and third rows and the second and third columns
subsection_2x2 = dataset[1:3, 1:3]
np.mean(subsection_2x2)
95.63393608250001
In such a small subsection, several below-average values can cluster together and pull the mean well below 100.
If we make our subsection larger, we have a higher chance of getting a more representative view of our data.
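One way to make this intuition concrete is the standard error of the mean, which shrinks with the square root of the sample size. The sketch below assumes the values are independent draws with a common standard deviation; `sigma = 5.0` is a hypothetical number chosen only for illustration and is not measured from the dataset.

```python
import math

# sigma is a hypothetical standard deviation, chosen only for illustration
sigma = 5.0

def standard_error(sigma, n):
    """Standard error of the mean of n independent samples."""
    return sigma / math.sqrt(n)

se_2x2 = standard_error(sigma, 4)         # a 2x2 subsection has 4 values
se_full = standard_error(sigma, 24 * 9)   # a (24, 9) array has 216 values

print(se_2x2)              # 2.5
print(round(se_full, 3))   # 0.34
```

Under these assumptions, the mean of four values is expected to scatter several times more widely around 100 than the mean of all 216 values.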
# selecting every second element of the fifth row
every_other_elem = dataset[4, ::2]
np.mean(every_other_elem)
98.35235805800001
# reversing the entry order, selecting the last row in reversed order
reversed_last_row = dataset[-1, ::-1]
np.mean(reversed_last_row)
100.18096645222222
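The slicing patterns above follow the `start:stop:step` form on each axis. The sketch below repeats them on a hypothetical 4x6 array named `toy` (an assumption for illustration, not the CSV data), so the selected values are easy to check by eye.

```python
import numpy as np

# hypothetical 4x6 stand-in array, not the client's dataset
toy = np.arange(24, dtype=float).reshape(4, 6)

top_left_2x2 = toy[0:2, 0:2]    # rows 0-1, columns 0-1
inner_2x2 = toy[1:3, 1:3]       # rows 1-2, columns 1-2
every_other = toy[3, ::2]       # every second value of the last row
reversed_row = toy[-1, ::-1]    # last row, values in reversed order

print(top_left_2x2.tolist())    # [[0.0, 1.0], [6.0, 7.0]]
print(inner_2x2.tolist())       # [[7.0, 8.0], [13.0, 14.0]]
print(every_other.tolist())     # [18.0, 20.0, 22.0]
print(reversed_row.tolist())    # [23.0, 22.0, 21.0, 20.0, 19.0, 18.0]

# reversing changes the order, not the values, so the mean agrees up to rounding
print(np.isclose(np.mean(reversed_row), np.mean(toy[-1])))  # True
```

Because floating-point addition is sensitive to summation order, the mean of a reversed row can differ from the mean of the original row in the last digit, which is likely why the two last-row results in this section are not bit-identical.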
Our client's team only wants to use a small subset of the given dataset.
Therefore we need to first split it into 3 equal pieces and then give them the first half of the first split.
They sent us this drawing to show us what they need:
1, 2, 3, 4, 5, 6        1, 2   3, 4   5, 6        1, 2
3, 2, 1, 5, 4, 6   =>   3, 2   1, 5   4, 6   =>   3, 2   =>   1, 2
5, 3, 1, 2, 4, 3        5, 3   1, 2   4, 3                     3, 2
1, 2, 2, 4, 1, 5        1, 2   2, 4   1, 5        5, 3
                                                  1, 2
Note:
We are using a very small dataset here, but imagine you have a huge amount of data and only want to look at a small subset of it to tweak your visualizations.
# splitting up our dataset horizontally into three equal parts (at one third and two thirds of the columns)
hor_splits = np.hsplit(dataset, 3)
# splitting up the first third vertically into two equal halves
ver_splits = np.vsplit(hor_splits[0], 2)
# requested subsection of our dataset which has only half the amount of rows and only a third of the columns
print("Dataset", dataset.shape)
print("Subset", ver_splits[0].shape)
Dataset (24, 9)
Subset (12, 3)
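The same two-step split can be checked against the 4x6 example from the client's drawing. The sketch below is a minimal illustration; the variable names `drawing`, `thirds`, `halves`, and `by_index` are assumptions introduced here.

```python
import numpy as np

# the 4x6 example from the client's drawing
drawing = np.array([
    [1, 2, 3, 4, 5, 6],
    [3, 2, 1, 5, 4, 6],
    [5, 3, 1, 2, 4, 3],
    [1, 2, 2, 4, 1, 5],
])

# an integer argument asks for that many equal sections
thirds = np.hsplit(drawing, 3)       # three (4, 2) column blocks
halves = np.vsplit(thirds[0], 2)     # two (2, 2) row blocks of the first third

print(halves[0].tolist())            # [[1, 2], [3, 2]]

# a sequence argument gives split positions instead of a section count
by_index = np.hsplit(drawing, [2, 4])
print(all(np.array_equal(a, b) for a, b in zip(thirds, by_index)))  # True
```

Passing an integer requires the axis length to be divisible by it; passing a list of indices allows unequal pieces.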
Once you have sent over the dataset, they tell you that they also need a way to iterate over the whole dataset element by element, as if it were a one-dimensional list.
However, they also want to know the position of each element in the dataset itself.
They send you this piece of code and tell you that it's not working as intended.
Come up with the right solution for their needs using the ndenumerate method.
# iterating over whole dataset (each value in each row)
curr_index = 0
for x in np.nditer(dataset):
    print(x, curr_index)
    curr_index += 1
99.14931546 0 104.03852715 1 107.43534677 2 97.85230675 3 98.74986914 4 98.80833412 5 96.81964892 6 98.56783189 7 101.34745901 8 92.02628776 9 97.10439252 10 99.32066924 11 97.24584816 12 92.9267508 13 92.65657752 14 105.7197853 15 101.23162942 16 93.87155456 17 95.66253664 18 95.17750125 19 90.93318132 20 110.18889465 21 98.80084371 22 105.95297652 23 98.37481387 24 106.54654286 25 107.22482426 26 91.37294597 27 100.96781394 28 100.40118279 29 113.42090475 30 105.48508838 31 91.6604946 32 106.1472841 33 95.08715803 34 103.40412146 35 101.20862522 36 103.5730309 37 100.28690912 38 105.85269352 39 93.37126331 40 108.57980357 41 100.79478953 42 94.20019732 43 96.10020311 44 102.80387079 45 98.29687616 46 93.24376389 47 97.24130034 48 89.03452725 49 96.2832753 50 104.60344836 51 101.13442416 52 97.62787811 53 106.71751618 54 102.97585605 55 98.45723272 56 100.72418901 57 106.39798503 58 95.46493436 59 94.35373179 60 106.83273763 61 100.07721494 62 96.02548256 63 102.82360856 64 106.47551845 65 101.34745901 66 102.45651798 67 98.74767493 68 97.57544275 69 92.5748759 70 91.37294597 71 105.30350449 72 92.87730812 73 103.19258339 74 104.40518318 75 101.29326772 76 100.85447132 77 101.2226037 78 106.03868807 79 97.85230675 80 110.44484313 81 93.87155456 82 101.5363647 83 97.65393524 84 92.75048583 85 101.72074646 86 96.96851209 87 103.29147111 88 99.14931546 89 101.3514185 90 100.37372248 91 106.6471081 92 100.61742813 93 105.0320535 94 99.35999981 95 98.87007532 96 95.85284217 97 93.97853495 98 97.21315663 99 107.02874163 100 102.17642112 101 96.74630281 102 95.93799169 103 102.62384733 104 105.07475277 105 97.59572169 106 106.57364584 107 95.65982034 108 107.22482426 109 107.19119932 110 102.93039474 111 85.98839623 112 95.19184343 113 91.32093303 114 102.35313953 115 100.39303522 116 100.39303522 117 92.0108226 118 97.75887636 119 93.18884302 120 100.44940274 121 108.09423367 122 96.50342927 123 99.58664719 124 95.19184343 125 103.1521596 126 109.40523174 127 
93.83969256 128 99.95827854 129 101.83462816 130 99.69982772 131 103.05289628 132 103.93383957 133 104.15899829 134 106.11454989 135 88.80221141 136 94.5081787 137 94.59300658 138 101.08830521 139 96.34622848 140 96.89244283 141 98.07122664 142 100.28690912 143 96.78266211 144 99.84251605 145 104.03478031 146 106.57052697 147 105.13668343 148 105.37011896 149 99.07551254 150 104.15899829 151 98.75108352 152 101.86186193 153 103.61720152 154 99.57859892 155 99.4889538 156 103.05541444 157 98.65912661 158 98.72774132 159 104.70526438 160 110.44484313 161 97.49594839 162 96.59385486 163 104.63817694 164 102.55198606 165 105.86078488 166 96.5937781 167 93.04610867 168 99.92159953 169 100.96781394 170 96.76814836 171 91.6779221 172 101.79132774 173 101.20773355 174 98.29243952 175 101.83845792 176 97.94046856 177 102.20618501 178 91.37294597 179 106.89005002 180 106.57364584 181 102.26648279 182 107.40064604 183 99.94318168 184 103.40412146 185 106.38276709 186 98.00253006 187 97.10439252 188 99.80873105 189 101.63973121 190 106.46476468 191 110.43976681 192 100.69156231 193 99.99579473 194 101.32113654 195 94.76253572 196 97.24130034 197 96.10020311 198 94.57421727 199 100.80409326 200 105.02389857 201 98.61325194 202 95.62359311 203 97.99762409 204 103.83852459 205 101.2226037 206 94.11176915 207 99.62387832 208 104.51786419 209 97.62787811 210 93.97853495 211 98.75108352 212 106.05042487 213 100.07721494 214 106.89005002 215
# iterating over whole dataset with indices matching the position in the dataset
for index, value in np.ndenumerate(dataset):
    print(index, value)
(0, 0) 99.14931546 (0, 1) 104.03852715 (0, 2) 107.43534677 (0, 3) 97.85230675 (0, 4) 98.74986914 (0, 5) 98.80833412 (0, 6) 96.81964892 (0, 7) 98.56783189 (0, 8) 101.34745901 (1, 0) 92.02628776 (1, 1) 97.10439252 (1, 2) 99.32066924 (1, 3) 97.24584816 (1, 4) 92.9267508 (1, 5) 92.65657752 (1, 6) 105.7197853 (1, 7) 101.23162942 (1, 8) 93.87155456 (2, 0) 95.66253664 (2, 1) 95.17750125 (2, 2) 90.93318132 (2, 3) 110.18889465 (2, 4) 98.80084371 (2, 5) 105.95297652 (2, 6) 98.37481387 (2, 7) 106.54654286 (2, 8) 107.22482426 (3, 0) 91.37294597 (3, 1) 100.96781394 (3, 2) 100.40118279 (3, 3) 113.42090475 (3, 4) 105.48508838 (3, 5) 91.6604946 (3, 6) 106.1472841 (3, 7) 95.08715803 (3, 8) 103.40412146 (4, 0) 101.20862522 (4, 1) 103.5730309 (4, 2) 100.28690912 (4, 3) 105.85269352 (4, 4) 93.37126331 (4, 5) 108.57980357 (4, 6) 100.79478953 (4, 7) 94.20019732 (4, 8) 96.10020311 (5, 0) 102.80387079 (5, 1) 98.29687616 (5, 2) 93.24376389 (5, 3) 97.24130034 (5, 4) 89.03452725 (5, 5) 96.2832753 (5, 6) 104.60344836 (5, 7) 101.13442416 (5, 8) 97.62787811 (6, 0) 106.71751618 (6, 1) 102.97585605 (6, 2) 98.45723272 (6, 3) 100.72418901 (6, 4) 106.39798503 (6, 5) 95.46493436 (6, 6) 94.35373179 (6, 7) 106.83273763 (6, 8) 100.07721494 (7, 0) 96.02548256 (7, 1) 102.82360856 (7, 2) 106.47551845 (7, 3) 101.34745901 (7, 4) 102.45651798 (7, 5) 98.74767493 (7, 6) 97.57544275 (7, 7) 92.5748759 (7, 8) 91.37294597 (8, 0) 105.30350449 (8, 1) 92.87730812 (8, 2) 103.19258339 (8, 3) 104.40518318 (8, 4) 101.29326772 (8, 5) 100.85447132 (8, 6) 101.2226037 (8, 7) 106.03868807 (8, 8) 97.85230675 (9, 0) 110.44484313 (9, 1) 93.87155456 (9, 2) 101.5363647 (9, 3) 97.65393524 (9, 4) 92.75048583 (9, 5) 101.72074646 (9, 6) 96.96851209 (9, 7) 103.29147111 (9, 8) 99.14931546 (10, 0) 101.3514185 (10, 1) 100.37372248 (10, 2) 106.6471081 (10, 3) 100.61742813 (10, 4) 105.0320535 (10, 5) 99.35999981 (10, 6) 98.87007532 (10, 7) 95.85284217 (10, 8) 93.97853495 (11, 0) 97.21315663 (11, 1) 107.02874163 (11, 2) 102.17642112 (11, 3) 
96.74630281 (11, 4) 95.93799169 (11, 5) 102.62384733 (11, 6) 105.07475277 (11, 7) 97.59572169 (11, 8) 106.57364584 (12, 0) 95.65982034 (12, 1) 107.22482426 (12, 2) 107.19119932 (12, 3) 102.93039474 (12, 4) 85.98839623 (12, 5) 95.19184343 (12, 6) 91.32093303 (12, 7) 102.35313953 (12, 8) 100.39303522 (13, 0) 100.39303522 (13, 1) 92.0108226 (13, 2) 97.75887636 (13, 3) 93.18884302 (13, 4) 100.44940274 (13, 5) 108.09423367 (13, 6) 96.50342927 (13, 7) 99.58664719 (13, 8) 95.19184343 (14, 0) 103.1521596 (14, 1) 109.40523174 (14, 2) 93.83969256 (14, 3) 99.95827854 (14, 4) 101.83462816 (14, 5) 99.69982772 (14, 6) 103.05289628 (14, 7) 103.93383957 (14, 8) 104.15899829 (15, 0) 106.11454989 (15, 1) 88.80221141 (15, 2) 94.5081787 (15, 3) 94.59300658 (15, 4) 101.08830521 (15, 5) 96.34622848 (15, 6) 96.89244283 (15, 7) 98.07122664 (15, 8) 100.28690912 (16, 0) 96.78266211 (16, 1) 99.84251605 (16, 2) 104.03478031 (16, 3) 106.57052697 (16, 4) 105.13668343 (16, 5) 105.37011896 (16, 6) 99.07551254 (16, 7) 104.15899829 (16, 8) 98.75108352 (17, 0) 101.86186193 (17, 1) 103.61720152 (17, 2) 99.57859892 (17, 3) 99.4889538 (17, 4) 103.05541444 (17, 5) 98.65912661 (17, 6) 98.72774132 (17, 7) 104.70526438 (17, 8) 110.44484313 (18, 0) 97.49594839 (18, 1) 96.59385486 (18, 2) 104.63817694 (18, 3) 102.55198606 (18, 4) 105.86078488 (18, 5) 96.5937781 (18, 6) 93.04610867 (18, 7) 99.92159953 (18, 8) 100.96781394 (19, 0) 96.76814836 (19, 1) 91.6779221 (19, 2) 101.79132774 (19, 3) 101.20773355 (19, 4) 98.29243952 (19, 5) 101.83845792 (19, 6) 97.94046856 (19, 7) 102.20618501 (19, 8) 91.37294597 (20, 0) 106.89005002 (20, 1) 106.57364584 (20, 2) 102.26648279 (20, 3) 107.40064604 (20, 4) 99.94318168 (20, 5) 103.40412146 (20, 6) 106.38276709 (20, 7) 98.00253006 (20, 8) 97.10439252 (21, 0) 99.80873105 (21, 1) 101.63973121 (21, 2) 106.46476468 (21, 3) 110.43976681 (21, 4) 100.69156231 (21, 5) 99.99579473 (21, 6) 101.32113654 (21, 7) 94.76253572 (21, 8) 97.24130034 (22, 0) 96.10020311 (22, 1) 94.57421727 (22, 
2) 100.80409326 (22, 3) 105.02389857 (22, 4) 98.61325194 (22, 5) 95.62359311 (22, 6) 97.99762409 (22, 7) 103.83852459 (22, 8) 101.2226037 (23, 0) 94.11176915 (23, 1) 99.62387832 (23, 2) 104.51786419 (23, 3) 97.62787811 (23, 4) 93.97853495 (23, 5) 98.75108352 (23, 6) 106.05042487 (23, 7) 100.07721494 (23, 8) 106.89005002
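The difference between the two loops is easier to see on a small array. The sketch below uses a hypothetical 2x3 array named `toy` (an assumption for illustration, not the CSV data). It also shows that a flat counter, like the one in the client's code, can be converted back into a position with `np.unravel_index` if a running index is kept anyway.

```python
import numpy as np

# hypothetical 2x3 stand-in array, not the client's dataset
toy = np.array([[10, 11, 12],
                [20, 21, 22]])

# ndenumerate yields (index tuple, value) pairs in row-major order
pairs = [(index, int(value)) for index, value in np.ndenumerate(toy)]
print(pairs[:4])  # [((0, 0), 10), ((0, 1), 11), ((0, 2), 12), ((1, 0), 20)]

# a running counter over nditer only gives the flat position;
# unravel_index converts it back to a (row, column) tuple
flat_positions = []
for flat_index, value in enumerate(np.nditer(toy)):
    row, col = np.unravel_index(flat_index, toy.shape)
    flat_positions.append(((int(row), int(col)), int(value)))

print(flat_positions == pairs)  # True
```

For the client's use case, `np.ndenumerate` is the more direct choice, since it returns the position together with each value.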