In [1]:

```
%load_ext watermark
```

In [2]:

```
%watermark -v -p numpy -d -u
```

[More information](https://github.com/rasbt/watermark) about the `watermark` magic command extension.

This is a quick overview of how to deal with missing values (i.e., `NaN`s, short for "Not-a-Number") in NumPy, and I am happy to expand it over time. There will also be a separate one for pandas at some point!

I would be happy to hear your comments and suggestions. Please feel free to drop me a note via Twitter, email, or Google+.

- Sample data from a CSV file
- Determining if a value is missing
- Counting the number of missing values
- Calculating the sum of an array that contains NaNs
- Removing all rows that contain missing values
- Converting missing values to 0
- Converting certain numbers to NaN
- Removing all missing elements from an array

Let's assume that we have a CSV file with missing elements like the one shown below.

In [3]:

```
%%file example.csv
1,2,3,4
5,6,,8
10,11,12,
```

The `np.genfromtxt` function has a `missing_values` parameter, and by default it translates empty fields into `np.nan` when the data is read as floating-point numbers. This allows us to construct a new NumPy `ndarray` object even if elements are missing.

In [4]:

```
import numpy as np
ary = np.genfromtxt('./example.csv', delimiter=',')
print('%s x %s array:\n' %(ary.shape[0], ary.shape[1]))
print(ary)
```
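As a small aside, the sketch below shows how the fill value for missing fields could be changed via the `filling_values` parameter of `np.genfromtxt`. To keep it self-contained, it reads the same content from an in-memory string instead of `example.csv`; the names `csv_text`, `ary_default`, and `ary_filled` are made up for this illustration.

```python
import io
import numpy as np

# Same content as example.csv, held in memory so the sketch is self-contained.
csv_text = "1,2,3,4\n5,6,,8\n10,11,12,\n"

# Default behaviour: empty fields become nan in a float array.
ary_default = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# filling_values substitutes a value of our choice for the empty fields.
ary_filled = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                           filling_values=-1)

print(ary_default)
print(ary_filled)
```

Keeping the default `np.nan` is usually the safer choice, since a sentinel like `-1` can be mistaken for real data later on.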

A handy way to test whether a value is `NaN` is the `np.isnan` function.

In [5]:

```
np.isnan(np.nan)
```

Out[5]:

It is especially useful to create boolean masks for the so-called "fancy indexing" of NumPy arrays, which we will come back to later.

In [6]:

```
np.isnan(ary)
```

Out[6]:
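One reason to reach for `np.isnan` rather than a plain comparison: under IEEE 754 floating-point rules, `NaN` never compares equal to anything, including itself. The following minimal sketch (with a made-up array `x`) illustrates the difference.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

# An equality test against np.nan never matches, not even for the nan entry.
eq_mask = (x == np.nan)

# np.isnan is the reliable way to build the mask.
nan_mask = np.isnan(x)

print(eq_mask)
print(nan_mask)
```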

To find out how many elements are missing in our array, we can use the `np.isnan` function that we have seen in the previous section.

In [7]:

```
np.count_nonzero(np.isnan(ary))
```

Out[7]:

If we want to determine the number of non-missing elements, we can simply invert the returned Boolean mask via the handy "tilde" sign (`~`).

In [8]:

```
np.count_nonzero(~np.isnan(ary))
```

Out[8]:
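The same Boolean mask can also be summed along an axis to count missing values per column or per row. The sketch below rebuilds the example array by hand so that it runs on its own; `missing_per_column` and `missing_per_row` are names chosen just for this illustration.

```python
import numpy as np

# The example array, written out by hand instead of read from example.csv.
ary = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]], dtype=float)

mask = np.isnan(ary)

# True counts as 1 when summing, so these are per-axis NaN counts.
missing_per_column = mask.sum(axis=0)
missing_per_row = mask.sum(axis=1)

print('missing per column:', missing_per_column)
print('missing per row:', missing_per_row)
```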

As we will find out via the following code snippet, we can't use NumPy's regular `sum` function to calculate the sum of an array that contains `NaN`s.

In [9]:

```
np.sum(ary)
```

Out[9]:

Since the `np.sum` function propagates the `NaN` and returns `nan`, use `np.nansum` instead, which treats `NaN`s as zero:

In [10]:

```
print('total sum:', np.nansum(ary))
```

In [11]:

```
print('column sums:', np.nansum(ary, axis=0))
```

In [12]:

```
print('row sums:', np.nansum(ary, axis=1))
```
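NumPy offers similar NaN-aware variants for other reductions, for example `np.nanmean`, `np.nanmax`, and `np.nanmin`. A brief sketch, again with the example array written out by hand so that it runs on its own:

```python
import numpy as np

ary = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]], dtype=float)

# Column means computed over the non-missing entries only.
col_means = np.nanmean(ary, axis=0)

# Largest and smallest non-missing values in the whole array.
largest = np.nanmax(ary)
smallest = np.nanmin(ary)

print('column means:', col_means)
print('max:', largest, 'min:', smallest)
```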

Here, we will use the Boolean mask again to return only those rows that DON'T contain missing values. And if we want to get only the rows that contain `NaN`s, we could simply drop the `~`.

In [14]:

```
ary[~np.isnan(ary).any(axis=1)]
```

Out[14]:
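The same idea carries over to columns: reducing the mask along `axis=0` and indexing the second dimension keeps only the columns without missing values. A minimal sketch, with the example array written out by hand and `complete_cols` as an illustrative name:

```python
import numpy as np

ary = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]], dtype=float)

# True for every column that contains at least one NaN.
col_has_nan = np.isnan(ary).any(axis=0)

# Keep all rows, but only the columns without missing values.
complete_cols = ary[:, ~col_has_nan]

print(complete_cols)
```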

Certain operations, algorithms, and other analyses might not work with `NaN` objects in our data array. But that's not a problem: the convenient `np.nan_to_num` function will convert them to the value 0.

In [15]:

```
ary0 = np.nan_to_num(ary)
ary0
```

Out[15]:
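If 0 is not a suitable replacement, `np.nan_to_num` also accepts a `nan` keyword argument, assuming a reasonably recent NumPy (the keyword was added in version 1.17). The sketch below uses `-1.0` purely as an example value; `ary_neg` is a name made up for this illustration.

```python
import numpy as np

ary = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]], dtype=float)

# Replace NaNs with -1.0 instead of the default 0.0 (requires NumPy >= 1.17).
ary_neg = np.nan_to_num(ary, nan=-1.0)

print(ary_neg)
```

Note that `np.nan_to_num` returns a new array by default, so `ary` itself still contains the `NaN`s afterwards.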

Vice versa, we can also convert any number to an `np.nan` object. Here, we use the array that we created in the previous section and convert the `0`s back to `np.nan` objects.

In [16]:

```
ary0[ary0==0] = np.nan
ary0
```

Out[16]:
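One caveat worth keeping in mind: `NaN` is a floating-point value, so this assignment only works on float arrays. Assigning `np.nan` into an integer array raises a `ValueError`. A minimal sketch of the workaround, using a made-up integer array `counts`:

```python
import numpy as np

counts = np.array([3, 0, 7, 0])  # integer dtype, cannot hold NaN

# Convert to float first, then replace the zeros with NaN.
as_float = counts.astype(float)
as_float[as_float == 0] = np.nan

print(counts.dtype, as_float.dtype)
print(as_float)
```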

This one is a little bit more tricky. We can remove missing values via a combination of the Boolean mask and fancy indexing; however, this has the disadvantage that it will flatten our array (we can't just punch holes into a NumPy array).

In [17]:

```
ary[~np.isnan(ary)]
```

Out[17]:

Thus, this method works better on individual rows:

In [21]:

```
x = np.array([1, 2, np.nan])
# x is already an ndarray, so no extra np.array() call is needed
x[~np.isnan(x)]
```

Out[21]:
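To apply this row by row to a 2D array, one option is to collect the filtered rows in a plain Python list, since the rows may end up with different lengths and no longer fit into a rectangular array. A sketch under that assumption, with the example array written out by hand and `rows` as an illustrative name:

```python
import numpy as np

ary = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]], dtype=float)

# One 1D array per row, each with its NaNs removed.
rows = [row[~np.isnan(row)] for row in ary]

for row in rows:
    print(row)
```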