Distributions and Sampling with NumPy

EXERCISES

1. When we print my_list and my_array, the output looks almost the same. How would you check the type of each data structure?

my_list = [11, 12, 33]
print(my_list)

my_array = np.array([11, 12, 33])
print(my_array)

SOLUTION

In [6]:

import numpy as np

In [7]:

my_list = [11, 12, 33]
print(my_list, type(my_list))

my_array = np.array([11, 12, 33])
print(my_array, type(my_array))

# printing the type as well tells us which data structure each one is

[11, 12, 33] <class 'list'>
[11 12 33] <class 'numpy.ndarray'>
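
As a complement to printing type(), the built-in isinstance gives a True/False answer that can be used directly in a condition. A minimal sketch (the variable names for the results are only illustrative):

```python
import numpy as np

my_list = [11, 12, 33]
my_array = np.array([11, 12, 33])

# isinstance returns a boolean, which is handy inside if-statements
list_is_list = isinstance(my_list, list)
array_is_ndarray = isinstance(my_array, np.ndarray)
array_is_list = isinstance(my_array, list)

print(list_is_list, array_is_ndarray, array_is_list)  # True True False
```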

2. Define two vectors $X$ and $Y$, each with 5 numbers of your choice. Then try to compute $(X + Y) / 2$. Is this operation possible with both lists and arrays? If not, why?

SOLUTION

In [8]:

#Trying out with list
X = [11, 12, 12, 14, 15]
Y = [10, 20, 30, 40, 50]

(X + Y) /2

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-ee5320697cea> in <module>
      3 Y = [10, 20, 30, 40, 50]
      4 
----> 5 (X + Y) /2

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [9]:

#Trying out with array
X = np.array([11, 12, 12, 14, 15])
Y = np.array([10, 20, 30, 40, 50])

(X + Y) /2

Out[9]:

array([10.5, 16. , 21. , 27. , 32.5])

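It does not work with lists because, for Python lists, the + operator means concatenation: X + Y produces a single list of 10 elements rather than an element-wise sum, and a list cannot be divided by an integer, which is what the TypeError reports. NumPy arrays, on the other hand, behave like mathematical vectors: arithmetic operators are applied element by element, which makes the operation possible.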
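
To see the difference concretely, the sketch below shows what + does to two lists and how the same average could be computed in plain Python with zip. This is only one possible way to do it; the names concatenated, list_mean and array_mean are illustrative.

```python
import numpy as np

X = [11, 12, 12, 14, 15]
Y = [10, 20, 30, 40, 50]

# With lists, + joins the two lists instead of adding element by element
concatenated = X + Y
print(len(concatenated))  # 10

# Pure-Python equivalent of (X + Y) / 2: pair the elements up with zip
list_mean = [(x + y) / 2 for x, y in zip(X, Y)]
print(list_mean)  # [10.5, 16.0, 21.0, 27.0, 32.5]

# The NumPy version gives the same values
array_mean = (np.array(X) + np.array(Y)) / 2
print(array_mean)
```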

3. Let's say that I do want to import the whole stats library and then use the Uniform distribution

import scipy.stats as stats

Why is the following function call not working anymore? Make the appropriate changes to fix the problem.

uniform.rvs(size=100)

SOLUTION

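The statement import scipy.stats as stats only binds the name stats; it does not create a standalone name uniform, so the function has to be reached through the alias or imported explicitly. There are two ways to do this: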

In [10]:

# importing the whole library
import scipy.stats as stats

# adding the alias at the beginning
stats.uniform.rvs(size=10)

Out[10]:

array([0.86471247, 0.86690817, 0.05705536, 0.46569065, 0.29175586,
       0.66459976, 0.79557259, 0.85127024, 0.02890321, 0.03092554])

In [11]:

# importing only the function needed
from scipy.stats import uniform

# no alias needed
uniform.rvs(size=10)

Out[11]:

array([0.28894512, 0.47649095, 0.21860038, 0.80341151, 0.32551943,
       0.82727626, 0.48046056, 0.08225253, 0.66734134, 0.31230785])

4. All three of the following calls generate a sample of 20 random numbers from a Normal distribution with mean = 10 and sd = 5.

They have the same parameters, but do they produce the same results? What are the differences or similarities among them?

norm.rvs(10, 5, 20)

norm.rvs(loc=10, scale=5, size=20, random_state=2021)

norm.rvs(random_state=2021, scale=5, loc=10, size=20)

SOLUTION

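When the argument names are not specified, the order of the parameters matters: the values are read as loc, scale and size, in that order. No random_state is set in this call, so each run returns different numbers.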

norm.rvs(10, 5, 20)

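When the argument names are specified, the parameters can be given in any order. This is why the second and third calls in the exercise, which share random_state=2021, return exactly the same sample. Without random_state the order still does not matter, but the numbers change from run to run, like so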

norm.rvs(loc=10, scale=5, size=20)

norm.rvs(scale=5, loc=10, size=20)
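
The effect of random_state can be checked directly. The sketch below compares the seeded calls from the exercise using np.array_equal; the result variable names are illustrative only.

```python
import numpy as np
from scipy.stats import norm

# Positional arguments: read as loc, scale, size (no seed, so results vary)
unseeded = norm.rvs(10, 5, 20)

# Keyword arguments in two different orders, with the same seed
seeded_a = norm.rvs(loc=10, scale=5, size=20, random_state=2021)
seeded_b = norm.rvs(random_state=2021, scale=5, loc=10, size=20)

# Same seed and same parameters give the same sample, whatever the order
print(np.array_equal(seeded_a, seeded_b))  # True

# The unseeded sample has the same shape but, in practice, different values
print(unseeded.shape, np.array_equal(unseeded, seeded_a))
```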

5. Given the following vector in an array form. Explore why it cannot be sampled. How do you fix this?

V = np.array([0, 1, 2, 3, 4, 5])
random.sample(V, 2)

In [12]:

V = np.array([0, 1, 2, 3, 4, 5])
random.sample(V, 2) 

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-a9019d9632b4> in <module>
      1 V = np.array([0, 1, 2, 3, 4, 5])
----> 2 random.sample(V, 2)

~/opt/anaconda3/lib/python3.7/random.py in sample(self, population, k)
    315             population = tuple(population)
    316         if not isinstance(population, _Sequence):
--> 317             raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
    318         randbelow = self._randbelow
    319         n = len(population)

TypeError: Population must be a sequence or set.  For dicts, use list(d).

SOLUTION

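This can't be sampled for two reasons.

First, the random library needs to be imported before random.sample can be called.
Second, random.sample expects a sequence such as a list or a tuple, and a NumPy array does not qualify (this is the TypeError shown above), so the array needs to be converted to a list.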

In [13]:

import random
V = np.array([0, 1, 2, 3, 4, 5])
V = list(V)
random.sample(V, 2) 

Out[13]:

[1, 2]
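
Two possible variations on this fix are sketched below. The first uses the array's tolist() method, which returns plain Python integers, whereas list(V) keeps NumPy integer objects as elements. The second is an alternative not used in the original solution: sampling directly from the array with NumPy's random generator, where replace=False asks for sampling without replacement, matching the behaviour of random.sample.

```python
import random
import numpy as np

V = np.array([0, 1, 2, 3, 4, 5])

# Option 1: convert to a list of plain Python ints, then use random.sample
sample_from_list = random.sample(V.tolist(), 2)

# Option 2: stay in NumPy; replace=False means no element is drawn twice
rng = np.random.default_rng()
sample_from_array = rng.choice(V, size=2, replace=False)

print(sample_from_list, sample_from_array)
```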

6. Define a vector with elements of your choice; you can use a list or an array. Then write code that takes a sample containing 20% of the total number of elements.

Hint: you can use the len() function

SOLUTION

In [20]:

import random

def percetage_sample(vector, percentage):
    "This function takes a vector and the desired sample percentage in decimal notation"
    # calculate the number of elements to sample
    to_sample = len(vector)*percentage
    # truncate to an integer (int() drops the decimal part, it does not round)
    to_sample = int(to_sample)
    # do random sample
    sampled = random.sample(list(vector), to_sample) 
    return sampled

In [21]:

netflix = ["Luis Miguel", "New Amsterdam", "Lupin", "Shtisel", "Taco Chronicles", "The Queen's Gambit", 
           "Too Hot to Handle", "The Crown", "Rick and Morty", "Anne+", "Selling Sunset", "Vikings"] 

# Sampling 20% of the Netflix list
percetage_sample(netflix, 0.20)

Out[21]:

['Too Hot to Handle', 'Rick and Morty']
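
One detail worth noting: because int() truncates, a short vector can end up with a sample size of 0. The sketch below shows this and one possible alternative that rounds and never goes below one element. The helper sample_size is hypothetical and not part of the original solution.

```python
import random

def sample_size(n_elements, percentage):
    # hypothetical helper: round to the nearest integer, never below 1
    return max(1, round(n_elements * percentage))

short_vector = ["a", "b", "c", "d"]

truncated = int(len(short_vector) * 0.20)       # 0.8 is truncated to 0
rounded = sample_size(len(short_vector), 0.20)  # 0.8 is rounded to 1

print(truncated, rounded)  # 0 1
print(random.sample(short_vector, rounded))
```

Whether an empty sample or a minimum of one element is the right behaviour depends on the use case; the original function simply returns an empty list when 20% of the vector is less than one element.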