#!/usr/bin/env python # coding: utf-8 # # Introduction to Python for Scientific Computing - Tutorial n°3 # ***Mohammed Ait Lahcen, Economic Theory Group, Faculty of Business and Economics, University of Basel*** # In this tutorial we take a look at three of the most important scientific packages in Python: NumPy, Matplotlib and Scipy. # # For a more detailed introduction to Python for scientific computing you can refer to the lectures available at [QuantEcon.org](https://lectures.quantecon.org/py/learning_python.html). # ## Import statements and packages # Since Python is a general purpose language, the core language is quite small for practical reasons (easier maintainance and learning). Most of the "advanced" functionalities are created by developers as separate packages (also called modules or libraries) that must be installed and imported in order to be used. This is in particular the case for scientific computing in Python. # For example, you can import the Numpy package by writing: # In[1]: import numpy numpy.sqrt(16) # The next way of doing is the "standard" way: # In[2]: import numpy as np np.sqrt(16) # Another way of importing the `sqrt` function could be: # In[3]: from numpy import sqrt sqrt(16) # The advantage of the third method is less typing if `sqrt` is used often in the code. However, in a long program, it’s hard for the reader to know which package the function `sqrt` has been imported from. This can be important if similarly named functions are available from several packages. # # Using `*` instead of `sqrt` will import all of the content of Numpy. It is highly recommended to avoid doing this because it imports lots of variable names which can be a potential source of conflict. # ## Numpy # Numpy is the fundamental package for scientific computing with Python. It offers a powerful N-dimensional array object, sophisticated array operations and functions, tools for integrating C/C++ and Fortran code, useful linear algebra, Fourier transform, and random number capabilities. # # We start by importing Numpy into the `global` namespace (if it's not aleady there): # In[4]: import numpy as np # ### Math with Numpy # Numpy offers a set of highly optimized standard math functions: # In[5]: x = 5 np.exp(x) # In[6]: y = -3 np.abs(y) # In[7]: np.pi # In[8]: np.e # ### Arrays # Numpy’s main object is the homogeneous multidimensional array `numpy.ndarray`. An array is a table of elements (usually numbers), **all of the same type**, indexed by a tuple of positive integers. In Numpy dimensions are called axes. The number of axes is called rank. # # Numpy arrays are less flexible but much more memory efficient and faster to access compared to Python lists. # In[9]: a = np.zeros(3) a # In[10]: type(a) # The most important attributes of an `ndarray` object are: # * `ndarray.dtype` : data type of the elements in the array. # * `ndarray.ndim` : the number of axes (dimensions) of the array. # * `ndarray.shape` : the dimensions or rank of the array (size of the array in each dimension). # * `ndarray.size` : the total number of elements of the array. # # One-dimensional arrays are printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices... # In[11]: a = np.arange(15) a # In[12]: a.dtype # In[13]: a = a.reshape(3, 5) a # In[14]: a.shape # In[15]: a.ndim # In[16]: a.size # In[17]: len(a) # ### Array creation # There are several ways to create arrays. You can create an empty array: # In[18]: a = np.empty(5) a # You can also create a regular grid: # In[19]: a = np.linspace(-1,1,11) # Specifies the grid size a # In[20]: a = np.arange(-1,1,0.2) # Specifies the grid step, excludes the upper bound a # You can also create arrays of zeros or ones: # In[21]: a = np.ones(3) a # In[22]: a = np.identity(3) a # and arrays of random numbers: # In[23]: a = np.random.rand(3,2) a # NumPy arrays can be made from regular Python sequences (lists or tuples). `np.array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on. # In[24]: b = np.array([[1,2,3], [4,5,6]]) b # `np.array` accepts a single list (of lists) of numbers as an argument and not multiple numeric arguments: # In[79]: a = np.array(1,2,3,4) # WRONG a # In[25]: a = np.array([1,2,3,4]) # RIGHT a # ### Array indexing # In[26]: a = np.linspace(1,5,5) a # One-dimensional array indexing is similar to list indexing: # In[27]: a[0] # To access the last element use # In[28]: a[-1] # `a[:]` : returns a copy of the whole array # In[29]: a[:] # `a[start:]` : returns elements from start to the rest of the array # In[30]: a[2:] # `a[:end]` : returns elements from the beginning through end-1 # In[31]: a[:5] # `a[start:end]` : returns elements from start through end-1 # In[32]: a[2:4] # Indexing multi-dimensional arrays is different from list indexing: # In[33]: b = np.array([[1, 2], [3, 4]]) b # In[34]: b[0] # or # In[35]: b[0,:] # Slicing along the second dimension: # In[36]: b[:,1] # To index an individual element, we have to provide the coordinates: # In[37]: b[1,0] # ### Array methods # The `ndarray` data type offers several methods: # In[38]: a = np.array([3, 0, 15, 5]) a # In[39]: a.sort() a # In[40]: a.sum() # In[41]: a.min() # In[42]: a.mean() # In[43]: a.var() # In[44]: a.shape = (2,2) a # In[45]: a.T # ### Array operations # Arithmetic operations apply element-wise: # In[46]: a = np.array([0,1,2,3]) b = np.array([4,5,6,7]) # In[47]: a + b # In[48]: a * b # In[49]: a - 4 # In[50]: a.shape = (2,2) b.shape = (2,2) # In[51]: a # In[52]: b # Element-wise multiplication: # In[53]: a * b # For matrix multiplication, you can either use the `@` operator: # In[54]: a @ b # or the array `dot` method # In[55]: a.dot(b) # or the Numpy `dot` product function: # In[56]: np.dot(a,b) # Numpy math functions are optimized to work directly on arrays: # In[57]: np.exp(a) # In[58]: np.sin(a) # ## Matplotlib # There are many powerful possibilities when it comes to data vizualisation in Python (Matplotlib, Seaborn, Bokeh, ggplot, etc). The one mostly used for non-interactive 2D and 3D plots is Matplotlib. The Matplotlib library is similar to Matlab's plotting interface. # In[59]: import matplotlib.pyplot as plt # We first generate two arrays to plot: # In[60]: x = np.linspace(-2, 2, 100) y = x ** 2 z = x ** 3 # In[61]: plt.figure() plt.plot(x,y,'-b',label = r'$x^2$') plt.plot(x,z,'--g',label = r'$x^3$') plt.legend(loc='best') plt.title('A simple example',fontsize='20') plt.ylabel(r'$f(x)$',fontsize='16') plt.xlabel(r'$x$',fontsize='16') plt.show() #fig.savefig('foo.pdf') # The Jupyter magic command `%matplotlib` enables the choice of the plotting backend. To list all available backends type: # In[62]: get_ipython().run_line_magic('matplotlib', '--list') # The backend `notebook` allows for interactive figures inside the notebook: # In[63]: get_ipython().run_line_magic('matplotlib', 'notebook') # ## SciPy # SciPy is the main stack of scientific libraries for Python. It is built on top of Numpy and includes routines for almost everything you might need in your computational economics and finance work: # * Optimization and root finding # * Probability distributions # * Numerical integration and differentiation # * Function interpolation # * Linear algebra # * ... # # You can find a list of all SciPy subpackages in [this online tutorial](https://docs.scipy.org/doc/scipy/reference/tutorial/index.html). # # In what follows, we will limit ourselves to exploring the use of the numerical optimization subpackage `scipy.optimize` which provides several commonly used root-finding and optimization algorithms. A detailed list is available [here](https://docs.scipy.org/doc/scipy/reference/optimize.html#module-scipy.optimize). # # Note that the common practice when using Scipy is to import the required subpackages indiviually instead of importing all of Scipy. # ### Root finding # The function `root` offers a simple interface to several root finding methods. In order to use it, we first define the objective function, then pass it to `root` along with an initial guess of the roots. # # Let's give it a try with the following simple quadratic equation: # $$ # x^2 - 2 x = 0 # $$ # In[64]: def f(x): """ The objective function """ return x ** 2 - 2 * x # Let's first plot the function to get a better understanding of its behavior: # In[65]: x = np.linspace(-3,3,100) plt.figure(figsize=(9,6)) plt.plot(x,f(x)) plt.show() # From the figure above we can see that the equation has two roots $0$ and $2$. Now let's solve numerically for these roots: # In[66]: from scipy.optimize import root root(f,[-3,4],method='hybr') # As expected, the solver returns 0 and 2 as the roots of the objective function. # # Notice that we had to provide two initial guesses, that is one for each root. Had we provided one initial guess the solver would have returned only one root. # # In order to access the results provided by the solver we can just store them in a variable and then call the appropriate field. For example: # In[67]: sol = root(f,[-3,4],method='hybr') sol.x # In[68]: sol.success # Choosing the right root finding method will depend on the nature of the problem at hand. The list of all the methods accessible through `root` can be found [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.root.html#scipy.optimize.root). # The function `root` can also be used to solve systems of linear and non-linear equations: # $$ # \begin{align} # x^2 + y^2 &= 3 # \\ # x - 3\log{y} &= 1 # \end{align} # $$ # # For that, we have to write the equations passed to the root-finding solver in the form $f(x,y) = 0$: # In[69]: def g(x): return [x[0]**2 + x[1]**2 - 3, x[0] - 3*np.log(x[1]) -1] # In[70]: sol = root(g,[1,1],method='lm') sol.x # ### Optimization # `scipy.optimize` includes a number of the most common optimization algorithms (also called solvers or routines) used to find the extrema of a user-defined objective function. Fortunately, Scipy offers a standardized interface or "wrapper" to use those solvers called `minimize`. The list of algorithms available through `minimize` can be found [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize). # # Notice that most optimization solvers, including the ones in Scipy, allow to only minimize the objective function. However, this is not an issue since maximizing $f$ is equivalent to minimizing $-f$. # #### Unconstrained Optimization # We want to minimize the following function: # $$ # f(x) = 4x^3 + (x -2)^2 + x^4 # $$ # In[71]: def f(x): return 4 * x ** 3 + (x-2) ** 2 + x ** 4 # Again, we plot the function and see how it behaves: # In[72]: x = np.linspace(-4, 2, 100) plt.figure(figsize=(9,6)) plt.plot(x, f(x)) plt.show() # As you can see, the function has two local minima of which one is global. Depending on the solver we choose and the initial guess we provide, the solver may or may not get "stuck" at the local minimum: # In[73]: from scipy.optimize import minimize sol = minimize(f, x0=2, method='Nelder-Mead') sol.x # In[74]: sol = minimize(f, x0=2, method='CG') sol.x # If we don’t know the neighborhood of the global minimum to choose the initial point, we need to resort to costlier # global optimization methods. To find the global minimum, the simplest algorithm is the brute force algorithm `brute`, in which the function is evaluated on each point of a given grid. We can also provide a `finish` optimizer which will take the solution of `brute` as an initial guess and try to “polish” it, i.e. seek a more precise (local) minimum near the gridpoint returned by `brute`: # In[75]: from scipy.optimize import brute r = (-4,2) sol_global = brute(f,ranges=(r,),finish=None) sol_global # In[76]: sol_global = brute(f,ranges=(r,),finish=minimize) sol_global # #### Constrained Optimization # We consider the following maximization problem: # $$ # \max_{x,y} \, \, 3xy - x^3 # $$ # subject to the following constraints: # $$ # \begin{align} # 2x - y &= -5 # \\ # 5x + 2y &\geq 37 # \\ # x &\geq 0 # \\ # y &\geq 0 # \end{align} # $$ # # To solve this problem we use `minimize` with the method SLSQP which accepts both equality and inequality constraints as well as interval bounds on the choice variables: # In[77]: def g(x): return - (3 * x[0] * x[1] - x[0] ** 3) # The constraints are defined as a sequence of dictionaries, with keys `type`, `fun` and `jac`. The constraints can be passed as normal or `lambda` functions. # # Bounds for the choice variables $x$ (only for L-BFGS-B, TNC and SLSQP) have to be defined as (min, max) pairs for each element in $x$. You can use `None` or `np.inf` for one of min or max when there is no bound in that direction. # # Method-specific options such as the maximum number of iterations or convergence criteria can be supplied through the `options` dict. # In[78]: from scipy.optimize import minimize cons = ({'type': 'eq', 'fun': lambda x: 2*x[0] - x[1] + 5}, {'type': 'ineq', 'fun': lambda x: 5*x[0] + 2*x[1] -37}) bnd = [(0, np.inf),(0,np.inf)] sol = minimize(g, x0=[0, 0], method='SLSQP', constraints=cons, bounds=bnd, options={'disp': True,'ftol': 1e-10}) sol.x # Note that when writing the objective function you must take into account that only its first argument is considered a choice variable. The remaining arguments are considered parameters which must be declared as such to the optimization routine. We will see this in the next tutorial.