#!/usr/bin/env python
# coding: utf-8

# In this blog post, I want to share something that I recently discovered while using SciPy’s optimization library and its general-purpose optimizers. I’ll try to recreate a context in which the same problem occurs.
# 
# Let’s say we want to find the minimum of a function of two parameters.

# In[33]:


import numpy as np
import matplotlib.pyplot as plt


def f(x):
    # x is a pair (x, y) of scalars or arrays; f = exp(-1 / (0.01 x^2 + y^2))
    return np.exp(-1/(.01*x[0]**2 + x[1]**2))

X, Y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-2, 2, 200))


plt.pcolormesh(X, Y, f([X, Y]), shading="auto")
plt.colorbar()
plt.contour(X, Y, f([X, Y]), 20, colors="w")
plt.xlabel("x")
plt.ylabel("y")


# What makes this function special? It is very insensitive to its first parameter, $x$, while being quite sensitive to its second parameter, $y$.
# 
# Where does its minimum lie? At (0, 0), where the function tends to 0 (it is strictly positive everywhere else). However, judging by the graph, it’s not so easy to be sure of it. Let’s plot the same thing in log scale to see if it’s easier to spot.
# 

# In[37]:


plt.pcolormesh(X, Y, np.log10(f([X, Y])), shading="auto")
plt.colorbar(label="log-scale")
plt.contour(X, Y, np.log10(f([X, Y])), 3, colors="w")
plt.xlabel("x")
plt.ylabel("y")


# In the log-scale, it appears that the minimum is indeed quite flat along the x-direction.
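# 
# As a rough numerical check of this flatness, the two partial derivatives of `f` can be compared at a sample point. The sketch below is an illustration only: the helper `central_gradient`, the step `h` and the sample point (1, 1) are assumptions made for this example, not part of the original analysis.

# In[ ]:


import numpy as np


# Same definition as above, repeated so that this cell runs on its own
def f(x):
    return np.exp(-1/(.01*x[0]**2 + x[1]**2))


def central_gradient(func, point, h=1e-6):
    """Estimate the gradient of func at point with central differences."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        grad[i] = (func(point + step) - func(point - step)) / (2 * h)
    return grad


grad_at_sample = central_gradient(f, [1.0, 1.0])
# Ratio of the sensitivity in y to the sensitivity in x at the sample point
sensitivity_ratio = grad_at_sample[1] / grad_at_sample[0]
print(grad_at_sample, sensitivity_ratio)


# Analytically, the ratio of the two partial derivatives is $100\,y/x$, so at (1, 1) the function reacts about a hundred times more strongly to $y$ than to $x$.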
# 
# Now imagine that we need to find the minimum of this function using gradient information. Since we do not pass an analytical gradient, SciPy estimates it by finite differences. How does the standard BFGS optimizer do?

# In[53]:


from scipy import optimize


result = optimize.minimize(f, [1, 1], method="BFGS", tol=1e-10, options={'eps': 1e-5})

result


# Interestingly, the optimizer does not work as we would expect. The reason is clear: the gradient becomes very small, dropping below the tolerance set by `tol`, so the procedure stops.
# 
# However, we are still quite far from the true optimum, especially in the first dimension. So in this case, we cannot find the true solution.
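# 
# This can be checked on the result object itself. The sketch below assumes SciPy’s documented behaviour that, for BFGS, `tol` is used as the gradient tolerance `gtol` and that the stopping test looks at the largest absolute component of the gradient; the names `gtol` and `largest_gradient_component` are introduced here for illustration only.

# In[ ]:


import numpy as np
from scipy import optimize


# Same definition as above, repeated so that this cell runs on its own
def f(x):
    return np.exp(-1/(.01*x[0]**2 + x[1]**2))


gtol = 1e-10
result = optimize.minimize(f, [1, 1], method="BFGS", tol=gtol, options={'eps': 1e-5})

# result.jac holds the gradient estimate at the final iterate, as seen by the optimizer
largest_gradient_component = np.max(np.abs(result.jac))
print("final point:", result.x)
print("largest gradient component:", largest_gradient_component)
print("below tolerance:", largest_gradient_component <= gtol)
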

# One way of getting out of this situation is to rescale the variables so that the function is about equally sensitive to each of them. Let’s define a second function with rescaled values.

# In[68]:


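def g(x):
    # With this scaling the denominator in f becomes 100*x[0]**2 + 100*x[1]**2
    return f([100*x[0], 10*x[1]])

# The start point [.01, .1] corresponds to (1, 1) in the original variables
result = optimize.minimize(g, [.01, .1], method="BFGS", tol=1e-10, options={'eps': 1e-5})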

result


# What just happened? The rescaled version of the same function leads to a much better solution! Is this surprising? Let’s see a plot of the function `g`.

# In[65]:


plt.pcolormesh(X, Y, g([X, Y]), shading="auto")
plt.colorbar()
plt.contour(X, Y, g([X, Y]), 40, colors="w")
plt.xlabel("x")
plt.ylabel("y")
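

# To compare the two runs in the same coordinates, a point found in the rescaled variables has to be mapped back to the original ones. A minimal sketch, assuming the scaling factors 100 and 10 used in `g`; the names `SCALE`, `to_original` and `g_scaled` are hypothetical helpers for this example.

# In[ ]:


import numpy as np
from scipy import optimize


# Same definition as above, repeated so that this cell runs on its own
def f(x):
    return np.exp(-1/(.01*x[0]**2 + x[1]**2))


# Scaling factors used in g: original variables = SCALE * rescaled variables
SCALE = np.array([100.0, 10.0])


def to_original(z):
    """Map a point from the rescaled variables back to the original ones."""
    return SCALE * np.asarray(z, dtype=float)


def g_scaled(z):
    """Same function as g, written in terms of to_original."""
    return f(to_original(z))


start = np.array([.01, .1])
rescaled_result = optimize.minimize(g_scaled, start, method="BFGS", tol=1e-10, options={'eps': 1e-5})

print("start in original variables:", to_original(start))
print("solution in original variables:", to_original(rescaled_result.x))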

