# In this notebook you will learn about the redispatching capabilities offered by grid2op.¶

Objectives

In the previous notebooks, we presented actions in a discrete action space. However, more actions are available in Grid2Op. Redispatching is a kind of continuous action that will be described here.

This notebook will:

• present what redispatching is
• show how it can be used in grid2op
• detail the redispatching actions
• show an example of a redispatching Agent.
In [1]:
import os
import sys
import grid2op
from grid2op.Agent import DoNothingAgent, BaseAgent
from tqdm.notebook import tqdm
import numpy as np
max_iter = 100  # to make computation much faster we will only consider 100 time steps
import pdb
import matplotlib.pyplot as plt

In [2]:
res = None
try:
    from jyquickhelper import add_notebook_menu
    res = add_notebook_menu()
except ModuleNotFoundError:
    print("Impossible to automatically add a menu / table of content to this notebook.\n"
          "You can install it with \"pip install jyquickhelper\"")
res

Impossible to automatically add a menu / table of content to this notebook.
You can install it with "pip install jyquickhelper"


## How to implement redispatching actions¶

### Having a suitable environment¶

By default, some environments do not specify the cost of generators, their maximum and minimum production values, etc. In this case it is not possible to perform redispatching in grid2op.

To know more about what is needed for using redispatching, it is advised to look at the online documentation: https://grid2op.readthedocs.io/en/latest/space.html#grid2op.Space.GridObjects.redispatching_unit_commitment_availble for the most recent information. When this notebook was created, the required attributes were:

• "gen_type": the type of generator
• "gen_pmin": the minimum value a generator can produce
• "gen_pmax" : the maximum value a generator can produce
• "gen_redispatchable": whether this generator can be dispatched
• "gen_max_ramp_up": the maximum increase of power a generator can have between two consecutive time steps
• "gen_max_ramp_down": the maximum decrease of power a generator can have between two consecutive time steps
• "gen_min_uptime": the minimum time a generator needs to be turned on (it is impossible to disconnect it if it has not been connected for at least "gen_min_uptime" consecutive time steps)
• "gen_min_downtime": same as above, but for downtime
• "gen_cost_per_MW": the generation cost. For each MW of electricity produced in a time step, this is the amount paid
• "gen_startup_cost": the cost to start a generator
• "gen_shutdown_cost": the cost to shutdown a generator

We made available a dedicated environment, based on the IEEE case14 powergrid, that has all these features. It is advised to use this small environment for testing and getting familiar with redispatching.

This environment includes 5 generators, like the original case14 system. It has one solar generator and one wind generator (which cannot be dispatched), one nuclear power plant (dispatchable) and 2 thermal generators (also dispatchable). Thus, redispatching here is a continuous control problem with 3 degrees of freedom.

In [3]:
env = grid2op.make("rte_case14_redisp", test=True)
print("Is this environment suitable for redispatching: {}".format(env.redispatching_unit_commitment_availble))

/home/benjamin/Documents/grid2op_dev/getting_started/grid2op/MakeEnv/Make.py:240: UserWarning: You are using a development environment. This environment is not intended for training agents.
warnings.warn(_MAKE_DEV_ENV_WARN)

Is this environment suitable for redispatching: True


As we can see, this environment is suitable for redispatching. It means that all quantities described above are set (and visible).

In the L2RPN 2019 challenges, we rewarded participants based on the usage of the powerlines. In future challenges, or for other uses of this platform where redispatching should be considered, it is better to use the economic cost of the system as an evaluation metric. However, a cost should usually be minimized, while a reward should be maximized. To take this into account, a simple reward named "EconomicReward" has been created. It has the following properties:

• it returns -1 if there has been a game over
• otherwise (there has been no game over, no errors, etc) it is always strictly positive
• maximizing this reward is equivalent to minimizing the cost

Note that this reward doesn't take into account the cost to perform a redispatching action. This reward can be used to build what is called an "economic dispatch", a problem that is particularly interesting for electricity producers but of lower interest for Transmission System Operators (as opposed to the topology).
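The three properties above can be illustrated with a toy cost-to-reward mapping. The sketch below is purely illustrative: it is not grid2op's actual "EconomicReward" formula, and the `worst_cost` parameter is a hypothetical upper bound on the operating cost, introduced only to keep the reward strictly positive.

```python
# Toy sketch of a cost-based reward (NOT grid2op's exact "EconomicReward"
# formula): -1 on game over, otherwise strictly positive and decreasing in cost.
def economic_reward_sketch(total_cost, worst_cost, has_error):
    if has_error:
        # property 1: game over (or any error) yields -1
        return -1.0
    # properties 2 and 3: shifting by +1 keeps the reward strictly positive
    # even at the worst cost, and a lower cost gives a higher reward
    return (worst_cost - total_cost) + 1.0

r_cheap = economic_reward_sketch(total_cost=100.0, worst_cost=500.0, has_error=False)
r_costly = economic_reward_sketch(total_cost=400.0, worst_cost=500.0, has_error=False)
```

With these numbers, `r_cheap > r_costly > 0`, so maximizing this toy reward is indeed equivalent to minimizing the cost.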

Compared to standard "economic dispatch" problems, storages are not implemented for now (coming soon) and we do not fully take into account the startup cost, shutdown cost, minimum downtime and minimum uptime (even though all of these features are implemented). Also, note that redispatching is implemented in terms of differences: you first need to provide an economic dispatch, and then think in terms of variations from that point onward. This is the usage that will be explained in this notebook. For real unit commitment / economic dispatch problems, the keywords "injections" / "prod_p" in the action would probably be better suited.

### Redispatching implementation¶

Unlike topological actions, that are always feasible (this is an assumption that is made in this package), redispatching actions are limited by physical constraints on the generators. For example:

• it is not possible for a generator to produce more (resp. less) than pmax (resp. pmin). Unlike the current flow on a powerline, this is a strict physical constraint.
• for the same physical reasons, it is not possible to increase (or decrease) the production value too much between two consecutive time steps.
• redispatching actions stack with one another: if you ask to increase the production of generator 1 by $10MW$ at time step $t$, and by $20MW$ at time step $t+1$, then the setpoint at time step $t+1$ will be $+30MW$ higher than if no redispatching had been made (it would have been only $+10MW$ higher if the second redispatching action had not been performed).
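The stacking rule in the last bullet can be sketched in plain Python: the offset from the no-redispatch setpoint is simply the running sum of all past commands.

```python
# Illustration of the stacking rule: redispatching commands accumulate over
# time, so the offset from the baseline setpoint is their running sum.
commands = [10.0, 20.0]   # +10 MW asked at step t, +20 MW asked at step t+1
offset = 0.0
offsets = []
for c in commands:
    offset += c
    offsets.append(offset)
print(offsets)  # [10.0, 30.0]: at t+1 the setpoint is +30 MW above the baseline
```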

That being said, a lot of things can happen that make redispatching a bit less trivial than topology:

• When you perform a redispatching action, you don't know what the time series of the environment will look like. For example, say the pmax of generator 1 is 100. The setpoint of this generator at time $t$ is $60MW$, and you want to increase its value by another $40MW$. This action is legal: $60+40 \leq pmax (=100)$. So at time $t$ everything is fine. Now suppose that the environment also moves the setpoint of this same generator from $60MW$ to $70MW$ at time $t+1$. With the redispatching action, the setpoint would become $70+40 = 110 > pmax$. This is not possible. In this case, the redispatching action will be capped: instead of the desired redispatching of $+40$, only a smaller redispatching of $+30$ will be implemented on the powergrid.
• Another problem arises from a fundamental principle of power grids: power cannot be stored in a grid. At all times we must have $\sum \text{Prod} = \sum \text{Load} + \text{Losses}$. In this competition, the data is generated such that this condition holds (approximately) for all time steps. This means that if you ask for a redispatching of +xxx MW on a given generator, then the other ones must compensate and "absorb" these xxx MW such that the total redispatch sums to 0.
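The two rules above can be sketched with a toy computation. This is not grid2op's internal algorithm (which is more involved); it only illustrates capping at pmax and equal-split compensation on the other dispatchable generators so that the total sums to 0.

```python
import numpy as np

# Toy sketch (NOT grid2op's internal algorithm) of the two rules above:
# 1) cap the asked redispatch so the setpoint stays at or below pmax,
# 2) spread the opposite amount over the other dispatchable generators
#    so that the total redispatch sums to 0.
pmax = np.array([100., 80., 50.])
setpoint = np.array([70., 40., 20.])
asked = 40.0                                     # asked on generator 0
capped = min(asked, pmax[0] - setpoint[0])       # only +30 MW is feasible
redispatch = np.array([capped, 0., 0.])
others = np.array([False, True, True])           # the compensating generators
redispatch[others] -= capped / others.sum()      # equal split of the -30 MW
print(redispatch)                                # sums to 0
```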

For the participants' convenience, some "automatons" automatically transform a proposed redispatching action into a valid redispatching action. These automatons basically ensure that the two above-mentioned conditions hold. This explains the difference between the "target_dispatch", which is the desired setpoint entered by the agent, and the "actual_dispatch", which is the one that has actually been implemented on the powergrid after being corrected by the automatons.

### More cases for ambiguous actions¶

In [4]:
env.gen_redispatchable

Out[4]:
array([ True,  True, False, False,  True])

The above vector tells which generators are dispatchable and which are not. Any attempt to dispatch a generator that is not dispatchable leads to an ambiguous action.

In [5]:
act = env.action_space({"redispatch": [(2,+10)]})
act.is_ambiguous()

Out[5]:
(True,
Grid2OpException AmbiguousAction InvalidRedispatching InvalidRedispatching('Trying to apply a redispatching action on a non redispatchable generator'))

As we can see, this action is ambiguous because we are "Trying to apply a redispatching action on a non redispatchable generator". Indeed, as shown above, generator 2 is not dispatchable.

Generators also have physical constraints. You cannot ask them to change their active production value too fast; this would damage the generator, and breaking a nuclear plant is often a terrible idea. In grid2op, trying to go beyond this limit results in an ambiguous action.

This limit is called the "ramp" and it is available through the "gen_max_ramp_up" attribute. In the next cell, you can see that the ramp is $5MW$ for the first generator and $10MW$ for the second and the last one. For the other 2, it is irrelevant because they are not dispatchable.

In [6]:
env.gen_max_ramp_up

Out[6]:
array([ 5., 10.,  0.,  0., 10.], dtype=float32)

Any attempt to go beyond this value will raise an ambiguous action error:

In [7]:
act = env.action_space({"redispatch": [(0,+10)]})
act.is_ambiguous()

Out[7]:
(True,
Grid2OpException AmbiguousAction InvalidRedispatching InvalidRedispatching('Some redispatching amount are above the maximum ramp up'))

In the previous action, we asked generator 0 to increase its setpoint by $10MW$. However, its maximum ramp up is only $5MW$. Thus, this action is ambiguous.
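One simple way to avoid this kind of ambiguous action is to clip the desired redispatch to the ramp limits before building the action. The helper below is a hypothetical sketch (not part of grid2op's API), assuming symmetric handling of the "gen_max_ramp_up" and "gen_max_ramp_down" attributes:

```python
import numpy as np

# Hypothetical pre-check (not a grid2op function): keep a desired redispatch
# amount inside [-max_ramp_down, +max_ramp_up] so the action stays unambiguous.
def clip_to_ramp(amount, max_ramp_up, max_ramp_down):
    return float(np.clip(amount, -max_ramp_down, max_ramp_up))

print(clip_to_ramp(10.0, max_ramp_up=5.0, max_ramp_down=5.0))   # 5.0
print(clip_to_ramp(-7.0, max_ramp_up=5.0, max_ramp_down=5.0))   # -5.0
```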

Of course, there are some perfectly valid redispatching actions:

In [8]:
act = env.action_space({"redispatch": [(1,+10)]})
act.is_ambiguous()

Out[8]:
(False, None)

### The (desired) setpoint is not the implementation¶

As said in the preamble of this section, the target dispatch (what we want to achieve) is not equal to the implemented dispatch. So that we can see the operations that are actually performed, both of these values are present in the observation, as shown in the cell below.

In [9]:
observed = []
# perform a valid redispatching action
env.set_id(0)  # make sure to use the same environment input data.
obs_init = env.reset()  # reset the environment
act = env.action_space()
act = env.action_space({"redispatch": [(0,+5)]})
# act = env.action_space({"redispatch": [(0,0)]})
obs, reward, done, info = env.step(act)
print("actual dispatch at time step 0: {}".format(obs.actual_dispatch))
observed.append(obs)

actual dispatch at time step 0: [ 5.  -2.5  0.   0.  -2.5]


The target dispatch is exactly what we wanted, i.e. generator 0 has increased its production by $5MW$. To compensate for this increase, generators 1 and 4 have each seen their setpoint decrease by $2.5MW$.

Let's draw the generators' base productions in this scenario, and what is implemented with the current redispatching :

In [10]:
# Create a matplotlib figure and axes
redisp_fig, ax = plt.subplots()
# X axis is the generators
x_gens = np.arange(obs.n_gen)
# Y axis is the production in MW
y_scenario_p = obs.prod_p - obs.actual_dispatch
y_redisp_p = obs.prod_p
# Blue bars for scenario productions
ax.bar(x_gens - 0.2, y_scenario_p, color = 'b', width = 0.4)
# Red bars for production with redispatch
ax.bar(x_gens + 0.2, y_redisp_p, color = 'r', width = 0.4)
# Set some legend to describe what's above
ax.set_ylabel('MW')
ax.set_xlabel('Generator ID')
ax.legend(labels=['Scenario', 'Redispatched'])

Out[10]:
<matplotlib.legend.Legend at 0x7f1d9efa4b20>

In the following cell, we won't be performing any redispatching action. We will simply do nothing. This example is here to illustrate that, until the original redispatching action is removed (i.e. until the opposite command is sent), grid2op will continue to apply the previous redispatching configuration over time.

In [11]:
donothing = env.action_space()
obs1, reward, done, info = env.step(donothing)
print("actual dispatch at time step 1: {}".format(obs1.actual_dispatch))
observed.append(obs1)
obs2, reward, done, info = env.step(donothing)
print("actual dispatch at time step 2: {}".format(obs2.actual_dispatch))
observed.append(obs2)

actual dispatch at time step 1: [ 5.  -2.5  0.   0.  -2.5]
actual dispatch at time step 2: [ 5.  -2.5  0.   0.  -2.5]


Here, the original redispatching action was to increase the setpoint of generator 0 by $5MW$. If we want to remove this, we need to decrease its setpoint by $5MW$ :

In [12]:
act = env.action_space({"redispatch": [(0,-5)]})
obs3, reward, done, info = env.step(act)
print("actual dispatch at time step 3: {}".format(obs3.actual_dispatch))
observed.append(obs3)

actual dispatch at time step 3: [ 1.310003  -0.6550015  0.         0.        -0.6550015]


As we see in the cell above, there are still residuals in the dispatch. This is because of the physical ramp limit of generator 0. We wanted it to return to its original setpoint, but at the same time step, the environment also modified the setpoint of this generator by $-1.3MW$. Therefore, the total desired decrease for the setpoint of generator 0 was $5+1.3 = 6.3MW$, which exceeds the maximum ramp down, so the action could not be completely performed at once. Grid2op capped the redispatch occurring at this timestep to the maximum ramp down.

That is why we can see a small part of the dispatch left. If we wait for another timestep and do nothing, the dispatch will likely return to zero.

In [13]:
obs4, reward, done, info = env.step(donothing)
print("actual dispatch at time step 4: {}".format(obs4.actual_dispatch))
observed.append(obs4)
obs5, reward, done, info = env.step(donothing)
print("actual dispatch at time step 5: {}".format(obs5.actual_dispatch))
observed.append(obs5)

actual dispatch at time step 4: [ 0.000000e+00  1.110223e-16  0.000000e+00  0.000000e+00 -1.110223e-16]
actual dispatch at time step 5: [ 0.000000e+00  1.110223e-16  0.000000e+00  0.000000e+00 -1.110223e-16]


Now everything is back in order: the system has returned to its original state. Let's see what happens if we ask to increase the value of generator 0 again.

In [14]:
act = env.action_space({"redispatch": [(0,+5)]})
# act = env.action_space({"redispatch": [(0,0)]})
obs6, reward, done, info = env.step(act)
print("actual dispatch at time step 6: {}".format(obs6.actual_dispatch))
observed.append(obs6)
obs7, reward, done, info = env.step(donothing)
print("actual dispatch at time step 7: {}".format(obs7.actual_dispatch))
observed.append(obs7)
obs8, reward, done, info = env.step(donothing)
print("actual dispatch at time step 8: {}".format(obs8.actual_dispatch))
observed.append(obs8)

actual dispatch at time step 6: [ 4.6900043 -2.3450022  0.         0.        -2.3450022]
actual dispatch at time step 7: [ 5.  -2.5  0.   0.  -2.5]
actual dispatch at time step 8: [ 5.  -2.5  0.   0.  -2.5]


This time we see that the full (valid) redispatching action is not applied immediately. This is due to the same phenomenon that occurred previously: the environment increased the value of this generator, and at the same time, we also asked to increase it by its "max ramp up" value. Consequently, our action was "capped" and only $4.7MW$ (out of 5) were indeed added to the generator's production. At the next time step, the action is fully implemented.

To conclude on redispatching, we saw that there is a difference between the value we ask for, and the value that is implemented by the environment. This is mainly because:

• the implemented vector must sum to 0;
• if a redispatching is close to the maximum value it can take (due to ramping or hard limits) and, at the same time, the environment itself "wants" to increase this value, the physical limitations still have to be respected.

Redispatching actions also last in time. An action must be explicitly canceled to be reset to 0. This cancellation, because of the limitations mentioned above, can take a few time steps to be fully effective.
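Why cancellation can take several steps can be sketched in pure Python (this is an illustration of the ramp mechanism, not grid2op code): the correction is itself limited by the ramp, so a large dispatch is unwound in ramp-sized chunks.

```python
# Toy illustration of why canceling a dispatch can take several time steps:
# the cancellation command is itself limited by the maximum ramp down.
remaining = 12.0          # dispatch left to cancel, in MW
max_ramp_down = 5.0       # at most 5 MW can be removed per time step
steps = 0
while remaining > 1e-6:
    remaining -= min(remaining, max_ramp_down)
    steps += 1
print(steps)  # 3 steps: 5 + 5 + 2 MW
```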

To summarize, we can look at the productions over time for generators 0 & 4:

In [15]:
# Create a matplot figure
redisp_fig = plt.figure(figsize=(16, 9))
# X axis is the timesteps
x_gens = np.arange(len(observed), dtype=np.int32)
# Y axis is the production in MW
y_scenario_p_0 = [o.prod_p[0] - o.actual_dispatch[0] for o in observed]
y_redisp_p_0 = [o.prod_p[0] for o in observed]
y_scenario_p_4 = [o.prod_p[4] - o.actual_dispatch[4] for o in observed]
y_redisp_p_4 = [o.prod_p[4] for o in observed]
# Blue lines for gen 0
plt.plot(x_gens, y_scenario_p_0, 'b', x_gens, y_redisp_p_0, 'b--')
# Red lines for gen 4
plt.plot(x_gens, y_scenario_p_4, 'r', x_gens, y_redisp_p_4, 'r--')
# Set some legend to describe what's above
plt.ylabel('MW')
plt.xlabel('Timesteps')
plt.legend(labels=['Scenario gen 0', 'Redispatched gen 0', 'Scenario gen 4', 'Redispatched gen 4'])

Out[15]:
<matplotlib.legend.Legend at 0x7f1d9f273be0>

## Example of use: economic dispatch problem¶

In [16]:
agent = DoNothingAgent(env.action_space)
done = False
reward = env.reward_range[0]

env.set_id(0)  # make sure to evaluate the models on the same experiments
obs = env.reset()
cum_reward = 0
nrow = env.chronics_handler.max_timestep() if max_iter <= 0 else max_iter
gen_p = np.zeros((nrow, env.n_gen))
gen_p_setpoint = np.zeros((nrow, env.n_gen))
rho = np.zeros((nrow, env.n_line))
i = 0
with tqdm(total=max_iter, desc="step") as pbar:
    while not done:
        act = agent.act(obs, reward, done)
        obs, reward, done, info = env.step(act)
        data_generator = env.chronics_handler.real_data.data
        gen_p_setpoint[i,:] = data_generator.prod_p[data_generator.current_index, :]
        gen_p[i,:] = obs.prod_p
        rho[i,:] = obs.rho
        cum_reward += reward
        i += 1
        pbar.update(1)
        if i >= max_iter:
            break
print("The cumulative reward with this agent is {:.0f}".format(cum_reward))

The cumulative reward with this agent is 121369


Now let's try to redispatch as much production as possible to the cheapest generator, and leave the rest unchanged.

In [17]:
class GreedyEconomic(BaseAgent):
    def __init__(self, action_space):
        super().__init__(action_space)
        self.do_nothing = action_space()

    def act(self, obs, reward, done):
        act = self.do_nothing
        if obs.prod_p[0] < obs.gen_pmax[0] - 1 and \
           obs.target_dispatch[0] < (obs.gen_pmax[0] - obs.gen_max_ramp_up[0]) - 1 and \
           obs.prod_p[0] > 0.:
            # if the cheapest generator is significantly below its maximum production
            if obs.target_dispatch[0] < obs.gen_pmax[0]:
                # in theory we can still ask for more
                act = self.action_space({"redispatch": [(0, obs.gen_max_ramp_up[0])]})
        return act

agent = GreedyEconomic(env.action_space)
done = False
reward = env.reward_range[0]

env.set_id(0) # reset the env to the same id
obs = env.reset()
cum_reward = 0
nrow = env.chronics_handler.max_timestep() if max_iter <= 0 else max_iter
gen_p = np.zeros((nrow, env.n_gen))
gen_p_setpoint = np.zeros((nrow, env.n_gen))
rho = np.zeros((nrow, env.n_line))
i = 0
with tqdm(total=max_iter, desc="step") as pbar:
    while not done:
        act = agent.act(obs, reward, done)
        obs, reward, done, info = env.step(act)
        # print("act: {}".format(act))
        # print("info: {}".format(info['exception']))
        if np.abs(np.sum(obs.actual_dispatch)) > 1e-2:
            pdb.set_trace()
        data_generator = env.chronics_handler.real_data.data
        gen_p_setpoint[i,:] = data_generator.prod_p[data_generator.current_index, :]
        gen_p[i,:] = obs.prod_p
        rho[i,:] = obs.rho
        cum_reward += reward
        i += 1
        pbar.update(1)
        if i >= max_iter:
            break
print("The cumulative reward with this agent is {:.0f}".format(cum_reward))

The cumulative reward with this agent is 97832

As we can see, this second agent performed worse than the DoNothing agent, demonstrating that having the cheapest generator produce as much as possible is not always the best solution.