It is recommended to have a look at the 0_basic_functionalities notebook before getting into this one.
Objective
This notebook covers the basics of how to "code" an Agent that takes actions on the powergrid. Examples are given of "expert agents" that act based on fixed rules. More generic types of Agents, relying for example on machine learning / deep learning, are covered in the notebook 3_TrainingAnAgent.
This notebook also describes the Observation class, which is useful for deciding which actions to take.
import os
import sys
import grid2op
res = None
try:
    from jyquickhelper import add_notebook_menu
    res = add_notebook_menu()
except ModuleNotFoundError:
    print("Impossible to automatically add a menu / table of content to this notebook.\nYou can download \"jyquickhelper\" package with: \n\"pip install jyquickhelper\"")
res
Impossible to automatically add a menu / table of content to this notebook. You can download "jyquickhelper" package with: "pip install jyquickhelper"
In this section we cover the observation class. For more information about it, we recommend the official documentation, the locally built documentation, or the Observations.py file. Only basic concepts are detailed in this notebook.
An observation of the environment at the current time step is returned when calling env.step(). The next cell creates an environment and gets an observation instance. We use the default case14_fromfile environment from the Grid2Op framework.
env = grid2op.make() #"case14_fromfile")
/home/benjamin/Documents/grid2op_test/getting_started/grid2op/MakeEnv.py:683: UserWarning: Your are using only 2 chronics for this environment. More can be download by running, from a command line: python -m grid2op.download --name "case14_realistic" --path_save PATH\WHERE\YOU\WANT\TO\DOWNLOAD\DATA
To perform a step, we need an action. More information about actions is given in the 2_ActionRepresentation notebook. Here we use a "do nothing" action, as a DoNothingAgent would. obs is the resulting observation of the environment.
do_nothing_act = env.helper_action_player({})
obs, reward, done, info = env.step(do_nothing_act)
In this notebook we will detail only the "CompleteObservation". Grid2Op allows modeling different kinds of observations, for example some with incomplete or noisy data. CompleteObservation gives the full state of the powergrid, without any noise. It is the default observation used.
An observation has calendar data (eg the time stamp of the observation):
obs.year, obs.month, obs.day, obs.hour_of_day, obs.minute_of_hour, obs.day_of_week
(2019, 1, 1, 0, 10, 1)
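As a small illustration, the calendar fields printed above can be combined into a standard Python datetime. This is only a sketch on the printed values: it does not call grid2op, and the idea that day_of_week follows Python's convention (Monday = 0) is an assumption that happens to be consistent with this output.

```python
from datetime import datetime

# Calendar fields copied from the output above:
# year, month, day, hour_of_day, minute_of_hour, day_of_week
year, month, day, hour_of_day, minute_of_hour, day_of_week = (2019, 1, 1, 0, 10, 1)

# Build a regular datetime from the individual fields.
timestamp = datetime(year, month, day, hour_of_day, minute_of_hour)
print(timestamp.isoformat())

# Assumption: day_of_week uses Python's convention (Monday = 0, Tuesday = 1, ...).
weekday_matches = timestamp.weekday() == day_of_week
print(weekday_matches)
```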
It has some generic powergrid information:
print("Number of generators of the powergrid: {}".format(obs.n_gen))
print("Number of loads of the powergrid: {}".format(obs.n_load))
print("Number of powerline of the powergrid: {}".format(obs.n_line))
print("Number of elements connected to each substations in the powergrid: {}".format(obs.sub_info))
print("Total number of elements: {}".format(obs.dim_topo))
Number of generators of the powergrid: 5 Number of loads of the powergrid: 11 Number of powerline of the powergrid: 20 Number of elements connected to each substations in the powergrid: [3 6 4 6 5 6 3 2 5 3 3 3 4 3] Total number of elements: 56
It has some information about the generators (each generator can be viewed as a point in a 3 dimensional space)
print("Generators active production: {}".format(obs.prod_p))
print("Generators reactive production: {}".format(obs.prod_q))
print("Generators voltage setpoint : {}".format(obs.prod_v))
Generators active production: [81.6 81.1 12.9 0. 77.7200955] Generators reactive production: [ 21.79066889 70.21426433 48.05804585 24.508776 -16.54165458] Generators voltage setpoint : [142.1 142.1 22. 13.2 142.1]
It has some information about the loads (each load is a point in a 3 dimensional space too)
print("Loads active consumption: {}".format(obs.load_p))
print("Loads reactive consumption: {}".format(obs.load_q))
print("Loads voltage (voltage magnitude of the bus to which it is connected) : {}".format(obs.load_v))
Loads active consumption: [25.4 84.8 45. 6.8 12.7 28.8 9.5 3.4 5.6 11.9 15.4] Loads reactive consumption: [17.8 59.6 30.8 4.6 8.7 19.7 6.6 2.5 3.9 8.4 10.9] Loads voltage (voltage magnitude of the bus to which it is connected) : [142.1 142.1 138.70157283 139.39478126 22. 21.09213719 21.08565879 21.45353383 21.56920312 21.43075597 20.69996034]
In this setting a powerline can be viewed as a point in an 8-dimensional space:
- active flow
- reactive flow
- current flow
- voltage magnitude
at both of its ends.
For example, suppose line1 connects two nodes A and B. The active flow on line1 then has two values: the flow from node A to node B (origin) and the flow from node B to node A (extremity).
These values are:
print("Origin active flow: {}".format(obs.p_or))
print("Origin reactive flow: {}".format(obs.q_or))
print("Origin current flow: {}".format(obs.a_or))
print("Origin voltage (voltage magnitude to the bus to which the origin end is connected): {}".format(obs.v_or))
print("Extremity active flow: {}".format(obs.p_ex))
print("Extremity reactive flow: {}".format(obs.q_ex))
print("Extremity current flow: {}".format(obs.a_ex))
print("Extremity voltage (voltage magnitude to the bus to which the extremity end is connected): {}".format(obs.v_ex))
Origin active flow: [ 3.99223950e+01 3.77977005e+01 2.17807668e+01 4.01616963e+01 3.38599023e+01 1.78608153e+01 -2.80522122e+01 9.77398067e+00 7.72872347e+00 1.80516148e+01 3.35936909e+00 7.78590005e+00 -6.14406783e+00 2.03646124e+00 7.87604573e+00 2.54371835e+01 1.45080856e+01 3.53543190e+01 -1.28785871e-14 -2.54371835e+01] Origin reactive flow: [-15.33405699 -1.20759759 -7.00249457 0.66352597 -0.38311737 7.32923584 -2.94368951 10.46283538 5.57631797 14.9276252 -0.85521377 4.0310593 -7.4643436 1.48429454 7.41027529 -15.62563657 -2.70329755 -5.64124747 -23.63431484 -5.57332913] Origin current flow: [173.75765493 153.64987498 92.95599257 163.19866666 137.58110509 78.44050551 117.40948087 375.74684345 250.10808762 614.72674809 94.88822467 239.99175304 264.71528254 67.45319927 291.33413073 124.26482512 61.42982996 148.28416711 918.84080639 712.80312847] Origin voltage (voltage magnitude to the bus to which the origin end is connected): [142.1 142.1 142.1 142.1 142.1 142.1 138.70157283 22. 22. 22. 21.09213719 21.09213719 21.08565879 21.56920312 21.43075597 138.70157283 138.70157283 139.39478126 14.85053552 21.09213719] Extremity active flow: [-3.96023654e+01 -3.70686934e+01 -2.15608153e+01 -3.92743782e+01 -3.32429776e+01 -1.76186787e+01 2.81573520e+01 -9.61306287e+00 -7.63646124e+00 -1.77516465e+01 -3.35593217e+00 -7.69804766e+00 6.21306287e+00 -2.02439918e+00 -7.70195234e+00 -2.54371835e+01 -1.45080856e+01 -3.53543190e+01 1.28785871e-14 2.54371835e+01] Extremity reactive flow: [ 10.71275486 -0.90132842 3.2850285 -1.4910293 -1.33275675 -8.03634708 3.27533264 -10.12585338 -5.38429454 -14.33689404 0.8643436 -3.84418549 7.62585338 -1.47338125 -7.05581451 17.39024818 3.82920055 8.39126729 24.508776 6.24406666] Extremity current flow: [ 166.6869517 153.5778135 88.61223426 163.59877739 137.7975573 80.60723679 117.40948087 375.74684345 250.10808762 614.72674809 94.88822467 239.99175304 264.71528254 67.45319927 291.33413073 1197.94841837 410.7259861 953.58582187 1071.98094079 1018.29018352] Extremity voltage (voltage magnitude to the bus to which the extremity end is connected): [142.1 139.39478126 142.1 138.70157283 139.39478126 138.70157283 139.39478126 21.45353383 21.56920312 21.43075597 21.08565879 20.69996034 21.45353383 21.43075597 20.69996034 14.85053552 21.09213719 22. 13.2 14.85053552]
The last piece of information about the powerlines is the $\rho$ ratio, i.e. the ratio between the current flow on each powerline and its thermal limit. It can be accessed with:
obs.rho
array([0.45143563, 0.39919409, 0.24462103, 0.42947018, 0.87631277, 0.20642238, 0.30897232, 0.34864962, 0.54149989, 0.79855347, 0.3521812 , 0.62351687, 0.34830958, 0.17750842, 0.38333438, 0.32284949, 0.26599897, 0.86817705, 0.27006916, 0.20950979])
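To make the definition of $\rho$ concrete, here is a minimal numpy sketch on the first five powerlines, using values copied from the outputs above. It assumes that $\rho$ is computed from the origin current flow (a_or); under that assumption the thermal limits can be recovered by division. The variable names are illustrative only and nothing here calls grid2op.

```python
import numpy as np

# Origin current flows and rho for the first five powerlines (copied from the outputs above).
a_or = np.array([173.75765493, 153.64987498, 92.95599257, 163.19866666, 137.58110509])
rho = np.array([0.45143563, 0.39919409, 0.24462103, 0.42947018, 0.87631277])

# Assumption: rho = a_or / thermal_limit, so the limits implied by the printed values are:
implied_limit = a_or / rho
print(implied_limit)

# Recomputing rho from flows and limits gives the printed values back.
rho_check = a_or / implied_limit

# The most loaded powerline is the one with the highest rho.
most_loaded = int(np.argmax(rho))
print(most_loaded)
```

A value of $\rho$ above 1 would mean that the flow exceeds the thermal limit of that powerline.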
The observation (obs) of the environment also stores information about the topology and the state of the powerlines.
obs.timestep_overflow # the number of time steps each powerline has been in overflow (1 component per powerline)
obs.line_status # the status of each powerline: True connected, False disconnected
obs.topo_vect # the topology vector: for each element (generator, load, each end of a powerline), the bus to which
# it is connected: 1 = bus 1, 2 = bus 2.
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
In grid2op, every object (end of a powerline, load or generator) is either disconnected, connected to the first bus of its substation, or connected to the second bus of its substation.
If an object is disconnected, the component of the topo_vect vector corresponding to this object is -1. If it is connected to the first bus it is 1, and 2 if it is connected to the second bus.
More information about this topology vector is given in the documentation, and in the notebook dedicated to visualization.
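The encoding described above can be explored with plain numpy. The sketch below reuses the sub_info values printed earlier and assumes that the topology vector lists the elements substation by substation. That ordering is an assumption made for illustration, as are the two modified entries. No grid2op call is involved.

```python
import numpy as np

# Number of elements per substation, copied from the obs.sub_info output above.
sub_info = np.array([3, 6, 4, 6, 5, 6, 3, 2, 5, 3, 3, 3, 4, 3])

# Start from the printed topology: every element connected to bus 1.
topo_vect = np.ones(sub_info.sum(), dtype=int)

# Hypothetical changes, for illustration only.
topo_vect[3] = 2     # first element of the second substation moved to bus 2
topo_vect[10] = -1   # second element of the third substation disconnected

# Assumption: elements are stored substation by substation, so the vector
# can be split into one chunk per substation.
per_sub = np.split(topo_vect, np.cumsum(sub_info)[:-1])

for sub_id, buses in enumerate(per_sub[:3]):
    n_disconnected = int((buses == -1).sum())
    n_bus2 = int((buses == 2).sum())
    print(sub_id, buses, "disconnected:", n_disconnected, "on bus 2:", n_bus2)
```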
An observation can be converted to / from a flat numpy vector. This is useful for interacting with machine learning libraries or for storage, but it is less human-readable. It consists in stacking all the above-mentioned information in a single numpy.float64 vector.
vector_representation_of_observation = obs.to_vect()
vector_representation_of_observation
array([ 2.01900000e+03, 1.00000000e+00, 1.00000000e+00, 0.00000000e+00, 1.00000000e+01, 1.00000000e+00, 8.16000000e+01, 8.11000000e+01, 1.29000000e+01, 0.00000000e+00, 7.77200955e+01, 2.17906689e+01, 7.02142643e+01, 4.80580458e+01, 2.45087760e+01, -1.65416546e+01, 1.42100000e+02, 1.42100000e+02, 2.20000000e+01, 1.32000000e+01, 1.42100000e+02, 2.54000000e+01, 8.48000000e+01, 4.50000000e+01, 6.80000000e+00, 1.27000000e+01, 2.88000000e+01, 9.50000000e+00, 3.40000000e+00, 5.60000000e+00, 1.19000000e+01, 1.54000000e+01, 1.78000000e+01, 5.96000000e+01, 3.08000000e+01, 4.60000000e+00, 8.70000000e+00, 1.97000000e+01, 6.60000000e+00, 2.50000000e+00, 3.90000000e+00, 8.40000000e+00, 1.09000000e+01, 1.42100000e+02, 1.42100000e+02, 1.38701573e+02, 1.39394781e+02, 2.20000000e+01, 2.10921372e+01, 2.10856588e+01, 2.14535338e+01, 2.15692031e+01, 2.14307560e+01, 2.06999603e+01, 3.99223950e+01, 3.77977005e+01, 2.17807668e+01, 4.01616963e+01, 3.38599023e+01, 1.78608153e+01, -2.80522122e+01, 9.77398067e+00, 7.72872347e+00, 1.80516148e+01, 3.35936909e+00, 7.78590005e+00, -6.14406783e+00, 2.03646124e+00, 7.87604573e+00, 2.54371835e+01, 1.45080856e+01, 3.53543190e+01, -1.28785871e-14, -2.54371835e+01, -1.53340570e+01, -1.20759759e+00, -7.00249457e+00, 6.63525968e-01, -3.83117366e-01, 7.32923584e+00, -2.94368951e+00, 1.04628354e+01, 5.57631797e+00, 1.49276252e+01, -8.55213773e-01, 4.03105930e+00, -7.46434360e+00, 1.48429454e+00, 7.41027529e+00, -1.56256366e+01, -2.70329755e+00, -5.64124747e+00, -2.36343148e+01, -5.57332913e+00, 1.42100000e+02, 1.42100000e+02, 1.42100000e+02, 1.42100000e+02, 1.42100000e+02, 1.42100000e+02, 1.38701573e+02, 2.20000000e+01, 2.20000000e+01, 2.20000000e+01, 2.10921372e+01, 2.10921372e+01, 2.10856588e+01, 2.15692031e+01, 2.14307560e+01, 1.38701573e+02, 1.38701573e+02, 1.39394781e+02, 1.48505355e+01, 2.10921372e+01, 1.73757655e+02, 1.53649875e+02, 9.29559926e+01, 1.63198667e+02, 1.37581105e+02, 7.84405055e+01, 1.17409481e+02, 3.75746843e+02, 2.50108088e+02, 
6.14726748e+02, 9.48882247e+01, 2.39991753e+02, 2.64715283e+02, 6.74531993e+01, 2.91334131e+02, 1.24264825e+02, 6.14298300e+01, 1.48284167e+02, 9.18840806e+02, 7.12803128e+02, -3.96023654e+01, -3.70686934e+01, -2.15608153e+01, -3.92743782e+01, -3.32429776e+01, -1.76186787e+01, 2.81573520e+01, -9.61306287e+00, -7.63646124e+00, -1.77516465e+01, -3.35593217e+00, -7.69804766e+00, 6.21306287e+00, -2.02439918e+00, -7.70195234e+00, -2.54371835e+01, -1.45080856e+01, -3.53543190e+01, 1.28785871e-14, 2.54371835e+01, 1.07127549e+01, -9.01328416e-01, 3.28502850e+00, -1.49102930e+00, -1.33275675e+00, -8.03634708e+00, 3.27533264e+00, -1.01258534e+01, -5.38429454e+00, -1.43368940e+01, 8.64343601e-01, -3.84418549e+00, 7.62585338e+00, -1.47338125e+00, -7.05581451e+00, 1.73902482e+01, 3.82920055e+00, 8.39126729e+00, 2.45087760e+01, 6.24406666e+00, 1.42100000e+02, 1.39394781e+02, 1.42100000e+02, 1.38701573e+02, 1.39394781e+02, 1.38701573e+02, 1.39394781e+02, 2.14535338e+01, 2.15692031e+01, 2.14307560e+01, 2.10856588e+01, 2.06999603e+01, 2.14535338e+01, 2.14307560e+01, 2.06999603e+01, 1.48505355e+01, 2.10921372e+01, 2.20000000e+01, 1.32000000e+01, 1.48505355e+01, 1.66686952e+02, 1.53577813e+02, 8.86122343e+01, 1.63598777e+02, 1.37797557e+02, 8.06072368e+01, 1.17409481e+02, 3.75746843e+02, 2.50108088e+02, 6.14726748e+02, 9.48882247e+01, 2.39991753e+02, 2.64715283e+02, 6.74531993e+01, 2.91334131e+02, 1.19794842e+03, 4.10725986e+02, 9.53585822e+02, 1.07198094e+03, 1.01829018e+03, 4.51435630e-01, 3.99194086e-01, 2.44621033e-01, 4.29470175e-01, 8.76312771e-01, 2.06422383e-01, 3.08972318e-01, 3.48649620e-01, 5.41499895e-01, 7.98553469e-01, 3.52181199e-01, 6.23516865e-01, 3.48309582e-01, 1.77508419e-01, 3.83334383e-01, 3.22849486e-01, 2.65998967e-01, 8.68177053e-01, 2.70069157e-01, 2.09509785e-01, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 
1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, -1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
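The idea behind such a flat representation can be sketched without grid2op: named arrays are concatenated in a fixed order, and the same order and sizes are used to slice the vector back. The helper names flatten_fields and unflatten_fields are hypothetical, and the real layout is whatever the library defines. Only three fields are used here, with values taken from the outputs above.

```python
import numpy as np

# A few named fields (values copied from the outputs above).
fields = {
    "prod_p": np.array([81.6, 81.1, 12.9, 0.0, 77.7200955]),
    "prod_v": np.array([142.1, 142.1, 22.0, 13.2, 142.1]),
    "line_status": np.ones(20),
}

def flatten_fields(fields):
    """Stack all fields, in insertion order, into one float64 vector."""
    return np.concatenate([np.asarray(v, dtype=np.float64) for v in fields.values()])

def unflatten_fields(vect, template):
    """Slice a flat vector back into fields, using the sizes found in `template`."""
    out, start = {}, 0
    for name, value in template.items():
        out[name] = vect[start:start + len(value)]
        start += len(value)
    return out

vect = flatten_fields(fields)
restored = unflatten_fields(vect, fields)
print(vect.shape, vect.dtype)
print(restored["prod_v"])
```

Because the slicing relies only on field order and sizes, any change to the set of fields changes the layout, which is one reason to prefer the named attributes when reading an observation by hand.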
An observation can be copied, of course:
obs2 = obs.copy()
Or reset:
obs2.reset()
print(obs2.prod_p)
[nan nan nan nan nan]
Or loaded from a vector:
obs2.from_vect(vector_representation_of_observation)
obs2.prod_p
array([81.6 , 81.1 , 12.9 , 0. , 77.7200955])
It is also possible to check whether two observations are equal:
obs == obs2
True
For this type of observation, it is also possible to retrieve the topology as a matrix. The topology matrix can be obtained in two different formats.
Format 1: the connectivity matrix, which has as many rows / columns as the number of elements in the powergrid (remember an element is either an end of a powerline, a generator or a load) and tells whether 2 elements are connected to one another or not:
obs.connectivity_matrix()
array([[0., 1., 1., ..., 0., 0., 0.], [1., 0., 1., ..., 0., 0., 0.], [1., 1., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 1., 1.], [0., 0., 0., ..., 1., 0., 1.], [0., 0., 0., ..., 1., 1., 0.]])
This representation has the advantage of giving a matrix that always has the same dimension, regardless of the topology of the powergrid.
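A possible way to build such an element-level matrix is sketched below on a hypothetical 5-element grid. It assumes that two elements are connected when they sit on the same bus of the same substation, or when they are the two ends of a connected powerline. Whether the library uses exactly this rule is an assumption, and all names and values are illustrative.

```python
import numpy as np

# Hypothetical tiny grid: 5 elements spread over 2 substations.
sub_id = np.array([0, 0, 0, 1, 1])       # substation of each element
topo_vect = np.array([1, 1, 2, 1, 1])    # bus of each element (-1 would mean disconnected)
line_ends = [(2, 3)]                      # (origin element, extremity element) of each powerline

# Elements on the same bus of the same substation are connected.
same_sub = sub_id[:, None] == sub_id[None, :]
same_bus = (topo_vect[:, None] == topo_vect[None, :]) & (topo_vect[:, None] > 0)
conn = (same_sub & same_bus).astype(float)
np.fill_diagonal(conn, 0.0)  # zero diagonal, as in the output above

# The two ends of a connected powerline are connected to each other.
for i_or, i_ex in line_ends:
    if topo_vect[i_or] > 0 and topo_vect[i_ex] > 0:
        conn[i_or, i_ex] = conn[i_ex, i_or] = 1.0

print(conn)
```

Changing topo_vect changes the entries of the matrix but never its shape, which is the property highlighted above.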
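Format 2: the bus connectivity matrix, which has as many rows / columns as the number of active buses of the powergrid. It should be understood as follows: entry (i, j) is 1 if at least one powerline connects bus i and bus j (diagonal entries are 1 in the output below), and 0 otherwise: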
obs.bus_connectivity_matrix()
array([[1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0.], [1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0.], [0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1.], [0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0.], [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1.], [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1.]])
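For comparison, a bus-level matrix of the same shape as the one printed above can be sketched from a list of powerline endpoints. The 3-bus grid below is hypothetical, and the convention of putting 1 on the diagonal simply mirrors the printed output.

```python
import numpy as np

n_bus = 3
# Hypothetical powerlines, given as (origin bus, extremity bus).
lines = [(0, 1), (1, 2)]

# Start with 1 on the diagonal, then mark each pair of buses linked by a powerline.
bus_conn = np.eye(n_bus)
for b_or, b_ex in lines:
    bus_conn[b_or, b_ex] = bus_conn[b_ex, b_or] = 1.0

# Number of neighbouring buses: row sum minus the diagonal entry.
n_neighbours = bus_conn.sum(axis=1) - 1
print(bus_conn)
print(n_neighbours)
```

Unlike the element-level matrix, the size of this matrix depends on the number of active buses, so it can change when the topology changes.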
As opposed to most RL problems, this framework adds the possibility to "simulate" the impact of a possible action on the power grid. This helps to compute rollouts in an RL setting, and is close to "model-based" reinforcement learning approaches (except that nothing more has to be learned).
This "simulate" method uses the available forecast data (forecasts are made available by the way we loaded the data here, with the class GridStateFromFileWithForecasts. For this class, only 1-time-step-ahead forecasts are provided, but this might be adapted in the future).
Note that this simulate function can use a different simulator than the one used by the Environment. For more information, we encourage you to read the official documentation or, if it has been built locally (recommended), to consult the corresponding page.
This function runs a powerflow with the simulator attached to simulate, using the forecast data, and returns the simulated observation, reward, done flag and info (see the next cell).
From a user point of view, this is the main difference with the previous pypownet framework. This "simulation" used to be performed directly by the environment, thus giving the Agent direct access to the Environment, which could break the RL framework (this was not an issue in the first edition of Learning to Run a Power Network, as the Environment was fully observable).
do_nothing_act = env.helper_action_player({})
obs_sim, reward_sim, is_done_sim, info_sim = obs.simulate(do_nothing_act)
obs_sim.prod_p
array([81.5 , 79.7 , 12.9 , 0. , 79.57809957])
obs.prod_p
array([81.6 , 81.1 , 12.9 , 0. , 77.7200955])
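The two outputs above can be compared directly to see what the forecast changes. The snippet below only uses the printed values and plain numpy; it is a sketch and does not call grid2op.

```python
import numpy as np

# Active production in the current observation and in the simulated one (copied from above).
prod_p_now = np.array([81.6, 81.1, 12.9, 0.0, 77.7200955])
prod_p_sim = np.array([81.5, 79.7, 12.9, 0.0, 79.57809957])

# Difference between the simulated (forecast-based) state and the current state.
delta = prod_p_sim - prod_p_now

# Indices of the generators whose production differs between the two.
changed = np.flatnonzero(~np.isclose(delta, 0.0))
print(changed)
print(delta[changed])
```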
In this section we will make our first Agent that acts based on these observations.
All Agents must derive from the BaseAgent class of grid2op.Agent. The main function to code for an Agent is the "act" function (more information in the official documentation).
Basically, the Agent receives a reward and an observation, and suggests a new action. Several Agents are predefined in the grid2op package. We won't present them here (for more information see the documentation or the Agent.py file); rather, we will make a custom Agent.
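This Agent will select among:
- doing nothing
- disconnecting the powerline with the highest relative flow
- disconnecting the powerline with the lowest relative flow, or reconnecting it if it is already disconnected
by using simulate on the corresponding actions, and choosing the one that has the highest predicted reward.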
Note that this kind of Agent is not particularly smart and is given only as an example.
More information about the creation / manipulation of Action will be given in the notebook 2_Action_GridManipulation
from grid2op.Agent import BaseAgent
import numpy as np
import pdb
class MyAgent(BaseAgent):
    def __init__(self, action_space):
        # python required method to code
        BaseAgent.__init__(self, action_space)
        self.do_nothing = self.action_space({})
        self.print_next = False

    def act(self, observation, reward, done=False):
        # candidate action: disconnect the powerline with the highest rho
        i_max = np.argmax(observation.rho)
        new_status_max = np.zeros(observation.rho.shape)
        new_status_max[i_max] = -1
        act_max = self.action_space({"set_line_status": new_status_max})

        # candidate action: switch the powerline with the lowest rho
        i_min = np.argmin(observation.rho)
        new_status_min = np.zeros(observation.rho.shape)
        if observation.rho[i_min] > 0:
            # all powerlines are connected, I try to disconnect this one
            new_status_min[i_min] = -1
            act_min = self.action_space({"set_line_status": new_status_min})
        else:
            # at least one powerline is disconnected, I try to reconnect it
            new_status_min[i_min] = 1
            # act_min = self.action_space({"set_status": new_status_min})
            act_min = self.action_space({"set_line_status": new_status_min,
                                         "set_bus": {"lines_or_id": [(i_min, 1)], "lines_ex_id": [(i_min, 1)]}})

        # simulate the three candidates and keep the one with the highest predicted reward
        _, reward_sim_dn, *_ = observation.simulate(self.do_nothing)
        _, reward_sim_max, *_ = observation.simulate(act_max)
        _, reward_sim_min, *_ = observation.simulate(act_min)

        if reward_sim_dn >= reward_sim_max and reward_sim_dn >= reward_sim_min:
            self.print_next = False
            res = self.do_nothing
        elif reward_sim_max >= reward_sim_min:
            self.print_next = True
            res = act_max
            print(res)
        else:
            self.print_next = True
            res = act_min
            print(res)
        return res
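The selection step of this agent boils down to taking the candidate with the highest simulated reward. A library-free sketch of that pattern is given below. The function pick_best_action and the reward values are hypothetical stand-ins: in the agent, the reward would come from observation.simulate(action).

```python
def pick_best_action(candidates, simulate_reward):
    """Return the candidate whose simulated reward is the highest.

    Python's max() returns the first maximal item, so listing "do nothing"
    first keeps the preference for inaction in case of a tie.
    """
    return max(candidates, key=simulate_reward)

# Hypothetical simulated rewards, for illustration only.
fake_rewards = {
    "do_nothing": 20.0,
    "disconnect_max_rho": 20.0,
    "switch_min_rho": 19.5,
}

best = pick_best_action(list(fake_rewards), fake_rewards.get)
print(best)
```

Written this way, adding a fourth candidate action only requires extending the list, at the cost of one extra call to the simulator per time step.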
We compare this Agent with the DoNothingAgent (already coded) on an episode made available with this package (nb_episode=1 in the cells below). To make the comparison more interesting, it is better to use the predefined reward class that implements the L2RPN rewards.
from grid2op.main import main
from grid2op.Agent import DoNothingAgent
from grid2op.Reward import L2RPNReward
from grid2op.Chronics import GridStateFromFileWithForecasts
max_iter = 10 # to make computation much faster we will only consider 10 time steps instead of 287
res = main(nb_episode=1,
agent_class=DoNothingAgent,
path_casefile=grid2op.CASE_14_FILE,
path_chronics=grid2op.CHRONICS_MLUTIEPISODE,
names_chronics_to_backend=grid2op.NAMES_CHRONICS_TO_BACKEND,
gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts, "max_iter": max_iter},
reward_class=L2RPNReward
)
print("The results for DoNothing agent are:")
for _, chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "\tFor chronics with id {}\n".format(chron_name)
    msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
    msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)
The results for DoNothing agent are: For chronics with id 1 - cumulative reward: 199.998003 - number of time steps completed: 10 / 10
res = main(nb_episode=1,
agent_class=MyAgent,
path_casefile=grid2op.CASE_14_FILE,
path_chronics=grid2op.CHRONICS_MLUTIEPISODE,
names_chronics_to_backend=grid2op.NAMES_CHRONICS_TO_BACKEND,
gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts, "max_iter": max_iter},
reward_class=L2RPNReward
)
print("The results for the custom agent are:")
for _, chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "\tFor chronics with id {}\n".format(chron_name)
    msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
    msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)
The results for the custom agent are: For chronics with id 1 - cumulative reward: 199.998003 - number of time steps completed: 10 / 10
As we can see, both agents perform identically here (they would differ if we didn't limit the episode length to 10 time steps).
This agent is NOT recommended.
NB: These scores are obtained when setting max_iter=-1 in the previous cells. Here, with max_iter=10, we do not see any difference in the score (max_iter=10 has been set to make the tests of these notebooks run faster).
from grid2op.Agent import PowerLineSwitch
res = main(nb_episode=1,
agent_class=PowerLineSwitch,
path_casefile=grid2op.CASE_14_FILE,
path_chronics=grid2op.CHRONICS_MLUTIEPISODE,
names_chronics_to_backend=grid2op.NAMES_CHRONICS_TO_BACKEND,
gridStateclass_kwargs={"gridvalueClass": GridStateFromFileWithForecasts, "max_iter": max_iter},
reward_class=L2RPNReward
)
print("The results for the PowerLineSwitch agent are:")
for _, chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "\tFor chronics with ID {}\n".format(chron_name)
    msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
    msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)
The results for the PowerLineSwitch agent are: For chronics with ID 1 - cumulative reward: 199.998150 - number of time steps completed: 10 / 10
We want, however, to emphasize that doing nothing is NOT the best solution, even in this simple case. For example, an agent that, at each time step, greedily disconnects / reconnects powerlines to maximize the anticipated reward achieves a cumulative reward of 199.998134 in this situation.
NB: For these simulations, the scores are not realistic at all. No special care has been taken to set the thermal limits to plausible values, which explains the very small differences observed between the three agents above.