Working with pandas in pandapower¶

All data in pandapower is stored in pandas dataframes, which is why all functionalities that are implemented in pandas (http://pandas.pydata.org/) can be used with pandapower. pandas is a very powerful and widely used tool for data analysis.

In the following we will present just a few of the many possibilities how builtin pandas functions can make your life easier when working in pandapower.

This tutorial refers to pandas functionalities and how to use them with pandapower. If you have not worked with pandas before, this should give you a good overview of the relevant functionality. If you are already familiar with pandas, it might still refresh your memory, but maybe not provide new information.

We use the mv_oberrhein example from the pandapower networks package for this tutorial.

In [1]:

#Import the pandapower and the networks module:
import pandapower as pp
import pandapower.networks as nw
import pandas as pd

#Import an example network:
net = nw.mv_oberrhein()

Accessing and Setting Parameters¶

You can accesss all parameters of one element with the .loc command:

In [2]:

#Detailed information about line 4
net.line.loc[4]

Out[2]:

name                                  Line 4
std_type        NA2XS2Y 1x185 RM/25 12/20 kV
from_bus                                 239
to_bus                                   236
length_km                           0.381968
r_ohm_per_km                           0.161
x_ohm_per_km                           0.117
c_nf_per_km                            273.0
g_us_per_km                              0.0
max_i_ka                               0.362
df                                       1.0
parallel                                   1
type                                      cs
in_service                              True
Name: 4, dtype: object

Or one parameter of multiple elements:

In [3]:

#Length information about line 5 and 6
net.line.length_km.loc[[5,6]]

Out[3]:

5    3.3028
6    0.3504
Name: length_km, dtype: float64

The values are returned as a pandas series. To get them in an array instead, use the values function:

In [4]:

#Length information about line 5 and 6 as numpy array
net.line.length_km.loc[[5,6]].values

Out[4]:

array([3.3028, 0.3504])

Multiple parameters of multiple elements are returned as a pandas dataframe:

In [5]:

#Length and type information about line 5 and 6                               
net.line.loc[[5,6], ["length_km", "r_ohm_per_km", "x_ohm_per_km", "c_nf_per_km"]]

Out[5]:

	length_km	r_ohm_per_km	x_ohm_per_km	c_nf_per_km
5	3.3028	0.161	0.117	273.0
6	0.3504	0.161	0.117	273.0

The values function in this case yields a multi-dimensional array:

In [6]:

#Length and type information about line 5 and 6  as an array                             
net.line.loc[[5,6], ["length_km", "r_ohm_per_km", "x_ohm_per_km", "c_nf_per_km"]].values

Out[6]:

array([[3.3028e+00, 1.6100e-01, 1.1700e-01, 2.7300e+02],
       [3.5040e-01, 1.6100e-01, 1.1700e-01, 2.7300e+02]])

To access exactly one parameter, .at can be used instead of .loc:

In [7]:

net.line.length_km.at[6]

Out[7]:

0.3504

This .at solution can only be used for exactly one parameter, but is siginifcantly faster than using .loc. That is why you should always use .at if possible!

Setting parameters works the same way:

In [8]:

#Set single parameter with .at
net.line.length_km.at[5] = 3.2
#or
net.line.at[5, "length_km"] = 3.2

#Set multiple parameters with .loc
net.line.length_km.loc[[4,6]] = [1.8, 2.2]

#Output
print("Line lengths after parameter setting:")
net.line.length_km.loc[[4,5,6]]

Line lengths after parameter setting:

Out[8]:

4    1.8
5    3.2
6    2.2
Name: length_km, dtype: float64

Statistical Evaluations¶

You can easily find maximum, minimum values of a column with pandas, e.g.

In [9]:

print("Maximum Line Length: %.2f km"%net.line.length_km.max())
print("Minimum Line Length: %.2f km"%net.line.length_km.min())
print("Mean Line Length: %.2f km"%net.line.length_km.mean())

Maximum Line Length: 3.20 km
Minimum Line Length: 0.07 km
Mean Line Length: 0.62 km

Of course you can also combine these, for example to calculate weighted means:

In [10]:

weighted_mean_r = (net.line.length_km * net.line.r_ohm_per_km).sum() / net.line.length_km.sum()
print("Weighted Mean Resistance: %.2f Ohm per kilometer"%weighted_mean_r)

Weighted Mean Resistance: 0.16 Ohm per kilometer

Iterating over Elements¶

If you want to iterate over a dataframe, use the iterrows function. It gives you the index of each row and the row as a pandas series.

In [11]:

for lidx, linerow in net.line.loc[[4,5,6]].iterrows():
        print("the line with index %s is %f kilometers long and has a resistance of %f ohm per kilometers" %(lidx, linerow.length_km, linerow.r_ohm_per_km))

the line with index 4 is 1.800000 kilometers long and has a resistance of 0.161000 ohm per kilometers
the line with index 5 is 3.200000 kilometers long and has a resistance of 0.161000 ohm per kilometers
the line with index 6 is 2.200000 kilometers long and has a resistance of 0.161000 ohm per kilometers

You can also group elements that share the same value in any column of the dataframe with the groubpy function of pandas;

In [12]:

for std_type, linetable in net.line.groupby(net.line.std_type):
        print("there are %u lines with standard type %s with an overall length of %f kilometers"%(len(linetable), std_type, linetable.length_km.sum()))

there are 10 lines with standard type 243-AL1/39-ST1A 20.0 with an overall length of 10.761400 kilometers
there are 160 lines with standard type NA2XS2Y 1x185 RM/25 12/20 kV with an overall length of 96.806343 kilometers
there are 11 lines with standard type NA2XS2Y 1x240 RM/25 12/20 kV with an overall length of 4.343042 kilometers

The first variable is now the value of the column you grouped by (here line standard type) and the second variable is a pandas dataframe of all lines that have this value (here all lines with the standard type defined in the first variable).

You can also directly sum up certain values of a groupby, for example

In [13]:

net.line.groupby(net.line.std_type).sum().length_km

Out[13]:

std_type
243-AL1/39-ST1A 20.0            10.761400
NA2XS2Y 1x185 RM/25 12/20 kV    96.806343
NA2XS2Y 1x240 RM/25 12/20 kV     4.343042
Name: length_km, dtype: float64

gives you a pandas of series with the sum of all line length grouped by standard type.

Querying Dataframes¶

You will often need to select elements with specific characteristics. You can do that with boolean masks:

In [14]:

mask = net.line.length_km > 2.5

the mask variable is now a boolean time series that indicates for every line if it longer than 2500 meters or not. We can now select only the ones which are longer than 2500 meters and output their name and length:

In [15]:

long_lines = net.line[mask]

#Output
long_lines[["name", "length_km", "std_type"]]

Out[15]:

	name	length_km	std_type
5	Line 5	3.200000	NA2XS2Y 1x185 RM/25 12/20 kV
45	Line 45	2.611111	NA2XS2Y 1x185 RM/25 12/20 kV
59	Line 59	3.155390	NA2XS2Y 1x185 RM/25 12/20 kV
91	Line 91	2.528200	NA2XS2Y 1x185 RM/25 12/20 kV
162	Line 162	2.595300	243-AL1/39-ST1A 20.0

or, directly in one step:

In [16]:

net.line[net.line.length_km > 2.5][["name", "length_km", "std_type"]]

Out[16]:

	name	length_km	std_type
5	Line 5	3.200000	NA2XS2Y 1x185 RM/25 12/20 kV
45	Line 45	2.611111	NA2XS2Y 1x185 RM/25 12/20 kV
59	Line 59	3.155390	NA2XS2Y 1x185 RM/25 12/20 kV
91	Line 91	2.528200	NA2XS2Y 1x185 RM/25 12/20 kV
162	Line 162	2.595300	243-AL1/39-ST1A 20.0

You can of course also check for exact values, for example if you want all 110 kV buses:

In [17]:

net.bus[net.bus.vn_kv == 110]

Out[17]:

	name	vn_kv	type	zone	in_service
58	Bus 38	110.0	n	None	True
318	Bus 177	110.0	b	None	True

For string querys, use the str.contains function, for example to find loads with "MV" in the name:

In [18]:

net.load[net.load.name.str.startswith("MV")]

Out[18]:

	name	bus	p_mw	q_mvar	sn_mva	scaling	in_service	type
141	MV Load 0	215	0.5	0.101529	NaN	0.6	True	MV Load
142	MV Load 1	6	0.5	0.101529	NaN	0.6	True	MV Load
143	MV Load 2	48	0.5	0.101529	NaN	0.6	True	MV Load
144	MV Load 3	52	0.5	0.101529	NaN	0.6	True	MV Load
145	MV Load 4	55	0.5	0.101529	NaN	0.6	True	MV Load
146	MV Load 5	235	0.5	0.101529	NaN	0.6	True	MV Load

or to find all all loads that contain "Load 0" in their name:

In [19]:

net.load[net.load.name.str.contains("Load 0")]

Out[19]:

	name	bus	p_mw	q_mvar	const_z_percent	const_i_percent	sn_mva	scaling	in_service	type
0	LV Load 0	103	0.25	0.050765	0.0	0.0	NaN	0.6	True	MV/LV Station
141	MV Load 0	215	0.50	0.101529	0.0	0.0	NaN	0.6	True	MV Load

The isin function allows you to check if certain values are contained in a list, for example:

In [20]:

lines = net.line[net.line.from_bus.isin([39,72])]
lines[["name", "from_bus", "to_bus"]]

Out[20]:

	name	from_bus	to_bus
165	Line 165	39	86

returns the names of all lines that are connected to one of the buses 39 or 72 at the from bus.

You can also combine certain queries with the logical | (or) and & (and) for more complicated queries, for example:

In [21]:

lines = net.line[(net.line.from_bus.isin([39,72])) | (net.line.to_bus.isin([39,72]))]
lines[["name", "from_bus", "to_bus"]]

Out[21]:

	name	from_bus	to_bus
36	Line 36	289	72
54	Line 54	30	72
162	Line 162	80	39
165	Line 165	39	86

returns the names of all lines that are connected to one of the buses 39 or 72 at the from bus or to bus.

Please be aware that the boolean query always returns a copy of the dataframe. Lets say you want to set all medium voltage nodes out of service and try this:

In [22]:

net.bus[net.bus.vn_kv > 25].in_service = False

You will notice that your pandapower network table did not change:

In [23]:

net.bus.in_service.head()

Out[23]:

0    True
1    True
2    True
3    True
4    True
Name: in_service, dtype: bool

This is because the query returns a copy and you change the value in this copy in the same way like the following code:

In [24]:

bus_table = net.bus[net.bus.vn_kv < 25]
bus_table.in_service = False

#Output
print(bus_table.in_service.head())
# but still:
print(net.bus.in_service.head())

0    False
1    False
2    False
3    False
4    False
Name: in_service, dtype: bool
0    True
1    True
2    True
3    True
4    True
Name: in_service, dtype: bool

Here bus_table is a copy of a part of the net.bus table. In the example above, you changed that copy, but not the original table. That is why you need to save the index of the copied bus table and than change the value of the original table:

In [25]:

#Get index and store in ns_nodes
ns_nodes = net.bus[net.bus.vn_kv < 25].index

#Change all values where index is in ns_nodes
net.bus.in_service.loc[ns_nodes] = False

#Output
print(net.bus.in_service.head())

0    False
1    False
2    False
3    False
4    False
Name: in_service, dtype: bool

Extending and Customizing the Framework¶

One of the benefits of pandapower is its easy to customize and extend datastructure. For pandapower to be able to run a loadflow, the standard parameters of the datastructure have to be defined. However, you can extend the pandapower datastructure any way you like.

Say you have network that is partitioned in three zones A, B and C:

In [26]:

nr_buses = len(net.bus.index)
net.bus.loc[net.bus.index[:nr_buses], "zone"] = "A"
net.bus.loc[net.bus.index[nr_buses:], "zone"] = "B"
net.bus.zone.head()

Out[26]:

0    A
1    A
2    A
3    A
4    A
Name: zone, dtype: object

The goal is to apply a load scaling factor of 0.8 in Zone A and of 0.6 in Zone B:

In [27]:

for i, load in net.load.iterrows():
    if net.bus.zone.at[load.bus] == "A":
        net.load.scaling.at[i] = 0.8
    elif net.bus.zone.at[load.bus] == "B":
        net.load.scaling.at[i] = 0.6
net.load.head(n=5)

Out[27]:

	name	bus	p_mw	q_mvar	sn_mva	scaling	in_service	type
0	LV Load 0	103	0.25	0.050765	NaN	0.8	True	MV/LV Station
1	LV Load 1	174	0.63	0.127927	NaN	0.8	True	MV/LV Station
2	LV Load 2	194	0.25	0.050765	NaN	0.8	True	MV/LV Station
3	LV Load 3	34	0.25	0.050765	NaN	0.8	True	MV/LV Station
4	LV Load 4	76	0.25	0.050765	NaN	0.8	True	MV/LV Station

However, you could also extend the line table to include the zone of each line like this:

In [28]:

net.scaling = 1.0 #reset
net.load["zone"] = net.bus.zone.loc[net.load.bus.values].values
net.load.head(n=5)

Out[28]:

	name	bus	p_mw	q_mvar	sn_mva	scaling	in_service	type	zone
0	LV Load 0	103	0.25	0.050765	NaN	0.8	True	MV/LV Station	A
1	LV Load 1	174	0.63	0.127927	NaN	0.8	True	MV/LV Station	A
2	LV Load 2	194	0.25	0.050765	NaN	0.8	True	MV/LV Station	A
3	LV Load 3	34	0.25	0.050765	NaN	0.8	True	MV/LV Station	A
4	LV Load 4	76	0.25	0.050765	NaN	0.8	True	MV/LV Station	A

and then apply your function only to the lines in Zone A:

In [29]:

net.load.loc[net.load[net.load.zone=="A"].index, "scaling"] = 0.8
net.load.loc[net.load[net.load.zone=="B"].index, "scaling"] = 0.6
net.load.head(n=5)

Out[29]:

	name	bus	p_mw	q_mvar	sn_mva	scaling	in_service	type	zone
0	LV Load 0	103	0.25	0.050765	NaN	0.8	True	MV/LV Station	A
1	LV Load 1	174	0.63	0.127927	NaN	0.8	True	MV/LV Station	A
2	LV Load 2	194	0.25	0.050765	NaN	0.8	True	MV/LV Station	A
3	LV Load 3	34	0.25	0.050765	NaN	0.8	True	MV/LV Station	A
4	LV Load 4	76	0.25	0.050765	NaN	0.8	True	MV/LV Station	A

Of course for this simple example, the first solution would also be feasible. But the more complicated your code gets, the more important it is to customize the pandapower framework, so your code is more efficient as well as easier to read and debug.

Merging Dataframes¶

Lets say you want to see the voltage at each bus together with the bus name. The voltage is stored in net.res_bus, whereas the name ist stored in net.bus. The two values are in different tables, but they have the same index, which is why you can easily merge them into a new shared dataframe:

In [30]:

net = nw.mv_oberrhein()
pp.runpp(net) #Run a load flow calculation to get result tables

#Store bus name with bus voltage result of loadflow calculation in bus_results
#axis={horizontal=0, vertical=1}: axis to concatenate along
bus_results = pd.concat([net.bus.name, net.res_bus.vm_pu], axis=1)

#Output
bus_results.head(n=8)

Out[30]:

	name	vm_pu
0	Bus 0	1.009473
1	Bus 1	1.009354
2	Bus 2	1.010306
3	Bus 3	1.008611
4	Bus 4	1.010021
5	Bus 5	1.015559
6	Bus 6	1.023551
7	Bus 7	1.015590

Now bus results shows you the name and the voltage of each bus in one dataframe.

If you want to see the bus voltage of all loads, you can also merge the bus and the load dataframe. They do not share an index, but rather the index of the bus is given in the “bus” column of the load table. In that case you can merge the dataframes like this: (for more information about the arguments see [pandas merging documentation] [pandas_merg]) [pandas_merg]: http://pandas.pydata.org/pandas-docs/stable/merging.html

In [31]:

load_bus_results = pd.merge(net.res_bus, net.load, left_index=True, right_on="bus")

#Output
load_bus_results.head(n=8)

Out[31]:

	vm_pu	va_degree	p_mw_x	q_mvar_x	name	bus	p_mw_y	q_mvar_y	sn_mva	scaling	in_service	type
67	1.009473	-6.778921	0.24	0.048734	LV Load 67	0	0.40	0.081223	NaN	0.6	True	MV/LV Station
17	1.009354	-6.783526	0.15	0.030459	LV Load 17	1	0.25	0.050765	NaN	0.6	True	MV/LV Station
128	1.010306	-6.753059	0.15	0.030459	LV Load 128	2	0.25	0.050765	NaN	0.6	True	MV/LV Station
73	1.008611	-6.807492	0.15	0.030459	LV Load 73	3	0.25	0.050765	NaN	0.6	True	MV/LV Station
53	1.010021	-6.761493	0.15	0.030459	LV Load 53	4	0.25	0.050765	NaN	0.6	True	MV/LV Station
61	1.015559	-6.321147	0.15	0.030459	LV Load 61	5	0.25	0.050765	NaN	0.6	True	MV/LV Station
142	1.023551	-5.478506	0.30	0.060918	MV Load 1	6	0.50	0.101529	NaN	0.6	True	MV Load
101	1.008905	-6.797072	0.24	0.048734	LV Load 101	8	0.40	0.081223	NaN	0.6	True	MV/LV Station

Here net.res_bus is the left and net.load is the right dataframe. To merge the dataframe into one, the index of the left dataframe (left_index=True) and the bus column of the right dataframe (right_on=”bus”) are used. The resulting dataframe shows you the load table with the bus results of the respective load buses.

If you want to know the maximum voltage at a bus that has a load connected to it, you could use:

In [32]:

max_load_voltage = pd.merge(net.res_bus, net.load, left_index=True, right_on="bus").vm_pu.max()

#Output
max_load_voltage

Out[32]:

1.0235513513298844

or alternatively with a boolean mask:

In [33]:

max_load_voltage = net.res_bus[net.bus.index.isin(net.load.bus.values)].vm_pu.max()

#Output
max_load_voltage

Out[33]:

1.0235513513298844

In this simple case, the boolean mask solution is about 5 times faster than merging the dataframes, so only use merging for more complicated operations or if runtime is not important (e.g. debugging).