#!/usr/bin/env python
# coding: utf-8

# # Working with pandas in pandapower
# 
# All data in pandapower is stored in pandas dataframes, which is why all functionalities that are implemented in pandas (http://pandas.pydata.org/) can be used with pandapower. pandas is a very powerful and widely used tool for data analysis.
# 
# In the following we will present just a few of the many possibilities how builtin pandas functions can make your life easier when working in pandapower.
# 
# This tutorial refers to pandas functionalities and how to use them with pandapower. If you have not worked with pandas before, this should give you a good overview of the relevant functionality. If you are already familiar with pandas, it might still refresh your memory, but maybe not provide new information.

# We use the mv_oberrhein example from the pandapower networks package for this tutorial.

# In[1]:


#Import the pandapower and the networks module:
import pandapower as pp
import pandapower.networks as nw
import pandas as pd

#Import an example network:
net = nw.mv_oberrhein()


# ## Accessing and Setting Parameters
# 
# You can accesss all parameters of one element with the .loc command:

# In[2]:


#Detailed information about line 4
net.line.loc[4]


# Or one parameter of multiple elements:

# In[3]:


#Length information about line 5 and 6
net.line.length_km.loc[[5,6]]


# The values are returned as a pandas series. To get them in an array instead, use the values function:

# In[4]:


#Length information about line 5 and 6 as numpy array
net.line.length_km.loc[[5,6]].values


# Multiple parameters of multiple elements are returned as a pandas dataframe:

# In[5]:


#Length and type information about line 5 and 6                               
net.line.loc[[5,6], ["length_km", "r_ohm_per_km", "x_ohm_per_km", "c_nf_per_km"]]


# The values function in this case yields a multi-dimensional array:

# In[6]:


#Length and type information about line 5 and 6  as an array                             
net.line.loc[[5,6], ["length_km", "r_ohm_per_km", "x_ohm_per_km", "c_nf_per_km"]].values


# To access exactly one parameter, .at can be used instead of .loc:

# In[7]:


net.line.length_km.at[6]


# This .at solution can only be used for exactly one parameter, but is siginifcantly faster than using .loc. That is why you should always use .at if possible!

# Setting parameters works the same way:

# In[8]:


#Set single parameter with .at
net.line.length_km.at[5] = 3.2
#or
net.line.at[5, "length_km"] = 3.2

#Set multiple parameters with .loc
net.line.length_km.loc[[4,6]] = [1.8, 2.2]

#Output
print("Line lengths after parameter setting:")
net.line.length_km.loc[[4,5,6]]


# ## Statistical Evaluations
# 
# You can easily find maximum, minimum values of a column with pandas, e.g.

# In[9]:


print("Maximum Line Length: %.2f km"%net.line.length_km.max())
print("Minimum Line Length: %.2f km"%net.line.length_km.min())
print("Mean Line Length: %.2f km"%net.line.length_km.mean())


# Of course you can also combine these, for example to calculate weighted means:

# In[10]:


weighted_mean_r = (net.line.length_km * net.line.r_ohm_per_km).sum() / net.line.length_km.sum()
print("Weighted Mean Resistance: %.2f Ohm per kilometer"%weighted_mean_r)


# ## Iterating over Elements
# 
# If you want to iterate over a dataframe, use the iterrows function. It gives you the index of each row and the row as a pandas series.

# In[11]:


for lidx, linerow in net.line.loc[[4,5,6]].iterrows():
        print("the line with index %s is %f kilometers long and has a resistance of %f ohm per kilometers" %(lidx, linerow.length_km, linerow.r_ohm_per_km))


# You can also group elements that share the same value in any column of the dataframe with the groubpy function of pandas;

# In[12]:


for std_type, linetable in net.line.groupby(net.line.std_type):
        print("there are %u lines with standard type %s with an overall length of %f kilometers"%(len(linetable), std_type, linetable.length_km.sum()))


# The first variable is now the value of the column you grouped by (here line standard type) and the second variable is a pandas dataframe of all lines that have this value (here all lines with the standard type defined in the first variable).

# You can also directly sum up certain values of a groupby, for example

# In[13]:


net.line.groupby(net.line.std_type).sum().length_km


# gives you a pandas of series with the sum of all line length grouped by standard type.

# ## Querying Dataframes
# 
# You will often need to select elements with specific characteristics. You can do that with boolean masks:

# In[14]:


mask = net.line.length_km > 2.5


# the mask variable is now a boolean time series that indicates for every line if it longer than 2500 meters or not. We can now select only the ones which are longer than 2500 meters and output their name and length:

# In[15]:


long_lines = net.line[mask]

#Output
long_lines[["name", "length_km", "std_type"]]


# or, directly in one step:

# In[16]:


net.line[net.line.length_km > 2.5][["name", "length_km", "std_type"]]


# 
# You can of course also check for exact values, for example if you want all 110 kV buses:

# In[17]:


net.bus[net.bus.vn_kv == 110]


# For string querys, use the str.contains function, for example to find loads with "MV" in the name:

# In[18]:


net.load[net.load.name.str.startswith("MV")]


# or to find all all loads that contain "Load 0" in their name:

# In[19]:


net.load[net.load.name.str.contains("Load 0")]


# The isin function allows you to check if certain values are contained in a list, for example:

# In[20]:


lines = net.line[net.line.from_bus.isin([39,72])]
lines[["name", "from_bus", "to_bus"]]


# returns the names of all lines that are connected to one of the buses 39 or 72 at the from bus.

# You can also combine certain queries with the logical | (or) and & (and) for more complicated queries, for example:

# In[21]:


lines = net.line[(net.line.from_bus.isin([39,72])) | (net.line.to_bus.isin([39,72]))]
lines[["name", "from_bus", "to_bus"]]


# returns the names of all lines that are connected to one of the buses 39 or 72 at the from bus or to bus.

# Please be aware that the boolean query always returns a copy of the dataframe. Lets say you want to set all medium voltage nodes out of service and try this:

# In[22]:


net.bus[net.bus.vn_kv > 25].in_service = False


# You will notice that your pandapower network table did not change:

# In[23]:


net.bus.in_service.head()


# This is because the query returns a copy and you change the value in this copy in the same way like the following code:

# In[24]:


bus_table = net.bus[net.bus.vn_kv < 25]
bus_table.in_service = False

#Output
print(bus_table.in_service.head())
# but still:
print(net.bus.in_service.head())


# Here bus_table is a copy of a part of the net.bus table. In the example above, you changed that copy, but not the original table. That is why you need to save the index of the copied bus table and than change the value of the original table:

# In[25]:


#Get index and store in ns_nodes
ns_nodes = net.bus[net.bus.vn_kv < 25].index

#Change all values where index is in ns_nodes
net.bus.in_service.loc[ns_nodes] = False

#Output
print(net.bus.in_service.head())


# ## Extending and Customizing the Framework
# 
# One of the benefits of pandapower is its easy to customize and extend datastructure. For pandapower to be able to run a loadflow, the standard parameters of the datastructure have to be defined. However, you can extend the pandapower datastructure any way you like.

# Say you have network that is partitioned in three zones A, B and C: 

# In[26]:


nr_buses = len(net.bus.index)
net.bus.loc[net.bus.index[:nr_buses], "zone"] = "A"
net.bus.loc[net.bus.index[nr_buses:], "zone"] = "B"
net.bus.zone.head()


# The goal is to apply a load scaling factor of 0.8 in Zone A and of 0.6 in Zone B:

# In[27]:


for i, load in net.load.iterrows():
    if net.bus.zone.at[load.bus] == "A":
        net.load.scaling.at[i] = 0.8
    elif net.bus.zone.at[load.bus] == "B":
        net.load.scaling.at[i] = 0.6
net.load.head(n=5)


# However, you could also extend the line table to include the zone of each line like this:

# In[28]:


net.scaling = 1.0 #reset
net.load["zone"] = net.bus.zone.loc[net.load.bus.values].values
net.load.head(n=5)


# and then apply your function only to the lines in Zone A:

# In[29]:


net.load.loc[net.load[net.load.zone=="A"].index, "scaling"] = 0.8
net.load.loc[net.load[net.load.zone=="B"].index, "scaling"] = 0.6
net.load.head(n=5)


# Of course for this simple example, the first solution would also be feasible. But the more complicated your code gets, the more important it is to customize the pandapower framework, so your code is more efficient as well as easier to read and debug.

# ## Merging Dataframes
# 
# Lets say you want to see the voltage at each bus together with the bus name. The voltage is stored in net.res_bus, whereas the name ist stored in net.bus. The two values are in different tables, but they have the same index, which is why you can easily merge them into a new shared dataframe:

# In[30]:


net = nw.mv_oberrhein()
pp.runpp(net) #Run a load flow calculation to get result tables

#Store bus name with bus voltage result of loadflow calculation in bus_results
#axis={horizontal=0, vertical=1}: axis to concatenate along
bus_results = pd.concat([net.bus.name, net.res_bus.vm_pu], axis=1)

#Output
bus_results.head(n=8)


# Now bus results shows you the name and the voltage of each bus in one dataframe.

# If you want to see the bus voltage of all loads, you can also merge the bus and the load dataframe. They do not share an index, but rather the index of the bus is given in the “bus” column of the load table. In that case you can merge the dataframes like this: (for more information about the arguments see [pandas merging documentation] [pandas_merg])
# [pandas_merg]: http://pandas.pydata.org/pandas-docs/stable/merging.html 

# In[31]:


load_bus_results = pd.merge(net.res_bus, net.load, left_index=True, right_on="bus")

#Output
load_bus_results.head(n=8)


# Here net.res_bus is the left and net.load is the right dataframe. To merge the dataframe into one, the index of the left dataframe (left_index=True) and the bus column of the right dataframe (right_on=”bus”) are used. The resulting dataframe shows you the load table with the bus results of the respective load buses.

# If you want to know the maximum voltage at a bus that has a load connected to it, you could use:

# In[32]:


max_load_voltage = pd.merge(net.res_bus, net.load, left_index=True, right_on="bus").vm_pu.max()

#Output
max_load_voltage


# or alternatively with a boolean mask:

# In[33]:


max_load_voltage = net.res_bus[net.bus.index.isin(net.load.bus.values)].vm_pu.max()

#Output
max_load_voltage


# In this simple case, the boolean mask solution is about 5 times faster than merging the dataframes, so only use merging for more complicated operations or if runtime is not important (e.g. debugging).