Demonstration of `donut_plot_with_subgroups_from_dataframe.py`

This notebook demonstrates the use of donut_plot_with_subgroups_from_dataframe.py; see here for more information.

This full-featured script makes a plot similar to the example from The Python Graph Gallery. It lets you plug in your own input data without any further coding.

In addition to the full-featured script demonstrated on this page, there is a demonstration notebook that shows the basic coding steps to make a plot similar to the example from The Python Graph Gallery using a dataframe or tabular text as input. That notebook is here. There are some others, too. Here is a list:

In the current form, these scripts and demonstration notebook work in JupyterLab, too.

The two main ways of using the script are covered first, with several of the options demonstrated along the way. Then some features important for adjusting the look to match your needs are covered, particularly the use of a 'high-low' list to control shading.
Finally, a text-based alternative is highlighted.

Preparation and displaying USAGE block

Let's get the script and run 'Help' on it to see the basic USAGE block.

(If you are running this notebook in a session launched from the repo that includes the script, this step is not necessary. However, it is included because there is no harm in running it here, and you may want to run this elsewhere or see how to easily acquire the script. If you are on the actual command line, you'd leave off the exclamation point.)

In [1]:

import os
file_needed = "donut_plot_with_subgroups_from_dataframe.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/donut_plots_with_subgroups/master/donut_plot_with_subgroups_from_dataframe.py

In [2]:

%run donut_plot_with_subgroups_from_dataframe.py -h

usage: donut_plot_with_subgroups_from_dataframe.py [-h] [-li] [-lopg] [-lotg]
                                                   [-svg] [-ssn]
                                                   [-hll HILOLIST]
                                                   [-ac ADVANCE_COLOR]
                                                   DF_FILE GROUPS SUBGROUPS

donut_plot_with_subgroups_from_dataframe.py takes a dataframe, and some
information about columns in the dataframe and makes a donut plot. The inner
ring is a breakdown of the subgroupings per each group in the outer ring of
the plot. **** Script by Wayne Decatur (fomightez @ github) ***

positional arguments:
  DF_FILE               Name of file containing the dataframe. Whether it is
                        in the form of a pickled dataframe, tab-separated
                        text, or comma-separated text needs to be indicated by
                        the file extension. So `.pkl`, `.tsv`, or `.csv` for
                        the file extension.
  GROUPS                Text indicating column in dataframe to use as main
                        group data in the outer ring of the plot.
  SUBGROUPS             Text indicating column in dataframe to use as
                        subgroupings for the inner ring.

optional arguments:
  -h, --help            show this help message and exit
  -li, --large_image    add this flag to make the image saved larger than the
                        default of `(7, 8)`
  -lopg, --leave_off_percent_in_group
                        add this flag to not display the percent of the total
                        for each group.
  -lotg, --leave_off_total_in_group
                        add this flag to not display the total amount for each
                        group.
  -svg, --save_vg       add this flag to save as vector graphics
                        (**RECOMMENDED FOR PUBLICATION***) instead of default
                        png. Not default or saved alongside `.png` version
                        normally because not as easy to deal with as typical
                        image file.
  -ssn, --sort_on_subgroup_name
                        add this flag to sort the subgroups display in the
                        inner ring based on the subgroup name like in example
                        at https://python-graph-gallery.com/163-donut-plot-
                        with-subgroups/.
  -hll HILOLIST, --hilolist HILOLIST
                        This flag is used to specify that you want to control
                        the order of the subgroups to range from being dark to
                        light in the degree of color intensity in the plot
                        because the default result does not suffice. Follow
                        the flag with an order listing, high intensity to low,
                        of the subgroup identifiers separated by commas,
                        without spaces or quotes. For example `-hll
                        yes,maybe,no`. When the script is run the identifiers
                        and default order used will be indicated so that
                        you'll have the identifiers at hand when running
                        again.
  -ac ADVANCE_COLOR, --advance_color ADVANCE_COLOR
                        **FOR ADVANCED USE.** Allows for advancing the color
                        palette iterator a specified number of times. The idea
                        is it allows skipping a specified amount of the
                        initial colors to help 'customize' the set of colors
                        in the plot, if needed. Supply the number to advance
                        after the flag on the command line. For example, `-ac
                        4`. If that doesn't allow dialing in a good set of
                        colors, and you know Python, you can edit the
                        `list_of_other_good_sequences`
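
For readers curious how a USAGE block like the one above can be produced, here is a minimal argparse sketch that declares the same positional arguments and a few of the same flags. It is built only from the help text shown above and is not the script's actual source; the `build_parser` name and the choice of which flags to include are assumptions made for illustration.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction from the help text; not the script's real code.
    parser = argparse.ArgumentParser(
        prog="donut_plot_with_subgroups_from_dataframe.py",
        description="Make a donut plot with subgroups from a dataframe.")
    parser.add_argument("df_file", metavar="DF_FILE",
                        help="dataframe file ending in .pkl, .tsv, or .csv")
    parser.add_argument("groups", metavar="GROUPS",
                        help="column to use for the outer ring")
    parser.add_argument("subgroups", metavar="SUBGROUPS",
                        help="column to use for the inner ring")
    parser.add_argument("-li", "--large_image", action="store_true")
    parser.add_argument("-ssn", "--sort_on_subgroup_name", action="store_true")
    parser.add_argument("-hll", "--hilolist", metavar="HILOLIST")
    parser.add_argument("-ac", "--advance_color", type=int,
                        metavar="ADVANCE_COLOR")
    return parser

# Parse an example command line without touching sys.argv.
args = build_parser().parse_args(
    ["data.tsv", "group", "subgroup", "-ssn", "-hll", "yes,maybe,no"])
print(args.df_file, args.groups, args.subgroups)
print(args.sort_on_subgroup_name, args.hilolist.split(","))
```

Passing a list to `parse_args` makes the sketch easy to try inside a notebook cell, where there is no real command line to read.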

Use the script by calling it from the command line

A dataframe will be used for input data.

In [3]:

import pandas as pd
obs = [('A', 1, "frizzled"), 
       ('A', 1, "lethargic"), 
       ('A', 1, "polythene"), 
       ('A', 1, "epic"),
       ('A', 2, "frizzled"), 
       ('A', 2, "lethargic"), 
       ('A', 2, "epic"),
       ('A', 3, "frizzled"), 
       ('A', 3, "lethargic"),
       ('A', 3, "polythene"),
       ('A', 3, "epic"),
       ('A', 3, "bedraggled"),
       ('B', 1, "frizzled"), 
       ('B', 1, "lethargic"),
       ('B', 1, "polythene"),
       ('B', 1, "epic"),
       ('B', 1, "bedraggled"),
       ('B', 1, "moombahcored"),
       ('B', 2, "frizzled"), 
       ('B', 2, "lethargic"),
       ('B', 2, "polythene"),
       ('B', 2, "epic"),
       ('B', 2, "bedraggled"),
       ('C', 1, "frizzled"), 
       ('C', 1, "lethargic"),
       ('C', 1, "polythene"),
       ('C', 1, "epic"),
       ('C', 1, "bedraggled"),
       ('C', 1, "moombahcored"),
       ('C', 1, "zoned"),
       ('C', 1, "erstaz"),
       ('C', 1, "mined"),
       ('C', 1, "liberated"),
       ('C', 2, "frizzled"), 
       ('C', 2, "lethargic"),
       ('C', 2, "polythene"),
       ('C', 2, "epic"),
       ('C', 2, "bedraggled"),
       ('C', 3, "frizzled"), 
       ('C', 3, "lethargic"),
       ('C', 3, "polythene"),
       ('C', 3, "epic"),
       ('C', 3, "bedraggled"),
       ('C', 4, "bedraggled"),
       ('C', 4, "frizzled"), 
       ('C', 4, "lethargic"),
       ('C', 4, "polythene"),
       ('C', 4, "epic"),
       ('C', 5, "frizzled"), 
       ('C', 5, "lethargic"),
       ('C', 5, "polythene"),
       ('C', 5, "epic"),
       ('C', 5, "bedraggled"),
       ('C', 5, "moombahcored")]
labels = ['group', 'subgroup', 'sub-subgroup']
df = pd.DataFrame.from_records(obs, columns=labels)
df.head()

Out[3]:

	group	subgroup	sub-subgroup
0	A	1	frizzled
1	A	1	lethargic
2	A	1	polythene
3	A	1	epic
4	A	2	frizzled

Let's save that dataframe as tabular text and also as a pickled dataframe. The former is human readable and the latter is not. (The latter is more efficient for storage, though, if that is an issue.)

First to save as tabular text in tab-separated form. You could change it to be comma-separated, CSV, if you choose.

In [4]:

df.to_csv('data.tsv', sep='\t',index = False)

Now to save the pickled dataframe.

In [5]:

df.to_pickle("data.pkl") 
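
The USAGE block says the file extension tells the script which format to expect. As a rough sketch of that idea, the hypothetical helper `read_dataframe` below picks a pandas reader from the extension and checks that a small dataframe reads back unchanged from all three formats. The helper name and the tiny `small` dataframe are assumptions for illustration; the script's own loading code may differ.

```python
import os
import tempfile
import pandas as pd

def read_dataframe(path):
    # Hypothetical helper: choose the pandas reader from the file extension.
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pkl":
        return pd.read_pickle(path)
    if ext == ".tsv":
        return pd.read_csv(path, sep="\t")
    if ext == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported extension: {ext!r}")

small = pd.DataFrame({"group": ["A", "A", "B"], "subgroup": [1, 2, 1]})
with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name)
             for name in ("demo.tsv", "demo.csv", "demo.pkl")]
    small.to_csv(paths[0], sep="\t", index=False)  # tab-separated text
    small.to_csv(paths[1], index=False)            # comma-separated text
    small.to_pickle(paths[2])                      # pickled dataframe
    # Compare cell values after reading each file back in.
    round_trips = [read_dataframe(p).values.tolist() == small.values.tolist()
                   for p in paths]
print(round_trips)
```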

Now that we have files with input data, we have something we can point the script at for running it.

In addition to providing the data input file name, the text corresponding to the column heading of the groupings and the text corresponding to the column containing the subgroups have to be provided when calling the script.

In [6]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

NOTE:
In the example from The Python Graph Gallery shown below the ordering of the subgroups is different.

plot example

To get ordering like in the example here, the script can be called with the --sort_on_subgroup_name flag.

In [7]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup --sort_on_subgroup_name

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

Note that with the addition of --sort_on_subgroup_name, the ordering of the subgroups matches the example from The Python Graph Gallery shown above.

The --large_image flag can be added to make the saved plot figure larger.

In [8]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup --sort_on_subgroup_name --large_image 

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot_larger.png

The --leave_off_percent_in_group and --leave_off_total_in_group options can be used to control whether the percent or total show up in the plot labels. For example, adding both flags will leave both off:

In [9]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup --leave_off_percent_in_group --leave_off_total_in_group  --sort_on_subgroup_name 

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

The --advance_color option followed by an integer can be added to the call to the script to advance the colors initially used from the sequential color palette generator. This is meant to make it easier to customize the output to a color combination that seems pleasing without needing to edit the code in the script. (If you want to specify your own colors, you can edit the list_of_other_good_sequences and run the script with -ac 4. [Alternatively, you can edit the color_brewer_seq_names and not advance the color generator, or see this notebook.])
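
To make "advancing the color palette iterator" concrete, here is a minimal sketch using itertools. The `palette_names` list and the `advanced_palettes` function are hypothetical stand-ins, and the wrap-around behavior from `cycle` is an assumption; the script's own list and iterator may be organized differently.

```python
from itertools import cycle, islice

# Hypothetical palette names for illustration; the script's own list may differ.
palette_names = ["Blues", "Reds", "Greens", "Purples", "Oranges", "Greys"]

def advanced_palettes(names, advance_by=0):
    # Cycle through the names, skipping the first `advance_by` entries.
    return islice(cycle(names), advance_by, None)

colors = advanced_palettes(palette_names, advance_by=2)
first_three = [next(colors) for _ in range(3)]
print(first_three)  # ['Greens', 'Purples', 'Oranges']
```

With this arrangement each group in the plot would simply take the next palette from the iterator, so skipping ahead changes which palette the first group receives.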

In [10]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup --advance_color 5  --sort_on_subgroup_name 

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

You can change the size of the image file made with the larger setting by adjusting large_img_size under the 'USER ADJUSTABLE VALUES' section in the script code.
Additional customization is possible by simply editing settings under the 'USER ADJUSTABLE VALUES' section in the script code file.

Use the script in Jupyter or IPython by calling the main function

This will demonstrate importing the main function into a Jupyter environment or IPython console.

Note that this gives you a few more options because it exposes more control: you can set whether to include the plot title, adjust the size of the plot after generation, etc.

First, we'll use the files for the dataframe and tabular text saved earlier in the example. After that, we will move on to not using files and instead use Python objects that are in the memory of the notebook.

We will need to import the main function of the script to be active in the running notebook environment. (There is no harm to running it again even if already run in earlier sections.)

In [11]:

from donut_plot_with_subgroups_from_dataframe import donut_plot_with_subgroups_from_dataframe

Now to try using that with the files from earlier in the demonstration.

First, we'll use the tab-separated table.

It is very similar to using the script from the command line. Here, though, we have to specify whether the input is a file or an in-memory dataframe when we call the function. Provide a file name for df_file to use a data file as input. Then specify the column to use for the groups and the column to use for the subgroups, like the following.

In [12]:

donut_plot_with_subgroups_from_dataframe(df_file="data.tsv",groups_col="group",subgroups_col="subgroup", sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

In [13]:

donut_plot_with_subgroups_from_dataframe(df_file="data.pkl",groups_col="group",subgroups_col="subgroup", sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

However, the function can also take an in-memory dataframe directly. Let's next see a demonstration of that.

To be sure a dataframe is in memory, we'll read in one from the file saved earlier.
You may note that this step is redundant if you are running all these cells in order and the dataframe made earlier is still in memory; however, I want to be sure we are all on the same page before the next steps, while emphasizing the switch from using a file as data to using something in the memory of the current notebook.

In [14]:

df = pd.read_pickle("data.pkl")

We can look at the start of that dataframe to verify it is in memory now.

In [15]:

df.head()

Out[15]:

	group	subgroup	sub-subgroup
0	A	1	frizzled
1	A	1	lethargic
2	A	1	polythene
3	A	1	epic
4	A	2	frizzled

In [16]:

x = donut_plot_with_subgroups_from_dataframe(df=df, groups_col="group",subgroups_col ="subgroup",sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

Note that, similar to how the --sort_on_subgroup_name flag was used when calling the script as you would on the command line, sort_on_subgroup_name=True was also added when calling the function above to better reflect the style of the example from The Python Graph Gallery shown above. The default is sort_on_subgroup_name=False if no setting for sort_on_subgroup_name is specified.

Additionally, in a parallel to the use of --leave_off_percent_in_group or --leave_off_total_in_group, the function can be called with include_percent_in_grp_label=False or include_total_in_grp_label=False or both to control if the percent or total or both are shown in the group label. The default is to show them both if no settings are provided for these.
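
As an illustration of what those two switches control, the sketch below builds group labels from raw group values with the percent and the total each optional. The `group_labels` function and its label format are hypothetical; only the two keyword names are borrowed from the function described above, and the group sizes match the demonstration dataframe.

```python
from collections import Counter

def group_labels(groups, include_percent_in_grp_label=True,
                 include_total_in_grp_label=True):
    # Hypothetical label format, used only to illustrate the two switches.
    counts = Counter(groups)
    grand_total = sum(counts.values())
    labels = {}
    for name, n in counts.items():
        parts = [name]
        if include_percent_in_grp_label:
            parts.append(f"{n / grand_total:.1%}")  # share of all rows
        if include_total_in_grp_label:
            parts.append(f"(n={n})")                # rows in this group
        labels[name] = " ".join(parts)
    return labels

# Group sizes in the demonstration dataframe: A has 12 rows, B 11, C 31.
groups = ["A"] * 12 + ["B"] * 11 + ["C"] * 31
print(group_labels(groups))
print(group_labels(groups, include_percent_in_grp_label=False,
                   include_total_in_grp_label=False))
```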

The size of the plot can be set a couple of ways when running the script as a function.

About size: the script is run with fig=plt.figure(figsize=(14, 4)), as set by the 'default' setting plot_figure_size.

plot_figure_size can be adjusted in the script.
Alternatively, and perhaps easier when working in Jupyter or IPython, the resulting plot can be enlarged after the fact with x.figure.set_size_inches((28, 9)). (Note that adjusting plot_figure_size in the current version of the script to give fig=plt.figure(figsize=(28, 9)), restarting the kernel, and running again will also give that.)

The adjusted figure can be saved as shown in the example, too.

In [17]:

import matplotlib.pyplot as plt
#x.figure(figsize=(17, 11)) # Doesn't work
#plt.figure(figsize=(17, 11)) # Doesn't work
x.figure.set_size_inches((20, 14)) #<--see bottom of section at 
# https://nbviewer.jupyter.org/github/fomightez/cl_sq_demo-binder/blob/master/notebooks/Demo%20of%20script%20to%20plot%20nt%20imbalance%20for%20sequence%20span.ipynb#Use-script-in-a-Jupyter-notebook
x.figure.savefig("larger_gen_demo.png")
x.figure

Out[17]:

That way of adjusting size doesn't change the settings for all plots, as can be seen in the next cell. However, it changes the size of x until x is reassigned.
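
The same behavior can be checked with plain Matplotlib, independent of the script. This sketch assumes the object returned by the function behaves like a Matplotlib Axes with a `.figure` attribute, as the use of x.figure above suggests; the names `ax_x` and `ax_y` and the starting size of (7, 8) are chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Stand-in for `x`: an Axes whose parent figure gets resized after creation.
fig_x, ax_x = plt.subplots(figsize=(7, 8))
ax_x.figure.set_size_inches((20, 14))  # affects only this one figure

# Stand-in for `y`: a figure made afterwards keeps the size it was given.
fig_y, ax_y = plt.subplots(figsize=(7, 8))

size_x = tuple(ax_x.figure.get_size_inches())
size_y = tuple(ax_y.figure.get_size_inches())
print(size_x, size_y)
plt.close("all")
```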

In [18]:

y = donut_plot_with_subgroups_from_dataframe(df=df, groups_col="group",subgroups_col ="subgroup",sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

The number of times to advance the color from the starting options can be specified when calling the function by assigning the number of steps to advance to advance_color_increments.

In [19]:

b = donut_plot_with_subgroups_from_dataframe(df=df, groups_col="group",subgroups_col ="subgroup",advance_color_increments=6,sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

If trying different numbers for the advancing increments still fails to help you find a color combination you like, then you can specify your own colors. See this notebook on using custom color lists.
Other options are that you can edit the list_of_other_good_sequences and then run the function with advance_color_increments=4. (Alternatively, you can edit the color_brewer_seq_names and not advance the color generator.) To re-import the updated function after editing, the easiest thing to do is restart the kernel in the notebook.

Using a 'High-Low List'

Often you won't quite see the default settings produce the shading intensity you'd like for each subgroup.

If that is the case you can provide a list of the subgroups specifying the order. Because this is so important, I have made it an entire section. This will demonstrate it from the command line equivalent and then using the script's main function.

First we need a dataframe that will demonstrate the benefit.

In [20]:

import pandas as pd
sales = [('Jones LLC', 177887, 'yes'),
         ('Jones LLC', 12387, 'yes'),
         ('Jones LLC', 1772287, 'yes'),
         ('Jones LLC', 19897, 'no'),
         ('Jones LLC', 1187, 'no'),
         ('Jones LLC', 1773297, 'maybe'),
         ('Alpha Co', 157987, 'yes'),
         ('Alpha Co', 158981, 'yes'),
         ('Alpha Co', 159983, 'yes'),
         ('Alpha Co', 167987, 'yes'),
         ('Alpha Co', 158117, 'yes'),
         ('Alpha Co', 1999917, 'maybe'),
         ('Alpha Co', 193917, 'maybe'),
         ('Alpha Co', 1933917, 'maybe'),
         ('Alpha Co', 159333, 'no'),
         ('Alpha Co', 256521, 'no'),
         ('Blue Inc', 111947, 'no')]
labels = ['Manufacturer', 'Item', 'In_Stock']
dfh = pd.DataFrame.from_records(sales, columns=labels)
dfh.head()

Out[20]:

	Manufacturer	Item	In_Stock
0	Jones LLC	177887	yes
1	Jones LLC	12387	yes
2	Jones LLC	1772287	yes
3	Jones LLC	19897	no
4	Jones LLC	1187	no

In [21]:

dfh.to_csv('datahl.tsv', sep='\t',index = False)

Now to run that without a High-Low list.

In [22]:

%run donut_plot_with_subgroups_from_dataframe.py datahl.tsv Manufacturer In_Stock

Note: No list to specify high to low intensity coloring provided, and so using 'yes,no,maybe',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

Note that just because of the way the subgroup categories occurred in the defining of the dataframe, 'maybe' is last in the intensity coloring, although one would probably want 'no' to be the least intensely shaded.
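
The default order reported in the note, 'yes,no,maybe', is consistent with taking the subgroup values in order of first appearance in the In_Stock column. The sketch below reproduces that order with plain Python; it is an illustration of the idea only, since the script's internal logic is not shown here.

```python
# In_Stock values in the row order used to build the dataframe above.
in_stock = ["yes", "yes", "yes", "no", "no", "maybe",
            "yes", "yes", "yes", "yes", "yes", "maybe", "maybe", "maybe",
            "no", "no", "no"]

# dict.fromkeys keeps only the first occurrence of each value, in order.
order_of_first_appearance = list(dict.fromkeys(in_stock))
print(order_of_first_appearance)  # ['yes', 'no', 'maybe']
```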

This can be fixed with the addition of a 'high-low' list.

In [23]:

%run donut_plot_with_subgroups_from_dataframe.py datahl.tsv Manufacturer In_Stock --hilolist yes,maybe,no

Plot image saved to: donut_plot.png

Note that now the 'no' subgroup has the lightest shading in all the groups.
It is important not to place spaces after the commas in the list you provide on the command line.
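
To see why spaces matter, the sketch below uses the standard library's shlex to split a command line the way a POSIX shell would. Without spaces the whole list arrives as one argument; with spaces it is broken into several arguments, so only 'yes,' would be attached to the option. Splitting on commas afterwards is an assumption about how such a list would typically be parsed, not a quote of the script's code.

```python
import shlex

good = shlex.split("--hilolist yes,maybe,no")
bad = shlex.split("--hilolist yes, maybe, no")
print(good)  # ['--hilolist', 'yes,maybe,no']
print(bad)   # ['--hilolist', 'yes,', 'maybe,', 'no']

# One intact argument splits cleanly into the intended identifiers.
hilolist = good[1].split(",")
print(hilolist)  # ['yes', 'maybe', 'no']
```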

To do the same with the main function of the script, you provide a Python list as hilolist:

In [24]:

from donut_plot_with_subgroups_from_dataframe import donut_plot_with_subgroups_from_dataframe
h = donut_plot_with_subgroups_from_dataframe(df=dfh, groups_col="Manufacturer",subgroups_col="In_Stock",hilolist=["yes", "maybe", "no"]);

Plot figure object returned.

Note that the 'no' subgroup is the lightest in all the groups with the 'hilolist' provided.
Feel free to remove the 'hilolist' assignment and call the main function again to verify it reverts.

Changing the plot title

You'll probably want to make the plot title better reflect your data or delete it. Because sending text from the command line is fraught with issues, I am going to show you ways to specifically edit the script so you can use it from the command line. This will be covered first.

The situation is easier if you are importing and calling the main function of the script. That will be covered second.

Using the script at the command line or equivalent

I'll provide two ways to do this: the first for advanced command line users, and the second for those concerned about editing a complex command that has the potential to erase a file.
If you are comfortable with using complex commands on the command line, you could edit the text NEW TITLE GOES HERE in the following line to change the title.

In [25]:

!sed -i 's/BREAKDOWN/NEW TITLE GOES HERE/g' donut_plot_with_subgroups_from_dataframe.py

The change can be undone with the following:

In [26]:

!sed -i 's/NEW TITLE GOES HERE/BREAKDOWN/g' donut_plot_with_subgroups_from_dataframe.py

If you aren't comfortable with editing that complex command, the following makes a function that can be called with the new title.

In [27]:

script_name = "donut_plot_with_subgroups_from_dataframe.py"
def change_original_title(s):
    '''
    Change the plot title to the provided text.
    '''
    with open(script_name, 'r') as thefile:
        script=thefile.read()
    script = script.replace('BREAKDOWN', s)
    with open(script_name, 'w') as output_file:
        output_file.write(script)

Now to use it. Call the function, placing the new title between the quotes, like so:

In [28]:

change_original_title("NEW TITLE GOES HERE")

You could use the command in the form change_original_title('NEW TITLE GOES HERE') if you needed to provide double-quotes as part of the title.

To verify that worked:

In [29]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup -ssn

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

You can provide a space between the quotes or nothing to make no title display.

The problem with this approach is that you can only use those solutions once as they are written. And the restore command I provided above only works when changing to NEW TITLE GOES HERE, which is probably not what you want to use. You may be better off editing the script directly, or restoring the line plot_title = "BREAKDOWN" under 'USER ADJUSTABLE VALUES' in the script, if you have to change it more than once from the command line.
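
One way around the use-once limitation is to rewrite whatever currently sits between the quotes on the plot_title line, rather than searching for the word BREAKDOWN. The sketch below does that with a regular expression. It assumes the script holds a single line of the form plot_title = "..." as described above; the function name `set_plot_title` is hypothetical, and the demonstration runs on a throwaway file rather than the real script. A title containing a double quote would need extra handling.

```python
import os
import re
import tempfile

def set_plot_title(script_path, new_title):
    # Assumes one line of the form: plot_title = "..." exists in the script.
    with open(script_path, "r") as handle:
        script = handle.read()
    pattern = r'^([ \t]*plot_title[ \t]*=[ \t]*)"[^"\n]*"'
    # A function is used for the replacement so backslashes in the title
    # are inserted literally instead of being read as regex escapes.
    updated, n_changes = re.subn(
        pattern, lambda m: m.group(1) + '"' + new_title + '"',
        script, count=1, flags=re.MULTILINE)
    if n_changes == 0:
        raise ValueError("no plot_title assignment found")
    with open(script_path, "w") as handle:
        handle.write(updated)

# Try it twice in a row on a stand-in file to show it is repeatable.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_script.py")
    with open(demo_path, "w") as handle:
        handle.write('plot_title = "BREAKDOWN"\nother_setting = 1\n')
    set_plot_title(demo_path, "First title")
    set_plot_title(demo_path, "Second title")
    with open(demo_path) as handle:
        result = handle.read()
print(result)
```

Because the pattern matches the assignment itself, restoring the default is just another call with "BREAKDOWN" as the title.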

Calling the main function of the script

More flexibility is available when importing the main function of the script. Here are a couple of examples:

In [30]:

from donut_plot_with_subgroups_from_dataframe import donut_plot_with_subgroups_from_dataframe
m = donut_plot_with_subgroups_from_dataframe(df=df, groups_col="group",subgroups_col ="subgroup",plot_title = "Changed From Function once",sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

Note that this doesn't change the default. We can run the script, and we'll see the original default still.

In [31]:

%run donut_plot_with_subgroups_from_dataframe.py data.tsv group subgroup --sort_on_subgroup_name

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Look into adding use of the `--hilolist` option to specify the order.


Plot image saved to: donut_plot.png

And calling the function with an assignment for plot_title has the advantage that we can keep changing the title now.

In [32]:

donut_plot_with_subgroups_from_dataframe(df=df, groups_col="group",subgroups_col ="subgroup",plot_title = "Changed From Function again",sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

To remove the title, call the function with include_title=False. What plot_title is assigned doesn't matter if include_title is set to False.

In [33]:

donut_plot_with_subgroups_from_dataframe(df=df, groups_col="group",subgroups_col ="subgroup",include_title=False,sort_on_subgroup_name=True);

Note: No list to specify high to low intensity coloring provided, and so using '1,2,3,4,5',
where leftmost identifer corresponds to most intense and rightmost is least.
Provide a Python list as `hilolist` when calling the function to specify the order of intensity.

Plot figure object returned.

Alternative: Data table form

In the case of the examples seen here, the groupings and subgroups are not so numerous that the information would be too hard to track in text form. And so I'll point out that the alternative could just be to show this in text form in a dataframe.

I have a script available here that makes a nice summary from the same type of data as the donut plots use, and I'll demonstrate it in this section, again using the data from the section 'Using a 'High-Low List'' above.

(As always, the dataframe that is made isn't as nicely rendered statically via Github [unlike the donut plots], but is via nbviewer.org.)

In [34]:

import os
file_needed = "df_subgroups_states2summary_df.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/text_mining/master/df_subgroups_states2summary_df.py
%run df_subgroups_states2summary_df.py datahl.tsv Manufacturer In_Stock --bracket_counts
bc = pd.read_pickle("summary_datahl.pkl")
bc

Summary dataframe saved as a text table easily opened in
different software; file named: `summary_datahl.tsv`. This version meant for presenation only.

Summary dataframe saved in pickled form for ease of use within
Python; file named: `summary_datahl.pkl`. This version meant for
presentation only.


**Also saving data table as forms easier to handle for subsequent steps:**
Summary dataframe saved as a text table easily opened in
different software; file named: `summary_basic_datahl.tsv`

Summary dataframe saved in pickled form for ease of use within
Python; file named: `summary_basic_datahl.pkl`. This will retain the column headers/names formatting best.

Out[34]:

	[n]	yes	maybe	no
ALL	17.0	47.06% [8]	29.41% [5]	23.53% [4]
Alpha Co	10.0	50.00% [5]	30.00% [3]	20.00% [2]
Blue Inc	1.0	0.00% [0]	0.00% [0]	100.00% [1]
Jones LLC	6.0	50.00% [3]	33.33% [2]	16.67% [1]

(Note there are many more alternatives this script can produce depending on the arguments when the script is called or the main function used; see here for a link to a demo of that script for more about that.)

That data table is an alternative/complement to the donut plot produced above, and to make comparison easier I'll reshow that plot in the cell below:

In [35]:

h.figure

Out[35]:

If you'd like to better understand how the underlying code for this script works, see the next one in this series.
If you'd like some options for additional scripts/functions that generate plots that feature summary plots in addition to a plot with subgroups, check out each of the following two notebooks:
