This document will walk through the steps for creating various plots of the flow cytometry data.
Required Files:
SIG_WNV_Flow_Data_Plotting.ipynb: [Download here]
flow_data_plotting_functions.r: [Download here]
Note: this notebook can also be downloaded as an R script (only the code blocks seen below will be included): [Download R script here]
Required R packages:
All code is available on GitHub: https://github.com/biodev/SIG
The accompanying R script (flow_data_plotting_functions.r) contains the functions needed for aggregating and then plotting the flow cytometry data:
flow_boxplot_data()
flow_boxplots()
flow_multiline_plot_data()
flow_multiline_plots()
flow_heatmap_data()
flow_heatmap_plot()
More information on each of these functions is available by calling the describe() function. For example, the following command will print documentation for the flow_boxplot_data() function:
describe(flow_boxplot_data)
Remember that, in addition to the help documentation provided with describe(), you can view the actual function definitions at any time by simply typing the function name without parentheses (e.g. describe) at the command prompt.
## Load functions for plotting the flow cytometry data
source('./scripts/flow_data_plotting_functions.r')
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
Attaching package: ‘gdata’
The following object is masked from ‘package:stats’:
    nobs
The following object is masked from ‘package:utils’:
    object.size
## Note: you may have to change the file paths
data_dir = '/Users/mooneymi/Documents/SIG/WNV/Cleaned_Data_Releases/23-Mar-2016/'
## Load data from an Excel spreadsheet (Warning: this can take a few minutes)
flow_full = read.xls(file.path(data_dir, 'Lund_Flow_Full_21-Mar-2016_final.xlsx'),
header=T, as.is=T, na.strings=c(""," ", "NA", "#DIV/0!"))
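The na.strings argument in the call above lists the cell contents that should be read as missing values, including the Excel error string "#DIV/0!". The following is a minimal sketch of that behavior, using base R's read.csv() on a small made-up table as a stand-in for read.xls() (this assumes read.xls() hands na.strings on to the same underlying reader; the column names and values are hypothetical):

```r
## Toy CSV text (hypothetical values) with three kinds of "missing" cells
csv_text = "id,pct\n1,12.5\n2,#DIV/0!\n3,NA\n4,\n"

## Same na.strings as in the read.xls() call above
toy = read.csv(text=csv_text, header=TRUE, as.is=TRUE,
               na.strings=c("", " ", "NA", "#DIV/0!"))

## The 'pct' column stays numeric because the error string became NA
str(toy)
sum(is.na(toy$pct))
```

Without "#DIV/0!" in na.strings, a column containing that string would be read as character rather than numeric, which would likely interfere with plotting.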
describe(flow_boxplot_data)
This function aggregates the data needed for creating boxplots of the flow cytometry data.
Parameters:
    flow_df: The dataframe containing the flow data.
    lines: A numeric vector containing the mouse lines that should be plotted.
    tissue: A string indicating the tissue (e.g. 'brain' or 'spleen').
    flow_vars: A string indicating the flow variable to be plotted.
    tp: A character vector containing the time points to be included.
    line_colors: A vector of colors (default=NULL; colors will be determined automatically).
    mocks_only: Logical value indicating whether mocks only will be plotted.
    data_type: A number indicating whether percentages (1) or absolute cell counts (2) will be plotted.
Returns:
    A list containing the data to be plotted; input for the flow_boxplots() function.
## Aggregate the data for boxplots
boxplot_data = flow_boxplot_data(flow_full, c(7,8,9), 'brain', 'treg_T_regs', c('7','12','21','28'))
## Create a list of additional options for the boxplot
opts = list(rm_outliers=F, show_data=F, y_min=0, y_max=60)
## Create the boxplot (the 'cex' parameter controls the size of the x-axis text)
bp = flow_boxplots(c(boxplot_data, opts), cex=0.7)
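In the call above, c(boxplot_data, opts) joins the aggregated data and the options into a single named list before it is passed to flow_boxplots(). A minimal sketch of how c() combines two lists, using made-up stand-ins (the element names in toy_data are hypothetical and are not the actual contents of boxplot_data):

```r
## Hypothetical stand-in for the list returned by flow_boxplot_data()
toy_data = list(flow_var='treg_T_regs', tissue='brain')

## Options list, as in the example above
toy_opts = list(rm_outliers=FALSE, show_data=FALSE, y_min=0, y_max=60)

## c() concatenates the two lists into one flat named list
combined = c(toy_data, toy_opts)
names(combined)
```

Because the result is one flat list, any option name that duplicates an element name in the data list would appear twice, so it is probably safest to keep the option names distinct.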
The plotting functions used above have been adapted for a Shiny app: http://church.ohsu.edu:3838/mooneymi/wnv_flow_boxplots/
describe(flow_multiline_plot_data)
This function aggregates the data needed for creating time-series plots of the flow cytometry data.
Parameters:
    flow_df: The dataframe containing the flow data.
    uw_lines: A numeric vector containing the mouse lines to be plotted.
    tissue: A string indicating the tissue (e.g. 'brain' or 'spleen').
    flow_vars: A character vector containing the variables to be plotted.
    plot_type: A number indicating whether the plot will compare lines (1) or compare variables (2).
    FUN: A function that can be applied to transform the data (default=NULL).
Returns:
    A list containing the data to be plotted; input for the flow_multiline_plots() function.
lineplot_data = flow_multiline_plot_data(flow_full, c(30,8,36,38), 'brain', 'treg_T_regs_count', 1)
## Create a list of additional options for the lineplot
## data_type values: 1 = percentages, 2 = cell counts, 3 = percent ratio, 4 = count ratio
opts2 = list(data_type=2, y_min=NA, y_max=NA)
## Create a lineplot that compares a single variable across multiple lines
lp = flow_multiline_plots(c(lineplot_data, opts2))
lineplot_data2 = flow_multiline_plot_data(flow_full, c(9), 'brain', c('treg_T_regs', 'tcell_d7_CD8'), 2)
## Create a list of additional options for the lineplot
## data_type values: 1 = percentages, 2 = cell counts, 3 = percent ratio, 4 = count ratio
opts3 = list(data_type=1, y_min=0, y_max=50)
## Create a lineplot that compares multiple variables for a single line
lp2 = flow_multiline_plots(c(lineplot_data2, opts3))
The plotting functions used above have been adapted for a Shiny app: http://church.ohsu.edu:3838/mooneymi/wnv_flow_lineplots/
These heatmaps are annotated with weight loss, clinical scores, and heritability estimates. These data must be loaded before calling the functions for aggregating the data and plotting. If you want to exclude these annotations, you can skip the next code block and supply the 'annotations=FALSE' option to the plotting functions (an example is below).
## Load weight, clinical score, and heritability data from the latest data release
## Note: you may have to change the file paths
weights = read.xls(file.path(data_dir, 'Lund_Weight_22-Mar-2016_final.xlsx'),
header=T, as.is=T, na.strings=c(""," ", "NA", "#DIV/0!"))
scores = read.xls(file.path(data_dir, 'Lund_Scores_22-Mar-2016_final.xlsx'),
header=T, as.is=T, na.strings=c(""," ", "NA", "#DIV/0!"))
heritability = read.xls(file.path(data_dir, 'Lund_Flow_Heritability_21-Mar-2016_final.xlsx'),
header=T, as.is=T, na.strings=c(""," ", "NA", "#DIV/0!"))
## Set the rownames of the heritability dataframe
rownames(heritability) = heritability$variable
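Setting the row names to the variable names makes it possible to look up a row by flow variable name, which is presumably how the heatmap functions retrieve the heritability estimates (an assumption; the function source is not shown here). A minimal sketch with a made-up dataframe (the 'h2' column name and the values are hypothetical):

```r
## Hypothetical heritability table; only the 'variable' column mirrors the real data
toy_herit = data.frame(variable=c('treg_T_regs', 'tcell_d7_CD8'),
                       h2=c(0.4, 0.6),
                       stringsAsFactors=FALSE)

## Same step as above: use the variable names as row names
rownames(toy_herit) = toy_herit$variable

## Rows can now be selected by flow variable name
toy_herit['tcell_d7_CD8', 'h2']
```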
describe(flow_heatmap_data)
This function aggregates the data needed for creating heatmaps of the flow cytometry data.
Parameters:
    flow_df: The dataframe containing the flow data.
    lines: A numeric vector containing the lines to be plotted.
    line_labels: Alternate labels for the lines.
    tissue: A string indicating the tissue (e.g. 'brain' or 'spleen').
    flow_vars: A character vector containing the variables to be plotted.
    var_labels: Alternate labels for the flow variables.
    tp: A character vector containing the timepoints to be plotted.
    herit_df: The dataframe containing the heritability estimates (default=NULL).
    line_colors: A vector of color values (default=NULL; colors will be determined automatically).
    mocks_only: A logical indicating whether mocks only should be plotted.
    collapse_mocks: A logical indicating whether to combine the two mock timepoints.
    no_cluster: A logical indicating whether variables should be clustered.
    cluster_all: A logical indicating whether the variables should be clustered using all the data.
    annotations: A logical indicating whether annotations should be added to the heatmap (default=TRUE).
Returns:
    A list containing the data to be plotted; input for the flow_heatmap_plot() function.
describe(flow_heatmap_plot)
This function plots the data returned by flow_heatmap_data().
Parameters:
    hm_data: This should be a list returned by flow_heatmap_data().
    weights_df: The dataframe containing the weight loss data (default=NULL).
    clinical_df: The dataframe containing the clinical score data (default=NULL).
    weight_cols: A character vector containing the columns names of the weight measurements (default=weight_percents).
    cs_cols: A character vector containing the columns names of the clinical scores (default=cs_columns).
    annotations: A logical indicating whether annotations should be added to the heatmap (default=TRUE).
Returns:
    NULL
Examples:
    flow_heatmap_plot(heatmap_data, weights, scores)
## Heatmap with custom labels, mocks collapsed, and no heritability annotations
heatmap_data = flow_heatmap_data(flow_full, lines=c(11,12,14,30,8,36,38),
line_labels=c('CC(017x004)F1','CC(011x042)F1','CC(032x017)F1','CC(032x013)F1','CC(005x001)F1','CC(061x026)F1','CC(016x038)F1'),
tissue='brain',
flow_vars=c('treg_T_regs', 'tcell_d7_CD3', 'tcell_d7_CD4', 'tcell_d7_CD8'),
var_labels=c('Tregs', 'CD3+ Tcell', 'CD4+ Tcell', 'CD8+ Tcell'),
tp=c('7','12','21','28'), collapse_mocks=T)
## Create the heatmap
hm = flow_heatmap_plot(heatmap_data, weights, scores, collapse_mocks=T, annotations=T)
## The heatmap without any annotations
heatmap_data2 = flow_heatmap_data(flow_full, lines=c(7,8,9), tissue='brain',
flow_vars=c('treg_T_regs', 'tcell_d7_CD3', 'tcell_d7_CD4', 'tcell_d7_CD8'),
tp=c('7','12','21','28'), annotations=F)
hm2 = flow_heatmap_plot(heatmap_data2, annotations=F)
The plotting functions used above have been adapted for a Shiny app: http://church.ohsu.edu:3838/mooneymi/wnv_flow_heatmaps/