R / Bioconductor 2 / GEO Database: GSE46268 vitamin D

Biochemistry laboratories - 201

R/Bioconductor exercises files

HTML Hand-outs are at:

R packages

This module requires R packages from both

  • CRAN and
  • Bioconductor

CRAN Packages

Installed with



First source the helper script biocLite.R, then install the packages. (Bioconductor 3.8 and later use the BiocManager package instead of biocLite.R.)


NOTE: If you were not here for the previous Bioconductor session(s), you may have to add the base Bioconductor packages with:
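
A minimal sketch of the installation steps, assuming the current BiocManager installer rather than the older biocLite.R helper, and assuming that knitr (CRAN) and GEOquery (Bioconductor) are the packages needed, since those are the ones this handout uses. The helper `missing_packages()` is a hypothetical name introduced here for illustration:

```r
# Hypothetical helper (not part of the original handout): return the subset of
# `pkgs` that cannot be loaded, i.e. that still need to be installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Assumed package lists: knitr from CRAN, GEOquery from Bioconductor.
cran_needed <- missing_packages(c("knitr"))
bioc_needed <- missing_packages(c("GEOquery"))

# Install only from an interactive session, so knitting never triggers a download.
if (interactive()) {
  if (length(cran_needed) > 0) install.packages(cran_needed)
  if (length(bioc_needed) > 0) {
    # BiocManager is the CRAN package that replaced the biocLite.R helper.
    if (length(missing_packages("BiocManager")) > 0) install.packages("BiocManager")
    BiocManager::install(bioc_needed)
  }
}
```

Checking first and installing only what is missing keeps the step quick to re-run at the start of each session.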



knitr is the package that helps document computational research.

To make recalculations faster, we can enable caching by inserting the following code in the .Rmd R Markdown document:

    ```{r global_options_settings, include=TRUE, echo=FALSE}
    # Global options: hide warnings and messages, drop the output prefix, cache results
    knitr::opts_chunk$set(warning=FALSE, message=FALSE, comment="", cache=TRUE)
    ```

Affymetrix arrays

Data are microarray data from Affymetrix U133 GeneChips.

Source: http://www.oceanridgebio.com/images/system_rev_630.jpg

GEO database

The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community.

GEO database: data

Data type | Description
--- | ---
GEO Platform (GPL) | These files describe a particular type of microarray. They are annotation files.
GEO Sample (GSM) | Files that contain all the data from the use of a single chip. For each gene there will be multiple scores, including the main one, held in the VALUE column.
GEO Series (GSE) | Lists of GSM files that together form a single experiment.
GEO Dataset (GDS) | Curated files that hold a summarized combination of a GSE file and its GSM files. They contain normalized expression levels for each gene from each sample (i.e. just the VALUE field from the GSM file).

Today: we'll use a GSE entry that contains multiple samples (each would be a GSM)

GEO database: Formats

Format name | Format
--- | ---
SOFT | Simple Omnibus Format in Text.
MINiML | MIAME Notation in Markup Language (XML format).
Matrix | Spreadsheet containing the final, normalized values that are comparable across rows and samples.

Today's dataset: GSE46268

Paper: All-Trans Retinoic Acid-Triggered Antimicrobial Activity against Mycobacterium tuberculosis Is Dependent on NPC2.
Matthew Wheelwright, Elliot W. Kim, Megan S. Inkeles, Avelino De Leon, Matteo Pellegrini, Stephan R. Krutzik and Philip T. Liu.
J Immunol 2014; 192:2280-2290. Prepublished online 5 February 2014. doi: 10.4049/jimmunol.1301686
http://www.jimmunol.org/content/192/5/2280

Web link:


Acquire in R:

library(GEOquery)  # provides getGEO()
gset <- getGEO("GSE46268", GSEMatrix = TRUE)


We'll explore the GEO2R script created on the web site and add a few more plots.


My challenge....

if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

My challenge was: why do we write gset <- gset[[idx]]?

This is why we start with a mini exercise on lists: getGEO() returns a list, and the line becomes gset <- gset[[1]] when there is only one dataset.

# Create list L
L <- list(vn = c(2, 3, 5), vc = c("sun", "moons"))

# Print list L
L
#   $vn: 2 3 5
#   $vc: "sun" "moons"

# Class of list L
class(L)       # "list"

# Single brackets [1] return a list holding only the first element
L[1]
#   $vn: 2 3 5

# Double brackets [[1]] return the first element itself, here a numeric vector
L[[1]]
#   2 3 5
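
The same single-bracket versus double-bracket distinction answers the challenge. As a minimal sketch, the list below stands in for what getGEO() returns; the accession, the element names and the placeholder strings are made up for illustration, and a real result would hold ExpressionSet objects instead:

```r
# Stand-in for the object returned by getGEO(): a named list with one element
# per platform. Names and values are hypothetical placeholders.
gset_mock <- list(
  "GSE00000-GPL570_series_matrix.txt.gz" = "data for GPL570",
  "GSE00000-GPL96_series_matrix.txt.gz"  = "data for GPL96"
)

# Same selection logic as the GEO2R script: pick the GPL570 element if there
# are several, otherwise take the only one.
idx <- if (length(gset_mock) > 1) grep("GPL570", names(gset_mock)) else 1

class(gset_mock[idx])    # "list": still a list, of length one
class(gset_mock[[idx]])  # "character": the element itself
```

With single brackets the result is still a list, so functions that expect the dataset itself would not work on it; double brackets extract the dataset, which is why the script overwrites gset with gset[[idx]].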

