R / Bioconductor 2 / GEO Database: GSE46268 vitamin D

Biochemistry laboratories - 201

Jean-Yves Sgro

Find this document here (short URL) today: http://go.wisc.edu/km26g1

Finding R or RStudio on the iMac

  • Click on the "Finder" icon at the left end of the "Dock" at the bottom of the screen.
  • This will open a new window.
  • On the left, click on "Applications".
  • In the alphabetical list, choose "R" (or "R.app") to use R, or "RStudio" (or "RStudio.app") to use RStudio.
  • Double-click to open the program.

R/Bioconductor exercise files

HTML hand-outs are at:

Note: other formats and other tutorials are at https://biochem.wisc.edu/bcrf/tutorials

R packages

This module requires R packages from both

  • CRAN and
  • Bioconductor

CRAN Packages

Install with:

install.packages("knitr")
install.packages("rmarkdown")

Bioconductor

First load the helper script biocLite.R, then install the packages:

source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
biocLite("limma")
biocLite("Biobase")
biocLite("affy")

NOTE: If you did not attend the previous Bioconductor session(s), you may have to add the base Bioconductor packages with:

source("http://bioconductor.org/biocLite.R")
biocLite()
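The biocLite.R script used above has since been retired: from Bioconductor 3.8 onward, packages are installed with the BiocManager package from CRAN, and BiocManager::install() with no arguments plays the role of biocLite(). A minimal sketch, assuming a current R installation; missing_pkgs() is a hypothetical helper written for this example, and installation is only attempted in an interactive session so the script can be sourced safely:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

bioc_pkgs  <- c("GEOquery", "limma", "Biobase", "affy")
to_install <- missing_pkgs(bioc_pkgs)

# Install only what is missing, and only when a user is at the console
if (interactive() && length(to_install) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")   # BiocManager itself comes from CRAN
  BiocManager::install(to_install)
}
```

Checking first with requireNamespace() avoids reinstalling packages that are already present.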

knitr

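knitr is the package that helps document computational research: it runs the R code embedded in an R Markdown document and inserts the results.

To make recalculations faster, we can enable "caching" by inserting the following code chunk within the .Rmd R Markdown document. With cache=TRUE, a chunk is re-run only when its code changes; otherwise the saved results are reused: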

    ```{r global_options_settings, include=TRUE, echo=FALSE}
    # Global options:
    knitr::opts_chunk$set(warning=FALSE, message=FALSE, comment="", cache=TRUE)
    ```

Affymetrix arrays

Data are microarray data from Affymetrix U133 GeneChips.

Source: http://www.oceanridgebio.com/images/system_rev_630.jpg

GEO database

The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community.

GEO database: data

  • GEO Platform (GPL): these files describe a particular type of microarray. They are annotation files.
  • GEO Sample (GSM): files that contain all the data from the use of a single chip. For each gene there will be multiple scores, including the main one, held in the VALUE column.
  • GEO Series (GSE): lists of GSM files that together form a single experiment.
  • GEO Dataset (GDS): curated files that hold a summarized combination of a GSE file and its GSM files. They contain normalized expression levels for each gene from each sample (i.e. just the VALUE field from the GSM file).
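Each record type is identified by the three-letter prefix of its accession (for example, GSE46268 is a Series and GPL570 is a Platform), so the record types above can be turned into a small lookup. A sketch in base R; geo_record_type() is a hypothetical helper, not part of GEOquery:

```r
# Hypothetical helper (not part of GEOquery): map GEO accessions to record types
# using their three-letter prefix (GPL, GSM, GSE or GDS).
geo_record_type <- function(acc) {
  types  <- c(GPL = "Platform", GSM = "Sample", GSE = "Series", GDS = "Dataset")
  prefix <- toupper(substr(acc, 1, 3))
  out    <- unname(types[prefix])
  out[is.na(out)] <- "Unknown"   # prefix not in the lookup table
  out
}

geo_record_type(c("GSE46268", "GPL570", "ABC1"))
# returns c("Series", "Platform", "Unknown")
```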

Today we'll use a GSE entry that contains multiple samples (each one a GSM).

GEO database: Formats

  • SOFT: Simple Omnibus Format in Text.
  • MINiML: MIAME Notation in Markup Language (XML format).
  • Matrix: spreadsheet containing the final, normalized values that are comparable across rows and Samples.
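A Series Matrix file is plain tab-delimited text: header lines starting with "!Series_" and "!Sample_", followed by the expression table enclosed between a !series_matrix_table_begin line and a !series_matrix_table_end line. getGEO() parses these files for you; the sketch below only illustrates the layout, using a made-up toy text (the sample names and values are invented) and a hypothetical base R helper, read_matrix_block():

```r
# Made-up toy text imitating the layout of a GEO Series Matrix file
toy <- c(
  '!Series_title\t"toy example"',
  '!Sample_geo_accession\t"GSM_A"\t"GSM_B"',
  '!series_matrix_table_begin',
  '"ID_REF"\t"GSM_A"\t"GSM_B"',
  '"1007_s_at"\t8.1\t7.9',
  '"1053_at"\t6.2\t6.5',
  '!series_matrix_table_end'
)

# Hypothetical helper: keep only the lines between the begin/end markers
# and read them as a tab-delimited table with probe IDs as row names.
read_matrix_block <- function(lines) {
  start <- grep("^!series_matrix_table_begin", lines)
  end   <- grep("^!series_matrix_table_end", lines)
  block <- lines[(start + 1):(end - 1)]
  tab   <- read.delim(text = block, row.names = 1, check.names = FALSE)
  as.matrix(tab)
}

m <- read_matrix_block(toy)
m   # 2 probes (rows) x 2 samples (columns)
```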

Today's dataset: GSE46268

Paper: Matthew Wheelwright, Elliot W. Kim, Megan S. Inkeles, Avelino De Leon, Matteo Pellegrini, Stephan R. Krutzik and Philip T. Liu. "All-Trans Retinoic Acid-Triggered Antimicrobial Activity against Mycobacterium tuberculosis Is Dependent on NPC2." J Immunol 2014; 192:2280-2290. Prepublished online 5 February 2014. doi: 10.4049/jimmunol.1301686. http://www.jimmunol.org/content/192/5/2280

Web link:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46268

Acquire in R:

library(GEOquery)
gset <- getGEO("GSE46268", GSEMatrix = TRUE)

GEO2R

We'll explore the GEO2R script created on the web site and add a few more plots.

http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE46268

My challenge

if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

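My challenge was: why do we write gset <- gset[[idx]]?

getGEO() returns a list of ExpressionSet objects, one per platform (GPL) used in the series. When there is only one dataset, the line above becomes gset <- gset[[1]]. This is why we start with a mini exercise on lists.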

# Create list L
L <- list(vn = c(2, 3, 5), vc = c("sun", "moons"))
# Print list L
L
class(L)

Output:

$vn
[1] 2 3 5

$vc
[1] "sun"   "moons"

[1] "list"

# Single brackets: L[1] is a list containing only the first element
L[1]
class(L[1])
# Double brackets: L[[1]] is the first element itself (a numeric vector)
class(L[[1]])

Output:

$vn
[1] 2 3 5

[1] "list"
[1] "numeric"
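To connect the list exercise back to gset, here is a sketch that uses a mock list in place of the real getGEO() result. Assumptions: the element names imitate the series-matrix file names used when a series spans several platforms, "GSE00000" is a placeholder accession, and the elements are plain strings rather than ExpressionSet objects; select_platform() is a hypothetical wrapper around the two GEO2R lines shown earlier:

```r
# Mock stand-in for the list returned by getGEO(..., GSEMatrix = TRUE).
# Assumption: names imitate "<series>-<platform>_series_matrix.txt.gz";
# "GSE00000" is a placeholder and the elements are plain strings.
mock_gset <- list(
  "GSE00000-GPL96_series_matrix.txt.gz"  = "data for platform GPL96",
  "GSE00000-GPL570_series_matrix.txt.gz" = "data for platform GPL570"
)

# Hypothetical wrapper around the two GEO2R lines
select_platform <- function(gset, platform = "GPL570") {
  if (length(gset) > 1) idx <- grep(platform, attr(gset, "names")) else idx <- 1
  gset[[idx]]   # [[ ]] returns the element itself, not a one-element list
}

class(mock_gset[2])     # "list": single brackets keep the list wrapper
class(mock_gset[[2]])   # "character": double brackets extract the element
select_platform(mock_gset)
select_platform(mock_gset["GSE00000-GPL570_series_matrix.txt.gz"])  # length 1, so idx <- 1
```

With the real data, the same [[ ]] step is what turns the list returned by getGEO() into a single ExpressionSet that the rest of the GEO2R script can work on.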

Get the MSWord files:

After class:

  • fill in the one-page "evaluation" form for this class
  • a link to the MSWord (.docx) file is provided at the end of the one-page survey

The evaluation is anonymous. Click or type the short URL: http://go.wisc.edu/61c0pc

Note: the survey is unlocked only when workshops are held.