In [2]:
height <- c(58,59,60,61,62,63,64,65,66, 67, 68, 69, 70, 71, 72)
weight <- c(115,117,120,123,126,129,132,135,139, 142, 146, 150, 154, 159, 164)

In [3]:
htwtmatrix = matrix(c(height,weight), nrow = 15, ncol = 2) # 15 rows (one per student), 2 columns (height, weight)

In [4]:
print(htwtmatrix)
dim(htwtmatrix)
is.array(htwtmatrix)
# returns TRUE even though htwtmatrix is a matrix, because a matrix is a two-dimensional array

      [,1] [,2]
[1,]   58  115
[2,]   59  117
[3,]   60  120
[4,]   61  123
[5,]   62  126
[6,]   63  129
[7,]   64  132
[8,]   65  135
[9,]   66  139
[10,]   67  142
[11,]   68  146
[12,]   69  150
[13,]   70  154
[14,]   71  159
[15,]   72  164

1. 15
2. 2
TRUE
In [5]:
# To name each column, first convert the matrix into a data frame
# (colnames(htwtmatrix) <- ... would also name the columns of the matrix itself)
htwtdata = data.frame(htwtmatrix)  # as.data.frame(htwtmatrix) also works here
names(htwtdata) = c("height", "weight")
# here we used the names() function to assign the column names of our data frame
names(htwtdata)  # names() can both set (with an assignment) and get the names

1. 'height'
2. 'weight'
In [ ]:
# Let us see how R operates on matrices, and how that compares to data frames
htwtmatrix * 2  # multiplies every element of the matrix by 2
htwtmatrix[, 1]/12  # convert height in inches to feet
mean(htwtmatrix[, 2])  # find mean of weight
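For comparison, the same operations can be run on the data frame version of the data. This is a minimal sketch that rebuilds the vectors from In [2] so it runs on its own; the names doubled, height_ft, mean_wt and col_means are illustrative only:

```r
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
htwtmatrix <- matrix(c(height, weight), nrow = 15, ncol = 2)
htwtdata <- data.frame(height = height, weight = weight)

# Arithmetic on an all-numeric data frame is element-wise too,
# but the result is still a data frame (its column names are kept)
doubled <- htwtdata * 2

# Columns can be selected by name instead of by position
height_ft <- htwtdata$height / 12   # inches to feet
mean_wt <- mean(htwtdata$weight)    # same value as mean(htwtmatrix[, 2])

# colMeans() accepts both a numeric matrix and an all-numeric data frame
col_means <- colMeans(htwtdata)
print(col_means)
```

Referring to columns by name keeps the code readable if the column order ever changes.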

In [7]:
dim(htwtdata)
nrow(htwtdata)
str(htwtdata)
summary(htwtdata)

1. 15
2. 2
15
'data.frame':	15 obs. of  2 variables:
 $ height: num  58 59 60 61 62 63 64 65 66 67 ...
 $ weight: num  115 117 120 123 126 129 132 135 139 142 ...

     height         weight
Min.   :58.0   Min.   :115.0
1st Qu.:61.5   1st Qu.:124.5
Median :65.0   Median :135.0
Mean   :65.0   Mean   :136.7
3rd Qu.:68.5   3rd Qu.:148.0
Max.   :72.0   Max.   :164.0  
In [8]:
htwtdata[,2]*703/htwtdata[,1]^2  # BMI = 703 * weight (lb) / height (in)^2

1. 24.032401902497
2. 23.6285550129273
3. 23.4333333333333
4. 23.2381080354743
5. 23.0431841831426
6. 22.848828420257
7. 22.6552734375
8. 22.4627218934911
9. 22.4327364554637
10. 22.2379149030965
11. 22.1967993079585
12. 22.1487082545684
13. 22.0942857142857
14. 22.1735766712954
15. 22.2399691358025
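To keep the BMI values next to the data they came from, they can be stored as a new column. A minimal sketch, rebuilding htwtdata so it runs on its own; the column name bmi is our own choice:

```r
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
htwtdata <- data.frame(height = height, weight = weight)

# BMI from pounds and inches: 703 * weight / height^2
# (the factor 703 converts lb/in^2 to kg/m^2)
htwtdata$bmi <- 703 * htwtdata$weight / htwtdata$height^2

# Same values as the positional version above, rounded for display
head(round(htwtdata$bmi, 2), 3)
```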
In [9]:
# How would you get R to give you the height and weight of the 8th student in the
# data set? The 8th and 10th student?
names(htwtdata)
htwtdata[8,1]  # 8th student height
htwtdata[8,2]  # 8th student weight
htwtdata[10,1]  # 10th student height
htwtdata[10,2]  # 10th student weight

1. 'height'
2. 'weight'
65
135
67
142
height weight
    58    115
    59    117
    60    120
    61    123
    62    126
    63    129
    64    132
    65    135
    66    139
    67    142
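For the second part of the question, both students can be pulled out at once by passing a vector of row numbers, and a column can be named instead of numbered. A minimal sketch (the object name students_8_10 is illustrative):

```r
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
htwtdata <- data.frame(height = height, weight = weight)

students_8_10 <- htwtdata[c(8, 10), ]   # rows 8 and 10, all columns
print(students_8_10)

htwtdata[c(8, 10), "weight"]            # only their weights
```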

# Loops

## If/else statements

In R, a conditional that is applied element by element can be written as follows:

ifelse(test, yes, no)


The expression reads: wherever test is TRUE, return the corresponding value from yes; otherwise return the corresponding value from no.

In [10]:
ifelse(3 > 4, x <- 5, x <- 6)
print(x)

ifelse(4 > 3, x <- 5, x <- 6)
print(x)

6
[1] 6

5
[1] 5
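The cells above work because ifelse() only evaluates yes if at least one element of test is TRUE, and no only if at least one is FALSE, so exactly one of the two assignments runs. Relying on that side effect is fragile; for a single TRUE/FALSE test, base R's if/else is the usual choice, and the returned value can be assigned directly. A minimal sketch (the names x and sizes are illustrative):

```r
# Scalar condition: if/else returns a value that can be assigned directly
x <- if (3 > 4) 5 else 6
print(x)

# Vector condition: ifelse() returns one result per element of the test
sizes <- ifelse(c(3, 5) > 4, "big", "small")
print(sizes)
```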


### Usage of the operators &&, ||, & and |

In [11]:
hmean = mean(htwtdata$height)
wmean = mean(htwtdata$weight)
?cat
cat("mean height=",hmean,"\n","mean weight=",wmean)

mean height= 65
mean weight= 136.7333

The operators && and || are typically used to combine multiple conditions in an if statement. Whereas & (and) and | (or) apply element-wise to vectors, && and || expect single TRUE/FALSE values and evaluate their second argument only if it is needed to decide the result. Since R 4.3.0, passing a vector longer than one to && or || is an error, so it is important to choose the right logical operator for each situation.
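A small demonstration of the difference, using made-up logical vectors a and b (illustrative names only):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

both   <- a & b   # element-wise AND
either <- a | b   # element-wise OR

# && and || take single TRUE/FALSE values and short-circuit:
# if the left side already decides the answer, the right side is not evaluated
val <- NULL
is_positive <- !is.null(val) && val > 0   # FALSE; val > 0 is never evaluated

# any() and all() collapse a logical vector to one value, suitable for if()
if (any(a & b)) message("at least one position is TRUE in both a and b")
```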

In [12]:
ifelse( hmean > 61 && wmean > 120, x <- 5, x <- 6) # multiple conditions in an if statement

5
In [13]:
htwt_cat<-ifelse (height>=70 | weight>159, "high", "low") # apply element-wise to vectors
print(htwt_cat)
is.vector(htwt_cat)

 [1] "low"  "low"  "low"  "low"  "low"  "low"  "low"  "low"  "low"  "low"
[11] "low"  "low"  "high" "high" "high"

TRUE
In [14]:
rows <- c(1:3, 13:15)  # the first three and the last three students
cbind(htwtdata[rows, ], htwt_cat = htwt_cat[rows])

   height weight htwt_cat
1      58    115      low
2      59    117      low
3      60    120      low
13     70    154     high
14     71    159     high
15     72    164     high
In [15]:
htwt_cat[1:6]

1. 'low'
2. 'low'
3. 'low'
4. 'low'
5. 'low'
6. 'low'
In [16]:
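# || works on single values, so only the first student's height and weight are tested
htwt_cat <- ifelse(height[1] > 67 || weight[1] > 150, "high", "low")
htwt_cat
# Only one value is returned, for the first student (58 in, 115 lb). Older versions of R
# silently did the same for ifelse(height > 67 || weight > 150, ...), using only the
# first element of each vector; since R 4.3.0 passing whole vectors to || is an error.
htwt_cat <- ifelse(height[1] > 57 || weight[1] > 110, "high", "low")
htwt_cat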

'low'
'high'
In [17]:
# This can also be extended to handle multiple conditions. Suppose we have the following data:

final_score<- c(39, 51, 60, 65, 72, 78, 79, 83, 85, 85, 87, 89, 91, 95, 96, 97, 100, 100)

passfail<-ifelse(final_score>=60, "pass", "fail")
passfail

1. 'fail'
2. 'fail'
3. 'pass'
4. 'pass'
5. 'pass'
6. 'pass'
7. 'pass'
8. 'pass'
9. 'pass'
10. 'pass'
11. 'pass'
12. 'pass'
13. 'pass'
14. 'pass'
15. 'pass'
16. 'pass'
17. 'pass'
18. 'pass'

Suppose we want to create a variable called grade that is assigned as follows:

"F" if final_score <60

"D" if 60≤final_score<70

"C" if 70≤final_score<80

"B" if 80≤final_score<90

"A" if 90≤final_score


### Nested ifelse Statements

We can use a "nested" ifelse command as follows:

In [18]:
grade <- ifelse(final_score < 60, "F",
         ifelse(final_score < 70, "D",
         ifelse(final_score < 80, "C",
         ifelse(final_score < 90, "B", "A"))))
grade


1. 'F'
2. 'F'
3. 'D'
4. 'D'
5. 'C'
6. 'C'
7. 'C'
8. 'B'
9. 'B'
10. 'B'
11. 'B'
12. 'B'
13. 'A'
14. 'A'
15. 'A'
16. 'A'
17. 'A'
18. 'A'
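When the categories are consecutive score ranges, base R's cut() is a common alternative to deep ifelse() nesting. A minimal sketch: right = FALSE makes each interval closed on the left, [60, 70) and so on, which matches the "<" tests above; note that cut() returns a factor, so it is converted with as.character() for comparison. The name grade_cut is illustrative.

```r
final_score <- c(39, 51, 60, 65, 72, 78, 79, 83, 85, 85, 87, 89, 91, 95, 96, 97, 100, 100)

# Nested ifelse() version from above
grade <- ifelse(final_score < 60, "F",
         ifelse(final_score < 70, "D",
         ifelse(final_score < 80, "C",
         ifelse(final_score < 90, "B", "A"))))

# cut() bins the scores into [-Inf,60), [60,70), [70,80), [80,90), [90,Inf)
grade_cut <- cut(final_score,
                 breaks = c(-Inf, 60, 70, 80, 90, Inf),
                 labels = c("F", "D", "C", "B", "A"),
                 right = FALSE)

table(grade_cut)   # how many students received each grade
```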

The logic by which this will assign grades is depicted in the figure below.


### Repetitive Execution: "for" loops, "repeat" and "while"

The syntax of a for loop is:

for (name in expr_1) expr_2


Here name is the loop variable, expr_1 is a vector expression (often a sequence such as 1:20), and expr_2 is often a grouped expression { ... } whose sub-expressions are written in terms of name. expr_2 is evaluated repeatedly as name takes each value in the vector produced by expr_1.
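A minimal example of the pattern, where the loop variable k plays the role of name, 1:5 is expr_1, and the braced block is expr_2 (the names k and total are illustrative):

```r
total <- 0
for (k in 1:5) {        # name = k, expr_1 = 1:5
  total <- total + k^2  # expr_2, written in terms of k
}
print(total)            # sum of the squares of 1 to 5
```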

In [19]:
# Let's use the airquality dataset: daily air quality measurements in
# New York, May to September 1973. For details, see its help page:
?airquality

In [20]:
# We want to flag which days had good air quality (1) and which had bad air
# quality (0), where a bad day is one with an ozone level above 60 ppb.
numdays <- nrow(airquality)
print(numdays)

[1] 153

In [21]:
# create a numeric vector of zeros, one element per day, to store the results
goodair <- numeric(numdays)
print(goodair)

  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0

for(i in 1:numdays)
  if (airquality$Ozone[i] > 60) goodair[i] = 0 else goodair[i] = 1

(Notice that we have an if statement here within a for loop.)

In [ ]:
# Does the command above work? Why/why not?
# Let's check the Ozone variable. What do you notice below?
airquality$Ozone

In [23]:
# When there are missing values (NA), many operations in R fail; here, if() stops
# with "missing value where TRUE/FALSE needed". One way around this is to create
# a new data frame that drops every row containing a missing value. This can be
# done with the function na.omit():
airqualfull = na.omit(airquality)
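Dropping rows is not the only option. Note that na.omit() removes rows with an NA in any column (Solar.R has missing values too), not only in Ozone. A minimal sketch, assuming we instead want to keep all 153 days and mark days without an ozone reading as NA; goodair_na and numdays_all are illustrative names. complete.cases() selects the same rows that na.omit() keeps:

```r
numdays_all <- nrow(airquality)
goodair_na <- rep(NA_real_, numdays_all)   # NA means "ozone not measured"

for (i in seq_len(numdays_all)) {
  ozone <- airquality$Ozone[i]
  if (!is.na(ozone)) {                     # test for NA before comparing
    goodair_na[i] <- if (ozone > 60) 0 else 1
  }
}

table(goodair_na, useNA = "ifany")

# complete.cases() flags rows with no NA in any column, like na.omit()
sum(complete.cases(airquality))
```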

In [24]:
dim(airqualfull)
dim(airquality)

1. 111
2. 6
1. 153
2. 6
In [25]:
#Now let's try doing this again with the data with the complete cases.
numdays = nrow(airqualfull)
numdays
print(numdays)

111
[1] 111

In [26]:
goodair = numeric(numdays)       # initialize the vector

In [27]:
for(i in 1:numdays)
  if (airqualfull$Ozone[i] > 60) goodair[i] = 0 else goodair[i] = 1

In [28]:
goodair

  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 0
 [38] 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1
 [75] 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

In [29]:
# At this point we might be interested in which days were the ones with good air
# quality. The 'which' command returns a set of indices corresponding to the
# condition specified. We can then use the indices to find the day of the month
# this corresponds to
which(goodair == 1)  ## notice the double "=" signs!

 [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
[20]  20  21  22  24  25  27  28  29  30  31  32  33  35  36  38  43  44  45  46
[39]  47  51  54  55  57  59  60  61  62  66  67  69  70  71  72  73  74  75  76
[58]  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
[77] 106 107 108 109 110 111

In [30]:
goodindices <- which(goodair == 1)
airqualfull[goodindices,]

    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
9       8      19 20.1   61     5   9
12     16     256  9.7   69     5  12
13     11     290  9.2   66     5  13
14     14     274 10.9   68     5  14
15     18      65 13.2   58     5  15
16     14     334 11.5   64     5  16
17     34     307 12.0   66     5  17
18      6      78 18.4   57     5  18
19     30     322 11.5   68     5  19
20     11      44  9.7   62     5  20
21      1       8  9.7   59     5  21
22     11     320 16.6   73     5  22
23      4      25  9.7   61     5  23
24     32      92 12.0   61     5  24
28     23      13 12.0   67     5  28
29     45     252 14.9   81     5  29
31     37     279  7.4   76     5  31
38     29     127  9.7   82     6   7
41     39     323 11.5   87     6  10
44     23     148  8.0   82     6  13
47     21     191 14.9   77     6  16
48     37     284 20.7   72     6  17
49     20      37  9.2   65     6  18
50     12     120 11.5   73     6  19
.....................
111    31     244 10.9   78     8  19
112    44     190 10.3   78     8  20
113    21     259 15.5   77     8  21
114     9      36 14.3   72     8  22
116    45     212  9.7   79     8  24
128    47      95  7.4   87     9   5
129    32      92 15.5   84     9   6
130    20     252 10.9   80     9   7
131    23     220 10.3   78     9   8
132    21     230 10.9   75     9   9
133    24     259  9.7   73     9  10
134    44     236 14.9   81     9  11
135    21     259 15.5   76     9  12
136    28     238  6.3   77     9  13
137     9      24 10.9   71     9  14
138    13     112 11.5   71     9  15
139    46     237  6.9   78     9  16
140    18     224 13.8   67     9  17
141    13      27 10.3   76     9  18
142    24     238 10.3   68     9  19
143    16     201  8.0   82     9  20
144    13     238 12.6   64     9  21
145    23      14  9.2   71     9  22
146    36     139 10.3   81     9  23
147     7      49 10.3   69     9  24
148    14      20 16.6   63     9  25
149    30     193  6.9   70     9  26
151    14     191 14.3   75     9  28
152    18     131  8.0   76     9  29
153    20     223 11.5   68     9  30

Suppose we want to define a day with good quality air as one with ozone levels below 60 ppb and temperatures below 80 degrees F. Write an R loop to do this, and output the resulting subset of the data to a file called goodquality.txt. (Hint: use an ifelse() statement inside the for loop.)

In [31]:
airquality$Temp

1. 67
2. 72
3. 74
4. 62
5. 56
6. 66
7. 65
8. 59
9. 61
10. 69
11. 74
12. 69
13. 66
14. 68
15. 58
16. 64
17. 66
18. 57
19. 68
20. 62
21. 59
22. 73
23. 61
24. 61
25. 57
26. 58
27. 57
28. 67
29. 81
30. 79
31. 76
32. 78
33. 74
34. 67
35. 84
36. 85
37. 79
38. 82
39. 87
40. 90
41. 87
42. 93
43. 92
44. 82
45. 80
46. 79
47. 77
48. 72
49. 65
50. 73
51. 76
52. 77
53. 76
54. 76
55. 76
56. 75
57. 78
58. 73
59. 80
60. 77
61. 83
62. 84
63. 85
64. 81
65. 84
66. 83
67. 83
68. 88
69. 92
70. 92
71. 89
72. 82
73. 73
74. 81
75. 91
76. 80
77. 81
78. 82
79. 84
80. 87
81. 85
82. 74
83. 81
84. 82
85. 86
86. 85
87. 82
88. 86
89. 88
90. 86
91. 83
92. 81
93. 81
94. 81
95. 82
96. 86
97. 85
98. 87
99. 89
100. 90
101. 90
102. 92
103. 86
104. 86
105. 82
106. 80
107. 79
108. 77
109. 79
110. 76
111. 78
112. 78
113. 77
114. 72
115. 75
116. 79
117. 81
118. 86
119. 88
120. 97
121. 94
122. 96
123. 94
124. 91
125. 92
126. 93
127. 93
128. 87
129. 84
130. 80
131. 78
132. 75
133. 73
134. 81
135. 76
136. 77
137. 71
138. 71
139. 78
140. 67
141. 76
142. 68
143. 82
144. 64
145. 71
146. 81
147. 69
148. 63
149. 70
150. 77
151. 75
152. 76
153. 68
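for(i in 1:numdays1)
ifelse(airquality$Ozone[i] <60 & airquality$Temp<80,goodair1[i] = 1,goodair1[i] = 0)
# This does not work: inside a function call, "=" is read as naming an argument, and
# goodair1[i] = 1 is not a valid argument, so R stops with a syntax error.
# (numdays1 and goodair1 are also not defined in the cells above.)
# Let's switch to the <- assignment operator instead.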

In [32]:
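# Caution: airquality$Temp is not indexed by i here, so && only looks at its first
# element (Temp[1] = 67); R 4.3.0 and later stop with an error instead. numdays is 111
# (the rows of airqualfull), so only the first 111 rows of airquality are checked.
# For rows where Ozone is NA, ifelse() evaluates neither branch, so goodair[i] keeps
# the value left over from the earlier loop. The results below reflect these problems
# (e.g., rows with NA Ozone and rows with Temp of 80 or more are included).
for(i in 1:numdays)
ifelse(airquality$Ozone[i] <60 && airquality$Temp<80,goodair[i] <- 1,goodair[i] <- 0)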

In [33]:
goodindices1 <-  which(goodair == 1)
#airquality[goodindices1,]
print(goodindices1)

 [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
[20]  20  21  22  23  24  25  27  28  29  31  32  33  35  36  38  41  43  44  45
[39]  46  47  48  49  50  51  54  55  57  59  60  61  63  64  67  72  73  74  75
[58]  76  77  78  82  87  88  90  92  93  94  95  97 102 103 104 105 107 108 109
[77] 110 111

In [34]:
airquality[goodindices1,]

    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
5      NA      NA 14.3   56     5   5
6      28      NA 14.9   66     5   6
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
9       8      19 20.1   61     5   9
10     NA     194  8.6   69     5  10
11      7      NA  6.9   74     5  11
12     16     256  9.7   69     5  12
13     11     290  9.2   66     5  13
14     14     274 10.9   68     5  14
15     18      65 13.2   58     5  15
16     14     334 11.5   64     5  16
17     34     307 12.0   66     5  17
18      6      78 18.4   57     5  18
19     30     322 11.5   68     5  19
20     11      44  9.7   62     5  20
21      1       8  9.7   59     5  21
22     11     320 16.6   73     5  22
23      4      25  9.7   61     5  23
24     32      92 12.0   61     5  24
25     NA      66 16.6   57     5  25
27     NA      NA  8.0   57     5  27
28     23      13 12.0   67     5  28
29     45     252 14.9   81     5  29
31     37     279  7.4   76     5  31
32     NA     286  8.6   78     6   1
.....................
60     NA      31 14.9   77     6  29
61     NA     138  8.0   83     6  30
63     49     248  9.2   85     7   2
64     32     236  9.2   81     7   3
67     40     314 10.9   83     7   6
72     NA     139  8.6   82     7  11
73     10     264 14.3   73     7  12
74     27     175 14.9   81     7  13
75     NA     291 14.9   91     7  14
76      7      48 14.3   80     7  15
77     48     260  6.9   81     7  16
78     35     274 10.3   82     7  17
82     16       7  6.9   74     7  21
87     20      81  8.6   82     7  26
88     52      82 12.0   86     7  27
90     50     275  7.4   86     7  29
92     59     254  9.2   81     7  31
93     39      83  6.9   81     8   1
94      9      24 13.8   81     8   2
95     16      77  7.4   82     8   3
97     35      NA  7.4   85     8   5
102    NA     222  8.6   92     8  10
103    NA     137 11.5   86     8  11
104    44     192 11.5   86     8  12
105    28     273 11.5   82     8  13
107    NA      64 11.5   79     8  15
108    22      71 10.3   77     8  16
109    59      51  6.3   79     8  17
110    23     115  7.4   76     8  18
111    31     244 10.9   78     8  19
In [35]:
# write the selected rows to goodquality.txt in the working directory (comma-separated)
write.table(airquality[goodindices1,], "goodquality.txt", sep=",")

In [36]:
# check whether the file exported into working directory
list.files()

1. 'Data Science Process.pdf'
2. 'goodquality.txt'
4. 'sphweb_Stats.ipynb'
In [37]:
#open the file and check the data
file.edit('goodquality.txt')
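For comparison, here is one way to write the exercise so that every row of airquality is checked, both conditions use the i-th day, and days with missing ozone count as not good. It is a minimal sketch rather than the notebook's original solution; the names n_all, goodair2, goodquality and good_vec are illustrative, and it writes to the same goodquality.txt file named in the exercise:

```r
n_all <- nrow(airquality)
goodair2 <- numeric(n_all)

for (i in seq_len(n_all)) {
  # !is.na() comes first so && never sees an NA; Ozone[i] and Temp[i] are single values
  is_good <- !is.na(airquality$Ozone[i]) &&
    airquality$Ozone[i] < 60 &&
    airquality$Temp[i] < 80
  goodair2[i] <- ifelse(is_good, 1, 0)
}

goodquality <- airquality[goodair2 == 1, ]

# Loop-free cross-check: & works element-wise on whole columns
good_vec <- with(airquality, !is.na(Ozone) & Ozone < 60 & Temp < 80)
stopifnot(identical(goodair2 == 1, good_vec))

write.table(goodquality, "goodquality.txt", sep = ",")
```

The vectorized line gives the same selection in one step; the loop is kept only because the exercise asks for one.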


### Opening a PDF file from the working directory with a Bioconductor package (general topic)

Bioconductor packages are installed with the BiocManager package from CRAN (Bioconductor 3.8 and later; earlier releases used the biocLite.R script).
To install Biobase, type the following in an R command window:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biobase")  # installs packages from Bioconductor

In [38]:
library(Biobase)
openPDF("Data Science Process.pdf")

Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

The following objects are masked from 'package:base':

anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit

Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.


TRUE

### while and repeat Statements

Like a for loop, a while loop repeats an operation, but it keeps going for as long as a given condition is TRUE rather than over a fixed sequence. For example:

In [39]:
z <- 0

while (z < 5) {

z <- z + 2

print(z)

}

[1] 2
[1] 4
[1] 6


In the while statement above, we initialize z to 0. As long as z is less than 5, the loop keeps performing the operation z <- z + 2 and then prints z. Thus we have

z <- 0+2  ##Initially z is 0, but after the first iteration of the loop the value of z is 2

z <- 2+2  ## After the second iteration the value of z is 4

z <- 4+2  ## After the third iteration the value of z is 6


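The loop stops here. The condition z < 5 is checked only at the start of each iteration, so the value 6 is printed before the next check finds that z is no longer less than 5.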
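A small sketch of that ordering: recording z after the update (as above) ends with a value past the limit, while recording it before the update keeps every recorded value below 5. The names after_update and before_update are illustrative:

```r
# Version from above: z is updated, then recorded
z <- 0
after_update <- c()
while (z < 5) {
  z <- z + 2
  after_update <- c(after_update, z)
}

# Recording z before updating it keeps every recorded value below 5
z <- 0
before_update <- c()
while (z < 5) {
  before_update <- c(before_update, z)
  z <- z + 2
}

print(after_update)
print(before_update)
```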

### repeat exercises

In [40]:
# Another option for looping is a repeat loop, which runs until a break statement is reached. An example follows:
i<-1

repeat{

print(i)

if( i == 15) break

i<-i+1

}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15

In [41]:
x <- 1
repeat{
print(x)
x <- x+1
if (x == 6){
break
}

}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

In [42]:
# For the first exercise, write a repeat{} loop that prints all the even numbers
# from 2 to 10 by incrementing the variable i, starting from i <- 0.
i <- 0
repeat {
i <- i + 2
print(i)
if (i == 10)
break
}

[1] 2
[1] 4
[1] 6
[1] 8
[1] 10


Using the following variables:

msg <- c('Hello')
i <- 1


Write a repeat{} loop that stops incrementing i after 5 loops and prints msg at every increment.

In [43]:
msg <- c("Hello")
i <- 1

repeat {
i <- i + 1
print(msg)
if (i == 5) {break}  # i takes the values 2 to 5, so msg is printed 4 times
}

[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"

In [44]:
msg <- c('Hello')
i <- 1
repeat{
print(msg)
i <- i +1
if (i == 5)break
}

[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"
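Both solutions above print msg four times, because starting from i <- 1 the test i == 5 is reached after only four increments. If the exercise is read as "print msg five times", one option is to print first and test before incrementing. A minimal sketch; n_printed is an illustrative counter used only to check the result:

```r
msg <- c("Hello")
i <- 1
n_printed <- 0

repeat {
  print(msg)
  n_printed <- n_printed + 1
  if (i == 5) break   # stop once the 5th loop has printed
  i <- i + 1
}
```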


### while exercises

In [45]:
#With, i <- 1, write a while() loop that prints the odd numbers from 1 through 7.
i <- 1
while(i<=7){
print(i)
i <- i +2
}

[1] 1
[1] 3
[1] 5
[1] 7

In [46]:
# Write a while() loop that increments the variable i 6 times and prints msg at every iteration.
i <- 1
while(i<=6){
print(msg)
i <- i +1
}

[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"


### for loop exercises

#### examples:

# Pattern: loop over the first four positions of a vector (variable stands for any vector name)
for(i in 1:4) {
print(variable[i])
}

In [53]:
a <- c(15,23,78,45,124,82,75)
for (i in 1:4){
print(a[i])
}

[1] 15
[1] 23
[1] 78
[1] 45

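# Pattern: seq(variable) gives the positions 1, 2, ..., length(variable) when variable has more than one element
for(i in seq(variable)) {
print(i)
}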

In [54]:
for (i in seq(a)){
print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
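seq(a) works here because a has seven elements, but seq() behaves differently for a vector of length one, which is why seq_along() is the safer choice for loop indices. A small demonstration (v and empty are illustrative names):

```r
v <- c(42)
length(seq(v))       # 42: for a single number n, seq(n) means 1:n
seq_along(v)         # 1: one index per element

empty <- numeric(0)
seq_along(empty)     # integer(0): a for loop over it runs zero times
1:length(empty)      # 1 0: a loop over this would run twice
```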

# Pattern: seq_along(variable) gives one index per element, even when variable has length 0 or 1
for(i in seq_along(variable)) {
print(variable[i])
}

In [55]:
for (i in seq_along(a)){
print(a[i])
}

[1] 15
[1] 23
[1] 78
[1] 45
[1] 124
[1] 82
[1] 75

# Pattern: loop directly over the elements of variable instead of their positions
for(letter in variable) {
print(letter)
}

In [60]:
# The loop variable takes each element of the vector in turn; here the vector is the single value 2
for (letter in 2){
print(letter)
}

[1] 2
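The loop variable can also run over the elements of a longer vector directly, without any index. A minimal sketch that splits a word into single characters with strsplit(); the word "loop" and the names chars and seen are illustrative:

```r
word <- "loop"
chars <- strsplit(word, "")[[1]]   # one string per character
seen <- character(0)

for (letter in chars) {
  print(letter)
  seen <- c(seen, letter)          # collect the characters in loop order
}

# Looping over a numeric vector works the same way
a <- c(15, 23, 78, 45, 124, 82, 75)
for (value in a[1:3]) print(value)
```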

In [ ]:
# Create a vector filled with random normal values
u1 <- rnorm(30)
print("This loop calculates the square of the first 10 elements of vector u1")

# Initialize usq
usq <- 0

for(i in 1:10) {
# i-th element of u1 squared into i-th position of usq
usq[i] <- u1[i]*u1[i]
print(usq[i])
}

print(i)
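Because R arithmetic is vectorized, the same squares can be computed without a loop, and preallocating usq with numeric(10) avoids growing the vector one element at a time. A minimal sketch; set.seed() is added only so the random values are reproducible, and usq_vec is an illustrative name:

```r
set.seed(1)
u1 <- rnorm(30)

usq <- numeric(10)              # preallocate 10 slots instead of starting from 0
for (i in 1:10) {
  usq[i] <- u1[i] * u1[i]
}

usq_vec <- u1[1:10]^2           # vectorized: square the first 10 elements at once
print(all.equal(usq, usq_vec))
```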