You may discuss homework problems with other students, but you have to prepare the written assignments yourself. Late homework will be penalized 10% per day.

Please combine all your answers, the computer code and the figures into one file, and submit a copy in your dropbox on coursework.

Due March 13, 11:59PM.

Grading scheme: 10 points per question, total of 30.

Question 1 (Based on RABE 8.4-8.6)

The file contains the values of the daily DJIA (Dow Jones Industrial Average) for all the trading days in 1996. The variable Time denotes the trading day of the year. There were 262 trading days in 1996.

  1. Fit a linear regression model connecting DJIA with Time using all 262 trading days in 1996. Is the linear trend model adequate? Examine the residuals for time dependencies, including a plot of the autocorrelation function.

  2. Regress DJIA[t] against its one-day lagged value DJIA[t-1]. Is this an adequate model? Is there any evidence of autocorrelation in the residuals?

  3. The variability (volatility) of the daily DJIA is large, and to accommodate this phenomenon the analysis is carried out on the logarithm of the DJIA. Repeat 2. above using log(DJIA) instead of DJIA.

  4. A simplified version of the random walk model of stock prices states that the best prediction of the stock index at Time=t is the value of the index at Time=t − 1. Show that this corresponds to a simple linear regression model for 2. with an intercept of 0 and a slope of 1.

  5. Carry out the appropriate tests of significance at level α = 0.05 for 4. Test each coefficient separately ($t$-tests), then test both simultaneously (i.e. an $F$-test).

  6. The random walk theory implies that the first differences of the index (the difference between successive values) should be independently normally distributed with mean zero and constant variance. What kind of plot can be used to visually assess this hypothesis? Provide the plot.
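
As a rough illustration of the quantities involved in Question 1, the sketch below simulates a synthetic random walk (not the DJIA data), regresses the series on its lagged value, and forms the $t$ statistics for an intercept of 0 and a slope of 1, the joint $F$ statistic, and the first few residual autocorrelations. The function names, the starting level, the step size and the seed are all assumptions made for this example; the statistics would still need to be compared with the appropriate reference distributions.

```python
import math
import random


def simulate_log_random_walk(n, start=math.log(5000.0), sd=0.01, seed=0):
    """Synthetic series y[t] = y[t-1] + e[t], e[t] ~ N(0, sd^2). Not real DJIA data."""
    rng = random.Random(seed)
    y = [start]
    for _ in range(n - 1):
        y.append(y[-1] + rng.gauss(0.0, sd))
    return y


def lag_regression(y):
    """Ordinary least squares of y[t] on y[t-1] with an intercept."""
    x, z = y[:-1], y[1:]  # lagged predictor and response
    m = len(x)
    xbar, zbar = sum(x) / m, sum(z) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    resid = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    rss = sum(r * r for r in resid)
    sigma2 = rss / (m - 2)  # residual variance estimate
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": math.sqrt(sigma2 * (1.0 / m + xbar ** 2 / sxx)),
        "se_slope": math.sqrt(sigma2 / sxx),
        "resid": resid,
        "rss": rss,
    }


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    m = len(values)
    mean = sum(values) / m
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, m))
    return num / denom


y = simulate_log_random_walk(262)  # 262 trading days, as in the problem statement
fit = lag_regression(y)

# Separate t statistics: H0 intercept = 0, and H0 slope = 1.
t_intercept = fit["intercept"] / fit["se_intercept"]
t_slope = (fit["slope"] - 1.0) / fit["se_slope"]

# Joint F statistic: the restricted model y[t] = y[t-1] has no free
# coefficients, so its residuals are simply the first differences.
diffs = [b - a for a, b in zip(y[:-1], y[1:])]
rss_null = sum(d * d for d in diffs)
m = len(diffs)
f_stat = ((rss_null - fit["rss"]) / 2.0) / (fit["rss"] / (m - 2))

# Residual autocorrelations at lags 1 to 5.
resid_acf = [autocorrelation(fit["resid"], k) for k in range(1, 6)]

print("t (intercept = 0):", t_intercept)
print("t (slope = 1):", t_slope)
print("F (both):", f_stat, "on 2 and", m - 2, "degrees of freedom")
print("residual ACF, lags 1-5:", resid_acf)
```

The first differences `diffs` computed here are also the values one would plot when checking the distributional assumption in part 6.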

Question 2 (Based on RABE 12.3)

The O-rings in the booster rockets used in space launches play an important part in preventing the rockets from exploding. The probability of an O-ring failure is thought to be related to temperature. The data from 23 flights are given in this file.

For each flight we have an indicator of whether or not any O-rings were damaged and the temperature of the launch.

  1. Fit a logistic regression, modeling the probability of having any O-ring failures based on the temperature of the launch. Interpret the coefficients in terms of odds ratios.

  2. From the fitted model, find the probability of an O-ring failure when the temperature at launch is 31 degrees. This was the temperature forecast for the day of the launch of the fatal Challenger flight on January 28, 1986.

  3. Find an approximate 95% confidence interval for the coefficient of temperature in the logistic regression using both summary and confint. Are the confidence intervals the same? Why or why not?
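
To make the ideas in Question 2 concrete, here is a minimal sketch of a one-predictor logistic regression fitted by Newton-Raphson. The `temps` and `damaged` values are made-up toy data, not the O-ring data, and the function names and the prediction point of 45 degrees are assumptions for illustration only. The interval computed is a Wald interval (estimate plus or minus 1.96 standard errors); other constructions, such as intervals based on the profile likelihood, can give different endpoints.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and entries of the 2x2 Fisher information."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for logit P(y = 1) = a + b * x; x is centered internally."""
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, xc, y)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det  # Newton step: inverse information times score
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    _, (i00, i01, i11) = score_and_information(b0, b1, xc, y)
    se_slope = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    intercept = b0 - b1 * xbar  # undo the centering; the slope is unaffected
    return intercept, b1, se_slope


# Toy data, invented for this example (NOT the 23 flights from the assignment).
temps = [50, 55, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80]
damaged = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

intercept, slope, se_slope = fit_logistic(temps, damaged)

# exp(slope) is the multiplicative change in the odds per one-degree increase.
odds_ratio = math.exp(slope)

# Wald 95% interval for the slope.
wald_ci = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)

# Fitted probability at a hypothetical temperature of 45 degrees.
p_new = sigmoid(intercept + slope * 45.0)

print("slope:", slope, "odds ratio per degree:", odds_ratio)
print("Wald 95% CI for slope:", wald_ci)
print("fitted probability at 45 degrees:", p_new)
```

Exponentiating the endpoints of `wald_ci` gives the corresponding interval on the odds-ratio scale.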

Question 3 (Based on RABE 12.5)

Table 1.12 of the textbook describes variables in a study of health care in 52 health care facilities in New Mexico in the year 1988. The variables collected are:

| Variable | Description |
|---|---|
| RURAL | Is hospital in a rural or non-rural area? |
| BED | Number of beds in facility. |
| MCDAYS | Annual medical in-patient days (hundreds). |
| TDAYS | Annual total patient days (hundreds). |
| PCREV | Annual total patient care revenue (\$100). |
| NSAL | Annual nursing salaries (\$100). |
| FEXP | Annual facilities expenditures (\$100). |

  1. Using a logistic regression model, test the null hypothesis that the measured covariates have no power to distinguish between rural and non-rural facilities. Use level $\alpha=0.05$.

  2. Use a model selection technique based on AIC to choose a model that seems to best describe the outcome RURAL based on the measured covariates.

  3. Repeat 2. but using BIC instead. Is the model the same?

  4. Report estimates of the parameters for the variables in your final model. How are these to be interpreted?

  5. Report confidence intervals for the parameters in 4. Do you think you can trust these intervals?
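
For parts 2 and 3 of Question 3, the two criteria differ only in how they penalize the number of parameters: AIC adds $2k$ to $-2\log L$, while BIC adds $k \log n$. The sketch below shows how that can change the selected model. The two candidate models and their log-likelihoods are invented for illustration (only $n = 52$ comes from the problem), and `aic` and `bic` are helper names chosen for this example.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion; the penalty per parameter is log(n)."""
    return -2.0 * log_lik + math.log(n) * k


n = 52  # number of facilities, from the problem statement

# Hypothetical (log-likelihood, parameter count) pairs, made up for this example.
candidates = {"small": (-25.0, 3), "large": (-21.0, 6)}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

print("AIC:", aic_scores, "-> picks", best_aic)
print("BIC:", bic_scores, "-> picks", best_bic)
```

With $n = 52$, $\log n$ is about 3.95, so each extra parameter costs nearly twice as much under BIC as under AIC. In this made-up example that is enough for AIC to prefer the larger model and BIC the smaller one.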
