ST505/ST697R, Regression Analysis and Regression Modelling. Fall 2012

NEWEST STUFF/REMINDERS

DEC. 7 HOMEWORK 9 SOLUTION POSTED (PROBLEMS 1 AND 3 COVER
RELEVANT MATERIAL FOR EVERYONE)

FINAL EXAM: Monday, Dec. 10 1:30 pm in LGRT 101.
  • Information on "final" exam.

    SUNDAY, DEC. 9. 4 p.m. Open office hours
    - not a prepared review. I'll answer questions
    as long as people have them.

    Monday, Dec. 10. Office hours 11-12.

    Dec. 7: Added a couple of pages of closing notes.
  • Course Description, policies and tentative Syllabus.
  • LISTING OF POTENTIAL IE PROJECTS (THIS INCLUDES INFO ON THE SMSA DATA THAT ST697R STUDENTS ARE USING IN HW 7)
  • IE project description (for ST505 students).
  • IE project Step 2.
  • Updated: Nov. 9. IE project update (for ST505 students).
  • IE group assignments (ST505 students). Updated Oct. 19
  • IE Presentation days
  • Link to Chapters 1 and 2 of the book and student solution manual

    OFFICE HOURS

    Week of Dec. 3

    Mon: 1:30-3:30
    Tue: 3 - 4
    Thur: 2-3:30
    And by appointment. Contact me if you
    have questions and can't make office hours.

    TA OFFICE HOURS

    TA: Viet Nguyen, office: LGRT 1335L
    Viet will handle computational questions
    Other questions can be directed to me in class or in office hours

    Tuesday: 2:00-3:00
    Wednesday 4:00-5:00
    Thursday 2:00-3:00

    LECTURE NOTES/EXAMPLES

  • Lecture notes: Pages 1-142
    Dec. 2: Pages 136-142 added. Intro to nonparametric regression and serial correlation.

  • Plots for yield example.

    Reading


  • Detailed itemization of upcoming reading/coverage.

    READING


    Nov. 29: Reading on additional diagnostics: Section 10.2, 10.3 and 10.4 (but just Cook's distance in section 10.4)

    Nov. 13: Reading on model building (which we'll start in on Friday, Nov. 16).
    Chapter 9. Skip Press and data splitting.
  • Previous reading.

    Schedule



    Monday. Dec. 3

    - ST505 students working on IE projects
    - Lecture for ST697R: Some additional diagnostics;
    nonparametric regression (mainly as a diagnostic tool with one variable). Section 3.10
    Wed. Dec. 5
    Autocorrelation in time series. The basic problem and one correction technique.
    Parts of Sections 12.1-12.4 (up to page 494)
    See notes for coverage.

    FRIDAY, DEC. 7
    - Comments on homeworks 8 and 9
    - Closing remarks
    - evaluations

    HOMEWORK/ASSIGNMENTS

  • Homework 1: Due Friday, Sept. 14
  • Homework 1 solution
  • Homework 2: Due Wednesday, Sept. 26
  • Homework 2 solution
  • Homework 2, SAS code and output for problem 1
  • Homework 2, R code and output for problem 1
  • Homework 3: Due Friday, Oct. 5 (start of class) Cow carcass and crime data are below.
  • Homework 3 solution.
  • Homework 4: Due Friday, Oct. 12 Kishi data below.
  • Homework 4 solution.
  • Homework 5: Due Monday, Oct. 22, start of class (no exceptions). Note: in problem 2 it should be testing beta0 = 0 and beta1 = 1.
    An earlier version of this was linking to an unedited version.
  • Homework 5 solution.
  • Homework 6: Due Wed. Nov. 7 (by 4 p.m. sharp!)
  • Homework 6 solution.
  • Homework 7. Due: Wed. Nov 21
  • Homework 7 solution.
  • R code and output for weighted analysis of house data: hw 7
  • Homework 8. One problem is due Fri. Nov. 30 (start of class)
  • Homework 8 solution.
  • Homework 9. ST697R students only.
  • Homework 9 solution.
  • R code and output associated with homework 8

    Exam and grading

  • Midterm Solution (Note: a 25 got left off right at end of second version of answer to 2b in open book).
  • Grading information: summary of midterm scores and overall grading.
  • Final Solution

    COMPUTING

  • General comments on computing
  • An introduction to both SAS and R (from 2011 ST597 notes)
    This has A LOT more than what you need. See sections
    1 - 3 (intro to SAS) and Sections 15.1 - 15.4 (intro to R).
    We will demonstrate the key features in class.
  • An introduction to R in 3 parts. The first part is the same as Chapter 15 in the link above.
  • SAS links
  • R links

    R programs

  • Reading, listing and plotting data (with header names in the data file)
    and fitting a simple linear regression model.
  • Reading data (with no header names in the data file)
  • concise summary of how to run R programs above and save things (also sent in email).
  • More analysis of the pH example (confidence intervals for coefficients, residuals, etc).
    ( Updated Tues. Sept 18 to include confidence intervals on E(Y) and prediction intervals)
  • Class handout of pH example output from R.
  • Simulating estimated coefficients and MSE for simple linear regression
    Note: You don't need to understand the programming in the simulation
    program; just how to run it (will be demoed in class Wed.)
  • Analyzing the kishi data with illustration of how to use R as a calculator to get simultaneous intervals.
  • Class handout of Kishi example using R.
  • Illustrating computing and plotting confidence bands using Kishi data.
  • Illustrating correlation analysis using Brain data.
  • Puffin diagnostics. With hist corrected and dropping PIs and CIs
  • Esterase assay example; testing for constant variance.
    Also shows you how to route multiple plots to a file.
  • Testing lack of fit using cholesterol data
  • Comparing means and assessing variance with grouped data (kishi example)
  • R code for Ramus example, page 65 of notes, and GPA example (page 66).
  • R code for weighted least squares of ester example; parts of pages 71-75 of notes.
  • R code for house price example in notes
  • R code for house price example in notes. Modified to handle data with no missing and do overall F test in different way.
  • Descriptive statistics in R
  • Analyzing qualitative variables; Brain data
  • Variable selection: surgery example
  • Additional diagnostics (Cook's distance, etc.) House example

    SAS programs

  • Reading, listing and plotting data
    and fitting a simple linear regression model.
  • Regression analysis of pH data. confidence intervals, etc.

    Note that the last part shows how you could customize graphs.
    For homework, you can just use the plots out of proc reg.
  • Simulating estimated coefficients and MSE for simple linear regression
    ( Modified, Wed. Sept 19 to remove shrinking of graph size).

    Note: You don't need to understand the programming in the simulation
    program; just how to run it (will be demoed in class Wed.)

  • SAS files for Kishi example. Pages 30 and following in notes
  • SAS file to do inverse prediction
  • SAS files to do regulation
  • SAS file for Brain size example on page 42-44 of notes.
  • SAS file for Puffin example diagnostics, page 49 of notes.
  • Esterase assay example with residual analysis. Page 56 of notes
  • SAS file for lack of fit in Chol. example, page 62 of notes.
  • Kishi SAS file for assessing constant variance with grouped data. Page 63 of notes
  • SAS file for Ramus example, page 65 of notes.
  • SAS file for GPA example, page 66 of notes.
  • SAS file for weighted least squares in Ester example; parts of pages 71-75 of notes.
  • SAS file for house price example in notes.
  • SAS file for yield, moisture, temp example; page 97 and following in notes
  • Analyzing qualitative variables with SAS; Brain data
  • Variable selection: surgery example
  • Additional diagnostics (Cook's distance, etc.) House example

    DATA

  • pH data file.
  • Breakage data from Ex. 1.21 (no names).
  • Breakage data from Ex. 1.21 (with names in first row).
  • Hardness data from Ex. 1.22 (for homework 2. Note that Y is first, then X).
  • Cow carcasses and pH
  • Cow carcasses and pH data (txt file with names in first row). HW 3.
  • Crime data for homework 3
  • Kishi nitrogen intake-balance data (no names).
  • Puffin data (no names)
  • Esterase assay data (no names)
  • Virus-survival data for homework 5
  • Insurance data: IE project
  • Sex discrimination data: IE project
  • 1985 CPS wage data: IE project
  • SENIC data: Infections in hospital IE project
  • House Price data. Example in notes
  • House Price data. NO missing for price, sqft or tax. comma separated.
  • Patient data for homework 6.
  • Modified MCAS data
  • Reading MCAS data
  • SMSA data for ST697R. MODIFIED SAT DEC 1. DROP FT. WORTH ADD NAMES
  • Yield data
  • Brain data, no names.
  • Copier data for homework 8
    Combines problem 8.15 and 1.20. Data in order: service, number, type.

    MISCELLANEOUS

  • Who is Francis Galton and what does he have to do with regression?
  • U.S. weather extremes
  • Dallas marathon times plot. Analysis is part of homework 2
  • Cell phones and driving: What is the regression here doing?
  • Brain and Spinal Cord Interaction: Regression example.

    So what's a puffin look like?


  • Florida 2000 election example