Consider the population of all students taking introductory statistics courses at a large university (1,129 students taking statistics for business, social sciences, or natural sciences). Suppose we are interested in the values of four variables for this population: handedness (right-handed or left-handed), sex, SAT Verbal score, and age.

If we were unable to determine the values of those variables for the entire population, we could take a random sample from that population and use the sample summaries as estimates of the population summaries. Would the random sample provide unbiased estimates of the population values?

Next, what if, instead of taking a random sample, we sampled the 192 students who happen to be enrolled in the business statistics course? We will first use intuition, and then check, whether these students would be a representative sample with respect to each of the four variables: handedness, sex, SAT Verbal score, and age.

It may help to know that, at this university, all students have comparable options in terms of when they take introductory statistics. You should also know that women, on the whole, tend to do somewhat better than men on the verbal portion of the SAT, and that business is a major that tends to interest men more than women.
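The question of whether a random sample gives unbiased estimates can be explored with a small simulation. The sketch below does not use the course dataset: it builds a synthetic population of 1,129 ages (the age range and the seeds are assumptions made purely for illustration), draws many simple random samples of 192, and compares the average of the sample means with the population mean.

```python
import random
import statistics

POPULATION_SIZE = 1129   # population size from the text
SAMPLE_SIZE = 192        # sample size from the text


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples (without replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Synthetic ages: hypothetical values, NOT the student_survey data.
_rng = random.Random(42)
population = [_rng.randint(17, 30) for _ in range(POPULATION_SIZE)]

sample_means = simulate_sample_means(population, SAMPLE_SIZE, n_samples=2000)
population_mean = statistics.mean(population)
average_of_sample_means = statistics.mean(sample_means)

print(f"Population mean:         {population_mean:.3f}")
print(f"Average of sample means: {average_of_sample_means:.3f}")
```

Individual sample means differ from one sample to the next, but across many samples they center on the population mean. That is the sense in which a random sample gives an unbiased estimate.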

Our dataset contains the following variables:

  • Course: natural science, social science, or business
  • Handed: right-handed or left-handed
  • Sex: male or female
  • Verbal: SAT Verbal scores up to 800
  • Age: in years

Though we provide SAS and SPSS output for the questions below, we encourage you to create this output yourself using the dataset student_survey.xls or student_survey.csv. Use the following output files for Questions 1-4: SAS output (with SAS code) or SPSS output. You will also need these additional output files for Questions 9-12: SAS output or SPSS output.

To create the required output in SPSS:

  • Import Data: FILE > OPEN > DATA, choose Excel file from the pull-down, find the file, continue
  • Edit Data: DATA > DEFINE VARIABLE PROPERTIES
  • Create Frequency Table for Handedness and Sex in the Population: ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES
  • Create Summaries for Verbal and Age in the Population: ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES
  • Create Random Sample: DATA > SELECT CASES > Random Sample, choose “Exactly 192 cases from the first 1129 cases”. Look at the data to see how this works with the default method, and notice that the full dataset still exists (we will learn a different method in the next activity).
  • Recreate the tables and summaries: exactly as before using this random sample
  • Create Non-Random Sample: DATA > SELECT CASES > choose “if condition is satisfied”, pull in the Course variable, and use the condition Course = “Business”
  • Recreate the tables and summaries: exactly as before using this non-random sample

To create the required output in SAS:

  • Create Frequency Table for Handedness and Sex in the Population: Use PROC FREQ to obtain these tables.
  • Create Summaries for Verbal and Age in the Population: Use PROC MEANS to obtain these tables, which should contain the mean, standard deviation, and five-number summary.
  • Create Random Sample: Use PROC SURVEYSELECT to create a simple random sample of 192 observations from the current population. Name the output dataset students_srs.
  • Recreate the tables and summaries: exactly as before using this simple random sample
  • Create Non-Random Sample: Use a DATA step and an IF-THEN statement to create a non-random sample containing only business students. Name this dataset students_bus.
  • Recreate the tables and summaries: exactly as before using this non-random sample.
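For readers without access to SPSS or SAS, the same sequence of steps can be sketched in Python using only the standard library. This is not the course's method, just one possible equivalent. It assumes that student_survey.csv has a header row with the column names Course, Handed, Sex, Verbal, and Age, that missing values are blank, and that business students are coded as "Business"; all function names are hypothetical. Quartiles from `statistics.quantiles` may differ slightly from those reported by SAS or SPSS, because the packages use different quartile definitions.

```python
import csv
import random
import statistics
from collections import Counter


def load_rows(path):
    """Read the CSV into a list of dicts, one per student."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def frequency_table(rows, column):
    """Return {value: (count, percent)} for a categorical column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {v: (c, 100 * c / total) for v, c in counts.items()}


def numeric_summary(rows, column):
    """Mean, standard deviation, and five-number summary (blanks skipped)."""
    values = sorted(
        float(row[column]) for row in rows if (row.get(column) or "").strip()
    )
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
    }


def simple_random_sample(rows, n, seed=None):
    """Simple random sample of n rows, drawn without replacement."""
    return random.Random(seed).sample(rows, n)


def business_only(rows):
    """Non-random sample: students in the business course only.
    Assumes the Course value is spelled exactly "Business"."""
    return [row for row in rows if row["Course"] == "Business"]


def report(rows, label):
    """Print the frequency tables and numeric summaries for one group."""
    print(f"--- {label} (n = {len(rows)}) ---")
    for column in ("Handed", "Sex"):
        print(column, frequency_table(rows, column))
    for column in ("Verbal", "Age"):
        print(column, numeric_summary(rows, column))


def main(path="student_survey.csv"):
    population = load_rows(path)
    report(population, "Population")
    report(simple_random_sample(population, 192), "Simple random sample")
    report(business_only(population), "Business students")
```

Calling `main()` with the CSV file in the working directory prints the three sets of summaries one after another, so the random and non-random samples can each be compared with the population.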
