A local internet service provider (ISP) created two new versions of its software, with alternative ways of implementing a new feature. To find out which product causes the highest satisfaction among customers, the ISP conducted an experiment comparing users’ preferences for the two new versions and the existing software, three products in all. It has identified three major potential lurking variables that might affect user satisfaction: gender, age, and hours per week of computer use. In this activity, we will use adults in a hypothetical city as the population of interest to the ISP. We will:

  • create a simple random sample as the basis for the experimental study of the population,
  • use randomization to assign individuals to treatment groups, and
  • check whether randomization produced three treatment groups that are similar with respect to the most obvious lurking variables.

Our dataset contains the following variables:

  • age: in years
  • gender: female or male
  • comp: hours per week of computer use

The company must rely upon sampling to study its customers’ preferences, since the entire population cannot be assigned to treatments. Therefore, we will first choose a simple random sample (SRS) of 450 people to be the subjects in the study. Then we will randomly assign these 450 subjects to treatment groups, one for each of the three versions of the ISP’s software. Let’s denote the versions “1,” “2,” and “3,” and create a categorical variable to identify the treatment for each subject. Finally, we will examine whether the randomization was successful in making our three treatment groups similar with respect to the variables age, gender, and comp. In other words, we will examine whether the distributions of these variables in the three groups are similar or not.

  • To compare the distribution of age and comp (the hours per week of computer use) among the three treatment groups, we’ll create side-by-side boxplots by treatment.
  • To compare the distribution of gender among the three treatment groups, we’ll look at a two-way table of conditional percents.
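
The three steps described above (sample, assign, compare) can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS or SAS instructions: the population is simulated with made-up values rather than read from the course dataset, the names `population`, `subjects`, `trt`, and `summary` are our own choices, and group medians and percent female stand in for the boxplots and the two-way table.

```python
import random
import statistics
from collections import Counter

random.seed(2024)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the population file: 20,783 adults with made-up values.
POPULATION_SIZE = 20783
population = [
    {
        "age": random.randint(18, 90),                # age in years (assumed range)
        "gender": random.choice(["female", "male"]),
        "comp": round(random.uniform(0, 40), 1),      # hours per week (assumed range)
    }
    for _ in range(POPULATION_SIZE)
]

# Step 1: simple random sample of 450 subjects, drawn without replacement.
subjects = random.sample(population, 450)

# Step 2: independently assign each subject to treatment 1, 2, or 3
# with equal probability (group sizes will be close to, not exactly, 150).
for person in subjects:
    person["trt"] = random.randint(1, 3)

# Step 3: compare the treatment groups on the three lurking variables.
summary = {}
for trt in (1, 2, 3):
    group = [p for p in subjects if p["trt"] == trt]
    genders = Counter(p["gender"] for p in group)
    summary[trt] = {
        "n": len(group),
        "median_age": statistics.median(p["age"] for p in group),
        "median_comp": statistics.median(p["comp"] for p in group),
        "pct_female": round(100 * genders["female"] / len(group), 1),
    }

for trt, stats in summary.items():
    print(trt, stats)
```

If the randomization worked as intended, the three rows should look broadly alike; changing the seed gives a different sample and assignment, which is a quick way to see how much the groups vary by chance alone.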

Though we provide SAS and SPSS output for the questions in this activity, we encourage you to create this output yourself using the dataset computer.sav (SPSS) or computer.sas7bdat (SAS). Use the provided output files (SAS output, SAS code, or SPSS output) to answer the questions.

To create the required output in SPSS:

  • Create Random Sample: DATA > SELECT CASES > Random Sample, choose “Exactly 450 cases from the first 20783 cases”. Choose “Copy selected cases to a new dataset” and enter a name (SPSS does not actually use this name, but you must choose one).
  • Create New Variable: TRANSFORM > COMPUTE VARIABLE, name the new variable and type RND(RV.UNIF(0.5, 3.49)) into the numerical calculation box.
  • Edit Data: DATA > DEFINE VARIABLE PROPERTIES, set up the new variable as a nominal variable and round to 0 decimal places.
  • Save the new data: FILE > SAVE AS, choose location and file name and continue.
  • Side-by-Side Boxplots for Age by Treatment: GRAPHS > CHART BUILDER
  • Side-by-Side Boxplots for Comp by Treatment: GRAPHS > CHART BUILDER
  • Two-Way Table for Gender by Treatment: ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS

To create the required output in SAS:

  • Open SAS and Create Random Sample: Use PROC SURVEYSELECT to create a simple random sample of 450 observations from the current population. Name the output dataset computer_srs.
  • Create New Variable: Use a DATA step to create a new variable called TRT using the code: TRT = floor(2.99*ranuni(0)+1); You do not need to understand this code, but in brief: it generates a uniform random number between 1 and 3.99 and then truncates it to an integer, so that 1.2 becomes 1, 2.7 becomes 2, and so on.
  • Side-by-Side Boxplots for Age by Treatment: Use PROC SGPLOT
  • Side-by-Side Boxplots for Comp by Treatment: Use PROC SGPLOT
  • Two-Way Table for Gender by Treatment: Use PROC FREQ
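
For readers curious about the two assignment formulas, the following Python sketch works out the chance of each treatment label under the SAS rule floor(2.99*ranuni(0)+1) and the SPSS rule RND(RV.UNIF(0.5, 3.49)). It is an illustration under our own assumptions: Python's `random` module stands in for the SAS and SPSS random number generators, and the function names are hypothetical. Each probability is simply the length of the interval that maps to a label, divided by the total width 2.99.

```python
import math
import random
from collections import Counter
from fractions import Fraction

WIDTH = Fraction(299, 100)  # both rules draw from an interval of length 2.99

# SAS rule: 2.99*U + 1 is uniform on (1, 3.99); floor() maps
# [1, 2) -> 1, [2, 3) -> 2, [3, 3.99) -> 3.
sas_probs = {
    1: Fraction(1) / WIDTH,
    2: Fraction(1) / WIDTH,
    3: Fraction(99, 100) / WIDTH,
}

# SPSS rule: a uniform draw on (0.5, 3.49) rounded to the nearest integer maps
# [0.5, 1.5) -> 1, [1.5, 2.5) -> 2, [2.5, 3.49) -> 3.
spss_probs = {
    1: Fraction(1) / WIDTH,
    2: Fraction(1) / WIDTH,
    3: Fraction(99, 100) / WIDTH,
}


def assign_sas_style(rng):
    """Mimic floor(2.99*ranuni(0)+1) with Python's generator."""
    return math.floor(2.99 * rng.random() + 1)


def assign_spss_style(rng):
    """Mimic RND(RV.UNIF(0.5, 3.49)); adding 0.5 and flooring rounds to nearest."""
    return math.floor(rng.uniform(0.5, 3.49) + 0.5)


rng = random.Random(7)  # fixed seed so the simulation is reproducible
sas_counts = Counter(assign_sas_style(rng) for _ in range(30000))
spss_counts = Counter(assign_spss_style(rng) for _ in range(30000))

for label in (1, 2, 3):
    print(label, round(float(sas_probs[label]), 4),
          sas_counts[label] / 30000, spss_counts[label] / 30000)
```

Under both rules, treatments 1 and 2 each have probability 100/299 (about 33.4%) and treatment 3 has probability 99/299 (about 33.1%), so the groups are very nearly, though not exactly, equally likely.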
