In order to perform adequate tests of claims, researchers must include systematic observations in their studies — observations made according to a plan. The specifics of the plan depend upon the claim being tested. For example, college aptitude tests, such as the Scholastic Aptitude Test (SAT), are used to predict how well a person will do in college, typically in terms of grade-point average (GPA). If we want to test the claim that the SAT predicts future GPAs, then our systematic observations must include the following:
- selecting a representative sample of students (see below);
- giving these students the aptitude test;
- having them take a number of college courses;
- and, finally, correlating aptitude-test scores with cumulative GPAs.
If a strong positive correlation is found between SAT scores and GPA, then the claim is supported by our results (i.e., we can feel more certain that the claim is correct).
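The final step of the plan above, correlating test scores with GPAs, can be sketched in a few lines. This is a minimal illustration only: the scores and GPAs below are invented for the example, and the function name `pearson_r` is an assumption, not something taken from any particular study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data for eight students, invented for illustration only.
sat_scores = [980, 1050, 1120, 1200, 1280, 1350, 1420, 1500]
gpas = [2.4, 2.9, 2.7, 3.1, 3.0, 3.5, 3.4, 3.8]

r = pearson_r(sat_scores, gpas)
print(f"r = {r:.2f}")
```

A value of r close to +1 would count as a strong positive correlation; a value near 0 would fail to support the claim.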
The goal of research is to derive generalizations. A generalization is a statement, based on a set of observations, about what is typically true of something. For example, let’s say that five studies found a positive correlation between SAT scores and cumulative GPAs. Based on these observations, we could generalize by concluding that scores on the SAT are good predictors of which students will do well in their college courses, on average. This generalization is then applied to any student who takes the SAT, even those who did not participate in any of the five studies. A fundamental problem with generalizations, however, is that they are based on a limited number of observations. In order to increase the likelihood that a generalization is accurate, the observations on which it is based must be adequate with respect to two characteristics:
- Relevance. Observations are relevant when they are appropriate to the generalization being made. For example, a pollster (a person who conducts or analyzes opinion polls) trying to predict who the next President of the United States will be could not generalize about which of two candidates was more likely to win based on interviews of a group of children. Instead, the pollster needs to interview a group of registered voters.
- Number. A sufficient number of observations is required to make a generalization. The pollster could not generalize about which of two presidential candidates was more likely to win after interviewing only five registered voters. It’s essential that the group interviewed contains a large number of people.
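Why five interviews are too few can be illustrated with a small simulation. The sketch below assumes a made-up electorate in which 55% of voters favor candidate A, and it assumes every sample is drawn at random from relevant voters. The function name `wrong_call_rate` and all numbers are hypothetical.

```python
import random


def wrong_call_rate(true_share, sample_size, trials, rng):
    """Fraction of simulated polls whose sample majority picks the wrong winner.

    true_share is the population share favoring candidate A (assumed > 0.5).
    Odd sample sizes are used so that a sample can never end in a tie.
    """
    wrong = 0
    for _ in range(trials):
        # Each interviewed voter independently favors A with probability true_share.
        votes_for_a = sum(rng.random() < true_share for _ in range(sample_size))
        if votes_for_a * 2 < sample_size:  # sample majority favors candidate B
            wrong += 1
    return wrong / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
small = wrong_call_rate(0.55, sample_size=5, trials=1000, rng=rng)
large = wrong_call_rate(0.55, sample_size=1001, trials=1000, rng=rng)
print(f"wrong-winner rate with 5 voters:    {small:.1%}")
print(f"wrong-winner rate with 1001 voters: {large:.1%}")
```

Under these assumptions, polls of five voters name the wrong winner far more often than polls of about a thousand voters do.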
Representative Samples
When predicting which of two presidential candidates is more likely to win an election, the people interviewed would need to be similar to all people likely to vote in the election. A sample is a set of observations. In our example, the sample would be the registered voters interviewed by the pollster. The sample is selected from a population, which is the complete set of relevant observations that could be made. In our example, the population is all registered voters in the United States. In this case, the population is too large for researchers to interview each individual. Instead, they must select a sample from the population. If researchers are to make an accurate generalization about what the population of registered voters will do, they must select a representative sample: a sample that is similar to the population with respect to essential characteristics. In predicting the outcome of an election, the sample of registered voters interviewed must be similar to the population in terms of age, race, ethnicity, gender, political affiliations, and so on. The study’s results would be invalid if they were based on a biased sample: a sample in which one or more of these characteristics differs significantly from those of the population. The more the sample’s characteristics differ from the population’s characteristics, the less accurate the generalizations derived from the results will be.
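One rough way to check whether a sample is representative on a single characteristic is to compare the share of each category in the sample with its share in the population. The sketch below uses an invented population of 10,000 voters grouped by age. The age bands, counts, and function names are assumptions made for illustration.

```python
import random
from collections import Counter


def shares(values):
    """Return the proportion of each category in a list of category labels."""
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}


def largest_gap(population, sample):
    """Largest absolute difference between population and sample shares."""
    pop_shares = shares(population)
    sample_shares = shares(sample)
    return max(abs(pop_shares[c] - sample_shares.get(c, 0.0)) for c in pop_shares)


# Hypothetical population of registered voters, described only by age band.
population = (["18-29"] * 2000 + ["30-49"] * 3500
              + ["50-64"] * 2500 + ["65+"] * 2000)

rng = random.Random(42)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 1000)  # every voter equally likely to be chosen
biased_sample = [age for age in population if age == "18-29"][:1000]  # young voters only

print(f"largest gap, random sample: {largest_gap(population, random_sample):.2f}")
print(f"largest gap, biased sample: {largest_gap(population, biased_sample):.2f}")
```

A small gap on one characteristic does not prove that a sample is representative overall; the same comparison would have to be repeated for every characteristic judged essential.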
A famous example of this occurred just before the 1936 U.S. presidential election. Franklin Roosevelt was the incumbent Democratic president running against a Republican challenger by the name of Alfred Landon. Landon was supported primarily by those who had survived the initial economic losses of the Great Depression and were still relatively well-off financially. Roosevelt was supported primarily by people hit hard by the economic collapse. In predicting the outcome of the election, a magazine called Literary Digest sent questionnaires to about 10,000,000 Americans (Goodwin, 1995; Katz & Cantril, 1937; “Landon In a Landslide,” 2006). Their sample included subscribers to the magazine, as well as a large number of people selected from phone books and motor-vehicle registration records. The pollsters received responses from about 2.5 million people, which is an extraordinarily large number of observations, especially considering that the population of the United States in 1936 was only about 130 million. Almost 60% of the respondents stated that they were going to vote for Landon, and only about 40% stated that they were going to vote for Roosevelt. Based on this result, the pollsters predicted that Landon would win in a landslide. The actual results of the election, however, were reversed: Roosevelt received about 60% of the votes.
What went wrong with the magazine’s polling? It used a biased sample. The sample was not biased in terms of number (almost 2% of the American population responded); rather, it was biased in terms of relevance. This was due to two factors. The first factor was the method used to select the sample. The pollsters mailed the questionnaire to people whose names appeared in phone books and car-registration records. In the middle of the Great Depression, however, car and telephone ownership were much less common than they are today: many people couldn’t afford them. This meant that the people polled in the Literary Digest study were primarily affluent and Republican in a country that was primarily poor and Democratic. The second factor was the number of questionnaires returned to the pollsters relative to the number they sent out. About 75% of the people sent questionnaires didn’t return them. This is a problem because of the possibility that the voting behavior of the minority who took the time to answer and return the questionnaire differed from that of the majority who probably tossed the questionnaire in the garbage. Thus, even a very large sample does not guarantee that the results will provide an accurate picture of the population if the people making up the sample differ from the population.
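The arithmetic behind these figures, and the effect of a biased selection method, can be sketched as follows. The first part uses only the numbers reported above. The second part is a toy simulation: the electorate size, the share of voters listed in phone books or car registries, and the voting preferences within each group are all invented, and the simulation models only the first factor (the selection method), not the unreturned questionnaires.

```python
import random

# Figures reported in the text above.
questionnaires_sent = 10_000_000
responses = 2_500_000
us_population_1936 = 130_000_000

response_rate = responses / questionnaires_sent        # 0.25, so 75% did not reply
share_of_population = responses / us_population_1936   # a little under 2%


def build_electorate(size, listed_share, landon_if_listed, landon_if_unlisted, rng):
    """Build a hypothetical electorate as (is_listed, votes_landon) pairs."""
    voters = []
    for _ in range(size):
        listed = rng.random() < listed_share
        p_landon = landon_if_listed if listed else landon_if_unlisted
        voters.append((listed, rng.random() < p_landon))
    return voters


def landon_share(voters):
    """Proportion of the given voters who favor Landon."""
    return sum(votes_landon for _, votes_landon in voters) / len(voters)


rng = random.Random(1936)  # fixed seed so the run is repeatable
# Assumed values: 30% of voters are listed; listed voters lean toward Landon.
electorate = build_electorate(200_000, 0.30, 0.70, 0.27, rng)

listed_only = [voter for voter in electorate if voter[0]]  # the biased selection pool
biased_sample = rng.sample(listed_only, 20_000)            # large but biased
random_sample = rng.sample(electorate, 1_000)              # small but unbiased

print(f"response rate:           {response_rate:.0%}")
print(f"share of population:     {share_of_population:.1%}")
print(f"Landon, whole electorate: {landon_share(electorate):.1%}")
print(f"Landon, biased sample:    {landon_share(biased_sample):.1%}")
print(f"Landon, random sample:    {landon_share(random_sample):.1%}")
```

Under these assumptions, the large biased sample points to a Landon win while the much smaller random sample stays close to the true result, which mirrors the conclusion drawn in the text.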
Controlling Extraneous Variables
Most of you probably have wondered how much you need to study in order to do well in your courses. Some of you may have heard the rule-of-thumb that states that you should study two (sometimes, three) hours outside of class for every hour you spend in class. We’ll refer to this as the “2-for-1 Rule.” The rule claims that, if you go to class for three hours each week, you should study six hours outside of class each week in order to do well in the course. How could you test this claim for its accuracy? Perhaps you could test it by remembering courses you have taken in the past. You might remember that you took an American history course last semester and received an A even though you rarely opened the textbook. Rather, you simply listened carefully in class and took good notes, which you reviewed just before each test. In fact, you now recall that you received all As and Bs last semester with very little studying outside of class.
Do these observations show that the 2-for-1 Rule is wrong? Not necessarily. It may be that the courses you took last semester were not a representative sample of courses offered at the college. They may have been less demanding than most other courses. Or it could be that you are misremembering how much you actually studied for your courses. In other words, you were not making systematic observations when you simply tried to recall what happened in a few courses that you took last semester. Let’s look at a fictional research study that includes systematic observations capable of testing the 2-for-1 Rule.
In our study, let’s say that we asked a sample of 80 students to take a week-long course that met every day (Monday through Thursday) for one hour, for a total of four hours of class time, with a test on the last day (Friday). Two variables were measured: the number of hours spent studying and test scores. The students were split into four groups (20 students in each group), and each group was asked to study a different number of hours outside of class for the test (see Table 1; adapted from Goodwin, 1995, pp. 135-136).
|  | Group 1 | Group 2 | Group 3 | Group 4 |
| --- | --- | --- | --- | --- |
| Monday | studies 2 hours | studies 2 hours | studies 2 hours | studies 2 hours |
| Tuesday | — | studies 2 hours | studies 2 hours | studies 2 hours |
| Wednesday | — | — | studies 2 hours | studies 2 hours |
| Thursday | — | — | — | studies 2 hours |
| Friday | Test | Test | Test | Test |
Table 1. Design of an experiment for testing the 2-for-1 Rule
Now, let’s say that we discovered that Group 4, which had studied two hours for every hour spent in class, did best on Friday’s test, Group 3 was next, Group 2 followed them, and Group 1 did the worst on the test. We then could conclude that the more hours spent studying, the better that one will do on tests.
Is this a reasonable generalization to make? Although it may seem as if the study included systematic observations that support the generalization, you may have noticed a problem with the study. The four groups of students differed not only in the total number of hours that they studied, but also in the number of days between the last time that they studied and the time that they took the test (which is called the retention interval). Because of this, we cannot know whether the differences observed among the groups in test scores were due to the different amounts of time spent studying, the different retention intervals, or both.
When making systematic observations to test a causal claim — a statement that specifies a cause of something else (such as the claim that two hours spent studying for every hour spent in class causes better test performance) — the most important component of the plan is the need to “control for” (i.e., to take account of or eliminate) the effects of extraneous variables. In our example, we were unable to generalize about the effect of the amount of time spent studying on test scores because we did not control for the effect of the retention interval. Controlling a research situation means that it is set up in such a way that the effects of extraneous variables can’t explain the results. In other words, we try to make sure that only one possible explanation for the results is left — an explanation that involves only the effects of the factor being investigated. In our study, however, the effect of an important extraneous variable (retention interval) wasn’t “controlled for,” which means that there are three possible explanations for the results:
- Spending more time studying causes higher test scores.
- Studying closer to the time of a test causes higher test scores.
- Spending more time studying and studying closer to the time of a test together cause higher test scores.
In order to systematically observe, we need to control for the extraneous variable of retention interval. How could we have controlled for the effects of retention interval? Perhaps we could have had the groups study according to the schedule in Table 2.
|  | Group 1 | Group 2 | Group 3 | Group 4 |
| --- | --- | --- | --- | --- |
| Monday | — | — | — | — |
| Tuesday | — | — | — | — |
| Wednesday | — | — | — | — |
| Thursday | studies 2 hours | studies 4 hours | studies 6 hours | studies 8 hours |
| Friday | Test | Test | Test | Test |
Table 2. Design of a second experiment for testing the 2-for-1 Rule
In this case, each group would study only the day before the test, which would control for the effects of retention interval. But would this schedule allow us to achieve our goal of observing in a systematic manner? No, because it would introduce another extraneous variable: anyone who studies for eight hours on one day will suffer much more fatigue and, hence, have more trouble learning the material than someone who studies only two hours.
In our study, we need to control for the extraneous variables of fatigue and retention interval at the same time. The schedule in Table 3 would allow us to do this.
|  | Group 1 | Group 2 | Group 3 | Group 4 |
| --- | --- | --- | --- | --- |
| Monday | — | — | — | studies 2 hours |
| Tuesday | — | — | studies 2 hours | studies 2 hours |
| Wednesday | — | studies 2 hours | studies 2 hours | studies 2 hours |
| Thursday | studies 2 hours | studies 2 hours | studies 2 hours | studies 2 hours |
| Friday | Test | Test | Test | Test |
Table 3. Design of a third experiment for testing the 2-for-1 Rule
If we now find that the students in Group 4 receive the highest average test scores, Group 3 the second highest, Group 2 the third highest, and Group 1 the lowest test scores, we can conclude that spending more time studying causes students to receive higher test scores.
Nevertheless, no matter how much care researchers take to control for the effects of extraneous variables, it’s always possible that they will miss one or more extraneous variables because they didn’t think of them. For example, if you hadn’t already had a great deal of experience with studying, it may never have occurred to you that a person who studies eight hours in one day might become more fatigued than a person who studies only four hours. This is why it is so important for researchers to describe their procedures carefully when publishing their studies. This allows other researchers to more easily detect the possible influence of unsuspected extraneous variables and then attempt to replicate the results with better-controlled studies of their own.
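The comparison of the three schedules in Tables 1 through 3 can be made explicit with a short script. The sketch below encodes each schedule as hours studied from Monday through Thursday and reports which of three factors differ across the groups: total hours studied, retention interval, and the most hours studied in a single day (used here as a rough stand-in for fatigue). The names `summarize` and `varying_factors` and the choice of these three factors are assumptions made for illustration.

```python
TEST_DAY_INDEX = 4  # Friday, counting Monday as day 0

# Hours studied on Monday..Thursday for Groups 1-4, copied from Tables 1-3.
DESIGNS = {
    "Table 1": {1: [2, 0, 0, 0], 2: [2, 2, 0, 0], 3: [2, 2, 2, 0], 4: [2, 2, 2, 2]},
    "Table 2": {1: [0, 0, 0, 2], 2: [0, 0, 0, 4], 3: [0, 0, 0, 6], 4: [0, 0, 0, 8]},
    "Table 3": {1: [0, 0, 0, 2], 2: [0, 0, 2, 2], 3: [0, 2, 2, 2], 4: [2, 2, 2, 2]},
}


def summarize(hours):
    """Summarize one group's schedule on the three factors of interest."""
    last_study_day = max(day for day, h in enumerate(hours) if h > 0)
    return {
        "total_hours": sum(hours),
        "retention_interval_days": TEST_DAY_INDEX - last_study_day,
        "max_hours_in_one_day": max(hours),
    }


def varying_factors(design):
    """Return the sorted names of factors that are not the same for every group."""
    summaries = [summarize(hours) for hours in design.values()]
    return sorted(
        factor for factor in summaries[0]
        if len({summary[factor] for summary in summaries}) > 1
    )


for name, design in DESIGNS.items():
    print(f"{name} groups differ in: {', '.join(varying_factors(design))}")
```

Only the Table 3 schedule leaves total study time as the single factor that differs among the groups. A check like this covers only the factors someone thought to list, so it cannot rule out an extraneous variable that nobody suspected.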
Study Questions for Section 4-1
- What does it mean to systematically observe?
- In general, what must be true of observations if they are to lead to accurate generalizations?
- How would you define a “sample” in your own words?
- How would you define a “population” in your own words?
- What is a representative sample?
- If you wanted to find out what most students at your college plan to do with their education, how would you go about obtaining a representative sample?
- How is a biased sample similar to, and different from, a representative sample?
- When do researchers need to control for the effects of extraneous variables?
- How do researchers control for the effects of extraneous variables?
- Use the example above to design a study that tests the following claim: In a six-row classroom (11 seats in each row), consistently sitting in the first three rows causes students to do better on tests than sitting in the last three rows.
NOTE: My answer to this question is here. Your answer may differ.
References
Goodwin, C. J. (1995). Research in psychology: Methods and design. New York: Wiley & Sons.
Katz, D., & Cantril, H. (1937). Public opinion polls. Sociometry, 1, 155-179.
Landon in a landslide: The poll that changed polling. (2005). Retrieved September 30, 2011, from http://historymatters.gmu.edu/d/5168/