
A Comparison of Telephone Survey Respondent Selection Procedures.

John M Kennedy
Center for Survey Research
Indiana University

Presented at the Annual Meeting of the
American Association for Public Opinion Research
May 1993

ABSTRACT

In this paper, I present comparisons of various respondent selection procedures. I determined that the various procedures have differential impacts on survey estimates. Some procedures are more effective than others. By comparing the data from the Indiana Poll with decennial census data collected at the same time, I am able to measure the differences.

I never finished the paper after the 93 AAPOR meetings, and some of my notes for additional materials are included in this paper. Please feel free to comment on it.

INTRODUCTION

Further discussion of the four issues that prompted this paper is in italics:

(1) Ideal vs real conditions of sampling - telephone survey sampling assumes a random sample; within-household respondent selection procedures assume a random sample of telephone numbers. But if there are biases that result from the difficulties in managing the telephone number sample (e.g., Waksberg replacements) and there is differential non-cooperation, then the survey design needs to counteract these biases. Revision of telephone number sampling designs is underway (e.g., list-assisted designs; Tucker etal, AAPOR 93). Non-cooperation will not go away, so new sampling designs are needed. How can respondent selection techniques be improved to make it easier to produce representative samples given substantial non-cooperation?

(2) Random selection vs distribution of important characteristics - I have never read a discussion of why we do within household selection; only why we don't interview the person who answers the phone. Two possible reasons - random sampling and respondent characteristic distribution. Some respondent selection techniques are random selection techniques; others distribute characteristics better than talking to the first person. What is the goal - the ability to use inferential statistics; make estimates, etc? Depending on the goal, different respondent selection techniques are more appropriate.

(3) Who do we assume the respondent represents - in an ideal world of random samples and perfect cooperation, this is not an issue. Do we assume the respondent represents the household (then weight by number of adults), or does the respondent represent a different group, e.g., males under age 30 (then weight to population estimates)? The survey design needs to recognize the intended use of the data. Different goals require different respondent selection procedures.

(4) Post-survey weighting - what does it really do? Is it used to hide errors introduced by the survey design? What are some of the problems in using it (errors of closure; difficulty of using raking procedures)? How are the weights determined: by gender, by age, or by a more careful process that takes into account sample design?

Statistical issues such as error estimates are beyond the scope of the paper (and the author), but a short discussion of these issues will be included.

Telephone survey researchers use a variety of techniques to choose respondents in households where there is more than one potential respondent. Respondent selection procedures can accomplish two goals: 1) they may increase the diversity of the distribution of characteristics of respondents in the sample over what would occur if only the first eligible person is interviewed; and 2) they randomly select respondents for the survey. The second goal implies that the first would also be achieved, but not vice versa. In this paper, I compare eight respondent selection procedures to determine the sample compositions on three demographic characteristics. The results are compared with the 1990 decennial census data to evaluate how accurately each procedure represents the population from which the sample was drawn. Two surveys, one conducted at the same time as the 1990 decennial census and a second conducted in April 1993, provide the data for this paper. In these surveys, the households were rostered using the questions contained in the decennial census short form.

The primary goal of this paper is to investigate whether various respondent selection procedures contribute to sample proportions that differ from population proportions. For example, many survey organizations report a smaller proportion of males in their samples than might be expected given the proportion in the population that the sample was drawn from. The difference is usually attributed to a higher nonresponse among males. Yet, it is possible that males (or young adults) live in households that make them less likely to be selected for interviewing.

Five random respondent selection procedures are included: (1) Kish; (2) Troldahl/Carter/Bryant; (3) Hagen/Collier; (4) random gender/order; and (5) random adult. Two "last birthday" procedures are included: (6) random assignment of either last or next birthday and (7) last birthday. The former uses a random number to determine whether the person with the next or the last birthday is selected. This procedure contains some aspects of randomization but is not fully random because only two adults could be included in the sample in households with three or more adults. The "last birthday" procedure is not random because the respondent is determined when the telephone number is selected and the field date is determined, but the procedure does increase the diversity of the respondents' characteristics. The eighth procedure is the selection of the first adult spoken to in the household as the respondent. The first seven procedures should produce better estimates of the population composition than the first adult method because respondent selection procedures are assumed to prevent biases that result from speaking to the first adult. Ideally, the sample distributions of the respondents' demographic characteristics should be similar to the census distributions.

Two forms of weighting are often used to adjust for sampling probabilities and nonresponse. Post-survey weighting to known population characteristics is used to correct for samples that are not distributed in the correct proportions. Post-survey weighting bases the weights on the differences between the proportion of the sample with a characteristic and the population proportion. This method implies that the correct population proportion is known.
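The arithmetic behind post-survey weighting can be illustrated with a minimal sketch. The function name and the gender figures below are hypothetical and are not taken from the Indiana Poll; the sketch assumes a single characteristic whose population proportions are known.

```python
def poststratification_weights(sample_counts, population_props):
    """Return one weight per category: population share divided by sample share.

    sample_counts: {category: number of completed interviews}
    population_props: {category: known population proportion}
    """
    n = sum(sample_counts.values())
    weights = {}
    for category, count in sample_counts.items():
        if count == 0:
            # A category with no respondents cannot be weighted up.
            raise ValueError(f"no respondents in category {category!r}")
        sample_share = count / n
        weights[category] = population_props[category] / sample_share
    return weights


# Hypothetical figures, for illustration only.
sample_counts = {"male": 400, "female": 600}
population_props = {"male": 0.48, "female": 0.52}
weights = poststratification_weights(sample_counts, population_props)

# After weighting, the weighted share of males equals the population share.
weighted_male = sample_counts["male"] * weights["male"]
weighted_total = sum(sample_counts[c] * weights[c] for c in sample_counts)
weighted_male_share = weighted_male / weighted_total
```

In this illustration the underrepresented group (males) receives a weight above one and the overrepresented group a weight below one, which is all that the method does when a single characteristic is used.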

An alternate method is to assign weights based on the inverse of the probability of selection for the sample. For example, the Survey of Consumer Attitudes conducted by the UM SRC bases its weights partially on the within-household probabilities of selection. That is, if there are three adults, the selected respondent receives three times the weight assigned to a respondent in a one-adult household.
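As a minimal sketch of this second method, assuming each adult in a household has an equal chance of selection (probability 1 / number of adults), the weight is simply the number of adults. The function names are hypothetical, and rescaling the weights to average one is a common convention that I assume here, not a description of the Survey of Consumer Attitudes procedure.

```python
def within_household_weight(num_adults):
    """Inverse of the within-household selection probability (1 / num_adults)."""
    if num_adults < 1:
        raise ValueError("a household must contain at least one adult")
    return float(num_adults)


def normalized_weights(adults_per_household):
    """Rescale the raw weights so they average 1.0 across the sample."""
    raw = [within_household_weight(n) for n in adults_per_household]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Hypothetical sample of four households with 1, 2, 3, and 2 adults.
example = normalized_weights([1, 2, 3, 2])
```

The respondent from the three-adult household ends up with three times the weight of the respondent from the one-adult household, whatever normalization is applied.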

Neither procedure yields appropriate weights under all conditions and each has some problem in estimating the appropriate weight. In the latter case, weighting by the inverse of the within-household selection probability assumes that the person selected is representative of the other household members. For the most part, social researchers feel there are differences between young and old, male and female. If so, then the persons selected do not represent the others in the household. Rather, weights should reflect the probability of selection from the target population. This probability is often not known.

With the former procedure, there is a similar problem in estimating appropriate weights. Survey organizations often base their weights on the age, gender, and education of the respondents. Older persons, males, and persons with less education are less likely to cooperate. If a weight is assigned based on age, then it will likely distort the distribution on education because the two variables are not perfectly correlated. Raking procedures or iterative proportional fitting (Kalton, 1983) are often used to get closer to the target distributions. Also, the target distribution might not be known in many populations. For geographic areas such as states, there is often no good source of population estimates for characteristics besides age and gender. For substate areas, there are very few estimates available except around decennial census years.
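Raking can be sketched in a few lines. The version below is a simplified illustration of iterative proportional fitting on two characteristics, not the procedure of any particular survey organization; the function names, the respondent records, and the target proportions are all hypothetical. It assumes every target category appears at least once in the sample.

```python
def rake(respondents, targets, max_iterations=100, tol=1e-10):
    """Iterative proportional fitting on respondent-level weights.

    respondents: list of dicts, e.g. {"gender": "F", "age": "18-34"}
    targets: {variable: {category: population proportion}}
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iterations):
        largest_adjustment = 0.0
        for variable, target in targets.items():
            total = sum(weights)
            weighted_count = {}
            for w, r in zip(weights, respondents):
                weighted_count[r[variable]] = weighted_count.get(r[variable], 0.0) + w
            # Scale each category so its weighted share equals the target share.
            factors = {c: target[c] * total / weighted_count[c] for c in weighted_count}
            weights = [w * factors[r[variable]] for w, r in zip(weights, respondents)]
            largest_adjustment = max(
                [largest_adjustment] + [abs(f - 1.0) for f in factors.values()]
            )
        if largest_adjustment < tol:
            break  # every margin already matches its target
    return weights


def weighted_share(respondents, weights, variable, category):
    """Weighted proportion of the sample falling in one category."""
    total = sum(weights)
    matching = sum(w for w, r in zip(weights, respondents) if r[variable] == category)
    return matching / total


# Hypothetical sample of ten respondents and hypothetical population targets.
sample = (
    [{"gender": "F", "age": "18-34"}] * 2
    + [{"gender": "F", "age": "35+"}] * 4
    + [{"gender": "M", "age": "18-34"}] * 1
    + [{"gender": "M", "age": "35+"}] * 3
)
targets = {
    "gender": {"M": 0.48, "F": 0.52},
    "age": {"18-34": 0.35, "35+": 0.65},
}
raked = rake(sample, targets)
```

Adjusting to one margin disturbs the other, which is the distortion described above; repeating the adjustments in turn brings both margins close to their targets, provided the target distributions are actually known.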

Respondent Selection Procedures

Perhaps the best known respondent selection procedure used by survey researchers was developed by Kish (for this paper I relied on the descriptions of the Kish method contained in Frey, 1989 and Lavrakas, 1987). This procedure rosters households by age and gender and uses various matrices to determine the respondent in multi-adult households. The procedure has been criticized for multiple reasons but primarily for non-technical reasons. It is difficult to administer so it requires extensive interviewer training. It is intrusive because the interviewer rosters the household before beginning the interview. The procedure takes time and is therefore costly.

Troldahl and Carter (1964) developed a procedure similar to Kish that is a little less intrusive. Bryant later adapted the procedure to reduce the impact of differential gender nonresponse. This procedure is easier to administer and requires less interviewing time than Kish. It requires that the interviewer ask informants only the age and gender of adult household members. Matrices are then used to select the respondent. As with Kish, it requires substantial interviewer training. In certain limited types of household compositions, both Kish and Troldahl/Carter/Bryant will not select each adult with equal probability, but these are minor problems and effectively have no impact on respondent selection.

The Hagen/Collier procedure (1983) uses a random procedure to determine which of four types of respondents is first asked for in the household: the oldest male, youngest male, oldest female, or youngest female. This procedure is easy to administer, is not intrusive, and requires little time. Hagen/Collier does not allow the middle person to be included in the sample when there are three or more adults of the same gender in the household. This procedure, while not fully random, does permit a representative distribution of sample persons.
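The limitation for middle persons can be seen in a rough sketch of the selection logic applied to a household roster. This is my own simplification for illustration, not the published instrument: the function name and roster format are hypothetical, and the fallback used when no adult of the designated gender lives in the household (apply the same age position to the other gender) is an assumption.

```python
import random

# The four respondent types, one of which is designated at random.
TARGET_TYPES = [
    ("male", "oldest"),
    ("male", "youngest"),
    ("female", "oldest"),
    ("female", "youngest"),
]


def hagen_collier_select(roster, rng=random):
    """Pick a respondent from a roster of {"gender": ..., "age": ...} dicts."""
    if not roster:
        raise ValueError("roster must contain at least one adult")
    gender, position = rng.choice(TARGET_TYPES)
    candidates = [adult for adult in roster if adult["gender"] == gender]
    if not candidates:
        # Assumed fallback: ask for the same age position in the other gender.
        candidates = roster
    pick = max if position == "oldest" else min
    return pick(candidates, key=lambda adult: adult["age"])


# Hypothetical household with three adult women.
household = [
    {"gender": "female", "age": 25},
    {"gender": "female", "age": 45},
    {"gender": "female", "age": 70},
]
selected_ages = {
    hagen_collier_select(household, random.Random(seed))["age"] for seed in range(200)
}
```

In this hypothetical household the 45-year-old can never be selected, because only the oldest or youngest adult of a gender is ever asked for.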

The last birthday method produces diversity among respondents. It is easy to administer, requires little time, and is not intrusive. Lavrakas (1987) expressed concern that it is sometimes difficult for informants to understand the instructions. (At the IU CSR, we used the procedure for two surveys but discontinued it because the interviewers felt strongly that it was a bad procedure. First, they felt it did not appear to be sufficiently "scientific" to respondents, which made the interviewers work harder to maintain a professionally administered interview. Second, the procedure allowed more screening by the informant. Third, under certain conditions, informant-arranged callbacks created problems.) Lavrakas etal (1993 AAPOR) reported that in about 20 to 25 percent of the households the wrong person was chosen with this method. Salmon and Nichols (1983) argued in a footnote in their article that the method is random based on the assumption that a person's date of birth is a random process. I feel that the randomness is lost because the selection of the survey field time is not a random process. Depending on the interpretation of randomness, the procedure may or may not produce a random sample.

In an attempt to introduce a random factor into the last birthday method, in this paper I test the impact of using a random number to determine whether the adult with the next or the last birthday should be interviewed. Since a majority of households have one or two adults, this procedure is random in most households. This procedure has many of the same difficulties in administering as the last birthday method, such as the problem of interpreting the meaning of the last birthday or respondent substitution.
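A minimal sketch of the random next/last birthday rule follows. The function names and the household are hypothetical, and the sketch assumes full birthdays (month and day) are known, whereas the 1993 survey recorded only birth month. Two further assumptions: a birthday falling on the interview date counts as the last birthday, and a February 29 birthday is treated as February 28 in non-leap years.

```python
import random
from datetime import date


def _birthday_in_year(month, day, year):
    try:
        return date(year, month, day)
    except ValueError:
        # February 29 in a non-leap year (assumed handling).
        return date(year, 2, 28)


def days_since_last_birthday(month, day, today):
    birthday = _birthday_in_year(month, day, today.year)
    if birthday > today:
        birthday = _birthday_in_year(month, day, today.year - 1)
    return (today - birthday).days


def days_until_next_birthday(month, day, today):
    birthday = _birthday_in_year(month, day, today.year)
    if birthday <= today:
        birthday = _birthday_in_year(month, day, today.year + 1)
    return (birthday - today).days


def birthday_select(roster, today, rng=random):
    """Randomly apply the 'last' or 'next' birthday rule to a household roster."""
    rule = rng.choice(["last", "next"])
    distance = days_since_last_birthday if rule == "last" else days_until_next_birthday
    chosen = min(roster, key=lambda adult: distance(*adult["birthday"], today))
    return rule, chosen


# Hypothetical three-adult household; birthdays are (month, day).
household = [
    {"name": "A", "birthday": (1, 10)},
    {"name": "B", "birthday": (4, 1)},
    {"name": "C", "birthday": (9, 20)},
]
interview_date = date(1993, 4, 16)  # illustrative field date
picked = {
    birthday_select(household, interview_date, random.Random(seed))[1]["name"]
    for seed in range(100)
}
```

With this roster and field date, only adults B and C can be chosen; adult A has neither the most recent nor the next upcoming birthday, which illustrates why the rule reaches at most two adults in households with three or more.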

Two procedures based only on random numbers are also examined. One procedure (random gender/order) uses two random numbers to select, first, the gender of the respondent and, then, the order (oldest, youngest, next to oldest, etc) within households with multiple adults of the selected gender. As it was done at the IU CSR, the CATI instrument was programmed to use a random number to determine whether to ask for the number of males or the number of females. If there was more than one adult of the selected gender, the second random number selected the order. If there were no adults of the selected gender, the program asked how many adults of the other gender were in the household and the order selection was made from that answer. This procedure took little time and was easy to administer. Its main problem was that many female respondents were reluctant to tell how many males were living in the household, especially when they were living alone.
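The random gender/order logic can be sketched as follows. This is an illustrative reconstruction from the description above, not the CATI program used at the IU CSR; the function names and roster format are hypothetical, and I assume the gender draw is an even coin flip and the order draw is uniform among adults of the selected gender.

```python
import random
from fractions import Fraction


def _candidates(roster, gender):
    """Adults of the drawn gender, or of the other gender if there are none."""
    matching = [adult for adult in roster if adult["gender"] == gender]
    if not matching:
        matching = [adult for adult in roster if adult["gender"] != gender]
    return matching


def random_gender_order_select(roster, rng=random):
    """First random number picks the gender, the second picks the age order."""
    if not roster:
        raise ValueError("roster must contain at least one adult")
    gender = rng.choice(["male", "female"])
    ordered = sorted(_candidates(roster, gender), key=lambda a: a["age"], reverse=True)
    position = rng.randrange(len(ordered))  # 0 = oldest, 1 = next to oldest, ...
    return ordered[position]


def selection_probabilities(roster):
    """Exact selection probability of each adult under the sketch above."""
    probabilities = {adult["name"]: Fraction(0) for adult in roster}
    for gender in ("male", "female"):
        group = _candidates(roster, gender)
        for adult in group:
            probabilities[adult["name"]] += Fraction(1, 2) * Fraction(1, len(group))
    return probabilities


# Hypothetical household: one man and two women.
household = [
    {"name": "male_1", "gender": "male", "age": 52},
    {"name": "female_1", "gender": "female", "age": 50},
    {"name": "female_2", "gender": "female", "age": 23},
]
probabilities = selection_probabilities(household)
```

Under these assumptions the lone male in the hypothetical household is selected with probability 1/2 and each female with probability 1/4, so selection probabilities depend on the gender mix of the household and not only on the number of adults.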

To overcome the concern about asking gender questions early in the interview, last fall we changed the selection procedure to ask only the number of adults in the household (random order). The CATI program selected the order based on a random number. This is a minimally intrusive method and it requires little time and effort to administer.

An alternate, easy-to-administer method is to speak to the first adult. This procedure may introduce more bias in the respondent selection but it is not intrusive, is easy to administer and requires little time. It potentially reduces the number of refusals and would allow considerable cost savings (Zukin etal, 1987). Survey researchers generally do not interview the first adult because certain types of respondents (older persons, females) are more likely to answer the phone.

Most survey organizations do post-survey weighting, which implies that a proportionally representative sample is rarely achieved. The problem occurs because either the sampling procedures (telephone number selection and/or respondent selection) or differential noncooperation distorts the sample distributions. The improvement in sample precision of a respondent selection procedure over interviewing the first adult is empirically testable.

Maklan and Waksberg (1987) evaluated the household coverage in RDD samples by comparing the results from a survey they conducted to the CPS and the NHIS. They found the RDD samples generally provided coverage as good as these surveys, at least as measured by gender and age. But this research did not compare selection methods to see if differences in samples result from the choice of respondent selection procedures.

This paper may not contain a comprehensive list of all selection procedures, but it represents most of those used by academic survey organizations. A systematic evaluation of respondent selection procedures and the comparative evaluation of multiple procedures has not been done. In this paper, I attempt to determine the factors underlying the problems related to samples that are not proportional to population distributions by evaluating the respondent selection procedures.

Relevant Literature

Many researchers have tested alternative respondent selection procedures but have not systematically checked multiple procedures and simultaneously compared the results using baseline data. Zukin etal (1987) compared the last birthday and the first adult methods to examine differences in demographic characteristics and response rates. The response rates were lower and field costs were higher for the last birthday method. In approximately 20 percent of the households, a different respondent would have been chosen using the last birthday method over the first adult method, but there were few differences between the methods in the distribution of responses to substantive questions. The results indicate the use of a respondent selection procedure did not substantially improve the results over selecting the first adult.

Oldendick etal (1988) compared the Kish and last birthday selection procedures to determine differences in response rates and the impact on substantive questions. Their results were very similar to Zukin etal. They found little difference between methods on selected demographic characteristics and on answers to selected substantive questions. Neither Zukin etal nor Oldendick etal compared their survey data to any known benchmark but both found that alternate respondent selection procedures did not produce different outcomes, at least as measured by demographic composition and substantive answers.

Previous research used in the development of new respondent selection procedures usually contained analyses of at least two alternate methods. The results were used as arguments for the adoption of the new procedure. For example, Salmon and Nichols (1983) compared first adult, male/female alternation, and Troldahl/Carter with the last birthday method. They found the last birthday and the first adult methods produced many fewer refusals than the male/female alternation and Troldahl/Carter. Czaja etal (1982) compared two versions of Troldahl/Carter and Kish and found little difference in the demographic compositions of the three samples. Hagen and Collier (1983) compared their method with Troldahl/Carter. They found fewer refusals with their procedure but the demographic compositions were similar. Interestingly, in their research, Troldahl/Carter produced a weighted sample that was 52 percent male and the Hagen/Collier sample was 50 percent male. Adult populations have more females than males, so there is some indication that the procedures, as they used them, overcompensated for the anticipated higher male nonresponse.

Most of the research cited here was conducted as part of surveys where each selection procedure was used for selected parts of the same survey. For example, Oldendick etal split the sample roughly in half with each half being assigned either the Kish or the last birthday method. In this paper, I take a different tack by comparing the selection procedures to determine who would have been selected in each household using each procedure. While this method allows for multiple comparisons, only one method was actually used in the surveys.

The earlier research cited in this paper was conducted typically with the goal of determining the easiest or most cost-efficient technique for obtaining survey cooperation while maintaining the scientific integrity associated with random sampling. The research in this paper attempts to determine the impact of the different selection procedures on the representativeness of the sample. Implicit in the research is the expectation that the procedure that produces the most accurate proportions would be the most valuable. Oldendick etal (1988) found that selection procedures have no impact on survey cooperation. If so, I can assume that the same households would have been included in the sample using any of the eight procedures. The testable question is whether the characteristics of the sample respondents differ when each procedure is used.

Research Procedures

Two RDD (Mitofsky/Waksberg two-stage) surveys conducted by the Indiana University Center for Survey Research are analyzed in this research. A survey of 1011 Indiana households was conducted in April 1990. The households were rostered using the 1990 decennial census short-form questions, which allowed for a comparison with the census totals. The availability of "complete" (assuming no underenumeration by respondents) household information permits me to determine which adult would have been selected using each procedure. The 1990 survey did not ask month of birth, so the last/next birthday methods cannot be analyzed from this survey. The second survey of 688 households was conducted from April 16 to May 10, 1993. In this survey, the same household rostering was used and, in addition to the rostering, the birth month of all adults was recorded.

As part of all RDD surveys conducted at the IU CSR, a series of random numbers, besides those used for respondent selection, is assigned to each case. The random numbers were used as input to the matrices for Kish and the selection of the adult with the next/last birthday, as well as the starting point for the Hagen/Collier procedure.

The standards used to measure the representativeness and accuracy of telephone survey samples are vague, but typically, response rates are assumed to measure the accuracy of survey results. Low response rates indicate the possibility of differential nonresponse that could produce a less accurate representation of the population. Telephone survey researchers rarely compare their results with established data because their survey procedures do not usually include the information needed for accurate comparisons. In this paper, not only are the selection procedures compared with each other but they are simultaneously compared with an established data source - the decennial census.

Data Analysis

This section needs further development. The data need to be analyzed and described more fully.

An issue not discussed in this paper, and one that interacts with "telephone households" is the underenumeration in the decennial census. It will be difficult to correct for both "whole household" and "within household" misses at the same time as correcting for "telephone households." A further complication is the underenumeration of adults in the surveys we conducted.

Given larger resources (time and data), it might be most appropriate to do the comparisons between the census data and the sample data across household types.

I may have to say that the appropriate comparison numbers are possibly not available and this is an intractable problem. Or, I might just say that the census numbers are "close enough for academic work." (humor !!) This, in fact, was my early intention because I doubt that making all possible corrections will change the proportions very much and I wanted only to be sure that I was not too far off with the survey data.

Population estimates were available for Indiana in 1993. I need to consider including them in Tables 2 and 3. Unfortunately, education is not an estimated characteristic, so it is not updated between censuses.

Table 1 contains the 1990 distributions of selected demographic characteristics from the 1990 decennial census and the 1990 survey. The data indicate that the composition of the sample households (columns 3 and 4) is reasonably close to the census distributions (columns 1 and 2) for gender and age. There is a substantial difference in education. The 1993 distributions (Table 2), again, closely mirror the census distributions. These data indicate that the sampling procedures are capturing households that are very similar to the overall Indiana household composition.

In 1990, the IU CSR used the random gender/random order respondent selection procedure (RS). This procedure could be considered a baseline respondent selection procedure because it is the only procedure that is directly affected by differential nonresponse. For example, the lowest proportion of males is found using this procedure. Overall, the gender distribution based on the random gender/random order procedure differs little from the procedures not used. None of the respondent selection procedures produced an age distribution that resembled either the census distribution or the overall sample distribution. All selection procedures produced about the same (incorrect) age distribution. The relatively high proportion of persons over 65 reflects the smaller household sizes in those households where someone is over age 65.

The selection procedures tended to miss young adults. I assume they are missed because they are most likely to live in multi-adult households. They could be sharing residences with other young adults or living with their parents, especially since living with the parents is an increasingly common housing choice for this age group. They may also be a very difficult group to interview because their schedules do not mesh well with typical survey calling times.

The distribution of respondents' education is especially problematic. Both the sample distribution (column 4) and the respondent selection procedures produce distributions where college graduates are about seven to nine percentage points higher than was found in the census. It cannot be determined from these data if the respondents exaggerated their educational achievements; if those persons with higher educations were more likely to respond; or if the telephone number sampling procedure yielded a more educated population. The latter is possible because of locational clustering of persons with similar education characteristics and these locations are more likely to be selected in first-stage calling.

The data in Table 1 also indicate the relative improvement of respondent selection techniques over interviewing the first adult. All respondent selection procedures produced age and gender distributions that were closer to the census and the sample distributions than the first adult method. There were trivial differences on the distribution of education among the procedures. This might be partially explained by the age distribution of respondents. Education is measured only for those over age 25, so it is not surprising that the differences are minimal between the selection procedures and talking to the first adult.

Table 2 (1993 survey) presents approximately the same results. Overall, the sample distributions are very similar to the census distributions. In general, none of the procedures produces the appropriate distributions across the three characteristics. For example, Troldahl/Carter/Bryant would create a gender distribution that is close to accurate but it produces the smallest proportion of young adults. The IU CSR random adult selection procedure resulted in the lowest proportion of males but, in general, none of the procedures except Troldahl/Carter/Bryant produced accurate results. Again, young adults are much less likely and older persons are much more likely to be selected, and the problem of too high a proportion of college graduates remains. In 1993, the respondent selection procedures yielded distributions on age and education that are not too different from what would have occurred if the first adult had been interviewed.

Overall, these data indicate that, except for education, we reach a reasonable distribution of persons in the households we interview. There is little difference between the census distribution and the overall sample distributions. While it might appear that some of the respondent selection procedures would produce gender distributions closer to the actual distribution, in reality, the procedure actually used indicates what likely would have occurred under all procedures. These data indicate that each respondent selection procedure will produce about the same distribution of respondents who might be interviewed. The Troldahl/Carter/Bryant procedure creates an artificially high proportion of males and it does not correct for any differences on age or education. Each of the selection procedures produces gender distributions that are closer to actual distributions than talking to the first adult, but they do not substantially improve the age or education distributions. Finally, these data indicate that the selection procedures will always produce fewer males and fewer young people than actually occur in the population.

Table 3 and the previous research need more description and analysis. I need to explain how the table was created and more of why weighting did not seem to improve the distributions. The two missing columns (birthday methods) will be calculated for the next version. If there is space, I will explain how a 1988 survey demonstrated to me the shortcomings of the methodologies of telephone survey research for making population estimates.

Table 3 shows the distributions for the 1993 survey data after weighting by the number of adults in the household. Overall, the results are disappointing. While the gender distribution shows improvement, there remains some indication that young adults are underrepresented. Even though the household compositions indicate the young adults are in the households we interviewed, they are still disproportionately underrepresented using all respondent selection procedures. Weighting did not reduce the differences between the respondent distributions and the census proportions.
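The kind of calculation behind a table weighted by the number of adults can be sketched as follows. The function name and the ten respondent records are hypothetical and chosen only to show the mechanics; they are not the 1993 survey data and do not reproduce Table 3.

```python
def weighted_distribution(respondents, variable, weight_key=None):
    """Proportion of respondents in each category, optionally weighted."""
    totals = {}
    for r in respondents:
        weight = r[weight_key] if weight_key else 1
        totals[r[variable]] = totals.get(r[variable], 0) + weight
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical respondents: age group and number of adults in the household.
respondents = (
    [{"age": "18-24", "adults": 3}] * 1
    + [{"age": "25-64", "adults": 2}] * 5
    + [{"age": "65+", "adults": 1}] * 4
)
unweighted = weighted_distribution(respondents, "age")
by_adults = weighted_distribution(respondents, "age", weight_key="adults")
```

In this made-up example, weighting raises the share of the youngest group and lowers the share of the oldest group, but how far it moves the distribution depends entirely on how many young adults were interviewed in the first place.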

A re-examination of earlier research (Kennedy, 1988) indicates that weighting by the number of adults in a household made minimal difference in the population estimate derived from telephone survey data. Kish (as cited in Frey, 1989) found that weighting by (the number of adults/household size) did not change the estimates very much. In fact, weighting did little to change the distributions in the selection procedure actually used (random adult). I feel these results indicate that the improvements in estimates will be minimal, no matter which selection procedure is used.

Implications of the Research

This section also needs additional work. I will tie the results back to the four issues in the Introduction that are the basis for this research.

I make a statement about households with persons under age 25 that comes from a visual inspection of a database that contains the composition and relationships of all households in the IPS over three years. I need to quantify the number.

I hope to introduce some ideas on how we might integrate list-assisted sampling procedures with changes in respondent selection procedures. I will describe how it might be possible to use list-assisted sampling in conjunction with other information in the lists (demographic/census data) to incorporate differential respondent selection probabilities.

I intend to look at the cases where the informant became the respondent and when the respondent was not the informant to see differences on some substantive questions. If no differences exist, it might indicate that respondent selection is superfluous when used with post-survey weighting.

The data from these surveys indicate that respondent selection procedures will not, by themselves, produce samples of respondents that mirror the population demographic characteristics. There are two reasons why this occurs. First, some people are more likely to cooperate than others. From these data, it appears that women and persons with college educations are more comfortable talking to survey researchers. (This is not surprising or novel.) Second, none of the respondent selection procedures adequately takes into account household compositions or differential non-cooperation. Older people live in households with fewer adults, so they are interviewed more often and their probability of selection is higher. Weighting by household size reduced their proportion to roughly equal the population proportion but there is a bias towards females among the older population. Persons under age 25 are less likely to be interviewed because they are more likely to live in multi-adult households, and weighting by household size does not appear to compensate for it. In addition, the data indicate that the differences in sample composition and population composition that result from using selection procedures over interviewing the first adult are not substantial, at least when considering these three demographic characteristics.

Previous research (Zukin, etal; Oldendick etal) found that there was little difference in the answers to the substantive questions using different selection procedures. Zukin etal even found that the distributions on substantive questions when interviewing the first adult were not different from interviewing the adult who had the most recent birthday. These results, along with the data in this paper, question the value of current respondent selection procedures.

By weighting to known distributions, the differences, if they exist, between methodically selecting respondents and talking to the first adult can be minimized. Surveys would be much easier and much less costly if we interviewed the first consenting adult (an even less rigorous method). For the same resources, more interviews could be completed or more funds could be used for testing and development. Of course, a complete analysis of the potential error introduced by interviewing the first adult over what already exists is needed.

We often feel that we need random selection procedures to make statistically valid conclusions. But most statistics are based on the assumption of simple random sampling. The procedures for selecting the sample of telephone numbers are not simple random sampling. And, within households, depending on the procedures, not all persons have an equal probability of selection. Weighting by the inverse of the selection probability produces better estimates, but it makes the use of standard inferential statistics more difficult.
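The trade-off in the last sentence can be illustrated with a small sketch. It assumes that exactly one adult is selected per household and that every adult in the household is equally likely to be chosen, so the within-household weight is simply the number of adults. It then applies Kish's approximation for the design effect due to unequal weights, n * sum(w^2) / (sum(w))^2. The household sizes are hypothetical.

```python
def household_size_weights(adults_per_household):
    """Inverse-probability weights, assuming one adult is selected at
    random from k adults: p = 1/k, so weight = k (assumption)."""
    return [float(k) for k in adults_per_household]

def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weighting:
    deff = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Hypothetical completed interviews, by number of adults in the household.
adults = [1, 1, 2, 2, 2, 3]

w = household_size_weights(adults)
deff = kish_design_effect(w)
effective_n = len(w) / deff

print(round(deff, 3), round(effective_n, 2))
```

A design effect above 1 means the weighted sample carries less information than a simple random sample of the same size, which is why standard formulas that assume simple random sampling understate the variance.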

Another important issue is how social researchers analyze survey results. We analyze respondents as representatives of designated groups (gender, age, marital status, education, race or ethnicity), but a proportionally representative sample of persons with the designated characteristics cannot be produced effectively with the current survey methods. The procedures used to generate telephone numbers and the within-household selection procedures, coupled with differential noncooperation, produce a complex sample. The alternative to carefully controlled sampling is post-survey weighting to correct for errors and biases introduced by the sampling procedures and noncooperation. According to these data, there is little difference between the amount of additional weighting needed to correct for selection procedures and the weighting needed to correct for interviewing the first adult. In fact, weighting by the number of adults may be an unnecessary step.

Survey researchers believe that respondent selection is needed to prevent certain types of persons from being selected too often to answer questions. These data indicate that this might be a false concern, since all procedures produce distributions that are not representative of the population they are drawn from. Moreover, since most organizations are skilled in post-survey weighting, there may be little loss in interviewing the first available adult. Further analysis is needed on this question.

The concern for random selection with a known non-zero probability of selection remains valid, but it is not possible to know the probability of selection in most surveys. Some consideration must also be given to whether probabilities should be figured from household distributions, essentially the inverse of the within-household selection probability, or whether the probabilities should be based on the analytical characteristic of interest. For example, if we are interested in examining differences among age groups and gender, should we determine the probability of selecting young males in order to provide the proper weighting scheme, without considering the household composition that the young males were selected from? I would argue that the intermediate step of weighting by household size does little to improve survey data.

There are no methods of accurately estimating the number of persons with a certain opinion or attitude using survey research. We can only assume that, if we do everything right, our measurements of attitudes, opinions, etc., are representative of what would be found in the population. What we can measure is how accurately we produce a sample that is representative of the population from which it is assumed to be drawn. These data indicate that, at least at the IU CSR, we do not produce proportionally representative samples. I suspect we are similar to other organizations. If we cannot duplicate population demographic characteristics, why should we assume our distributions of opinions are correct?
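One simple way to summarize how far a sample departs from the population on a given characteristic is the index of dissimilarity: half the sum of the absolute differences between sample and population shares. This is offered only as an illustrative measure, not as the comparison used in this paper, and the age shares below are hypothetical rather than Indiana Poll or census figures.

```python
def dissimilarity_index(sample_shares, population_shares):
    """Half the sum of absolute differences between two distributions.
    0 means identical distributions; 1 means no overlap at all."""
    categories = set(sample_shares) | set(population_shares)
    return 0.5 * sum(
        abs(sample_shares.get(c, 0.0) - population_shares.get(c, 0.0))
        for c in categories
    )

# Hypothetical age distributions (proportions sum to 1 in each).
sample_shares = {"18-24": 0.10, "25-64": 0.65, "65+": 0.25}
census_shares = {"18-24": 0.15, "25-64": 0.65, "65+": 0.20}

d = dissimilarity_index(sample_shares, census_shares)
print(round(d, 3))
```

The value can be read as the proportion of the sample that would have to be moved to other categories for its distribution to match the population distribution.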

In the next step of this research, I will look at the differences between the responses of the first adult, when s/he was chosen as the respondent, and those of other selected respondents. This should help to determine how much difference, if any, results from selecting a respondent. I will also look at a variety of weighting procedures to determine whether any can more accurately reproduce the population characteristics. Although Kish found that within-household weighting made little difference in the overall estimate, it might be valuable to see whether weighting has become more important now that household and demographic structures have changed since his work was done. Perhaps we need more measures of selection probabilities besides the number of adults in the household. In a society where household compositions are changing fairly rapidly and where the number of adults in households is increasing, if we want to improve our ability to use simple statistics to analyze our results, we need to estimate the sampling distributions more effectively.
