This article was published in the NNSP Newsletter in Winter, 1994. Please contact me if you want more information about it.
This is an article about my experiences with various RDD sampling procedures. It describes my wanderings through a variety of sampling techniques and my observations of each. I wrote this article from the point of view of a practitioner, not a scientist, so I won't claim that I have systematically observed the techniques I describe. My continuing goal has been to find an RDD sampling procedure that provides accuracy and efficiency. Accuracy means that I feel the sample is representative; efficiency means that the sampling technique produces accuracy in a cost-effective manner.
We don't conduct many RDD studies. Our Center is required by the University to conduct two Indiana Polls each year. We usually conduct only 1 or 2 additional statewide RDD surveys each year. My perspective, then, is that of someone who does few RDD surveys.
When I started at Indiana University in 1987, the Center was using a Waksberg-Mitofsky type RDD for its telephone number sampling. Each January, we would purchase a Bellcore tape and extract the Indiana exchanges. We would then conduct what we called a PNC (primary number calling) survey. We assigned a random number to each area code/exchange combination in the state and called to determine if it was a household number. When we identified a household, we conducted a short interview. These household numbers became the seeds for the RDD sample clusters that were used for the remainder of the year.
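A minimal sketch of the two stages described above, under stated assumptions. The exchange list is hypothetical, the function names are invented for illustration, and the second stage assumes the cluster is the seed's 100-bank (the first eight digits of the number), as in the standard Mitofsky-Waksberg design. The article does not say how the Center defined its clusters.

```python
import random

def first_stage_numbers(exchanges, per_exchange=1, rng=None):
    """Draw random 4-digit suffixes for each (area_code, exchange) pair."""
    rng = rng or random.Random()
    numbers = []
    for area, exch in exchanges:
        # Sample without replacement so an exchange never gets a duplicate number.
        for suffix in rng.sample(range(10000), per_exchange):
            numbers.append(f"{area}{exch}{suffix:04d}")
    return numbers

def second_stage_numbers(seed, k, rng=None):
    """Draw k numbers from the seed's 100-bank, excluding the seed itself.

    Assumption: the cluster is the 100-bank (same first 8 digits as the seed).
    """
    rng = rng or random.Random()
    bank = seed[:8]
    seed_last2 = int(seed[8:])
    candidates = [i for i in range(100) if i != seed_last2]
    return [f"{bank}{i:02d}" for i in rng.sample(candidates, k)]

# Hypothetical exchanges, not taken from a Bellcore tape.
exchanges = [("812", "555"), ("317", "555")]
primary = first_stage_numbers(exchanges, per_exchange=1, rng=random.Random(1))
cluster = second_stage_numbers("8125551234", k=5, rng=random.Random(2))
```

In practice, only the first-stage numbers confirmed as households would be passed to `second_stage_numbers` as seeds. Setting `per_exchange` to 2 or 3 corresponds to drawing more than one random number per exchange.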
After the first year, I analyzed the composition of the Indiana Poll surveys conducted in the previous five years. I noticed that we did not have any respondents in about 12 - 14 counties each year (Indiana has 92 counties). These counties represented about 10 - 12 percent of Indiana's population. There are approximately 900 exchanges in Indiana, and we typically had about 200 exchanges in the sample each year. These numbers caused me some concern about our coverage.
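The coverage figures above can be turned into shares with a few lines of arithmetic. This is only a sketch using the approximate counts given in the text; the helper name `share` is invented for illustration.

```python
def share(part, whole):
    """Return part/whole as a fraction."""
    return part / whole

# Approximate figures from the text.
exchange_coverage = share(200, 900)   # exchanges in the sample each year
county_gap_low = share(12, 92)        # counties with no respondents (low end)
county_gap_high = share(14, 92)       # counties with no respondents (high end)

print(f"Exchanges sampled: {exchange_coverage:.1%}")
print(f"Counties missed: {county_gap_low:.1%} to {county_gap_high:.1%}")
```

Roughly 22 percent of exchanges were in the sample, and about 13 to 15 percent of counties had no respondents.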
During the second year, I concluded that the costs of conducting the PNC and using two-stage RDD processes were greater than we could expect if we purchased samples. I figured that the costs for wages to conduct the PNC, the additional wages needed to work the RDD numbers in other surveys, and the cost of the Bellcore tape were greater than the cost of the purchased samples. In addition, the purchased samples would likely include all Indiana counties. The next year, we purchased RDD samples from a major sampling company. Overall, I had no complaints except perhaps the hit rate was not as high as advertised.
During that year, we conducted many multi-state listed surveys. In general, the sample lists did not include the time zone and daylight savings time information needed to program our call scheduler. I then decided that we should again purchase the Bellcore tape because it had the needed information.
The following year, we again conducted a PNC, but instead of assigning one random number to each exchange, we assigned three. I wanted to determine if we could improve coverage by calling more random numbers. I found there was some improvement in coverage when we used two random numbers per area code/exchange, but there was little additional coverage gained with the third number. Until last year, we used this procedure to generate our statewide RDD samples. Most years, we did not do a PNC survey but combined the first stage into the spring Indiana Poll.
During the time we were doing two-stage RDDs, I had two concerns. First, when we had multi-state RDD surveys, I still needed to purchase telephone numbers. The costs of a two-stage RDD process for multi-state samples would be prohibitive for the size of the surveys we conduct. Second, I was concerned that we were not managing two-stage RDD sampling very well. I continued to find problems with our sample compositions.
In 1990, we conducted our spring Indiana Poll at the same time that the decennial census was in the field (April). As part of the Poll, we rostered households using the census questions. (We also rostered households in 1991, 1992, and 1993.) As the results from the decennial census became available, we were able to compare the Poll distributions with the census distributions on regional and demographic characteristics. I found that the sample distributions based on the interviews conducted in the first stage were much closer to the census distributions than the second stage and the final distributions. That is, when we randomly sampled telephone numbers, we could roughly duplicate the census distributions. When we did the RDD second stage, we got further from the actual distributions. Post-survey weighting did not significantly reduce the differences.
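One simple way to summarize how far a sample distribution sits from the census distribution is the index of dissimilarity (half the sum of absolute differences between the two sets of percentages). The article does not say which measure was used, so this is an assumed stand-in, and the regional percentages below are invented purely for illustration. They are not Indiana Poll or census results.

```python
def dissimilarity(sample_pct, census_pct):
    """Index of dissimilarity: half the summed absolute gaps between shares.

    Inputs are dicts mapping category -> percentage; the result is in
    percentage points (0 means identical distributions).
    """
    categories = set(sample_pct) | set(census_pct)
    return 0.5 * sum(
        abs(sample_pct.get(c, 0.0) - census_pct.get(c, 0.0)) for c in categories
    )

# HYPOTHETICAL percentages by region, for illustration only.
census = {"north": 30, "central": 45, "south": 25}
first_stage = {"north": 31, "central": 44, "south": 25}
final_sample = {"north": 26, "central": 52, "south": 22}

first_stage_gap = dissimilarity(first_stage, census)
final_gap = dissimilarity(final_sample, census)
```

A smaller value means the sample is closer to the census. The same function can be applied to weighted distributions to check whether post-survey weighting narrows the gap.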
Besides coverage, I was also concerned about cost and efficiency. In spring 1993, we wanted to conduct 500 interviews in the Indiana Poll. To work the two-stage process, we finally had to conduct 677 interviews. That was frustrating from both a time and a cost perspective. In addition, we were noticing that the hit rate on the first and second stages dropped steadily over the years as more exchanges came online.
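The size of that overrun is easy to work out from the two figures in the text. A short sketch (the function name `overage` is invented for illustration):

```python
def overage(target, completed):
    """Return (extra interviews, extra as a fraction of the target)."""
    extra = completed - target
    return extra, extra / target

extra, extra_share = overage(target=500, completed=677)
print(f"{extra} extra interviews, {extra_share:.1%} over target")
```

That is 177 interviews beyond the target, or about 35 percent more interviewing than planned.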
Overall, I am not certain that we did RDD sampling in the most efficient and accurate manner. But I felt that if we could not sample accurately with two-stage RDD, if the process was becoming increasingly costly, and if it had inherent problems such as coverage, then we should not use it. Despite these problems, I was comfortable with the accuracy of our results. Our 1992 pre-election survey predicted the correct percentage for one candidate and was only one percentage point off for each of the other two.
At the 1993 AAPOR meetings, I heard a paper that presented a positive evaluation of the Genesys sampling system. I won't embarrass myself or Marketing Systems Group by trying to describe their procedures. Essentially, it is a list-assisted sampling system. I found the system appealing because it would likely provide full coverage across the state and it appeared to be more cost-efficient than two-stage RDD. Overall, the Genesys system should generate fewer "not in service" numbers.
The Genesys system is leased from Marketing Systems Group and would be expensive for a small organization that does a limited number of RDD surveys. MSG also sells sample telephone numbers, so in fall 1993 we decided to evaluate the Genesys system by buying a national sample. Overall, we were pleased with it. The demographic characteristics of the sample were about what we expected. The hit rate was a little below what we expected. Because of this experience, we decided to purchase the lease for this year. We projected substantial RDD surveying this year, so it seemed like the right year to try it.
Our 1994 "spring" Indiana Poll was conducted in early February using Genesys for the first time for a statewide survey. At the end of the Poll, we discovered that most demographic characteristics were the same as in previous Polls, but two characteristics, income and marital status, changed dramatically. Historically, we recorded about 59 - 62 percent married in our Indiana samples. In the spring Poll, we had only 52 percent married. (I have a 13-page document that describes this problem for anyone who wants to help me solve it.) We conducted interviews in 91 counties.
We won't conduct our fall Poll until mid-November, so I can't report if this is a one-time change, but an RDD survey that is currently in the field (about 2100 of 4500 interviews are completed) shows 59 percent married. Next spring's Poll will again roster the households. By next spring, we will be able to measure more accurately the differences between Genesys and our other sampling procedures in coverage and demographic characteristics.
The Genesys system is more than a sampling system. In fact, we also use it as an information system. Let me explain how we are using the demographic module. During this year, we have been conducting a survey funded by the National Endowment for the Humanities. The survey has two parts - a national sample and three samples of minorities. We used the Genesys system to help find minority respondents. For the African American sample, we stratified telephone exchanges by their density of African American households. We sampled primarily from the high density exchanges but also included some medium density and low density exchanges in the sample.
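The density-stratified design described above can be sketched as follows. This is not the Genesys interface, which the article does not describe. The exchange identifiers, density values, stratum cutoffs, and per-stratum counts are all hypothetical, chosen only to show a sample drawn mostly from high-density exchanges with some medium- and low-density exchanges included.

```python
import random

def stratify(exchange_density, high=0.5, medium=0.2):
    """Group exchanges into strata by density (cutoffs are hypothetical)."""
    strata = {"high": [], "medium": [], "low": []}
    for exchange, density in exchange_density.items():
        if density >= high:
            strata["high"].append(exchange)
        elif density >= medium:
            strata["medium"].append(exchange)
        else:
            strata["low"].append(exchange)
    return strata

def draw_sample(strata, counts, rng=None):
    """Draw the requested count of random numbers from each stratum."""
    rng = rng or random.Random()
    sample = []
    for name, n in counts.items():
        pool = strata[name]
        if not pool:
            continue  # nothing to draw from in an empty stratum
        for _ in range(n):
            exchange = rng.choice(pool)
            sample.append((name, f"{exchange}{rng.randrange(10000):04d}"))
    return sample

# HYPOTHETICAL exchanges (area code + prefix) and household densities.
densities = {"312555": 0.72, "313555": 0.55, "404555": 0.31, "317555": 0.04}
strata = stratify(densities)
# Sample primarily from high density, with some medium and low.
sample = draw_sample(strata, {"high": 70, "medium": 20, "low": 10},
                     rng=random.Random(3))
```

Because the strata are sampled at different rates, a sample like this concentrates interviewing effort where eligible households are most likely to be found. It would need weighting before it could be treated as representative.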
To develop the Mexican American sample, I looked at the sampling procedures used for the Latino National Political Survey. I was able to use the Genesys system to sample in a manner roughly similar to the LNPS. Essentially, I identified exchanges in five states that have high densities of Hispanic households. For the Native American sample, we identified a reservation and generated the telephone numbers from the exchanges in the counties that are in the reservation. The information in Genesys allowed us to target minorities with cost-effective designs.
The minority samples are not really "representative" of a minority population but the system did allow us to efficiently concentrate our efforts where they would likely produce the most interviews for the least interviewing costs. The nature of the survey (we are using ethnographic-type interviewing procedures to ask people to tell us stories about how the past affects their lives) makes representativeness less of a concern than most surveys.
Here is another example of how the system can be used. Next year, we are going to conduct a survey of ovarian cancer survivors. Part of the study requires that we match the survivors with an RDD sample that has similar demographic and social characteristics. We will be able to use the Genesys system to focus on those exchanges where we will be most likely to find the matches.
The system is relatively easy to use. I am writing this on Saturday afternoon. Late this afternoon, the project manager reported that she was running out of sample and we might miss the target number of interviews. I determined that the screen-out rate was underestimated, and we needed an additional 100 sample numbers. The numbers were generated in about five minutes. We can call these numbers this weekend, and we should be able to finish the survey next Wednesday as projected.
I am not writing this as an endorsement of Genesys. Rather, I am pointing out that there are alternatives to traditional two-stage RDD procedures that allow us to collect data more efficiently. With the system, we can target our RDD samples, we can predict the hit rate a little more accurately, and we have demographic characteristics of the exchanges that help researchers provide a social context for the respondents' answers. In addition, we can conduct an RDD sample and feel that we have sampled well and still obtain the target number of interviews.
Since becoming involved in survey research seven years ago, I have continually wondered why we do what we do. For example, the within-household sampling procedures have changed little over the years. At the 1993 AAPOR meetings, I presented a paper that showed how each of eight respondent selection procedures generates demographic distributions that differ from the characteristics of the households they were selected from. Within-household sampling procedures should be updated for the current variety of living arrangements found in US households. Over the past year, we have developed non-standardized interviewing procedures because the standardized procedures were causing problems under certain conditions. The "history" survey described above could not be conducted using standardized interviewing procedures.
The household selection sampling procedures used in survey research were developed years ago. They were designed for large-scale surveys conducted with long field periods. They continue to work well for large organizations with large samples and large budgets. New list-assisted sampling systems give smaller organizations the ability to generate samples equivalent in coverage and targeting to those that were once available only to larger organizations.
I predict that in the next few years, someone will design new procedures that recognize the strengths of the newly-available sampling procedures and use them to combine household selection with within-household selection to improve both coverage and accuracy. Interviewing procedures may differ based on the information the interviewer has when s/he conducts the interview. Overall, I feel these changes are needed to improve data quality.
Many members of NNSP are small organizations. There is much potential in our organizations to conduct research on survey procedures in a manner similar to medical trials. Medical trials often recruit many doctors and institutions to participate and then determine impacts over many trials and conditions. I invite you to work with the NNSP to design and develop procedures that work for small (and large) organizations. Many of you have already done something similar to what I described here. Let's share our information and our successes.
* The thoughts, ramblings, errors, and obfuscations in this article are my responsibility alone.