Stigma In Global Context: Mental Health Study
Indiana University . Bloomington . Indiana
 


Please address all general inquiries to:

sgcmhs@indiana.edu


Bulgaria Sample Design

SAMPLE SIZE AND SAMPLING  PROCEDURE

Target population: the Bulgarian population aged 18 and over. Target sample size: N = 1000.

The proposed sample is a two-stage cluster sample. The first stage is a random selection of about 150 clusters, drawn from the list of electoral sections (the primary units in the October 2003 local elections). Electoral sections of different sizes (measured by the number of people in each section) will be proportionally represented in the sample. The second stage is a random selection of eight (8) respondents within each cluster. The planned sample size is thus 1200. The expected non-response rate is 15-20%, which makes the expected size of the realized sample about 1000. (For more details, please refer to the Sampling Plan below.)

Sampling Plan

The planned sample size is N = 1200. The sampling model is a two-stage random cluster sample. The sampling universe is the population of Bulgaria aged 18 and over: 6,372,418 people as of March 2001 according to the Census of Population, of whom 3,065,182 are men and 3,307,236 are women (Table 1). The planned sample is larger than the required sample because of an expected non-response rate of about 15-20%.
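The relationship between the planned and the realized sample size can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name is hypothetical and the only inputs are the figures quoted above (1200 planned interviews, 15-20% non-response).

```python
def expected_realized_sample(planned: int, nonresponse_rate: float) -> float:
    """Expected number of completed interviews for a given non-response rate."""
    return planned * (1.0 - nonresponse_rate)


planned = 1200
low = expected_realized_sample(planned, 0.20)   # 20% non-response
high = expected_realized_sample(planned, 0.15)  # 15% non-response
print(f"Expected realized sample: {low:.0f} to {high:.0f}")
```

The two bounds bracket the target of about 1000 completed interviews.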

Table 1: Population of Bulgaria Aged 18 and Over by Age and Gender, March 2001

Age            Total        Men      Women
Total (N)    6372418    3065182    3307236
Total (%)        100        100        100
18-19            3.4        3.7        3.2
20-24            9.0        9.6        8.4
25-29            8.9        9.5        8.4
30-34            8.5        9.0        8.1
35-39            8.2        8.6        7.9
40-44            8.6        8.9        8.3
45-49            8.8        9.0        8.6
50-54            9.3        9.4        9.2
55-59            7.6        7.5        7.7
60-64            6.9        6.6        7.2
65-69            7.1        6.7        7.5
70-74            6.1        5.5        6.7
75-79            4.5        3.8        5.2
80+              3.0        2.4        3.6

Age-group rows are percentages of the column total.

Source: National Statistical Institute - Census of Population - March 2001

First stage

The first stage of the sample is based on the list of electoral sections as of the last local elections (October 2003). There are 12,313 electoral sections in total, with an average size of about 500 people. Electoral sections cover the whole territory of the country and therefore provide access to the whole population. We have at our disposal the complete list of electoral sections, which includes each section's number, territorial location, and number of registered voters. Electoral sections are selected for the sample using the following procedure (systematic random selection):

1.       Within the list of electoral sections, a cumulative measure-of-size column is computed from the number of people in each electoral section.

2.       The total number of people in all sections is divided by the number of sections to be included in the sample (the proposed number for the present survey is 150). The quotient is the so-called “selection interval” (SI).

3.       A random start (RS) within the range between 1 and SI is chosen.

4.       Following the cumulative column in the table, the first electoral section to be included is the one which contains the RS. The second section is the one which contains RS+SI, the third the one which contains RS+2SI, etc.
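The four steps above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually run for the survey: the function name, the use of a fractional random start, and the simulated section sizes are all assumptions made for the example.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, n_clusters, rng=None):
    """Systematic selection with probability proportional to size.

    sizes: number of people in each electoral section, in list order.
    Returns n_clusters indices into sizes.
    """
    rng = rng or random.Random()
    cumulative = list(accumulate(sizes))       # step 1: cumulative measure of size
    interval = cumulative[-1] / n_clusters     # step 2: selection interval (SI)
    start = rng.uniform(1, interval)           # step 3: random start (RS) in [1, SI]
    selected = []
    for k in range(n_clusters):                # step 4: RS, RS+SI, RS+2SI, ...
        point = start + k * interval
        # First section whose cumulative size reaches the selection point.
        index = bisect_left(cumulative, point)
        # Guard against floating-point overshoot on the last point.
        selected.append(min(index, len(sizes) - 1))
    return selected


# Hypothetical data: 12313 sections with simulated sizes around 500 people.
rng = random.Random(2003)
sizes = [rng.randint(300, 700) for _ in range(12313)]
selected = pps_systematic(sizes, 150, rng)
print(len(selected), "sections selected; first five:", selected[:5])
```

A section larger than the selection interval could be hit more than once by this scheme. With sections of about 500 people and an interval in the tens of thousands, that case does not arise here.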

The proposed number of clusters (electoral sections) for the sample of the present survey is N = 150. Following the above procedure ensures:

1.       That the clusters are chosen with probability proportional to the size of sections.

2.       That the sample is proportionally distributed over the territory of the country and includes all types of locations (cities, towns, and villages).

3.       That it is representative of the whole population of the country.

Second stage

At the second stage of the sampling procedure, a fixed number of persons is selected at random in each cluster (electoral section). This number is obtained by dividing the size of the planned sample by the number of clusters (1200/150 = 8). Thus each cluster is to include 8 persons.

The respondents within the clusters are chosen at random from the Central Register (the computer center of the ESGRAON system). The ESGRAON system covers the whole territory of Bulgaria and the whole population. It is based on personal ID numbers: 10-digit numbers whose first six digits encode the date of birth (YYMMDD). These numbers are used for administrative purposes (taxation, social insurance, address registration, etc.). The personal numbers make it possible to draw samples of people living in a specific territory within a given age range. For this survey the age range will be specified as 18+.

The result of the selection at the second stage is a list of respondents including personal ID, name, community, and address. Thus each interviewer will be supplied with the names and the addresses of the respondents to be interviewed. The interviewers will record the information for all inaccessible respondents.

The sample composed in the above-described way will have the following properties:

1.       The sample is representative of the Bulgarian population aged 18 and over and will cover the whole territory of the country.

2.       The sample is designed to reproduce the basic socio-demographic parameters of the population aged 18+ as recorded in the last Census.

3.       Where available, information from the latest Census of the population (March 2001) will be used.

4.       The parameter estimates (distributions for each variable in the survey) will depend on the size of the sample and the level of intra-class correlation (the level of similarity of respondent answers to the different questions within a given cluster).

Given the planned sample size (N = 1200) and an average estimated intra-class correlation of B = 0.05, the expected maximal stochastic errors for the estimated variable distributions are as follows:

Estimated Stochastic Error

Relative     Maximal Relative    Maximal Stochastic    Confidence Intervals
Share (%)    Stochastic Error    Error                 Low       High
5            27.6                1.4                   3.6       6.4
10           19                  1.9                   8.1       11.9
15           15.1                2.3                   12.7      17.3
20           12.7                2.5                   17.5      22.5
25           11                  2.8                   22.2      27.8
30           9.7                 2.9                   27.1      32.9
40           7.7                 3.1                   36.9      43.1
-            -                   -                     46.8      53.2
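The plan does not state the formula behind these figures. As a rough cross-check, the sketch below uses the common design-effect approximation deff = 1 + (m - 1) * ICC for clusters of m = 8 respondents, together with a 95% normal-approximation interval (z = 1.96). Both choices, and the function names, are assumptions, so the output need not match the table digit for digit.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate variance inflation caused by clustering."""
    return 1.0 + (cluster_size - 1.0) * icc


def max_error(share: float, n: int, deff: float, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion (as a fraction)."""
    return z * math.sqrt(deff * share * (1.0 - share) / n)


deff = design_effect(cluster_size=8, icc=0.05)
print(f"design effect: {deff:.2f}")
for pct in (5, 10, 15, 20, 25, 30, 40, 50):
    err = 100 * max_error(pct / 100, n=1200, deff=deff)  # percentage points
    rel = 100 * err / pct                                # relative to the share
    print(f"{pct:>3}%  error {err:.1f}  relative {rel:.1f}%  "
          f"interval {pct - err:.1f} to {pct + err:.1f}")
```

The absolute error grows as the share approaches 50%, while the relative error shrinks, which is the same pattern the table shows.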

Weighting procedure

The weighting of cases goes through several steps, which can be summarized as follows:

  1. Assessing whether a weighting procedure is necessary. Several demographic characteristics of the survey respondents are checked for representativeness. These include sex, age, marital status, type of residence, etc. These characteristics are essential for representing the distribution of the general population. They are checked by comparing the actual results from the sample survey with results derived from the National Census, preferably the latest one available. If discrepancies larger than the standard sampling error are observed, the sample data need to be weighted using data from an official source, e.g. the National Statistical Institute. The weighting procedure is applied only to those variables which are biased. If more than one variable is biased, the variables are combined and the weighting coefficients are calculated on that basis.
  2. Estimating the weighting coefficients. In order to obtain a distribution similar to that of the general population, weighting coefficients must be calculated. The basic logic is described in the following example:

 According to the latest National Census data (collected in March 2001), the sex distribution of the general population is 48.1% males and 51.9% females. Suppose a given survey arrives at the following results: 43% males and 57% females. Under the requirements above, a weighting procedure is needed in order to bring the data as close as possible to the distribution of the general population. Two different coefficients are required: one for males and another for females. The calculation, with the census figures rounded to whole percentages, is presented in the following table:

 

          National Statistics    Sample survey    Computation    Weighting
          Data (%)               data (%)         actions        Coefficients
          A                      B                (A/B)          C
Male      48                     43               48 / 43        1.116279
Female    52                     57               52 / 57        0.912281
Total     100                    100

 

 

After dividing column “A” by column “B” we arrive at the required weighting coefficients in column “C”. Thus, after applying the weighting variable in the data set, one male will no longer be a single unit, but 1.116279 units and one female will be 0.912281 of a unit.
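The same calculation can be written as a short Python sketch. The function and variable names are illustrative only; the input percentages are the ones used in the example.

```python
def weighting_coefficients(population_pct, sample_pct):
    """Coefficient per category: population share divided by sample share."""
    return {category: population_pct[category] / sample_pct[category]
            for category in population_pct}


census = {"Male": 48, "Female": 52}   # column A
survey = {"Male": 43, "Female": 57}   # column B
weights = weighting_coefficients(census, survey)  # column C

for category, weight in weights.items():
    print(f"{category}: {weight:.6f}")

# Applying the weights restores the census distribution.
weighted = {category: survey[category] * weights[category] for category in survey}
print(weighted)
```

Multiplying each sample share by its coefficient returns the census share, which is a quick way to check that the coefficients were computed in the right direction (A/B rather than B/A).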

The example above applies only when a single demographic variable in the sample is biased. When several variables are “off the limits”, every category of one variable is combined with every category of the other variables. The percentage distribution of these combinations is then compared with the one derived from National Statistics. The calculation of the coefficients follows the same logic as described above, except that there are as many coefficients as there are combinations of categories. If two variables, sex (2 categories) and marital status (5 categories), have distributions different from those of the general population, there are 2 × 5 = 10 possible coefficients in total.
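When variables are combined, the number of coefficients is the product of the category counts, so crossing a 2-category variable with a 5-category variable yields 2 × 5 = 10 cells. The sketch below enumerates the cells and computes one coefficient per cell. The marital status labels and the function name are hypothetical placeholders, and real population and sample shares per cell would have to come from the census and the survey.

```python
from itertools import product

sex = ["Male", "Female"]
# Hypothetical category labels; the plan only states that there are five.
marital_status = ["Never married", "Married", "Cohabiting", "Divorced", "Widowed"]

cells = list(product(sex, marital_status))
print(len(cells), "cells")


def cell_weights(population_share, sample_share):
    """One coefficient per cell: population share divided by sample share.

    Cells missing from the sample (share of zero) cannot be weighted
    and are skipped.
    """
    return {cell: population_share[cell] / sample_share[cell]
            for cell in population_share
            if sample_share.get(cell)}
```

Cells that are empty in the sample have no coefficient, which in practice usually means merging categories before weighting.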

 

1022 E. Third Street, Bloomington, IN 47405 (812) 855-3841

Indiana University

Last updated: 6 September 2005
Comments: sgcmhs@indiana.edu
Copyright 2004, The Trustees of Indiana University