Contextualizing The Meaning of Probabilities
C. Y. Joanne Peng, Anne Buu, and Bernard Flury
Discussion
The nonsignificant context effect of the magnitude of numbers in the fractional restatement indicated that this context did not influence subjects' selection of probability expressions. However, these results could be confounded by the phrasing used in those statements. For example, when subjects read the statement, "In a national poll, 36% of the parents - or 36000 of every 100000 - said they would not like their child to take up teaching as a career," they might have simply skipped the fractional restatement (36000 of every 100000) because the percentage (36%) had already preceded it. If this alternative interpretation is true, the instrument can be improved by deleting the percentage and keeping only the fractional restatement in each statement. The improved instrument should be tried on similar subjects to verify whether such a context effect is indeed absent.
The context effect of expectation was not noticeable in Figure 1A through Figure 11B. There was some possibility that this effect was confounded by the different percentages reported in the three statements, which represented the same probability interval. For example, for the lowest interval (the median was 7%), the percentage used in a statement which was presumed to be equal to what was expected by subjects was 12% whereas the percentages reported in the other two statements, presumed to be higher or lower than what was expected by subjects, were both 7%. This confounding effect can be investigated in a future study by selecting three statements, which use the same percentage but imply different expectation information, to represent each interval.
Another feasible approach to analyzing the expectation effect is to construct a 4 x 11 contingency table for each statement and test whether subjects' actual expectations (higher, lower, equal, or no opinion), given during the first task, are associated with their selection of probability expressions in the third task. The commercial software used in this study was unable to perform exact tests on these "sparse" tables (i.e., tables with many cells containing small or zero frequencies). More recently developed algorithms that make exact tests computationally feasible for sparse tables may shed more light on the effect of expectations (Agresti, 1990; Baglivo, Olivier, & Pagano, 1988; Cox & Plackett, 1980; Mehta & Patel, 1983; Mehta, Patel, & Senchaudhuri, 1988; Pagano & Halvorsen, 1981).
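To illustrate how a sparse expectation-by-expression table might be tested without relying on large-sample approximations, the sketch below uses a Monte Carlo permutation test. This is a stand-in chosen for brevity, not the exact network algorithms cited above and not the procedure used in this study. The function names (`chi_square`, `permutation_p_value`) and the ten response pairs are hypothetical and serve only to make the example runnable.

```python
import random
from collections import Counter


def chi_square(rows, cols):
    """Pearson chi-square statistic for two parallel lists of category labels."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    cells = Counter(zip(rows, cols))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = cells.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(rows, cols, n_perm=2000, seed=0):
    """Monte Carlo p-value: shuffle one variable to simulate independence.

    Shuffling keeps both sets of marginal totals fixed, which is the same
    conditioning used by exact tests for contingency tables.
    """
    rng = random.Random(seed)
    observed = chi_square(rows, cols)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square(rows, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical responses (NOT data from this study): each subject's stated
# expectation paired with the probability expression he or she selected.
expectation = ["higher"] * 3 + ["lower"] * 3 + ["equal"] * 2 + ["no opinion"] * 2
expression = ["likely", "very likely", "likely", "unlikely", "possible",
              "unlikely", "an even chance", "possible", "probable", "likely"]

p_value = permutation_p_value(expectation, expression)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

Because the reference distribution is built from the observed data, cells with zero or small frequencies do not invalidate the test. The estimate does vary with the number of permutations and the random seed.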
The frequency distributions for the 11 probability expressions (Figures 1A to 11B) led to the following conclusions:
- "Almost impossible" and "very unlikely" anchored the low end of numerical probabilities; when contexts were considered, "almost impossible" was judged to convey lower probabilities than "very unlikely."
- Expressions incorporating the stem "probable" seemed to be used less frequently than expressions incorporating the stem "likely."
- "Unlikely" and "possible" both expressed a broad range of probabilities less than 50%.
- The rank order of "an even chance" and "possible" based on the context-bound data (the probability restatement task) was more consistent with the literature than the one based on the context-free data (the paired comparison task).
- On the scale of 0% to 100%, "an even chance" had an intrinsic meaning of 50%.
- "Probable" and "likely" both corresponded to the two intervals adjacent to the middle interval (centered around 50%) while meanings of "likely" gravitated toward slightly higher probabilities than "probable."
- While "very probable" and "very likely" both corresponded to a broad range of probabilities larger than 50%, "very likely" peaked around 79%.
- "Almost certain" anchored the high end of the probability scale.
When subjects compared the relative magnitude of probability expressions in pairs, without a reference scale (e.g., 0%-100%), "an even chance" was not used as an anchor point for the middle at 50%. The rank order of the logit scale was consistent with what has been reported in the literature (Reagan et al., 1989), with one exception: the relative positions of "possible" and "an even chance" were reversed. The scale values derived from logit modeling did not correspond with the relative distances among the reference values (the first column of Table 1). For example, the scale value of "very likely" was in the middle of the logit scale (about 0.5); the relative distances among high probability words (i.e., "likely," "very probable," and "very likely") were larger than those among low probability words (i.e., "unlikely," "improbable," and "very unlikely"). These observations may follow from the theory underlying logit modeling. In logit modeling, the scale value of a probability expression, which is the exponential of the logistic regression coefficient of the indicator variable, is called the "true rating" of that probability expression (Agresti, 1990; Torgerson, 1958). The probability that an expression i is chosen over another expression j is therefore the true rating of i divided by the sum of the true ratings of i and j. For instance, according to the logit scale for the 1998 data, the probability that "almost certain" represented a greater likelihood than "very likely" is 1.000/(1.000 + 0.546) = 0.65. Thus, the rank order of expressions on the logit scale is reliable and can be directly explained. Yet the logit scale values are interpretable only after they are transformed into comparative probabilities, that is, the probability that one expression represents a greater likelihood than another.
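The transformation from true ratings to a comparative probability can be sketched as follows. The two scale values are the ones quoted above for the 1998 data; the function name `choice_probability` is a hypothetical label introduced for illustration.

```python
def choice_probability(rating_i: float, rating_j: float) -> float:
    """Probability that expression i is judged to convey a greater
    likelihood than expression j, given their true ratings."""
    if rating_i <= 0 or rating_j <= 0:
        raise ValueError("true ratings must be positive")
    return rating_i / (rating_i + rating_j)


# Logit scale values quoted in the text for the 1998 data.
ALMOST_CERTAIN = 1.000
VERY_LIKELY = 0.546

p = choice_probability(ALMOST_CERTAIN, VERY_LIKELY)
print(f"{p:.2f}")  # prints 0.65
```

Swapping the arguments gives the complementary probability, so the two directions of any pair always sum to 1. Only the ratio of two ratings matters, which is why equal distances on the logit scale do not correspond to equal distances in probability.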
Compared to the reference values (Column 1 of Table 1), the least squares scale values of the expressions "very unlikely," "improbable," and "unlikely" were underestimated. This could be due to the way in which the origin of the least squares dimension was determined. For paired-comparison data, the origin is always arbitrarily and uniquely decided by each scaling model. In this study, the least squares model set the origin at the low end ("almost impossible"). If it had instead chosen "almost certain" as the origin to anchor the scale, a different set of scale values might have resulted. Thus, future research needs to confirm whether scale values close to the origin tend to be poorly estimated.
The consistency of the scales obtained by the same scaling method for data collected in different years suggested that the relative magnitude of the 11 probability expressions was stable between 1992 and 1998. The strong, positive correlations between the two different scales for the same data were not surprising. According to Torgerson (1958), the logarithm of the logit scale value corresponds closely to the scale value of the least squares model. In this study, the correlation between the logarithm of the logit scale and the least squares scale was 0.97 (for both the 1998 and the 1992 data). Therefore, the relationship between the two methods stated by Torgerson (1958) was supported in this study.
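The comparison between the two scales amounts to a Pearson correlation computed after taking the logarithm of the logit scale values. The sketch below shows that computation. The four pairs of scale values are invented placeholders, not the values in Table 1, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder scale values for four expressions (NOT taken from Table 1).
logit_scale = [0.05, 0.20, 0.55, 1.00]
least_squares_scale = [0.4, 1.9, 2.8, 3.6]

# Take logs of the logit scale values before correlating.
log_logit = [math.log(v) for v in logit_scale]
r = pearson_r(log_logit, least_squares_scale)
print(f"r = {r:.2f}")
```

The logarithm is the natural transformation here because, under the logit model, the choice probability for a pair depends only on the difference between the logarithms of the two true ratings.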