Formal Models of Familiarity and Memorability in Face Recognition



Thomas A. Busey

Indiana University





Please send correspondence to:

Thomas A. Busey

Department of Psychology

Indiana University

Bloomington, IN 47405


Note: Word doesn't always translate symbols correctly, so the RTF version is much cleaner.

The similarity structure of faces has long been recognized as an important mediator of face recognition. Distinctive faces have an enduring quality to them, such that they are rarely confused with other faces. At the same time, we often encounter a situation in which a particular face looks familiar, yet it may only bear a resemblance to someone we have actually met before. The veracity of these introspections has been borne out by empirical evidence, which has served to identify the information used during face recognition. Much of the research has focussed on the role of typicality, which may be defined in various ways, but is often operationalized as a subjective rating of the difficulty of picking a particular face out of a crowd. Defined as such, typicality embodies the similarity structure of faces, such that typical faces will be similar to lots of other faces, while atypical faces will be very dissimilar and appear distinctive as a result.

This chapter provides an overview of the research on face perception that attempts to discern the role of typicality and the similarity structure of faces in face recognition. The primary discussion will revolve around the ‘face-space’ representation that was formalized by Valentine (1991a, b) as an extension of previous geometric models from the categorization literature to the area of face recognition. The face-space representation provides the basis for a discussion of the storage and retrieval mechanisms that may account for the effects of typicality described above. To motivate this discussion, extant forced-choice face recognition data is analyzed using a variety of process-oriented models that make predictions for individual faces in the face-space representation. The success and failure of these models is used to draw conclusions about the nature of the representation of faces in memory and the retrieval processes that work to enable the recognition of faces.

A variety of studies have demonstrated that distinctive or atypical faces have a characteristic advantage in recognition. Subjects discriminate distinctive faces better than very typical faces, such that distinctive targets have high hit rates and low false alarm rates (e.g. Light, Kayra-Stuart, & Hollander, 1979; Vokey & Read, 1992). Typical faces tend to have slightly higher hit rates but produce low discriminability, which results from a very high false alarm rate that more than offsets the higher hit rate. Interestingly, typical faces may engender a higher feeling of familiarity regardless of their status as targets or distracters, or in the parlance of an old/new recognition experiment, a higher feeling of 'oldness' (Bartlett, Hurry & Thorley, 1984).

These studies demonstrate that typicality ratings are at least related to those factors that affect recognition. Vokey & Read (1992) addressed the role of typicality in recognition with a principal components analysis of ratings of attractiveness, familiarity, likeability, typicality and memorability. They found that typicality ("how easy is it to pick this face out of a crowd") could be dissociated into two orthogonal components. The first consists of the attractiveness, likeability and familiarity ("how similar is this face to others that you know?") components. The second consists of the memorability rating ("how easy is it to remember this face?"). The typicality ratings loaded equally on the two components. This suggests that two processes affect typicality (and therefore recognition). The first is what Vokey & Read (1992) describe as context-free or structurally-induced familiarity. In this case, the to-be-identified face engenders a high feeling of familiarity, but there is no indexing of the source of the memory. Such a feeling of familiarity may be erroneously produced by misattributing the face to similar faces stored in memory, and thus typical faces are high in this context-free familiarity component. The second process is described as the familiarity due to prior exposure. With this process, the identifier matches the target to an item in memory, or at the very least perceives the target face as more familiar as a result of the prior exposure. Distinctive items are thought to have an advantage that results from encoding and retrieval processes working on the distinctive elements of the face; as a result, distinctive faces tend to gain more familiarity due to prior exposure than typical faces (Bartlett et al., 1984).

The crucial aspect of this framework is that the recognizer is thought not to be able to distinguish between these two forms of familiarity. This results in a situation where typical faces engender high feelings of familiarity, in part through their similarity to other faces. This also leads to confusions, such that a typical distracter will have a high false alarm rate due to erroneous matches to old items in memory. Distinctive faces have low structurally induced familiarity, which will produce very low false alarm rates when these faces are used as distracters. However, the distinctiveness provides for easy encoding, making them memorable and giving them high hit rates that more than make up for the initial low feelings of familiarity due to the context-free component (Bartlett et al., 1984).

O'Toole, Deffenbacher, Valentin & Abdi (1994) extended the work of Vokey & Read (1992) to digitized pictures of faces used as input to a neural network. They trained an associative neural network to recognize Caucasian and Asian faces and found that the memorability component of recognition was due to small, local distinctive features, while the familiarity component of recognition was related to more global aspects of the shape of the face. This reveals what might be a strategic use of information on the part of subjects: if a small local feature such as a mole is highly predictive of a face, it will be used by the encoding system to access the context of the study event and provide strong discrimination. In the absence of such features, the recognition system is forced to rely on more generic face information such as shape. In this situation the face is evaluated for its overall familiarity, since the mechanisms driven by the memorability component are not engaged by distinctive features.

In addition to these processes that have been proposed to account for the effects of typicality, several authors have suggested the need for negative evidence. Vokey & Read (1992) found that the memorability component of typicality was correlated with the false alarm rates of typical and atypical faces, which produces the result that distinctive faces have very low false alarm rates. They argue that subjects assess the memorability or the retrieval potential of a particular face, and conclude that if this is high they would have remembered the face if it had indeed been studied.

The division of the recognition processes into components associated with context free familiarity and those mediated by recall mechanisms parallels the process dissociation literature for word recognition (e.g. Jacoby & Dallas, 1981; Mandler, 1980). These dual-process models posit a fast-acting automatic retrieval process that produces a general sense of familiarity, and a slower controlled process that produces a specific conscious recollection of the prior study experience. A more typical face may have privileged access or provide faster retrieval through some form of perceptual fluency, and this might map onto the context-free familiarity mechanism proposed by Jacoby (1991).

Despite the intuitive appeal of the Vokey & Read (1992) two-process model and its support from the process dissociation literature, there are interpretational problems with the data used to support such a model in face recognition. O'Toole, Bartlett & Abdi (submitted) discuss the difficulties that come from correlating some external rating such as a typicality rating with a dependent measure such as the hit or false alarm rate. For example, a high false alarm rate may result from either low discriminability (d') for typical faces, or a criterion shift in which subjects relax their criterion for how much evidence they require before calling a face 'old'. O'Toole et al. conclude by calling for a model-based approach that makes predictions about which individual faces are easy to recognize. Such an approach should consider the similarity structure of the faces, and would have the added advantage of making the assumptions about the use of information explicit.

The goal of this chapter is to propose and test just such a model. I will describe the foundations of the similarity structure that has been developed in the categorization literature (e.g. Medin & Shaffer, 1978; Nosofsky, 1986) and proposed by Valentine (1991a,b) to account for face recognition. This 'face-space' representation is then used as the input to a face recognition model that uses a sampling rule to account for the data from typical and atypical faces described above. I then test this model on forced-choice recognition data and demonstrate how it can make quantitative predictions. As we shall see, the model will have difficulty accounting for faces that are very similar to studied faces, and I will explore a variety of extensions that might account for these data as well. Finally, I will discuss some future directions for the use of geometric inputs to face recognition models.

Geometric Models of Faces and Objects

The use of typicality in face recognition research has usually been operationalized as a rating on how easy a face would be to pick out from a crowd. Implicit in this question is how similar a particular face is to other faces, or how much the face would stand out. One alternative to this approach is to measure the similarity between all pairs of faces in the experiment and compute typicality in terms of the similarity of a particular face to other faces. The role of similarity has been well worked out in the categorization literature, where it has been used in models to make predictions for prototype experiments and test decision rules in categorization experiments. In these experiments, training exemplars are used to construct a prototype stimulus that represents the central tendency of the exemplars. This stimulus is then used at test in a recognition or categorization experiment to assess the existence of a psychological prototype. The prototype is a novel stimulus and should be classified as such, but it is almost always classified as an old stimulus. Although such data are consistent with the existence of a prototype, alternative accounts are also possible. The prototype is by definition similar to the training exemplars, and Nosofsky (1986) has demonstrated that this similarity increases the overall familiarity of the prototype stimulus, and this alone can account for the prototype effect. Thus in many cases there may not be a need to propose a psychological prototyping mechanism.

Similar mechanisms have been proposed for faces. Byatt & Rhodes (in press), Rhodes, Carey & Byatt (in press) and Valentine & Bruce (1986) all tested between a Norm-Based Coding representation, in which a face is compared against a central prototype face, and an exemplar-based representation, in which each face is represented as a point in ‘face-space’. These two representations are notoriously difficult to distinguish, in part because if the exemplars are normally distributed around the center of the space (where the putative prototype would be located) and can extend their influence to nearby locations, then the exemplar-based model acts like a ‘fuzzy’ prototype. For example, as a face gets closer to the center of the space where a prototype would influence it more, it would also get closer to other exemplars that cluster around the center, which would also influence the face more. Often one requires quantitative models to distinguish between these two representations, since qualitatively they produce identical predictions.

Quantitative predictions can be produced by a model that represents the similarity structure of the stimuli as its initial input. The similarity between any two faces can be measured by asking subjects to make a similarity rating on a 9 point scale, and repeating this procedure for all pairs of faces. For an experiment with r faces, this requires multiple ratings on each of the r(r-1)/2 pairs of faces. This provides a large number of datapoints, and a more efficient representation can be produced by submitting the similarity ratings to a multi-dimensional scaling algorithm such as ALSCAL. The output consists of an n-dimensional space (where n is usually less than 10) that represents each face as a point in this space. The dimensions are not specified by the experimenter; instead, they emerge from the MDS procedure according to the dimensions along which faces differ. Gender, age, race, facial fatness, hair color and eye width are all possible dimensions that might emerge. Figure 1 shows a hypothetical ‘face-space’. The location of a face in this space can be used to define its similarity to other faces, and assuming a normal distribution around the centroid of the space, the most typical face will appear near the center of the space. Distinctive faces will appear at the fringes of this space.
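The idea of computing typicality directly from pairwise ratings, rather than from a 'pick it out of a crowd' judgment, can be illustrated with a minimal sketch. The face labels and rating values below are invented for illustration, and the scale direction (1 = most similar, 9 = least similar) is assumed to match the ratings collected for the present stimulus set. Under that assumption, a face with a low mean rating against all other faces is typical, and a face with a high mean rating is distinctive.

```python
from statistics import mean

# Hypothetical mean ratings for every pair of four faces
# (1 = most similar, 9 = least similar); values are invented.
ratings = {
    ("A", "B"): 2.0,
    ("A", "C"): 3.0,
    ("A", "D"): 8.0,
    ("B", "C"): 2.5,
    ("B", "D"): 7.5,
    ("C", "D"): 8.5,
}

def mean_dissimilarity(face, ratings):
    """Average rated dissimilarity between `face` and every other face."""
    return mean(value for pair, value in ratings.items() if face in pair)

faces = sorted({f for pair in ratings for f in pair})

# Most typical face first, most distinctive face last.
typicality_order = sorted(faces, key=lambda f: mean_dissimilarity(f, ratings))
```

In this toy set, face D is rated as dissimilar to every other face and therefore ends up last in the ordering, which is the sense in which a distinctive face sits at the fringe of the space.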

This exemplar-based representation does not make direct predictions for recognition experiments, but it can be used as input to models that work on this representation. This defines the source of information used when recognizing faces: face-space based models assume that the similarity structure of the faces is used as input to some mechanism that will eventually produce an old/new response. This puts an enormous weight on the face-space representation, such that if it is missing some key dimension that is used in recognition, all models based on the representation will be incorrect as well. However, if the face-space representation accurately captures the dimensions that are important for recognition, the model can account for all the hit and false alarm recognition data using a few simple principles that are embodied in mathematical relations with a small number of free parameters. Thus the model can provide a succinct account of face recognition by quantifying a few principles into a process-oriented model that describes how information computed from the face-space representation is manipulated to produce a predicted recognition response.

Face-Space Representations and Models of Recognition

Within the categorization literature, the use of the multi-dimensional scaling approach has been limited to relatively simple stimuli such as color chips, geometric figures, random dot patterns and random polygons. The advantage of such stimuli is that the experience provided by the training portion of the experiment is the only exposure the subject will have for a particular stimulus. In addition, these stimuli are either inherently low-dimensional, or if they are high-dimensional they are constrained to vary along only a few underlying dimensions (e.g. Edelman & Intrator, submitted). However, we have no way to control the prior exposure to faces, except to assume that subjects are very experienced with faces and somehow take that into account in the modeling. As a start, we can assume that for novel faces, the similarity relations between the faces provide a representation that captures those dimensions that are relevant for face recognition.

Much of the work with geometric representations provided by MDS applied to similarity ratings assumes a representation such as that shown in Figure 1. Stimuli have values along different dimensions, and a variety of quantities can be computed. Most models assume that the distance d_{i,j} between any two faces i and j can be computed from the locations in this space,

Eq. 1

where x_{i,n} is the coordinate for face i on dimension n (out of M total dimensions) and w_n is the attentional weight given to dimension n as described below. The similarity, h_{i,j}, between faces i and j is defined as

Eq. 2

where c is a scaling parameter used to define the relation between distance and similarity (Shepard, 1974). This re-computation of similarity enables a mapping of distance to similarity that systematically varies; high c values produce similarity values that are high only for very short distances and indicate that no item is very similar to any other item. Low c values imply that all faces bear some similarity to each other, and are difficult to distinguish.
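The distance and similarity computations described above can be sketched in code. Because the equations themselves are not reproduced in this version of the text, the forms below are assumptions taken from common usage in the exemplar-model literature: an attention-weighted Minkowski distance (Euclidean when r = 2) and an exponential decay of similarity with distance. The function names, the exponent r, and the example coordinates are all hypothetical.

```python
import math

def weighted_distance(x_i, x_j, weights, r=2.0):
    """Attention-weighted Minkowski distance between two faces.

    Assumed form: d = (sum_n w_n * |x_in - x_jn|**r) ** (1/r).
    """
    total = sum(w * abs(a - b) ** r for a, b, w in zip(x_i, x_j, weights))
    return total ** (1.0 / r)

def similarity(distance, c):
    """Assumed exponential-decay similarity: exp(-c * distance)."""
    return math.exp(-c * distance)

# Two hypothetical faces in a two-dimensional face-space.
face_a = (0.0, 0.0)
face_b = (3.0, 4.0)
equal_weights = (1.0, 1.0)  # hypothetical attentional weights

d = weighted_distance(face_a, face_b, equal_weights)

# A low c leaves the two faces fairly similar; a high c makes them
# effectively unrelated, as described in the text.
similarity_low_c = similarity(d, c=0.1)
similarity_high_c = similarity(d, c=2.0)
```

The same distance therefore maps onto very different similarities depending on c, which is why c is treated as a free parameter when the model is fit.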

The similarity structure provided by the similarities computed from the MDS distances provides the basic input to models. One such model that has been proposed by Valentine (1991a,b) to account for face recognition is the Identification version of GCM (Nosofsky, 1986, 1987). In this model, distinctive items are more likely to be encoded into memory, which expresses the memorability component described by Vokey & Read (1992). The model uses the similarity values from Eq. 2 to make a prediction for the probability of saying 'old'. For target faces, this value is,

Eq 3a

and for distracter faces is,

Eq 3b

where F is a logistic function,

Eq. 4

with free parameters b and q that map the ratio in Eq 3 into the range of 0 to 1.

The form of the ratio in Eq 3 should provide some intuition for why Valentine (1991a,b) proposed this formal model for face recognition. First, consider the denominator in Eq 3a. When a face is tested in an old/new recognition experiment, the similarity to all other items in memory is computed. Faces that are very atypical tend to lie near the edges of this space, and will therefore not be similar to many other faces. Thus, the summed similarity in the denominator will be small, making the overall fraction large. As a result, distinctive target faces will have a very high probability of being called old on the basis of the denominator. Typical targets have a larger denominator and thus an overall smaller probability of being called an old face.
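The intuition about the denominator for target faces can be illustrated with a small sketch. Since Eqs. 3a and 4 are not reproduced in this version of the text, the code relies on stated assumptions: the numerator for a target is its self-similarity (1.0), the denominator sums the similarity of the target to every studied face including itself, similarity decays exponentially with unweighted Euclidean distance, and the logistic takes the form 1/(1 + exp(-b(x - q))). The coordinates and parameter values are invented, and the distracter case of Eq. 3b is deliberately left out because its numerator cannot be recovered from the text.

```python
import math

def similarity(u, v, c):
    """Assumed exponential-decay similarity on Euclidean distance."""
    return math.exp(-c * math.dist(u, v))

def target_ratio(target, studied, c):
    """Self-similarity (1.0) divided by summed similarity to all studied faces."""
    summed = sum(similarity(target, other, c) for other in studied)
    return 1.0 / summed

def logistic(x, b, q):
    """Assumed logistic mapping of the ratio into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-b * (x - q)))

# Four studied faces cluster near the center; one lies at the fringe.
studied = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (3.0, 3.0)]
typical_target = studied[0]
distinctive_target = studied[4]

c, b, q = 1.0, 5.0, 0.5  # hypothetical parameter values

p_old_typical = logistic(target_ratio(typical_target, studied, c), b, q)
p_old_distinctive = logistic(target_ratio(distinctive_target, studied, c), b, q)
```

With these invented values the fringe face has few neighbors, so its denominator stays close to 1 and its predicted probability of an 'old' response exceeds that of the central face.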

While this model predicts the high hit rate to distinctive target items, it may have difficulty accounting for the low false alarm rates to the distinctive distracters. Previously, Vokey & Read (1992) argued that such a situation requires the use of negative evidence, which the Identification version of GCM does not contain. The denominator in Eq 3b will be very small for distinctive distracters, which would lead to a higher false alarm rate than for typical distracters. Opposing this tendency is the numerator, which tends to be larger for typical distracters. How the numerator and denominator trade off depends in part on the similarity structure of the face space, and thus quantitative model predictions are required to evaluate the adequacy of the Identification model.

In summary, the Identification model includes a mechanism that has the properties associated with the memorability component of Vokey & Read’s framework. It may or may not include the familiarity component, which in part may depend on the structure of the face space and the ability of nearby targets to produce false alarms for typical distracters via the numerator of Eq 3b.

Applications to Forced-Choice Face Recognition

While most models of recognition memory are applied to old/new picture recognition paradigms, the legal setting provides an important forced-choice situation. In a lineup, a witness may often assume that the suspect is present in the lineup, and use a comparison between the faces to name a suspect. Vokey & Read’s breakdown of familiarity into a context-free component and a component provided by previous study raises an interesting possibility for the lineup situation. For example, typical faces tend to induce more context-free familiarity. Distinctive target faces begin with less context-free familiarity but benefit more from study (Bartlett et al., 1984). Consider a situation in which a target and a distracter face are compared in forced choice. The target face will have study-induced familiarity in addition to some amount of context-free familiarity. However, if the distracter face is very typical, it may have a large amount of context-free familiarity, causing the subject to choose the distracter over the target. At the very least, such a comparison would be more difficult than if the target and distracters are both distinctive. In an old/new recognition experiment, Solso & McCarthy (1981) demonstrated that prototype faces could attract a large number of false alarms, suggesting that the familiarity induced by the similarity to studied faces could translate into a false recognition.

The lineup situation is complicated somewhat by assumptions that the witness might make when making an identification (e.g. Wells & Lindsay, 1985). In the present case we will limit ourselves to the case in which exactly one face in a two-alternative forced-choice comparison was presented at study. The data described below were briefly reported in Busey & Tunnicliff (submitted) but were not modeled. A summary of this experiment is provided below.

Experimental Design and Procedures

The stimuli used in this experiment were photos of bald men that ranged in apparent age from mid-20s to mid-50s (Kayser, 1985). As described above, the typicality of a particular face is an important mediator of memory performance. In addition to the naturally occurring differences in typicality, we included 16 faces that were constructed by morphing two parent faces. These morphs were included only in the test portion of the experiment, and were used because the morphs tend to be highly typical. At the very least they are similar to the parent faces, and due to the geometry of MDS space, the morphs might be closer to many other faces as well (Busey, in press). Thus these morphs may provide a stimulus that induces a large amount of context-free familiarity. The parent faces provide the appropriate comparison stimulus, since they are both studied and more distinctive. If the two-process framework described by Vokey & Read is correct, and if the context-free familiarity induced by the typicality of the morphs dominates, then we might find that subjects choose the morph over one of the parents in forced choice.

The details of this experiment are provided by Busey & Tunnicliff (submitted), but the essential details are reproduced below. Participants were 119 Indiana University undergraduates who participated in one of 24 groups of up to 5 subjects at a time. They received course credit for their participation. The stimuli consisted of 104 pictures of bald men with neutral expressions. Twenty-one of the men had facial hair. Fourteen of the men were black and the rest were Caucasian. Sixty-eight faces were selected for the study portion of the experiment, including 36 target faces and 32 parent faces. The parent faces were paired so that 8 pairs had faces that were dissimilar, while 8 pairs had faces that were similar according to a pre-experiment sorting task. This manipulation allows us to evaluate the effect of similarity on the psychological mechanisms that underlie the responses to the morphs. The parent faces were combined to create 16 morph faces as described below. Twenty distracter faces were also selected. The constraints placed by the morphing procedures did not allow us to select faces at random for the parent faces, since faces with facial hair do not morph well. Faces with facial hair tend to be more distinctive, which may influence the forced-choice data. However, we know the structure of MDS ‘face space’ and therefore will be able to take these differences into account.

Control points were placed on the salient features of each parent face and 50% averages were created using the Morph™ software package (Gryphon Software). At least 150 control points were placed on each parent, and control points were added as required to remove obvious artifacts in the resulting morph. Data were collected by a PowerMac computer using 5 numeric keypads that provided identifiable responses from each keypad. The faces were displayed on a 21" Macintosh grayscale monitor.

There are four types of faces in each experiment. Target and parent faces appear both at study and at test; the only distinction between the two sets of faces is that parent faces tended to be less distinctive because they were all clean shaven. The target faces were a mix of clean-shaven and mustached faces. Morphs and distracters appeared only at test and are therefore distracters. However, the morphs are similar to the parents and we therefore expect higher false alarm rates in general to the morphs than to other distracter faces.

Subjects were asked to view a series of faces in the study phase and remember them for the subsequent recognition test. There were 68 faces in the study phase: 36 target faces (faces not used for morphs that would reappear in the test phase) and 32 parent faces (faces previously used to create the morphs). Each face appeared for 1500 ms, followed by a two-second delay before the next face.

At test, subjects were given a forced-choice recognition test. Subjects were required to pick the one of two presented faces that was previously studied. Subjects either chose between a morph and one of the two parents, or between a target and a distracter. There was a total of 36 pairs in the test phase: 16 morph/parent pairs and 20 target/distracter pairs, presented in random order. Although there were 36 targets presented at study, only 20 randomly-chosen targets were tested since there were only 20 distracters.


The mean probability of choosing a target over a randomly-chosen distracter is 0.765 (SEM = 0.015). When comparing the morphs and parents constructed from similar parents, the probability of choosing a similar parent over its associated similar morph is 0.463 (0.222), which is statistically significantly less than 0.5 (t(1448) = 2.12, p < 0.05). Morphs from dissimilar parents show the opposite effect: the probability of choosing a dissimilar parent over the morph is 0.658 (0.161), which is greater than 0.5 (t(1448) = 10.6, p < 0.05).

These results support the framework suggested by Vokey and Read (1992). Target faces tend to be very distinctive, because many had facial hair. This distinctiveness may have resulted in a large amount of familiarity due to the prior exposure. Similar morphs are similar not only to the two parents, but also to many other faces (Busey, in press). As a result, they may have engendered a large amount of context-free familiarity and therefore have been chosen over the parent face in the forced-choice comparison.

Although these results are consistent with the two-process view, there are several aspects of this framework that are troubling. First, it is not clear that context-free familiarity is a distinct construct that is separable from familiarity due to prior exposure. Clearly the face at least must be recognized as a face, which must require some form of active search through memory. There may also be an additional search through memory that corresponds to the familiarity due to the prior exposure. As a result of this overlap between the two processes, a single-process model might be able to account for both the good discrimination of the target faces and the errors made by subjects to the similar morphs. One possible starting point is the Identification version of the GCM model developed by Nosofsky (1986; 1987) and suggested by Valentine (1991a,b) as a good model for face recognition. Below I describe how this model can be extended to forced-choice data and demonstrate the adequacy of this model.

Measuring Face Space: Similarity Ratings

Before a model can make quantitative predictions for individual faces, the similarity structure of the faces must be measured. The procedures used to gather similarity ratings and produce the multidimensional scaling output are described in Busey (in press), but are briefly sketched below. The full stimulus set contains 104 faces, but because the scaling program could only handle 100 stimuli, 4 target faces were selectively deleted from the set. The remaining 100 faces require multiple similarity ratings on all (100*99/2) pairs of faces. This required 373 Indiana University undergraduates making ratings on 177 randomly-chosen pairs of faces, on a scale from 1 (most similar) to 9 (least similar). These similarity ratings were submitted to the ALSCAL multidimensional scaling algorithm, which produced a 6-dimensional solution. The dimensions of the solution were all interpretable, and included dimensions such as age, race, facial pudginess, and facial hair.
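The size of the rating task follows from simple arithmetic, sketched here as a check. The calculation assumes that every rater completed all 177 ratings, that all ratings involved the 100 faces submitted to the scaling program, and that pairs were sampled roughly evenly; none of these details is stated explicitly in the text, and the variable names are illustrative.

```python
from math import comb

n_faces = 100                      # faces submitted to the scaling program
n_pairs = comb(n_faces, 2)         # 100 * 99 / 2 unordered pairs

n_raters = 373
pairs_per_rater = 177
total_ratings = n_raters * pairs_per_rater

# Average number of independent ratings available for each pair,
# under the even-sampling assumption stated above.
ratings_per_pair = total_ratings / n_pairs
```

Under these assumptions there are 4,950 pairs and 66,021 ratings in total, or roughly 13 ratings per pair, which is the sense in which each pair received multiple ratings.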

Accounting for Forced-Choice Data

One possible extension of the Identification model to forced choice would be to consider the model's predicted familiarity for both faces, and whichever face produces the higher familiarity is the face that is selected (Clark, in press). While intuitively plausible, this model cannot be correct, because without noise or some other process it would always predict that the target would be chosen over the distracter with probability 1.0 (assuming that the distracter did not have more context-free familiarity than the target). In the data, the targets were chosen over distracters about 77% of the time, while dissimilar parents (which tend to be less distinctive) were chosen over the dissimilar morphs 66% of the time. Any model must account for these gradations in choosing rates that appear to depend upon typicality or the similarity structure of the faces.

In an old/new recognition task, the subject typically makes either an ‘old’ or a ‘new’ response. In some sense this is a categorization task of each test face into either the old or new category. Most categorization tasks also include two categories, in which members of category a are distinguished from the members of category b according to some criteria. What distinguishes recognition from most categorization tasks is that the population of new items in recognition is generally unknown. A forced-choice task is much closer to the categorization task, since the target face is directly compared with a known quantity, the distracter face. In categorization, the probability that test item i is classified as a member of category a is,

Eq. 5

where A is the evidence that item i belongs to category a, B is the evidence for category b, and G represents a monotonic transform. Often, G takes the form of an exponential,

Eq. 6

where z represents the extent to which small differences between the evidence for faces A and B are magnified into a large likelihood of choosing face A. For example, for small z, virtually all choosing probabilities will be close to 0.5, since the exponents will all be close to 0. A large z, however, magnifies the impact of A and B, such that if A dominates B only slightly, the subject will be very likely to say "A". Thus z can be thought of as a confidence parameter that indicates how much the evidence of A over B influences the resulting choosing rate. It also may reflect to some degree the noisiness of the comparison process, since if A and B are similar and the system is noisy, the subjects would choose B on some proportion of the trials. The model would mimic this behavior by reducing the probability of choosing A by having a fairly small z parameter.
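The role of z can be made concrete with a short sketch. The equations are not reproduced in this version of the text, so the code assumes the usual ratio form for the choice rule, G(A) / (G(A) + G(B)), with G(x) = exp(z * x); this is consistent with the description above but remains an assumption. The function name and the evidence values are hypothetical.

```python
import math

def choice_probability(evidence_a, evidence_b, z):
    """Probability of choosing face a over face b.

    Assumed form: exp(z*A) / (exp(z*A) + exp(z*B)), rewritten in the
    algebraically equivalent form 1 / (1 + exp(-z * (A - B))).
    """
    return 1.0 / (1.0 + math.exp(-z * (evidence_a - evidence_b)))

# Face a has only slightly more evidence than face b.
A, B = 0.60, 0.50

near_chance = choice_probability(A, B, z=0.1)   # small z: close to 0.5
decisive = choice_probability(A, B, z=50.0)     # large z: close to 1.0
```

The same small evidence advantage thus yields a choosing rate near chance when z is small and a nearly deterministic choice when z is large, which is the behavior attributed to z in the text.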

Eq. 6 can be used to adapt the Identification version of GCM for the forced-choice recognition paradigm if we assume that the subject computes the probability of saying old to faces a and b via Eqs. 3 and 4, which provides the values A and B for Eq 6. In situations where A and B are about equal (that is, both faces a and b are identified as likely to have been presented at study), the probability of choosing face a will be close to 0.5. However, as one face tends to dominate, Eq 6 will get closer to 1.0.

The forced-choice data were fit as follows. The data for the target faces consist of the probability of choosing a given target face over one of the distracters. Over the course of the experiment, each target face was tested against 24 randomly chosen distracters, and the modeling must reflect this. This was accomplished by computing the probability of saying old to each target and distracter face via Eqs 1-4, and then computing the probability of choosing the target face over a given distracter via Eq 6. These probabilities were then averaged over all such comparisons involving that particular target face. This process was repeated for the parent, morph and distracter faces, although morphs were always compared only with their parent faces and vice versa.
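The averaging step can be sketched as follows. This is a minimal illustration, not the fitting code used for the chapter: it assumes the exponential form of Eq 6, takes the old-probabilities (the output of Eqs 1-4) as given inputs, and uses hypothetical function names and values.

```python
import math

def choose_first(p_old_first, p_old_second, z):
    """Assumed exponential choice rule applied to two old-probabilities."""
    shift = max(z * p_old_first, z * p_old_second)
    e1 = math.exp(z * p_old_first - shift)
    e2 = math.exp(z * p_old_second - shift)
    return e1 / (e1 + e2)

def mean_choice_rate(p_old_target, p_old_distracters, z):
    """Average the probability of choosing the target over each distracter
    it was paired with (24 pairings per target in the experiment)."""
    rates = [choose_first(p_old_target, p, z) for p in p_old_distracters]
    return sum(rates) / len(rates)

# Hypothetical old-probabilities for one target and three of its distracters.
print(mean_choice_rate(0.8, [0.2, 0.4, 0.7], z=5.0))
```

The predicted choosing rate for a face is therefore a property of the face together with the set of faces it was paired against, not of the face alone.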

This model has 9 free parameters: the similarity gradient parameter c, the 5 weight parameters, b and q, which map the ratio in Eq 3 into a probability, and z, which determines how the evidence for face a is compared with the evidence for face b. The best-fitting parameter values are given in Table 2.

Figure 2 shows the fit of the GCM-Identification model, with the probability of choosing a face from the data on the abscissa and the theory’s predictions on the ordinate. Overall the fit is not bad; in general the points fall on the diagonal. However, there are systematic deviations for some of the target faces, the similar parents and the distracters. Most telling is the failure to account for the fact that subjects tend to choose the similar morphs over the similar parents; the model places the similar morphs (open gray triangles) to the left of the similar parents (filled gray triangles), where the reverse should be true.

Despite these failings, overall the model is accounting for the distinctiveness effects seen in the faces. The target and distracter faces tend to be more distinctive than the dissimilar parents and dissimilar morphs. We see in Figure 2 that the targets have a higher choosing rate than the dissimilar parents, which demonstrates that the model can account for the effects of distinctiveness. Thus, the Identification model might be associated with the Memorability component described by Vokey & Read (1992). What it apparently lacks is some mechanism to account for the similar morphs, which are very typical distracters. This might require either a better model formulation or a separate mechanism that captures context-free familiarity.

The SimSample Model

Before adopting a second process to account for something like context-free familiarity, consider an alternative model that might account for both the effects of highly typical and highly distinctive faces. This model involves sampling from memory according to the similarity of the target face to items in memory, and thus I term it SimSample. This model has previously accounted for old/new recognition data (Busey & Tunnicliff, submitted) and might provide a better account of the forced-choice data as well.

To develop the SimSample model we assume that similarity is constructed from the MDS face space according to Eqs 1-2. We then assume that each test face in the forced-choice comparison is used to probe memory, and that exactly one face is sampled from memory. Not all items are equally likely to be sampled, however. The probability that the observer samples face k in memory given that face i was presented at test is,

$$P(k \mid i) = \frac{h_{i,k}}{\sum_{j} h_{i,j}} \qquad \text{(Eq 7)}$$

which is simply the Luce Choice Rule. We then assume that the sampled face is compared with the test face, and if they are similar enough the observer concludes that they have a match and says "old". This involves a criterion such that if face k is sampled when face i is used to probe memory,

$$\text{respond ``old'' if } h_{i,k} \ge \text{criterion} \qquad \text{(Eq 8)}$$

where the similarity criterion is a free parameter. If the similarity between the sampled item and the test face is less than the criterion, the model predicts that the observer will say "new". More formally, the probability of saying old to item i is the total probability of sampling an item that is similar enough that, if sampled, the observer would say old. Define the function q(hi,k) such that

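$$q(h_{i,k}) = \begin{cases} 1 & \text{if } h_{i,k} \ge \text{criterion} \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq 9)}$$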

which is simply the probability that the observer will say old to item i given item k is sampled. The probability that the observer says old when viewing face i at test is,

$$P(\text{old} \mid i) = \sum_{k} P(k \mid i)\, q(h_{i,k}) \qquad \text{(Eq 10)}$$

where the first term inside the summation comes from Eq 7 and the second from Eq 9.
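The sampling and decision steps can be sketched in a few lines. The sketch assumes that the sampling rule is the Luce Choice Rule over similarities and that the decision function is a step at the criterion; the function names and similarity values are hypothetical.

```python
def sampling_probabilities(sims):
    """Luce Choice Rule: sims[k] is the similarity of the test face to
    memory trace k; each trace is sampled in proportion to its similarity."""
    total = sum(sims)
    return [s / total for s in sims]

def p_old_fixed_criterion(sims, criterion):
    """Probability of saying old: the summed sampling probability of every
    trace whose similarity to the test face reaches the criterion."""
    probs = sampling_probabilities(sims)
    return sum(p for p, s in zip(probs, sims) if s >= criterion)

# Hypothetical similarities; the first entry is the test face's own trace.
distinctive_target = [1.0, 0.05, 0.05]
typical_target = [1.0, 0.4, 0.4, 0.4]
print(p_old_fixed_criterion(distinctive_target, criterion=0.5))
print(p_old_fixed_criterion(typical_target, criterion=0.5))
```

Because exactly one trace is sampled, a test face's own trace competes with every other trace in memory, so the denominator of the sampling rule carries the effect of typicality.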

For a variety of reasons it is reasonable to assume that the similarity criterion in Eqs 8 and 9 is not fixed, but has normally distributed variability due to some internal noise or differences across subjects. In this case, we can redefine Eq 9 according to a cumulative gaussian function with mean set to the criterion and standard deviation set to a free parameter critSD,

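$$q(h_{i,k}) = \Phi\!\left(\frac{h_{i,k} - \text{criterion}}{\text{critSD}}\right) \qquad \text{(Eq 11)}$$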

which implies that if hi,k equals the criterion, the probability that the observer says old when face k is sampled is 0.5. No modification of Eq 10 is necessary to accommodate this change to q(hi,k).
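A minimal sketch of the noisy-criterion version follows. It assumes the cumulative gaussian is evaluated at (similarity - criterion) / critSD and uses the error function from the Python standard library; the names q_noisy and p_old and the example values are hypothetical.

```python
import math

def q_noisy(similarity, criterion, crit_sd):
    """Probability of saying old given the sampled trace's similarity, when
    the criterion is normally distributed with the given mean and SD."""
    z_score = (similarity - criterion) / crit_sd
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))

def p_old(sims, criterion, crit_sd):
    """Sum over memory traces of P(sample trace) * P(old | trace sampled)."""
    total = sum(sims)
    return sum((s / total) * q_noisy(s, criterion, crit_sd) for s in sims)

print(q_noisy(0.5, criterion=0.5, crit_sd=0.1))  # similarity equal to the criterion
print(p_old([1.0, 0.05], criterion=0.5, crit_sd=0.1))
```

The step function is the limiting case of this form as critSD approaches zero, so the summation over traces is unchanged.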

The sampling and criterion assumptions embodied by Eqs 7, 10 and 11 are related to the sampling and testing processes of the SAM model, although in SAM the model is allowed to sample multiple times, whereas the SimSample model is only allowed to sample once. Various multiple-sampling versions of SimSample were attempted, with little success.

The SimSample model has 9 free parameters (the same number as the Identification model): 1 generalization gradient parameter c, 5 attention weights, the response criterion, the standard deviation of the response criterion, and z, which controls the comparison between the two faces.

Accounting for Distinctiveness

At a minimum, the SimSample model must account for the finding that subjects are very good at recognizing distinctive targets and rejecting distinctive distracters. The upper-left panel of Figure 3 demonstrates how the SimSample model accounts for the high hit rates to distinctive targets. A distinctive target is not similar to many other items in memory, making the denominator in Eq 7 small. When sampling its own item in memory, the numerator in Eq. 7 is 1.0, and for all other faces the numerator is much less than 1.0. This implies that distinctive faces are very likely to sample their own image in memory, and of course when they do, i = k, and hi,k = 1.0, which exceeds the similarity criterion in Eq 8. Less distinctive targets are less likely to sample themselves in memory, since even though the numerator is still 1.0 in Eq 7, the denominator is larger for more typical faces. When a moderately typical test face samples other faces in memory, they may be far enough away such that hi,k < criterion and the observer will incorrectly say 'new'. Thus the SimSample model correctly predicts that more distinctive target faces will be more likely to be chosen over a distracter than less distinctive targets.

The SimSample model can also account for the fact that distinctive distracters are easily rejected by observers (and are not often chosen in forced choice), as demonstrated by the upper-right panel of Figure 3. As with target faces, a distinctive distracter will sample some face in memory. However, it cannot sample itself because it was not placed into memory at study. If there are no faces near enough to fall inside the criterion in MDS space, the observer will never make a false alarm. The noise added to the criterion ensures that all distracters have above-zero false alarm rates, but the model will still produce very few false alarms to distinctive distracters. More typical distracters will have a greater chance of being near a target face that is inside the criterion, which if sampled will produce a false alarm or an erroneous choice in a forced-choice paradigm. Thus the SimSample model can account for the low false alarm rates to distinctive distracters without assuming negative evidence as suggested by Brown, Lewis & Monk (1977).
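The two distinctiveness accounts above can be checked in a toy example. The sketch below is an illustration under simplifying assumptions, not a fit to the data: faces lie on a single hypothetical dimension rather than in the weighted multidimensional face space, similarity is assumed to fall off exponentially with distance, and all coordinates and parameter values are made up.

```python
import math

def p_old(test_x, memory_xs, c, criterion, crit_sd):
    """SimSample probability of saying old for a test face at test_x,
    given the coordinates of the studied faces (toy one-dimensional space)."""
    sims = [math.exp(-c * abs(test_x - m)) for m in memory_xs]
    total = sum(sims)
    p = 0.0
    for h in sims:
        q = 0.5 * (1.0 + math.erf((h - criterion) / (crit_sd * math.sqrt(2.0))))
        p += (h / total) * q
    return p

# Four typical studied faces clustered together, one distinctive studied face.
memory = [0.0, 0.2, 0.4, 0.6, 5.0]
params = dict(c=2.0, criterion=0.5, crit_sd=0.1)

hit_distinctive = p_old(5.0, memory, **params)  # studied, isolated
hit_typical = p_old(0.2, memory, **params)      # studied, crowded region
fa_distinctive = p_old(9.0, memory, **params)   # unstudied, isolated
fa_typical = p_old(0.3, memory, **params)       # unstudied, crowded region
print(hit_distinctive, hit_typical, fa_distinctive, fa_typical)
```

In this toy space the isolated target almost always samples its own trace, while the isolated distracter finds no trace inside the criterion, which mirrors the qualitative pattern described for Figure 3.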

Accounting for Familiarity

In addition to accounting for the effects of distinctiveness, the SimSample model can in principle account for the fact that very typical faces engender a high feeling of familiarity. The bottom panels of Figure 3 demonstrate how SimSample can in principle account for the high false alarm rates to the morphs created from similar parents, as well as the relatively high hit rates to typical parents. When a morph is used to probe memory, it cannot sample itself because it was not presented at study. However, it does have the opportunity to sample nearby items in memory and will produce a false alarm if the sampled item is inside the criterion. In the case of the morphs created from similar parents, there are likely to be at least two studied faces (the two parents) that are similar enough to fall inside the criterion. In addition, the morphs tend to be among the most typical of faces, since the morphing procedures tend to place the morphs near the middle of MDS face space (Busey, in press). Thus the SimSample model correctly predicts higher false alarm rates to the morphs than to more distinctive distracters.

This same explanatory principle can account for the fact that very typical parents have higher hit rates than moderately typical parents, as seen in Figure 3. Typical parents are likely to be similar to lots of other faces in memory, and even though such a face is not very likely to sample its own trace in memory, it is very likely to sample a nearby face. Typical parents have lots of other faces nearby, and if one of these is sampled the observer will say old. When this happens, the observer is making a correct response but doing so for the wrong reason. Less typical parents have fewer opportunities to sample nearby faces that would generate an old rating, and therefore cannot take advantage of incorrect samplings that result in a correct decision.
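The dependence of morph false alarms on parent similarity can be illustrated with another toy sketch. It assumes a one-dimensional space, an exponential similarity gradient, a morph lying exactly midway between its parents, and a memory that contains only the two parents; these simplifications and all parameter values are hypothetical.

```python
import math

def p_old(test_x, memory_xs, c, criterion, crit_sd):
    """SimSample probability of saying old in a toy one-dimensional space."""
    sims = [math.exp(-c * abs(test_x - m)) for m in memory_xs]
    total = sum(sims)
    p = 0.0
    for h in sims:
        q = 0.5 * (1.0 + math.erf((h - criterion) / (crit_sd * math.sqrt(2.0))))
        p += (h / total) * q
    return p

def morph_false_alarm(parent_separation, c=2.0, criterion=0.5, crit_sd=0.1):
    """False alarm probability for an unstudied morph placed at the midpoint
    between two studied parents that are parent_separation apart."""
    parents = [-parent_separation / 2.0, parent_separation / 2.0]
    return p_old(0.0, parents, c, criterion, crit_sd)

print(morph_false_alarm(0.2))  # morph of similar parents
print(morph_false_alarm(3.0))  # morph of dissimilar parents
```

Whichever parent the similar morph samples, the sampled trace lies inside the criterion, whereas the dissimilar morph is far from both of its parents.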

The fit of the SimSample model is shown in Figure 4, and the best-fitting parameters are provided in Table 1. The RMSE is 0.112. The fit is an improvement on the fit of the Identification model despite the fact that it has the same number of free parameters. The RMSE is reduced, and the systematic deviations for the distracters and targets are no longer present. However, the model still has difficulty with the similar morphs and parents: the similar morphs are still systematically to the left of the similar parents. Thus either the model cannot account for very typical faces, or there is some other mechanism such as noise, clustering or blending that is going on that might account for these faces.

Extensions to the Exemplar-Based Model

One possible explanation for the high choosing rate for the similar morphs is that there is enough noise in the recognition system such that the morph is confused with one or both of the parent faces. This seems somewhat unlikely given the fact that one of the parent faces is shown with the morph, and subjects know that one and only one studied face is shown at test. Nevertheless, noise may play a role in the false recognition of the morphs, since the central location of a prototype may make it more immune to noise than the exemplars. One mechanism to introduce noise into the locations of the faces in MDS space is to assume a gaussian similarity gradient rather than an exponential gradient,

Eq. 12

which tends to make the sharp drop of the similarity gradient ‘fuzzy’. This provided a very slight improvement in the RMSE, reducing it to 0.108. However, it could not predict that subjects would choose the similar morphs over the similar parents.
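The difference between the two gradients can be seen numerically. The sketch assumes the standard forms used in exemplar models, exp(-c d) for the exponential gradient and exp(-c d^2) for the gaussian gradient; the function names and the value of c are hypothetical.

```python
import math

def exponential_similarity(distance, c):
    """Exponential gradient: similarity drops sharply right at distance 0."""
    return math.exp(-c * distance)

def gaussian_similarity(distance, c):
    """Gaussian gradient: flat near distance 0, steeper drop further out."""
    return math.exp(-c * distance ** 2)

for d in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(d, round(exponential_similarity(d, 2.0), 3), round(gaussian_similarity(d, 2.0), 3))
```

Under the gaussian form, faces that are very close to the probe remain almost as similar as the probe's own trace, which is one way to express noise in the stored locations.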

A second mechanism that might save the exemplar-based version of the SimSample model is some sort of clustering mechanism that brings similar faces even closer together. This clustering goes against other effects of density as described by Krumhansl’s Distance-Density hypothesis (Krumhansl, 1978), where experience with a dense region tends to make the items in that region appear less similar, not more. A clustering algorithm was appended to the SimSample model according to the following logic. Studied items were placed into memory at their locations in MDS space. If they were close enough to other items (as determined by a free parameter), all items inside this pre-defined region were systematically moved closer to each other by an amount proportional to their distance and a free parameter. This mechanism reduced the RMSE only slightly, to 0.107, and did not predict that the subjects would choose the similar morphs over the similar parents. Thus it appears that a clustering mechanism cannot help the SimSample fit.
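One way to read the clustering logic is sketched below. The details are assumptions made for illustration: each item is pulled toward the centroid of the items within a fixed radius of it, and the size of the move is the distance to that centroid multiplied by a pull parameter. The algorithm actually appended to the model may have differed, and the names radius and pull are hypothetical stand-ins for the two free parameters.

```python
import math

def cluster_once(points, radius, pull):
    """Return new coordinates after one clustering pass.

    points: list of coordinate tuples in the scaling space
    radius: items within this distance of an item count as its neighbours
    pull:   fraction (0 to 1) of the distance to the local centroid moved
    """
    moved = []
    for p in points:
        # Every item is its own neighbour, so the list is never empty.
        neighbours = [q for q in points if math.dist(p, q) <= radius]
        centroid = [sum(coords) / len(neighbours) for coords in zip(*neighbours)]
        moved.append(tuple(x + pull * (cx - x) for x, cx in zip(p, centroid)))
    return moved

# Two nearby faces and one isolated face in a toy two-dimensional space.
faces = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(cluster_once(faces, radius=2.0, pull=0.5))
```

All new positions are computed from the original coordinates, so the result does not depend on the order in which the items are visited.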

A third mechanism that might help the morphs is the assumption of a global prototype that exists in addition to the individual exemplars. Such a model assumption is similar to the norm-based coding model proposed by Byatt & Rhodes (in press), Rhodes, Carey & Byatt (in press) and Valentine & Bruce (1986). In this model, some form of blending or abstraction mechanism is at work that creates a new trace in memory that represents a global prototype of all bald men. We do not know the location or strength of this prototype, but we can estimate its values on each dimension and its strength as free parameters. Thus to the SimSample model I added 7 more parameters: 6 parameters dealing with the location of the prototype on the 6 dimensions, and 1 free parameter that determined the strength of the prototype when computing its contribution to the probability of saying old via the SimSample process. This model only reduced the RMSE to 0.111, which is not a significant reduction in error given the addition of 7 free parameters (see Table 1 for F-values). In addition, the model did not reverse the discrepancy between the similar morphs and parents. Thus a global prototype does not seem to be a plausible extension to the SimSample model.
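The comparison of a model with its extension can be sketched with the standard F statistic for nested models. This is an assumption about the test, since the chapter reports F-values without stating the formula; the sketch also assumes the RMSE is corrected by dividing by n - p, and the number of data points used in the example is hypothetical.

```python
def sse_from_corrected_rmse(rmse, n, p):
    """Recover the summed squared error from an RMSE that was computed
    with n - p (data points minus free parameters) in the denominator."""
    return rmse ** 2 * (n - p)

def nested_f(sse_reduced, p_reduced, sse_full, p_full, n):
    """Standard nested-model F statistic: error reduction per added
    parameter, relative to the full model's residual error per degree
    of freedom."""
    numerator = (sse_reduced - sse_full) / (p_full - p_reduced)
    denominator = sse_full / (n - p_full)
    return numerator / denominator

# RMSE values from the text; n = 100 is a hypothetical number of data points.
n = 100
sse_simsample = sse_from_corrected_rmse(0.112, n, 9)
sse_global = sse_from_corrected_rmse(0.111, n, 16)
print(nested_f(sse_simsample, 9, sse_global, 16, n))
```

The statistic penalizes an extension that adds many parameters for a small drop in error, which is the sense in which a reduction in RMSE can fail to be significant.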

It is interesting to note that the global prototype could have completely dominated the individual exemplars by choosing a very large prototype weight. This would have been equivalent to the Norm-Based Coding model, which suggests that faces are coded relative to a global prototype rather than as individual exemplars. The failure of this model casts doubt on the norm-based coding model, and demonstrates that quantitative predictions are necessary to distinguish a prototype model from an exemplar model.

Prototype Extensions to the Exemplar-Based Model

Given the failure of both the exemplar-based and the global-prototype mechanisms to account for the finding that subjects choose the similar morphs over the similar parents, an alternative is to propose individual prototypes that form between parent faces and correspond to the morph locations. Similar extensions have been proposed in categorization work (e.g. Homa, Goldhardt, Burruel-Homa, & Smith, 1993). As with the previous prototype theories, these prototypes would act like faint traces in memory at the locations of the morphs in MDS space and could in principle help the similar morphs be chosen over the similar parents. The morphs were included in the original similarity rating experiment and so we know the locations of the morphs in MDS space. The morphing operation introduces artifacts into the resulting blended face (Busey, in press), moving it away from the midpoint in MDS space between the two parent faces. However, by including the morphs in the scaling solution, we presumably have eliminated these biases by informing the model of each morph’s true location.

One possible prototyping mechanism would blend nearby faces, creating a prototype trace in memory that would be treated from the perspective of a model as a faint version of a real trace. The strength of the prototype affects both the likelihood that a prototype is sampled, as well as the probability of saying old if it is sampled. In general, when sampling items from memory, the probability that face k is sampled (where k can now be either a parent, a target or a morph) is,

Eq 13

for faces actually studied, and,

Eq 14

for the morphs. In both Eq 13 and Eq 14, pw is the prototype weight that determines the strength of the prototype in the sampling process. Once a face has been sampled (and now a prototype may be the sampled face), the probability that the observer says old is related to the similarity between the test face and the sampled face as in Eq 8. This is modified such that if a morph prototype is sampled, the similarity used to compute the probability of saying old via Eq 9 is reduced by the prototype weight. This is in keeping with the idea that the prototype trace is fainter than a real face's trace, and this faintness influences both the sampling and decision processes. This assumption implies that we compute q(pw hi,k) when a prototype is sampled, rather than q(hi,k) as in Eq. 9. To compute the overall probability of saying old to item i, we compute,

Eq 15

which simply extends Eq 10 to include the possibility of sampling a prototype and, if one is indeed sampled, the resulting probability of saying old. Note that this addition of prototypes to the SimSample model is somewhat arbitrary, since it assumes that prototypes are only created between two parents and not between any other pairs of faces. However, since we are only probing the locations between two parents with the morphs, this seems like a reasonable assumption.
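The prototype extension can be sketched as follows. Since the sampling equations themselves are not reproduced here, the sketch rests on an assumed reading of the verbal description: a prototype enters the sampling rule with strength pw times its similarity to the test face, and the same weighted similarity is passed to the noisy-criterion decision function. All names and values are hypothetical.

```python
import math

def q_noisy(similarity, criterion, crit_sd):
    """Cumulative-gaussian decision function with a noisy criterion."""
    z_score = (similarity - criterion) / crit_sd
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))

def p_old_with_prototypes(studied_sims, proto_sims, proto_weights, criterion, crit_sd):
    """Probability of saying old when memory holds studied traces plus
    fainter prototype traces.

    studied_sims:  similarity of the test face to each studied trace
    proto_sims:    similarity of the test face to each prototype location
    proto_weights: prototype weight pw for each prototype (assumed form)
    """
    # A prototype's sampling strength and decision similarity are both pw * h.
    proto_strengths = [pw * h for pw, h in zip(proto_weights, proto_sims)]
    total = sum(studied_sims) + sum(proto_strengths)
    p = sum((h / total) * q_noisy(h, criterion, crit_sd) for h in studied_sims)
    p += sum((s / total) * q_noisy(s, criterion, crit_sd) for s in proto_strengths)
    return p

# A morph test face that sits on its own prototype, near two studied parents.
without = p_old_with_prototypes([0.55, 0.55], [1.0], [0.0], criterion=0.6, crit_sd=0.05)
with_proto = p_old_with_prototypes([0.55, 0.55], [1.0], [0.8], criterion=0.6, crit_sd=0.05)
print(without, with_proto)
```

Setting every prototype weight to zero recovers the original SimSample prediction, so the extension nests the original model.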

The prototype strength (pw) for morph face i was assumed to be a function of the distance between the two parent faces, under the assumption that blending is more likely to occur between two similar faces than between two dissimilar faces. Thus,

pw_i = (h_{i,p1} + h_{i,p2}) r        (Eq 16)

which gives the prototype model one additional free parameter, r, which determines the relation between distance and prototype strength. The results of this model fit are shown in Figure 5. Not only does this model provide a significant decrease in the RMSE, it places the similar morphs to the right of the similar parents, which previous models failed to do. Thus this model can account for the finding that similar morphs are chosen over their parents in forced-choice.

Applications of Face-Space Modeling

The successes of the current quantitative modeling provide direction for future work. Much of the face-recognition literature revolves around three central themes: how faces are represented in memory (i.e. as exemplars or relative to a central prototype), how faces are retrieved from memory (via a two-process model involving both familiarity and memorability (Vokey & Read 1992) or just involving memorability (Valentine, 1991a,b)), and how the structure of the face-space can affect the storage and recognition process. This third theme has been discussed in a number of domains, including applications to the cross-racial identification data, in which members of another race are represented in a separate cluster in MDS space, where the individual exemplars are grouped together more tightly, making individual identifications more difficult (Chiroro & Valentine, 1995).

The models applied to the forced-choice recognition data in this chapter allow a number of conclusions about these themes. First, we find much more support for an exemplar-based representation than a central-prototype representation, although in some instances it appears that prototypes are necessary to account for very typical faces. It is important to point out that an exemplar-based model that had a different formulation of typicality or a different sampling mechanism might account for the similar morphs without assuming prototypes. The current modeling merely demonstrates the failures of existing exemplar-based models. The fact that we find a poor fit from the global prototype model suggests that the norm-based coding model is not a reasonable model to account for recognition data, although it may be useful for other types of comparisons where a rating is made on a face relative to some standard (e.g., he is attractive for his age). In general, discriminating between exemplar-based and prototype-based models is difficult without a representation of the scaling space as input to a quantitative model.

The SimSample model demonstrates how a sampling process in conjunction with a similarity-based decision mechanism could incorporate mechanisms that account for both a familiarity-based recognition mechanism and a recall-based mechanism. The sampling process that depends upon the similarity structure of the faces in memory tends to favor distinctive items, which corresponds to the memorability component of Vokey & Read (1992). The tendency of typical items to lie near one another and be mis-sampled during the sampling process will increase both the hit and false alarm rate for typical items, which previously was associated with a separate familiarity-based mechanism. Evidence in favor of the SimSample model comes from the previous work done on the various components of the model. There is strong evidence in favor of the exponential similarity gradient, as well as a great deal of work on the Luce Choice Rule. The sampling process comes directly from the SAM model (Gillund and Shiffrin, 1984). Thus none of the assumptions underlying the model are ad-hoc.

While the SimSample model could account for both the effects of distinctiveness and typicality, it failed to account for the effects of very typical faces, as demonstrated by its inability to account for the behavior of the similar morphs. There are several possible ways to fix this problem. One is the prototype fix described above. The second possibility is an alternative mechanism that provides a better account of the most typical faces, perhaps by adopting a different sampling or response mechanism. Third, there might be a second familiarity-based mechanism at work, in addition to the current model. This possibility was investigated by adding Nosofsky’s Generalized Context Model (GCM, Nosofsky, 1986) to the SimSample process, which effectively increased the probability of choosing very typical faces. The addition of this explicit familiarity component to the SimSample model did not improve the fit, nor did it place the similar morphs above the similar parents.

A final possibility is that there is something unusual about the morphs that tends to make them seem familiar. This quality would have to lie outside the dimensions recovered by the similarity ratings experiment. For example, morphs appear smoother and younger than their parent faces, and this may have attracted responses in the forced-choice task.

This raises the larger issue of the limits of the geometric model. Levin (1996) points out that in cross-racial classification, the dimensional information used is different from that used to make cross-racial identifications. One might resort to shifting attention along different dimensions, as the modeling in this chapter has adopted, but this will work only if the recovered dimensions from the MDS solution correspond to the dimensions that are used in recognition or classification. Alternatively, a subspace model might be adopted, that defines in advance which dimensions are relevant for a particular task.

These limitations should not be seen as disconfirmation of what I believe is a very promising approach. The MDS approach allows predictions for individual stimuli, which in turn provides evidence for the role of the similarity structure of faces. This structure can then be used to ask questions about the retrieval mechanisms that enable recognition. There are important links that can be made between a large literature involving geometric models and an equally large literature involving global memory models. Faces seem to be an elegant and important stimulus that can be used to bridge both literatures.



Bartlett, J., Hurry, S., & Thorley, W. (1984). Typicality and familiarity of faces. Memory & Cognition, 12, 219-228.

Brown, J., Lewis, V.J., & Monk, A.F. (1977). Memorability, word frequency and negative recognition. Quarterly Journal of Experimental Psychology, 29, 461-473.

Busey, T. (in press). Physical and psychological representations of faces: Evidence from morphing. Psychological Science.

Busey, T. A., & Tunnicliff, J. (submitted). Accounts of blending, typicality and distinctiveness in face recognition. Submitted to Journal of Experimental Psychology: Learning, Memory, and Cognition.

Byatt, G., & Rhodes, G. (in press). Recognition of own-race and other-race caricatures: Implications for models of face recognition. Vision Research.

Chiroro, P. & Valentine, T. (1995). An investigation of the contact hypothesis of the own-race bias in face recognition. The Quarterly Journal of Experimental Psychology, 48A, 879-894.

Clark, S. (in press). A familiarity-based account of confidence-accuracy inversions in recognition memory. Journal of Experimental Psychology: Learning Memory and Cognition.

Edelman, S. & Intrator, N. (submitted). Learning as extraction of low-dimensional representations.

Gillund, G., & Shiffrin, R. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.

Homa, D., Goldhardt, B., Burruel-Homa, L., & Smith, J.C. (1993). Influence of manipulated category knowledge on prototype classification and recognition. Memory & Cognition, 21(4), 529-538.

Jacoby, L.L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory and Language, 30, 513-541.

Jacoby, L.L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kayser, A. (1985). Heads. New York: Abbeville Press.

Krumhansl, C. L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.

Levin, D. (1996). Classifying faces by race: The structure of face categories. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 1364-1382.

Light, L. L., Kayra-Stuart, F., & Hollander, S. (1979). Recognition memory for typical and unusual faces. Journal of Experimental Psychology: Human Learning & Memory, 5, 212-228.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87, 252-271.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

Nosofsky, R.M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

Nosofsky, R.M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 87-108.

O'Toole, A.J., Bartlett, J.C., & Abdi, H. (submitted). A signal detection model applied to the stimulus: Understanding covariances in face recognition experiments in the context of face sampling distributions.

O'Toole, A.J., Deffenbacher, K.A., Valentin, D., & Abdi, H. (1994). Structural aspects of face recognition and the other-race effect. Memory & Cognition, 22, 208-224.

Rhodes, G., Carey, S., & Byatt, G. (in press). Coding spatial variations in faces and simple shapes: A test of two models. Vision Research.

Shepard, R. N. (1974). Representation of structure in similarity data: Problems and prospects. Psychometrika, 39, 373-421.

Solso, R. L., & McCarthy, J. E. (1981). Prototype formation of faces: A case of pseudo-memory. British Journal of Psychology, 72, 499-503.

Valentine, T. (1991a). A unified account of the effects of distinctiveness, inversion, and race in face recognition. The Quarterly Journal of Experimental Psychology, 43A, 161-204.

Valentine, T. (1991b). Representation and process in face recognition. In R. Watt (Ed.), Vision and visual dysfunction. Vol. 14: Pattern recognition in man and machine (series editor, J. Cronly-Dillon). London: Macmillan.

Valentine, T., & Bruce, V. (1986). The effects of distinctiveness in recognizing and classifying faces. Perception, 15, 525-535.

Vokey, J., & Read, J. (1992). Familiarity, memorability and the effect of typicality on the recognition of faces. Memory & Cognition, 20, 291-302.

Wells, G.L., & Lindsay, R.C.L. (1985). Methodological notes on the accuracy-confidence relation in eyewitness identifications. Journal of Applied Psychology, 70, 413-419.




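[Table 1 body: the parameter values did not survive in this version. Recoverable entries, in order of appearance: column headers "# p" and "Crit. F"; row labels SimSample-Gaussian Noise, SimSample-Global Prototype and SimSample-Proportional Prototypes; obtained F-values 5.06 * (row label lost), 1.21 (NS) for SimSample-Global Prototype, and 31.10 * for SimSample-Proportional Prototypes.]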

Table 1. Parameter values for all fits. The obtained F-values compare each model with the original SimSample model. # p is the number of free parameters. RMSE is the root-mean-squared error, corrected by subtracting the number of parameters (p) from the number of data points (n) in the denominator: $\mathrm{RMSE} = \sqrt{\sum_{j=1}^{n} (o_j - \hat{o}_j)^2 / (n - p)}$, where $o_j$ is the observed and $\hat{o}_j$ the predicted value for data point j.



[Table 2 body: the cell values did not survive in this version. Columns: Forced-Choice Data; SimSample Model Fit; SimSample + Gaussian Similarities; SimSample + Clustering; SimSample + Global Prototype; SimSample + Proportional Prototypes Fit. Surviving row labels: Similar Morphs; Similar Parents; Dissimilar Morphs; Dissimilar Parents.]







Table 2. Mean probability of choosing data for the 6 experimental conditions, along with the fits for the various models. Only the Proportional Prototypes model can account for the reversal between Similar Morphs and Parents (Bold numbers).



Figure 1. Hypothetical 'Face-Space' derived from MDS procedures applied to similarity ratings on all pairs of faces. Each face is represented as a point in this space, and values such as distance can be computed directly from this representation.



Figure 2. Fit of GCM-Identification model.



Figure 3. Predictions of the SimSample model to Distinctive Targets, Distinctive Distracters, and Typical Distracters (morphs). Upper Left: A distinctive target is very likely to sample itself and thus has a high hit rate. Upper Right: A distinctive distracter cannot sample itself and may not have any nearby faces that could produce a false alarm if sampled. Bottom Panel: A very typical distracter may produce a false alarm if a nearby item is sampled by mistake and is within the criterion for responding old. Typical target faces will have a high hit rate if either the face samples itself or samples a nearby target that lies inside the criterion.



Figure 4. Fit of SimSample model.



Figure 5. Fit of Proportional Prototypes version of the SimSample model. This model places the Similar Morphs to the right of the Similar Parents, which previous models could not do.