BEAGLE Model Code
This page contains demonstration code to implement the BEAGLE model described in:
Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114, 1-37. [PDF]
The code is implemented in Fortran 95 and uses the FFT libraries from the Numerical Algorithms Group (www.nag.com). We are rewriting the code as a standalone Python program, but the Fortran NAG routines are currently the most efficient option for learning realistic-size corpora. This demo code learns from a small corpus of text borrowed from Wikipedia containing articles on sports, biology, and English literature. You can adapt the code to learn from any corpus you choose, but the input must be stripped of formatting, converted to lowercase, and contain one sentence per line.
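As an illustration of the input format described above, here is a minimal Python sketch that turns raw text into lowercase, one-sentence-per-line form. It is not part of the BEAGLE distribution: the function name prepare_corpus, the naive sentence-splitting rule, and the choice of which characters to keep are all assumptions made for this example, and a real corpus may need more careful sentence segmentation.

```python
import re


def prepare_corpus(raw_text):
    """Return a list of cleaned sentences, one per entry (hypothetical helper)."""
    # Naive sentence split (assumption): break after '.', '!' or '?'
    # when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    cleaned = []
    for sentence in sentences:
        # Lowercase, then replace anything that is not a lowercase letter,
        # digit, apostrophe, or space with a space.
        text = re.sub(r"[^a-z0-9' ]+", " ", sentence.lower())
        # Collapse the runs of whitespace left behind by the substitution.
        text = " ".join(text.split())
        if text:
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = "The batter hit the ball. Darwin studied finches!"
    # Print one cleaned sentence per line, as the model expects.
    print("\n".join(prepare_corpus(sample)))
```

Writing the returned list to a file with one entry per line gives a corpus in the shape the main program expects.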
Click here to download all the files as a gzipped tarball, or download each file individually:
BEAGLE.f95 is the main program. Number_Generators and Reading_Tools contain modules of subroutines used by the main program (and must be ordered first in the compile line). Stoplist contains a list of function words that the model will ignore when computing context information. It can be edited as desired, and is probably overly restrictive for the small sample corpus in its present form.
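To make the role of the stoplist concrete, the following Python sketch filters function words out of a sentence before context information would be computed. It is only an illustration: the helper names are hypothetical, and the one-word-per-line layout assumed for the stoplist file is a guess about the format, not something taken from the Fortran code.

```python
def load_stoplist(lines):
    """Build a set of function words from an iterable of lines.

    Assumption: the stoplist holds one word per line.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def content_words(sentence, stoplist):
    """Return the words of a sentence that are not on the stoplist, in order."""
    return [word for word in sentence.split() if word not in stoplist]


if __name__ == "__main__":
    # A tiny in-memory stoplist stands in for the real Stoplist file.
    stoplist = load_stoplist(["the\n", "of\n", "on\n"])
    print(content_words("on the origin of species", stoplist))
```

Removing entries from the stoplist lets more words contribute to the context vectors, which may help with a corpus as small as the sample one.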
The files Nearest_Neighbors.f95 and Cosine_Between.f95 are stand-alone programs to query the memory matrix learned from a particular run of the BEAGLE model. Nearest Neighbors returns the N nearest neighbors to a word in the semantic space.
Syntax: Neighbors <WORD> <N>
E.g.: "Neighbors baseball 20" finds the 20 memory vectors
most similar to the baseball vector
For example, in context space, "Neighbors batter 20" from the
small Wikipedia corpus returns:
20 Nearest Neighbors to [batter]:
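The nearest-neighbor query described above can be sketched in Python as a cosine ranking over all memory vectors. This is an illustrative sketch, not a translation of Nearest_Neighbors.f95: the dictionary used to hold the memory matrix, the function names, the toy three-dimensional vectors, and the decision to exclude the query word from its own neighbor list are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Treat a zero vector as unrelated to everything.
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, n, memory):
    """Return the n (word, cosine) pairs most similar to `word`.

    `memory` is a hypothetical dict mapping each word to its vector.
    """
    target = memory[word]
    scored = [
        (other, cosine(target, vector))
        for other, vector in memory.items()
        if other != word  # Assumption: skip the query word itself.
    ]
    # Highest cosine first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Toy vectors invented for this example; real vectors are much longer.
    toy_memory = {
        "baseball": [1.0, 0.9, 0.0],
        "batter": [0.9, 1.0, 0.1],
        "darwin": [0.0, 0.1, 1.0],
    }
    for neighbor, score in nearest_neighbors("baseball", 2, toy_memory):
        print(neighbor, round(score, 4))
```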
Cosine Between computes cosines (or distances) between pairs of words provided in an input file.
Syntax: Cosine_Between <FILE>
The first line of FILE must contain the number of comparisons N (integer),
and each of the subsequent N lines contains one pair of words in
lowercase, separated by a space
If either word does not appear in the lexicon, N/A is output with a zero
cosine--note that this zero should be coded as a missing value in
The cosine of the vectors can be replaced by the Euclidean distance
by replacing the call to "Vector_Cosine" with a call to the inline
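The input format and the cosine/distance switch described above can be illustrated with a short Python sketch. It is not the Fortran program: the function names, the dictionary standing in for the memory matrix, and the use of None to mark a pair with a missing word are assumptions chosen for this example. The None marker reflects the note above that the zero reported for unknown words should be treated as a missing value.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def compare_pairs(lines, memory, measure=cosine):
    """Score word pairs listed in the Cosine_Between input layout.

    `lines[0]` holds the number of comparisons; each following line holds
    two lowercase words separated by a space. `memory` is a hypothetical
    dict mapping words to vectors.
    """
    count = int(lines[0].strip())
    results = []
    for line in lines[1:count + 1]:
        first, second = line.split()
        if first in memory and second in memory:
            results.append((first, second, measure(memory[first], memory[second])))
        else:
            # Assumption: mark the pair as missing instead of reporting zero.
            results.append((first, second, None))
    return results


if __name__ == "__main__":
    toy_memory = {"batter": [1.0, 0.0], "pitcher": [0.0, 1.0]}
    pairs = ["2", "batter pitcher", "batter finch"]
    print(compare_pairs(pairs, toy_memory))
    print(compare_pairs(pairs, toy_memory, measure=euclidean))
```

Passing the measure as a parameter mirrors the idea of swapping one call for another without changing the rest of the program.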
Code to estimate order transitions from the memory matrix will be up shortly. We hope to provide all of these routines through a web interface in the near future.
For more information, or if you have specific lists of words that you want BEAGLE cosines for, contact Mike Jones: firstname.lastname@example.org
For a simpler version of the model using only context information (and not convolution), you can check out the Semantics demonstration code on Mike’s teaching website here. This code is basically BEAGLE without the order information.