BEAGLE Model Code
This page contains demonstration code to implement the BEAGLE model described in:
Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114, 1-37. [PDF]
The code is implemented in Fortran 95 and uses the FFT libraries from the Numerical Algorithms Group (www.nag.com). We are rewriting the code as standalone Python, but the Fortran NAG routines remain the best option for learning from realistic-size corpora. This demo code learns from a small corpus of text borrowed from Wikipedia, containing articles on sports, biology, and English literature. You can adapt the code to learn from any corpus you choose, but the input must be stripped of formatting, converted to lowercase, and arranged with one sentence per line.
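The input requirements above (no formatting, lowercase, one sentence per line) can be met with a small preprocessing script. The following is a minimal Python sketch, not part of the BEAGLE distribution. The function name prepare_corpus, the naive sentence splitter, and the decision to drop punctuation and digits are all assumptions made for illustration; check them against what the Fortran reading routines actually expect.

```python
import re

def prepare_corpus(raw_text):
    """Return a list of lowercase sentences, one per output line.

    Assumption: punctuation and digits are treated as formatting and removed.
    """
    # Collapse all whitespace (including line breaks) to single spaces.
    text = re.sub(r"\s+", " ", raw_text).strip()
    # Naive sentence split: break after '.', '!' or '?' followed by a space.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    cleaned = []
    for sentence in sentences:
        # Lowercase, then replace anything that is not a letter,
        # an apostrophe, or a space with a space.
        sentence = re.sub(r"[^a-z' ]+", " ", sentence.lower())
        sentence = re.sub(r"\s+", " ", sentence).strip()
        if sentence:
            cleaned.append(sentence)
    return cleaned

if __name__ == "__main__":
    sample = "The batter hit the ball.  It was\ncaught! Was it a foul?"
    print("\n".join(prepare_corpus(sample)))
```

The splitter will break on abbreviations such as "e.g.", so a real corpus may need a proper sentence tokenizer before this step.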
Click here to download all the files as a gzipped tarball, or download each file individually:
BEAGLE.f95 is the main program. Number_Generators and Reading_Tools contain modules of subroutines used by the main program (and must come first on the compile line). Stoplist contains a list of function words that the model will ignore when computing context information. It can be edited as desired, and in its present form it is probably overly restrictive for the small sample corpus.
The files Nearest_Neighbors.f95 and Cosine_Between.f95 are stand-alone programs to query the memory matrix learned from a particular run of the BEAGLE model. Nearest Neighbors returns the N nearest neighbors to a word in the semantic space.
Syntax: Neighbors <WORD> <N>
E.g.: "Neighbors baseball 20" finds the 20 memory vectors most
similar to the baseball vector
For example, in context space, "Neighbors batter 20" from the
small Wikipedia corpus returns:
================================================
20 Nearest Neighbors to [batter]:
================================================
batter     1.0000001
ball       0.7821321
hit        0.76954096
fair       0.7437778
pitch      0.72545194
ground     0.70555484
attempt    0.70481705
base       0.69794034
put        0.69120353
runners    0.6387966
foul       0.6332392
plate      0.62975984
touch      0.6270346
fly        0.6193897
strike     0.6137481
strikes    0.60961693
contact    0.6070126
pitcher    0.598485
swing      0.592292
caught     0.58490586
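The nearest-neighbor query amounts to ranking every memory vector by its cosine with the target word's vector. The Python sketch below illustrates that idea on toy data; it is not the Fortran program. The dictionary toy_memory, the function names, and the three-dimensional vectors are hypothetical stand-ins for a learned memory matrix. As in the listing above, the query word itself is returned as its own closest neighbor.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as unrelated
    return dot / (norm_u * norm_v)

def nearest_neighbors(memory, word, n):
    """Return the n (word, cosine) pairs most similar to `word`."""
    target = memory[word]
    scored = [(other, cosine(target, vec)) for other, vec in memory.items()]
    # Highest cosine first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

if __name__ == "__main__":
    # Hypothetical toy vectors, not output from BEAGLE.
    toy_memory = {
        "batter": [1.0, 0.0, 1.0],
        "ball": [1.0, 0.2, 0.8],
        "poem": [0.0, 1.0, 0.0],
    }
    for neighbor, score in nearest_neighbors(toy_memory, "batter", 2):
        print(f"{neighbor:<10}{score:.4f}")
```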
Cosine Between computes cosines (or distances) between pairs of words provided in an input file.
Syntax: Cosine_Between <FILE>
The first line of FILE must contain the number of comparisons N (an integer),
and each of the following N lines must contain one pair of lowercase words
separated by a space.
If either word does not appear in the lexicon, N/A is output with a zero
cosine. Note that this zero should be coded as a missing value in
subsequent analyses.
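An input file in this layout (a count on the first line, then one lowercase pair per line) is easy to generate from a list of word pairs. The following Python sketch shows one way to do so; the function name format_pairs_file and the example pairs are hypothetical and not part of the distributed code.

```python
def format_pairs_file(pairs):
    """Build the text of a Cosine_Between input file from (word, word) pairs."""
    # First line: the number of comparisons.
    lines = [str(len(pairs))]
    for first, second in pairs:
        # One pair per line, lowercase, separated by a single space.
        lines.append(f"{first.lower()} {second.lower()}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    example_pairs = [("batter", "pitcher"), ("Baseball", "poem")]
    print(format_pairs_file(example_pairs), end="")
```

Saving the returned text to a file gives something that can be passed as the FILE argument.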
The cosine of the vectors can be replaced by the Euclidean distance
by replacing the call to "Vector_Cosine" with a call to the inline
function "Distance".
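To make the difference between the two measures concrete, here is a small Python sketch of a cosine and a Euclidean distance that can be swapped for one another. The names vector_cosine and distance only echo the Fortran routine names; their bodies, the compare helper, and the use of None for a word missing from the lexicon are assumptions for illustration, not a transcription of the Fortran source.

```python
import math

def vector_cosine(u, v):
    """Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def distance(u, v):
    """Euclidean distance: 0 for identical vectors, larger when further apart."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def compare(memory, first, second, measure=vector_cosine):
    """Apply `measure` to two words, or return None if either is unknown.

    Returning None (rather than zero) keeps missing words from being
    mistaken for genuinely unrelated ones in later analyses.
    """
    if first not in memory or second not in memory:
        return None
    return measure(memory[first], memory[second])

if __name__ == "__main__":
    # Hypothetical toy vectors, not output from BEAGLE.
    toy_memory = {"batter": [1.0, 0.0], "pitcher": [0.0, 1.0]}
    print(compare(toy_memory, "batter", "pitcher"))
    print(compare(toy_memory, "batter", "pitcher", measure=distance))
    print(compare(toy_memory, "batter", "sonnet"))
```

Note that the two measures run in opposite directions: a larger cosine means more similar, whereas a larger distance means less similar.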
Code to estimate order transitions from the memory matrix will be up shortly. We hope to provide all of these routines through a web interface in the near future.
For more information, or if you have specific lists of words that you want BEAGLE cosines for, contact Mike Jones: jonesmn@indiana.edu
For a simpler version of the model using only context information (and not convolution), you can check out the Semantics demonstration code on Mike's teaching website here. This code is basically BEAGLE without the order information.