BEAGLE Model Code

 

 

This page contains demonstration code to implement the BEAGLE model described in:

 

Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114, 1-37. [PDF]

 

The code is implemented in Fortran 95 and uses the FFT libraries from the Numerical Algorithms Group (www.nag.com). We are currently rewriting the code as a standalone Python program, but the Fortran NAG routines remain the best option for learning from realistically sized corpora. This demo code learns from a small corpus of text borrowed from Wikipedia, containing articles on sports, biology, and English literature. You can adapt the code to learn from any corpus you choose, but the input must be stripped of formatting, converted to lowercase, and arranged with one sentence per line.
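The corpus requirements above (no formatting, lowercase, one sentence per line) can be met with a small preprocessing pass. The following is a minimal Python sketch, not part of the distributed code: the function name prepare_corpus is hypothetical, the sentence splitter is deliberately naive (it breaks after ".", "!" or "?"), and which characters to keep is an assumption you may want to change for your corpus.

```python
import re

def prepare_corpus(raw_text):
    """Return a list of lowercase, punctuation-free sentences, one per item."""
    # Drop anything that looks like an HTML/XML tag (assumed markup style).
    text = re.sub(r"<[^>]+>", " ", raw_text)
    # Collapse all whitespace, including newlines, so wrapped sentences rejoin.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    pieces = re.split(r"(?<=[.!?])\s+", text)
    sentences = []
    for piece in pieces:
        # Lowercase, then keep only letters, digits, apostrophes, and spaces.
        cleaned = re.sub(r"[^a-z0-9' ]+", " ", piece.lower())
        cleaned = " ".join(cleaned.split())
        if cleaned:
            sentences.append(cleaned)
    return sentences

if __name__ == "__main__":
    demo = "The batter hit the ball. It was\nruled a <b>fair</b> ball!"
    # Writes one cleaned sentence per line, the layout the model expects.
    print("\n".join(prepare_corpus(demo)))
```

Abbreviations such as "Dr." will be split incorrectly by this simple rule, so a real corpus may need a proper sentence tokenizer before this step.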

 

Click here to download all the files as a gzipped tarball, or download each file individually:

 

 

BEAGLE.f95 is the main program. Number_Generators and Reading_Tools contain modules of subroutines used by the main program, so they must appear before BEAGLE.f95 on the compile line. Stoplist contains a list of function words that the model ignores when computing context information. It can be edited as desired; in its present form it is probably overly restrictive for the small sample corpus.
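To illustrate what ignoring stoplisted words means when context information is computed, here is a minimal Python sketch. It is not a translation of BEAGLE.f95. It assumes each word has a fixed random Gaussian "environment" vector and that a word's memory vector is the sum of the environment vectors of the other non-stoplisted words in the same sentence. The function name learn_context, the vector size, and the choice to give stoplisted words no memory vector at all are assumptions made for the example, and order information is left out entirely.

```python
import random

DIM = 64  # hypothetical vector size for the demo; realistic runs use far more

def learn_context(sentences, stoplist, dim=DIM, seed=0):
    """Return (memory, environment) dictionaries mapping word -> vector."""
    rng = random.Random(seed)
    environment = {}  # fixed random vector per word
    memory = {}       # accumulated context vector per word

    def env(word):
        # Create the word's random vector the first time it is needed.
        if word not in environment:
            sd = dim ** -0.5
            environment[word] = [rng.gauss(0.0, sd) for _ in range(dim)]
        return environment[word]

    for sentence in sentences:
        # Stoplisted words are dropped before any context is computed.
        words = [w for w in sentence.split() if w not in stoplist]
        for i, target in enumerate(words):
            mem = memory.setdefault(target, [0.0] * dim)
            for j, other in enumerate(words):
                if j == i:
                    continue  # a word is not part of its own context
                vec = env(other)
                for k in range(dim):
                    mem[k] += vec[k]
    return memory, environment

if __name__ == "__main__":
    memory, _ = learn_context(["the batter hit the ball"], {"the"})
    print(sorted(memory))
```

Shrinking the stoplist lets more words contribute to context, which may help on a corpus as small as the sample one.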

 

The files Nearest_Neighbors.f95 and Cosine_Between.f95 are stand-alone programs that query the memory matrix learned from a particular run of the BEAGLE model. Nearest Neighbors returns the N nearest neighbors to a word in the semantic space.

 

 Syntax: Neighbors <WORD> <N>

 E.g.: "Neighbors baseball 20" finds the 20 memory vectors most similar
 to the baseball vector

 

 For example, in context space, "Neighbors batter 20" from the

 small Wikipedia corpus returns:

 

 ================================================
 20 Nearest Neighbors to [batter]:
 ================================================
 batter     1.0000001
 ball       0.7821321
 hit        0.76954096
 fair       0.7437778
 pitch      0.72545194
 ground     0.70555484
 attempt    0.70481705
 base       0.69794034
 put        0.69120353
 runners    0.6387966
 foul       0.6332392
 plate      0.62975984
 touch      0.6270346
 fly        0.6193897
 strike     0.6137481
 strikes    0.60961693
 contact    0.6070126
 pitcher    0.598485
 swing      0.592292
 caught     0.58490586
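The query itself is easy to sketch. The Python below is an illustration only, assuming the memory matrix has already been loaded into a dictionary that maps each word to a list of floats and that similarity is measured by the cosine. The names cosine and nearest_neighbors and the toy vectors are hypothetical, and the sketch does not read the files written by BEAGLE.f95.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(memory, word, n):
    """Return the n (word, cosine) pairs most similar to the given word."""
    target = memory[word]
    scored = [(other, cosine(target, vec)) for other, vec in memory.items()]
    # Highest cosine first; the query word itself ranks at the top.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

if __name__ == "__main__":
    # Toy two-dimensional vectors, made up for the demonstration.
    toy = {"batter": [1.0, 0.0], "ball": [1.0, 1.0], "novel": [0.0, 1.0]}
    for neighbor, score in nearest_neighbors(toy, "batter", 2):
        print(neighbor, score)
```

As in the listing above, the query word appears as its own nearest neighbor with a cosine of 1 (up to floating-point rounding), so ask for N + 1 results if you want N other words.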

 

 

Cosine Between computes cosines (or distances) between pairs of words provided in an input file.

 

 Syntax: Cosine_Between <FILE>

 

 The first line of FILE must contain the number of comparisons, N (an
 integer). Each of the following N lines contains one pair of lowercase
 words separated by a space.

 

 If either word does not appear in the lexicon, N/A is output with a
 cosine of zero. This zero is only a placeholder and should be coded as
 a missing value in subsequent analyses.

 

 To report the Euclidean distance instead of the cosine, replace the
 call to "Vector_Cosine" with a call to the inline function "Distance".
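The input format and the missing-word rule can be sketched in Python as follows. This is a hedged illustration, not the Fortran program. The function names vector_cosine and distance only echo the routine names mentioned above and their bodies are assumptions, cosine_between is a hypothetical name, and the sketch returns None for a missing word instead of printing N/A with a zero, so the value cannot be mistaken for a real cosine later.

```python
import math

def vector_cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_between(lines, memory, measure=vector_cosine):
    """Score word pairs listed in the Cosine_Between input format.

    lines[0] holds the number of comparisons N; the next N lines each
    hold two lowercase words separated by a space.
    """
    count = int(lines[0])
    results = []
    for line in lines[1:count + 1]:
        first, second = line.split()
        if first in memory and second in memory:
            score = measure(memory[first], memory[second])
        else:
            score = None  # missing value, not a real zero
        results.append((first, second, score))
    return results

if __name__ == "__main__":
    # Toy vectors made up for the demonstration.
    toy = {"batter": [1.0, 0.0], "pitcher": [1.0, 1.0]}
    pairs = ["2", "batter pitcher", "batter zebra"]
    for row in cosine_between(pairs, toy):
        print(row)
```

Passing measure=distance switches the output from cosines to Euclidean distances, mirroring the substitution described above.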

 

Code to estimate order transitions from the memory matrix will be posted shortly. We hope to provide all of these routines through a web interface in the near future.

 

For more information, or if you have specific lists of words that you want BEAGLE cosines for, contact Mike Jones: jonesmn@indiana.edu

 

For a simpler version of the model using only context information (and not convolution), you can check out the Semantics demonstration code on Mike’s teaching website here. This code is basically BEAGLE without the order information.

 

 

 
