Q550 Maximal Likelihood and Bayesian Model Comparison Homework; Prof. John K. Kruschke
Q550 Models in Cognitive Science, Prof. Kruschke

Homework: Maximal Likelihood and Bayesian Model Comparison

Modified Thursday Feb. 24, 2005

Please turn in your Matlab program, clearly commented, and its output (with the answers to the questions below clearly annotated).

Suppose we are measuring some value of interest, such as the simple response time (RT) to a flash of light. Here are the RT's, in msec: 345, 350, 365, 385, 410, 495.

Consider two models of RT:

where t is response time and μ and σ are parameters (such that μ>0 and σ>0). We assume that the data are independent, so p(data|μ,σ) is the product of p(ti|μ,σ) for the data = {ti}.

1. Using fminsearch (i.e., not using a closed form solution), find the maximal likelihood parameter values for each model. That is, find the parameter values that maximize the probability of the data.

Hint: Rather elaborate explanation is provided in the Matlab example below.

2A. Suppose we have the following prior distribution over the parameter values: p(μ,σ) = k * 1/μ * 1/σ, for μ = 250 to 500, sampled every 5, for σ = 1 to 151, sampled every 3, and k is set such that the sum of the probabilities over that grid is 1.0. Show a graph of this prior probability distribution.

Hint:
m = 250 +(5/2) : 5 : 500 -(5/2) ; % row vector of possible mean values
s = 1 +(3/2) : 3 : 151 -(3/2) ; % row vector of possible sigma values
pMS = (1 ./ m)' * (1 ./ s) ; % matrix of (1/m)*(1/s) for each m,s combination
pMS = pMS / sum(sum(pMS)) ; % Normalized: each cell has a probability, not a density.
% At this point, sum(sum(pMS)) = 1.0;

2B. For each model, determine the likelihood over the parameter values (on that grid). That is, determine p(data|μ,σ) for every point on the grid. Show a surface graph. Do the peaks of these likelihoods match the solutions you found in Question 1?

Hint: See Matlab code for #1. Because we're dealing with only six data points here, you can compute the product of the six p(datum|m,s) without taking log's.

2C. For each model, determine the posterior distribution over the parameter values (on that grid), given the data. That is, determine p(μ,σ|data) from p(μ,σ) and p(data|μ,σ). Show a graph of the posterior distribution. When you sum (i.e., integrate) across the parameter grid you'll be computing p(data|model), which you'll need for the next part...

Hint: Let M denote the Model, D denote the Data, and R denote the paRameters (shorthand for m,s). Then
p(D|M)
= Integral_R p(D|R,M) p(R|M) dR        where p(R|M) is probability density
= Sum_R p(D|R,M) p(R|M)        where p(R|M) is probability (pMS) determined in 2A.

2D. Compute p(Gaussian|data) and p(Exponential|data), assuming equal priors on the two models.

Hint: Let M1 denote Model 1, D denote the Data, and R denote the paRameters. Then

p(M1|D)
= p(D|M1) p(M1) / p(D)
= p(D|M1) p(M1) / [ p(D|M1) p(M1) + p(D|M2) p(M2) ]

where the priors are unbiased, i.e., p(M1) = p(M2) = 0.5,
and where p(D|M) is explained in the previous hint.

Some Matlab hints:

  • Suppose that the particular data response times are in the vector
    > t = [ t1 t2 t3 t4 t5 t6 ];

    The exponential model probabilities can be computed by the sequence of commands
    > p = zeros(size(t));
    > p(t>=m) = (1/s)* exp( -(1/s) * ( t(t>=m) - m ) );

    Notice the use of logical indexing such as t(t>=m).

  • When calling fminsearch, there is no way to keep the parameters non-negative. That is, fminsearch does not allow you to specify limits on the parameter space. Instead, do this: In your definition of the model function, check whether a parameter passed into the function is negative, and if it is, send back a Really Bad Fit value to fminsearch so that it avoids that parameter value in the future:
    > function fitValue = ExponentialModel( paramVector )
    > m = paramVector(1) ;
    > σ = paramVector(2) ;
    > ReallyBadFit = 9.9e+99 ; % arbitrary value
    > if m < 0
    > fitValue = ReallyBadFit;
    > return;
    > end
    > % etc.

  • It is good programming style to keep the arbitrary data from an experiment out of the definition of your model. That is, the model really has two types of arguments: (1) the parameters and (2) the data and experiment design constants. The model takes those two arguments and returns a fit value. You can pass both kinds of arguments to your model through fminsearch. The example below shows you how.


    function fminsearchExample()
    %
    % Example to accompany Q550 Homework. The goal here is to illustrate the
    % passing of additional arguments to functions called by fminsearch. This
    % is useful for passing data and experiment design constants into models.
    %
    % This example puts both the Gaussian and Exponential models into a single
    % sub-function, which one would never do in 'real life'. But doing it here
    % illustrates the use of structures and switch statements in Matlab.
    %
    % John Kruschke, February 2005.

    % MyFunctionToMinimize, defined below, is Gaussian, so its parameters are
    % mu and sigma. They are initialized here.
    muInit = 0.0 ;
    sigmaInit = 10.0 ;
    parInit = [ muInit sigmaInit ] ;

    % Data vector could have any number of values.
    data = [ 10.0 20.0 25.0 32.0 ] ;
    MyFunctionArguments.data = data ;
    % Comment out one of the statements below (whichever model you do NOT want)
    MyFunctionArguments.model = 'Gaussian';
    % MyFunctionArguments.model = 'Exponential';

    % Call fminsearch. Notice the last argument is MyFunctionArguments, i.e.,
    % the data.
    searchOptions = optimset('Display','iter') ;
    [bestParam,fitVal,exitflag,output] = fminsearch( @MyFunctionToMinimize, ...
    parInit, searchOptions, MyFunctionArguments ) ;

    % Display the outcome of fminsearch, and display the mean and SD of the
    % data.
    fprintf(1,'fminsearch best fitting mu and sigma:\n\t%f\t%f\n', bestParam);
    fprintf(1,'Mean and SD of data:\n\t%f\t%f\n', mean(data), std(data,1));

    %-------------------------------------------------------------------------
    function FitDiscrepancy = MyFunctionToMinimize( Parameters, ...
    MyFunctionArguments )
    %
    % This function determines the Gaussian probability of the data specified
    % in MyFunctionArguments relative to the mean and sd specified in
    % Parameters. The output is the negative log likelihood of the data.
    %
    % This function expects a Parameters vector with two components and a
    % MyFunctionArguments vector (e.g., data constants) with an arbitrary
    % number of components.

    % Unpack the arguments into meaningful components.
    mu = Parameters(1);
    sigma = Parameters(2);
    data = MyFunctionArguments.data;
    model = MyFunctionArguments.model;

    % ReallyBadFit is any value bigger than a negLogLik you would get.
    ReallyBadFit = 9.9e+99 ;

    % Check if sigma is non-positive, and if so, return a punishing fit value.
    if sigma <= 0.0;
    FitDiscrepancy = ReallyBadFit ;
    return;
    end
    % If exponential model is being fit, then check if mu is greater than any
    % data values, and if so, return a punishing fit value.
    if strcmp(model,'Exponential')
    if sum(data < mu)
    FitDiscrepancy = ReallyBadFit ;
    return;
    end
    end

    % pDatum (below) is a vector of probabilities of the data values.
    switch model

    case 'Gaussian'
    % Notice in the formula below that ((data-mu)/sigma) is a vector, and
    % its components are individually squared by using the .^ operator.
    pDatum = (1/(sigma*sqrt(2*pi))) * ...
    exp( -0.5 * (( data - mu )/sigma).^2 );

    case 'Exponential'
    % Notice the use of logical indexing.
    pDatum = zeros(size(data));
    pDatum(data >= mu) = (1/sigma) * ...
    exp( -(1/sigma) * ( data(data >= mu) - mu ) ) ;

    otherwise
    error('Model not recognized');

    end

    negLogLik = -sum(log(pDatum));

    FitDiscrepancy = negLogLik ;

    %-------------------------------------------------------------------------