function meanRatings = NHA65stats() % % Analysis of data generated by NHA65CW.m experiment. % % John Kruschke, 17-Feb-2003, 2-Mar-2005. % Tell the world that this program is running. disp('Running NHA65stats.m:'); disp('Data analysis for replication of N. H. Anderson (1965).'); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Read the data file. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This assumes that the overall data file was created by NHA65mergedata.m, % which creates a file named NHA65merge.dat. For more info, consult Matlab % help regarding the textread command. filename = 'NHA65merge.dat'; [id block trialInBlock trialOverall type token rating rt] = ... textread(filename,'%s %d %d %d %s %s %d %f'); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Create summary means for each individual. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This assumes that each and every individual did 2 blocks of 40 trials, % as in NHA65CW.m. nTrialsPerBlock = 40; nBlocks = 2; nTrialsPerSubject = nBlocks * nTrialsPerBlock ; nSubjects = length(id) / nTrialsPerSubject; % Trait types. The types are ordered according to the results of % N.H.Anderson (1965). nTypes = 10; typeLabels = cell(nTypes,1); % column typeLabels(1,1) = {'LL'}; typeLabels(2,1) = {'L'}; typeLabels(3,1) = {'MmL'}; typeLabels(4,1) = {'MmMm'}; typeLabels(5,1) = {'Mm'}; typeLabels(6,1) = {'Mp'}; typeLabels(7,1) = {'MpMp'}; typeLabels(8,1) = {'HMp'}; typeLabels(9,1) = {'H'}; typeLabels(10,1) = {'HH'}; % The alignedData array will have a row for each subject, a column for each % type, and a layer for each repetition. Initialized with NaN. This array % collects the ratings on each trial. This ignores the response times, but % those might be interesting to look at, too! nReps = 8; alignedData = NaN * ones(nSubjects,nTypes,nReps); % dataIndex simply counts which row of the master data we're in. Initialize % it here. dataIndex = 0; % Loop through data. for subjIndex = 1 : nSubjects % repCurrent(nTypes) is just an array that holds indices for the current % subject. It keeps track of the current repetition for each trait type. % It is initialized here, for each subject. repCurrent = zeros(1,nTypes); for blockIndex = 1 : nBlocks for trialIndex = 1 : nTrialsPerBlock % Increment dataIndex. dataIndex = dataIndex + 1; % Check what type of trial this is, and put the datum % in the corresponding cell of the summary table. for typeIndex = 1 : nTypes if strcmp(type(dataIndex),typeLabels(typeIndex)) repCurrent(typeIndex) = repCurrent(typeIndex) + 1; alignedData(subjIndex,typeIndex,repCurrent(typeIndex)) = ... rating(dataIndex) ; end end end end end % Now, *for each subject*, compute a summary rating, collapsed across % repetitions of the trait type. We could compute the mean of the ratings, % or the median. Or we could first trim outliers from the repetitions and % then compute the mean... . Here, I've chosen to use the median. If I had % removed outliers and replaced them with NaN's, then I could use nanmedian % or nanmean instead of median or mean. % The resulting summaryData array has nSubjects rows and nTypes columns. summaryData = median(alignedData,3); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Display summary of data %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% figure(1); subplot(2,1,1 ,'replace'); plot(summaryData','-o' ,'LineWidth',2 ) ylabel('Median Rating'); xlabel('Trait Type'); xlim([0.5 10.5]) set(gca, 'XTickLabel' , typeLabels ); subplot(2,1,2 ,'replace'); boxplot(summaryData ,'notch','on' ); ylabel('Median Rating'); xlabel('Trait Type'); set(gca, 'XTickLabel' , typeLabels ); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Excise outliers %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % One way of defining outliers is what is done in boxplots, i.e., % 1.5 times the interquartile range beyond the quartiles. % Use the prctile function to find the quartiles. The data_quartile array % has nTypes columns. data_quartile = prctile(summaryData,[25 50 75]) ; % Compute the interquartile range. Can also be done with the iqr function % predefined in Matlab. There are nTypes columns. interquartile_range = data_quartile(3,:) - data_quartile(1,:) ; % Determine the low and high limits. low_lim and high_lim have nTypes % columns and just 1 row. whisker_length = 1.5 * interquartile_range ; low_lim = data_quartile(1,:) - whisker_length ; high_lim = data_quartile(3,:) + whisker_length ; % outlier array has 1's where there are outliers, 0's elsewhere: outlier = ( ... summaryData < repmat(low_lim,nSubjects,1) | ... summaryData > repmat(high_lim,nSubjects,1) ) ; % If a single subject has several conditions that are outliers relative to % the other subjects, then exclude all data from that subject. How many is % "several"? It's arbitrary; here I define it as 3. tooManyOut = 3 ; outlier( sum(outlier,2) >= tooManyOut , : ) = 1 ; % One option for trimming outliers is to replace them with NaN and then use % the functions nanmean, nanstd, etc. If you want to be properly careful, % you should retain the untrimmed data array, and call the trimmed data % array something else. But I'm not being that careful here... . Notice the % use of logical indexing: summaryData(outlier) = NaN ; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Determine output vector %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Useful for subsequent modeling. meanRatings = nanmean(summaryData) ; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Perform inferential tests %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Look in help for "ttest" and "mean". % H versus H.Mp. Averaging model predicts H > HMp. columnH = 9; % see definition of typeLabels, above columnHMp = 8; % Compute H - HMp for each subject. HmHMp (below) stands for H minus H.Mp. % The '-' function treats NaN values appropriately. HmHMp = summaryData(:,columnH) - summaryData(:,columnHMp); mean_HmHMp = nanmean(HmHMp); % Run t-test using built-in Matlab function, which treats NaN as missing % values. [ rejectNull_HmHMp, prob_HmHMp, ci_HmHMp, stats_HmHMP ] = ... ttest( HmHMp ,0 ,.05 ,'both' ); disp(' ') disp('Averaging model predicts H - H.Mp > 0:') disp([ ... 'H - HMp = ' , num2str(mean_HmHMp) , ... ', est.pop.SD = ' , num2str(stats_HmHMP.sd) , ... '; t(' , num2str(stats_HmHMP.df) , ... ') = ' , num2str(stats_HmHMP.tstat) , ... ', p = ', num2str(prob_HmHMp) ... ] ) % L.Mm versus L. Averaging model predicts LMm > L. columnL = 2; % see definition of typeLabels, above columnLMm = 3; %LHmmL = ... ; % fill in the rest yourself! % Combined H versus H.MP with L versus L.Mm. This is the test that % N.H.Anderson (1965) conducted, at the bottom of page 396, except that % we'll be using a positive value to indicate the averaging model instead % of the additive model, and he used an F test whereas we'll use a t-test % (remember F is t squared). %Combin = HmHMp + LHmmL ; % fill in the rest yourself! % -------------------------------------------------------------------------