Evaluative Summary of an Article on Latin-square Design


1. Background Information


Title of the article: Three Levels of Staff Intervention and Their Effect on Inpatient Small Group Morale, Leadership, and Performance

   Source: Journal of Abnormal Psychology, Vol. 73, No. 5, 500-502

   Year: 1968


2. Abstract

The author is interested in the effects of staff disciplinary intervention on small-group morale, leadership, and performance. Three levels of intervention were studied by observing three groups over an 18-week period with a 3x3 crossover Latin-square design. The results indicate that the three group functions were not differentially modified by the different treatment levels. The findings are discussed in general terms with respect to the staff's supervisory role.


3. Null hypothesis, alpha (or p) level, and sample size per group

For the Latin-square ANOVA, let

     i = 1, 2, 3 index the intervention level (1 = low, 2 = moderate, 3 = high);

     j = 1, 2, 3 index the three periods of time;

     k = 1, 2, 3 index the three groups of patients;

     μijk = the mean score at the ith intervention level, in the jth period of time, for the kth group.


Then H0: μ1.. = μ2.. = μ3..

     H1: at least one of the above equalities does not hold
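
For reference, the hypothesis above can be read against the usual additive model for a Latin square. This is the standard textbook form, not a model stated in the article; the effect symbols alpha, beta, gamma, and epsilon are my own labels.

```latex
% Additive Latin-square model (assumed; the article does not state its model)
% i = intervention level, j = period, k = group
\[
  Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk},
  \qquad i, j, k \in \{1, 2, 3\}
\]
% alpha_i: treatment (intervention) effect, beta_j: period effect,
% gamma_k: group effect, epsilon_ijk: random error
\[
  H_0 : \alpha_1 = \alpha_2 = \alpha_3 = 0
  \quad\Longleftrightarrow\quad
  \mu_{1\cdot\cdot} = \mu_{2\cdot\cdot} = \mu_{3\cdot\cdot}
\]
```

Note that this model has no interaction terms and no carryover term, which is why both issues matter for the critique in the last section.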


The alpha (or p) level used in the article is not reported, and the sample size for the model is 3. The author states that 51 patients, divided into 3 groups, were in the study at the outset. However, the research hypothesis concerns group behavior rather than individual behavior, so the unit of analysis is the group and the sample size is only 3, if we set aside the fact that some patients were later transferred out and others were added. The author reports that 89 patients were in fact represented in the study; nevertheless, I think we can still roughly regard the whole experiment as a crossover Latin-square design. I discuss this further in the critique in the last section. The author also states that no group had fewer than 15 members at any time during the study period.


4. Independent and dependent variables

For the crossover Latin-square design, the dependent variables are the mean scores on six measures: group performance, problem input, information input, post-hospital plan, morale, and leadership rating. The independent variable is the intervention level; group and time period are treated as nuisance variables.
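
To make the roles of these variables concrete, here is a minimal sketch of how a 3x3 crossover layout can be built and checked. The article's actual assignment of intervention levels to groups and periods is not given in this summary, so the cyclic square and all names below are assumptions for illustration only.

```python
# Hypothetical layout: rows are groups, columns are periods, entries are
# intervention levels. The cyclic construction is an assumption; the
# article's real assignment is unknown.
LEVELS = ("low", "moderate", "high")
GROUPS = ("group 1", "group 2", "group 3")


def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square with symbols 0..n-1."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


square = cyclic_latin_square(3)
# Map each group to the sequence of levels it receives over the three periods.
layout = {GROUPS[g]: [LEVELS[square[g][p]] for p in range(3)] for g in range(3)}

for group, sequence in layout.items():
    print(group, sequence)
print("valid Latin square:", is_latin_square(square))
```

Each group receives every level once and each period contains every level once, which is what allows group and period to be separated out as nuisance variables.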


5. Instrument, comment on its reliability and validity

The author measured group functioning weekly for 18 weeks with instruments used by Maynard. Group performance was measured on a 5-point scale. The problem input measure was the mean number of notes per patient, sent from the staff to the group, that required a group reaction. The mean numbers of notes per patient concerning post-hospital plans and information input were also considered to be related to group performance. A 20-item morale inventory completed by each group member comprised subscales that loaded between .71 and .80 on the cohesion-morale dimension. According to the author, the leadership rating loaded .62 on the leadership-role-clarity dimension. Validity is assumed, since the measures the author used in the hospital are well examined.


6. Experimental Procedure

The author randomly assigned 51 patients to 3 groups of 17 each. The study was conducted on a fully open 51-bed ward in a large Veterans Administration hospital and lasted 18 weeks. Normal procedure was followed for transfers to and from the ward. In all, 89 patients were represented in the study, and at any time each group had at least 15 members. Each week the staff returned a written evaluation of the group's functioning for that week. The staff's method of intervening differed according to the intervention level.



7. Statistical analysis and Conclusion

This is clearly a crossover Latin-square design, since each group receives a different treatment in each period of time. The statistical result for this 3x3 Latin square is that no F(2,2) ratio (ranging from 16.63 down to less than 1) is statistically significant. We can therefore conclude that the three areas of group functioning were not differentially modified by the three levels of staff intervention.
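
The F(2,2) notation follows from the degrees of freedom of a single 3x3 Latin square with one observation per cell. The sketch below works through that arithmetic and, purely as an illustration, computes the exact upper-tail probability of the largest reported ratio, 16.63, using the closed form that holds when the numerator has 2 degrees of freedom. The function names are my own, and the article itself reports no p-values.

```python
def latin_square_df(p):
    """Degrees of freedom for one p x p Latin square, one observation per cell."""
    total = p * p - 1
    rows = columns = treatments = p - 1
    error = total - rows - columns - treatments  # equals (p - 1) * (p - 2)
    return {"rows": rows, "columns": columns, "treatments": treatments,
            "error": error, "total": total}


def f_upper_tail_df1_2(f_value, df2):
    """Exact P(F >= f_value) for an F distribution with 2 numerator df.

    With 2 numerator df the tail probability is (1 + 2*f/df2) ** (-df2/2).
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


df = latin_square_df(3)
print(df)  # treatments = 2 and error = 2, hence F(2,2)

# Largest F ratio quoted in this summary; with df2 = 2 the tail is 1 / (1 + F).
p_largest = f_upper_tail_df1_2(16.63, df["error"])
print(round(p_largest, 4))  # 0.0567
```

With only 2 error degrees of freedom, even a ratio as large as 16.63 has a tail probability just above .05, which is consistent with the reported lack of significance if a conventional .05 level was used (the article does not say).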


8. If you were the researcher, how would you improve the study? Be specific

The article is well organized and clearly written. However, from the standpoint of statistical reporting and design rationale, I want to raise the following issues:


1) The design rationale. The author is clearly using a crossover Latin-square design. To simplify matters, let us assume that the patients transferred to or from the ward have NO major influence on the dependent variables. Then the author has used a standard crossover Latin-square design with two nuisance variables: group and time period. The natural question is whether these variables should be treated as nuisance variables. Consider the time period first. The author DID NOT explain why it is treated as a nuisance variable. Moreover, a crossover Latin-square design REQUIRES that treatments have NO carryover effects from one period to the next! The article never mentions carryover effects, and this is highly questionable, since the different levels of staff intervention seem quite likely to have carryover effects on the patients' minds. For example, a group that received the high-level intervention might feel happier when it later received the low-level intervention, and this could affect group performance, functioning, and so on. The carryover effect is therefore very important here, yet the article offers no discussion of it. Next, consider the second nuisance variable, group. This is an even more serious problem. Group is the blocking variable, which presumes that the blocks differ from one another (that is, that there are row effects); yet according to the article these groups were created by RANDOM ASSIGNMENT! Should randomly formed groups serve as a blocking variable? This is really doubtful, and again the author did not explain it at all!
In fact, this doubt could be partially resolved if the author provided the detailed results of the statistical analysis, such as the SAS output, so that we could see whether the blocking variables are reasonable. Unfortunately, nothing of the kind is provided. Besides the doubt about the nuisance variables, the sample size is only 3, since the research focuses on group performance. The data should be considered insufficient because of the small error degrees of freedom, and according to Kirk's book, a Latin square smaller than 5x5 with a single observation per cell should not be used! A result based on so few degrees of freedom, F(2,2), is entirely unconvincing! Other issues, such as possible interactions between any two of the variables and the patients transferred during the study, also challenge the validity of the statistical model. To summarize, I think the existence of a carryover effect would make the whole crossover Latin-square design meaningless, and the insufficient data make the statistical result meaningless! If we assume that there are NO carryover effects and no other interaction effects, then we can reconsider the blocking variable, for example blocking on the severity of the patients' conditions, and, more importantly, we need more than 3 blocks! I think 9 or 12 blocks would make this design reasonably efficient. If the carryover effect does exist, then I would suggest either a repeated-measures design (although further conditions, such as sphericity, would then need to be considered) or simply using more groups as the observations in a CR-3 design over a 6-week study.
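
To give a rough sense of why more blocks would help, the sketch below counts error degrees of freedom when the groups are arranged as several 3x3 Latin squares that share the same three periods. This assumes an additive model with only group, period, and treatment effects, which is my simplification and not something taken from the article; the .05 level is likewise only a conventional placeholder.

```python
def error_df_replicated_3x3(n_groups):
    """Error df for n_groups arranged as n_groups/3 Latin squares of size 3x3.

    Assumed model: group, period, and treatment main effects only, with the
    same three periods shared by every square.
    """
    if n_groups % 3 != 0:
        raise ValueError("n_groups must be a multiple of 3")
    n_obs = 3 * n_groups  # each group is observed once per period
    total = n_obs - 1
    groups = n_groups - 1
    periods = 2
    treatments = 2
    return total - groups - periods - treatments


def f_critical_df1_2(alpha, df2):
    """Upper critical value of F(2, df2), inverting (1 + 2f/df2) ** (-df2/2)."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


for n_groups in (3, 9, 12):
    df2 = error_df_replicated_3x3(n_groups)
    crit = f_critical_df1_2(0.05, df2)  # 0.05 is an assumed level
    print(n_groups, "groups: error df =", df2, ", critical F =", round(crit, 2))
```

Under these assumptions, moving from 3 groups to 9 or 12 raises the error degrees of freedom from 2 to 14 or 20 and lowers the critical F from 19 to below 4, so a much smaller treatment effect could be detected.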


2) The statistical description is insufficient. From the article we cannot even tell which significance level the author used! The table of means is not sufficient to rerun the test or to calculate efficiency. Exact p-values, which would sometimes provide more information, are not reported either.


In short, this article is terrible with respect to its statistical analysis. I really doubt that the author knew what he was doing!