Evaluative Summary of an article on

SPF-pr.q--Split Plot Factorial Design

 

1.     Background Information
          
Hess, C. M. & Shrigley, R. (1981). A study of the effect of three modes of teaching on metric knowledge and attitude. Science Education, 65(2), 131-138.

2.      Abstract
The purpose of this study was to examine the effectiveness of three instructional modes for teaching metric knowledge and raising the level of positive metric attitudes among preservice elementary teachers, including both high and low math achievers.

A 32.2 split-plot factorial design (SPF-32.2) was employed with 3 Instructional Modes (expository, modular, and gaming) and 2 levels of Math Ability (high math achiever and low math achiever) as between-subjects factors, and Time of Testing (pretest and posttest) as the within-subjects factor. A total of 82 preservice teachers (N = 82), 41 high math achievers and 41 low math achievers, participated in the study. They were identified from 6 classes of preservice teachers (n = 141) who were enrolled in a math education course at an eastern state college and were randomly assigned to expository, modular, or gaming instruction. Twenty-six of the 82 preservice teachers were in the expository group, 30 in the modular group, and 26 in the gaming group.

According to the researchers, there was no significant difference in preservice teachers’ metric knowledge whether they were taught by expository, modular, or gaming methods. Second, high math achievers and low math achievers gained similarly in metric knowledge. In addition, preservice teachers achieved similar scores in metric attitudes whether they were taught by expository, modular or gaming methods. Finally, high math achievers and low math achievers gained similarly in metric attitudes.
  

3.      Null hypothesis, alpha (or p) level, and sample size per group.
As stated by the authors, there were 4 null hypotheses examined in this design. The null hypotheses are as follows:

Ho1: There is no significant difference in the mean scores on metric knowledge
         whether subjects are taught with the expository, modular, or gaming treatments.
Ho2: There is no significant difference in the mean gain scores on metric knowledge
         between high math achievers and low math achievers.
Ho3: There is no significant difference in the mean scores on metric attitudes whether
         subjects are taught with the expository, modular, or gaming treatments.
Ho4: There is no significant difference in the mean gain scores on metric attitudes
         between high math achievers and low math achievers.

Since the authors were interested in investigating the effectiveness of three instructional modes for teaching metric knowledge and raising the level of positive metric attitudes among preservice elementary teachers, including both high and low math achievers, it would be clearer if the hypotheses were revised as follows:

Ho1: There is no significant difference in the preservice teachers’ gain scores on
          metric knowledge whether they are taught with the expository, modular, or
          gaming methods.
Ho2: There is no significant difference in the preservice teachers’ gain scores on metric
         knowledge between high math achievers and low math achievers.
Ho3: There is no significant difference in the preservice teachers’ gain scores on metric
         attitudes whether they are taught with the expository, modular, or gaming
         treatments.
Ho4: There is no significant difference in the preservice teachers’ gain scores on metric
         attitudes between high math achievers and low math achievers.

If the use of SPF-32.2 were appropriate, 8 other hypotheses concerning interactions between Instructional Modes, Math Ability, and Time of Testing on preservice teachers’ metric knowledge and metric attitudes should be considered. However, since the SPF-32.2 design was not appropriate for the purpose of this research, I do not provide these hypotheses here. Instead, if the correct design, a two-way CRF-32 design (the reasons for using it are given under question #8), had been employed in this study, two other hypotheses concerning the interaction between Instructional Modes and Math Ability on preservice teachers’ metric knowledge and metric attitudes should be considered. These two hypotheses are stated below.

Ho5: There is no significant interaction between Instructional Modes and Math ability
         on preservice teachers’ metric knowledge.
Ho6: There is no significant interaction between Instructional Modes and Math ability
         on preservice teachers’ metric attitudes.

For the F-tests, the α level was not stated, but p levels were reported in the ANOVA tables. In the ANOVA of Metric Knowledge, two significant F-ratios, for the main effect of Math Ability and the main effect of Time of Testing (pre-post testing), were reported at the p < .01 level. In the ANOVA of Metric Attitude, two F-ratios were reported as significant: the main effect of Time of Testing (pre-post testing) at p < .01, and the Time of Testing x Instructional Modes interaction at p < .05. Although the authors did not specify the α level for the F-tests, they mentioned that if an interaction between factors was significant, the simple main effects would be tested at the α = .05 level.

This study included 82 preservice teachers. The 82 subjects were identified from 6 classes of preservice teachers (n = 141) who were enrolled in a math education course at an eastern state college and were randomly assigned to expository, modular, or gaming instruction; they were further identified as high math achievers (n = 41) or low math achievers (n = 41). Twenty-six of the 82 preservice teachers were in the expository group, 30 in the modular group, and 26 in the gaming group. From the information given in the article, it is not clear how many subjects were in each cell (combination of Instructional Mode and Math Ability).

4.      Independent and Dependent Variables
If the SPF-32.2 design were correct, there would be three independent variables: Instructional Modes and Math Ability as the between-subjects factors, and Time of Testing as the within-subjects factor. The first independent variable, Instructional Modes, has three levels: expository method, modular method, and gaming method. The second independent variable, Math Ability, consists of high math achievers and low math achievers. The third independent variable, Time of Testing, includes the pretest and posttest of metric knowledge and metric attitudes.

There were two dependent variables, preservice teachers’ Metric Knowledge, and preservice teachers’ Metric Attitudes.

5.      Instrument, comment on its reliability and validity
For the independent variable Math Ability, the authors used preservice teachers’ scores on the Sequential Tests of Educational Progress II, Form A, “Mathematics Concepts” to identify high math achievers and low math achievers. However, no reliability or validity evidence was provided for this measure.

For the dependent variables, the Szabo-Trueblood Test of Metric Knowledge (STMK) and the Shrigley-Trueblood Metric Measurement Attitudes Scale (SMAS) were administered to measure the preservice teachers’ Metric Knowledge and Metric Attitudes, respectively. The STMK consists of 50 multiple-choice items that measure mastery of (1) knowledge of quantities within the metric system, (2) comparison of metric measurements to common objects or standard measures, and (3) conversion between units of metric measurement. According to the authors, the STMK had a reliability of .93 when administered to other preservice teachers. Because the test items had been closely compared with the metric content being taught, the authors claimed that the STMK is content valid.

The SMAS consists of 22 statements, 10 negative and 12 positive. The SMAS has a reliability of .92 when administered to other preservice teachers in 1977. For validity, the authors claimed the SMAS was validated by a factor analysis and Likert analysis as well as Edwards’ set of 13 criteria for attitude scale construction. 
 

6.      Experimental Procedure
This study used an SPF-32.2 design. Eighty-two preservice teachers, 41 high math achievers and 41 low math achievers, were identified from 6 classes of preservice teachers (n = 141) who were enrolled in a math education course at an eastern state college and were randomly assigned to the expository, modular, and gaming instruction groups.

The three instructional groups met simultaneously with three different instructors. Since it was impossible to have all three groups taught by a single instructor, and the researchers were also limited by the availability of only two equally qualified instructors, steps such as controlling the teaching materials and the roles played by the instructors were taken to limit instructor bias and personality differences.

Before the formal treatment, two class sessions of nonmetric pretreatment were held to familiarize all subjects with the gaming and modular modes of learning. Following these two pretreatment sessions, each of the three instructional groups was taught by a different instructor for three class hours of metric instruction.

7.      Statistical Analysis and Conclusions
An SPF-32.2 analysis of variance with Instructional Modes and Math Ability as between-subjects factors and Time of Testing as the within-subjects factor was employed. According to the authors, for metric knowledge, the main effect of Instructional Modes was not statistically significant, indicating that type of instruction did not influence preservice teachers’ metric knowledge, F(2, 76) = 2.47, p > .05. The SPF-32.2 results also showed that the Math Ability x Time of Testing interaction was not significant, indicating no significant difference in gain scores on metric knowledge between high math achievers and low math achievers, F(1, 76) = .00, p > .05.

For metric attitudes, the main effect of Instructional Modes was not statistically significant, indicating that instructional mode did not influence preservice teachers’ metric attitudes, F(2, 76) = 1.79, p > .05. In addition, the Math Ability x Time of Testing interaction was not significant, indicating no significant difference in gain scores on metric attitudes between high math achievers and low math achievers, F(1, 76) = 1.38, p > .05. The results also showed that the interaction between Instructional Modes and Time of Testing was significant, F(2, 76) = 3.68, p < .05 (the authors employed simple main effects tests to follow up this interaction, but the reported result does not make sense).
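As a quick arithmetic check on the degrees of freedom in the F-ratios reported above, the sketch below computes the degrees of freedom for a split-plot design with two between-subjects factors and one within-subjects factor. The function name and layout are illustrative assumptions and are not taken from the article; the formulas are the standard ones for this kind of design, and N = 82, p = 3, q = 2, and r = 2 come from the study as summarized here.

```python
def spf_pq_r_df(n_subjects, p, q, r):
    """Degrees of freedom for a split-plot design with two between-subjects
    factors (A with p levels, B with q levels) and one within-subjects
    factor (C with r levels). Hypothetical helper, for illustration only."""
    between_error = n_subjects - p * q  # subjects within A x B cells
    return {
        "A": p - 1,
        "B": q - 1,
        "AxB": (p - 1) * (q - 1),
        "between_error": between_error,
        "C": r - 1,
        "AxC": (p - 1) * (r - 1),
        "BxC": (q - 1) * (r - 1),
        "AxBxC": (p - 1) * (q - 1) * (r - 1),
        "within_error": between_error * (r - 1),
        "total": n_subjects * r - 1,
    }


# A = Instructional Modes, B = Math Ability, C = Time of Testing
df = spf_pq_r_df(n_subjects=82, p=3, q=2, r=2)
print(df["A"], df["between_error"])   # Instructional Modes and its error term
print(df["BxC"], df["within_error"])  # Math Ability x Time of Testing and its error term
```

With these values both error terms have 76 degrees of freedom, which agrees with the reported F(2, 76) and F(1, 76).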

Even if the SPF-32.2 design were correct, the conclusion concerning metric attitudes would be questionable because a significant interaction between Time of Testing and Instructional Modes was found. It was therefore inappropriate to conclude that the main effect of Instructional Modes was not significant; the researchers should have considered the interaction effect first. Moreover, since the authors used an incorrect research design, SPF-32.2, to analyze the data, the findings and conclusions might be incorrect.

8.      If you were the researcher, how would you improve the study?
The purpose of the study was to compare the effectiveness of three different instructional modes on the metric knowledge and metric attitudes of preservice teachers, including both high math achievers and low math achievers. Four research questions were involved. To answer these questions, it is inappropriate to test the marginal means of the three instructional modes (here, marginal means refer to the means of the three instructional groups across pretest and posttest). The same argument applies to the questions concerning math ability. It would be more appropriate to use a two-way CRF design with gain scores (the difference between posttest and pretest) on metric knowledge and metric attitudes as the dependent variables. Furthermore, the three instructional modes might affect the metric knowledge and metric attitudes of high math achievers and low math achievers differently, so the interaction between Instructional Modes and Math Ability should also be tested. A 3 (Instructional Modes: expository, modular, and gaming) x 2 (Math Ability: high math achievers and low math achievers) analysis of variance on these gain scores is appropriate for answering these questions.
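The proposed analysis can be sketched as follows. This is a minimal illustration that assumes equal cell sizes, which the actual study did not have (26, 30, and 26 subjects per instructional group), so unequal cells would call for a regression-based ANOVA instead. The function name and the pretest/posttest pairs are made up for the example and are not data from the article.

```python
from itertools import product


def mean(xs):
    return sum(xs) / len(xs)


def crf_pq_anova(cells):
    """Two-way between-subjects ANOVA for a balanced design.
    cells maps (mode, ability) -> list of gain scores (posttest - pretest);
    every cell must hold the same number of scores."""
    modes = sorted({a for a, _ in cells})
    abilities = sorted({b for _, b in cells})
    p, q = len(modes), len(abilities)
    n = len(cells[(modes[0], abilities[0])])
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles equal cell sizes")

    grand = mean([x for v in cells.values() for x in v])
    mode_mean = {a: mean([x for b in abilities for x in cells[(a, b)]]) for a in modes}
    abil_mean = {b: mean([x for a in modes for x in cells[(a, b)]]) for b in abilities}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    ss = {
        "mode": n * q * sum((mode_mean[a] - grand) ** 2 for a in modes),
        "ability": n * p * sum((abil_mean[b] - grand) ** 2 for b in abilities),
        "mode x ability": n * sum(
            (cell_mean[(a, b)] - mode_mean[a] - abil_mean[b] + grand) ** 2
            for a, b in product(modes, abilities)
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {"mode": p - 1, "ability": q - 1,
          "mode x ability": (p - 1) * (q - 1), "error": p * q * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ss if k != "error"}
    return ss, df, f


# Made-up (pretest, posttest) pairs, three subjects per cell; NOT data from the article.
raw = {
    ("expository", "high"): [(30, 40), (28, 40), (31, 42)],
    ("expository", "low"): [(20, 29), (22, 33), (21, 31)],
    ("modular", "high"): [(29, 42), (30, 42), (27, 41)],
    ("modular", "low"): [(19, 29), (21, 33), (20, 31)],
    ("gaming", "high"): [(31, 42), (28, 41), (30, 42)],
    ("gaming", "low"): [(22, 34), (20, 30), (21, 32)],
}
gains = {cell: [post - pre for pre, post in pairs] for cell, pairs in raw.items()}
ss, df, f = crf_pq_anova(gains)
for effect in f:
    print(f"{effect}: F({df[effect]}, {df['error']}) = {f[effect]:.2f}")
```

Each F-ratio would then be compared with the critical value of the F distribution for its degrees of freedom. The same function would be run a second time on the metric attitude gain scores.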

Second, as described by the authors, the 3 groups were taught by 3 different instructors, and only 2 of the 3 instructors were equally qualified. Although the authors made many efforts to control instructor bias and personality differences, it is still possible that the instructors had an impact on the preservice teachers’ metric knowledge and metric attitudes. Given that it was impossible for one instructor to teach all three methods, or to have three equally qualified instructors teach the three instructional groups simultaneously, I would choose a hierarchical design to control for the impact of the instructors.
 
Furthermore, if we consider the relationship between the two dependent variables, Metric Knowledge and Metric Attitudes, we might want to use a two-way MANOVA instead of two separate two-way ANOVAs to avoid an inflated Type I error rate.

There were also other weaknesses in this study. If I were the researcher, the following would also be taken into account.