Evaluative Summary of Articles on
Two-way ANOVA CRF-pq Design
Submitted by Kathleen S. Burger

1. Background Information
Authors: Justen, III, J., Waldrop, P., and Adams, II, T.
Title: Effects of Paired versus Individual user Computer-Assisted Instruction and type of feedback on student achievement.
Source: Educational Technology
Year: July 1990, 51-53.

2. Abstract
Building on previous research concerning Computer-Assisted Instruction (CAI), this study was designed to determine whether having two students use one computer at the same time would influence the relative effectiveness of various Feedback conditions. The authors used a 2 (type of CAI - Individual / Paired) X 2 (type of Feedback - Extended / Minimal) fixed-effect factorial design in this investigation. They hypothesized that there would be no significant difference in student performance between Paired and Individual use of CAI or between Minimal and Extended Feedback conditions, and no significant interaction between the factors type of Feedback and type of CAI in a Computer-Assisted Instructional tutorial. An identical 20-question multiple-choice test was administered to each of the 68 subjects. Using these scores, analysis of variance (ANOVA) procedures yielded a significant main effect for type of Feedback (Extended/Minimal) [F(1,64) = 5.43, p = .02], but no significant main effect for type of CAI [F(1,64) = .28, p > .05] or for the interaction [F(1,64) = 3.36, p > .05]. Students in the Minimal Feedback condition answered more test questions correctly (M = 13) than did students in the Extended Feedback condition (M = 11.42). This finding favoring the Minimal Feedback condition was surprising and inconsistent with previous research findings favoring Extended Feedback conditions.

3. Null hypothesis, alpha (or p) - level, and sample size per group.
The null hypotheses to be tested were stated as follows:
(1) There is no significant difference in student performance between Paired and Individual use of Computer-Assisted Instruction. (Ho1: mp = mi)
(2) There is no significant difference in student performance between Minimal and Extended Feedback conditions on a Computer-Assisted Instruction task. (Ho2: mm = me)
(3) There is no significant interaction between type of Feedback and number of users in a Computer-Assisted Instruction task. (Ho3: mf = mn)
The information provided about the alpha and p levels is scant. All that is reported about the alpha level is that .05 was used to determine significance. In the first table, which reported the results of the Analysis of Variance, the p value (.02) of the significant finding is reported. This indicates, to the credit of the researchers, that they note the difference between the alpha level and the p value.

Table 1
Results of Analysis of Variance
Source              SS       df   MS      F
Type of CAI         2.22     1    2.22    .28
Type of Feedback    43.05    1    43.05   5.43*
CAI X Feedback      26.59    1    26.59   3.36
Error               507.06   64   7.92
*p=.02

It would have been informative if Table 1 had included a column containing all the p values, in addition to a row containing Totals for df, SS, and MS. Total sample size, N=68, was reported in the narration, while sample size per cell, test score means, and standard deviations were reported in the second table.
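As a rough check on Table 1, and to illustrate the p-value column suggested above, the mean squares, F ratios, and p values can be recomputed from the reported SS and df. The sketch below is an illustration only, not the authors' computation. It relies on the fact that an F statistic with 1 numerator df equals the square of a t statistic, and it approximates the p values by numerical integration (Simpson's rule); the function names t_density and p_value_f1 are hypothetical helpers introduced here.

```python
import math

# Values copied from Table 1: source -> (SS, df).
table1 = {
    "Type of CAI": (2.22, 1),
    "Type of Feedback": (43.05, 1),
    "CAI X Feedback": (26.59, 1),
}
ss_error, df_error = 507.06, 64
ms_error = ss_error / df_error


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))


def p_value_f1(f, df2, steps=2000):
    """Upper-tail p for F(1, df2), using F = t**2 and Simpson's rule."""
    upper = math.sqrt(f)
    h = upper / steps
    total = t_density(0.0, df2) + t_density(upper, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df2)
    central_area = 2 * total * h / 3  # P(-sqrt(f) < t < sqrt(f))
    return 1 - central_area


# source -> (MS, F, p); valid here because every effect has 1 df.
results = {}
for source, (ss, df) in table1.items():
    ms = ss / df
    f = ms / ms_error
    p = p_value_f1(f, df_error)
    results[source] = (ms, f, p)
    print(f"{source:18s} MS={ms:6.2f}  F={f:5.2f}  p={p:.3f}")
```

The recomputed F ratios should agree with Table 1 to two decimals, and only the Type of Feedback effect should fall below the .05 alpha level.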

Table 2
Group Means and Standard Deviations for Criterion Measures
Group                           N    M       SD
Paired/Minimal Feedback         19   13.79   2.92
Paired/Extended Feedback        14   10.93   1.98
Individual/Minimal Feedback     18   12.17   3.00
Individual/Extended Feedback    17   11.82   3.07
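The marginal means quoted in the abstract (M=13 for Minimal Feedback, M=11.42 for Extended Feedback) can be recovered from the cell sizes and cell means in Table 2 as n-weighted averages. The sketch below is an illustration under that assumption. It also computes the simple effect of feedback within each type of CAI and their difference, which is the contrast the interaction test addresses. The helper name weighted_mean is hypothetical.

```python
# Cell sizes and means copied from Table 2: (CAI, feedback) -> (n, mean).
cells = {
    ("Paired", "Minimal"): (19, 13.79),
    ("Paired", "Extended"): (14, 10.93),
    ("Individual", "Minimal"): (18, 12.17),
    ("Individual", "Extended"): (17, 11.82),
}


def weighted_mean(factor_index, level):
    """n-weighted mean over all cells sharing one level of a factor."""
    chosen = [v for key, v in cells.items() if key[factor_index] == level]
    return sum(n * m for n, m in chosen) / sum(n for n, _ in chosen)


total_n = sum(n for n, _ in cells.values())
minimal = weighted_mean(1, "Minimal")
extended = weighted_mean(1, "Extended")
paired = weighted_mean(0, "Paired")
individual = weighted_mean(0, "Individual")

# Simple effect of feedback (Minimal minus Extended) within each CAI type.
effect_paired = cells[("Paired", "Minimal")][1] - cells[("Paired", "Extended")][1]
effect_individual = (cells[("Individual", "Minimal")][1]
                     - cells[("Individual", "Extended")][1])
# The interaction contrast is the difference between the two simple effects.
interaction_contrast = effect_paired - effect_individual

print(f"N = {total_n}")
print(f"Minimal = {minimal:.2f}, Extended = {extended:.2f}")
print(f"Paired = {paired:.2f}, Individual = {individual:.2f}")
print(f"Feedback effect: paired = {effect_paired:.2f}, "
      f"individual = {effect_individual:.2f}, "
      f"contrast = {interaction_contrast:.2f}")
```

The advantage of Minimal Feedback appears much larger in the Paired cells than in the Individual cells, although, as Table 1 shows, this interaction was not statistically significant at the .05 level.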

4. Independent and dependent variables
There were two fixed independent variables (factors): Type of Feedback and Type of Computer-Aided Instruction. Each factor had two levels. Type of Feedback levels were Minimal and Extended Feedback, while Type of Computer-Aided Instruction levels were Individual and Paired Approach.
The dependent variable (the criterion measure) was scores from a 20 item multiple-choice test. Each item contained four choices and all subjects received the same test.
It may be that equating the dependent variable with scores is not "best practice." Rather, the authors could have named the characteristic that the test was intended to measure and treated it as the target of the data collection efforts.

5. Instrument, briefly comment on its reliability and validity
The authors explain that the test consisted of 20 multiple-choice questions (4 options each) and that it was administered to all subjects. The authors did not mention whether the company that developed the CAI tutorial also constructed this test or whether the researchers constructed it. Unfortunately, no information is provided concerning the psychometric properties of this test, and therefore the validity and reliability of this instrument are impossible to determine. This serious omission must be kept in mind when considering the overall validity of the conclusions obtained as a result of these scores.

6. Experimental Procedure
Participants were students enrolled in four upper division education courses. Written consent was obtained after subjects were informed of the nature of the study. There was no further information provided concerning the participants (gender, SES, etc.), which seriously limits generalizability and the possibility of replicating the study. This article also does not address the degree to which subjects were or were not willing volunteers. It would be useful to know whether any students in the classes did not participate in the experiment, and to have information about them as well.
The authors write, "two classes were randomly assigned to the Paired use treatment and the remaining two classes were assigned to the Individual use treatment" (p. 51). The wording in this passage is imprecise. It is unclear whether the classes were randomly assigned or whether the participants in the classes were randomly assigned. Further, the reader is left to wonder whether the other two classes (or participants in the classes) were or were not randomly assigned, since the authors simply write that they "were assigned."
The four groups (Paired Approach with Minimal Feedback, Paired Approach with Extended Feedback, Individual Approach with Minimal Feedback, and Individual Approach with Extended Feedback) participated in Computer-Assisted Instruction tutorials. The CAI content consisted of six lessons related to hypotheses testing research. McGraw Hill Courseware Authoring System designed the tutorial. Unfortunately, no information was provided concerning this courseware. The authors do write, however, that the modules were identical except for the Feedback frames. The modules either contained Extended Feedback or Minimal Feedback.
All subjects were tested using the same multiple-choice test. There is no information concerning the psychometric properties of the test. There is no specific information indicating whether subjects were tested after each module or after the six lessons were completed, although the text implies one test was used one time. Further, no mention was made concerning the length of time this process lasted; whether all modules were presented during one session or over a period of several weeks or months. There was no mention of a debriefing process.

7. Statistical Analysis and conclusion
The authors used a 2 (type of CAI - Individual / Paired) X 2 (type of Feedback - Extended / Minimal) fixed-effect factorial design in this investigation. They hypothesized that there would be no significant difference in student performance between Paired and Individual use of Computer-Assisted Instruction or between Minimal and Extended Feedback conditions, and no significant interaction between type of Feedback and number of users in a Computer-Assisted Instructional tutorial. An identical 20-question multiple-choice test was administered to each of the 68 subjects. Using these scores, analysis of variance (ANOVA) procedures yielded a significant main effect for type of Feedback (Extended/Minimal) [F(1,64) = 5.43, p = .02], but no significant main effect for type of CAI [F(1,64) = .28, p > .05] or for the interaction [F(1,64) = 3.36, p > .05]. Students in the Minimal Feedback condition answered more test questions correctly (M = 13) than did students in the Extended Feedback condition (M = 11.42). No further analyses were reported.
According to the authors, the finding favoring the Minimal Feedback condition was surprising and inconsistent with previous research findings favoring Extended Feedback conditions. In an effort to explain this inconsistent finding, the authors propose that the difficulty level of the material learned may account for these results. From there, they speculate that perhaps the difficulty level plays a role in the effectiveness of the Feedback. This possibility has face validity, and while it provides a future direction for research, it may also be a threat to the validity of this study since difficulty level was not controlled.
The authors' conclusion to this study, that "the results suggest that group instruction with computer-assisted tasks is a viable means of providing CAI" (p. 52, 53), is interesting, because this idea is not one of the hypotheses tested. The tested hypothesis that comes closest to pertaining to this suggested result was "There is no significant difference in student performance between Paired and Individual use of Computer-Assisted Instruction. (Ho1: mp = mi)." It seems as if the authors overlooked the results of their statistical procedures, as this null hypothesis was not rejected. In other words, based on their data and statistical procedures, there was not a significant difference between the mean test scores when comparing Paired vs. Individual use of CAI. These results are consistent with the test scores of those who worked on the tutorial individually being similar to the scores of those who worked on it in pairs, but failing to reject a null hypothesis does not establish that the groups are equivalent. The results therefore do not directly support the assertion that "group instruction is a viable means of providing CAI." The only precise conclusions the authors can make, and even these are suspect given the limitations of this study, are that the difference between the score means for the Type of Feedback (Minimal vs. Extended) conditions was statistically significant and that the other hypotheses tested did not yield statistically significant results.

8. If you were the researcher, how would you improve the study?
To begin, the purpose of an experiment is to obtain an answer to or insight about a specific research question or questions. To accomplish this, the research question(s) must be precise. It may be that the authors did generate a precise research question, but their formally stated conclusions do not pertain to the question they asked. The question they proposed was "does the paired use of computers influence the relative effectiveness of various feedback conditions?" (p. 51). Their conclusion was "the results suggest that group instruction with computer-assisted tasks is a viable means of providing CAI" (p. 52, 53). They further wrote that this study "found students performed better under Minimal Feedback conditions" (p. 53). The second conclusion is appropriate, in terms of it being a finding from this study. It does not, however, answer the proposed research question. While analysis of variance (ANOVA) suggested that there was a significant difference in test scores between Minimal and Extended Feedback conditions, it also suggested that there was no significant main effect for Paired vs. Individual use of CAI and no significant interaction effect between Type of Feedback and Number of Users in a CAI task. To improve this study, the research question(s), hypotheses, and conclusions must address the same issues. Further, all findings must be addressed.
There are many threats to the validity and reliability of this study. The supporting, contextual information that could have clarified important issues was omitted, leaving critical questions unanswered. Without further information on these issues, the entire study must be viewed with great caution. First, the conclusions drawn in a research study are no better than the data on which they are based. The test instrument used to obtain the scores remains a mystery. We do not know whether the researchers designed the test or whether it was designed by the McGraw Hill Courseware Authoring System. The only information provided is that it is a 20-item multiple-choice test (4 options) that was administered to all of the participants in the study. The reliability of this instrument could have been evaluated. Cronbach's alpha, for example, could be used to assess the internal consistency of the instrument. Since random sampling was not conducted, and information regarding the random assignment is vague, it is difficult to know whether the four groups are "equivalent." It is therefore difficult to know whether the test scores (and resulting conclusions) reflect true differences resulting from the treatments, differences between the groups, an unreliable test, or a combination of all of these factors. There is no mention of the instrument's content validity, criterion-related validity, or construct validity. Construct validity can be assessed through factor analysis. Using multiple approaches to assess instrument quality increases confidence that the results obtained accurately represent what the researchers want to measure.
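To make the Cronbach's alpha suggestion concrete, a minimal sketch of the calculation follows. The article reports no item-level data, so the 0/1 item scores below are entirely hypothetical and serve only to show the mechanics; the function name cronbach_alpha is likewise introduced here for illustration. Alpha is computed as k/(k-1) times one minus the ratio of the summed item variances to the variance of the total scores, where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-subject lists of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 item scores (5 subjects x 4 items), NOT data from the study.
hypothetical_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(hypothetical_scores)
print(f"Cronbach's alpha (hypothetical data) = {alpha:.2f}")
```

With the actual 68 x 20 matrix of item responses, the same function would give an internal-consistency estimate for the criterion test.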
Similarly, the authors provided no information regarding the Computer-Aided Instructional tutorial designed by McGraw Hill Courseware Authoring System. No information is provided concerning the level of instruction or previous results obtained by using this tutorial. Rich and thorough description concerning the tutorial and the test must be provided.
Rich description regarding the subjects was neglected. All that was included is that they were enrolled in one of four upper level education courses. Again, contextual information helps the reader to understand the population from which the samples were drawn and to which the results may be generalized. Demographics, previous educational experience, gender, and similar information are pertinent. In addition, there needs to be clarification regarding the method of group assignment. It is unclear whether the students in the classes were randomly assigned or whether the classes (as stated by the authors) were randomly assigned.
This is a particularly important distinction due to the assumptions of ANOVA: randomness, independence, normality, and homogeneity of variance. As pointed out in Huck, "the randomness and independence assumptions can ruin a study if they are violated" (p. 417). Having unequal numbers of subjects in each cell leads to loss of statistical power. Issues such as these require planning during the design phase of the experiment. It would be advisable for the researchers to concern themselves with the normality and homogeneity of variance assumptions. Hartley's F-max test for equal population variances could be used to check the latter. In the case of this study, the F-test may very well be biased, causing the computed F-value to be either too large or too small. If the F-value is too large, the p-value associated with it will be too small. When this occurs, the amount that the data deviate from the null is exaggerated and the alpha level will understate the probability of a Type I error. If the bias is negative, the p-values associated with the F-values will be too large, and the researcher may fail to reject a null hypothesis that would have been rejected if the p-value were unbiased (Huck, p. 419).
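The F-max statistic mentioned above is simply the ratio of the largest to the smallest cell variance, so it can be computed directly from the standard deviations in Table 2. The sketch below is descriptive only: Hartley's test assumes equal cell sizes, which does not hold here (the cells range from 14 to 19), and the critical value would still have to be looked up in an F-max table, so no significance decision is made.

```python
# Cell sizes and standard deviations copied from Table 2.
cell_sd = {
    "Paired/Minimal": (19, 2.92),
    "Paired/Extended": (14, 1.98),
    "Individual/Minimal": (18, 3.00),
    "Individual/Extended": (17, 3.07),
}

# Variance of each cell is the square of its standard deviation.
variances = {group: sd ** 2 for group, (n, sd) in cell_sd.items()}
largest = max(variances, key=variances.get)
smallest = min(variances, key=variances.get)
f_max = variances[largest] / variances[smallest]

print(f"Largest variance:  {largest} ({variances[largest]:.2f})")
print(f"Smallest variance: {smallest} ({variances[smallest]:.2f})")
print(f"F-max = {f_max:.2f}")
```

The smallest variance belongs to the smallest cell (Paired/Extended, n=14), which is the kind of pairing of unequal variances with unequal cell sizes that the bias discussion above is concerned with.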
The researchers did not perform any statistical analyses other than the ANOVA. Since it is possible for results to be statistically significant but not practically significant, this possibility should be considered. Computing effect size indices can provide information concerning the practical significance of the results. Power analysis is another procedure that bears on practical significance. Power analysis can be conducted during the design phase to help determine whether the experiment is worthwhile to conduct, or it can be conducted after the data are collected to see whether there was sufficient power associated with the completed statistical test(s). Finally, strength of association measures such as eta squared and omega squared can be computed. Post hoc tests were not necessary since each factor in this 2 X 2 analysis of variance had only two levels.
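The strength of association measures named above can be estimated from the values in Table 1. The sketch below is a rough illustration rather than a definitive analysis. It assumes that the total sum of squares equals the sum of the tabled SS values, which holds exactly only for balanced designs; because the cell sizes in this study are unequal, the figures should be read as approximations. The formulas used are eta squared = SS effect / SS total, partial eta squared = SS effect / (SS effect + SS error), and omega squared = (SS effect - df effect x MS error) / (SS total + MS error).

```python
# Sums of squares and df copied from Table 1.
ss = {"Type of CAI": 2.22, "Type of Feedback": 43.05, "CAI X Feedback": 26.59}
df_effect = 1  # every effect in the 2 X 2 design has 1 df
ss_error, df_error = 507.06, 64
ms_error = ss_error / df_error

# Assumption: SS total is taken as the sum of the tabled SS. With unequal
# cell sizes the components need not add up exactly, so this is approximate.
ss_total = sum(ss.values()) + ss_error

# source -> (eta squared, partial eta squared, omega squared)
effect_sizes = {}
for source, ss_effect in ss.items():
    eta_sq = ss_effect / ss_total
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
    effect_sizes[source] = (eta_sq, partial_eta_sq, omega_sq)
    print(f"{source:18s} eta2={eta_sq:.3f}  "
          f"partial eta2={partial_eta_sq:.3f}  omega2={omega_sq:.3f}")
```

Under these assumptions, the significant Type of Feedback effect accounts for roughly 6 to 8 percent of the variance in test scores, depending on the index used. The omega squared estimate for Type of CAI comes out negative, which happens whenever F is less than 1.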
It must be noted that the rejection of one null hypothesis and the failure to reject the other two may be the result of Type I or Type II error. Small sample sizes, unreliable measuring instruments, or too much within-group variability are some of the factors that could have affected the results. Because of its fundamental, numerous, and serious problems, this study must be ignored or redesigned with attention to issues of planning, sampling, instrumentation, statistical analysis, and clarity of presentation.