Evaluative Summary of Article on
Split-Plot Factorial (SPF-p.q) Design
1. Background Information
Kurpius, D. J., Benjamin, D., & Morran, D. K. (1985). Effects of teaching a cognitive strategy on counselor trainee internal dialogue and clinical hypothesis formulation. Journal of Counseling Psychology, 32(2), 263-271.
2. Abstract
The authors applied a split-plot factorial design with one fixed-effect between-subjects variable (training method for case conceptualization) and one fixed-effect within-subjects variable (time of testing) to compare the effectiveness of training methods in improving counselor trainees’ ability to conceptualize a counseling case. The between-subjects variable has 4 levels: 3 training conditions (strategy, knowledge, and combined) and a placebo control group. The within-subjects variable has only 2 levels: posttest and 6-week follow-up test. Reviewing previous studies, the authors argue that the traditional microskills training approach is limited because microskills are difficult to generalize to different situations. They suggest an alternative way of teaching counseling skills, specifically case conceptualization skills, based on cognitive process training. Cognitive process training means teaching counselor trainees how to use internal dialogue to conceptualize a counseling case or to produce clinical hypotheses. Although they did not specify their research hypotheses, the authors posed three questions: (1) Which, if any, training conditions are superior to the placebo control condition? (2) Are some training conditions superior to others? (3) Will posttest differences be maintained at the 6-week follow-up? The researchers recruited the participants from graduate prepracticum counseling courses. Originally 39 students volunteered, but only 32 were included in the data analysis. After receiving one of the 4 kinds of training according to their assigned condition, the participants took a case conceptualization test, followed by a follow-up test 6 weeks later.
The results indicated that the strategy and combined (strategy and knowledge) conditions were more effective than the other conditions in increasing the counselor trainees’ case conceptualization ability on the posttest, but the four conditions did not differ in effectiveness on the follow-up test. In addition, the knowledge condition was not shown to be effective in improving counselor trainees’ case conceptualization ability.
3. Null hypothesis, alpha (or p) level, and sample size per group
The three null hypotheses for the two-way ANOVA with repeated measures on one factor are $H_0\colon \alpha_j = 0$ for all $j$, $H_0\colon \beta_k = 0$ for all $k$, and $H_0\colon (\alpha\beta)_{jk} = 0$ for all $j$ and $k$. The first null hypothesis means that there is no treatment effect among the four conditions. That is, the strategy, knowledge, combined, and control groups will show no difference in the ability to conceptualize a counseling case or to produce hypotheses about it. The second null hypothesis states that there is no effect of the time between the posttest and the follow-up test, which means that the passage of time will not affect participants’ ability to conceptualize a case. The third null hypothesis states that there is no interaction between the four training methods and time. That is, no specific combination of the two factors, e.g. strategy and posttest, will produce higher or lower case conceptualization ability beyond what the two main effects alone would predict.
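For reference, the structural model commonly written for an SPF-p·q design in experimental design texts is sketched below. The symbols are an assumption of this summary, since the article itself does not give a model equation; here $p = 4$ training conditions, $q = 2$ testing occasions, and $n = 8$ participants per condition.

```latex
Y_{ijk} = \mu + \alpha_j + \pi_{i(j)} + \beta_k + (\alpha\beta)_{jk}
          + (\beta\pi)_{k\,i(j)} + \varepsilon_{ijk},
\qquad i = 1,\dots,n;\quad j = 1,\dots,p;\quad k = 1,\dots,q
```

Under this model, $\alpha_j$ is the effect of training condition $j$, $\beta_k$ is the effect of testing occasion $k$, $(\alpha\beta)_{jk}$ is their interaction, and $\pi_{i(j)}$ is the effect of participant $i$ nested within condition $j$. The three null hypotheses state that all $\alpha_j$, all $\beta_k$, and all $(\alpha\beta)_{jk}$ equal zero, respectively.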
The researchers did not state any specific hypotheses in the article. However, as noted in the abstract section, they had three research questions, which imply their research hypotheses. The null hypotheses for research questions 1 and 2 can be tested only through a post hoc procedure after a statistically significant ANOVA result. Thus, for research questions 1 and 2, the implied null hypothesis can be expressed as $H_0\colon \mu_j = \mu_{j'}$ for all $j$ and $j'$ ($j = 1, \dots, 4$; $j' = 1, \dots, 4$; $j \neq j'$). This fourth null hypothesis states that there is no significant difference between any two of the four training conditions. In addition, research question 3 implies null hypothesis 3, which concerns the interaction between factor A (between subjects: four training conditions) and factor B (within subjects: time). The interaction is implied because the phrase “the posttest difference” refers to the effect of factor A and “be maintained at the follow-up test” refers to the effect of factor B.
The $\alpha$ level for the ANOVA test was not reported, but the p level was. A significant F ratio was reported for the interaction between factor A (4 training conditions) and factor B (time) on the Thought-Listing measure, and for factor A (4 training conditions) on the Clinical Hypothesis measure, with p<.05 for each (both measures are explained in the instrument section).
The total number (N) of participants is 32, although there were originally 39. According to the researchers, the participants were randomly assigned to one of the four training conditions. Although some cells originally had more than 8 members, the researchers equalized the cells at 8 members each for convenience in the statistical analysis.
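The arithmetic behind the equalized cells can be sketched in a few lines of Python. The initial split of 10, 10, 10, and 9 is an assumption (the article does not report the initial cell sizes), as are the variable names; the sketch simply enumerates how 5 dropouts could be spread over the four groups and checks which patterns still allow 8 members per cell.

```python
from itertools import product

# ASSUMPTION: initial cell sizes are not reported in the article;
# 39 volunteers over 4 groups most plausibly gives 10, 10, 10, 9.
initial_sizes = (10, 10, 10, 9)
dropouts = 5      # participants unavailable for the follow-up test
target = 8        # final cell size used in the analysis

# Every way to distribute the 5 dropouts over the 4 groups.
patterns = [
    d for d in product(range(dropouts + 1), repeat=len(initial_sizes))
    if sum(d) == dropouts
]

# A pattern is feasible if every group still has at least 8 members.
feasible = [
    d for d in patterns
    if all(size - lost >= target for size, lost in zip(initial_sizes, d))
]

# Number of additional participants that must be removed to reach 8 per cell.
extra_removed = {
    sum(size - lost - target for size, lost in zip(initial_sizes, d))
    for d in feasible
}

print(len(patterns), len(feasible), extra_removed)  # 56 9 {2}
```

Under these assumptions, only 9 of the 56 possible dropout patterns leave at least 8 members in every cell, and each of those requires removing exactly 2 more participants, which matches the 39 - 5 - 2 = 32 reported. The patterns are not equally likely, so the counts should not be read as probabilities.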
4. Independent and dependent variables
Two fixed independent variables are used: type of training method and time. There are four training methods (the between-subjects variable): strategy, knowledge, combined (strategy and knowledge), and placebo. The within-subjects variable (time) consists of two levels: posttest and follow-up test.
Although the dependent variable is not explicitly stated in the article, I take it to be the counselor trainees’ ability to conceptualize a counseling case or to produce clinical hypotheses about it. The dependent variable was measured by scores on two instruments, which are explained in the next section.
5. Instrument, comment on its reliability and validity.
The researchers used the thought-listing (TL) technique and a brief retrospective self-report questionnaire to assess counselor trainees’ ability to conceptualize a case or to produce clinical hypotheses. According to the researchers, the TL technique was developed by Cacioppo and Petty (1981); it provides participants with more than enough empty boxes (10 boxes per page) to list one thought per box. In this research, the participants wrote their clinical thoughts after seeing each of 8 short videos in which a role-playing client talks about her problem. In Cacioppo and Petty’s research, the interrater reliability of the TL technique with two raters was .95. With the brief retrospective self-report questionnaire, the researchers tried to assess participants’ ability to conceptualize or produce clinical hypotheses (CH) about the videotaped client. The participants were instructed to describe their assessment of the client’s problem(s) and their rationale for that assessment.
In addition, the researchers used two raters to score the TL and the CH. No specific information was given about how the raters were trained to ensure reliable and valid ratings. The authors, however, reported interrater reliability coefficients for the obtained scores: .91 for the TL and .98 for the CH. Thus, the ratings were highly reliable, although interrater agreement by itself does not establish their validity.
6. Experimental procedure
The participants were graduate students who were taking prepracticum counseling courses. Originally 39 students volunteered for the research, but 5 were not able to take the follow-up test and 2 were randomly removed from the study to equalize the cell sizes of the design. Thus, the test results of a total of 32 students were used for the data analysis. The gender ratio of the participants was approximately 2 females to 1 male. The researchers randomly assigned the participants to the four treatment groups (strategy, knowledge, combined, and placebo control), with 8 per group in the final analysis. The researchers used class time to maximize the participation and motivation of all students.
The experiment was conducted as follows. According to their assigned condition, the participants learned the techniques from an instructor, who used a board on which the main teaching points were written. The participants in the three treatment groups practiced the techniques by applying them to 6 videotaped client statements. Two minutes of black tape played between each of the 6 vignettes, and the practice took place during that time. During the practice, the instructor reminded the participants of the techniques and demonstrated how to use them. After the last vignette, the participants were asked to think for 2 minutes about their own clinical hypotheses regarding the role-played client’s problem, and then the instructor read an example hypothesis. The participants in the control group, in contrast, watched a film on another aspect of self-observation, Kagan’s (1973) Elements of Facilitating Communication. This intervention involved group discussions and was guided by an instructor. The participants in the control group did not watch the six videotaped vignettes. Instruction took about 30 minutes for the strategy, knowledge, and control groups, and 40 minutes for the combined group. After the instruction segment, all the participants took a posttest. They were given directions on how to fill out the TL instrument and then watched 8 videotaped vignettes. There were 90-second pauses between vignettes for the participants to fill out the TL. Right after finishing the last TL, the participants completed the retrospective self-report questionnaire about their clinical hypothesis (CH) concerning the client’s problem(s) and the rationale for their hypothesis. Six weeks later, a follow-up test was conducted with the same material.
The techniques that the three treatment groups learned are the following. First, the participants in the strategy group learned a cognitive self-instruction strategy. This strategy emphasizes thinking aloud about the counseling task, asking and answering questions about the counseling task, using internal dialogue coping skills, and giving positive self-reinforcement. Second, the participants in the knowledge group were taught clinical hypothesis knowledge. This treatment focuses on finding the client’s major problems or themes, significant internal (e.g. personal characteristics) and external (e.g. relationships and social contexts) factors, the connection between these internal and external factors, and the cognitive, affective, and behavioral elements of the client’s problem. Third, the members of the combined group learned both the strategy and the knowledge techniques.
A white male played the client in the 6 practice vignettes, and a white female played the client in the 8 test vignettes. However, the article did not reveal who the instructor was for each group, nor did it clearly state how many instructors were used.
The instructor prepared for accurate presentation of the treatments by repeatedly rehearsing with detailed written procedures and receiving feedback until two expert judges agreed that each treatment condition was accurately delivered.
7. Statistical analysis and conclusion
A 4 (strategy, knowledge, combined, and control group) x 2 (posttest and follow-up test) analysis of variance for the SPF-4·2 design was conducted separately on the TL and the CH scores.
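The partition of degrees of freedom for this design can be sketched as follows. The function name and labels are illustrative, and the formulas are the usual ones for a balanced SPF-p·q design with n participants per group; they are not quoted from the article.

```python
def spf_degrees_of_freedom(p, q, n):
    """Degrees of freedom for a balanced SPF-p.q design, n subjects per group."""
    return {
        "A (between groups)": p - 1,
        "Subjects within groups": p * (n - 1),
        "B (within subjects)": q - 1,
        "A x B": (p - 1) * (q - 1),
        "B x subjects within groups": p * (n - 1) * (q - 1),
        "Total": n * p * q - 1,
    }


# SPF-4.2 with 8 participants per training condition (N = 32).
df = spf_degrees_of_freedom(p=4, q=2, n=8)
for source, value in df.items():
    print(f"{source:28s} {value}")
```

With p = 4, q = 2, and n = 8, both the test of factor A and the test of the A x B interaction have 3 and 28 degrees of freedom, which agrees with the F(3, 28) values reported for the CH and TL analyses.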
The researchers found a violation of the homogeneity of variance assumption for the TL scores using Hartley’s (1950) $F_{\max}$ statistic, but found no violation for the CH scores. Because transformations (square-root, reciprocal, logarithmic, and inverse sine) did not correct the violation, the researchers used the original TL scores in a two-step analysis: first, an ANOVA was conducted, and second, for the significant findings, the Mann-Whitney (1947) $U$ statistic, a nonparametric procedure, was applied.
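A minimal sketch of the two statistics named in this step is given below, using only the Python standard library. The scores are made-up illustrative numbers, not data from the article, and the helper names are hypothetical. Hartley's statistic is the ratio of the largest to the smallest group variance, and the Mann-Whitney U counts, over all pairs of scores from two groups, how often one group's score exceeds the other's.

```python
from statistics import variance


def hartley_f_max(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def mann_whitney_u(x, y):
    """Mann-Whitney U: the smaller of the two pairwise-comparison counts.

    Each (x_i, y_j) pair scores 1 if x_i > y_j, 0.5 if tied, 0 otherwise.
    """
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# HYPOTHETICAL scores for two groups, for illustration only.
group_a = [12, 15, 9, 20, 14, 18, 11, 16]
group_b = [7, 8, 6, 9, 7, 10, 8, 6]

print(hartley_f_max([group_a, group_b]))
print(mann_whitney_u(group_a, group_b))
```

Both values would then be compared against tabled critical values (or, for U, an exact or approximate p value); that step is omitted here.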
ANOVA results for the TL scores revealed a significant interaction between the treatment and time factors, F(3, 28) = 3.36, p<.05. The researchers applied Tukey’s procedure to the four training conditions at both the posttest and the follow-up, and to posttest versus follow-up within each training condition, which means that they looked at the main effects of the four training conditions and the simple main effects of the interaction between training condition and time. They reported that the strategy group and the combined group scored significantly higher on the TL than either the knowledge or the control group on the posttest, which concerns the main effects of the four training conditions. They also reported that only the strategy group showed a significant posttest versus follow-up difference, with higher scores at the posttest than at the follow-up, which concerns the simple main effects of the interaction between training condition and time.
ANOVA results for the CH scores showed a significant effect of training condition, F(3, 28) = 3.34, p<.05. The Tukey post hoc procedure was conducted among all training conditions on the posttest and follow-up scores separately. The researchers reported that the strategy and combined groups scored significantly higher on the CH than either the knowledge or the control group at the posttest. No other comparisons on the posttest or the follow-up test were found to be significant.
In regard to the first research question, the researchers concluded that the strategy and combined (strategy and knowledge) conditions were superior to the control condition on both the TL and CH posttest measures. The knowledge condition was not superior to the control condition. The researchers offered an interpretation of this finding: the strategy treatment is likely to have developed a “cognitive map” that enabled the participants to retain relevant client information while eliminating irrelevant information. They further argued that the strategy treatment participants were able to apply their more productive internal dialogue in writing superior clinical hypotheses.
Regarding the second research question, the researchers concluded that the strategy treatment, either alone or in combination with the knowledge treatment, was superior to the knowledge treatment on the posttest. They also concluded, from a comparison of the cell means, that strategy alone was the most effective of the four treatment conditions. They argued that the strategy condition taught the participants how to seek relevant client information, while the knowledge condition taught them what to seek. Once the participants knew how to conceptualize the client, they were able to infer what to look for. The participants in the knowledge-alone condition, however, were limited in applying what they knew because they lacked a cognitive strategy. That is why the strategy condition was superior to the knowledge condition. To explain why the strategy-alone condition seemed superior to the combined condition, the researchers argue that the knowledge component worked more as a cognitive distracter than as a supplement.
Concerning the third research question, the researchers concluded that treatment effects were no longer evident at the follow-up. They argued that the short duration of the treatment, about 30 to 40 minutes, was the reason for the lack of a long-term effect.
8. If you were the researcher, how would you improve the study?
I think the researchers did a good job in reviewing the relevant previous studies and describing the experimental procedure in detail. They also designed the research well and used an appropriate research design. They were right to use an SPF design because they used repeated measures on time while expecting an interaction between the two independent variables. They also provided easily readable ANOVA tables
with sufficient information. You can
obtain further information, such as
, using the information provided on the ANOVA tables. Their interpretations on the results were thought
provoking, and their comments on the limitations of the study, such as the relatively small number of participants, possible effects of the course work, and the unsophisticated scoring system, were accurate. Thus, overall the authors did a good job in designing the research, in conducting it, and in reporting the results. Nevertheless, I would like to point out some areas that may need improvement.
First, I would state the research hypotheses as specifically as I could. Although the researchers had specific research questions, it would be better to state specific hypotheses, since the researchers had specific ideas about cognitive process and the possible effectiveness of the internal dialogue taught by the strategy.
Second, I would specify which cells the dropouts came from, to increase the credibility of the research procedure. The researchers reported that originally 39 students volunteered for the research and were then randomly assigned to the treatment conditions. So, how many were in each condition? Since there were 4 groups, one group probably had 9 members while the other three had 10. After the posttest, five participants were not available for the follow-up test. So, which cells did those five come from? It seems that, luckily, the 5 were not concentrated in one or two groups, because the researchers were able to keep equal cell sizes by removing 2 more participants. I would like to believe so, but it is not certain, because the researchers could not have kept equal cell sizes of 8 if, for example, 3 of the dropouts came from one group with 10 members or 2 came from the group with 9 members.
Third, I would specify how many instructors were used to present the treatment conditions. It seems that only one instructor was used, since the researchers wrote “the instructor” in the Instructor/Model section on page 264. Regardless of how many instructors they used, I would provide a rationale for that number, since there are four conditions including the control condition, and I think each counseling expert or faculty member has a different orientation and different skills. Thus, even with a detailed written format to follow, a possible nuisance variable due to the idiosyncrasies of the instructor cannot be ruled out. If only one instructor was used, then I wonder whether the same instructor delivered each condition with equal quality.
Fourth, related to the third point, I would specify who the instructor was, who the two expert judges who gave feedback to the instructor were, and who the two raters of the tests were, because if any of them was deeply involved in the research, which seems likely here, that person could have introduced bias into the research and the results.
Fifth, I would state what the dependent variable is, to make clear what I am trying to measure. The researchers operationalized the dependent variable with two different scales, the TL and the CH, but never stated what the dependent variable was. I assumed that it was the counselor trainees’ ability to conceptualize a counseling case.
Sixth, I would mention whether the sphericity assumption, which is required for a valid F test of the treatment by time interaction in the ANOVA for the TL scores, was violated, although the assumption is known to be almost always violated. In this research design, however, because the within-subjects variable has only 2 levels, the most conservative Geisser-Greenhouse procedure has the same degrees of freedom as the most liberal conventional test, so the two would have produced the same result in testing the treatment by time interaction.
Seventh, for the control condition I would use a real training condition, such as a traditional method if there is one, rather than a placebo, or I would simply eliminate the placebo control group, because I think training almost always produces better results than no training in improving any skill.
Eighth, I would not conduct post hoc comparisons separately on the posttest and the follow-up test, because that is not a logical way of testing after the ANOVA for an SPF design.
Ninth, I would not discuss contrasts that are not statistically significant, because doing so is rather meaningless. The researchers write, “Strategy subjects scored higher than combined subjects on both measure, though not in a statistically significant degree” (p. 268).
Tenth, I would be clearer about my research interest in factor B and the interaction if I used the SPF design. In other words, if I were really interested in the effects of factor B (time) and the interaction between factors A and B, I would definitely use the SPF design. I am somewhat doubtful about the researchers’ interest in factor B and the interaction. First, they had 2 research questions regarding factor A (treatments), but only one about the interaction. It seems that they were more interested in the treatment effects than in the time effect. Besides, they used class time for the research procedure, which I think allowed some interaction among participants of different groups, so some skills taught to one group may have been revealed to other groups. As the results of the follow-up test show, the control group interestingly did as well as the other groups. Thus, I would either conduct the research carefully enough to obtain valid tests of factor B and the A by B interaction if I used the SPF design, or use another design if I were interested only in the treatment effects (factor A). In the latter case, I might use a nested design or a completely randomized design to test the effects of the treatment conditions.
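The degrees-of-freedom argument in the sixth point can be checked with a short sketch. The function name is hypothetical; the only facts used are that the conservative Geisser-Greenhouse test multiplies both degrees of freedom of the within-subjects tests by the lower-bound value 1/(q - 1), and that the conventional test corresponds to a multiplier of 1.

```python
def interaction_df(p, q, n, epsilon=1.0):
    """Numerator and denominator df for the A x B test, scaled by epsilon."""
    numerator = (p - 1) * (q - 1) * epsilon
    denominator = p * (n - 1) * (q - 1) * epsilon
    return numerator, denominator


p, q, n = 4, 2, 8                    # SPF-4.2 with 8 participants per group
lower_bound = 1.0 / (q - 1)          # most conservative epsilon

conventional = interaction_df(p, q, n)
conservative = interaction_df(p, q, n, epsilon=lower_bound)
print(conventional, conservative)    # identical when q = 2

# For contrast, a HYPOTHETICAL design with q = 3 testing occasions.
print(interaction_df(4, 3, 8), interaction_df(4, 3, 8, epsilon=1.0 / 2))
```

With only 2 levels of the within-subjects variable the lower bound equals 1, so both tests use 3 and 28 degrees of freedom. With 3 levels the conventional test would use 6 and 56 while the conservative test would use 3 and 28, which is where the two procedures start to differ.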