James E. Mitchell
Associate Dean for Undergraduate Affairs, College of Engineering
Presented at NSF Conference for Coalition Evaluators 10/93, Baltimore MD
A chronology of the evaluation methods used in Drexel University's E4 curriculum revision experiment allows an appraisal of evaluation's impact. Most attention was given to Formative evaluation over the five-year duration of the experiment, resulting in a faculty desire to continue the process after the completion of the experiment. Quantitative Summative evaluation, performed mostly at the end of the experiment, had its major effect on constituencies outside the E4 faculty group: those deciding whether to adopt the curriculum for the entire College of Engineering. Evaluation of retention, not an original experimental goal, became particularly important in the program's adoption. Journal analysis, having both Formative and Summative roles, gives an excellent "feel" for the impact of the program.
Drexel University's E4 program is widely known in the Engineering Education community for its drastic reformulation of the first two years of the Engineering Curriculum. The goals, curriculum changes, and evaluation of the results have been reported elsewhere. [#65] A more detailed look at the process, evolution, and uses of E4's evaluation methods may be beneficial for others engaged in similar experiments. The perspective I bring is that of a latecomer to the E4 team. My charge was to lead the adoption of a modified E4 Curriculum by the entire College of Engineering. In that role I had to review the evidence addressing not only the program outcomes, but also the costs. I also now teach in the program.
The E4 experiment is five years old (Table 1). Evaluation of the experiment was planned from the beginning. The actual evaluation process, however, has changed and developed greatly over that time.
External evaluation was planned in the original proposal and carried out throughout the experiment. The major technique was a yearly analysis of selected journals kept by students as part of their humanities course requirements. This analysis was supplemented by interviews with students at intervals during the life of the program, particularly those leaving the program. Broader external evaluation using a variety of standard measures was initially envisioned but was not completed.
Internal evaluation, usually thought of by the participants as "feedback", began with the first group of students. Most important undoubtedly were the weekly meetings instituted and continued throughout the life of the experiment. Faculty and students attended the meetings with conscious intent to evaluate what was happening, to coordinate, and to modify the progress of the courses being taught. End-of-term meetings reviewed overall progress and planned modifications for succeeding terms. Summer planning teams drew on the prior year's experience to modify the curricula for succeeding years. Humanities professors read the student logs in progress and contributed the insights drawn from those readings to the evaluation process. Term-end student questionnaires provided way-point evaluations of the process and the faculty involved - and led to changes in both.
Faculty also evaluated student performance in terms of project goals through "measurements of laboratory skills; evaluation of performance in several design projects; critiques of written and oral presentations; evaluation of performance in using the computer as an intellectual and professional tool; and the normal evaluations using homework, quizzes and examinations."
Quantitative measures of performance were compared to a matched control group. These measures included comparison of retention rates within the university, "on track" progress, and GPA in upper level courses once E4 students rejoined the control group in post-E4 courses.
Finally, a comparison of faculty resources required to teach in the "traditional" curriculum and the "E4" curriculum was developed using a detailed model of course credits, contact hours and actual time required per credit taught. This led to a generalized model of the entire five year "traditional" curriculum which could be compared to the new curriculum. FTE faculty requirements for each curriculum, derived from that analysis, were key data for the financial analysis of the impact of the shift from the "traditional" to the new curriculum.
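The load model described above reduces to bookkeeping over contact hours and faculty time. The sketch below illustrates the general approach; every number in it (contact hours, prep multipliers, the 720-hour FTE teaching year, term length) is an invented placeholder, not actual E4 or Drexel data.

```python
# Illustrative faculty-load model of the kind described above.
# All figures are made-up placeholders, not actual E4 or Drexel data.

def fte_required(courses, sections_per_year, hours_per_fte_year=720):
    """Estimate FTE faculty needed to staff a curriculum.

    courses: list of dicts with 'contact_hours' (weekly, per section) and
             'prep_multiplier' (total faculty hours per contact hour,
             covering preparation, grading, and meetings).
    """
    weeks_per_term = 10  # assumed quarter length
    total_hours = 0.0
    for c in courses:
        section_hours = c["contact_hours"] * weeks_per_term * c["prep_multiplier"]
        total_hours += section_hours * sections_per_year
    return total_hours / hours_per_fte_year

# A lecture-based "traditional" plan vs. a lab-heavy integrated plan.
traditional = [{"contact_hours": 3, "prep_multiplier": 2.0} for _ in range(8)]
integrated  = [{"contact_hours": 5, "prep_multiplier": 2.5} for _ in range(5)]

print(round(fte_required(traditional, sections_per_year=4), 2))  # 2.67
print(round(fte_required(integrated, sections_per_year=4), 2))   # 3.47
```

Feeding FTE estimates like these, together with retention-driven enrollment projections, into a revenue model is one plausible way to arrive at the net financial comparison the proposal required.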
Evaluation is usually categorized as either Formative or Summative rather than the Internal and External terminology used above. For an administrator coming fresh to the field, the terminology was initially confusing, but soon made great sense because of its different uses.
The majority of the efforts described above are clearly Formative. The major effort during most of the E4 experiment's life was to continually improve it by listening to all participants, then revising the curriculum as appropriate. The weekly meetings, term surveys, planning meetings and understanding of the journal contents were all of a formative nature.
A key, unexpected, outcome of this process is that the faculty involved came to value the formative measures for themselves. In the Spring of 1993 they developed a statement that was unanimously agreed upon by all participating faculty. Amongst other recommendations it strongly endorses the continuation of the formative evaluation steps defined above as a necessary ingredient for the continuing health of the curriculum and involvement of the faculty. This conclusion is also borne out in the 1993 outside Evaluator's report on Journal Analysis and Faculty Interviews.
Summative evaluation is, in contrast to the dynamic nature of Formative evaluation, a static measure of achievement. It asks "what happened", not "what is happening". The Summative work was indeed performed primarily during the last year of the E4 experiment. Quantitative summative measures were primarily derived from University-kept statistics on student enrollment and Grade Point Averages (GPA's) comparing the E4 students to matched control groups on those measures. Qualitative summative work drew from the four years of journal analysis by the external evaluator as well as structured interviews with selected students and faculty. What follows are comments on the generation of these results rather than their interpretation.
Student Retention Analysis was arguably the single most important quantitative measure of the results of the E4 program. That number was first computed in the third year (1991) of the program. It showed considerable improvements in retention within the university for students who started in the E4 program - whether they remained in it or not. In a private, tuition-driven university that result immediately drew the attention of the Senior Administration and led to their continuing support for expansion of the program to the entire College of Engineering. Of particular note is the fact that retention was not a key element of the original E4 proposal. It is possible that, if the quantification of expected results from the E4 experiment had been firmly defined initially, the retention improvements might have been missed.
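A retention comparison of this kind is simple arithmetic, and its statistical weight is easy to check. The sketch below uses invented cohort figures, and the two-proportion z-test is a standard choice for this comparison rather than a method the E4 analysis is documented as using.

```python
import math

# Invented cohort figures for illustration only; not the actual E4 data.
e4_entered, e4_retained = 100, 85
ctrl_entered, ctrl_retained = 100, 70

p1 = e4_retained / e4_entered      # E4 retention rate
p2 = ctrl_retained / ctrl_entered  # control retention rate

# Two-proportion z-test: is the retention difference larger than
# chance variation in cohorts of this size would suggest?
p_pool = (e4_retained + ctrl_retained) / (e4_entered + ctrl_entered)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / e4_entered + 1 / ctrl_entered))
z = (p1 - p2) / se

print(f"E4 retention {p1:.0%}, control {p2:.0%}, z = {z:.2f}")
```

With these invented numbers the difference would be significant at conventional levels (z above roughly 2), which is the kind of result that would command administrative attention in a tuition-driven institution.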
GPA Analysis was the second major quantitative result developed. It used the pool of students who volunteered but were not picked "from the hat" as a control group. This measure was extremely important to faculty within the College of Engineering in their consideration of whether to support expansion of the program for all students. Only when the analysis showed similar or mildly superior performance in their "post-E4" years were many of the faculty convinced that the unquantified aspects of the program (superior communication skills, teamwork, laboratory experience, problem solving) justified the change in curriculum. For those embarking on similar efforts it is worth noting the benefit of keeping good records and planning the analysis from the beginning. Considerable extra effort was necessary to document the control groups when the GPA analysis was performed in the final year.
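The GPA comparison against the not-selected volunteers can be made concrete along the following lines. The GPA values here are fabricated for illustration, and Welch's t-test is one standard way to compare two group means; the source does not specify which test, if any, the E4 analysis applied.

```python
import math
import statistics as st

# Fabricated GPA samples for illustration; the real cohorts were ~100 students.
e4_gpas      = [3.1, 2.8, 3.5, 3.0, 3.3, 2.9]
control_gpas = [2.9, 2.7, 3.2, 3.0, 2.8, 3.1]

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = st.variance(a), st.variance(b)  # sample variances
    return (st.mean(a) - st.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(e4_gpas, control_gpas)
print(f"mean E4 {st.mean(e4_gpas):.2f}, "
      f"mean control {st.mean(control_gpas):.2f}, t = {t:.2f}")
```

The point of the exercise is the one the paragraph makes: "similar or mildly superior" performance is exactly the pattern where careful record-keeping on the control group matters, since small differences are easy to lose in noisy, poorly documented data.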
Journal Analysis was probably not highly important as a Formative evaluation tool, though that possibility existed since the analysis was prepared yearly. Its appearance during the summer, when faculty minds were on other matters, probably contributed to its relative lack of impact. However, the student experiences, thoughts, fears and triumphs recorded in the journals had an extraordinarily potent impact on faculty and others not involved in the E4 program. Experiment Leader Quinn repeatedly used direct quotes from student logs to illuminate concepts such as curriculum integration or laboratory experience. For those few willing to devote the necessary time, reading the evaluation report's carefully organized themes (presented primarily in the students' own words) conveys, as no statistic can, the impact of the program.
Faculty Load and Financial Analysis. The original E4 proposal made no projections about cost, expandability or other issues. Proof of the concept was sufficient. Nonetheless, when the academic and retention success of the program was becoming apparent, these issues needed to be addressed. As mentioned above, this was achieved by modeling the program - using class size, faculty teaching load, and teaching assistant requirements compared to those of the "traditional" curriculum. In addition, the E4 faculty generated workload data that allowed preparation of a plausible analysis of the additional effort to teach in the new mode. In combination with the retention statistic this allowed development of a financial model showing the net revenue effect for the university created by shifting to the new curriculum.
A number of evaluation efforts considered desirable during the initial planning were not carried out. In particular, no standardized tests of any sort were given to the E4 and control groups. There were initial plans to use such tests as the Torrance Tests of Creative Thinking (TTCT) as well as other measures to evaluate changes in appreciation of social issues. These plans were not realized for two reasons. First, in the single effort to carry out such a comparative test, no way was found to involve the control-group students. Second, the working relationship with external evaluators skilled in these areas did not develop fruitfully.
As mentioned above, less attention was devoted throughout the E4 project to Summative than to Formative work. Given the extraordinary effort necessary to define, organize, commence, modify and then expand the curriculum this balance is unsurprising. Nonetheless, there were consequences to the decision which are worth noting.
With an innovative, expensive program which was assumed to require considerably more faculty effort than the "traditional" mode of teaching, there was bound to be a high level of questioning about the merits of the new curriculum. These questions naturally led those not immersed in the program to say "show me". In an engineering school particularly, the desire is for "numbers" to prove the worth of the experiment. For most of its life the E4 program produced few generally accessible numbers. Faculty involved were enthusiastic, but couldn't document the basis for their enthusiasm. The outside evaluator's yearly reports analyzing journals and interviewing selected students bore out the promise of the program, but produced no "hard numbers" to allow comparison to the traditional program. This lack of numbers, combined with a minimal general communication to the university community about the program, led to mistrust both within the College of Engineering and the wider university. Only in the final year of the program, as expansion to the entire university was contemplated, was a major effort made to generate numbers that could convince those wanting quantification. Those numbers indeed bore out the more qualitative conclusions and greatly aided adoption by the entire College of Engineering.
Coming late to such a large undertaking as the E4 program, I struggled to understand what it accomplished. In the process I learned that evaluation was far more than a "final grade". Careful formative and summative work were vital first to the creation of a new curriculum and then to its formal adoption. I also learned that it is probably impossible to predict the final outcome well enough to fix the evaluation measures firmly before the process begins. Using open-ended processes such as weekly meetings and journal evaluation allows the unexpected to emerge and become accepted. That these are non-quantitative measures does not diminish the importance of quantitative measures as well. Those, I realized, can justify the program to those outside it having limited time and restricted interest.
More specifically I believe the following are significant lessons which may help others undertaking similar efforts:
* Formative evaluation was essential to the operational success of the E4 program. It is perceived as so beneficial by faculty participants that it will continue as the program becomes the standard for the entire College of Engineering. It is doubtful that this degree of importance was originally foreseen.
* Student journal analysis provided both formative and summative evaluation. Its major impact, however, was probably in the vivid impressions that student writing gives to an outsider becoming acquainted with the E4 program.
* Early identification of a control group and attention to record keeping of at least standard statistics such as GPA's and university status is almost certain to be beneficial in summative work on curriculum reform. In the E4 Program student retention and GPA standings compared to control groups were vital in convincing the University community of the E4 program's success. The retention analysis was of particular importance to University Administrators, yet was not originally contemplated.
* Others engaged in similar experiments would be wise to be sensitive to unexpected outcomes which warrant changing evaluation plans or emphasis.
This project is supported, in part, by Grant USE-8854555 from the Education and Human Resources Directorate of the National Science Foundation and grants from the General Electric Foundation and the Ben Franklin Partnership.
Table 1. E4 Chronology

Fall '88    E4 Proposal prepared, including evaluation plan
Winter '89  E4 Proposal accepted
Spring '89  New Curriculum development begun
Summer '89  New Curriculum for first year completed
Summer '89  Advisory Board critical of Evaluation Plan
Summer '89  Original external evaluator relationship terminated
9/89        Evaluation Plan proposed by 2nd external evaluator
9/89        First 100 students enter
9/89        Weekly faculty-student meetings begin and continue through entire experiment
9/89        Control group identified from those volunteers not (randomly) selected for E4
9/89        Students required to keep journals; selected journals for entire year analyzed for themes by 3rd external evaluator (Haslam)
Spring '90  Attempt to test E4 and control groups unsuccessful due to inability to attract control-group students
9/90        Second 100 students enter; control group, journals and meetings continue as in first year
5/91        Survey of Freshman Opinions on Course
7/91        Journal analysis of 2nd 100 students
9/91        Third 100 students enter; control group, journals and meetings continue as in prior year
Summer '92  College of Engineering curriculum revision begun, based on E4 experiment
9/92        Fourth 200 students enter; control group formed of those with similar SAT and HS rank
3/93        Retention & GPA Analysis
7/93        College of Engineering unanimously votes for new curriculum
9/93        Fifth 300 students enter program
10/93       Submit new curriculum proposal to University Faculty Senate; resource requirements analysis included as part of proposal
9/94        Planned date for all entering Engineering students to enter the new curriculum