The idea that students might routinely evaluate the teaching they experience in college was a hard sell on most campuses; student ratings were usually introduced with a good deal of struggle. The controversies over the years caused student ratings to become the most extensively studied aspect of collegiate education. Now, after fifty years of research and more than 2,000 journal articles, there's little reason to doubt that the procedure can provide valid and useful information for both faculty members and administrators.
Even today, though, student evaluations seldom make an optimal contribution to improving either teaching or personnel decisions. One reason may be that they've become banal: Students and faculty treat them as routine, giving them little thought. Another reason is that we forget what we've learned about how to make them most effective. Whatever the case, student and faculty time gets taken up in an exercise producing only mediocre results.
While there is consensus that student ratings should be supplemented by other evidence, such as might be included in a portfolio, problems in evaluating college teaching persist that are not likely to be solved by portfolios or other alternatives; in fact, other sources are subject to the same problems.
All of us in academe have some responsibility for the persistence of the problems; all of us should do our part in solving them.
If we take students' (or anyone's) time, we have an ethical obligation to ensure that the time spent is educational, interesting, or in some way rewarding. For midterm evaluations, one can argue that students benefit from whatever improvements follow from their feedback. But the time students spend filling out end-of-term ratings cannot be justified in terms of a benefit they receive from improvement in that teacher's teaching.
There is, however, a value that we have failed to emphasize in our use of student ratings of teaching; that is, the potential benefit to students' own learning that can occur in the process of filling out rating forms. Student ratings of teaching should encourage students to think about their educational experiences--to develop clearer conceptions of the kinds of teaching and educational experiences that contribute most to their learning. All of us could do a much better job of introducing the educational rationale for filling out these forms. We can certainly create forms that encourage students to be reflective. For example, we might ask students, "Think about the conditions where you have learned well. Describe them." Faculty members can help students consider educational issues by collecting feedback early in the semester and discussing the results with the students, relating the use of the feedback to educational goals and theory. Ratings collected after the first few weeks could be either a short version of the end-of-term form or a few open-ended questions, such as "What do you like about the course so far? What suggestions do you have?" One of us sometimes uses a form with items such as these:
If I am to achieve my goals in this course, I need to: do more . . . [describe], do less . . ., continue to . . .
I would like the teacher to: continue to . . ., do more . . ., do less . . .
An alternative to a rating form is Small Group Instructional Diagnosis (Redmond and Clark 1982; also see Angelo and Cross 1993), in which a consultant observes a class session and then conducts discussions with students about what is helping or hindering their learning. The observer summarizes the information and provides consultative help to the instructor. This is particularly helpful because students find out that other students not only perceive the situation differently from themselves but also may have different needs. It also has the advantage that consultation is particularly effective for improving teaching (see research by McKeachie et al. 1980).
The SGID is usually carried out between the first third and middle of a term. One benefit of collecting midterm impressions of a course from students is that the students can actually experience the effect of any suggestions the instructor implements; thus they are likely to be motivated to give helpful feedback. An instructor who collects student opinions during a term (rather than at the end) can discuss the results, opening up a dialogue that will help students think about their own learning and how various aspects of the course can contribute to it. As this conversation continues through the term, students will be more sophisticated in their responses on a final evaluation form.
Discussion with students aimed at helping them evaluate their own learning and the conditions that contribute to it develops their ability to learn more effectively. Discussions before and after ratings are collected can result in more useful feedback for teachers and help students become better learners. Any instruction of students on how to be better evaluators should produce more valid evidence for personnel committees judging effective teaching.
Student involvement will have progressively less impact if aspects of a course or curriculum have not been changed despite generations of student complaints. There are undoubtedly times when the faculty knows better than students what is needed, but a long record of student complaints signals at the least that we've done a poor job of helping them understand why the course or curriculum is as it is. More often there is some validity to the complaints, and we have an obligation to consider them seriously. When student comments lead to revisions, we should let current students know in order to encourage their further participation in improving learning and teaching.
To sum up: The student opinion form could, and should, be educational in the highest sense--helping students gain a better understanding of the goals of education, stimulating them to think more metacognitively about their own learning, motivating them to continue learning, and encouraging them to accept responsibility for their learning.
What about faculty members? What value lies in the process for them?
There is evidence that faculty members do improve their teaching as a result of getting feedback from student ratings or through other methods. However, the amount of improvement depends upon the type of information collected and the use of the information. Typically, feedback from questionnaire items referring to specific behaviors is more likely to be helpful than from broad, general items.
In addition, faculty members should have an opportunity to choose items that answer questions they would like answered with respect to their own course and teaching. If departments or colleges require certain items to be included, they have an ethical obligation to make sure those items are indeed relevant to each instructor's teaching responsibilities. Irrelevant questions simply confirm faculty and student suspicions that the whole process is a bureaucratic exercise rather than an honest attempt to improve education.
All too often we fail to help faculty members interpret the results of the ratings; their eyes glaze over at the rows and columns of statistics. But consultation about the ratings with an experienced peer or an expert, even explicit suggestions such as lists of teaching strategies or techniques, makes a big difference in the amount of improvement teachers make.
One reason that such help is more effective than a simple return of ratings is that most of us as teachers tend to focus on the few low evaluations or the one stinging comment--even though that comment may be contradicted by the ratings of most students. Teaching is a highly personal activity, involving one's deepest sense of self. Negative comments are difficult to ignore. We all find it hard to believe that we were unaware that one of our students had such negative feelings about the class.
Research on feedback from filmed or videotaped classes has shown that the teacher viewing the videotape tends to focus on minutiae of gestures and personal appearance. A consultant viewing the videotape can help a teacher sort out the major issues from the minutiae. Similarly, in interpreting student ratings, a consultant can help sort out the most useful information, provide encouragement and strategies for improvement, and suggest printed materials, workshops, training opportunities, or other means for continued learning.
Despite our knowledge of how to increase the value of student ratings, many colleges roll out the forms without a thought to useful feedback; consultation is more nearly the exception than the rule. Our failure to ensure that student ratings are used effectively is an ethical breach, affecting both us as faculty members and students.
Student ratings also are used in decisions about promotion and salary increases. Here, too, we have serious problems. The most serious may be that teaching is not valued as highly in practice as in our rhetoric. Even when members of personnel committees say that teaching and research should receive equal weight in promotion, their judgments put preponderant weight on research.
But even when administrators and faculty committees sincerely intend to recognize excellent teaching, they fail to take student ratings as seriously as they should. Seldom do they bother to investigate the extensive research literature on student ratings of teaching. Decades of research have related student ratings to measures of student learning, student motivation for further learning, instructors' own judgments of which of two classes they had taught more effectively, alumni judgments, peer and administrator evaluation, and ratings by trained observers. All of these criteria attest to the validity of student ratings well beyond that of other sources of evidence about teaching (see Feldman 1989a, 1989b; Marsh 1987). Yet members of personnel committees cheerfully use their own biases (especially if their own ratings are not high) as a substitute for this more substantial evidence from students.
In addition, faculty committees and administrators often have stereotypes about what effective teaching involves. They assume, for example, that a teacher who is not highly organized will be less effective than one who is. But while organization is, in general, related to effectiveness in teaching, the effect of different degrees of organization depends on the students' own abilities and background.
Because particular characteristics of teachers and teaching are far from perfectly correlated with teaching effectiveness, Scriven (1991) has argued that ratings on such characteristics should not be used at all by personnel committees. We agree with Scriven's point, but we do so because we believe that administrators, faculty evaluation experts, and others responsible for justice in faculty evaluation have failed in our responsibility to provide proper training for those who are using student ratings as a source of evidence for personnel decisions.
In general, student ratings of their own learning, of their own achievement of course goals (such as critical thinking), and of their own motivation for further learning in the area of the course are preferable to their evaluations of teacher characteristics. Ratings on teacher warmth, organization, and enthusiasm, for example, could be helpful to a committee if used with some sophistication, and such items can be helpful for teacher improvement. But these characteristics are neither necessary nor sufficient as indicators of effective teaching. We fail ethically when we permit important personnel decisions to proceed on the basis of such potentially misleading data.
As an aside, it is worth mentioning that evaluations of research can be just as questionable. Studies of judges' agreement on papers submitted for publication suggest that we don't do very well on papers even in our own fields; so there is likely to be even more reason to question the wisdom of personnel-committee members making judgments outside their own areas of expertise.
Another source of problems in personnel decisions is the general practice of judging a faculty member's teaching effectiveness against college-wide norms. Clearly, teaching methods, and therefore ratings, differ across departments. Similarly, even though variables such as class size, grading standards, class level, and other characteristics have relatively small effects upon overall student ratings, small differences in numerical averages are often treated as significant by personnel committees.
Because norms are so often detrimental to teacher motivation and are so frequently misused in personnel decisions, we believe personnel committees should be provided with the distributions of student responses, rather than with norms.
Student ratings may be the best-validated source of evidence of teaching effectiveness, but everyone agrees that other data are also desirable. Today the most frequently advocated device is the teaching dossier or portfolio. The portfolio approach has many advantages, including that of providing diverse sources of evidence.
Portfolios can, however, be costly to put together, in time and resources, and they have their own sources of bias. An "attractive" portfolio--with color, graphics, and perhaps a videotape--may prove more persuasive than one with less polish. And just as research is sometimes judged by the number of publications, large portfolios may carry more weight than short ones. Some faculty members may be more skilled at putting a best face on what they have done (or believe the committee would want them to have done) than others. All these potential sources of bias matter with portfolios because of the importance of the decisions they can influence.
Just as there are ethical problems in taking student time to fill out rating forms that are not used effectively, there are real problems in asking faculty members to spend time compiling portfolios if the personnel committees have had no training in evaluating such evidence with validity and fairness. Appropriate training might begin with discussion of what effective teaching is. Is the ultimate criterion student learning? If not, what other criteria are relevant? If some agreement is reached on what effective teaching is, the committee members might practice judging portfolios, assessing their agreements and disagreements until some consensus is reached.
And just as we need to teach students to use ratings of teaching as a means of thinking about their education, we need to teach faculty members how to use portfolio development to improve their teaching. Guidelines for faculty to use in preparing their portfolios and consultation with experienced peers or experts not involved in the personnel decision can be helpful in the months or years before the critical portfolio goes forward to the personnel committee. During that time, instructors can be provided with opportunities to improve in areas in which the documentation appears to be weak.
Moreover, because the nature of effective teaching differs across disciplines, the nature of portfolios should vary, too. Promotion committees need to be trained to look for different kinds of evidence rather than to judge on the basis of a single stereotype of the "good teacher" or "good portfolio."
Another area in which we have been remiss is in the appraisal interview that normally follows the review of a faculty member's personnel file or portfolio. These interviews can be a useful device for facilitating faculty development, but they often leave faculty members angry, defensive, and less motivated. Typically, the department head has had no training in carrying out such interviews.
Norman Maier's book The Appraisal Interview (Wiley, 1958) describes three styles of appraisal interviews: "tell and sell," "tell and listen," and "joint problem solving." Only the last seems to be generally effective. It is a tragedy to do a good job of collecting the evidence in an ethical fashion, to evaluate it fairly, but then to use it in ways that result in poorer, rather than better, teaching.
The evaluation of teaching can have important consequences for both students and teachers. Clearly, we are all fallible; we are not likely to achieve perfection; but we can do better, and we should. We have an ethical obligation to maximize the value of the time spent by students, faculty, and personnel committees.
Angelo, Thomas A., and K. Patricia Cross. Classroom Assessment Techniques, 2nd ed. San Francisco: Jossey-Bass, 1993.
Feldman, K.A. "Instructional Effectiveness of College Teachers as Judged by Teachers Themselves, Current and Former Students, Colleagues, Administrators, and External (Neutral) Observers." Research in Higher Education 30 (1989a): 137-194.
Feldman, K.A. "The Association Between Student Ratings of Specific Instructional Dimensions and Student Achievement: Refining and Extending the Synthesis of Data From Multisection Validity Studies." Research in Higher Education 30 (1989b): 583-645.
Marsh, H.W. "Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research." International Journal of Educational Research 11 (1987): 253-388.
McKeachie, W.J., Y.-G. Lin, M. Daugherty, M.M. Moffett, C. Neigler, J. Nork, M. Walz, and R. Baldwin. "Using Ratings and Consultation to Improve Instruction." British Journal of Educational Psychology 50 (1980): 168-174.
Redmond, Mark V., and D. Joseph Clark. "Small Group Instructional Diagnosis: A Practical Approach to Improving Teaching." AAHE Bulletin, February 1982, pp. 8-10.
Scriven, M. "Duties of the Teacher." Journal of Personnel Evaluation in Education 3 (1991): 151-184.