|Posted on January 28, 2018 at 7:25 AM||comments (0)|
A couple of years ago I did a literature review on rubrics and learned that there’s no consensus on what a rubric is. Some experts define rubrics very narrowly, as only analytic rubrics: the kind formatted as a grid, listing traits down the left side and performance levels across the top, with the boxes filled in. But others define rubrics more broadly, as written guides for evaluating student work that, at a minimum, list the traits you’re looking for.
But what about something like the following, which I’ve seen on plenty of assignments?
70% Responds fully to the assignment (length of paper, double-spaced, typed, covers all appropriate developmental stages)
15% Grammar (including spelling, verb conjugation, structure, agreement, voice consistency, etc.)
Under the broad definition of a rubric, yes, this is a rubric. It is a written guide for evaluating student work, and it lists the three traits the faculty member is looking for.
The problem is that it isn’t a good rubric. Effective assessments, including rubrics, have the following traits:
Effective assessments yield information that is useful and used. Students who earn less than 70 points for responding to the assignment have no idea where they fell short. Those who earn less than 15 points on organization have no idea why. If the professor wants to help the next class do better on organization, there’s no insight here on where this class’s organization fell short and what most needs to be improved.
Effective assessments focus on important learning goals. You wouldn’t know it from the grading criteria, but this was supposed to be an assignment on critical thinking. Students focus their time and mental energies on what they’ll be graded on, so these students will focus on following directions for the assignment, not developing their critical thinking skills. Yes, following directions is an important skill, but critical thinking is even more important.
Effective assessments are clear. Students have no idea what this professor considers an excellently organized paper, what’s considered an adequately organized paper, and what’s considered a poorly organized paper.
Effective assessments are fair. Here, because there are only three broad, ill-defined traits, the faculty member can be (unintentionally) inconsistent in grading the papers. How many points are taken off for an otherwise fine paper that’s littered with typos? For one that isn’t double-spaced?
So the debate about an assessment should be not whether it is a rubric but rather how well it meets these four traits of effective assessment practices.
If you’d like to read more about rubrics and effective assessment practices, the third edition of my book Assessing Student Learning: A Common Sense Guide will be released on February 13 and can be pre-ordered now. The Kindle version is already available through Amazon.
|Posted on August 18, 2016 at 12:40 AM||comments (6)|
Over the last couple of years, I’ve started to get some gentle pushback from faculty on rubrics, especially those teaching graduate students. Their concern is whether rubrics might provide too much guidance, serving as a crutch when students should be figuring out things on their own. One recent question from a faculty member expressed the issue well: “If we provide students with clear rubrics for everything, what happens when they hit the work place and can’t figure out on their own what to do and how to do it without supervisor hand-holding?”
It’s a valid point, one that ties into the lifelong learning outcome that many of us have for our students: we want to prepare them to self-evaluate and self-correct their work. I can think of two ways we can help students develop this capacity without abandoning rubrics entirely. One possibility would be to make rubrics less explicit as students progress through their program. First-year students need a clear explanation of what you consider good organization of a paper; seniors and grad students shouldn’t. The other possibility—which I like better—would be to have students develop their own rubrics, either individually or in groups, subject, of course, to the professor’s review.
In either case, it’s a good idea to encourage students to self-assess their work by completing the rubric themselves—and/or have a peer review the assignment and complete the rubric—before turning it in. This can help get students in the habit of self-appraising their work and taking responsibility for its quality before they hit the workplace.
Do you have any other thoughts or ideas about this? Let me know!
|Posted on April 3, 2016 at 6:50 AM||comments (2)|
Last fall I drafted a chapter, “Rubric Development,” for the forthcoming second edition of the Handbook on Measurement, Assessment, and Evaluation in Higher Education. My literature review for the chapter was an eye-opener! I’ve been joking that everything I had been saying about rubrics was wrong. Not quite, of course!
One of the many things I learned is that what rubrics assess varies according to the decisions they inform, falling on a continuum from narrow to broad uses.
Task-specific rubrics, at the narrow end, are used to assess or grade one assignment, such as an exam question. They are so specific that they apply only to that one assignment. Because their specificity may give away the correct response, they cannot be shared with students in advance.
Primary trait scoring guides or primary trait analysis are used to assess a family of tasks rather than one specific task. Primary trait analysis recognizes that the essential or primary traits or characteristics of a successful outcome such as writing vary by type of assignment. The most important writing traits of a science lab report, for example, are different from those of a persuasive essay. Primary trait scoring guides focus attention on only those traits of a particular task that are relevant to the task.
General rubrics are used with a variety of assignments. They list traits that are generic to a learning outcome and are thus independent of topic, purpose, or audience.
Developmental rubrics or meta-rubrics are used to show growth or progression over time. They are general rubrics whose performance levels cover a wide span of performance. The VALUE rubrics are examples of developmental rubrics.
The lightbulb that came on for me as I read about this continuum is that rubrics toward the middle of the continuum may be more useful than those at either end. Susan Brookhart has written powerfully about avoiding task-specific rubrics: “If the rubrics are the same each time a student does the same kind of work, the student will learn general qualities of good essay writing, problem solving, and so on… The general approach encourages students to think about building up general knowledge and skills rather than thinking about school learning in terms of getting individual assignments done.”
At the other end of the spectrum, developmental rubrics have a necessary lack of precision that can make them difficult to interpret and act upon. In particular, they’re inappropriate to assess student growth in any one course.
Overall, I’ve concluded that one institution-wide developmental rubric may not be the best way to assess student learning, even of generic skills such as writing or critical thinking. As Barbara Walvoord has noted, “You do not need institution-wide rubric scores to satisfy accreditors or to get actionable information about student writing institution-wide.” Instead of using one institution-wide developmental rubric to assess student work, I’m now advocating using that rubric as a framework from which to build a family of related analytic rubrics: some for first year work, some for senior capstones, some for disciplines or families of disciplines such as the natural sciences, engineering, and humanities. Results from all these rubrics are aggregated qualitatively rather than quantitatively, by looking for patterns across rubrics. Yes, this approach is a little messier than using just one rubric, but it’s a whole lot more meaningful.
|Posted on November 14, 2015 at 8:15 AM||comments (0)|
It’s actually impossible to determine whether any rubric, in isolation, is valid. Its validity depends on how it is used. What may look like a perfectly good rubric to assess critical thinking is invalid, for example, if used to assess assignments that ask only for descriptions. A rubric assessing writing mechanics is invalid for drawing conclusions about students’ critical thinking skills. A rubric assessing research skills is invalid if used to assess essays that students are given only 20 minutes to write.
A rubric is thus valid only if the entire assessment process—including the assignment given to students, the circumstances under which students complete the assignment, the rubric, the scoring procedure, and the use of the findings—is valid. Valid rubric assessment processes have seven characteristics. How well do your rubric assessment processes stack up?
Usability of the results. They yield results that can be and are used to make meaningful, substantive decisions to improve teaching and learning.
Match with intended learning outcomes. They use assignments and rubrics that systematically address meaningful intended learning outcomes.
Clarity. They use assignments and rubrics written in clear and observable terms, so they can be applied and interpreted consistently and equitably.
Fairness. They enable inferences that are meaningful, appropriate, and fair to all relevant subgroups of students.
Consistency. They yield consistent or reliable results, a characteristic that is affected by the clarity of the rubric’s traits and descriptions, the training of those who use it, and the degree of detail provided to students in the assignment.
Appropriate range of outcome levels. The rubrics’ “floors” and “ceilings” are appropriate to the students being assessed.
Generalizability. They enable you to draw overall conclusions about student achievement. The problem here is that any single assignment may not be a representative, generalizable sample of what students have learned. Any one essay question, for example, may elicit an unusually good or poor sample of a student’s writing skill. Increasing the quantity and variety of student work that is assessed, perhaps through portfolios, increases the generalizability of the findings.
Sources for these ideas are cited in my chapter, “Rubric Development,” in the forthcoming second edition of the Handbook on Measurement, Assessment, and Evaluation in Higher Education to be published by Taylor & Francis.
|Posted on November 2, 2015 at 6:55 AM||comments (0)|
I’ve finished a draft of my chapter, “Rubric Development,” for the forthcoming second edition of the Handbook on Measurement, Assessment, and Evaluation in Higher Education. Of course the chapter had to explain what a rubric is as well as how to develop one. My research quickly showed that there’s no agreement on what a rubric is! There are at least five formats for guides to score or evaluate student work, but there is no consensus on which of the formats should be called a rubric.
The simplest format is a checklist: a list of elements present in student work. It is used when elements are judged to be either present or not; it does not assess the frequency or quality of those items.
Then comes a rating scale: a list of traits or criteria for student work accompanied by a rating scale marking the frequency or quality of each trait. Here we start to see disagreements on vocabulary; I’ve seen rating scales called minimal rubrics, performance lists, expanded checklists, assessment lists, or relative rubrics.
Then comes the analytic rubric, which fills in the rating scale’s boxes with clear descriptions of each level of performance for each trait or criterion. Here again there’s disagreement on vocabulary; I’ve seen analytic rubrics called analytical rubrics, full rubrics or descriptive rubrics.
Then there is the holistic rubric, which describes how to make an overall judgment about the quality of work through narrative descriptions of the characteristics of work at each performance level. These are sometimes called holistic scoring guides.
Finally, there’s what I’ve called a structured observation guide: a rubric without a rating scale that lists traits with spaces for comments on each trait.
So what is a rubric? Opinions fall into three camps.
The first camp defines rubrics broadly and flexibly as guides for evaluating student work. This camp would consider all five formats to be rubrics.
The second camp defines rubrics as providing not just traits but also standards or levels of quality along a continuum. This camp would consider rating scales, analytic rubrics, and holistic rubrics to be rubrics.
The third camp defines rubrics narrowly as only those scoring guides that include traits, a continuum of performance levels, and descriptions of each trait at each performance level. This camp would consider only analytic rubrics and holistic rubrics to be rubrics.
I suspect that in another 20 years or so we’ll have a common vocabulary for assessment but, in the meanwhile, if you and your colleagues disagree on what a rubric is, take comfort in knowing that you’re not alone!
|Posted on October 16, 2015 at 7:45 AM||comments (0)|
I recently came across two ideas that struck me as simple solutions to an ongoing frustration I have with many rubrics: too often they don't make clear, in compelling terms, what constitutes minimally acceptable performance. This is a big issue, because you need to know whether or not student work is adequate before you can decide what improvements in teaching and learning are called for. And your standards need to be defensibly rigorous, or you run the risk of passing through and graduating students unprepared for whatever comes next in their lives.
My first "aha!" insight came from a LinkedIn post by Clint Schmidt. Talking about ensuring the quality of coding "bootcamps," he suggests, "set up a review board of unbiased experienced developers to review the project portfolios of bootcamp grads."
This basic idea could be applied to almost any program. Put together a panel of the people who will be dealing with your students after they pass your course, after they complete your gen ed requirements, or after they graduate. For many programs, including many in the liberal arts, this might mean workplace supervisors from the kinds of places where your graduates typically find jobs. For other programs, this might mean faculty in the bachelor's or graduate programs your students move into. The panels would not necessarily need to review full portfolios; they might review samples of senior capstone projects or observe student presentations or demonstrations.
The cool thing about this approach is that many programs are already doing this. Internship, practicum, and clinical supervisors, local artists who visit senior art exhibitions, local musicians who attend senior recitals--they are all doing a variation of Schmidt's idea. The problem, however, is that often the rating scales they're asked to complete are so vaguely defined that it's unclear which rating constitutes what they consider minimally acceptable performance.
And that's where my second "aha!" insight comes into play. It's from a ten-year-old rubric developed by Andi Curcio to assess a civil complaint assignment in a law school class. (Go to lawteaching.org/teaching/assessment/rubrics/, then scroll down to Civil Complaint: Rubric (Curcio) to download the PDF.) Her rubric has three columns with typical labels (Exemplary, Competent, Developing), but each label goes further.
- "Exemplary" is "advanced work at this time in the course - on a job the work would need very little revision for a supervising attorney to use."
- "Competent" is "proficient work at this time in the course - on a job the work would need to be revised with input from supervising attorney."
- And "Developing" is "work needs additional content or skills to be competent - on a job, the work would not be helpful and the supervising attorney would need to start over."
Andi's simple column labels make two things clear: what is considered adequate work at this point in the program, and how student performance measures up to what employers will eventually be looking for.
If we can craft rubrics that define clearly the minimal level that students need to reach to succeed in their next course, their next degree, their next job, or whatever else happens next in their lives, and bring in the people who actually work with our students at those points to help assess student work, we will go a long way toward making assessment even more meaningful and useful.
|Posted on August 9, 2015 at 7:30 AM||comments (1)|
My graduate courses in educational measurement in the 1970s taught us to grade or score student papers by literally sorting them into piles—an incredibly primitive approach compared to today’s rubrics. I’ve always wondered where rubrics came from, and this summer I did some research and found out.
The grandfather of rubrics was Paul Diederich at Educational Testing Service. In 1961, he and two colleagues conducted a factor analysis of comments made by readers on thousands of papers that they had sorted into piles, as I was taught, without any guidance other than their own preferences. (Back in those days, before modern computers, doing a factor analysis was so difficult and complex that it was sometimes accepted as a doctoral dissertation!) Diederich and his colleagues identified five factors related to those sorting decisions:
• Ideas: relevance, clarity, quantity, development, persuasiveness
• Form: organization and analysis
• Flavor: style, interest, sincerity
• Mechanics: specific errors in grammar, punctuation, etc.
• Wording: choice and arrangement of words
Diederich and his colleagues also found that only the last two factors—mechanics and wording—correlated with scores on the writing tests of the day. So clearly traditional writing tests were inadequate to assess overall writing skill and a new approach was needed.
By 1974, Diederich evolved these factors into a simple “rating slip” for evaluating writing. The rating slip was a simple five-point rating scale with eight criteria organized into two categories:
• General merit
Ideas and organization were given double the weight of the other criteria. Two years of testing led to descriptions for High, Middle, and Low ratings for each criterion. Thus Diederich pioneered what we now call an analytic rubric.
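To make the weighting concrete, here is a minimal Python sketch of how a rating slip of this kind might be totaled. Only the five-point scale, the eight criteria, and the double weight for ideas and organization come from the description above; the function name and the placeholder criterion names (`criterion_3` through `criterion_8`) are assumptions for illustration, not Diederich's wording.

```python
# Hypothetical sketch: totaling a five-point rating slip with eight criteria,
# where ideas and organization count double. Criterion names other than
# "ideas" and "organization" are placeholders, not Diederich's own labels.

DOUBLE_WEIGHTED = {"ideas", "organization"}


def rating_slip_total(ratings):
    """Return (total, maximum) for a dict mapping criterion name -> rating 1..5."""
    total = 0
    maximum = 0
    for criterion, rating in ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"{criterion}: rating must be between 1 and 5")
        weight = 2 if criterion in DOUBLE_WEIGHTED else 1
        total += weight * rating
        maximum += weight * 5
    return total, maximum


example = {
    "ideas": 4,
    "organization": 5,
    "criterion_3": 3,
    "criterion_4": 4,
    "criterion_5": 3,
    "criterion_6": 4,
    "criterion_7": 5,
    "criterion_8": 3,
}

print(rating_slip_total(example))  # (40, 50)
```

Under these assumptions the slip tops out at 50 points, and ideas and organization together account for 20 of them.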
Two of the first people to use the term “rubric” were Charles Cooper at what is now called the University at Buffalo and Richard Lloyd-Jones at the University of Iowa, both of whom contributed chapters to a 1977 monograph called Evaluating Writing: Describing, Measuring, Judging published by the National Council of Teachers of English. Lloyd-Jones used the word “rubric” to describe a five-point scoring guide that today we would call a holistic rubric.
Cooper advocated evaluating writing with a “scoring guide which describes each feature and identifies high, middle, and low quality levels for each feature”—in other words, a rubric. Cooper chose to call this a holistic evaluation, because it considers all the attributes of effective writing, as opposed to the writing tests of his day that focused on isolated components such as vocabulary or sentence length. Cooper enumerated seven types of holistic assessment, most of which are probably rarely if ever used today. But one—the analytical scale—resembles today’s analytic rubric: “a list of the prominent features or characteristics of writing in a particular mode. The list of features ordinarily ranges from four to ten or twelve, with each feature described in some detail and with high-mid-low points identified and described along a scoring line for each feature.”
Ironically, Cooper used the word “rubric” for something different from his scoring guide. He mentioned that Advanced Placement raters followed a “rubric” worked out in advance, but the rubric was concerned mainly with the relevance and content of the writing sample, not its general writing features.
Also in the mid-1970s, “Standardized Developmental Ratings” were developed for use by raters of the New York State Regents Exam in Writing, but I haven’t yet unearthed the details.
Diederich’s work, incidentally, led directly to the 6+1 Trait Writing rubrics developed in the 1980s by teachers working with the Northwest Regional Educational Laboratory (now Education Northwest). The rubrics provide separate scores for:
• Word choice
• Sentence fluency
• (optionally) Presentation.
The 6+1 Trait Writing rubrics are widely used today to assess student writing at the K-12 level. Susan Brookhart has cited research demonstrating that using the rubrics has significantly increased student writing skills.
Today rubrics—and the term “rubric”—pervade higher education. Research at NILOA (National Institute for Learning Outcomes Assessment) has found that U.S. provosts identify rubrics as one of the top three most valuable or important approaches for assessing undergraduate learning outcomes, and over 60% of U.S. institutions of higher education use rubrics for institution-wide assessments and at least some academic program assessments.
We’ve come a long way from sorting papers into piles!
|Posted on February 5, 2015 at 7:55 AM||comments (0)|
A recent study by Hart Research Associates for the Association of American Colleges & Universities found, among many other things, that only about a quarter of employers are satisfied with the creative and innovative skills of recent college graduates. Why are college graduates so dismal in this respect? Throughout their education, from grade school through college, in most classes, the way to get a good grade is to do what the teacher says: read this assignment, do this homework, write a paper on this topic with these sections, develop a class presentation with these elements. Faculty who teach general education courses in the creative arts—art, theater, creative writing, even graphic design—have told me that students hate taking those courses because they have no experience in “thinking outside the box.”
How can we encourage creativity and innovative thinking? Simply building it into our grading expectations can help. The first time I used a rubric, many, many years ago, I gave it to my class with their assignment, and the papers I received were competent but flat and uninspired. I had to give the best papers A’s because that was what the rubric indicated they should earn, but I was disappointed.
The next time I taught the course, I changed the rubric so that all the previous elements earned only 89 points. The remaining points were for a fairly vague category I labeled “Creative or innovative ideas or insight.” Problem solved! The A papers were exactly what I was hoping for.
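The arithmetic behind that change can be sketched in a few lines of Python. This is only an illustration: the 100-point total and the 90-point cutoff for an A are assumptions (the post does not state the grading scale), and the function names are hypothetical.

```python
# Hypothetical sketch of the revised point allocation: the original rubric
# elements are capped at 89 points, and the remaining points are reserved for
# "Creative or innovative ideas or insight". TOTAL_POINTS and A_CUTOFF are
# assumptions for illustration, not values stated in the post.

TOTAL_POINTS = 100                          # assumed total
BASE_MAX = 89                               # all previous rubric elements
CREATIVITY_MAX = TOTAL_POINTS - BASE_MAX    # 11 points
A_CUTOFF = 90                               # assumed cutoff for an A


def total_score(base_points, creativity_points):
    """Add the two parts of the rubric after checking their ranges."""
    if not 0 <= base_points <= BASE_MAX:
        raise ValueError("base points out of range")
    if not 0 <= creativity_points <= CREATIVITY_MAX:
        raise ValueError("creativity points out of range")
    return base_points + creativity_points


def earns_a(base_points, creativity_points):
    """True if the combined score reaches the assumed A cutoff."""
    return total_score(base_points, creativity_points) >= A_CUTOFF


print(earns_a(89, 0))  # False: competent on every element, but no creativity
print(earns_a(89, 1))  # True
```

With these assumed numbers, a paper that is flawless on every original element but shows no creative insight cannot reach an A.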
Now this was a graduate course, and just putting something on a rubric won’t be enough to help many first-year students. This is where collaborative learning comes into play. Put students into small groups with a provocative, inspiring question for them to discuss, and watch the ideas start to fly.
|Posted on August 14, 2014 at 6:15 AM||comments (0)|
In my July 30 blog post, I discussed the key findings of a study on rubric validity reported in the June/July 2014 issue of Educational Researcher. In addition to the study’s major findings, a short statement on how the rubric under study was developed caught my attention:
“A team…developed the new rubric based on a qualitative analysis of approximately 100 exemplars. The team compared the exemplars to identify and articulate observed, qualitative differences…”
I wish the authors had fleshed this out a bit more, but here’s my take on how the rubric was developed. The process began, not with the team brainstorming rubric criteria, but by looking at a sample of 100 student papers. I’d guess that team members simply took notes on each paper: What in each paper struck them as excellent? Mediocre but acceptable? Unacceptably poor? Then they probably compiled all the notes and looked through them for themes. From these themes came the rubric criteria and the performance levels for each criterion…which, as I explained in my July blog post, varied in number.
I’ve often advised faculty to take a similar approach. Don’t begin the work of developing a rubric with an abstract brainstorming session or by looking at someone else’s rubric. Start by reviewing a sample of student work. You don’t need to look at 100 papers—just pick one paper, project or performance that is clearly outstanding, one that is clearly unacceptable, and some that are in between. Take notes on what is good and not-so-good about each and why you think they fall into those categories. Then compile the notes and talk. At that point—once you have some basic ideas of your rubric criteria and performance levels of each criterion—you may want to consider looking at other rubrics to refine your thinking (“Yes, that rubric has a really good way of stating what we’re thinking!”).
Bottom line: A rubric that assesses what you and your colleagues truly value will be more valid, useful, and worthwhile.
|Posted on July 30, 2014 at 6:45 AM||comments (0)|
A study reported in the latest issue of Educational Researcher promises to rock the world of rubrics.
While I’ve always advocated for flexibility in designing rubrics, one particular format is widely regarded as the gold standard for rubric design. It’s what I call a descriptive rubric and what others call an analytic rubric. It’s a table or matrix, with the rubric criteria listed in a column on the left and student performance levels listed across the top. Then each box in the table has a short description of student performance for that criterion at that level. Probably the best known examples of descriptive or analytic rubrics are the VALUE rubrics published by the Association of American Colleges & Universities.
The study’s authors challenge this format, specifically the idea that every criterion in a rubric must have the same number of performance levels. A team used observed, qualitative differences in student writing to reconsider a writing rubric. The new rubric had, for example, seven performance levels for vocabulary but just three for paragraphing. Breaking away from the table/matrix rubric format helped raters make more independent judgments on each criterion…and thereby arrive at more valid judgments of student writing performance.
I’ve always been uncomfortable forcing faculty to come up with a set number of student performance levels, whether 3, 4, 5, or more. This study confirms my uneasiness.
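For anyone who keeps rubrics in a spreadsheet or a script, the design described above could be represented as a structure in which each criterion carries its own list of performance levels instead of sharing one set of columns. The following Python sketch is hypothetical: only the level counts (seven for vocabulary, three for paragraphing) come from the study as summarized here, and the level labels and function name are placeholders.

```python
# Hypothetical sketch: a rubric in which each criterion has its own number of
# performance levels. Only the counts (seven for vocabulary, three for
# paragraphing) come from the study as summarized above; the level labels
# are placeholders.

rubric = {
    "vocabulary": [f"level {i}" for i in range(1, 8)],    # seven levels
    "paragraphing": [f"level {i}" for i in range(1, 4)],  # three levels
}


def validate_ratings(rubric, ratings):
    """Check each rating against the range of levels for its own criterion."""
    for criterion, rating in ratings.items():
        n_levels = len(rubric[criterion])
        if not 1 <= rating <= n_levels:
            raise ValueError(
                f"{criterion}: rating {rating} is outside 1..{n_levels}"
            )
    return True


print({name: len(levels) for name, levels in rubric.items()})
# {'vocabulary': 7, 'paragraphing': 3}
```

Because each criterion is checked against its own range, a rating of 4 is valid for vocabulary but rejected for paragraphing.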
The citation for the study is:
Humphry, S. M., & Heldsinger, S. A. (2014, June/July). Common structural design features of rubrics may represent a threat to validity. Educational Researcher, 43(5), 253-263. DOI: 10.3102/0013189X14542154