|Posted on January 16, 2019 at 7:45 AM|
A recent discussion on the ACCSHE listserv reminded me that setting meaningful benchmarks or standards for student learning assessments remains a real challenge. About three years ago, I wrote a blog post on setting benchmarks or standards for rubrics. Let’s revisit that and expand the concepts to assessments beyond rubrics.
The first challenge is vocabulary. I’ve seen references to goals, targets, benchmarks, standards, and thresholds. Unfortunately, the assessment community doesn’t yet have a standard glossary defining these terms (although some accreditors do). I now use standard to describe what constitutes minimally acceptable student performance (such as the passing score on a test) and target to describe the proportion of students we want to meet that standard. But my vocabulary may not match yours or your accreditor's!
The second challenge is embedded in that next-to-last sentence. We’re talking about two different numbers here: the standard describing minimally acceptable performance and the target describing the proportion of students achieving that performance level. That makes things even more confusing.
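The two-numbers distinction can be made concrete with a minimal sketch. Everything here is invented for illustration: the scores, the passing standard of 70 points, and the target of 90% are all assumptions, not values from any real assessment.

```python
# Hypothetical example distinguishing the two numbers:
# the *standard* is the minimum acceptable score for one student;
# the *target* is the proportion of students we want to meet that standard.
scores = [82, 74, 68, 91, 77, 59, 85, 70]  # made-up exam scores

standard = 70   # minimally acceptable performance for a single student
target = 0.90   # desired proportion of students meeting the standard

proportion_meeting = sum(s >= standard for s in scores) / len(scores)
print(f"{proportion_meeting:.0%} of students met the standard")
print("Target met" if proportion_meeting >= target else "Target not met")
```

With these made-up scores, 75% of students meet the standard, which falls short of the 90% target; changing either number changes the conclusion, which is exactly why the two must be set separately.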
So how do we establish meaningful standards? There are four basic approaches. The first three are:
1. External standards: Sometimes the standard is set for us by an external body, such as the passing score on a licensure exam.
2. Peers: Sometimes we want our students to do as well as or better than their peers.
3. Historical trends: Sometimes we want our students to do as well as or better than past students.
Much of the time none of these options is available to us, leaving us to set our own standard, what I call a local standard and what others call a competency-based or criterion-referenced standard. Here are the steps to setting a local standard:
Focus on what would not embarrass you. Would you be embarrassed if people found out that a student performing at this level passed your course or graduated from your program or institution? Then your standard is too low. What level do students need to reach to succeed at whatever comes next—more advanced study or a job?
Consider the relative harm in setting the standard too high or too low. A too-low standard means you’re risking passing or graduating students who aren’t ready for what comes next and that you’re not identifying problems with student learning that need attention. A too-high standard may mean you’re identifying shortcomings in student learning that may not be significant and possibly using scarce time and resources to address those relatively minor shortcomings.
When in doubt, set the standard relatively high rather than relatively low. Because every assessment is imperfect, you’re not going to get an accurate measure of student learning from any one assessment. Setting a relatively high bar increases the chance that every student is truly competent on the learning goals being assessed.
If you can, use external sources to help set standards. A business advisory board, faculty from other colleges, or a disciplinary association can all help get you out of the ivory tower and set defensible standards.
Consider the assignment being assessed. Essays completed in a 50-minute class are not going to be as polished as papers created through scaffolded steps throughout the semester.
Use samples of student work to inform your thinking. Discuss with your colleagues which seem unacceptably poor, which seem adequate though not stellar, and which seem outstanding, then discuss why.
If you are using a rubric to assess student learning, the standard you’re setting is the rubric column (performance level) that defines minimally acceptable work. This is the most important column in the rubric and, not coincidentally, the hardest one to complete. After all, you’re defining the borderline between passing and failing work. Ideally, you should complete this column first, then complete the remaining columns.
Now let’s turn from setting standards to setting targets for the proportions of students who achieve those standards. Here the challenge is that we have two kinds of learning goals. Some are essential. We want every college graduate to write a coherent, grammatically correct paragraph, for example. I don’t want my tax returns prepared by an accountant who can complete them correctly only 70% of the time, and I don’t want my prescriptions filled by a pharmacist who can fill them correctly only 70% of the time! For these essential goals, we want close to 100% of students meeting our standard.
Then there are aspirational goals, which not everyone need achieve. We may want college graduates to be good public speakers, for example, but in many cases graduates can lead successful lives even if they’re not. For these kinds of goals, a lower target may be appropriate.
Tests and rubrics often assess a combination of essential and aspirational goals, which suggests that overall test or rubric scores often aren’t very helpful in understanding student learning. Scores for each rubric trait or for each learning objective in the test blueprint are often much more useful.
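Here is one way to sketch why trait-level results are more useful than an overall score. The trait names, the 1–3 scale, and the scores below are all invented for illustration; the point is only that reporting each trait separately lets an essential trait be judged against a near-100% target while an aspirational trait gets a lower one.

```python
# Made-up per-trait rubric scores for five students
# (1 = below standard, 2 = meets standard, 3 = exceeds standard).
students = [
    {"grammar": 2, "creativity": 1},
    {"grammar": 3, "creativity": 2},
    {"grammar": 2, "creativity": 1},
    {"grammar": 2, "creativity": 3},
    {"grammar": 1, "creativity": 2},
]
standard = 2  # rubric level defining minimally acceptable work on each trait

# Trait-level reporting: essential traits (grammar) and aspirational
# traits (creativity) can now be compared against different targets.
for trait in ("grammar", "creativity"):
    meeting = sum(s[trait] >= standard for s in students) / len(students)
    print(f"{trait}: {meeting:.0%} met the standard")
```

An overall average of these scores would blur the two traits together; the per-trait breakdown shows where an essential goal is falling short of its near-100% target even when the aspirational goal is doing acceptably.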
Bottom line here: I have a real problem with people who say their standard or target is 70%. It’s inevitably an arbitrary number with no real rationale. Setting meaningful standards and targets is time-consuming, but I can think of few tasks that are more important, because they’re what help ensure that students truly learn what we want them to…and that’s what we’re all about.
By the way, my thinking here comes primarily from two sources: Setting Performance Standards by Cizek and a review of the literature that I did a couple of years ago for a chapter on rubric development that I contributed to the Handbook on Measurement, Assessment, and Evaluation in Higher Education (https://www.amazon.com/Handbook-Measurement-Assessment-Evaluation-Education/dp/1138892157). For a more thorough discussion of the ideas here, see Chapter 22 (Setting Meaningful Standards and Targets) in the new 3rd edition of my book Assessing Student Learning: A Common Sense Guide.
|Posted on February 28, 2016 at 7:30 AM|
In a recent ASSESS listserv, Jocelyn Shadforth asked for opinions on the value of setting targets for collective performance. She sees the value of clearly defining what constitutes acceptable performance and reporting the percentage of students who achieve that performance level. But she wondered about the merit of setting a target for what that proportion should be.
I think Jocelyn’s priorities are right. As I suggested in my October 16, 2015, blog on two simple steps to better assessment, we need rubrics whose performance levels clearly define acceptable and unacceptable performance, rather than using subjective terms such as “Good” or “Fair.” If all our rubrics were designed this way, we’d move light years ahead in terms of generating truly useful, actionable assessment results.
But, as I suggested in my June 6, 2015, blog on overcoming barriers to using assessment results, I do think we need a clear sense of what results are satisfactory and what results aren’t. Let’s say 75% of your students meet your standard for acceptable performance. Do you have a problem or not? Without a target, if less than 100% of students meet the standard, one could conclude there’s always room for improvement. And if every student meets the standard, one could conclude that the standard was too low. Either way, you have more work ahead of you—a demoralizing prospect!
I think faculty deserve to set a target that will let them say, at least sometimes, “We’re not perfect, but we’ve done our job, and we’ve done it pretty darned well, and all we need to do from this point forward is stay the course.” And I'd like them to celebrate that decision with a pizza party!
As I suggested in my March 23, 2015, blog on setting meaningful benchmarks or standards, the target is going to vary depending on what you’re assessing. I’ve often said that I want my taxes done by an accountant who does them correctly 100% of the time. And I want to be confident that if I hire an accountant who graduated from your college, he or she can indeed do them correctly, not be one of the 30% or 25% or 20% who doesn't always do them correctly. But if one of your gen ed goals is to develop students’ creative thinking skills, you might be satisfied if half or three-quarters of your students achieve your target.
What I suggested in my March 23 blog is that you first define the performance level that would embarrass you if a student passed your course, or completed your gen ed requirement, or graduated with that competency level. For example, you would be embarrassed by a college graduate who can’t put together a coherent paragraph. You really should have a target that close to 100% of students (at least those who pass) surpass that level…but maybe not exactly 100%, since there will always be the occasional student who misunderstands the assignment, is sick while completing the assignment, etc.
And then I’ve never met a faculty member who would be satisfied if every student performed at a minimally adequate level…and none did any better. So it may be worthwhile to set a second target for the proportion of students for whom we’d like to see exemplary work.
One last key from my March 23 blog to setting meaningful targets: don’t pull targets out of a hat, but use data to inform your thinking. Maybe hold off on setting targets until you use your rubric once and see what the results are. Or, if you’re using a published instrument like NSSE, look at the national averages for your peers and decide if you want to be, say, above the average of your peers on certain criteria.
Bottom line: Yes, I think targets are an extremely important part of meaningful assessment, and they can actually save assessment work in the long run, but only if they’re established thoughtfully.
|Posted on October 16, 2015 at 7:45 AM|
I recently came across two ideas that struck me as simple solutions to an ongoing frustration I have with many rubrics: too often they don't make clear, in compelling terms, what constitutes minimally acceptable performance. This is a big issue, because you need to know whether or not student work is adequate before you can decide what improvements in teaching and learning are called for. And your standards need to be defensibly rigorous, or you run the risk of passing through and graduating students unprepared for whatever comes next in their lives.
My first "aha!" insight came from a LinkedIn post by Clint Schmidt. Talking about ensuring the quality of coding "bootcamps," he suggests, "set up a review board of unbiased experienced developers to review the project portfolios of bootcamp grads."
This basic idea could be applied to almost any program. Put together a panel of the people who will be dealing with your students after they pass your course, after they complete your gen ed requirements, or after they graduate. For many programs, including many in the liberal arts, this might mean workplace supervisors from the kinds of places where your graduates typically find jobs after graduation. For other programs, this might mean faculty in the bachelor's or graduate programs your students move into. The panels would not necessarily need to review full portfolios; they might review samples of senior capstone projects or observe student presentations or demonstrations.
The cool thing about this approach is that many programs are already doing this. Internship, practicum, and clinical supervisors, local artists who visit senior art exhibitions, local musicians who attend senior recitals--they are all doing a variation of Schmidt's idea. The problem, however, is that often the rating scales they're asked to complete are so vaguely defined that it's unclear which rating constitutes what they consider minimally acceptable performance.
And that's where my second "aha!" insight comes into play. It's from a ten-year-old rubric developed by Andi Curcio to assess a civil complaint assignment in a law school class. (Go to lawteaching.org/teaching/assessment/rubrics/, then scroll down to Civil Complaint: Rubric (Curcio) to download the PDF.) Her rubric has three columns with typical labels (Exemplary, Competent, Developing), but each label goes further.
- "Exemplary" is "advanced work at this time in the course - on a job the work would need very little revision for a supervising attorney to use."
- "Competent" is "proficient work at this time in the course - on a job the work would need to be revised with input from supervising attorney."
- And "Developing" is "work needs additional content or skills to be competent - on a job, the work would not be helpful and the supervising attorney would need to start over."
Andi's simple column labels make two things clear: what is considered adequate work at this point in the program, and how student performance measures up to what employers will eventually be looking for.
If we can craft rubrics that define clearly the minimal level that students need to reach to succeed in their next course, their next degree, their next job, or whatever else happens next in their lives, and bring in the people who actually work with our students at those points to help assess student work, we will go a long way toward making assessment even more meaningful and useful.
|Posted on March 23, 2015 at 7:50 AM|
One of the biggest barriers to understanding and using assessment results is figuring out whether or not the results are good enough by setting appropriate benchmarks or standards. Most faculty and administrators don’t have any idea how to do this and end up pulling numbers out of a hat!
Here are seven steps to setting good-quality benchmarks or standards for rubrics:
1. Know how the assessment results will be used: who will use them and what decisions the results will inform. If the purpose is to fund things that need improvement (something many accreditors want to see), you’ll want to set a relatively high bar so you identify all potential areas for improvement. If the purpose is to maintain the status quo, you’ll want to set a relatively low bar so your students appear to be successful.
2. Clarify the potential harm of setting the bar too high or too low. If the bar is set too high, you may identify too many problems and spread yourselves too thin trying to address them all. If the bar is set too low, you increase the risk of graduating incompetent students.
3. Bring in external information to inform your discussions. Disciplinary and professional standards, employers and alumni, peer programs and colleges, faculty teaching more advanced courses—any of these help you develop justifiable benchmarks.
4. Have a clear rubric, with clear descriptions of performance in every box. The fuzzier your rubric, the harder it is to set meaningful benchmarks.
5. Look at the assignment that the rubric is evaluating, as well as samples of student work, to inform your thinking. Students’ organization and grammar will likely be weaker on an essay exam question completed in a short amount of class time than on a research paper subject to multiple revisions.
6. For each rubric criterion, identify the performance level that represents a minimally competent student: one whose performance would not embarrass you. Setting standards or benchmarks is inherently a value judgment, so a group of people should do this—the more the merrier—by voting and going with the majority. Not all rubric criteria will have the same benchmark: basic or essential competencies like grammar may have a higher benchmark than “aspirational competencies” like creative thinking.
7. Ground your benchmarks with data…but after that first vote. If you have assessment results in hand, or results from peer colleges or programs, share them and let everyone discuss and perhaps vote again. If your students’ performance is far below your benchmark, think about setting short-term targets to move your students toward that long-term benchmark.
If you’d like to learn more, I’ll be talking about these seven steps at the Higher Learning Commission’s Annual Conference. My session, “How Good is Good Enough? Principles for Setting Benchmarks,” is on Sunday, March 29, at 1:00 p.m.