Posted on March 18, 2017 at 8:25 AM
My last blog post on analyzing multiple choice test results generated a good bit of feedback, mostly on the ASSESS listserv. Joan Hawthorne and a couple of other colleagues thoughtfully challenged my “50% rule”: the idea that any question more than 50% of your students get wrong may signal a problem and should be reviewed carefully.
Joan pointed out that my 50% rule shouldn’t be applied to tests of material so important that students should score close to 100%. She’s absolutely right. Some things we teach, such as healthcare and safety, are so important that if students don’t learn them well, people could die. If you’re teaching and assessing must-know skills and concepts, you might want to look twice at any test item that more than 10% to 15% of students got wrong.
With other tests, how hard the test should be depends on its purpose. I was taught in grad school that the purpose of some tests is to separate the top students from the bottom: to distinguish which students should earn an A, B, C, D, or F. If you want to maximize the spread of test scores, an average item difficulty of 50% is your best bet; in theory, you should then get test scores ranging all the way from 0 to 100%. Likewise, if you want each test item to do the best possible job of discriminating between top and bottom students, you’d again aim for 50% difficulty.
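The item-level arithmetic behind the 50% target can be sketched as follows, assuming each item is scored simply right (1) or wrong (0). Let p be the proportion of students who answer an item correctly (so the difficulty as defined in this post, the percent answering incorrectly, is 1 − p). The item’s score variance is p(1 − p), and with n students the same product, times n², counts the pairs of students (one right, one wrong) that the item can tell apart. Both peak at p = 0.5. This is a sketch of the standard result, not a full account: the spread of total test scores also depends on how the items correlate with one another.

```latex
\sigma^2_{\text{item}} = p(1-p), \qquad
\text{pairs separated} = (pn)\,\big((1-p)n\big) = p(1-p)\,n^2,
\qquad
\frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2},
\quad \sigma^2_{\max} = \tfrac{1}{4}
```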
But in the real world I’ve never seen a good test with an overall 50% difficulty, and there are several good reasons why.
1. Difficult test questions are incredibly hard to write. Most college students want to get a good grade and will at least try to study for your test, so it’s very hard to come up with a question that assesses an important objective yet that half of them will get wrong. Most difficult items I’ve seen either focus on minutiae, are “trick” questions on some nuanced point, or test logical reasoning skill more than course learning objectives. In my whole life I’ve written maybe two or three difficult multiple choice questions that I’ve been proud of: ones that truly focused on important learning outcomes and didn’t require a careful, nuanced reading or logical reasoning skills. In my consulting work, I’ve seen no more than half a dozen difficult but effective items written by others. This experience is what led me to suggest the “50% rule.”
2. Difficult tests are demoralizing to students, even if you “curve” the scores and even if they know in advance that the test will be difficult.
3. Difficult tests are seldom appropriate, because a test’s sole or major purpose is rarely to maximize the spread of scores. Many tests have dual purposes. There are certain fundamental learning objectives we want to make sure (almost) every student has learned, or they’re going to run into problems later on. Then there are more challenging learning objectives that only the A or maybe B students will achieve, and test items on those objectives will separate the A students from the B students, and so on.
So, while I have great respect for those who disagree with me, I stand by my suggestion in my last blog post. Compare each item’s actual difficulty (the percent of students who answered incorrectly) against how difficult you wanted that item to be, and carefully evaluate any items that more than 50% of your students got wrong.
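For anyone scoring tests by hand or in a spreadsheet export, the check above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool from the post: the function names `percent_wrong` and `items_to_review` and the sample answer data are hypothetical, and each student’s responses are assumed to be a list with one answer per item. The default threshold of 0.50 mirrors the 50% rule; for must-know material you might pass something closer to 0.10 or 0.15.

```python
# Hypothetical sketch: flag multiple choice items whose percent wrong
# exceeds a review threshold. Names and data are illustrative only.

def percent_wrong(responses, key):
    """Return, for each item, the fraction of students who answered incorrectly.

    responses: list of per-student answer lists (one answer per item)
    key: list of correct answers, one per item
    """
    n_students = len(responses)
    fractions = []
    for item, correct in enumerate(key):
        wrong = sum(1 for student in responses if student[item] != correct)
        fractions.append(wrong / n_students)
    return fractions


def items_to_review(responses, key, threshold=0.50):
    """Return (item_number, fraction_wrong) for items strictly above threshold.

    threshold=0.50 corresponds to the "50% rule"; a lower value such as
    0.10 or 0.15 suits tests of must-know material.
    """
    return [
        (item + 1, frac)  # report 1-based item numbers
        for item, frac in enumerate(percent_wrong(responses, key))
        if frac > threshold
    ]


key = ["B", "D", "A", "C"]
responses = [
    ["B", "D", "C", "A"],
    ["B", "A", "C", "C"],
    ["B", "D", "A", "A"],
    ["A", "D", "C", "B"],
]

print(items_to_review(responses, key))                  # [(3, 0.75), (4, 0.75)]
print(items_to_review(responses, key, threshold=0.10))  # all four items flagged
```

A flagged item isn’t automatically a bad item; the list just tells you where to look, and you still compare each item’s actual difficulty against how difficult you intended it to be.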