Posted on March 18, 2017 at 8:25 AM
My last blog post on analyzing multiple choice test results generated a good bit of feedback, mostly on the ASSESS listserv. Joan Hawthorne and a couple of other colleagues thoughtfully challenged my “50% rule”—that any questions that more than 50% of your students get wrong may suggest something wrong and should be reviewed carefully.
Joan pointed out that my 50% rule shouldn’t be used with tests that are so important that students should earn close to 100%. She’s absolutely right. Some things we teach—healthcare, safety—are so important that if students don’t learn them well, people could die. If you’re teaching and assessing must-know skills and concepts, you might want to look twice at any test items that more than 10% or 15% of students got wrong.
With other tests, how hard the test should be depends on its purpose. I was taught in grad school that the purpose of some tests is to separate the top students from the bottom—distinguish which students should earn an A, B, C, D, or F. If you want to maximize the spread of test scores, an average item difficulty of 50% is your best bet—in theory, you should get test scores ranging all the way from 0 to 100%. If you want each test item to do the best possible job discriminating between top and bottom students, again you’d want to aim for a 50% difficulty.
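Here is a short worked sketch of why 50% is the theoretical sweet spot. It assumes each item is scored simply as right (1) or wrong (0). The symbols are notation introduced for this illustration, not something from a test report: X is a student's score on one item, and p is the proportion of students who answer it correctly.

```latex
\operatorname{Var}(X) = p(1-p), \qquad
\frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2}, \qquad
\operatorname{Var}(X)\Big|_{p=1/2} = \tfrac{1}{4}
```

By comparison, an item that 90% of students answer correctly has a variance of only 0.9 × 0.1 = 0.09, so it contributes far less to spreading out the scores. How much the total scores actually spread also depends on how the items relate to one another, which this sketch leaves out.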
But in the real world, I’ve never seen a good test with an overall 50% difficulty, and there are several good reasons for that.
1. Difficult test questions are incredibly hard to write. Most college students want to get a good grade and will at least try to study for your test. It’s very hard to come up with a test question that assesses an important objective but that half of them will get wrong. Most difficult items I’ve seen are either on minutiae, “trick” questions on some nuanced point, or questions that are more tests of logical reasoning skill than course learning objectives. In my whole life I’ve written maybe two or three difficult multiple choice questions that I’ve been proud of: that truly focused on important learning outcomes and didn’t require a careful nuanced reading or logical reasoning skills. In my consulting work, I’ve seen no more than half a dozen difficult but effective items written by others. This experience has led me to suggest that “50% rule.”
2. Difficult tests are demoralizing to students, even if you “curve” the scores and even if they know in advance that the test will be difficult.
3. Difficult tests are rarely appropriate, because it’s rare for the sole or major purpose of a test to be to maximize the spread of scores. Many tests have dual purposes. There are certain fundamental learning objectives we want to make sure (almost) every student has learned, or they’re going to run into problems later on. Then there are some learning objectives that are more challenging—that only the A or maybe B students will achieve—and those test items will separate the A from B students and so on.
So, while I have great respect for those who disagree with me, I stand by my suggestion in my last blog post. Compare each item’s actual difficulty (the percent of students who answered correctly) against how difficult you wanted that item to be, and carefully evaluate any items that more than 50% of your students got wrong.
Posted on February 28, 2017 at 8:15 AM
Next month I’m doing a faculty professional development workshop on interpreting the reports generated for multiple choice tests. Whenever I do one of these workshops, I ask the sponsoring institution to send me some sample reports. I’m always struck by how user-unfriendly they are!
The most important thing to look at in a test report is the difficulty of each item—the percent of students who answered each item correctly. Fortunately these numbers are usually easy to find. The main thing to think about is whether each item was as hard as you intended it to be. Most tests have some items on essential course objectives that every student who passes the course should know or be able to do. We want virtually every student to answer those items correctly, so check those items and see if most students did indeed get them right.
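If your report doesn't show item difficulty, or you only have raw scores in a spreadsheet, the calculation is simple enough to do yourself. The sketch below is only an illustration: the function name `item_difficulty`, the data layout (one row per student, 1 for correct and 0 for incorrect), and the scores themselves are all made up for the example.

```python
def item_difficulty(responses):
    """Return the percent of students who answered each item correctly.

    `responses` is an assumed layout: one list per student, holding 1 for a
    correct answer and 0 for an incorrect one, in item order.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100 * sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


# Made-up results for four students on a three-item quiz.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
difficulty = item_difficulty(scores)
print(difficulty)  # [100.0, 50.0, 25.0]
```

In this invented example, everyone got the first item right, half got the second right, and only one student in four got the third right.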
Then take a hard look at any test items that a lot of students got wrong. Many tests purposefully include a few very challenging items, requiring students to, say, synthesize their learning and apply it to a new problem they haven’t seen in class. These are the items that separate the A students from the B and C students. If these are the items that a lot of students got wrong, great! But take a hard look at any other questions that a lot of students got wrong. My personal benchmark is what I call the 50 percent rule: if more than half my students get a question wrong, I give the question a hard look.
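The 50 percent rule can be sketched in a few lines of code. This is one possible way to do it, not a standard tool: the function name `items_to_review`, the `intended_hard` list of items you deliberately wrote to be challenging, and the sample percentages are all assumptions made for the illustration.

```python
def items_to_review(percent_correct, intended_hard=(), cutoff=50.0):
    """List the item numbers (starting at 1) that deserve a hard look.

    An item is flagged when fewer than `cutoff` percent of students answered
    it correctly, unless it appears in `intended_hard`, the items that were
    meant to be challenging. Both parameters are illustrative assumptions.
    """
    skip = set(intended_hard)
    return [
        number
        for number, pct in enumerate(percent_correct, start=1)
        if pct < cutoff and number not in skip
    ]


# Made-up percent-correct figures for a five-item test.
percent_correct = [92.0, 48.0, 35.0, 71.0, 50.0]

# Item 3 was written as a stretch question, so it is not flagged.
flagged = items_to_review(percent_correct, intended_hard=[3])
print(flagged)  # [2]
```

An item that exactly half the students got right is not flagged here, because the rule applies only when more than half get it wrong.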
Now comes the hard part: figuring out why more students got a question wrong than we expected. There are several possible reasons including the following:
- The question or one or more of its options is worded poorly, and students misinterpret it.
- We might have taught the question’s learning outcome poorly, so students didn’t learn it well. Perhaps students didn’t get enough opportunities, through classwork or homework, to practice the outcome.
- The question might be on a trivial point that few students took the time to learn, rather than a key course learning outcome. (I recently saw a question on an economics test that asked how many U.S. jobs were added in the last quarter. Good heavens, why do students need to memorize that? Is that the kind of lasting learning we want our students to take with them?)
If you’re not sure why students did poorly on a particular test question, ask them! Trust me, they’ll be happy to tell you what you did wrong!
Test reports provide two other kinds of information: the discrimination of each item and how many students chose each option. These are the parts that are usually user-unfriendly and, frankly, can take more time to decipher than they’re worth.
The only thing I’d look for here is any items with negative discrimination. The underlying theory of item discrimination is that students who get an A on your test should be more likely to get any one question right than students who fail it. In other words, each test item should discriminate between top and bottom students. Imagine a test question that all your A students get wrong but all your failing students answer correctly. That’s an item with negative discrimination. Obviously there’s something wrong with the question’s wording—your A students interpreted it incorrectly—and it should be thrown out. Fortunately, items with negative discrimination are relatively rare and usually easy to identify in the report.
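If your report buries the discrimination figures, a rough check is easy to sketch. The example below uses a simple upper-minus-lower index: the proportion of top-scoring students who got an item right, minus the proportion of bottom-scoring students who did. This is just one common approach, and your testing software may report a different statistic, such as a point-biserial correlation. The function name, the 27% group size, and the scores are assumptions for the illustration.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-minus-lower discrimination for one item (0-based position).

    Students are ranked by total score. The result is the proportion correct
    in the top group minus the proportion correct in the bottom group, so a
    negative value means weaker students outperformed stronger ones.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    top = ranked[:group_size]
    bottom = ranked[-group_size:]
    p_top = sum(student[item] for student in top) / group_size
    p_bottom = sum(student[item] for student in bottom) / group_size
    return p_top - p_bottom


# Made-up results: one row per student, 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

# Collect the positions of items with negative discrimination.
negative = [i for i in range(4) if discrimination_index(scores, i) < 0]
print(negative)  # [3]
```

In this invented data set, the last item is the troublemaker: the two strongest students missed it while the two weakest got it right.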
Posted on November 1, 2013 at 7:20 AM
Unlike many people involved with higher education assessment, I'm a fan of multiple choice tests...under the right circumstances, of course.
Multiple choice tests can give us a broader picture of student learning than "authentic" assessments, and they can be scored and evaluated very quickly. And, yes, they can assess application and analysis skills as well as memory and comprehension.
The key is to ask questions that can be answered in an open-book, open-note format...ones that require students to think and apply their knowledge rather than just recall. My favorite way to do this is with what I call "interpretive exercises" and others call "vignettes," "context-dependent items" or "enhanced multiple choice." You've seen these on published tests. Students are given material they haven't seen before: a chart, a description of a scenario, a diagram, a literature excerpt. The multiple choice questions that follow ask students to interpret this new material.
The key to a good multiple choice test is to start with a "test blueprint": a list of the learning objectives you want to assess. Then write items for each of those learning objectives.
There are just two other precepts for writing good multiple choice items. First, remove all barriers that will keep a knowledgeable student from getting the item right. (For example, don't make the item unnecessarily wordy.) Second, remove all clues that will help a less-than-knowledgeable student get the item right. (For example, use common misconceptions as incorrect options.)