Linda Suskie

  A Common Sense Approach to Assessment in Higher Education

Blog

How many samples of student work do you need to assess?

Posted on November 22, 2019 at 7:00 AM

It’s a question I get a lot! And—fair warning!—you probably won’t like my answers.

 

First, the learning goals we assess are promises we make to our students, their families, employers, and society: Students who successfully complete a course, program, gen ed curriculum, or other learning experience can do the things we promise in our learning goals. Those learning goals also are (or should be) the most important things we want students to learn. As a matter of integrity, we should therefore make sure, through assessment, that every student who completes a learning experience has indeed achieved its learning goals. So my first answer is that you should assess everyone’s work, not a sample.

 

Second, if you are looking at a sample rather than everyone’s work, you must look at a large enough sample (and a representative enough sample) to be able to generalize from that sample to all students. Political polls take this approach. A poll may say, for example, that 23% of registered voters prefer Candidate X with (in fine print at the bottom of the table) an error margin of plus or minus 5%. That means the pollster is reasonably confident that, if every registered voter could be surveyed, between 18% and 28% would prefer Candidate X.

 

Here’s the depressing part of this approach: An error margin of 5% (and I wouldn’t want an error margin bigger than that) requires looking at about 400 examples of student work. (This is why those political polls typically sample about 400 people.) Unless your institution or program is very large, once again you need to look at everyone’s work, not a sample. Even if your institution or program is very large, your accreditor may expect you to look separately at students at each location or otherwise break your students down into smaller groups for analysis, and those groups may well be under 400 students.
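
For readers who want to check the arithmetic behind that “about 400” figure, here is a minimal sketch of the standard sample-size calculation for estimating a proportion. It assumes a 95% confidence level and the most conservative 50/50 split; the function name and the optional finite-population correction are illustrative additions, not something from this post.

```python
import math

def sample_size(margin=0.05, z=1.96, p=0.5, population=None):
    """Sample size needed to estimate a proportion within a given margin of error.

    margin     -- desired margin of error (0.05 = plus or minus 5 percentage points)
    z          -- z-score for the confidence level (1.96 for 95% confidence)
    p          -- assumed proportion; 0.5 is the most conservative choice
    population -- if given, apply the finite-population correction
    """
    n = (z ** 2) * p * (1 - p) / (margin ** 2)  # about 384 for the defaults
    if population is not None:
        # Finite-population correction: a small program needs fewer than 400,
        # but still a sizable share of all its students.
        n = (n * population) / (n + population - 1)
    return math.ceil(n)

print(sample_size())                # 385 -- the "about 400" in the post
print(sample_size(population=500))  # 218 -- still over 40% of a 500-student program
print(sample_size(margin=0.03))     # 1068 -- why big national polls sample about 1,000
```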

 

I can think of only three situations in which samples may make sense.

 

Expensive, supplemental assessments. Published or local surveys, interviews, and focus groups can be expensive in terms of time and/or dollars. These are supplemental assessments—indirect evidence of student learning—and it’s usually not essential to have all students participate in them.

 

Learning goals you don’t expect everyone to achieve. Some institutions and programs have statements that aren’t really learning goals but aspirations: things they hope some students will achieve but can’t realistically promise that every student will achieve. A passion for lifelong learning and a commitment to civic engagement are two examples of such aspirations. It may be fine to assess aspirations by looking at samples to estimate how many students are indeed on the path to achieving them.

 

Making evidence of student learning part of a broader analysis. For many faculty, the burden of assessment is not assessing students in classes—they do that through the grading process. The burden is in the extra work of folding their assessment into a broader analysis of student learning across a program or gen ed requirement. Sometimes faculty submit rubric or test scores to an office or committee; sometimes faculty submit actual student work; sometimes a committee assesses student work. These additional steps can be laborious and time consuming, especially if your institution doesn’t have a suitable assessment information management system. In these situations, samples of student work may save considerable time—if the samples are sufficiently large and representative to yield useful, generalizable evidence of student learning, as discussed above.

 

For more information on sampling, see Chapter 12 of the third edition of Assessing Student Learning: A Common Sense Guide.

 

Who is your audience?

Posted on August 7, 2019 at 6:30 AM

In my July 9, 2019, blog post I encouraged using summertime to reflect on your assessment practices, starting with the question, “Why are we assessing?”


Here are the next questions on which I suggest you reflect:

  • Who are our audiences for the products we’re generating through our assessment processes?
  • What decisions are they making?
  • How can the products of our assessment work help them make better decisions?


In other words, before planning any assessment, figure out the decisions the assessment results should inform, then design the assessment to help inform those decisions.


Your answers to the questions I’ve listed will affect the length, format, and even the vocabulary you use in each product. Consider these products of assessment processes:


Assessment results. The key audience for assessment results should be obvious: the faculty and administrators who need them to make decisions, especially what and how to teach, how to help students learn and succeed, and how best to deploy scarce resources.


Faculty and administrators are always making these decisions. The problem is that many people make those decisions in a “data-free zone.” They make a decision simply because someone thinks they have a good idea or perhaps because a couple of students complained about something.


Today time and resources at virtually every institution are limited. We can no longer afford to make decisions simply because people think they have a good idea. Before plunging ahead with a decision, we first need some evidence that we’ve identified the problem correctly and that our solution has a good chance of solving the problem.


In his book How to Measure Anything, Douglas Hubbard points out that we’re not aiming to make infallible decisions, just better decisions than we would make without assessment results.


Many reports of assessment results say the results have been used only to make tweaks to assignments and course curricula (“We’ll emphasize this more in class.”). But what if, say, six programs all find that their seniors can’t analyze data well? That calls for another audience: your institution’s academic leadership team. Your institution needs a process to examine assessment results across programs holistically—and probably qualitatively—to identify any pervasive issues and bring them to the attention of academic leaders so they can provide professional development and other support to address those issues across your institution.


All this suggests that you need to involve your audiences in designing your assessments and your reports of assessment results, both to make sure you’re providing the information they need and to make sure it’s in a format they can easily understand and use.


Learning goals have several audiences. The most important audience is students, because research has shown that many students learn more effectively when they understand what they’re supposed to be learning. Prospective students—those who are considering enrolling in your institution, program, course, or co-curricular learning experience—are another important audience. Key learning goals might help convince them to enroll (“I’ve always wanted to learn that” or “I can see why it would be important to learn that.”). A third audience is potential employers (“These are the skills I’m looking for when I hire people, so I’m going to take a close look at graduates of this program.”). And a fourth audience is potential funding sources such as foundations, donors, and government policymakers (“This institution or program teaches important things, the kinds of skills people need today, so it’s a good place to invest our funds.”).


All these audiences need learning goals stated in clear, simple terms that they will easily understand. Academic jargon and complex statements have no place in learning goals.


Strategic and unit goals. These often have two key audiences: the employees who will help accomplish the goals and potential funding sources such as donors. Both need goals stated in clear, simple terms that they will easily understand so they can figure out where the institution or unit is headed, what it will look like in a few years, and how they can help achieve the goals.


Curriculum maps. Curriculum maps are a tool to help faculty (1) analyze the effectiveness of their curricula and (2) identify the best places to assess student achievement of key learning goals. So they need to be designed in ways that help faculty accomplish these quickly and easily.


Student assignments. In many assignments we give students, the implicit audience for their work—be it a paper, presentation, or performance—is us: the faculty or staff member giving the assignment. That doesn’t prepare students well for creating work for other audiences. When I taught first-year writing, one assignment was to write solicitations for gifts to a charity aimed at two different audiences (and a third statement comparing the two). When I taught statistics, I had students write a one-paragraph summary of their statistical test addressed to the hypothetical individual who requested the analysis. When I taught a graduate course in educational research methods, I had students not only draft the first three chapters of their theses but deliver mock presentations to a foundation explaining, justifying, and seeking funding for their research.


Documentation of assessment processes (how each learning goal has been assessed). The key audience here is the faculty and staff responsible for the program, course, or other learning experience being assessed. They can use this documentation to avoid reinventing the wheel (“How did we assess this last time?”).


Another audience for documentation of assessment processes is whatever group is overseeing and supporting assessment efforts at your institution, such as an assessment committee. This group can use this documentation to (1) recognize and honor good practices, (2) share those good assessment practices with others at your college, (3) give each program or unit feedback on how well its assessment work meets the characteristics of good assessment practices, and (4) plan professional development to address any pervasive issues they see in how assessment is being done.


Documentation of uses of assessment results. Here again the key audience is the faculty and staff responsible for the program, course, or other learning experience being assessed. They can use this information to track the impact of improvements they’ve attempted (“We tried adding more homework problems but that didn’t help much. Maybe this time we could try incorporating these skills into two other required courses.”).


Why haven’t I mentioned your accreditor as an audience of your assessment products? Accreditors are a potential audience for everything I’ve mentioned here, but they’re a secondary audience. They are most interested in the impact of your assessment products on students, colleagues, and the other audiences I’ve mentioned here. They want to see what you’ve shared with your key audiences—and how those audiences have used what you’ve shared with them. Most of all, they want your summary and candid, forthright analysis of the overall effectiveness of your institution’s or program’s assessment products in helping those audiences make decisions.

Why are you assessing?

Posted on July 9, 2019 at 3:50 PM

Summer is a great time to reflect on and possibly rethink your assessment practices. I’m a big believer in form following function, so I think the first question to reflect on should be, “Why are we doing this?” You can then reflect on how well your assessment practices achieve those purposes.


In Chapter 6 of my book Assessing Student Learning I present three purposes of assessment. Its fundamental purpose is, of course, giving students the best possible education. Assessment accomplishes this by giving faculty and staff feedback on what is and isn’t working and insight into changes that might help students learn and succeed even more effectively.


The second purpose of assessment is what I call stewardship. All colleges run on other people’s money, including tuition and fees paid by students and their families, government funds paid by taxpayers, and scholarships paid by donors. All these people deserve assurance that your college will be a wise steward of their resources, spending those resources prudently, effectively, and judiciously. Stewardship includes using good-quality evidence of student learning to help inform decisions on how those resources are spent, including how everyone spends their time. Does service learning really help develop students’ commitment to a life of service? Does the gen ed curriculum really help improve students’ critical thinking skills? Does the math requirement really help students analyze data? And are the improvements big enough to warrant the time and effort faculty and staff put into developing and delivering these learning experiences?


The third purpose of assessment is accountability: assuring your stakeholders of the effectiveness of your college, program, service, or initiative. Stakeholders include current and prospective students and their families, employers, government policy makers, alumni, taxpayers, governing board members…and, yes, accreditors. Accountability includes sharing both successes and steps being taken to make appropriate, evidence-based improvements.


So your answers to “Why are we doing this?” will probably be variations on the following themes, all of which require good-quality assessment evidence:

  • We want to understand what is and isn’t working and what changes might help students learn and succeed even more effectively.
  • We want to understand if what we’re doing has the desired impact on student learning and success and whether the impact is enough to justify the time and resources we’re investing.
  • Our stakeholders deserve to see our successes in helping students learn and succeed and what we’re doing to improve student learning and success.

Culturally Responsive Assessment

Posted on June 8, 2019 at 6:25 AM

I have the honor of serving as one of the faculty of this year's Mission Fulfillment Fellowship of the Northwest Commission on Colleges and Universities (NWCCU). One of the readings that’s resonated most with the Fellows is Equity and Assessment: Moving Towards Culturally Responsive Assessment by Erick Montenegro and Natasha Jankowski. 


A number of the themes of this paper resonate with me. One is that I’ve always viewed assessment as simply a part of teaching, and the paper confirms that there’s a lot of overlap between culturally responsive pedagogy and culturally responsive assessment.


Second, a lot of culturally responsive assessment concepts are simply about being fair to all students. Fairness is a passion of mine and, in fact, the subject of the very first paper I wrote on assessment in higher education twenty years ago. Fairness includes:

  • Writing learning goals, rubrics, prompts (assignments), and feedback using simple, clear vocabulary that entry-level students can understand, including defining any terms that may be unfamiliar to some students.
  • Matching your assessments to what you teach and vice versa. Create rubrics, for example, that focus on the skills you have been helping students demonstrate, not the task you’re asking students to complete.
  • Helping students learn how to do the assessment task. Grade students on their writing skill only if you have been explicitly teaching them how to write in your discipline and giving them writing assignments and feedback. 
  • Giving students a variety of ways to demonstrate their learning. Students might demonstrate information literacy skills, for example, through a deck of PowerPoint slides, poster, infographic, mini-class, graphic novel, portfolio, or capstone project, to name a few.
  • Engaging and encouraging your students, giving them a can-do attitude.


Third, a lot of culturally responsive pedagogy and assessment concepts flow from research over the last 25 years on how to help students learn and succeed, which I’ve summarized in List 26.1 in my book Assessing Student Learning: A Common Sense Guide. We know, for example, that some students learn better when:

  • They see clear relevance and value in their learning activities.
  • They understand course and program learning goals and the characteristics of excellent work, often through a rubric.
  • Learning activities and grades focus on important learning goals. Faculty organize curricula, teaching practices, and assessments to help students achieve important learning goals. Students spend their time and energy learning what they will be graded on.
  • New learning is related to their prior experiences and what they already know, through both concrete, relevant examples and challenges to their existing paradigms.
  • They learn by doing, through hands-on practice engaging in multidimensional real world tasks, rather than by listening to lectures.
  • They interact meaningfully with faculty—face-to-face and/or online.
  • They collaborate with other students—face-to-face and/or online—including those unlike themselves.
  • Their college and its faculty and staff truly focus on helping students learn and succeed and on improving student learning and success.

These are all culturally responsive pedagogies.


So, in my opinion, the concept of culturally responsive assessment doesn’t break new ground as much as it reinforces the importance of applying what we already know: ensuring that our assessments are fair to all students, using research-informed strategies to help students learn and succeed, and viewing assessment as part of teaching rather than as a separate add-on activity.


How do we apply what we know to students whose cultural backgrounds and experiences are different from our own? In addition to the ideas I’ve already listed, here are some practical suggestions for culturally responsive assessment, gleaned from Montenegro and Jankowski’s paper and my own experiences working with people from a variety of cultures and backgrounds:

  1. Recognize that, like any human being, you’re not impartial. Grammatical errors littering a paper may make it hard, for example, for you to see the good ideas in it.
  2. Rather than looking on culturally responsive assessment as a challenge, look on it as a learning experience: a way to model the common institutional learning outcome of understanding and respecting perspectives of people different from yourself.
  3. Learn about your students’ cultures. Ask your institution to develop a library of short, practical resources on the cultures of its students. For cultures originating in countries outside the United States, I do an online search for business etiquette in that country or region. It’s a great way to quickly learn about a country’s culture and how to interact with people there sensitively and effectively. Just keep in mind that readings won’t address every situation you’ll encounter.
  4. Ask your students for help in understanding their cultural background.
  5. Involve students and colleagues from a variety of backgrounds in articulating learning goals, designing rubrics, and developing prompts (assignments).
  6. Recognize that students for whom English is a second language find it particularly hard to demonstrate their learning through written assignments and oral presentations. They may demonstrate their learning more effectively through non-verbal means such as a chart or infographic. 
  7. Commit to using the results of your assessments to improve learning for all students, not just the majority or plurality.

Understanding direct and indirect evidence of student learning

Posted on May 10, 2019 at 8:50 AM

A recent question posted to the ASSESS listserv led to a lively discussion of direct vs. indirect evidence of student learning, including what they are and the merits of each.


I really hate jargon, and “direct” and “indirect” is right at the top of my list of jargon I hate. A few years ago I did a little poking around to try to figure out who came up with these terms. The earliest reference I could find was in a government regulation. That makes sense—governments are great at coming up with obtuse jargon!


I suspect the terms came from the legal world, which uses the concepts of direct and circumstantial evidence. Direct evidence in the legal world is evidence that supports an assertion without the need for additional evidence. Witness knowledge or direct recollection are examples of direct evidence. Circumstantial evidence is evidence from which reasonable inferences may be drawn.


In the legal world, both direct and circumstantial evidence are acceptable and each alone may be sufficient to make a legal decision. Here’s an often-cited example: If you got up in the middle of the night and saw that it was snowing, that’s direct evidence that it snowed overnight. If you got up in the morning and saw snow on the ground, that’s circumstantial evidence that it snowed overnight. Obviously both are sufficient evidence that it snowed overnight.


But let’s say you got up in the morning and saw that the roads were wet. That’s circumstantial evidence that it rained overnight. But the evidence is not as compelling, because there might be other reasons the roads were wet. It might have snowed and the snow melted by dawn. It might have been foggy. Or street cleaners may have come through overnight. In this example, this circumstantial evidence would be more compelling if it were accompanied by corroborating evidence, such as a report from a local weather station or someone living a mile away who did get up in the middle of the night and saw rain.


So, in the legal world, direct evidence is observed and circumstantial evidence is inferred. Maybe “observed” and “inferred” would be better terms for direct and indirect evidence of student learning. Direct evidence can be observed through student products and performances. Indirect evidence must be inferred from what students tell us through things like surveys and interviews, from what faculty tell us through things like grades, or from student behaviors such as graduation or job placement.


But the problem with using “observable” and “inferred” is that all student learning is inferred to some extent. If a crime is recorded on video, that’s clearly direct, observable evidence. But if a student writes a research paper or makes a presentation or takes a test, we’re only observing a sample of what they’ve learned, and maybe it’s not a good sample. Maybe the test happened to focus heavily on the concepts the student didn’t learn. Maybe the student was ill the day of the presentation. When we assess student learning, we’re trying to see into a student’s mind. It’s like looking into a black box fitted with lenses that are all somewhat blurry or distorted. We may need to look through several lenses, from several angles, to infer reasonably accurately what’s inside.


In the ASSESS listserv discussion, John Hathcoat and Jeremy Penn both suggested that direct and indirect evidence fall on a continuum. This is why. Some lenses are clearer than others. Some direct evidence is more compelling or convincing than others. If we see a nursing student intubate a patient successfully, we can be pretty confident that the student can perform this procedure correctly. But if we assess a student essay, we can’t be as confident about the student’s writing skill, because the skill level displayed can depend on factors such as the essay’s topic, the time and circumstances under which the student completes the assignment, and the clarity of the prompt (instructions).


So I define direct evidence as not only observable but sufficiently convincing that a critic would be persuaded. Imagine someone prominent in your community who thinks your college, your program, or your courses are a joke—students learn nothing worthwhile in them. Direct evidence is the kind that the critic wouldn’t challenge. Grades, student self-ratings, and surveys wouldn’t convince that critic. But rubric results, accompanied by a few samples of student work, would be harder for the critic to refute.


So should faculty be asked or required to provide direct and indirect evidence of student learning? If your accreditor requires direct and indirect evidence, obviously yes. Otherwise, the need for direct evidence depends on how it will be used. Direct evidence should be used, for example, when deciding whether students will progress or graduate or whether to fund or terminate a program. The need for direct evidence also depends on the likelihood that the evidence will be challenged. For relatively minor uses, such as evaluating a brief co-curricular experience, indirect evidence may be just as useful as direct evidence, if not even more insightful.


One last note on direct/observable evidence: learning goals for attitudes, values, and dispositions can be difficult if not impossible to observe. That’s because, as hard as it is to see into the mind (with that black box analogy), it’s even harder to see into the soul. One of the questions on the ASSESS listserv was what constitutes direct evidence that a dancer dances with confidence. Suppose you’re observing two dancers performing. One has enormous confidence and the other has none. Would you be able to tell them apart from their performances? If so, how? What would you see in one performance that you wouldn’t see in the other? If you can observe a difference, you can collect direct evidence. But if the difference is only in their soul—not observable—you’ll need to rely on indirect evidence to assess this learning goal.

Setting meaningful benchmarks and standards, revisited

Posted on January 16, 2019 at 7:45 AM

A recent discussion on the ACCSHE listserv reminded me that setting meaningful benchmarks or standards for student learning assessments remains a real challenge. About three years ago, I wrote a blog post on setting benchmarks or standards for rubrics. Let’s revisit that and expand the concepts to assessments beyond rubrics.


The first challenge is vocabulary. I’ve seen references to goals, targets, benchmarks, standards, thresholds. Unfortunately, the assessment community doesn’t yet have a standard glossary defining these terms (although some accreditors do). I now use standard to describe what constitutes minimally acceptable student performance (such as the passing score on a test) and target to describe the proportion of students we want to meet that standard. But my vocabulary may not match yours or your accreditor's!


The second challenge is embedded in that next-to-last sentence. We’re talking about two different numbers here: the standard describing minimally acceptable performance and the target describing the proportion of students achieving that performance level. That makes things even more confusing.


So how do we establish meaningful standards? There are four basic ways. Three are:

1. External standards: Sometimes the standard is set for us by an external body, such as the passing score on a licensure exam.

2. Peers: Sometimes we want our students to do as well as or better than their peers.

3. Historical trends: Sometimes we want our students to do as well as or better than past students.


Much of the time none of these options is available to us, leaving us to set our own standard, what I call a local standard and what others call a competency-based or criterion-referenced standard. Here are the steps to setting a local standard:


Focus on what would not embarrass you. Would you be embarrassed if people found out that a student performing at this level passed your course or graduated from your program or institution? Then your standard is too low. What level do students need to reach to succeed at whatever comes next—more advanced study or a job?


Consider the relative harm in setting the standard too high or too low. A too-low standard means you’re risking passing or graduating students who aren’t ready for what comes next and that you’re not identifying problems with student learning that need attention. A too-high standard may mean you’re identifying shortcomings in student learning that may not be significant and possibly using scarce time and resources to address those relatively minor shortcomings.


When in doubt, set the standard relatively high rather than relatively low. Because every assessment is imperfect, you’re not going to get an accurate measure of student learning from any one assessment. Setting a relatively high bar increases the chance that every student is truly competent on the learning goals being assessed.


If you can, use external sources to help set standards. A business advisory board, faculty from other colleges, or a disciplinary association can all help get you out of the ivory tower and set defensible standards.


Consider the assignment being assessed. Essays completed in a 50-minute class are not going to be as polished as papers created through scaffolded steps throughout the semester.


Use samples of student work to inform your thinking. Discuss with your colleagues which samples seem unacceptably poor, which seem adequate though not stellar, and which seem outstanding, and then explore why.


If you are using a rubric to assess student learning, the standard you’re setting is the rubric column (performance level) that defines minimally acceptable work. This is the most important column in the rubric and, not coincidentally, the hardest one to complete. After all, you’re defining the borderline between passing and failing work. Ideally, you should complete this column first, then complete the remaining columns.


Now let’s turn from setting standards to setting targets for the proportions of students who achieve those standards. Here the challenge is that we have two kinds of learning goals. Some are essential. We want every college graduate to write a coherent, grammatically correct paragraph, for example. I don’t want my tax returns prepared by an accountant who can complete them correctly only 70% of the time, and I don’t want my prescriptions filled by a pharmacist who can fill them correctly only 70% of the time! For these essential goals, we want close to 100% of students meeting our standard.


Then there are aspirational goals, which not everyone need achieve. We may want college graduates to be good public speakers, for example, but in many cases graduates can lead successful lives even if they’re not. For these kinds of goals, a lower target may be appropriate.


Tests and rubrics often assess a combination of essential and aspirational goals, which suggests that overall test or rubric scores often aren’t very helpful in understanding student learning. Scores for each rubric trait or for each learning objective in the test blueprint are often much more useful.


Bottom line here: I have a real problem with people who say their standard or target is 70%. It’s inevitably an arbitrary number with no real rationale. Setting meaningful standards and targets is time-consuming, but I can think of few tasks that are more important, because it’s what helps ensure that students truly learn what we want them to…and that’s what we’re all about.


By the way, my thinking here comes primarily from two sources: Setting Performance Standards by Cizek and a review of the literature that I did a couple of years ago for a chapter on rubric development that I contributed to the Handbook on Measurement, Assessment, and Evaluation in Higher Education (https://www.amazon.com/Handbook-Measurement-Assessment-Evaluation-Education/dp/1138892157). For a more thorough discussion of the ideas here, see Chapter 22 (Setting Meaningful Standards and Targets) in the new 3rd edition of my book Assessing Student Learning: A Common Sense Guide.

Grading group work

Posted on October 27, 2018 at 10:30 AM

Collaborative learning, better known as group work, is an important way for students to learn. Some students learn better with their peers than by working alone. And employers very much want employees who bring teamwork skills.


But group work, such as a group presentation, is one of the hardest things for faculty to grade fairly. One reason is that many student groups include some slackers and some overactive eager beavers. When viewing the product of a group assignment—say they’ve been asked to work together to create a website—it can be hard to discern the quality of individual students’ achievements fairly.


Another reason is that group work is often more about performances than products—the teamwork skills each student demonstrates. As I note in Chapter 21 of Assessing Student Learning: A Common Sense Guide, performances such as working in a team or delivering a group presentation are harder to assess than products such as a paper.


In their book Collaborative Learning Techniques: A Handbook for College Faculty, Elizabeth Barkley, Claire Major, and K. Patricia Cross acknowledge that grading collaborative learning fairly and validly can be challenging. But it’s not impossible. Here are some suggestions.


Have clear learning goal(s) for the assignment. If your key learning goal is for students to develop teamwork skills, your assessment strategy will be very different than if your learning goal is for them to learn how to create a well-designed website.


Make sure your curriculum includes plenty of opportunities for students to develop and achieve your learning goal. If your key learning goal is for students to develop teamwork skills, for example, you’ll need to provide lessons, classwork, and homework that help them learn what good and poor teamwork skills are and practice those skills. Just putting students into a group and letting them fend for themselves won’t cut it—students will just keep using whatever bad teamwork habits they brought with them.


Deal with the slackers (and the overactive eager beavers) proactively. Barkley, Major, and Cross suggest several ways to do this. Design a group assignment in which each group member must make a discrete contribution for which they’re held accountable. Make these contributions equitable, so all students must participate evenly. Make clear to students that they’ll be graded for their own contribution as well as for the overall group performance or product. And check in with each group periodically and, if necessary, speak individually with any slackers and also those eager beavers who try to do everything themselves.


Consider observing student groups working together. This isn’t always practical, of course—your presence may stifle the group’s interactions—but it’s one way to assess each student’s teamwork skills. Use a rubric to record what you see. Since you’re observing several students simultaneously, keep the rubric simple enough to be manageable—maybe a rating scale rubric or a structured observation guide, both of which are discussed in the rubrics chapter of Assessing Student Learning.


Consider asking students to rate each other. Exhibit 21.1 in Assessing Student Learning is a rating scale rubric I’ve used for this purpose. I tell students that their groupmates’ ratings of them will be averaged and will count for 5% of their final grade. I weight peer ratings very low because I don’t want students’ grades to be advantaged or disadvantaged by any biases of their peers.


Give each student two grades: one grade for the group product or performance and one for his or her individual contribution to it. This only works when it’s easy to discern each student’s contribution. You can weight the two grades however you like—perhaps equally, or perhaps weighting the group product or performance more heavily than individual contributions, or vice versa.
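
To make the weighting arithmetic concrete, here is a minimal sketch combining the ideas above: a group grade and an individual grade blended 50/50, with averaged peer ratings counting for a small slice of the result. All of the weights and scores are hypothetical examples, not prescriptions, and for simplicity the 5% peer-rating share is folded into the project grade here rather than into the whole course grade.

```python
def project_grade(group_score, individual_score, group_weight=0.5):
    """One grade for the group product, one for the student's own contribution,
    combined with whatever weighting you choose (50/50 here as an example)."""
    return group_weight * group_score + (1 - group_weight) * individual_score

def add_peer_ratings(grade_so_far, peer_ratings, peer_weight=0.05):
    """Average groupmates' ratings and let them count for a small share of the
    grade (5% here), so peer biases can't swing anyone's grade very far."""
    peer_average = sum(peer_ratings) / len(peer_ratings)
    return (1 - peer_weight) * grade_so_far + peer_weight * peer_average

# A student whose group's website earned 90, whose own contribution earned 80,
# and whose groupmates rated them 85, 90, and 95 (all on a 100-point scale):
base = project_grade(90, 80)                 # 85.0
print(add_peer_ratings(base, [85, 90, 95]))  # 85.25
```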


Give the group a total number of points, and let them decide how to divide those points among group members. Some faculty have told me they’ve used this approach and it works well.


Barkley, Major and Cross point out that there’s a natural tension between promoting collaborative learning and teamwork and assigning individual grades. Whatever approach you choose, try to minimize this tension as much as you can.

Should rubrics be assignment-specific?

Posted on September 2, 2018 at 8:25 AM

In a recent guest post in Inside Higher Ed, “What Students See in Rubrics,” Denise Krane explained her dissatisfaction with rubrics, which can be boiled down to this statement toward the end of her post, “Ideally, rubrics are assignment specific.”


I don’t know where Denise got this idea, but it’s flat-out wrong. As I’ve mentioned in previous blog posts on rubrics, a couple of years ago I conducted a literature review for a chapter on rubric development that I wrote for the second edition of the Handbook on Measurement, Assessment, and Evaluation in Higher Education. The rubric experts I found (for example, Brookhart; Lane; Linn, Baker & Dunbar; and Messick) are unanimous in advocating what they call general rubrics over what they call task-specific rubrics: rubrics that assess achievement of the assignment’s learning outcomes rather than achievement of the task at hand.


Their reason is exactly what Denise advocates: we want students to focus on long-term, deep learning—in the case of writing, to develop the tools to, as Denise says, grapple with writing in general. Indeed, some experts such as Lane posit that one of the criteria of a valid rubric is its generalizability: it should tell you how well students can write (or think, or solve problems) across a range of tasks, not just the one being assessed. If you use a task-specific rubric, students will learn how to do that one task but not much more. If you use a general rubric, students will learn skills they can use in whole families of tasks.


To be fair, the experts also caution against general rubrics that are too general, such as one writing rubric used to assess student work in courses and programs across an entire college. Many experts (for example, Cooper, Freedman, Lane, and Lloyd-Jones) suggest developing rubrics for families of related assignments—perhaps one for academic writing in the humanities and another for business writing. This lets the rubric include discipline-specific nuances. For example, academic writing in the humanities is often expansive, while business writing must be succinct.


How do you move from a task-specific rubric to a general rubric? It’s all about the traits being assessed—those things listed on the left side of the rubric. Those things should be traits of the learning outcomes being assessed, not the assignment. So instead of listing each element of the assignment (I’ve seen rubrics that literally list “opening paragraph,” “second paragraph,” and so on), list each key trait of the learning goals. When I taught writing, for example, my rubric included traits like focus, organization, and sentence structure.


Over the last few months I’ve worked with a lot of faculty on creating rubrics, and I’ve seen that moving from a task-specific to a general rubric can be remarkably difficult. One reason is that faculty want students to complete the assignment correctly: Did they provide three examples? Did they cite five sources? If this is important, I suggest making “Following directions” one of the learning outcomes of the assignment and including it as a trait assessed by the rubric. Then create a separate checklist of all the components of the assignment. Ask students to complete the checklist themselves before submitting the assignment. Also consider asking students to pair up and complete checklists for each other’s assignments.


To identify the other traits assessed by the rubric, ask yourself, “What does good writing/problem solving/critical thinking/presenting look like?” Focus not on this assignment but on why you’re giving students the assignment. What do you want them to learn from this assignment that they can use in subsequent courses or after they graduate?


Denise mentioned two other things about rubrics that I’d also like to address. She surveyed her students about their perceptions of rubrics, and one complaint was that faculty expectations vary from one professor to another. The problem here is lack of collaboration. Faculty teaching sections of the same course, or related courses, should collaborate on a common rubric that they all use to grade student work. This lets students work on the same important skill over and over again in varying course contexts and see connections in their learning. If one professor wants to emphasize something above and beyond the common rubric, fine. The common elements can be the top half of the rubric, and the professor-specific elements can be the bottom half.


Denise also mentioned that her rubric ran three pages, and she hated it. I would too! Long rubrics focus on the trees rather than the forest of what we’re trying to help students learn. A shorter rubric (I recommend that rubrics fit on one page) focuses students on the most important things they’re supposed to be learning. If it frustrates you that your rubric doesn’t include everything you want to assess, keep in mind that no assessment can assess everything. Even a comprehensive final exam can’t ask every conceivable question. Just make sure that your rubric, like your exam, focuses on the most important things you want students to learn.


If you’re interested in a deeper dive into what I learned about rubrics, here are some of my past blog posts. My book chapter in the Handbook has the full citations of the authors I've mentioned here.

Is This a Rubric? 

Can Rubrics Impede Learning? 

Rubrics: Not Too Broad, Not Too Narrow 

What is a Good Rubric? 

What is a Rubric? 

Is assessment worth it?

Posted on August 14, 2018 at 8:50 AM

A while back, a faculty member teaching in a community college career program told me, “I don’t need to assess. I know what my students are having problems with—math.”


Well, maybe so, but I’ve found that my perceptions often don’t match reality, and systematic evidence gives me better insight. Let me give you a couple of examples.


Example #1: You may have noticed that my website blog page now has an index of sorts on the right side. I created it a few months ago, and what I found really surprised me. I aim for practical advice on the kinds of assessment issues that people commonly face. Beforehand I’d been feeling pretty good about the range and relevance of assessment topics that I’d covered. The index showed that, yes, I’d done lots of posts on how to assess and specifically on rubrics, a pet interest of mine. I was pleasantly surprised by the number of posts I’d done on sharing and using results.


But what shocked me was how little I’d written on assessment culture: only four posts in five years! Compare that with seventeen posts on curriculum design and teaching. Assessment culture is an enormous issue for assessment practitioners. Now, knowing the short shrift I’d been giving it, I’ve written several more blog posts related to assessment culture, bringing the total to ten (including this post).


(By the way, if there’s anything you’d like to see a blog post on, let me know!)


Example #2: Earlier this summer I noticed that some of the flowering plants in my backyard weren’t blooming much. I did a shade study: one sunny day when I was home all day, every hour I made notes on which plants were in sun and which were in shade. I’d done this about five years ago but, as with the blog index, the results shocked me; some trees and shrubs had grown a lot bigger in five years and consequently some spots in my yard were now almost entirely in shade. No wonder those flowers didn’t bloom! I’ll be moving around a lot of perennials this fall to get them into sunnier spots.


So, yes, I’m a big fan of using systematic evidence to inform decisions. I’ve seen too often that our perceptions may not match reality.


But let’s go back to that professor whose students were having problems with math and give him the benefit of the doubt—maybe he’s right. My question to him was, “What are you doing about it?” The response was a shoulder shrug. His was one of many institutions with an assessment office but no faculty teaching-learning center. In other words, they’re investing more in assessment than in teaching. He had nowhere to turn for help.


My point here is that assessment is worthwhile only if the results are used to make meaningful improvements to curricula and teaching methods. Furthermore, assessment work is worthwhile only if the impact is in proportion to the time and effort spent on the assessment. I recently worked with an institution that undertook an elaborate assessment of three general education learning outcomes, in which student artifacts were sampled from a variety of courses and scored by a committee of trained reviewers. The results were pretty dismal—on average only about two thirds of students were deemed “proficient” on the competencies’ traits. But the institutional community is apparently unwilling to engage with this evidence, so nothing will be done beyond repeating the assessment in a couple of years. Such an assessment is far from worthwhile; it’s a waste of everyone’s time.


This institution is hardly alone. When I was working on the new 3rd edition of my book Assessing Student Learning: A Common Sense Guide, I searched far and wide for examples of assessments whose results led to broad-based change and found only a handful. Overwhelmingly, the changes I see are what I call minor tweaks, such as rewriting an assignment or adding more homework. These changes can be good—collectively they can add up to a sizable impact. But the assessments leading to these kinds of changes are worthwhile only if they’re very simple, quick assessments in proportion to the minor tweaks they bring about.


So is assessment worth it? It’s a mixed bag. On one hand, the time and effort devoted to some assessments aren’t worth it—the findings don’t have much impact. On the other hand, however, I remain convinced of the value of using systematic evidence to inform decisions affecting student learning. Assessment has enormous potential to move us from providing a good education to providing a truly great education. The keys to achieving this are commitments to (1) making that good-to-great transformation, (2) using systematic evidence to inform decisions large and small, and (3) doing only assessments whose impact is likely to be in proportion to the time, effort, and resources spent on them.

Should assessments be conducted on a cycle?

Posted on July 30, 2018 at 8:20 AM

I often hear questions about how long an “assessment cycle” should be. Fair warning: I don’t think you’re going to like my answer.


The underlying premise of the concept of an assessment cycle is that assessment of key program, general education, or institutional learning goals is too burdensome to be completed in its entirety every year, so it’s okay for assessments to be staggered across two or more years. Let’s unpack that premise a bit.


First, know that if an accreditor finds an institution or program out of compliance with even one of its standards—including assessment—Federal regulations mandate that the accreditor can give the institution no more than two years to come into compliance. (Yes, the accreditor can extend those two years for “good cause,” but let’s not count on that.) So an institution that has done nothing with assessment has a maximum of two years to come into compliance, which often means not just planning assessments but conducting them, analyzing the results, and using the results to inform decisions. I’ve worked with institutions in this situation and, yes, it can be done. So an assessment cycle, if there is one, should generally run no longer than two years.


Now consider the possibility that you’ve assessed an important learning goal, and the results are terrible. Perhaps you learn that many students can’t write coherently, or they can’t analyze information or make a coherent argument. Do you really want to wait two, three, or five years to see if subsequent students are doing better? I’d hope not! I’d like to see learning goals with poor results put on red alert, with prompt actions so students quickly start doing better and prompt re-assessments to confirm that.


Now let’s consider the premise that assessments are too burdensome for them all to be conducted annually. If your learning goals are truly important, faculty should be teaching them in every course that addresses them. They should be giving students learning activities and assignments on those goals; they should be grading students on those goals; they should be reviewing the results of their tests and rubrics; and they should be using the results of their review to understand and improve student learning in their courses. So, once things are up and running, there really shouldn’t be much extra burden in assessing important learning goals. The burdens are cranking out those dreaded assessment reports and finding time to get together with colleagues to review and discuss the results collaboratively. Those burdens are best addressed by minimizing the work of preparing those reports and by helping faculty carve out time to talk.


Now let’s consider the idea that an assessment cycle should stagger the goals being assessed. That implies that every learning goal is discrete and that it needs its own, separate assessment. In reality, learning goals are interrelated; how can one learn to write without also learning to think critically? And we know that capstone assignments—in which students work on several learning goals at once—are not only great opportunities for students to integrate and synthesize their learning but also great assessment opportunities, because we can look at student achievement of several learning goals all at once.


Then there’s the message we send when we tell faculty they need to conduct a particular assessment only once every three, four, or five years: assessment is a burdensome add-on, not part of our normal everyday work. In reality, assessment is (or should be) part of the normal teaching-learning process.


And then there are the practicalities of conducting an assessment only once every few years. Chances are that the work done a few years ago will have vanished or at least collective memory will have evaporated (why on earth did we do that assessment?). Assessment wheels must be reinvented, which can be more work than tweaking last year’s process.


So should assessments be conducted on a fixed cycle? In my opinion, no. Instead:


  • Use capstone assignments to look at multiple goals simultaneously.
  • If you’re getting started with assessment, assess everything, now. You’ve been dragging your feet too long already, and you’re risking an accreditation action. Remember you must not only have results but be using them within two years.
  • If you’ve got disappointing results, move additional assessments of those learning goals to a front burner, assessing them frequently until you get results where you want them.
  • If you’ve got terrific results, consider moving assessments of those learning goals to a back burner, perhaps every two years or so, just to make sure results aren’t slipping. This frees up time to focus on the learning goals that need time and attention.
  • If assessment work is widely viewed as burdensome, it’s because its cost-benefit is out of whack. Perhaps assessment processes are too complicated, or people view the learning goals being assessed as relatively unimportant, or the results aren’t adding useful insight. Do all you can to simplify assessment work, especially reporting. If people don't find a particular assessment useful, stop doing it and do something else instead.
  • If assessment work must be staggered, stagger some of your indirect assessment tools, not the learning goals or major direct assessments. An alumni survey or student survey might be conducted every three years, for example.
  • For programs that “get” assessment and are conducting it routinely, ask for less frequent reports, perhaps every two or three years instead of annually. It’s a win-win reward: less work for them and less work for those charged with reviewing and offering feedback on assessment reports.

Getting started with meeting your professional development needs

Posted on June 24, 2018 at 4:30 PM

A recent paper co-sponsored by AALHE and Watermark identified some key professional development needs of assessment practitioners. 


While a book is no substitute for a rich, interactive professional development experience, some of the things that assessment practitioners want to learn about are discussed in my books Assessing Student Learning: A Common Sense Guide (new 3rd edition) and Five Dimensions of Quality: A Common Sense Guide to Accreditation and Accountability. Perhaps they’re a good place to kick off your professional development.


Analyzing and Interpreting Assessment Data


See Chapter 24 (Analyzing Evidence of Student Learning) of Assessing Student Learning (3rd ed.).


Analyzing and Interpreting Qualitative Results


See “Summarizing qualitative evidence” on pages 313-316 of Chapter 23 (Summarizing and Storing Evidence of Student Learning) of Assessing Student Learning (3rd ed.).


Reporting Assessment Results


See Chapter 25 (Sharing Evidence of Student Learning) of Assessing Student Learning (3rd ed.) and Chapter 16 (Transparency: Sharing Evidence Clearly and Readily) of Five Dimensions of Quality.


Assessment Culture


This is such a big issue that the 3rd edition of Assessing Student Learning devotes six chapters to it. See Part 3, which includes the following chapters:

Chapter 9 (Guiding and Coordinating Assessment Efforts)

Chapter 10 (Helping Everyone Learn What to Do)

Chapter 11 (Supporting Assessment Efforts)

Chapter 12 (Keeping Assessment Cost-Effective)

Chapter 13 (Collaborating on Assessment)

Chapter 14 (Valuing Assessment and the People Who Contribute)


A good place to start is Chapter 14, because it begins with a section titled, “Why is this so hard?” Even better, see the chapter that section summarizes: Chapter 4 (Why Is This So Hard?) of Five Dimensions of Quality.


Also see Chapter 17 (Using Evidence to Ensure and Advance Quality and Effectiveness) in Five Dimensions of Quality.


Culture of Change


See Chapter 18 (Sustaining a Culture of Betterment) of Five Dimensions of Quality along with the aforementioned Chapter 4 (Why Is This So Hard?) in the same book. For a briefer discussion, see “Value innovation, especially in improving teaching” on pages 180-181 of Chapter 14 (Valuing Assessment and the People Who Contribute) of Assessing Student Learning (3rd ed.).


Effective/Meaningful/Best Assessment Practices


See Chapter 3 (What Are Effective Assessment Practices?) of Assessing Student Learning (3rd ed.) and Chapter 14 (Good Evidence Is Useful) of Five Dimensions of Quality.


Co-Curricular Learning Outcomes and Assessment


Information on co-curricula is scattered throughout the new 3rd edition of Assessing Student Learning. See the following:

“Learning goals for co-curricular experiences” on pages 57-58 of Chapter 4 (Learning Goals: Articulating What You Most Want Students to Learn)

“Planning assessments in co-curricula” on pages 110-112 of Chapter 8 (Planning Assessments in Other Settings)

Chapter 20 (Other Assessment Tools) 

Chapter 21 (Assessing the Hard-to-Assess) 


Rubrics


See Chapter 15 (Designing Rubrics to Plan and Assess Assignments) of Assessing Student Learning (3rd ed.).


Establishing Standards


See Chapter 22 (Setting Meaningful Standards and Targets) of Assessing Student Learning (3rd ed.) and Chapter 15 (Setting and Justifying Targets for Success) of Five Dimensions of Quality.


Program Review


See Chapter 20 (Program Reviews: Drilling Down into Programs and Services) of Five Dimensions of Quality.

Some learning goals are promises we can't keep

Posted on May 2, 2018 at 6:55 AM

I look on learning goals as promises that we make to students, employers, and society: If a student passes a course or graduates, he or she WILL be able to do the things we promise in our learning goals.


But there are some things we hope to instill in students that we can’t guarantee. We can’t guarantee, for example, that every graduate will be a passionate lifelong learner, appreciate artistic expressions, or make ethical decisions. I think these kinds of statements are important aims that might be expressed in a statement of values, but they’re not really learning goals, because they’re something we hope for, not something we can promise. Because they’re not really learning goals, they’re very difficult if not impossible statements to assess meaningfully.


How can you tell if a learning goal is a true learning goal—an assessable promise that we try to keep? Ask yourself the following questions.


Is the learning goal stated clearly, using observable action verbs? “Appreciate diversity” is a promise we may not be able to keep, but “Communicate effectively with people from diverse backgrounds” is an achievable, assessable learning goal.


How have others assessed this learning goal? If someone else has assessed it meaningfully and usefully, don’t waste time reinventing the wheel.


How would you recognize people who have achieved this learning goal? Imagine that you run into two alumni of your college. As you talk with them, it becomes clear that one appreciates artistic expressions and the other doesn’t. What might they say about their experiences and views that would lead you to that conclusion? This might give you ideas on ways to express the learning goal in more concrete, observable terms, which makes it easier to figure out how to assess it.


Is the learning goal teachable? Ask faculty who aim to instill this learning goal to share how they help students achieve it. If they can name specific learning activities, the goal is teachable—and assessable, because they can grade the completed learning activities. But if the best they can say is something like, “I try to model it” or “I think they pick it up by osmosis,” the goal may not be teachable—or assessable. Don’t try to assess what can’t be taught.


What knowledge and skills are part of this learning goal? We can’t guarantee, for example, that all graduates will make ethical decisions, but we can make sure that they recognize ethical and unethical decisions, and we can assess their ability to do so.


How important is this learning goal? Most faculty and colleges I work with have too many learning goals—too many to assess well and, more important, too many to help students achieve well in the time we have with them. Ask yourself, “Can our students lead happy and fulfilling lives if they graduate without having achieved this particular learning goal?”


But just because a learning goal is a promise we can’t keep doesn’t mean it isn’t important. A world in which people fail to appreciate artistic expressions or have compassion for others would be a dismal place. So continue to acknowledge and value hard-to-assess learning goals even if you’re not assessing them.


For more information on assessing the hard-to-assess, see Chapter 21 of the new 3rd edition of Assessing Student Learning: A Common Sense Guide.

Making assessment worthwhile

Posted on March 13, 2018 at 9:50 AM Comments comments (26)

In my February 28 blog post, I noted that many faculty have been expressing frustration that assessment is a waste of an enormous amount of time and resources that could be better spent on teaching. Here are some strategies to help make sure your assessment activities are meaningful and cost-effective, all drawn from the new third edition of Assessing Student Learning: A Common Sense Guide.


Don’t approach assessment as an accreditation requirement. Sure, you’re doing assessment because your accreditor requires it, but cranking out something only to keep an accreditor happy is sure to be viewed as a waste of time. Instead approach assessment as an opportunity to collect information on things you and your colleagues care about and that you want to make better decisions about. Then what you’re doing for the accreditor is summarizing and analyzing what you’ve been doing for yourselves. While a few accreditors have picky requirements that you must comply with whether you like them or not, most want you to use their standards as an opportunity to do something genuinely useful.


Keep it useful. If an assessment hasn’t yielded useful information, stop doing it and do something else. If no one’s interested in assessment results for a particular learning goal, you’ve got a clue that you’ve been assessing the wrong goal.


Make sure it’s used in helpful ways. Design processes to make sure that assessment results inform things like professional development programming, resource allocations for instructional equipment and technologies, and curriculum revisions. Make sure faculty are informed about how assessment results are used so they see its value.


Monitor your investment in assessment. Keep tabs on how much time and money each assessment is consuming…and whether what’s learned is useful enough to make that investment worthwhile. If it isn’t, change your assessment to something more cost-effective.


Be flexible. A mandate to use an assessment tool or strategy that’s inappropriate for a particular learning goal or discipline is sure to be viewed as a waste of everyone’s time. In assessment, one size definitely does not fit all.


Question anything that doesn’t make sense. If no one can give a good explanation for doing something that doesn’t make sense, stop doing it and do something more appropriate.


Start with what you have. Your college has plenty of direct and indirect evidence of student learning already on hand, from grading processes, surveys, and other sources. Squeeze information out of those sources before adding new assessments.


Think twice about blind-scoring and double-scoring student work. The costs in terms of both time and morale can be pretty steep (“I’m a professional! Why can’t they trust me to assess my own students’ work?”). Start by asking faculty to submit their own rubric ratings of their own students’ work. Only move to blind- and double-scoring if you see a big problem in their scoring of a major assessment.


Start at the end and work backwards. If your program has a capstone requirement, students should be demonstrating achievement of many key program learning goals in it. Start assessment there. If students show satisfactory achievement of the learning goals, you’re done! If you’re not satisfied with their achievement of a particular learning goal, you can drill down to other places in the curriculum that address that goal.


Help everyone learn what to do. Nothing galls me more than finding out what I did wasn’t what was wanted and has to be redone. While we all learn from experience and do things better the second time, help everyone learn what to do so that their first assessment is a useful one.


Minimize paperwork and bureaucratic layers. Faculty are already routinely assessing student learning through the grading process. What some resent is not the work of grading but the added workload of compiling, analyzing, and reporting assessment evidence from the grading process. Make this process as simple, intuitive, and useful as possible. Cull from your assessment report template anything that’s “nice to know” versus absolutely essential.


Make assessment technologies an optional tool, not a mandate. Only a tiny number of accreditors require using a particular assessment information management system. For everyone else, assessment information systems should be chosen and implemented to make everyone’s lives easier, not for the convenience of a few people like an assessment committee or a visiting accreditation team. If a system is hard to learn, creates more work, or is expensive, it will create resentment and make things worse rather than better. I recently encountered one system for which faculty had to tally and analyze their results, then enter the tallied results into the system. Um, shouldn’t an assessment system do the work of tallying and analysis for the faculty?


Be sensible about staggering assessments. If students are not achieving a key learning goal well, you’ll want to assess it frequently to see if they’re improving. But if students are achieving another learning goal really well, put it on a back burner, asking for assessment reports on it only every few years, to make sure things aren’t slipping.


Help everyone find time to talk. Lots of faculty have told me that they “get” assessment but simply can’t find time to discuss with their colleagues what and how to assess and how best to use the results. Help them carve out time on their calendars for these important conversations.


Link your assessment coordinator with your faculty teaching/learning center, not an accreditation or institutional effectiveness office. This makes clear that assessment is about understanding and improving student learning, not just a hoop to jump through to address some administrative or accreditation mandate.

Is this a rubric?

Posted on January 28, 2018 at 7:25 AM Comments comments (0)

A couple of years ago I did a literature review on rubrics and learned that there’s no consensus on what a rubric is. Some experts define rubrics very narrowly, as only analytic rubrics—the kind formatted as a grid, listing traits down the left side and performance levels across the top, with the boxes filled in. But others define rubrics more broadly, as written guides for evaluating student work that, at a minimum, list the traits you’re looking for.


But what about something like the following, which I’ve seen on plenty of assignments?


70% Responds fully to the assignment (length of paper, double-spaced, typed, covers all appropriate developmental stages)

15% Grammar (including spelling, verb conjugation, structure, agreement, voice consistency, etc.)

15% Organization


Under the broad definition of a rubric, yes, this is a rubric. It is a written guide for evaluating student work, and it lists the three traits the faculty member is looking for.


The problem is that it isn’t a good rubric. Effective assessments, including rubrics, have the following traits:


Effective assessments yield information that is useful and used. Students who earn less than 70 points for responding to the assignment have no idea where they fell short. Those who earn less than 15 points on organization have no idea why. If the professor wants to help the next class do better on organization, there’s no insight here on where this class’s organization fell short and what most needs to be improved.


Effective assessments focus on important learning goals. You wouldn’t know it from the grading criteria, but this was supposed to be an assignment on critical thinking. Students focus their time and mental energies on what they’ll be graded on, so these students will focus on following directions for the assignment, not developing their critical thinking skills. Yes, following directions is an important skill, but critical thinking is even more important.


Effective assessments are clear. Students have no idea what this professor considers an excellently organized paper, an adequately organized one, or a poorly organized one.


Effective assessments are fair. Here, because there are only three broad, ill-defined traits, the faculty member can be (unintentionally) inconsistent in grading the papers. How many points are taken off for an otherwise fine paper that’s littered with typos? For one that isn’t double-spaced?


So the debate about an assessment should be not whether it is a rubric but rather how well it meets these four traits of effective assessment practices.


If you’d like to read more about rubrics and effective assessment practices, the third edition of my book Assessing Student Learning: A Common Sense Guide will be released on February 13 and can be pre-ordered now. The Kindle version is already available through Amazon.

Assessing the right things, not the easy things

Posted on October 7, 2017 at 8:20 AM Comments comments (5)

One of the many things I’ve learned by watching Ken Burns’ series on Vietnam is that Defense Secretary Robert McNamara was a data geek. A former Ford Motor Company executive, he routinely asked for all kinds of data. Sounds great, but there were two (literally) fatal flaws with his approach to assessment.


First, McNamara asked for data on virtually anything measurable, compelling staff to spend countless hours filling binders with all kinds of metrics—too much data for anyone to absorb. And I wonder what his staff could have accomplished had they not been forced to spend so much time on data collection.


And McNamara asked for the wrong data. He wanted to track progress in winning the war, but he focused on the wrong measures: body counts, weapons captured. He apparently didn’t have a clear sense of exactly what it would mean to win this war or how to measure progress toward that end. I’m not a military scientist, but I’d bet that more important measures would have included the attitudes of Vietnam’s citizens and the capacity of the South Vietnamese government to deal with insurgents on its own.


There are three important lessons here for us. First, worthwhile assessment requires a clear goal. I often compare teaching to taking our students on a journey. Our learning goal is where we want them to be at the end of the learning experience (be it a course, program, degree, or co-curricular experience).


Second, worthwhile assessment measures track progress toward that destination. Are our students making adequate progress along their journey? Are they reaching the destination on time?


Third, assessment should be limited—just enough information to help us decide if students are reaching the destination on time and, if not, what we might do to help them on their journey. Assessment should never take so much time that it detracts from the far more important work of helping students learn.

What's a good schedule for assessing program learning outcomes?

Posted on August 26, 2017 at 8:20 AM Comments comments (11)

Chris Coleman recently asked the Accreditation in Southern Higher Education listserv ([email protected]) about schedules for assessing program learning outcomes. Should programs assess one or two learning outcomes each year, for example? Or should they assess everything once every three or four years? Here are my thoughts from my forthcoming third edition of Assessing Student Learning: A Common Sense Guide.


If a program isn’t already assessing its key program learning outcomes, it needs to assess them all, right away, in this academic year. All the regional accreditors have been expecting assessment for close to 20 years. By now they expect implemented processes with results, and with those results discussed and used. A schedule to start collecting data over the next few years—in essence, a plan to come into compliance—doesn’t demonstrate compliance.


Use assessments that yield information on several program learning outcomes. Capstone requirements (senior papers or projects, internships, etc.) are not only a great place to collect evidence of learning, but they’re also great learning experiences, letting students integrate and synthesize their learning.


Do some assessment every year. Assessment is part of the teaching-learning process, not an add-on chore to be done once every few years. Use course-embedded assessments rather than special add-on assessments; this way, faculty are already collecting assessment evidence every time the course is taught.


Keep in mind that the burden of assessment is not assessment per se but aggregating, analyzing, and reporting it. Again, if faculty are using course-embedded assessments, they’re already collecting evidence. Be sensitive to the extra work of aggregating, analyzing, and reporting. Do all you can to keep the burden of this extra work to a bare-bones minimum and make everyone’s jobs as easy as possible.


Plan to assess all key learning outcomes within two years—three at most. You wouldn’t use a bank statement from four years ago to decide if you have enough money to buy a car today! Faculty similarly shouldn’t be using evidence of student learning from four years ago to decide if student learning today is adequate. Assessments conducted just once every several years also take more time in the long run, as chances are good that faculty won’t find or remember what they did several years earlier, and they’ll need to start from scratch. This means far more time is spent planning and designing a new assessment—in essence, reinventing the wheel. Imagine trying to balance your checking account once a year rather than every month—or your students cramming for a final rather than studying over an entire term—and you can see how difficult and frustrating infrequent assessments can be, compared to those conducted routinely.


Keep timelines and schedules flexible rather than rigid, adapted to meet evolving needs. Suppose you assess students’ writing skills and they are poor. Do you really want to wait two or three years to assess them again? Disappointing outcomes call for frequent reassessment to see if planned changes are having their desired effects. Assessments that have yielded satisfactory evidence of student learning are fine to move to a back-burner, however. Put those reassessments on a staggered schedule, conducting them only once every two or three years just to make sure student learning isn’t slipping. This frees up time to focus on more pressing matters.

Assessing learning in co-curricular experiences

Posted on August 8, 2017 at 10:35 AM Comments comments (2)

Assessing student learning in co-curricular experiences can be challenging! Here are some suggestions from the (drum roll, please!) forthcoming third edition of my book Assessing Student Learning: A Common Sense Guide, to be published by Jossey-Bass on February 4, 2018. (Pre-order your copy at www.wiley.com/WileyCDA/WileyTitle/productCd-1119426936.html)


Recognize that some programs under a student affairs, student development, or student services umbrella are not co-curricular learning experiences. Giving commuting students information on available college services, for example, is not really providing a learning experience. Neither are student intervention programs that contact students at risk for poor academic performance to connect them with available services.


Focus assessment efforts on those co-curricular experiences where significant, meaningful learning is expected. Student learning may be a very minor part of what some student affairs, student development, and student services units seek to accomplish. The registrar’s office, for example, may answer students’ questions about registration but not really offer a significant program to educate students on registration procedures. And while some college security operations view educational programs on campus safety as a major component of their mission, others do not. Focus assessment time and energy on those co-curricular experiences that are large or significant enough to make a real impact on student learning.


Make sure every co-curricular experience has a clear purpose and clear goals. An excellent co-curricular experience is designed just like any other learning experience: it has a clear purpose, with one or more clear learning goals; it is designed to help students achieve those goals; and it assesses how well students have achieved those goals.


Recognize that many co-curricular experiences focus on student success as well as student learning—and assess both. Many co-curricular experiences, including orientation programs and first-year experiences, are explicitly intended to help students succeed in college: to earn passing grades, to progress on schedule, and to graduate. So it’s important to assess both student learning and student success in order to show that the value of these programs is worth the college’s investment in them.


Recognize that it’s often hard to determine definitively the impact of one co-curricular experience on student success because there may be other confounding factors. Students may successfully complete a first-year experience designed to prepare them to persist, for example, then leave because they’ve decided to pursue a career that doesn’t require a college degree.


Focus a co-curricular experience on an institutional learning goal such as interpersonal skills, analysis, professionalism, or problem solving.


Limit the number of learning goals of a co-curricular experience to perhaps just one or two.


State learning goals so they describe what students will be able to do after and as a result of the experience, not what they’ll do during the experience.


For voluntary co-curricular experiences, start but don’t end by tracking participation. Obviously if few students participate, impact is minimal no matter how much student learning takes place. So participation is an important measure. Set a rigorous but realistic target for participation, count the number of students who participate, and compare your count against your target.


Consider assessing student satisfaction, especially for voluntary experiences. Student dissatisfaction is an obvious sign that there’s a problem! But student satisfaction levels alone are insufficient assessments because they don’t tell us how well students have learned what we value.


Voluntary co-curricular experiences call for fun, engaging assessments. No one wants to take a test or write a paper to assess how well they’ve achieved a co-curricular experience’s learning goals. Group projects and presentations, role plays, team competitions, and Learning Assessment Techniques (Barkley & Major, 2016) can be more fun and engaging.


Assessments in co-curricular experiences require students to give them reasonably serious thought and effort. This can be a challenge when there's no grade to provide an incentive. Explain how the assessment will impact something students will find interesting and important.


Short co-curricular experiences call for short assessments. Brief, simple assessments such as minute papers, rating scales, and Learning Assessment Techniques can all yield a great deal of insight.


Attitudes and values can often only be assessed with indirect evidence such as rating scales, surveys, interviews, and focus groups. Reflective writing may be a useful, direct assessment strategy for some attitudes and values.


Co-curricular experiences often have learning goals such as teamwork that are assessed through processes rather than products. And processes are harder to assess than products. Direct observation (of a group discussion, for example), student self-reflection, peer assessments, and short quizzes are possible assessment strategies.

Should you collect more assessment data before using it?

Posted on June 19, 2017 at 9:30 AM Comments comments (1)

Someone on the ASSESS listserv recently asked how to advise a faculty member who wanted to collect more assessment evidence before using it to try to make improvements in what he was doing in his classes. Here's my response, based on what I learned from How to Measure Anything, a book I discussed in my last blog post.


First, we think of doing assessment to help us make decisions (generally about improving teaching and learning). But think instead of doing assessment to help us make better decisions than we would make without assessment evidence. Yes, faculty are always making informal decisions about changes to their teaching. Assessment should simply help them make somewhat better informed decisions.


Second, think about the risks of making the wrong decision. I'm going to assume, rightly or wrongly, that the professor is assessing student achievement of quantitative skills in a gen ed statistics course, and the results aren't great. There are five possible decision outcomes:

1. He decides to do nothing, and students in subsequent courses do just fine without any changes. (He was right; this was an off sample.)

2. He decides to do nothing, and students in subsequent courses continue to have, um, disappointing outcomes.

3. He changes things, and subsequent students do better because of his changes.

4. He changes things, but the changes don't help; despite his best efforts, the disappointing outcomes persist.

5. He changes things, and subsequent students do better, but not because of his changes--they're simply better prepared than this year's students.


So the risk of doing nothing is getting Outcome 2 instead of Outcome 1: Yet another class of students doesn't learn what they need to learn. The consequence is that even more students run into trouble in later classes, on the job, wherever, until the eventual decision is made to make some changes.


The risk of changing things, meanwhile, is getting Outcome 4 or 5 instead of Outcome 3: He makes changes but they don't help. The consequence here is his wasted time and, possibly, wasted money, if his college invested in something like an online statistics tutoring module or gave him some released time to work on this.


The question then becomes, "Which is the worse consequence?" Normally I'd say the first: continuing to pass or graduate students with inadequate learning. If so, it makes sense to go ahead with changes even without a lot of evidence. But if the second consequence involves a sizable investment of time or resources, then it may make sense to wait for more corroborating evidence before making that major investment.
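

To make that trade-off concrete, here is a rough back-of-the-envelope comparison, written as a short Python sketch in the spirit of the expected-value thinking in How to Measure Anything. The framing is mine, and every probability and cost figure is a made-up placeholder, so treat it as a thinking aid rather than a formula.

# Back-of-the-envelope comparison of two options facing the statistics professor:
#   Option A: make changes now, based on the evidence already in hand.
#   Option B: wait a term, collect more evidence, and change only if the problem is confirmed.
# Every number below is a hypothetical placeholder; substitute your own rough estimates.

p_problem_is_real = 0.7            # chance the disappointing results reflect a real problem, not an off sample
cost_of_changes = 20               # time/resources to rework the course (in whatever units you like)
cost_students_shortchanged = 100   # cost of one more class not learning what it needs to learn
cost_extra_data_collection = 5     # effort to gather and analyze another term of evidence

# Option A: he pays for the changes no matter what; if they turn out not to help,
# that cost is the "wasted effort" consequence described above.
expected_cost_change_now = cost_of_changes

# Option B: if the problem is real, another class is shortchanged while he waits,
# and he still ends up making the changes afterward.
expected_cost_wait = (p_problem_is_real * (cost_students_shortchanged + cost_of_changes)
                      + cost_extra_data_collection)

better_option = "change now" if expected_cost_change_now < expected_cost_wait else "wait for more evidence"
print("Expected cost of changing now:", expected_cost_change_now)
print("Expected cost of waiting:     ", round(expected_cost_wait, 1))
print("Lower expected cost:", better_option)

With these placeholder numbers, waiting looks far more "expensive" than acting, because another class of students pays the price in the meantime. If making the changes required a truly major investment, the comparison could tip the other way, which is exactly the point above.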


One final thought: Charles Blaich and Kathleen Wise wrote a paper for NILOA a few years ago on their research, in which they noted that our tradition of scholarly research does not include a culture of using research. Think of the research papers you've read--they generally conclude by suggesting how other people might use the research, by suggesting areas for further research, or both. So sometimes the argument to wait and collect more data is simply a stalling tactic by people who don't want to change.

How to assess anything without killing yourself...really!

Posted on May 30, 2017 at 12:10 AM Comments comments (41)

I stumbled across a book by Douglas Hubbard titled How to Measure Anything: Finding the Value of Intangibles in Business. Yes, I was intrigued, so I splurged on it and devoured it.


The book should really be titled How to Measure Anything Without Killing Yourself because it focuses as much on limiting assessment as on doing it. Here are some of the great ideas I came away with:

1. We are (or should be) assessing because we want to make better decisions than we would make without assessment results. If assessment results don’t help us make better decisions, they’re a waste of time and money.

2. Decisions are made with some level of uncertainty. Assessment results should reduce uncertainty but won’t eliminate it.

3. One way to judge the quality of assessment results is to think about how confident you are in them by pretending to make a money bet. Are you confident enough in the decision you’re making, based on assessment results, that you’d be willing to make a money bet that the decision is the right one? How much money would you be willing to bet?

4. Don’t try to assess everything. Focus on goals that you really need to assess and on assessments that may lead you to change what you’re doing. In other words, assessments that only confirm the status quo should go on a back burner. (I suggest assessing them every three years or so, just to make sure results aren’t slipping.)

5. Before starting a new assessment, ask how much you already know, how confident you are in what you know, and why you’re confident or not confident. Information you already have on hand, however imperfect, may be good enough. How much do you really need this new assessment?

6. Don’t reinvent the wheel. Almost anything you want to assess has already been assessed by others. Learn from them.

7. You have access to more assessment information than you might think. For fuzzy goals like attitudes and values, ask how you observe the presence or absence of the attitude or value in students and whether it leaves a trail of any kind.

8. If you know almost nothing, almost anything will tell you something. Don’t let anxiety about what could go wrong with assessment keep you from just starting to do some organized assessment.

9. Assessment results have both cost (in time as well as dollars) and value. Compare the two and make sure they’re in appropriate balance.

10. Aim for just enough results. You probably need less data than you think, and an adequate amount of new data is probably more accessible than you first thought. Compare the expected value of perfect assessment results (which are unattainable anyway), imperfect assessment results, and sample assessment results. Is the value of sample results good enough to give you confidence in making decisions?

11. Intangible does not mean immeasurable.

12. Attitudes and values are about human preferences and human choices. Preferences revealed through behaviors are more illuminating than preferences stated through rating scales, interviews, and the like.

13. Dashboards should be at-a-glance summaries. Just like your car’s dashboard, they should be mostly visual indicators such as graphs, not big tables that require study. Every item on the dashboard should be there with specific decisions in mind.

14. Assessment value is perishable. How quickly it perishes depends on how quickly our students, our curricula, and the needs of our students, employers, and region are changing.

15. Something we don’t ask often enough is whether a learning experience was worth the time students, faculty, and staff invested in it. Do students learn enough from a particular assignment or co-curricular experience to make it worth the time they spent on it? Do students learn enough from writing papers that take us 20 hours to grade to make our grading time worthwhile?

How hard should a multiple choice test be?

Posted on March 18, 2017 at 8:25 AM Comments comments (78)

My last blog post on analyzing multiple choice test results generated a good bit of feedback, mostly on the ASSESS listserv. Joan Hawthorne and a couple of other colleagues thoughtfully challenged my “50% rule”—that any question that more than 50% of your students get wrong may indicate a problem and should be reviewed carefully.


Joan pointed out that my 50% rule shouldn’t be used with tests that are so important that students should earn close to 100%. She’s absolutely right. Some things we teach—healthcare, safety—are so important that if students don’t learn them well, people could die. If you’re teaching and assessing must-know skills and concepts, you might want to look twice at any test items that more than 10% or 15% of students got wrong.


With other tests, how hard the test should be depends on its purpose. I was taught in grad school that the purpose of some tests is to separate the top students from the bottom—distinguish which students should earn an A, B, C, D, or F. If you want to maximize the spread of test scores, an average item difficulty of 50% is your best bet—in theory, you should get test scores ranging all the way from 0 to 100%. If you want each test item to do the best possible job discriminating between top and bottom students, again you’d want to aim for a 50% difficulty.
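

A quick toy illustration of that in-theory claim: under the strong simplifying assumptions that every item on a 50-item test is equally difficult and items are answered independently, the spread (standard deviation) of total scores is largest when half the students miss each item. The short Python snippet below just works out that arithmetic; the test length and difficulty levels are arbitrary.

# Toy check of the claim that 50% item difficulty maximizes score spread,
# assuming a 50-item test, all items equally difficult, answered independently.
import math

num_items = 50
for pct_wrong in (10, 30, 50, 70, 90):
    p_correct = 1 - pct_wrong / 100
    # Standard deviation of a binomial total score: sqrt(n * p * (1 - p))
    sd = math.sqrt(num_items * p_correct * (1 - p_correct))
    print(f"{pct_wrong}% of students miss each item -> score SD of about {sd:.1f} items")

The spread peaks at 50% difficulty and shrinks as items get either easier or harder, which is the theoretical appeal of that target.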


But in the real world I’ve never seen a good test with an overall 50% difficulty for several good reasons.


1. Difficult test questions are incredibly hard to write. Most college students want to get a good grade and will at least try to study for your test. It’s very hard to come up with a test question that assesses an important objective but that half of them will get wrong. Most difficult items I’ve seen either focus on minutiae, are “trick” questions on some nuanced point, or are really tests of logical reasoning skill rather than course learning objectives. In my whole life I’ve written maybe two or three difficult multiple choice questions that I’ve been proud of: ones that truly focused on important learning outcomes and didn’t require a careful, nuanced reading or logical reasoning skills. In my consulting work, I’ve seen no more than half a dozen difficult but effective items written by others. This experience has led me to suggest that “50% rule.”


2. Difficult tests are demoralizing to students, even if you “curve” the scores and even if they know in advance that the test will be difficult.


3. Difficult tests are rarely appropriate, because it’s rare for the sole or major purpose of a test to be to maximize the spread of scores. Many tests have dual purposes. There are certain fundamental learning objectives we want to make sure (almost) every student has learned, or they’re going to run into problems later on. Then there are some learning objectives that are more challenging—that only the A or maybe B students will achieve—and those test items will separate the A from B students and so on.


So, while I have great respect for those who disagree with me, I stand by my suggestion in my last blog post. Compare each item’s actual difficulty (the percent of students who answered incorrectly) against how difficult you wanted that item to be, and carefully evaluate any items that more than 50% of your students got wrong.
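

If you have item-level results in a spreadsheet, that comparison is easy to automate. Here is a minimal sketch in Python; the response data and intended-difficulty targets are invented for illustration, and "difficulty" follows the convention above (the percent of students who answered an item incorrectly).

# Flag multiple-choice items whose actual difficulty (percent answered incorrectly)
# exceeds the 50% rule of thumb, or strays well past the difficulty you intended.
# The responses and targets below are invented for illustration only.

# 1 = answered correctly, 0 = answered incorrectly; one row per student, one column per item
responses = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
]

# How difficult you *wanted* each item to be (percent of students expected to get it wrong)
intended_difficulty = [10, 40, 20, 15]

num_students = len(responses)
num_items = len(responses[0])

for item in range(num_items):
    wrong = sum(1 for student in responses if student[item] == 0)
    actual = 100 * wrong / num_students
    flag = ""
    if actual > 50:
        flag = "  <-- more than half the class missed this; review the item"
    elif actual > intended_difficulty[item] + 20:
        # a gap of 20 percentage points is an arbitrary threshold; adjust to taste
        flag = "  <-- much harder than intended; worth a second look"
    print(f"Item {item + 1}: intended {intended_difficulty[item]}% wrong, actual {actual:.0f}% wrong{flag}")

With the made-up data above, only the second item gets flagged; the output is just a starting point for the careful item-by-item review recommended here.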