Modern Grading Systems Need an Upgrade

A Scantron test form being filled out with a number 2 pencil. Scantron sheets are often used to complete testing in schools. Illustration by Ada Hallstrom.

What is proficiency? It is a high degree of competence in a skill. Proficiency-based grading, on the other hand, refers to a grading system in which a person is assessed on their understanding of a subject. This may be used instead of a traditional letter grade scale. Proficiency-based grading is often confused with what I will mainly be referencing in this article, which is testing for proficiency. Testing for proficiency, in a broad sense, gives students a test to figure out the extent of what they know, then assigns a grade based on their results.

According to Mark Durm, author of “A is not an A is not an A: A History of Grading,” the first experimentation with the letter grading scale happened at Harvard University in 1883, when the first reference to a student receiving a letter grade was recorded. Mount Holyoke, a liberal arts college in South Hadley, Massachusetts, adopted letter grading in 1897 and assigned percentages out of a 100-point scale to the five levels. (It should be noted that at this point the grading scale still included the letter E.) The college later changed the scale to include the letter F to represent “failed,” and this system became the foundation for college grading and, eventually, for high school and middle school grading as well.

One major place where many people start to find a problem with this grading scale is the mathematics classroom. Understanding in math is gauged with tests and not much else, and there is a reason for this. “I think we have a pretty narrow view of what it means to do mathematics, and the kinds of tests we give reflect that,” says Dr. Stephanie Salomone, a professor and current Chair of Mathematics at the University of Portland. “Most people don’t see math as particularly creative, and so it’s not surprising that the ways that we evaluate math learning aren’t creative either.”

Salomone thinks that there are positives to any kind of grading, particularly that it lets educators know whether a student has the skills and knowledge they will need for success after they move on to another form of education. But timed tests and standardized tests aren’t great measures of what students know, because they don’t serve all students’ needs. “Timed tests are great for students who are good test-takers, especially if the problems on an exam look a lot like the problems that students have already done. But they are not good for determining if a student can apply what they know to a new situation,” says Salomone. Traditional testing is quick, and all teachers are trained to give a test, which is a positive, explains Salomone. However, she raises these questions: “What about a kid who cannot solve a problem on test day, but gets it a week later? Should their grade be lower just because they were one week behind?”

Some subjects use the proficiency grading scale more than others. Yoshio Drescher, an English teacher at Franklin High School, says this is because subjects such as science, math, and world languages have very specific standards that build upon each other in a sequence, which makes understanding easier to gauge with a test, while subjects like English are a bit more circular. Dr. Sahnzi Moyers, a science teacher at Franklin, says that in a science classroom, where much of the subject material has to do with practical application, a four-point proficiency grading scale makes more sense than traditional points grading. Points grading, Moyers points out, is more easily applied to the memorization of facts, which can just be looked up on a phone.

Not only is using tests to gauge proficiency an inequitable way to reflect understanding of a subject, it also doesn’t take into account different types of learning. “The most equitable way to show proficiency is to have a variety of assessments,” says Drescher. “Relying on any one assessment format means that there are some students who are going to have trouble accessing that format, for a variety of different reasons.” Students with different learning styles, which include visual, auditory, writing, and kinesthetic, need different environments to accurately show what they know, and an individual’s mental and emotional health can also affect the way they learn. A student who has anxiety around test taking requires different accommodations than someone who does their best work under pressure. The only way to measure proficiency in an entirely equitable way is to offer a variety of assessment formats.

Not only is the way that student knowledge is collected inequitable and inaccessible, the environment we place students in is too. The Preliminary Scholastic Aptitude Test, or PSAT, for sophomores and juniors at Franklin High School this year was set up on campus, with the majority of juniors sitting in the gym building at desks in rows facing forward. There were strict guidelines around how the test had to be filled out as well as how much time was allowed for each section. This test-taking environment is a perfect example of an assessment format that is not beneficial to the majority of test takers because of its rigidity.

Salomone agrees with Drescher’s statement, especially when it comes to mathematics as a subject. “Having students sitting in rows, taking timed tests alone doesn’t set them up for success and it just isn’t how math is done. Math is collaborative and it is creative, and we’re not letting students experience that.” She says students should be able to choose how they best show off what they’ve learned and then demonstrate their knowledge in the method that best suits them. Salomone continues by saying, “They should be showing off what they know because they have confidence in their ability because we have shown them how to have confidence, and a timed test doesn’t do that, except for that small set of students who came with the privilege of confidence to begin with.”

In addition to not being equitable for all students, this system of testing doesn’t prioritize learning the material of the class they’re taking. More often than not, students spend their energy and time prioritizing the information that they know their teacher is going to test them on rather than being interested in and engaged with the material. A paper published in 2020 in the journal Translational Issues in Psychological Science, titled “Four Empirically Based Reasons Not to Administer Time-Limited Tests,” outlines the reasons to administer untimed tests instead of time-limited tests in educational settings. One reason is that timed tests are less reliable. In other words, the actual information gathered from a test can be skewed because the things students study in preparation aren’t necessarily the information they are able to apply to a larger context. Moyers concurs with this point: “A lot of the time [when grading is set up this way] it feels like I’m measuring my students’ ability to take a test, and that’s the biggest thing I like to steer away from.” Salomone agrees: “That learning [learning to prioritize achievement in the classroom] is about getting points, and it so is not. Learning is about curiosity and commitment to improvement, and if we’re looking only for points, we don’t get there at all.” She continues by saying, “We get students who have the wrong idea of what we want, [which is] lots and lots of points on their paper instead of growing as a person, learning something new, and applying what they know [to their real world experiences].” Even when a subject does capture the attention of the student taking it, beyond the need to get a good grade, the results are often skewed by other internal factors, such as the student’s ability to perform well in a high-pressure, high-stakes, timed environment.

While it is true that some test takers give their best results when they work silently and individually, many people operate in a completely different manner. The paper “Four Empirically Based Reasons Not to Administer Time-Limited Tests” explains that one major reason for administering tests without a time limit is that timed testing is less inclusive. Timed testing limits students with documented disabilities and can impede students who are learning English, as well as students in either group who encounter barriers in obtaining accommodations. On top of all that, Moyers points out that the stress of timed tests doesn’t help students perform better. From an evolutionary standpoint, stress was for survival, helping to keep people alert and awake, but she says that being hyper-alert while taking a test won’t make anyone perform at their best. Students who struggle with test-driven anxiety or have different learning styles can struggle with rigid testing environments. Salomone adds to this point: “If a student is an anxious test-taker, they aren’t necessarily less anxious because they get extra time. It might help, but it doesn’t mitigate the problem completely. The anxiety sticks around.” She continues by stating that the mathematics community has historically not been welcoming to people of color, disabled people, or people with intersectional identities, and says that for those students, navigating stereotype threats can have a profound impact on their performance in the classroom. “There are studies about how if you tell students you believe they can succeed, they do better on average than if you tell them what they are doing is difficult. So with students who are already struggling, by not giving them other methods to show what they know, we’re confirming what the community has already told them,” concludes Salomone.

Given this information, what solutions can we give to replace testing as a major form of grading? All the educators I talked to agree that the only way to fairly assess understanding in a given subject is to expand the options with which you are assessing students. “The most equitable way to show proficiency is to have a variety of formats, because relying on any one assessment format means that there are some students who are gonna have trouble accessing that format,” says Drescher. “When we don’t have a variety of assessment formats, then you’re always leaving somebody out.”

Salomone explains that one method she likes is a choice board. A choice board gives students the opportunity to pick the format they think will best show what they know. Students who work best with a timed test can choose that option, while another student could pick a portfolio or a more artistic option. Group projects in which students collaborate to solve a problem, self-assessments, and a piece of writing or a presentation are all choices that allow students to show the differences in their learning while still showing their teacher that they understood the material. “It takes planning and creativity on the part of the instructor,” says Salomone, “but it also gives students autonomy and they deserve that. It allows them to play to their strengths.”

I know how I feel about high-stakes timed testing as a way to give out grades and evaluate knowledge on a topic. By the end of this article I figure that you know that opinion too, so I’m going to ask some questions. Shouldn’t it be considered outdated to still be using a system of grading developed in 1877? To use a system that not only doesn’t prioritize the learning of a subject over getting an A but also doesn’t support and represent the knowledge of all students? Shouldn’t all of that be a cause for some reevaluation? The answer to all of these questions should be yes.
