5 What Do We Know: Assessment of Teaching and Learning
Loren Jones; Shannon Kane; Sarah Morris; and Margaret Peterson
Before We Read
Before we read, reflect on how assessments have impacted your learning. What aspects of assessments did you find helpful? Challenging? What qualities do you think make an assessment effective? Take a few minutes to skim the chapter headings and subheadings and consider what you already know about assessment. What are the different types and purposes of assessment? How can assessments be used to inform and improve teaching and learning? What are some ethical considerations when designing and implementing assessments? Finally, consider the different stakeholders involved in assessment (teachers, students, parents, administrators, district leaders). What are their perspectives on assessment? By engaging with these questions before diving into the chapter, you will activate your prior knowledge and be better prepared to understand the complex concepts related to assessment.
Critical Questions For Consideration
As you read, consider these essential questions: What are some ethical considerations when designing and implementing assessments? How can educators work to limit assessment bias? And consider the different stakeholders involved in assessment (teachers, students, parents, administrators). What are their perspectives on assessment?
Defining Assessment and Its Challenges
Assessments in public schools have long been a topic of discussion and debate, with stakeholders ranging from policymakers to educators, parents, and students expressing various opinions and concerns. Recently, the media has played a significant role in shaping public discourse around assessments and their impact on the education system. Consider, for example, the Education Week headline “Two Decades of Progress, Nearly Gone: National Math, Reading Scores Hit Historic Lows” (EdWeek, October 24, 2022), or NPR’s headline “U.S. Reading and Math Scores Drop to Lowest Level in Decades.” These headlines, though, create a distorted public perception of teaching and learning in today’s classrooms. One might read them and believe our school systems, administrators, and teachers are “not good enough,” but the truth is much more complicated. High-stakes assessments rarely capture the day-to-day functioning of the classroom; they are generally imposed upon teachers rather than created by them. As such, neither the assessments nor the headlines accurately capture the progress and performance of our students. When reading headlines related to schools and assessments, it is essential to be a critical consumer of media messages.
Consider that the headline examples above, and many others, label schools as “failing” based on a single metric: a standardized test score. The focus is on specific numbers rather than on a nuanced understanding of the factors that influence those numbers, such as students’ socioeconomic backgrounds, English proficiency, or special needs. These complexities are rarely addressed. Remember, too, that standardized tests are snapshots in time, not measures of long-term progress. A school might dip slightly in scores one year while steadily improving overall, yet the media rarely considers these trends or highlights schools that are making significant gains year after year. Additionally, changes in test scores can have multiple explanations. Media portrayals often paint a simplistic picture of progress or decline, but a score dip could be due to a temporary disruption or a shift in curriculum focus, not necessarily a failing school system.
Media outlets might also select data points that fit a pre-existing narrative. For instance, they might focus on a single grade level’s decline or highlight achievement gaps without acknowledging progress in other areas. Cherry-picking data and excluding information that discredits the narrative being told paints an unfair and inaccurate picture of performance. Finally, as also discussed in chapter one, media portrayals often pit schools against each other or blame teachers. These actions can create a hostile environment and distract from the real work of improving education. Focusing on collaboration and solutions, rather than negativity and finger-pointing, would be far more productive.
Assessment data should be an important tool for educators, but it can be, and often is, misinterpreted by the media. The inaccurate or misinterpreted use of data by the media has historically led to a distorted public perception of the teaching and learning going on in classrooms. Typically, what is highlighted in the news is external to the day-to-day functioning of a school or classroom; it is generally imposed upon teachers rather than created by them. The assessments discussed by mainstream media are not used to inform or shape classroom instruction. Instead, the type of data the media focuses on and splashes across headlines is often tied to the academic status of schools and districts, regardless of accuracy. We’ll discuss some kinds of assessments and their uses in the following sections.
Norm-referenced versus Criterion-referenced Assessments
Teachers and educational researchers use various assessment tools to gauge student learning and inform instructional decisions. However, these tools differ in their fundamental purpose and how they interpret performance. Two of the main approaches are norm-referenced assessments and criterion-referenced assessments.
Norm-Referenced Assessments:
- Focus: Compare a student’s score to the performance of a specific group (norm group).
- Interpretation: Scores indicate percentile ranks or standardized scores (e.g., z-scores), revealing how students stand relative to their peers (see the sketch following this list).
- Examples: DIBELS, ITBS, SAT, ACT, state achievement tests
- Strengths: Useful for ranking students, identifying gifted students, and making placement decisions.
- Weaknesses: Do not directly measure mastery of specific learning objectives, are susceptible to test bias and anxiety, and offer limited information about individual strengths and weaknesses. For some norm-referenced assessments, teachers do not receive scores until the summer or early fall of the following year, which means the data cannot be used to inform instruction. Additionally, the gap between administering the assessment and receiving scores fails to account for any student progression or regression in the interim.
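Because percentile ranks and z-scores can feel abstract, here is a minimal sketch, in Python and with entirely hypothetical scores, of how a norm-referenced interpretation locates one student relative to a norm group. The variable names and numbers are illustrative assumptions, not part of any real test’s scoring procedure.

```python
# A minimal sketch, with hypothetical data, of norm-referenced scoring:
# a raw score is interpreted relative to a norm group, not a criterion.
from statistics import mean, stdev

norm_group_scores = [52, 61, 58, 70, 66, 74, 49, 63, 68, 55]  # hypothetical
student_score = 66

# z-score: how many standard deviations the student sits from the group mean.
z = (student_score - mean(norm_group_scores)) / stdev(norm_group_scores)

# Percentile rank: share of the norm group scoring below this student.
percentile = 100 * sum(s < student_score for s in norm_group_scores) / len(norm_group_scores)

print(f"z-score: {z:.2f}, percentile rank: {percentile:.0f}")
# Note the limitation named above: neither number says anything about
# mastery of specific objectives; it only ranks the student among peers.
```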
Criterion-Referenced Assessments:
- Focus: Compare a student’s score to a predetermined standard or criterion (learning objective).
- Interpretation: Scores indicate mastery or non-mastery of specific skills or knowledge domains (see the sketch following this list).
- Examples: Rubrics, performance assessments, portfolios, exit tickets, and quizzes/tests aligned to learning objectives.
- Strengths: Provide direct information about individual learning progress, guide instructional planning, inform targeted interventions, and promote mastery learning.
- Weaknesses: Less useful for comparing students to peers, require careful development of clear criteria, and may be subjective.
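By contrast, here is a companion sketch, again with hypothetical student names and a hypothetical cut score, showing how a criterion-referenced interpretation compares each student to a fixed standard rather than to peers.

```python
# A minimal sketch, with hypothetical data, of criterion-referenced scoring:
# each score is judged against a predetermined criterion (cut score).
CUT_SCORE = 80  # hypothetical mastery criterion: 80% of objective-aligned items

scores = {"Ava": 85, "Ben": 72, "Cal": 90}  # hypothetical percent-correct scores

for student, score in scores.items():
    status = "mastery" if score >= CUT_SCORE else "not yet mastered; plan targeted support"
    print(f"{student}: {score}% -> {status}")
# Unlike a percentile rank, this judgment does not depend on how peers
# performed; every student could demonstrate mastery, or none could.
```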
Choosing the Right Tool
Selecting the ‘right’ approach depends on the assessment’s specific goals and intended purpose(s). Norm-referenced assessments offer valuable insights for large-scale comparisons and rankings, while criterion-referenced assessments provide targeted feedback for personalized learning. Issues typically arise not from the assessment type but from the misuse or misrepresentation of the information gathered. When used correctly, both types of assessments can provide different pictures of student performance that support instructional planning.
Formative and Summative Assessments
Working to create learning environments that foster growth and understanding is a primary goal of all educators. Assessments can provide valuable insights into student learning and support creating a robust learning environment. However, successfully navigating the world of assessments means understanding key concepts, such as formative and summative, and how each plays a distinct role in effective teaching.
Formative assessments are informal, ongoing checks for understanding. Their main goal is to identify areas of strength and weakness in students’ grasp of the material. This allows you to adapt your teaching strategies quickly, providing targeted support and differentiated instruction to address individual needs. Examples include exit tickets, quick quizzes, observations, and peer feedback. The key is providing timely, specific, and constructive feedback that helps students progress and ensures everyone stays on track.

Summative assessments measure student learning at the culmination of a unit, grade, or course. They aim to measure student achievement against predetermined learning objectives and standards. Examples include exams, projects, presentations, and standardized tests. While summative assessments can provide valuable data on overall learning outcomes, they occur after the learning has already happened, limiting the opportunity to influence the learning process directly.
The key to success lies in integrating both forms of assessment. Formative assessments inform your teaching, allowing you to individualize instruction, differentiate learning activities, and provide targeted support. This targeted support ultimately leads to improved performance on summative assessments, providing a comprehensive picture of student learning at the end of the learning cycle. Here are some practical takeaways related to formative and summative assessment:
- Embed formative assessments regularly: Use various strategies to gather ongoing information about student learning.
- Focus on feedback: Provide clear, actionable feedback to help students better understand both their strengths and areas for improvement.
- Use formative data to inform instruction: Adapt your teaching based on student needs identified through formative assessments.
- Balance formative and summative assessments: Utilize both to create a holistic picture of student learning and growth.
Assessments should not simply be about collecting data or assigning grades; they are about using data to inform and improve the learning process for every student. By strategically employing formative and summative assessments, you can create a dynamic learning environment where feedback, support, and growth go hand in hand.
| Formative Assessment | Feature | Summative Assessment |
|---|---|---|
| Identify areas of understanding | Purpose | Measure overall achievement |
| Throughout instruction | Timing | End of unit/course/program |
| Informal (e.g., discussions, exit tickets) | Formality | Formal (e.g., exams, projects) |
| Low or no stakes | Grading | High stakes (often contributes to grades) |
| Immediate and targeted | Feedback | May be delayed or general |
Connecting Assessment and Instruction
Ideally, assessment and instruction should not be considered separately but instead viewed as two interconnected concepts that support student learning. Though assessment typically conjures images of tests and grades, it should instead be thought of as a means to inform and guide instruction that supports student learning and mastery. Unlike summative assessments that gauge final achievement, formative assessments provide continuous feedback. Through observations, discussions, quizzes, and self-reflections, formative assessments provide teachers with insights into students’ strengths, weaknesses, and misconceptions. Ideally, this data informs instructional decisions, allowing teachers to:
- Differentiate instruction by providing targeted support for struggling students and challenging activities for advanced learners.
- Modify lesson plans to incorporate different teaching strategies, catering to diverse learning styles and preferences.
- Identify and clarify misunderstandings before they solidify, preventing knowledge gaps from widening.
- Encourage students to reflect on their learning.
Effective assessment goes beyond simply collecting data; it’s about providing meaningful feedback. Meaningful feedback needs to be timely, specific, and actionable. Feedback can:
- Focus on particular strengths, weaknesses, and areas for improvement rather than simply assigning grades.
- Provide concrete suggestions for students to apply in their learning.
- Be specific to individual student needs.
When students are actively involved in the assessment process, they can take greater ownership in their learning. This can be achieved through:
- Empowering students to reflect on their learning progress and set personal goals.
- Creating space for students to give and receive peer feedback, which fosters critical thinking and communication skills.
- Utilizing collaborative assessments where students work together.
The connection between assessment and instruction should not be a one-time event but a continuous cycle. Teachers use assessment data to inform instruction and refine their assessment practices. This iterative process ensures that assessments remain aligned with learning objectives and provide meaningful feedback for both teachers and students. Assessment and instruction should not be considered independent entities but two sides of the same coin. Teachers create an effective learning environment that informs instruction and refines assessment by leveraging formative assessment, providing effective feedback, and fostering student involvement.
Assessment Driven Instruction
In order for assessments to act as a guide for planning and teaching, teachers must first be clear about, and then plan for, what students will actually learn in a lesson or unit. Then, teachers must be clear about what an assessment of that learning actually measures. Assessment begins with planning for teaching and learning, meaning that during planning, a clear “criterion for success” needs to be specifically named. Knowing the criterion for success means that a teacher can envision what mastery of a particular skill looks like and the ways a learner can illustrate that mastery. Too often, planning falls short of naming specifically enough how a learner can illustrate mastery of a concept, which can lead to assessments that do a poor job of measuring the learning actually sought. For example, suppose a lesson objective states that by the end of the lesson, a student should be able to skip count by 3s to 99, and within the lesson students practice skip counting orally from 3 to 99. The assessment for that lesson should match the learning sought, so if students are instead asked to fill in the multiples of 3 on a chart, the assessment is not an exact match for the learning in the lesson. A written assessment of skip counting also assesses number formation and the ability to place numbers in a chart. This mismatch between what and how students learned something and how they are asked to show that learning keeps the assessment from providing a good snapshot of student learning.
In the late 1990s, researchers Grant Wiggins and Jay McTighe called this idea of planning for teaching by first thinking about learning “Backwards Design.” They described this kind of planning as a way to foreground what students would know or be able to do at the end of a learning segment. They argued that the “output” of learning, and teachers’ assessments of it, were far more important than planning focused on what the teacher would do in a lesson. This may not sound like a groundbreaking perspective, but it is an important shift to be able to foreground learning (the output) rather than teaching (the input). For teachers, this means that before planning begins, they must “think a great deal, first, about the specific learnings sought, and the evidence of such learnings” (Bowen, 2017). That evidence of learning is the basis for thinking about how a teacher can assess student learning in ways that provide meaningful information. This shift to foregrounding learning also means that we plan for learning “before thinking about what we, as the teacher, will do or provide in teaching and learning activities” (Bowen, 2017). For teachers in the field, then, there is a need to become more specific about what real evidence of learning looks like, and about what is actually being assessed, in order to create classrooms that are assessment driven and where student learning can be seen.
Overview of Assessment – Common Assessments in Classrooms
Considering the importance of assessment and the many ways it is used to both guide and gauge progress in teaching and learning, we’ll turn now to common classroom assessments and the types of learning they are designed to assess. We’ve divided these assessments into two categories. The first encompasses assessment of knowledge and skills related to content; the second encompasses assessment of learner attitudes, beliefs, and self-awareness in relation to a content area or topic.
| Assessing Content-Related Knowledge and Skills | Purposes | Common Classroom Examples |
|---|---|---|
| Assessing Prior Knowledge, Recall, and Understanding: What do students already know or believe about this topic? | Guiding planning, teaching, pacing, and the grouping of students | KWL, entrance ticket, mind map around the focus topic, soliciting oral responses to questions |
| Assessing Skill in Analysis and Critical Thinking: Do students understand the related parts, concepts, and issues of the content they are learning? | Guiding teaching; formative assessment of content; formative assessment of depth or complexity of understanding | Schematic drawings, process maps, webs and extended mind maps, short writing about the topic, outlines |
| Assessing Skill in Synthesis and Creative Thinking: Do students understand, and can they apply, knowledge of the content in their own ways? | Guiding teaching; formative assessment of content; formative assessment of depth or complexity of understanding; formative assessment of mastery | Reenactments, synthesis and summary writing, essays, creating a play illustrating the content, creating illustrations of the content, extending a story, comparing two ideas or situations |
| Assessing Skill in Problem Solving: Do students have skills to identify the types of problems they are solving? Do students have multiple algorithms or strategies for solving the problem? Can students solve problems by applying solutions in novel ways? | Guiding teaching; formative assessment of skill development; assessment of content; formative assessment of depth and complexity of understanding; practice applying strategies | Student think-alouds, solving problems and showing work, collaborative work to solve problems and explain the thinking behind problem solving |
| Assessing Skill in Application and Performance: Can students apply skills in new settings? Can they use a number of skills together to accomplish a task, even when the skills were learned in a different setting? | Guiding long-term planning; summative assessment of content; summative assessment of application of new knowledge and skills | Performance-based tasks including writing, demonstrating, performing, reenactment, explanations, posters, reports, presentations |
| Assessing Learner Attitudes, Beliefs, Values, and Self-Awareness | Purposes | Common Classroom Examples |
|---|---|---|
| Assessing Students’ Awareness of Their Attitudes and Values | Aids students in seeing what they must “unlearn” to begin developing new knowledge and skills; surfaces beliefs, biases, and attitudes that will make it more difficult for students to learn new skills and develop new knowledge | Interest surveys and inventories, KWL charts, reflective writing, class discussion, conferences, ratings |
| Assessing Students’ Self-Awareness as Learners | Helps students know themselves as learners; builds student skill in gauging their successes as learners | Student surveys, reflections on learning, learning target discussions, evaluative discussion of models, discussing and co-creating criteria |
| Assessing Course-Related Learning and Study Skills, Strategies, and Behaviors | Helps students develop strategies to strengthen learning of specific skills and the application of skills and knowledge | Student surveys and inventories, reflective writing, discussion and explication of learning processes |
Validity and Reliability of Assessment
Validity and reliability are two key concepts related to assessment. These terms indicate the quality and accuracy of measurement tools, ensuring that the data collected is meaningful, trustworthy, and helpful in making informed decisions. Understanding these concepts is fundamental for creating assessments that accurately measure their intended purpose.
Validity refers to how accurately a conclusion or measurement reflects what is being assessed. In other words, does the assessment assess the construct or concept it claims to assess? For example, if a literacy test claims to measure students’ comprehension skills, its validity would be questioned if it primarily tests phonics. Validity can be assessed through various methods, such as criterion-related and construct validity. Criterion-related validity examines the relationship between the assessment scores and some external criterion, such as performance in real-world situations. Construct validity evaluates whether the assessment accurately measures the underlying theoretical construct it is designed to measure.
Reliability is the extent to which a set of results or interpretations can be generalized over time, across tasks, and among interpreters of assessment information. A reliable assessment tool produces consistent results when repeatedly administered under the same conditions. It should yield similar scores for individuals with the same trait or ability level. Reliability can be determined through different methods such as test-retest reliability and parallel forms reliability. Test-retest reliability involves administering the same assessment to a group on different occasions and looking for consistency of scores across administrations. Parallel forms reliability involves administering two equivalent forms of the same assessment to the same group of individuals and examining the consistency of scores between the two forms.
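To illustrate, here is a minimal sketch, using invented scores, of how test-retest reliability is often estimated in practice: as the correlation between two administrations of the same assessment to the same students. The data are assumptions for demonstration only.

```python
# A minimal sketch, with hypothetical data, of estimating test-retest
# reliability as the Pearson correlation between two administrations
# of the same assessment to the same group of students.
from statistics import correlation  # available in Python 3.10+

first_administration = [78, 85, 62, 90, 71]   # hypothetical scores, occasion 1
second_administration = [80, 83, 65, 92, 70]  # same students, occasion 2

r = correlation(first_administration, second_administration)
print(f"Test-retest reliability estimate: r = {r:.2f}")
# Values near 1.0 indicate consistent results across administrations;
# values near 0 suggest the assessment does not yield stable scores.
```

The same correlational logic extends to criterion-related validity, where assessment scores are correlated with an external criterion, such as later performance in real-world situations.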
Valid and reliable assessments are essential for accurately evaluating students’ knowledge, skills, and abilities in education. They inform teachers about students’ strengths and weaknesses, guide instructional planning, and facilitate evidence-based decision-making. Valid and reliable assessments also play a crucial role in research, ensuring that the data collected is credible and can be used to draw meaningful conclusions and advance knowledge in various fields. By upholding high standards of validity and reliability, practitioners can enhance the credibility, utility, and effectiveness of assessment tools in their classrooms.
Some Common Assessments
In this section, you will learn about some assessment tools that are commonly used in K-12 classrooms, some of which you may have completed yourself as a student. Some are used nationally across many K-12 districts, such as DIBELS and WIDA ACCESS testing; others are specific to states and are used, as required by federal law, as large-scale assessments to measure and track educational achievement in specific subject areas, such as reading or math.
DIBELS
First developed by researchers at the University of Oregon in the 1970s, DIBELS assessments were initially designed to help show beginning phonemic awareness in kindergarten through 3rd grade. Additional subtests developed over the decades now focus on assessing many foundational reading skills in kindergarten through 8th grade students. In one subtest, students are asked to identify the sounds heard at the beginning of common words; in another, to sound out and decode nonsense words. All DIBELS assessments, however, are designed to “detect risk,” meaning to monitor the ongoing progress of readers and identify gaps in foundational literacy skills. DIBELS has been redesigned to assess all of the foundational skills needed for reading development identified by the National Reading Panel in 2000 (Learning Point, 2004). DIBELS seeks to assess students’ ongoing proficiency with the foundational skills of phonemic awareness, phonics, fluency, vocabulary development, and comprehension.
Though this assessment is still widely used, it is important to note that there have been many critiques of DIBELS over the years, including that the tests are designed to assess extremely specific reading skills that exist within the complicated processes of learning to read. For example, one DIBELS subtest asks students to decode made-up syllables, which critics say removes the context of language and the motivation of a reader, adding to miscues and to over-identification of reading delays or need for remediation.
WIDA Access Testing
WIDA stands for World-class Instructional Design and Assessment. Developed by researchers at the University of Wisconsin–Madison, the WIDA ACCESS test is widely used in the US to assess English language proficiency levels for students identified as English Language Learners (ELLs). The assessment is used to help discern several key pieces of language learning, including listening, speaking, reading, and writing. Scores are used to help identify the kinds of instruction that an ELL will need to increase their English proficiency across all of the language domains.
ACCESS testing is designed to show the progress a student has made. It is given at the beginning of a school year and again near the end of the school year to provide important information about how, and in which language domains, a student is progressing.
ACCESS scores, which range from 1 (“entering”) to 6 (“reaching”), are used to determine when and if a student can “exit” mandated English for Speakers of Other Languages (ESOL) services. The designation as an English Language Learner, most often decided by scores on a WIDA ACCESS test, means that a student is legally entitled to receive ESOL services and cannot be denied those services until they are able to show they have gained the necessary skills for proficiency in English. It should be noted, though, that several states use a lower score to allow students to stop receiving ESOL services.
Advanced Placement Testing
Developed at the end of World War II, the AP test and the introduction of college-level coursework to high school classrooms were part of an effort to better and more quickly prepare students for college and career. In 1955, the College Board took over the administration of AP courses and the AP tests themselves. From the original 11 topics (Mathematics, Physics, Chemistry, Biology, English Composition, Literature, French, German, Spanish, and Latin), the College Board now offers AP tests in 34 subject areas.
Many colleges and universities recognize AP coursework and will award college credit for scores above a 3 or 4 on AP tests for commensurate coursework. In some cases these credits may not count toward degree completion, and at several elite colleges, scores do not count at all; for example, Brown University, Dartmouth, Williams, and Caltech do not award credit for any AP test. As a result, some high schools have begun to phase out, or have completely abandoned, AP coursework. In 2018, eight prestigious high schools (Georgetown Day, Holton-Arms, Landon, Maret, National Cathedral, Potomac, St. Albans, and Sidwell Friends) announced they would completely phase out AP courses by 2022. Criticism of the Advanced Placement program has long centered on claims that the tests are racially biased and that the coursework does not allow students and teachers to go in depth into topics in ways a college course might.
Standardized Tests – A Lay of the Land – Looking Forward
All states in the United States receive federal funds for education, and along with the Department of Defense Education Activity (DoDEA), the Bureau of Indian Education, and the District of Columbia, they are required to gauge student achievement through standardized tests of student learning. These tests are standardized in that they use a common set of questions and test agreed-upon skills. They are, to varying degrees, aligned with state curricula and assess students at several points during elementary school and in multiple subjects during secondary public education.
There have been many efforts over the years to make education more a part of a federal standardized system, and laws such as the Elementary and Secondary Education Act (ESEA) have provided systems of rewards and/or penalties to encourage states to support federal requirements. For a period of time beginning in 2010, the United States Department of Education worked to create a national set of standards, called College and Career Readiness Standards, that also required a standardized test known by its acronym, PARCC (the Partnership for Assessment of Readiness for College and Careers). States could opt into the consortium, and at its most popular, 22 states had signed on to use this common assessment and its associated curriculum.
PARCC is now in use only in the District of Columbia, the Bureau of Indian Education, and DoDEA schools, and there does not seem to be political will to build more standardization across states for assessments. A quick look at the standardized tests each state uses to gauge student achievement shows that many states have created their own state-made and state-administered tests, including the Colorado Measures of Academic Success (CMAS) and the Florida Assessment of Student Thinking (FAST). But many states, including California, Connecticut, and Nevada, have again signed on to a consortium of states using the Smarter Balanced test, which aligns with the older College and Career Readiness Standards and was developed with input from both teachers and higher education on test questions and content.
Standardized tests, while meant to show student learning, have been used in many states as a way of trying to assess teaching, with some states connecting teacher evaluations and pay to student outcomes on standardized tests. These models are flawed in multiple ways, as standardized test scores are positively correlated with family levels of education and socioeconomic status. Again, assessments and tests, even large-scale standardized tests, are at best a snapshot of student learning and can tell us very little about instruction or students’ future achievement.
Conclusion
As we have discussed in earlier sections, tests and test scores often hold outsized influence over the lives of the people inside classrooms. Teachers may be paid or promoted based on students’ standardized test scores. Students may be retained at a specific grade level, or denied admission to a college, based on a test score. Because we know that tests used to gauge student learning often don’t provide definitive information on student knowledge or learning, and that scores are often skewed by race, socioeconomic status, and family educational attainment, teachers must become critical consumers of tests and assessments. This means questioning test items, researching who profits from tests and testing materials, and being skeptical of results that disadvantage students. Still, assessing is one of the most important skills any teacher can develop. Questioning as ongoing assessment of learning, quizzes, observations, and meaningful qualitative evaluation of student work all provide important data needed to guide both teaching and learning, to identify risks to student learning, and to help remediate and enrich it.
Meet A Scholar
Dr. Sonia Nieto (1943- ), a leading scholar in multicultural education and advocate for educational equity, has played a crucial role in shaping critical conversations around educational assessments. Her work has helped draw attention to standardized tests’ limitations and potential biases for students from diverse backgrounds. While not directly involved in developing assessments, she has profoundly influenced how educators approach and interpret these tools, emphasizing the importance of equity and fairness in assessments and advocating for approaches that accurately reflect individual strengths and needs.
The ‘one-size-fits-all’ approach to traditional assessments creates the potential to perpetuate educational inequities by disadvantaging students from marginalized backgrounds due to inherent biases. These biases can stem from cultural assumptions embedded in the language, content, and scoring criteria, leading to inaccurate representations of students’ authentic abilities. Similar to instructional strategies, assessment methods should consider students’ diverse learning styles and cultural backgrounds and allow students to demonstrate their understanding in meaningful ways. This overall approach to teaching and learning emphasizes the importance of:
- Utilizing authentic materials: Assessments should reflect students’ cultural backgrounds and lived experiences, making them more relatable and engaging.
- Incorporating multiple assessment methods: Employing varying assessment tools, including performance-based tasks, portfolios, and self-reflection, to provide a well-rounded perspective beyond test scores of student learning.
- Considering cultural context: Recognizing the influence of cultural background on learning styles and communication preferences is crucial for interpreting assessment results accurately and avoiding misinterpretations.
Sonia Nieto has played a significant role in promoting inclusive and equitable education, including assessments. Her work continues to inspire educators and inform policymakers to create instructional systems that accurately reflect the diverse potential of all students. To learn more about Dr. Nieto’s work on linguistic diversity, visit Chapter 5.
Critical Discussion Questions
- What are the unintended consequences of current assessment practices, and how can we mitigate them?
- How can we effectively assess the complex skills and dispositions valued in 21st-century learning?
- How can we create assessment systems that are culturally responsive and address the diverse needs of all learners?
- How can we involve students in the assessment process to promote self-reflection, ownership of learning, and a growth mindset?
Reflection, Metacognition, and Alternative Assessments
One concern with assessment, in many of its school forms, is authenticity. As we have discussed in this chapter, assessment can serve many purposes: it can be used to classify and divide students into groups; it can be used to drive instruction; it can be used to shape curriculum; it can be used to generate revenue for textbook and educational resource companies. Assessment can be standardized and aligned with state or national curricula.
But our students aren’t all the same. They learn differently. What are we doing to assess those differences? And what are multiple ways students can SHOW their learning? In what ways can we use assessment to provide learners with more information about how they learn, so they can make choices and have agency in their learning?
Some scholars and teachers focus on metacognition and self-assessment as ways for students to be in charge of their own learning. Metacognition is, on a basic level, “thinking about thinking,” or learning to understand one’s own thought processes (Flavell, 1979). More specifically, metacognition requires learners to reflect and self-regulate in order to understand what they have learned and how they have learned it (Darling-Hammond, Austin, Cheung, and Martin, 2003). Reflection, as John Dewey claimed in 1933, is what generates learning, more so than experience alone. Metacognition and reflection can bring awareness and intentionality to learning and assessment of learning. As Taczak and Robertson (2017) argue, “when cognition and metacognition are accessed together through reflection, students are able to assess themselves,” and this assists transfer of skills and knowledge to other settings and fields (p. 212).
Metacognitive assessments, then, allow learners to review their work, reflect on their progress toward goals, and predict their learning outcomes based on their performance and understanding. They also allow learners to self-regulate by setting goals and plans for future learning. Portfolios are one example of metacognitive assessment, one that is holistic, student-centered, and developed over time. In this section, we will discuss a few types of alternative assessments, ones that include students’ own thinking, self-assessment, and reflection.
Alternative Assessments
Portfolios. Often used in writing classrooms, portfolios are purposeful collections of work that students curate along with reflection on and analysis of their progress. Portfolios are often designed in collaboration between and among students and teachers, can be individualized for each learner, and show student growth and development toward learning goals over time. In the writing classroom, portfolios emphasize revision and the writing process over the final product. Students must review, categorize, analyze, organize, and plan how to show a reader not only what they learned, but how they learned it (Reynolds and Rice, 2014). In some situations, students are asked to document revisions, and most writing portfolios incorporate some element of reflective writing in order to describe, narrate, and explain the texts within the portfolio, considering a student’s work as evidence of their growth (Yancey, 1998).
Accompanied by other means of assessment, such as those recommended by the National Council of Teachers of English (1993), including “narrative evaluations, written comments, dialogue journals, and conferences,” portfolios can be individualized to each learner and involve the student as a participant in their own learning and assessment, thereby helping to develop agency. Alongside portfolios as assessment, some teachers work with their students to design assignments and develop guidelines for assessment to counter typical classroom power structures (Rief, 1992). Doing so challenges the notion that teachers control the criteria for defining “good” work.
Student Designed Rubrics. We discussed rubrics (and critiques of them) earlier in this chapter. Now, let’s consider rubrics in which students have a say in design, evaluative criteria, and values. Grounded in constructivist principles, the act of designing a rubric as a class dwells in process (rather than product) and relies on students analyzing the conventions of the genres they read and are expected to write, and then translating those conventions into their own words, aiding transfer to their own writing. By helping to identify and define assessment criteria, students gain agency and participate in co-constructed understandings that are made more powerful within the discourse community.
When combined with other strategies for writing, reflection, and revision within a discourse community, and used formatively over time, student-designed rubrics not only assist in reflection, self-assessment, and deep reading; they promote transfer. In the foundational text “Writing as a Mode of Learning,” Janet Emig (1977) suggests that writing provides “self-feedback”: when students learn to write in a “familiar and available medium,” they are better able to give themselves feedback (p. 125). The process of designing a rubric together can make visible the kind of “self-feedback” unique to writing and bring it into a collective sense of response, creating a shared understanding of process, genre convention, and mode. In addition, the rubric becomes its own genre that students can more easily access as readers in other contexts.
Put together, different forms of alternative assessments resist what Peter Elbow (1993) calls “forms of judgment” in classrooms, or practices that seek to rank, rather than truly evaluate and value the work that students do.
Anti-racist Assessment Practices
In recent years, and in the wake of highly visible and racially motivated violence against people of color, school systems and higher education institutions have begun engaging in open discussion about how pedagogical practices feed systemic racism. While this is a developing conversation, there are some currently agreed-upon principles and practices recommended to help teachers ground assessment in anti-racist practice. In this section, we will introduce a few of those practices, with the expectation that readers will use these ideas as a springboard for further investigation.
First, and perhaps most importantly, researchers suggest that teachers begin to implement anti-racist pedagogy by becoming critically reflective themselves. This means interrogating one’s own biases, developing an awareness of students’ individualities and backgrounds, and adjusting curriculum to convey diverse perspectives and meet learning needs. Assessments are one aspect of the curriculum that will need to be adjusted.
As we have discussed, grades often measure behaviors and compliance instead of actual learning. In redesigning assessments to foster anti-racist principles, we can make a few adjustments to instruction that make a difference in transparency, learning, and student agency.
Teach what you’re trying to measure. First, we can align our assessments with course goals and objectives, so that we teach what we are trying to test. For example, if we’re asking students to write to show understanding of content in a social studies course, but we grade grammar and mechanics heavily without teaching grammar and mechanics, a student may have a clear understanding of the content but be penalized for errors in the technical aspects of writing.
Give students practice and choice. Second, when we have taught something, we can facilitate transfer and ensure student learning by giving them direct, relevant practice and feedback on that practice. Allowing choice about how to show mastery is critical here, too, as it increases student investment in the learning. For example, if we want students to be able to write thesis statements, we must follow direct instruction in identifying and writing thesis statements with independent practice in the student’s own draft, on a topic of their choice. To further show mastery, students might select their own “best” thesis to share from several papers at the end of the assessment period.
Allow for revision and reflection. Third, when assessments have a reflective component, students can identify and discuss how well they have mastered learning goals, with their own work as a kind of evidence. Given the opportunity to show not just what they know, but how they know they know it, students can better identify, define, and make an argument for their own learning. Likewise, the opportunity to revise assignments allows students to strategically apply new ideas and learning over time, which emphasizes learning as a process rather than assessment as a product.
Two specific approaches to antiracist assessment are ungrading and contract grading.
Ungrading is a feedback-centered approach that counters the traditional grading system, which tends to rank, sort, and categorize students and their work. While there are many different approaches to ungrading, most include self-assessment, collaboration, reflection, and revision, and they are used across the curriculum in a variety of ways to foster justice and equity in an unjust system (Blum, 2020). Contract grading requires students to deeply self-assess using agreed-upon measures for mastery and allows students to choose projects and assignments that best meet their own learning needs and goals within the context of a course. Some educators, like Asao Inoue (2015, 2019), argue that labor-based grading contracts are a tangible way to resist institutional racism by upending embedded power structures.
APPLICATION
Case Study:
Multilingual learners across the nation (42 states are part of the WIDA consortium) complete the WIDA ACCESS assessment annually to monitor their growth in English across the four domains – speaking, writing, reading, listening. The images below capture two ACCESS Online Sample Items from Grades 4-5 in the domain of Reading for a selection titled “Let’s Go Shopping.”
- Review the two assessment prompts – the questions, answer selections, and corresponding images.
- Based on the information presented in this chapter, would you identify this as an effective assessment?
- What is being assessed here? Does that align with the purpose of the assessment?
- What can a teacher learn from the results of this assessment?
- What might be some consequences of this assessment, both short term and long term?
Post-Reading Activities on Assessment:
- Design Your Ideal Assessment: Reflect briefly on the chapter’s key points on assessment methods and their strengths and weaknesses. Think of a learning scenario where you need to assess someone’s understanding of a topic and create your own assessment method for this scenario. Consider sharing your designed assessment with a partner. Be sure to consider factors like:
- What learning objectives are you assessing?
- What type of knowledge or skills do you want to measure?
- What format would be engaging and effective (e.g., project, presentation, game)?
- How would you ensure the assessment is fair and unbiased?
- Learning Reflection: Think back to a recent learning experience (e.g., a class, workshop, online course) and reflect on the assessment methods used in that experience. Did they effectively measure your learning? What were the strengths and weaknesses of the assessments? How could they have been improved? Consider how the different assessment methods discussed in the chapter could be applied to your chosen learning experience. Which methods would be most appropriate and beneficial and why?
- Assessment in the Real World: Think about a field or activity you are interested in, such as sports, music, or business. Identify and consider the different types of assessments used in your chosen context. For example, in sports, there might be performance evaluations, skill tests, and game statistics. Write a short reflection on the following questions: How do these assessments contribute to the overall goals of the activity or field? What are the potential benefits and drawbacks of these assessments? Are there alternative assessment methods that could be considered?
Glossary
Backwards Design: A way to design units and lessons that begins with the desired results, based on learning goals or content or literacy standards, and then gathers evidence of learning through performance- or project-oriented assessments.
Criterion-Related Validity: The extent to which an assessment is related to a purported outcome. For example, SAT and ACT exams claim validity because the scores correlate with, or predict, college GPA.
Construct Validity: How accurately a test measures the concept the test is designed to measure. Capacity measures like the Cattell Culture Fair Intelligence Test claim construct validity in that the test creators claim the test measures cognitive abilities free from covariates like sociocultural or environmental factors.
Figures
Sonia Nieto by Natalie Frank is shared with a Creative Commons Attribution 4.0 International License
References
Blum, S. (2020). Ungrading: Why rating students undermines learning (and what to do instead). West Virginia University Press.
Bowen, R. S. (2017). Understanding by Design. Vanderbilt University Center for Teaching. Retrieved [May 29, 2024] from https://cft.vanderbilt.edu/understanding-by-design/.
Darling-Hammond, L., Austin, K., Cheung, M., & Martin, D. (2003). Thinking about thinking: Metacognition. The learning classroom: Theory into practice. Stanford University School of Education.
Dewey, J. (1933). How we think: A restatement of the relation of reflective thinking to the educative process (2nd ed.). Heath.
Elbow, P. (1993). Ranking, Evaluating, and Liking: Sorting out Three Forms of Judgment. College English, 55(2), 187–206. https://doi.org/10.2307/378503.
Emig, J. (1977). Writing as a mode of learning. College Composition and Communication, 28(2), 122–128.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906–911.
Inoue, A. (2015). Antiracist Writing Assessment Ecologies: Teaching and Assessing Writing for a Socially Just Future. The WAC Clearinghouse; Parlor Press. https://doi.org/10.37514/PER-B.2015.0698
Inoue, A. (2019). Labor-based grading contracts: Building equity and inclusion in the compassionate writing classroom. Fort Collins, CO: WAC Clearinghouse.
Learning Point Associates. (2004). A Closer Look at the Five Components of Effective Reading Instruction: A Review of Scientifically Based Reading Research for Teachers. ERIC. Retrieved June 14, 2024 from https://files.eric.ed.gov/fulltext/ED512569.pdf.
NCTE Executive Committee. (1993). Resolution on Grading Student Writing. National Council of Teachers of English. https://ncte.org/statement/gradingstudentwrit/.
Reynolds, N., & Rice, R. (2014). Portfolio teaching: A guide for instructors (3rd ed.). Bedford/St. Martin’s.
Rief, L. (1992). Seeking diversity: Language arts with adolescents. Heinemann.
Taczak, K., & Robertson, L. (2017). Metacognition and the reflective writing practitioner: An integrated knowledge approach. In P. Portanova, J. M. Rifenburg, & D. Roen (Eds.), Contemporary perspectives on cognition and writing (pp. 211–229). The WAC Clearinghouse, University Press of Colorado.
University of Oregon. (2024). DIBELS: Dynamic Indicators of Basic Early Literacy Skills. https://dibels.uoregon.edu/about-dibels.
Yancey, K.B. (1998). Reflection in the writing classroom. Utah State University Press.