What Would A “Good” Test Look Like?


The decision by a group of Seattle teachers to boycott a standardized test this winter could spill out to other cities as a decade of frustration over testing simmers, writes Greg Toppo in USA Today:

“Teachers at Garfield High School, Seattle’s largest, last December said they’d pass on giving the latest Measures of Academic Progress, or MAP test, a diagnostic tool that also screens students for remedial or gifted classes. Given several times a year, it’s also used indirectly to rate teachers, but Garfield teachers say it’s not aligned to the state curriculum and produces ‘meaningless’ results. They have until Feb. 22 to administer the test or face unpaid suspension.

Since then, teachers at two more Seattle schools have said they’ll sit out the test, with the approval of leading academics and both major U.S. teachers unions.

Elsewhere, the Chicago Teachers Union this week launched a campaign ‘in support of local and nationwide efforts to eliminate standardized non-state mandated tests’ from public schools.

In Providence, R.I., a group of high school students on Wednesday led a protest urging lawmakers to get rid of the high-stakes New England Common Assessment Program (NECAP) as part of new graduation requirements. And in Portland, Ore., a student group is encouraging classmates to opt out of the standardized Oregon Assessment of Knowledge and Skills (OAKS). The group hopes to persuade at least 5% of students to stay home, triggering an automatic ‘In Need of Improvement’ designation at each school.” (Read more here.)

My view on this: Tests are necessary. We need to have an objective way to measure students’ progress. But when the tests are perceived by teachers and students as “meaningless,” as unconnected to what they’re teaching and learning, we have a problem.

How can we design “good” tests, tests that feel fair and relevant? Any ideas?


31 Responses to “What Would A “Good” Test Look Like?”

  1. Tests are not, in fact, necessary. The only reason to have an “objective” way to measure students’ progress is that you distrust your child’s teacher. It’s like saying that you distrust your local mechanic, so every year, you’ll require everyone in the state who owns a car to go to a nationally “certified” mechanic from another state to have their car inspected. And as for objective, I think this woman says it better than I ever could:
    “deliberate social and political choices regarding public education have created whatever racial and socioeconomic differences in achievement on standardized tests exists—commonly referred to as the “achievement gap.” The so-called achievement gap and its deficit-oriented stance suggests that because black students—and Latino, Native American, and poor students—do less well on standardized tests than do their wealthy, white counterparts, something is wrong with them.

    But remember, Michelle Rhee said, “Our kids are not broken.” And I agree with her. So, if our kids are not broken, why is our focus on closing the so-called “achievement gap” between students’ test scores? If our kids are not broken, why is education reform work fueled by the idea that we should raise the test scores of black, Latino, and Native American children to the test score levels of white children?

    I’m not advocating or supporting low test scores; our analysis of the problem and the solution is wrong. Our education system has created our condition. Our system has determined that this gap would exist, and we have imagined that our students are the problem. Educators now imagine our students and the communities we serve to be deficient, as problems to be solved, as errors to be corrected.

    Test score mania has turned our schools into test prep factories where the study of languages and music and art—those elements that humanize people—those things are sacrificed, and we pressure students to catch up so that the gap is closed. Gloria Ladson-Billings recently said, “Catching up is made nearly impossible by our structural inequalities.””



  2. Larry Spring says:

    The problem with the current wave of testing is that the need for reliable measures (needed for accountability) has outweighed the need for valid measures. A test is only valid if the test-taker is trying to the best of their ability. Many current measures risk measuring something very different from what they intend: a combined notion of a student’s knowledge mixed with their willingness to put forth effort on the test. I wonder if anyone is thinking about the effort that students are (or are not) putting into this endeavor and what that means for teachers, schools, and accountability systems.

  3. Don Davis says:

    Courtney, superb response! The testing industry is motivated by profit and accepted because of misconceptions. Many of those who chant that we should catch up to Finland fail to note that Finland does not in fact test children to death.

    The worst bit of all this is that the students whom such testing should supposedly help are harmed most. It’s low-SES schools that are under the most pressure to teach to the test. Higher-level thinking skills and creative projects are put on the back burner in a push to check off boxes on a scope and sequence chart. Teachers are limited in their ability to adapt to individual learners’ needs and are instead forced to implement some out-of-the-box learning system with ‘fidelity’.

    But the focus on testing and placing blame on teachers does accomplish something. It gives everyone a convenient scapegoat that mega-testing corporations can exploit for financial profit. Films such as “Waiting for Superman” are blatant, misleading, misinformed propaganda that fuels the public’s distrust of teachers, all to the financial benefit of misguided (if not blatantly unethical) testing practices and a flight to charter schools.

    I would disagree with one small point – race is certainly a factor. But conflating race and SES obfuscates the real problem, which almost no one wants to admit or tackle. As Krashen points out – we don’t have an education problem, we have a poverty problem.

  4. I don’t claim to have written that. It was written by Dr. Camika Royal, who I think is someone to watch on this topic. If you read the rest of her essay, I think it’s very moving, very powerful, and very true.

  5. Matt says:

    I am a teacher and I support testing. Many of the judgments about student learning in schools are entirely subjective. Everyone involved in making the judgment about learning success wants to see success! The student, parent, teacher, and administrator all want to feel like everything is going great, and it is entirely normal and OK for that to be the case.

    Put another way: If I (as a teacher) get to pick the content, design the instruction, give the instruction, create the assessment and grade the assessment then I can pretty much make any student look like they are or are not learning. I love looking at results of externally created, scored and administered tests because they provide a control/check against my own bias and limitations.

    If all students ace my class but can’t pass a state-level exam, then I need to look critically at my own practice and behavior, and I am OK with that. In fact, it is one of the major ways I get better!

    • Don Davis says:

      The problem is that state-mandated tests aren’t divinely inspired creations, nor do they always measure what they purport to measure. How many math tests are really tests not of mathematical reasoning but of verbal reasoning? Moreover, how can a test created by native speakers (ignorant of second language acquisition) and intended for native speakers meaningfully capture what an English learner in their second year in the US has learned? Also, how many of these tests are influenced more by external factors (e.g. the students’ home life and parental vocabulary) than by classroom instruction?
      Given that affluent students begin to differentiate themselves earliest from less affluent peers in their reading fluency (rather than math) – and that this results from access to well-stocked libraries and summer enrichment – and that this determines the trajectory of ‘learning’ that is measured later – we are very far from an objective assessment of students’ learning.
      Moreover, the important question is not only whether children should be tested – but also how often and how many. Not every child needs to be tested to acquire a statistically significant sample. Further, the question is whether tests should be the be-all and end-all of education.

  6. Bob says:

    We can design tests that predict job performance, grades and health even when controlling for environment, are highly reliable, and are unbiased against poor or minority students.

    Oh, wait a minute. We already have those tests. They’re called “IQ tests.”

    • Don Davis says:

      Um… whoa. You are quite mistaken. That is very inaccurate. IQ tests are best at measuring one thing – ability to take an IQ test. Tests such as the SAT (which is basically an intelligence test accepted by Mensa for admission) correlate with parents’ wealth much more than anything else.
      There are marked discrepancies in performance by ethnicity and SES – does anyone really believe that poor people just happen to be less intelligent?
      These tests are measures of behaviors – behaviors that learners can or cannot appropriately perform based on their previous interactional histories, which are largely influenced by their verbal communities. Isn’t it a little disconcerting that rich white people seem to score markedly better on a test designed by overwhelmingly rich white people to measure intelligence? It’s self-justifying hegemony, nothing more.

      • Bob says:

        When one compares siblings in the same family, thus controlling for environment, the higher-IQ sibling generally does better in life: his or her test performance correlates with job performance, academic achievement, etc.

        Yes, poor people are on average less intelligent. This is the case in any society where cognitively demanding jobs are higher paying, and there is some degree of social mobility. It is probable that the causal arrow points more from low intelligence to poverty than vice versa because IQ correlates more with early childhood SES than with adulthood SES.

        IQ also correlates very well with physiological indices, not just cultural ones. These include brain myelinization, reaction time, and brain size. They even predict innovation rates over time! http://www.gwern.net/docs/2012-woodley.pdf

        If you want a review of IQ’s correlates, you could do worse than “Intelligence: Knowns and unknowns,” a report issued in 1996 by the APA.

  7. Don Davis says:

    “It is probable that the causal arrow points more from low intelligence to poverty than vice versa because IQ correlates more with early childhood SES than with adulthood SES.”

    This is thinly veiled social Darwinism coated with the trappings of research. I would like to know which ref this is from. Much of it seems as if it could have been rehashed from the Herrnstein book.
    There is very little income mobility in the US – fewer than 4% of those born to the lowest quantile progress to the highest quantile in their lives.

    As far as it being genetic because it correlates more significantly to early childhood SES – Early childhood is also the period in which most language skills are acquired, which establishes the student’s learning trajectory. We are not born with language – we learn it through our interactions – this is determined by environment not genetics.

    How often does one hear “correlation does not imply causation”? As far as the correlates and earnings – this only highlights that IQ test performance correlates significantly to whatever other factors determine “success” (academic or professional). Perhaps unsurprisingly, researchers and educators have frequently identified that success relates significantly to students’ exposure to certain verbal communities of practice.

    Also, though many previously believed that “IQ” was static, there is increasing discussion that it is more malleable than previously believed.

  8. Don Davis says:

    “IQ” is not fixed at birth. It is dependent on experience and environmental enrichment.

    Wolfe, P., & Brandt, R. (1998). What do we know from brain research? Educational Leadership, 56(3), 8–13.

  9. Steve Straight says:

    I teach at a community college in Connecticut. Students are generally placed by Accuplacer, mandated by the system. If students don’t like their placement, we offer a challenge test, a written response to a short (600-750 words) argument prompt. Our rubric includes the skills we want to see in college-level writing: ability to summarize accurately, to use quotes and ideas smoothly, to show structure, transition, etc. In the past few years, we have been offering sample challenge tests to area high schools. Using the rubric, teachers (and students) can score their own tests and see where they would place (college level or developmental). Both teachers and students are reporting that these challenge tests are far more “meaningful” because the score shows where they stand in terms of college readiness, not some abstract score. The tests can be used as a wake-up call, as a diagnostic, and as first drafts of a longer essay.

  10. Bob says:

    No, IQ correlates more significantly to ADULTHOOD SES. I apologize for the confusion. (E.g., Gottfredson, 2008, “Of what value is intelligence?”)

    IQ does not need to be wholly genetic in order for my statement to be true. It merely needs to be largely genetic. Even the lowest estimates of narrow-sense heritability hover around 25 percent. Imagine if, say, nutrition explained 25 percent of the variation in IQ! 25 percent is quite large. And, in fact, most estimates are 40 to 80 percent.

    “As far as the correlates and earnings – this only highlights that IQ test performance correlates significantly to whatever other factors determine “success” (academic or professional).”

    The correlation exists among siblings, thus controlling for many environmental factors. IQ also correlates significantly with physiological indices.

    Where do you get the number that only 4 percent of people from the lowest quartile progress to the highest quartile?

    I assume that by “the Herrnstein book” you mean The Bell Curve. TBC is fairly mainstream in psychometrics. See this excellent review. http://www.polymath-systems.com/intel/essayrev/bellcrev.html

    Did you read the two papers I mentioned?

  11. Don Davis says:

    As far as the 4% from bottom to top – this has it: http://www.fas.org/sgp/crs/misc/R42400.pdf but it’s not the ref I’m building off of.

    As far as the review: “Recent evidence suggests that only about 50% of the factors accounting for intelligence are genetic.”

    Fifty percent is a huge amount and does not justify the sort of calcified class system we have at all – much less does it provide psychometric justification for the systemic inequities in education and earnings in the US.

    “IQ correlates more significantly to ADULTHOOD SES.” Fair enough, I fully accept that “IQ”, whatever that measure represents, correlates significantly to SES. My contention is that it does not represent any sort of innate ability. This is also supported by the review you provided.

    Just to be sure we aren’t talking past each other –
    I accept that “IQ” (whatever this in fact measures) does in fact correlate significantly with SES and educational attainment. I even accept an argument of causality i.e. that whatever factors lead one to score high on an “IQ” test facilitate greater educational attainment and lifetime earnings. However, I (and many others a bit more critical – this goes back to Kuhn noting the need to step out of established paradigms) do not believe that it (whatever is measured by “IQ” tests) presents an objective measure of innate ability.

  12. Bob says:

    I think that IQ is a measure of two largely different factors: ‘g’ and ‘s’. It is ‘g’ that is mostly innate (I know of an unpublished meta-analysis that claims 85 percent heritability), largely unaffected by education or ‘culture’, and predictive of physiological indices and cross-temporal innovation rates (and most ethnic differences). ‘s’, or ‘specific factors’, are specific bits of knowledge that are often highly culturally loaded and affected by educational factors (they are what the Flynn effect has boosted). The trick is to find a test that measures ‘s’ as little as possible. So is IQ innate ability? Yes and no.

  13. Bob says:

    What do you think IQ tests measure?

    • Don Davis says:

      “predictive of physiological indices and cross-temporal innovation rates ”
      Which physiological indices?
      I’m not sure how cross-temporal innovation rates vary in contrast to innovation rates. “Innovation” is a trainable behavior. (But the argument hasn’t been that these behaviors aren’t trainable, but rather the extent to which they are trainable and concomitantly to what extent these measures might be used to justify societal inequities in income and educational attainment.)
      85% heritability is a much larger number than 50%. It would be important to know how social factors were controlled for. Is there a statistically large pool of children taken away from their parents and raised in higher (or lower) SES environments?

      Most objectively, I would say that IQ tests are a measure of how well someone might be expected to perform on an IQ test. Moving beyond that – are these non-verbal IQ tests? More specifically, one might say that IQ tests measure the ability of an individual to display behaviors which are believed to be representative of “intelligence” as determined by the people who constructed the test, which is largely reflective of larger socio-historical trends and verbal communities.

      It would seem that the underlying disagreement is the extent to which ‘s’ can be eliminated in IQ tests and the extent to which what the ‘g’ measures is trainable or inherited.

      • Don Davis says:

        “When one compares siblings in the same family, thus controlling for environment, the higher-IQ sibling generally does better in life”

        As far as siblings as a control for everything else – every child will have a different interactional history than every other. This is trivially true. It becomes exceedingly difficult to say that one child didn’t get extra attention (or extra apples), or that both had babysitters who engaged in the same behaviors.

        My apologies – “brain myelinization, reaction time, and brain size” these are the physical indices to which you were referring.

  14. Bob says:

    This is correct. Microenvironments appear to have a reasonably large effect. However, one can control for the most obvious confounds by comparing siblings.

    I do not know how the meta-analysis was conducted. However, given that ‘g’ explains only 50 percent of the variance in IQ, a heritability for g of 85% and a heritability for IQ of 50% are not mutually exclusive.

    • Don says:

      Ahh. Fair enough. This still represents significant variability in response to the environment. The Woodley article is quite clever. However, I don’t think the case is made that the intelligence being measured is truly genotypic nor a measure of static innate ability – but it is quite interesting. The most recent ‘dysgenesis’ does interestingly enough parallel growing income disparities in the US.

  15. Bob says:

    Despite the fact that a significant percentage of the poor (though not all of them!) are innately incapable of significantly rising in SES, I think that income redistribution and liberal economic policies in general are still justified.

  16. Don says:

    Given the difficulties in eliminating the ‘s’ in testing and the other noted variability, it seems quite extreme to claim that a majority of the poor are genetically incapable of ‘being smarter’.

  17. Don Davis says:

    I should now bracket my assumptions and indicate why I believe I will not be convinced of the point being made.
    1. Women earn on average less than men.
    2. Disproportionately many Hispanics and African Americans are in poverty. “White” people represent the largest percentage of top earners.

    I will not believe that women, Hispanics, and African Americans are by nature less intelligent than “white” men. Moreover, given the evidence provided and reviewed in this blog and elsewhere, it is far from certain to what extent “intelligence” is a static, innate trait.

    Oh, and this 2007 study finds no statistically significant relationship between wealth and IQ:

    Zagorsky, J. L. (2007). Do you have to be smart to be rich? The impact of IQ on wealth, income and financial distress. Intelligence, 35(5), 489–501. doi:10.1016/j.intell.2007.02.003

  18. Bob says:

    Interesting article; thanks.

    “I will not believe that women, Hispanics, and African Americans are by nature less intelligent than ‘white’ men.”

    Why not? Women and minorities may suffer depressed income for reasons unrelated to IQ, yet still have innately lower IQs.

  19. Bob says:

    On further reading, the article controls for too many variables. Many of the variables that it controls for are highly g-loaded themselves. It would not be surprising if g affects wealth only through mediators, not directly.

  20. John says:

    People and the communities that they build are, by their very nature, non-standard. Trying to apply a broad, standard evaluation system across the board is inherently flawed. The only way to accurately gauge the performance of students is to localize the specific test material, meaning the test would have to be devised by an educator who is familiar with the curriculum of the individual institution being tested. From there, some kind of arbitrary standard would have to be applied to the individual tests by some centralized body. The single test for all students is as much a critique of the test itself as it is of the students being evaluated. From this perspective, it is true that the tests are meaningless.

  21. Jordan says:

    I think a good test would be a test that tests your logic, your problem solving skills, all that sort of stuff! It needs to challenge your brain! Make you think! That’s what I think a good test would be.
