How is assessment shackling schools?


Assessment is used less as a tool for improvement than as a manacle for accountability, straitjacketing teaching and shackling learning.



‘Everybody has won, and all must have prizes’

Lewis Carroll

They began running when they liked, and left off when they liked, so that it was not easy to know when the race was over. When they had been running for half an hour or so, the Dodo called out ‘The race is over!’ and they all crowded round, panting and asking, ‘But who has won?’

This question the Dodo could not answer without a great deal of thought, and sat for a long time with one finger pressed against his forehead, while the rest waited.

At last the Dodo announced: ‘Everybody has won, and all must have prizes.’

When I was doing my GCSEs, I was always a bit puzzled as to why we never got our exam papers back. It seemed to leave me blind as to how to improve. Then, when I first became a teacher and started assessing students’ work, I felt a bit like the Dodo. I’d look from the assessments to the criteria in bemusement. I’d agonise about which level to label a piece of work with. I’d gingerly assign one, but if I looked again later, I’d often come up with a completely different level. But as everyone had to move onwards and upwards, I tended to err towards optimism, as if all should get prizes for progress.

Last week I posted on formative assessment, and how it got hijacked by gimmicks. This week I want to write about how the topsy-turvy world of summative assessment straitjackets teaching and shackles learning.


What’s assessment for?

Assessment is one of the most complex issues in education. Fundamentally, there’s no agreement on what it’s for. Tim Oates at Cambridge Assessment has enumerated about 80 separate purposes that assessment data are used for. Broadly, they include diagnosing students’ strengths and weaknesses in subjects; evaluating whether students have learned what teachers have taught; judging what students are learning more broadly; challenging the brightest students whilst allowing the weakest access; deciding whether to award qualifications and distinction; reporting to parents on children’s progress; comparing departments; benchmarking between schools; inspecting the quality of teaching and achievement in schools; monitoring performance locally; publishing data to hold the government accountable for public spending… the purposes of assessment are almost innumerable; many of them are conflicting. The more purposes an assessment has, the more strain is placed on its design: GCSEs, for instance, encompass several. The crux is the tension between assessment for improvement and for accountability.

The assessment for accountability regime gets teachers fixated on exam drills, hooked on levels, ratcheting up ladders and bogged down in bureaucracy, all of which straitjacket teaching, shackle learning, and crowd out important formative assessment.

Fixated on exam drills

The high-stakes exam system and league table metrics exert inordinate pressure on teachers. This anecdote from an English teacher, which will seem familiar to many, reveals the pervasive, chronic short-termism of drilling in exam technique and timing that the system forces us into:

My year 11 pupils couldn’t have told you the name of a poet, but they could tell you that section A of paper 1 assessed your reading skills, that it involved 4 questions, that each question had 10 marks, that question 3 nearly always dealt with the writer’s techniques, that they should spend about 10 minutes reading the passage and then 10 minutes answering each question that followed, which left about 70 minutes for section B, which assessed writing and involved two tasks, both of which were worth 20 marks, but the first of which you should only spend 25 minutes on, leaving 45 for the piece of creative writing, which in any case you would write beforehand, memorise and regurgitate in the exam so you didn’t have to waste any time thinking and structuring a story from scratch.


It’s not just at GCSE that teachers are affected, but also at Key Stages 2 and 3. The 2011 Lord Bew report on assessment and accountability took 12 weeks, 4000 online responses and 50 interviews. Respondents with ‘considerable and significant concerns’ criticised National Curriculum levels as ‘too broad, inconsistent across Key Stages, not specific enough about a pupil’s attainment in any given subject and difficult to interpret, including for parents.’ The counter-intuitive consequences of the high-stakes accountability system, the report concluded, were pervasive teaching to the test, a ceiling on pupils’ attainment, and impeded progress.

Hooked on levels

It seems, however, as if we’re all addicted to grades and levels. Dylan Wiliam uses the colourful analogy that we have our students hooked on them like drug addicts, that we teachers are pushers and parents are co-dependents. Like any addiction, it absorbs attention, temporarily gives gratification, artificially inflates self-esteem and exacerbates the problem it seeks to remedy. Wiliam’s summary of the research is that constantly giving grades actually lowers achievement. Not only that, but giving comments with grades means that students don’t read the comments, as they’re too busy comparing grades. He concludes that grades inhibit learning.

APP grid

The problem with national levels is that the success descriptors are vague, abstracted, overcomplicated, overlapping, over-generalised, jargon-heavy, vacuous, and almost unintelligible. Try using the above English descriptors to reliably distinguish between pupils’ writing at levels 4, 5 and 6 in assessment focus 2: ‘understand, describe, select or retrieve information, events or ideas from texts and use quotation and reference to text’:

  • For a level 4, ‘comments supported by some generally relevant textual reference or quotation’
  • For a level 5, ‘comments generally supported by relevant textual reference or quotation, even when points made are not always accurate’
  • For a level 6, ‘commentary incorporates apt textual reference and quotation to support main ideas or argument’.

If that impenetrable bureaucratic educationalese makes your brain hurt, imagine what it’s like marking, levelling and moderating 30 essays using sub-level guesstimates across four assessment foci. Three different teachers could assign the same piece of work three different sub-levels; in fact, one teacher might on separate occasions assign three different sub-levels to one piece of work. Validity and reliability depend on the precision and specificity of assessment criteria, and are compromised when those criteria are vague and vacuous.

Numerical levels, though, are easily compared across departments and schools. Parents thirst for them, managers and teachers persist with pushing them, and students become more and more addicted to them.


Ratcheting up progress

The linearity of levels has led to the idea that a student’s grade or level should never go down. Pupils, parents and managers complain unless numerical levels show continual upward progression, regardless of reliability. But this anecdote illustrates the problem with generic levels when applied across content of varying difficulty:


In a media unit in English, the assessment question was: ‘How does Spielberg create drama and tension in the film Jaws?’ When marking these essays, there were kids who had effectively got to the top of the level grid.  They’d analysed the film perfectly. Admittedly they’d spelt some words wrong, but they weren’t being marked on that. They were being marked on how well they could analyse, and I didn’t see how they could have done it better. So what was I supposed to do? Give them a level 8, the top grade you can get at Key Stage 3?  But I knew these pupils were not level 8 students.  If I gave them a level 8 now, there would be outrage when at the next assessment on Shakespeare they went back to level 5. But if I gave them a level 5, I was at a loss as to what to put for a target.  In the end, I gave them level 5 and made up a target that wasn’t on the grid.


Like a game of snakes and ladders, no one wants to be the snake that moves pupils down.


Encumbered by bureaucracy

The classic example of how assessment and accountability bog teachers down in bureaucracy is the last government’s £50 million DCSF initiative of 2008, ‘Assessing Pupil Progress’ (APP). It diagnosed school assessment’s ‘unnecessarily bureaucratic, time-consuming and workload-intensive burdens’ and promised to ‘replace existing bureaucratic internal school assessment practices with a more streamlined and purposeful approach’. It had precisely the opposite effect, as another English teacher writes here:


The APP grids came on a double-sided A3 sheet consisting of the 14 English Assessment Focuses, or AFs, broken down by the 8 key stage three national curriculum levels – that’s 112 tiny little boxes of skills targets. I taught 90 key stage three pupils, so that meant 90 double-sided sheets of A3 to update every half term.  That was how often we updated the sheets, but there was a suggestion that it should be updated after every piece of assessed work. If that’s called reducing bureaucracy, I’d love to see what increasing it is like.


Because of the imprecision, some schools started assigning national levels to 2 decimal places, giving out levels like 4.45. But this isn’t just anecdotal. From 2008 to 2010, the teachers’ union NASUWT received increasing numbers of reports from teachers about the burdens of APP. It put this down to ‘inappropriate approaches to implementation promoting its use in ways never intended’. It was only ever supposed to be used two or three times a year, NASUWT argued. But I think it’s more likely that the entire micro-prescriptive premise of APP was fundamentally flawed in the first place.


The assessment for accountability regime is shackling learning.

The assessment for accountability regime affects learning in four ways: it imposes, obscures, weakens and demotivates. It imposes crude and unhelpful student labels: from the ages of 7 to 14, countless conversations between students compare what level they are: ‘I’m a level 5b: what are you?’ It obscures what students actually know and can do. Who knows what 5b actually means in Maths? It says nothing about how strong or weak a student’s grasp of number, algebra, statistics and geometry is. Assigning a ‘best fit’ level across all areas of a subject weakens teachers’ and parents’ understanding of pupils’ specific weaknesses or misunderstandings. Worst of all, expected national progress of 2 national levels from Years 7 to 9 is pretty demotivating for students. What if students don’t progress as ‘expected’? ‘Ellie, at the start of Year 7 you were on a level 5. By the end, you’re on a level 5. Well done.’ It’s not very motivating, is it? Assessment for accountability even fails on its own terms. As Tim Oates points out: ‘generalised reporting using levels obscures the fact that too great a proportion of pupils fail to attain elements of the curriculum that are vital for the next phase of their education’.

So why do we still use levels?

If levels are so evidently counter-productive for learning, why do so many schools still use them? In June 2012 the DfE took the decision to scrap national levels and not replace them in its letter to the expert panel on the National Curriculum Review. So why haven’t more schools taken the chance to do away with them?


In a word, accountability. Schools are locked into levels because OFSTED inspections and data dashboards measure progression in levels; league tables measure progression from baseline levels; and SLT, perversely incentivised by these metrics, enforce levels on teachers. Even if a school somehow devised its own system of progression, how would it benchmark?

The 2011 Bew report stated that ‘in the short term, we believe we need to retain levels as a means of measuring pupils’ progress and attainment. Key Stage 1 continues to be reported by levels, and therefore to measure progress robustly, Key Stage 2 results should be reported in the same way’. This sounds identical to the argument for using levels at KS3: that KS2 data is measured in levels, so there’s no other option.

With the focus on measurement, benchmarking and accountability, the logic seems inescapable. But such regressive logic reminds me of the start of Stephen Hawking’s A Brief History of Time, when he mentions a conversation Bertrand Russell had after a lecture he gave on the universe. An old lady came up to him, and said, ‘Rubbish. The world is really a flat plate on the back of a giant tortoise.’ When he asked what the tortoise was standing on, she replied: ‘Very clever, young man, very clever. But it’s turtles all the way down!’ It sometimes seems as if the education system yields to none in embracing the logic of the Dodo.


“It’s turtles all the way down”.

Mind you, it’s easier to criticise than to propose a credible alternative. But it’s also important to diagnose before you prescribe. If this post diagnoses the extent of the problem, the next tries to envisage an alternative assessment regime. If Wiliam is right and we’re addicted to levels, as my friend Harry said: we’re going to need a methadone.



About Joe Kirby

School leader, education writer, Director of Education and co-founder, Athena Learning Trust, Deputy head and co-founder, Michaela Community School, English teacher

20 Responses to How is assessment shackling schools?

  1. Reblogged this on Scenes From The Battleground and commented:
    Another one that I can’t really avoid reblogging.

  2. Michael Tidd says:

    While a lot of what you say here is true, it is important to note that the levels were originally only intended to be used at the end of each key stage, not after each and every piece of work. Again it’s a problem of asking too much from one simple system.

  3. webby101 says:

    Have you read any Royce Sadler? His thinking has similarities to yours. I saw him at the ACER conference in Sydney last year. He compared a technical specification which was absolutely explicit about exact sizes in millimetres etc. with an English rubric. He certainly convinced me that the English rubric, with its highly interpretable adjectives and verbs, could be equally applied to both a Year 8 and a university level essay. He is an education professor in Queensland; a system where there are no external exams at all, just coursework, so he uses these concepts to argue for rigorous moderation regimes. He argues that experts (i.e. the teachers in this case) have a notion of quality (guild knowledge) that can only be approximated by criteria and so rubrics should be seen as the servant and not the master. He also advocates the “hidden” criterion; something you don’t tell the students about in advance, something you don’t even predict that you will use, but a criterion that you bring out during marking to differentiate between the formulaic and more genuine responses. He thinks this is not a problem if you are upfront about it.

  4. … this seems mainly aimed at English assessment and it does ring true. By way of contrast however, we (at my school) use levels in maths throughout KS3 and it seems to be very effective. I think the old adage that “maths is (almost uniquely) differentiated largely by task rather than outcome” means that, for us, NC levels are objective, accurate and helpful measures. Agreed? Please let’s not follow the Govian mantra that if it doesn’t work properly in one subject then every subject has to be changed – a staggering example of lazy and impatient thinking which is doing immeasurable damage to parts of our education system.
