What Makes 6-Mark Questions Hard to Mark — for Humans and AI Alike
Six-mark questions are among the most demanding marking tasks in a GCSE or A level paper. The difficulty is not length (a student might write only three or four sentences) but the fact that they require a fundamentally different marking approach from the rest of the paper.
On a one-mark or two-mark question, the marker is looking for specific information: a named fact, a correct calculation, a defined term. If it is there, the mark is awarded. If it is not, it is not. The decision is, in principle, binary.
On a six-mark question, that approach does not work. The student is expected to produce an extended response that demonstrates understanding of a concept, a process, or an argument. The quality of that response — how well it is organised, how fully it develops its ideas, how precisely it uses subject-specific language — determines the mark. Two students can both mention the same facts and earn different marks because one of them has explained those facts more clearly, applied them more accurately, or structured the argument more coherently.
This is levels-of-response (LOR) marking. It is the standard approach for extended questions across GCSE and A level, and it is the hardest type of marking for AI to get right.
GradeDrive's LOR mode was built specifically for this challenge. This post explains how it works and what it means for teachers who mark extended writing at scale.
What Levels-of-Response Marking Actually Involves
In point-based marking, the mark scheme lists specific points and the marker checks whether each point is present. In LOR marking, the mark scheme describes what a response at each level looks like — and the marker decides which level the response belongs to before assigning a mark within that level.
A typical GCSE six-mark scheme might have three levels:
Level 1 (1–2 marks): The student makes simple, isolated points with limited or no explanation. Ideas are not linked. Subject-specific language is minimal or used incorrectly.
Level 2 (3–4 marks): The student makes relevant points and provides some explanation. There is some logical structure and some correct use of subject-specific language, but the response is incomplete or partially developed.
Level 3 (5–6 marks): The student provides a well-structured, detailed response that demonstrates clear understanding. Points are logically connected and fully explained. Subject-specific language is used accurately and consistently.
The marker reads the whole response and decides: does this response read like a Level 1, Level 2, or Level 3 answer? Once the level is assigned, the mark within the level (the lower or higher mark in that band) is determined by how well the response fits the level description: whether it only just crosses the threshold into the level or sits near its ceiling.
This process requires holistic judgement. It is not possible to correctly apply LOR criteria by searching for keywords or counting marking points. The quality of the argument, the coherence of the structure, and the precision of the language all matter and must be assessed together.
How GradeDrive's LOR Mode Works
GradeDrive's LOR mode processes extended writing questions through a different pipeline from point-based questions. Rather than checking for the presence of specific marking points, it evaluates the response against the band descriptors in the mark scheme — the same criteria a human examiner would use.
Reading the scheme. When a teacher uploads a paper containing LOR questions, GradeDrive identifies them during the calibration pass and reads the band descriptors for each. The descriptors specify what a response at each level must demonstrate: the content requirements, the quality indicators (explanation, analysis, argument), and the language requirements. These become the criteria against which the student's response is assessed.
Evaluating the response. GradeDrive assesses the student's response against each level's criteria simultaneously. It is not looking for the presence of keywords. It is evaluating whether the response demonstrates the qualities described at each level: whether the explanation is developed, whether the points are linked, whether the language is accurate and appropriate. The assessment produces a level assignment with a confidence indicator.
Assigning the mark within the level. Once a level is assigned, GradeDrive determines the mark within the band by evaluating how fully the response meets the level descriptors. A response that clearly meets all of Level 2's criteria but falls short of Level 3 on most indicators earns 4 marks. A response that meets Level 2 on some indicators but is incomplete on others earns 3 marks. The distinction is made by reference to the specific language of the band descriptors.
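The two-step decision described above (level first, then mark within the band) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GradeDrive's implementation: the `BANDS` table mirrors the three-level GCSE scheme from earlier in the post, while the function name `mark_within_band` and the list-of-booleans input are hypothetical simplifications. Judging whether each indicator is actually met is the hard part and is not shown here.

```python
# Three-level GCSE scheme from the post: level -> (lowest mark, highest mark).
BANDS = {1: (1, 2), 2: (3, 4), 3: (5, 6)}


def mark_within_band(level: int, indicators_met: list[bool]) -> int:
    """Return the mark for a response already placed at `level`.

    Assumption for this sketch: the top mark in the band is awarded only
    when every indicator in the level descriptor is met; otherwise the
    bottom mark in the band is awarded.
    """
    if level not in BANDS:
        raise ValueError(f"unknown level: {level}")
    if not indicators_met:
        raise ValueError("at least one indicator is required")
    low, high = BANDS[level]
    return high if all(indicators_met) else low


# A Level 2 response that meets every Level 2 indicator.
full_level_2 = mark_within_band(2, [True, True, True])
# A Level 2 response that is incomplete on one indicator.
partial_level_2 = mark_within_band(2, [True, False, True])
print(full_level_2, partial_level_2)
```

Keeping the level decision separate from the mark decision means a reviewer can disagree with one without reopening the other.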
Teacher review. Every LOR mark is presented to the teacher in the review interface with the level assigned, the mark within the level, and the specific reasoning referenced to the band descriptors. The teacher sees the same information they would use to make their own marking decision, making it fast to confirm the AI's call or override it where they disagree.
Why Point-Based AI Fails on Extended Writing
When a general-purpose AI is asked to mark an extended writing response, it typically defaults to a form of point-matching — identifying whether the response contains the facts or ideas listed in the mark scheme's indicative content section. This produces a mark that reflects the quantity of relevant content present, not the quality of the argument.
The problem is that LOR marking is explicitly not about quantity. Two students can produce responses of identical length covering identical facts. If one of them has explained the causal relationships between those facts and the other has listed them without explanation, they belong in different levels. A point-matching approach assigns them the same mark.
This is the category error that undermines most AI marking on extended writing: treating LOR questions as if they were point-based questions, with more points equalling a higher mark. GradeDrive's LOR mode exists to correct this.
Essay Questions at A Level
At A level, extended writing questions can be worth 12, 16, or 20 marks and may run to several sides of A4. The LOR marking approach scales to these questions, but the complexity of the mark scheme increases significantly.
A level LOR schemes typically include:
- Band descriptors that are more specific about the quality of analysis and evaluation expected at each level, not just the presence of correct knowledge
- Indicative content sections that list the kinds of points a high-quality response might contain — without requiring all of them, and without treating their presence as sufficient for a high mark
- Assessment objective weighting, where marks are allocated across different skills (knowledge, analysis, evaluation, communication) and the balance between these affects the final mark
GradeDrive's LOR mode handles A level extended writing using the same band-descriptor approach as GCSE, extended to accommodate the additional complexity of multi-objective assessment schemes. For questions where the scheme separates marks across assessment objectives (AO1, AO2, AO3), GradeDrive processes each objective separately and combines the marks according to the scheme's weighting.
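As a rough sketch of what marking each objective separately and then combining might look like, the Python below marks each assessment objective out of its own maximum and sums the results. The `ObjectiveMark` type, the `combine_objectives` function, and the 6/6/8 split of a 20-mark essay are all invented for illustration. Real schemes define their own allocations, and GradeDrive's internal logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveMark:
    objective: str  # e.g. "AO1"
    awarded: int    # mark awarded for this objective
    maximum: int    # marks available for this objective in the scheme


def combine_objectives(marks: list[ObjectiveMark]) -> tuple[int, int]:
    """Return (total awarded, total available) across all objectives."""
    for m in marks:
        # Reject marks that fall outside what the scheme allows for the objective.
        if not 0 <= m.awarded <= m.maximum:
            raise ValueError(f"{m.objective}: {m.awarded} is outside 0..{m.maximum}")
    return sum(m.awarded for m in marks), sum(m.maximum for m in marks)


# Hypothetical 20-mark essay; the split across objectives is an assumption.
essay = [
    ObjectiveMark("AO1", awarded=5, maximum=6),
    ObjectiveMark("AO2", awarded=4, maximum=6),
    ObjectiveMark("AO3", awarded=5, maximum=8),
]
total, available = combine_objectives(essay)
print(f"{total}/{available}")
```

Because each objective carries its own maximum, the weighting is expressed by the scheme's mark allocation rather than by a separate multiplier.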
What This Means in Practice: A Marking Example
Consider a GCSE Biology question: "Explain how the body responds to a fall in blood glucose concentration." [6 marks]
The mark scheme has three levels, as described above. Level 3 requires the student to explain the mechanism fully: glucagon released by the pancreas, glycogen broken down to glucose in the liver (glycogenolysis), glucose released into the bloodstream, blood glucose returns to normal. The response should explain the causal chain, not just name the components.
A point-matching AI might award Level 3 marks to a response that mentions glucagon, the pancreas, glycogen, the liver, and blood glucose (five relevant terms) even if the response consists of disconnected sentences with no causal explanation: "Glucagon is released. The pancreas does this. Glycogen is in the liver. The liver breaks it down. Blood glucose goes up."
GradeDrive's LOR mode evaluates whether the response demonstrates understanding of the mechanism, not just familiarity with the vocabulary. The response above — five relevant terms, no causal links, no explanation of how or why — belongs at Level 1 or the bottom of Level 2. A response that explains the trigger, the hormone, the site of action, the mechanism, and the feedback outcome in connected prose belongs at Level 3.
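The difference in mark is significant: 2 or 3 marks versus 5 or 6 on a six-mark question. On a 60-mark paper, a misjudgement of up to 4 marks on a single LOR question shifts the total by almost 7%, so getting these questions right has a substantial effect on the accuracy of the total mark.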
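To see why counting terms cannot separate these two kinds of answer, here is a deliberately naive keyword counter in Python. It illustrates the failure mode only and is not how any particular product works. The keyword list is an assumption based on the terms in the example, the first response is the disconnected one quoted above, and the second is a connected answer written for this sketch rather than taken from a mark scheme.

```python
# Assumed keyword list for the blood glucose question.
KEYWORDS = ["glucagon", "pancreas", "glycogen", "liver", "blood glucose"]


def count_keywords(response: str) -> int:
    """Count how many keywords appear at least once (case-insensitive)."""
    text = response.lower()
    return sum(1 for keyword in KEYWORDS if keyword in text)


# The disconnected response quoted in the example.
listed = (
    "Glucagon is released. The pancreas does this. Glycogen is in the liver. "
    "The liver breaks it down. Blood glucose goes up."
)

# An illustrative connected response that explains the causal chain.
explained = (
    "When blood glucose falls, the pancreas detects the change and releases "
    "glucagon. Glucagon travels to the liver, where it causes glycogen to be "
    "broken down into glucose. The glucose is released into the blood, so the "
    "concentration returns to normal."
)

# Both responses contain every keyword, so the counter cannot tell them apart.
print(count_keywords(listed), count_keywords(explained))
```

The counter gives both responses the same score even though only the second explains how each step causes the next, which is the quality the band descriptors reward.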
The Teacher Review Step on LOR Questions
LOR marking is where teacher expertise is most valuable, and GradeDrive's review interface is designed accordingly.
When the teacher reaches an LOR question in the review, they see:
- The student's full response
- The level GradeDrive has assigned
- The mark within that level
- The specific band descriptor language that informed the assignment
- A brief explanation of why the response was placed at this level rather than the adjacent ones
This gives the teacher everything they need to agree or disagree quickly. If the AI has assigned Level 2 and the teacher thinks the response is Level 3 — because it contains a piece of analysis that GradeDrive underweighted — they override the level and the mark adjusts accordingly. The override takes one keyboard shortcut.
For borderline responses — those that sit close to the boundary between two levels — GradeDrive flags these explicitly so the teacher knows to give them more attention. This mirrors the practice of experienced human markers, who annotate borderline scripts for second-marking or moderation.
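One simple way to think about borderline flagging is as a margin test between the two best-fitting levels. The Python sketch below assumes each level has a numeric fit score between 0 and 1 and flags a response when an adjacent level scores almost as well as the chosen one. The scores, the `assign_level` function, and the 0.1 margin are hypothetical. The post does not describe how GradeDrive computes its confidence indicator.

```python
def assign_level(fit_scores: dict[int, float], margin: float = 0.1) -> tuple[int, bool]:
    """Return (best-fitting level, borderline flag).

    `fit_scores` maps each level to an assumed fit score in the range 0..1.
    """
    if len(fit_scores) < 2:
        raise ValueError("need scores for at least two levels")
    # Rank levels from best fit to worst fit.
    ranked = sorted(fit_scores.items(), key=lambda item: item[1], reverse=True)
    (best_level, best_score), (next_level, next_score) = ranked[0], ranked[1]
    # Borderline: the runner-up is an adjacent level and scores almost as well.
    is_adjacent = abs(best_level - next_level) == 1
    borderline = is_adjacent and (best_score - next_score) < margin
    return best_level, borderline


# A response that clearly fits Level 2.
clear = assign_level({1: 0.10, 2: 0.75, 3: 0.30})
# A response that sits between Level 2 and Level 3.
close = assign_level({1: 0.05, 2: 0.55, 3: 0.50})
print(clear, close)
```

A wider margin sends more scripts to the teacher for a closer look, and a narrower one sends fewer, so the threshold is a workload decision as much as a technical one.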
Frequently Asked Questions
Can GradeDrive mark 6-mark questions? Yes. GradeDrive's LOR mode handles 6-mark extended writing questions using the band descriptors in the mark scheme, not point-matching. It assigns a level and a mark within the level, with the teacher reviewing and confirming the result.
Does LOR mode work for all subjects? Yes. LOR marking is used across GCSE and A level in Science, History, Geography, English, Psychology, Business, and many other subjects. GradeDrive's LOR mode works from the band descriptors in the uploaded mark scheme, so it applies wherever the scheme uses levels-of-response criteria.
Can GradeDrive mark A level essays? Yes. A level extended writing questions — including multi-objective schemes where marks are split across AO1, AO2, and AO3 — are handled. The LOR pipeline scales to longer responses and more complex mark scheme structures.
How does LOR mode differ from point-based marking? Point-based marking checks for the presence of specific facts or ideas. LOR mode evaluates the quality of the argument holistically against band descriptors. For extended writing, only LOR mode produces reliable marks — point-matching systematically over-rewards responses that contain relevant vocabulary without demonstrating understanding.
What happens with borderline responses? GradeDrive flags responses that sit close to the boundary between two levels, prompting the teacher to give them closer attention in the review step. This mirrors how experienced examiners handle borderline scripts.
How long does it take to review LOR marks? LOR questions take slightly longer to review than point-based questions because the teacher is evaluating a holistic judgement rather than a binary one. GradeDrive provides the level assignment, mark, and reasoning in a single view so the teacher can confirm or override quickly. For a class set of 30 students, a paper with two LOR questions typically adds 10–15 minutes to the review compared to an all-point-based paper. That is 60 extended responses, or roughly 10 to 15 seconds of extra review time per response.
Try GradeDrive free — upload a paper with extended writing questions and see how LOR mode handles them.
GradeDrive Team
The GradeDrive team is made up of educators, engineers, and product designers on a mission to reduce teacher workload through focused AI tools.