Make It Fair: The Quiet Expropriation of Britain’s Culture for Machine Training
When the State Treats Copyright as a “Growth Lever,” It Starts Selling What It Does Not Own
Thesis statement: The push to legalise large-scale AI training on copyrighted works through broad exceptions and opt-out “rights reservation” is not modernisation but an engineered transfer of value from creators to capital—enabled by the state’s fiscal appetites, excused by slogans, and preventable through transparency, licensing, and cryptographic rights infrastructure.1
Keywords: copyright, AI training, text and data mining, fair dealing, opt-out regimes, transparency, licensing, political economy, creative industries, database right, data protection, cryptography, blockchain, NFTs, provenance
I. The New Hunger
Every age has its preferred theft.
Some generations stole land with maps and muskets. Others stole labour with contracts drafted so politely that the ink itself appeared to bow. Ours has discovered a more fashionable larceny: the polite seizure of intellectual property under the banner of “innovation,” carried out not with crowbars but with crawlers.
Generative AI does not merely use culture. It consumes it. It is trained—meaning, in plain English, built—by copying vast quantities of text, images, music, film, archives, and databases. Copying is not an incidental side effect; it is the mechanism. One can dress that mechanism in technical costume (“tokenisation,” “vectorisation,” “embedding,” “weights”), but the legal fact remains stubbornly human: the reproduction of protected works, at industrial scale, for commercial ends.
This is why the argument matters in the United Kingdom. The question is not whether machines can compose a passable paragraph, imitate a visual style, or generate a melody that sounds like something you almost remember. The question is whether the state will redefine property so that the most valuable raw material of a nation—its creative output—becomes a free quarry for whoever can afford enough compute.
A government can do many things to encourage investment. It can lower taxes. It can simplify planning. It can fund research. But when it chooses to “encourage innovation” by weakening creators’ rights, it is not merely fostering a sector. It is reallocating ownership. And ownership is the one subject on which a civilised society does not get to be vague.
II. Fair Use Does Not Live Here
A certain rhetorical contagion has crossed oceans: the habit of speaking as though “fair use” were a universal doctrine, hovering benevolently over any copying that feels socially useful. The United Kingdom has no such doctrine. It has fair dealing—a framework of specific, enumerated exceptions, limited by purpose and interpreted with an eye to market harm and fairness.
This difference is not a technicality. It is the difference between a narrow permission and a floating pardon.
The most relevant UK provision is the text and data mining exception in section 29A of the Copyright, Designs and Patents Act 1988. It permits copying for computational analysis only when the purpose is research for a non-commercial purpose, and only when the person has lawful access to the work.1 The statute does not wink. It does not smuggle in industrial training through the back door. It draws a bright line: non-commercial research, not commercial product development.
Nor is copyright the only fence. The UK also recognises database rights, which protect substantial investment in obtaining, verifying, or presenting database contents, restricting extraction and re-utilisation even where individual items might not qualify for copyright protection.2 A modern training pipeline that scrapes, indexes, and reuses curated datasets can therefore collide with more than one right at once—which is precisely the point of having rights: they are meant to collide with would-be appropriators.
So when policymakers suggest that a “commercial research exception” might be introduced for AI training, they are proposing not a marginal tweak but a reversal of the moral logic of the law. Commercial model training is not sheltered scholarship. It is capital formation.
III. The Proposal: Opt-Out as a Policy Aesthetic
The contemporary state loves a trick that looks like liberty while functioning like coercion: the opt-out regime.
In December 2024, the UK government opened a consultation on changing how copyright applies to AI development, explicitly canvassing options that include a broad data mining exception permitting commercial use, and an approach modelled on rights reservation—meaning creators would need to signal that they do not want their works used, rather than AI developers needing permission to use them.3 The consultation’s own framing is revealing: one option is a broad exception “subject to few or no restrictions” permitting commercial use “for any purpose.”3 Another is an exception with rights reservation “underpinned” by supporting measures such as transparency.3
A rights reservation model is often marketed as respectful: “Creators keep control; they simply opt out.” That slogan relies on a misunderstanding of what property is.
Property is not the right to beg. It is the right to exclude.
The opt-out model converts an exclusion right into an administrative burden. It shifts the cost of enforcement from the industrial user to the individual owner. It demands that creators publish machine-readable reservations, monitor extraction, detect infringement, identify uses inside proprietary models, and litigate against firms whose legal budgets are larger than many creators’ lifetime earnings. It is not a compromise. It is an asymmetry, engineered and then described as balance.
The House of Lords Library briefing prepared ahead of a January 2025 Lords debate on the government’s rights reservation proposal made the stakes explicit: the consultation contemplated a copyright exemption and a model in which rightsholders would need to opt out from having their material used for AI training.4 When the state frames the matter this way, the legal default has already been psychologically moved. The burden has already been reassigned. The cultural asset has already been relabelled “training material.”
IV. “Commercial Research” Is a Category Error with Teeth
The phrase “commercial research exception” is attractive because it is lubricated with virtue. Research sounds public-spirited. It evokes universities, laboratories, scholarship, verification, the patient accumulation of truth. It distracts from the central fact: if the activity is commercial, the value created belongs to someone. The question is who.
Research exceptions exist because research produces public goods: knowledge that is hard to exclude others from, verification that stabilises the public sphere, critique that disciplines power. A limited, carefully bounded exception may be justified because it yields diffuse societal benefit while minimising market harm.
Commercial AI training is not that. It is product development, aimed at revenue, market share, and durable competitive advantage. Model weights, embeddings, and downstream applications are capital assets. They are owned. They are licensed. They are sold. They are defended by trade secrecy and contract. Calling this “research” is like calling a casino a charity because it prints brochures about probability.
Even the consultation paper admits—more politely—that a broad exception with few restrictions would improve AI developers’ access to training material and investment, but would not meet the needs of rightsholders, because they could not control or seek remuneration for the use of their works.3 That is the point in a single sentence: investment for one sector, uncompensated loss for another.
If a government wishes to subsidise AI, it has an honest path: public funding for public research; procurement of licensed corpora; compute grants; support for open datasets where rights are cleared. What it does not have is a moral right to force creators to subsidise private firms by altering the default rules of ownership.
The state cannot credibly claim to “support creators” while making them the involuntary donors of the inputs.
V. The State’s Incentive: Money, Headlines, and Concentrated Pressure
A political system does not need malice to do ugly things. It needs incentives.
The incentive here is structural. AI firms and platform companies are concentrated interests. The benefits they receive from a permissive training regime are large and measurable. That makes lobbying rational. Creators are dispersed interests. The losses are spread across tens of thousands of individuals and small enterprises. That makes coordination difficult and enforcement expensive. The resulting policy drift is not mysterious. It is the normal arithmetic of rent-seeking.
Add the state’s fiscal appetite and the drift becomes a programme.
Governments like growth. They like investment announcements. They like the theatre of “national renewal.” They do not like paying directly for subsidies, because that requires budgets, debates, and an inconvenient form of accountability. Changing default property rules is cheaper: it costs the public in private, not the government in public.
This is why the political language becomes syrupy. We are told we must “unlock” data. We must “remove barriers.” We must “modernise.” One notices that the only thing being unlocked is other people’s work.
When the creative sector resisted, it did so in numbers and in public. The closing day of the consultation period—25 February 2025—saw a coordinated “Make it Fair” campaign across the creative sector, described by the Society of Authors as an “unprecedented day of action,” with reports of thousands of responses sent to government and calls for writers to contact MPs.9 High-profile artists publicly condemned the opt-out approach, characterising it as legalising theft and urging resistance.8
And yet the impulse persists, because the incentive persists: creators do not promise quick inward investment; compute does.
VI. Transparency: The Battle Parliament Tried to Fight
If there is a single concept that separates licensing from looting, it is transparency.
You cannot license what you cannot see. You cannot negotiate if you are denied the basic facts. You cannot enforce the law if the alleged infringer is allowed to keep the evidence in a sealed box labelled “trade secret.”
This is why parliamentary battles have repeatedly returned to a simple demand: require AI companies to disclose the copyrighted works used to train models offered to UK users, so that rightsholders can enforce existing law and negotiate lawful licences. In May 2025, a revised transparency amendment was tabled in the House of Lords precisely because a prior version had been rejected on “financial privilege” grounds in the Commons—meaning, in plain political terms, that the government treated the costs of enforcement as an unacceptable expense.7
The wording games were revealing. The revised amendment sought to avoid direct spending implications by shifting from “must” to “may,” in order to dodge the Commons’ procedural veto.7 The spectacle was almost too honest: the legislature attempted to require disclosure; the government resisted; procedure became the excuse; and creators were told—again—to trust future consultations.
In the same period, the argument from the pro-extraction side became equally candid. A prominent former senior executive argued publicly that requiring prior permission to train on copyrighted works would “kill” the AI industry in the country “overnight.”18 In other words: if property rights are enforced, the business model suffers. The confession is useful. It clarifies what is being asked of society: not a calibration of rules, but the sacrifice of consent for speed.
Creators responded with a basic point that should embarrass any civil servant: “You can’t enforce the law if you can’t see the crime taking place.”10 That is not rhetoric. It is the minimum condition of legality.
VII. From the Public Web to the Private Drive: AI and Access to Files
The debate is often framed as scraping public websites. That is already serious. But the more dangerous frontier is not the public web; it is the private file system.
AI tools are being integrated into operating systems, office suites, email clients, team chat, and cloud storage. Their promise is “productivity.” Their prerequisite is access: an assistant that cannot read your documents cannot assist you; it can only perform party tricks.
Access has consequences. Indexing is copying. Embedding is transformation through reproduction. Context windows require transmission of text to remote services or local models running with privileged permissions. Even when vendors promise they will not use enterprise content to train their public models, the architecture still involves replication, storage, and processing. The risk surface expands: misconfiguration, cross-tenant leakage, credential compromise, and onward use of “telemetry” for model improvement.
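To make "indexing is copying" concrete, here is a deliberately naive Python sketch of the retrieval index such assistants build over private files. The embed() stub and the Index class are hypothetical stand-ins, not any vendor's implementation, but the structural point is general: the index persists verbatim chunks of the source text, because retrieval has to hand the original words back to the model.

```python
import hashlib
from dataclasses import dataclass, field

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: any such model must first
    receive (i.e. be given a copy of) the text it vectorises."""
    h = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in h[:8]]  # toy 8-dimensional vector

@dataclass
class Index:
    # Note what gets persisted: not just vectors, but the chunks themselves.
    chunks: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def ingest(self, document: str, size: int = 200) -> None:
        for i in range(0, len(document), size):
            chunk = document[i:i + size]       # a literal reproduction
            self.chunks.append(chunk)          # stored verbatim: copying
            self.vectors.append(embed(chunk))  # derived via reproduction

index = Index()
index.ingest("The contents of a private file, replicated into an index.")
print(len(index.chunks), "verbatim copies now live inside the index")
```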
The Information Commissioner’s Office has repeatedly emphasised that there is no “AI exemption” from data protection obligations, and its work on generative AI and data protection has focused on lawfulness, transparency, purpose limitation, accuracy, and the allocation of responsibility across the AI supply chain.14 Its broader guidance on AI and data protection stresses accountability, transparency, lawfulness, and risk assessment (including the need for appropriate impact assessments in higher-risk contexts).13
This matters for copyright and political economy because the categories bleed. Once the state normalises ingestion without permission in the copyright sphere, it becomes easier—culturally and administratively—to normalise it elsewhere. “It’s just analysis.” “It’s just training.” “It’s just productivity.” And suddenly the distinction between lawful access and lawful copying dissolves into a fog of convenient inevitability.
A civilised society does not treat access as a silent surrender. It treats it as a contract.
VIII. Licensing: The Adult Solution the State Pretends Not to See
The pro-exception argument is usually framed as a necessity: “Licensing at scale is impossible.” This is the sort of claim that sounds technical and is actually moral. It means: “We prefer not to pay.”
Yet the market has been assembling licensing mechanisms precisely because the law, properly enforced, compels it. In April 2025, UK licensing bodies announced a collective licence intended to compensate authors for the use of their works in training generative AI models—explicitly positioned as a market-based alternative to an opt-out exception.11 The point is not that every licence design will be perfect. The point is that “impossible” tends to become possible when the alternative is illegality.
Licensing does three things that broad exceptions cannot:

- It forces price discovery, rather than state-imposed surrender.
- It creates incentives for clean datasets, provenance, and auditability.
- It preserves creative industries as industries—capable of reinvestment—rather than turning them into unpaid inputs.
A government that genuinely wants both AI and the creative industries to thrive should embrace licensing and transparency as the default, not dangle opt-out as a “solution” and hope that creators drown quietly in compliance chores.
IX. The Timeline That Matters: 18 March 2026
This debate is not airy speculation. It is pinned to a date.
The Data (Use and Access) Act 2025 requires the government to prepare and publish—within a defined period—a report on the use of copyright works in the development of AI systems and to lay it before Parliament.5 A formal progress statement published in December 2025 confirmed that the government intended to lay the full report and an economic impact assessment before Parliament by 18 March 2026, and that the work would consider access and use of data, technical measures and standards, disclosure by AI developers, licensing, and enforcement mechanisms.6
So the state is not merely “listening.” It is preparing to decide. The only question is whether the decision will treat creators as owners or as obstacles.
X. The Cryptographic Exit: Making Expropriation Harder Than Permission
Law is necessary, but law is not sufficient. Enforcement is costly, copying is cheap, and the internet rewards speed over scruple. A serious response must therefore include technical leverage—systems that embed consent, provenance, and payment into the infrastructure, so that “training without permission” becomes operationally difficult rather than merely unlawful.
This is where cryptography—and, judiciously, blockchain-based registries and tokenised licences—can shift the balance. Not as a fashionable ideology, but as engineering.
1. Rights registries built on hashes, not hype
Creative works do not need to be stored on a public ledger. The ledger can store a cryptographic fingerprint (a hash) and a timestamp, linked to an asserted rightsholder identity and a licensing policy. The result is a public, tamper-resistant record: who claims rights over what, from when, and under which terms. The work remains wherever it belongs—publisher servers, archives, private storage—while the registry supplies verifiable reference points for permission.
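As a minimal sketch, in Python and using only standard-library hashing, here is what such a registry entry could carry. The record schema, field names, and identifiers are illustrative assumptions, not any existing registry's format:

```python
import hashlib
import json
import time

def fingerprint(path: str) -> str:
    """SHA-256 hash of the work itself; the work never leaves local storage."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def registry_record(path: str, rightsholder_id: str, policy_uri: str) -> str:
    """What a ledger entry would actually hold: a fingerprint, a timestamp,
    an asserted rightsholder, and a pointer to licensing terms -- not the work."""
    return json.dumps({
        "content_hash": fingerprint(path),
        "asserted_by": rightsholder_id,
        "policy": policy_uri,
        "timestamp": int(time.time()),
    }, sort_keys=True)
```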
2. Tokenised licences as machine-readable permission
A token (sometimes marketed under the broad label “NFT”) can represent a licence grant rather than an image of a cartoon animal. The token can encode scope (training vs inference), duration, territory, volume limits, attribution requirements, and permitted model versions. The practical advantage is machine readability and auditability: systems can be built to require a valid licence token before ingestion, and to log use against it.
The point is not speculation. The point is governance: permission that can be verified automatically, without begging, without guesswork, without the fiction that “silence” is consent.
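As a sketch of the idea, assuming hypothetical field names and a hypothetical may_ingest gate, a licence token reduces to a small, machine-checkable record:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class LicenceToken:
    content_hash: str     # which work (see the registry sketch above)
    licensee: str         # who holds the grant
    scope: frozenset      # e.g. {"training"} or {"training", "inference"}
    territory: str        # e.g. "UK"
    expires: date         # duration of the grant
    volume_limit: int     # maximum permitted ingestion events

def may_ingest(token: LicenceToken, content_hash: str, purpose: str,
               territory: str, uses_so_far: int, today: date) -> bool:
    """The gate an ingestion pipeline would call before copying anything:
    every condition must hold, and silence grants nothing."""
    return (token.content_hash == content_hash
            and purpose in token.scope
            and token.territory == territory
            and today <= token.expires
            and uses_so_far < token.volume_limit)

token = LicenceToken("sha256:ab12", "lab:example", frozenset({"training"}),
                     "UK", date(2027, 1, 1), 10_000)
assert may_ingest(token, "sha256:ab12", "training", "UK", 0, date(2026, 6, 1))
assert not may_ingest(token, "sha256:ab12", "inference", "UK", 0, date(2026, 6, 1))
```

A pipeline that calls such a gate before copying anything inverts the opt-out default: the absence of a valid token means no ingestion.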
3. Micropayments and streaming royalties for continuous training
Training is not always a one-off event. Models are updated, fine-tuned, aligned, and refreshed. Payment structures can mirror this reality through programmable micropayments or streaming royalty mechanisms tied to auditable ingestion events—so that use triggers compensation without requiring a lawsuit as the payment mechanism.
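A toy accrual meter shows the shape of the mechanism. The rate, identifiers, and event schema below are illustrative assumptions, not a proposed tariff:

```python
from collections import defaultdict
from decimal import Decimal

class RoyaltyMeter:
    """Accrues payment per auditable ingestion event, so that each training
    refresh triggers compensation rather than a lawsuit."""

    def __init__(self, rate_per_work: Decimal):
        self.rate = rate_per_work
        self.owed = defaultdict(Decimal)   # rightsholder -> amount owed

    def record_ingestion(self, rightsholder: str, works_ingested: int) -> None:
        self.owed[rightsholder] += self.rate * works_ingested

meter = RoyaltyMeter(Decimal("0.002"))         # hypothetical per-work rate
meter.record_ingestion("author:0x1a2b", 5000)  # one logged training refresh
print(meter.owed["author:0x1a2b"])             # 10.000 owed for this cycle
```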
4. Compute-to-data: move the model, not the corpus
One of the most effective ways to prevent unlicensed copying is to avoid handing over the data at all. Under compute-to-data architectures, rightsholders host the content in controlled environments—potentially using secure enclaves—where training jobs can be executed without the raw corpus leaving the boundary. The system returns permitted outputs (for example, parameter updates) plus a cryptographically signed audit log showing what was accessed and under what licence.
This converts training from a scrape-and-run free-for-all into a governed interaction.
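Here is a minimal sketch of the receipt such an enclave could emit, using Ed25519 signatures from the third-party cryptography package; the job identifiers and log schema are assumptions for illustration:

```python
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

enclave_key = Ed25519PrivateKey.generate()   # held inside the boundary

def signed_audit_entry(job_id: str, content_hashes: list[str],
                       licence_id: str) -> tuple[bytes, bytes]:
    """The training job leaves with parameter updates plus this receipt:
    what was accessed, under which licence, and when -- signed so that
    neither side can quietly rewrite history."""
    entry = json.dumps({
        "job": job_id,
        "accessed": sorted(content_hashes),
        "licence": licence_id,
        "at": int(time.time()),
    }, sort_keys=True).encode("utf-8")
    return entry, enclave_key.sign(entry)

entry, signature = signed_audit_entry("job-42", ["sha256:ab12"], "lic-7")
enclave_key.public_key().verify(signature, entry)  # raises if tampered with
```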
5. Provenance chains and enforceable attribution
For outputs, cryptographic provenance standards can embed origin assertions into generated media, enabling platforms and downstream users to distinguish licensed from unlicensed derivations. This will never be perfect—nothing is—but perfection is not the goal. Leverage is. If unlicensed training leaves detectable signatures that carry contractual and legal consequences, the economic incentive shifts away from theft and toward licences.
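Real provenance standards (such as the C2PA specification) use signed manifests and certificate chains; the sketch below, with hypothetical field names, shows only the core hash-linking idea that makes a chain of derivations checkable:

```python
import hashlib
import json
from typing import Optional

def manifest(payload_hash: str, origin: str, licence: str,
             parent_manifest_hash: Optional[str]) -> dict:
    """One link in a provenance chain: what this media is, where it came
    from, under which licence, and which manifest it derives from."""
    return {"payload": payload_hash, "origin": origin,
            "licence": licence, "parent": parent_manifest_hash}

def manifest_hash(m: dict) -> str:
    return hashlib.sha256(json.dumps(m, sort_keys=True).encode()).hexdigest()

# A source work, then a licensed derivation that points back at it.
source = manifest("sha256:aaaa", "studio:example", "lic-7", None)
derived = manifest("sha256:bbbb", "model:example-v2", "lic-7",
                   manifest_hash(source))

# Verification: recompute the parent's hash and compare. A derivation whose
# parent link cannot be reproduced has no provenance, and that absence is
# itself a detectable signature.
assert derived["parent"] == manifest_hash(source)
```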
6. Rights reservation as a protocol default—without punishing creators
If a rights reservation model is pursued at all, it should be implemented as a protocol standard with strong defaults and penalties for circumvention—so that creators are not forced to tag every work like a shopkeeper forced to label each item “not free.” The burden must sit with crawlers and model builders: to honour signed reservations, to log compliance, and to prove lawful ingestion when challenged.
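In code, the crawler-side obligation is small. The sketch below honours a tdm-reservation HTTP header in the spirit of the W3C Text and Data Mining Reservation Protocol community draft; the header name, default behaviour, and logging here are illustrative assumptions:

```python
import urllib.request

def ingestion_permitted(url: str) -> bool:
    """Crawler-side compliance check, run before any content is fetched."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        reserved = response.headers.get("tdm-reservation")
    # The default matters: a licensing regime would block ingestion absent a
    # valid licence; at minimum, an explicit reservation must be honoured and
    # the check logged as evidence of compliance.
    if reserved == "1":
        print(f"reservation honoured, skipping: {url}")  # compliance log
        return False
    return True
```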
In short: cryptography can make consent cheap, provenance verifiable, and payment automatable. It can turn “permission” into a technical primitive rather than a moral plea.
The goal is not to make creativity “blockchain-native.” The goal is to make theft expensive and honesty efficient.
XI. What Must Be Defended
A state that can weaken copyright by redefining training as “research” will not stop at training. The principle—property is optional when a fashionable industry wants it—spreads. Today it is authors and musicians; tomorrow it is journalists, archivists, photographers, educators, and anyone whose work can be copied, parsed, and monetised at scale.
This is why the central claim must be stated plainly, without euphemism:
Creative work is not the government’s to give away.
There is a legitimate place for narrow, carefully bounded exceptions for genuine non-commercial research.1 There is a legitimate place for innovation. There is even a legitimate place for national strategy. What is not legitimate is the quiet conversion of private rights into public giveaways—especially when the giveaway is not for the public, but for private firms with the loudest lobbyists and the most flattering economic forecasts.
The next steps are not mysterious. They are merely unfashionable because they require discipline:

- mandate transparency so licensing can exist;7
- support collective and scalable licensing pathways;11
- enforce “lawful access” as a real gate rather than a slogan;1
- and adopt cryptographic infrastructure that makes consent machine-verifiable rather than bureaucratically performative.
If the country wants AI, it can have AI. If it wants a creative civilisation, it can have that too. But it cannot have both by making one sector the unpaid feedstock of the other.
Notes
1. Copyright, Designs and Patents Act 1988 (UK), section 29A (text and data analysis for non-commercial research).
2. Copyright and Rights in Databases Regulations 1997 (UK) (implementing database right protections).
3. Department for Science, Innovation and Technology and UK Intellectual Property Office, Copyright and Artificial Intelligence: Consultation (London, 17 December 2024).
4. House of Lords Library, Copyright and Artificial Intelligence: Impact on Creative Industries (London, January 2025).
5. Data (Use and Access) Act 2025 (UK), section 136 (report on the use of copyright works in the development of AI systems).
6. Department for Science, Innovation and Technology, Copyright and Artificial Intelligence: Statement of Progress under Section 137 of the Data (Use and Access) Act (London, 15 December 2025).
7. Rachel Hall, “Lords examine new amendment to data bill to require AI firms declare use of copyrighted content,” The Guardian, 15 May 2025.
8. Dan Milmo, “Elton John calls UK government ‘absolute losers’ over AI copyright plans,” The Guardian, 18 May 2025.
9. Teddy McDonald, “Government consultation on AI and copyright closes with unprecedented day of action by creative sector,” Society of Authors, 28 February 2025.
10. “Creatives demand AI comes clean on what it’s scraping,” The Register, 12 May 2025.
11. Ella Creamer, “Collective licence to ensure UK authors get paid for works used to train AI,” The Guardian, 23 April 2025.
12. Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market, Article 4 (text and data mining with rights reservation).
13. Information Commissioner’s Office (UK), Guidance on AI and Data Protection (updated 15 March 2023; notice of review following the Data (Use and Access) Act 2025).
14. Information Commissioner’s Office (UK), Response to the Consultation Series on Generative AI (13 December 2024).
15. “Guardian joins media coalition to protect original journalism from unpaid use by AI,” The Guardian, 26 February 2026.
16. Telegraph Media Group, “An Open Letter to Our Fellow Leaders in Global Media” (press release dated 25 February 2026; published 26 February 2026).
17. “Sky News forms consortium to drive push for AI standards,” Sky News, 26 February 2026.
18. Mia Sato, “Nick Clegg says asking artists for use permission would ‘kill’ the AI industry,” The Verge, 26 May 2025.