"Fair use protects criticism and commentary—but not a scoop of an author’s first publication. In the AI era, curate datasets and block verbatim, unreleased outputs."
— Aditya Mohan, Founder, CEO & Philosopher-Scientist, Robometrics® Machines
Cups ringed the copy desk, pencils worn to nubs. A courier’s envelope lay open, its contents heavy with scandal: pages from Gerald Ford’s unpublished memoir, A Time to Heal. Time magazine had paid for exclusive first-serial rights; deadline lights blinked across town. In this room, at The Nation, an editor traced a finger under the lines that mattered—Ford’s account of the Nixon pardon—and a copy editor, blue pencil poised, murmured, “This is the heart of it.” Phones crackled. Galleys snapped off the typesetter. Someone said, almost to the room, almost to the clock, “We go now, or we go never.”

Hours later, the piece ran with roughly 300–400 verbatim words from the manuscript. Time canceled its deal; lawyers sharpened their knives. The question hit the Court like a paperweight: Can a magazine scoop an author’s first publication by quoting the author’s own words?
Writing for the Court, Justice Sandra Day O’Connor held that the article was not fair use. By printing generous verbatim excerpts from an unpublished manuscript to “lend authenticity,” the magazine had “arrogated to itself the right of first publication,” a marketable right the law protects. The Court explained:
“The unpublished nature of a work is a key, though not necessarily determinative, factor tending to negate a defense of fair use,” and “under ordinary circumstances, the author’s right to control the first public appearance of his undisseminated expression will outweigh a claim of fair use.”
First publication, the Court added, is a “threshold decision—whether and in what form to release [a] work.”
Harper & Row teaches that fair use disfavors exploiting the expressive core of an unpublished work, especially the passages that carry its narrative “heart.” For AI, that draws a practical boundary: training that relies on non-expressive signals (facts, metadata, statistics), short snippets for indexing, or transformative analysis is far safer than ingesting or emitting verbatim, not-yet-released text or images. Verbatim early-release content remains high-risk, which is why rigorous dataset curation and output filters are essential.
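To make the output-filter idea concrete, here is a minimal sketch, assuming a provider keeps an index of embargoed (not-yet-published) texts and checks candidate outputs for verbatim overlap before release. The eight-word window, the function names, and the sample strings are illustrative choices for this example only, not a legal threshold or a description of any production system.

```python
# Minimal sketch: flag model outputs that reproduce an embargoed passage
# verbatim. The 8-word shingle size and normalization rules are
# illustrative assumptions, not a standard.

import re

SHINGLE_WORDS = 8  # contiguous words treated as a "verbatim" span if matched


def _normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_embargo_index(embargoed_texts: list[str]) -> set[tuple[str, ...]]:
    """Index every SHINGLE_WORDS-word window of the protected corpus."""
    index: set[tuple[str, ...]] = set()
    for text in embargoed_texts:
        words = _normalize(text)
        for i in range(len(words) - SHINGLE_WORDS + 1):
            index.add(tuple(words[i:i + SHINGLE_WORDS]))
    return index


def verbatim_overlap(candidate: str, index: set[tuple[str, ...]]) -> bool:
    """True if any SHINGLE_WORDS-word window of the candidate output
    appears verbatim in the embargoed corpus."""
    words = _normalize(candidate)
    return any(
        tuple(words[i:i + SHINGLE_WORDS]) in index
        for i in range(len(words) - SHINGLE_WORDS + 1)
    )


if __name__ == "__main__":
    # Illustrative stand-in text, not an actual quotation from the memoir.
    manuscript = "I was determined that the pardon decision would be mine alone to make."
    index = build_embargo_index([manuscript])

    draft = "He wrote that the pardon decision would be mine alone to make, full stop."
    if verbatim_overlap(draft, index):
        print("Blocked: output reproduces an embargoed passage verbatim.")
    else:
        print("OK: no verbatim overlap detected.")
```

A shingle check this simple is only the emission-side half of the boundary; the safer posture pairs it with dataset-side curation so embargoed material never enters training in the first place.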
Can a provider train its models on the photos and documents it stores for you? Not by default. A cloud service that stores your photos or documents cannot simply use them to train its own internal models without clear permission and a lawful purpose. Even where Terms of Service mention “improving the service,” U.S. law adds constraints:
Consumer protection (FTC): Using people’s photos to build face‑recognition without express consent has led to orders requiring deletion of data and models (algorithmic disgorgement). Transparency and truthful disclosures are mandatory.
Biometrics (state laws like Illinois BIPA): Faceprints and similar signals require informed, written consent, retention schedules, and purpose limits; violations have produced major settlements and injunctions.
Copyright & contracts: Users generally retain copyright; hosts obtain a limited license to store and transmit content. Training beyond that scope requires specific authorization, and any disclosure of user content or model artifacts compelled by court order should be bounded by protective controls (attorneys’-eyes-only review, clean rooms).
Children’s data: Images of children trigger additional duties (e.g., COPPA and school privacy regimes), typically requiring verifiable parental consent.
Rule of thumb: Treat personal cloud content as today’s private papers. If you’re a provider, use opt‑in consent for any training beyond core functionality, offer granular controls, and prevent downstream reuse. If you’re an investigator, seek narrow, particularized process (not “turn over everything”), with protective orders for any sensitive model artifacts.
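For providers, the “opt-in first” rule can also be enforced mechanically at dataset-assembly time. The sketch below is an assumption-laden illustration: the field names, consent scopes, and gates are invented for this example and do not restate any statute or any particular provider’s policy.

```python
# Minimal sketch of consent-gated dataset curation. The record fields
# ("consent_scope", "is_minor", "has_biometrics") and the policy choices
# are illustrative assumptions, not legal requirements.

from dataclasses import dataclass


@dataclass
class StoredItem:
    owner_id: str
    consent_scope: set[str]   # purposes the owner explicitly opted into
    is_minor: bool            # depicts or belongs to a child
    has_biometrics: bool      # faceprints or similar signals present


def eligible_for_training(item: StoredItem) -> bool:
    """Admit an item only when training goes no further than what the
    owner affirmatively opted into; exclude children's data and
    biometric content absent specific consent."""
    if "model_training" not in item.consent_scope:
        return False  # opt-in, not opt-out
    if item.is_minor and "parental_consent" not in item.consent_scope:
        return False  # COPPA-style gate
    if item.has_biometrics and "biometric_training" not in item.consent_scope:
        return False  # BIPA-style gate
    return True


if __name__ == "__main__":
    items = [
        StoredItem("u1", {"model_training"}, is_minor=False, has_biometrics=False),
        StoredItem("u2", {"model_training"}, is_minor=False, has_biometrics=True),
        StoredItem("u3", set(), is_minor=False, has_biometrics=False),
    ]
    training_set = [i for i in items if eligible_for_training(i)]
    print(f"{len(training_set)} of {len(items)} items eligible")  # -> 1 of 3
```

The design choice worth noting is the default: items fall out of the training set unless consent affirmatively covers the use, mirroring the consent-first posture described above.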
Bottom Line
The lesson of the Unpublished Page is simple: don’t seize the author’s premiere with the author’s own words. For AI, that means curating what you train on and filtering what you emit. For cloud services, it means consent first, purpose limits, and proof that your use is necessary and narrowly tailored—not merely convenient.