Description
It is beautiful, the Mandelbrot set. In full, it resembles a horseshoe crab or a sideways Rorschach test, flecked with patterns of spikes and lobes. Zooming in at any point near its edge reveals infinite depths of swirls, reminiscent of Van Gogh’s Starry Night or an intricate paisley textile. The shape could easily hang on a wall, an example of fine abstract art. Yet the entirety of that shape can be computed by repeated application of a simple mathematical function: f(x) = x² + c. Is all the beauty and complexity of the Mandelbrot set “inside” the symbols of that mathematical function? And what does it mean for something to be “inside” a mathematical function in the first place?
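As a rough illustration of how little the function itself contains, the sketch below applies f(x) = x² + c repeatedly, starting from x = 0, and reports whether the value stays bounded. The names escape_time and in_mandelbrot, the iteration cap of 100, and the escape radius of 2 are illustrative assumptions, not taken from the Article; a finite iteration cap only approximates membership in the set.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Count iterations of x -> x*x + c (from x = 0) before |x| exceeds 2.

    Returns max_iter if the value never exceeds 2 within the cap.
    The cap and the function names are illustrative assumptions.
    """
    x = 0j
    for n in range(max_iter):
        if abs(x) > 2.0:
            return n
        x = x * x + c  # the single rule: f(x) = x^2 + c
    return max_iter


def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Approximate membership test: c is treated as inside if it never escapes."""
    return escape_time(c, max_iter) == max_iter


if __name__ == "__main__":
    # Coarse text rendering of the region [-2, 1] x [-1, 1].
    for row in range(11):
        im = 1.0 - row * 0.2
        print("".join(
            "#" if in_mandelbrot(complex(-2.0 + col * 0.05, im)) else "."
            for col in range(61)
        ))
```

Running the script prints a coarse text picture of the familiar shape, even though the code holds nothing but one line of arithmetic and a loop.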
These seemingly abstract, philosophical questions are at the heart of heated and difficult debates of law and policy today. The recent explosion of generative artificial intelligence has prompted questions of how AI technology interacts with copyright laws, privacy rules, and other regulatory policies. These legal regimes often have predefined notions of when information is “inside” something. Yet these notions appear puzzling, contradictory, or plainly wrong when applied to AI, and they risk burdening the new technology with nonsensical and outdated doctrines.
Copyright law, among others, depends on what is “inside” an AI model, the enormous and seemingly incomprehensible set of numeric parameters that make AI systems tick. This Article takes aim at that question. Its central argument is that, because modern AI models are just exceptionally large mathematical functions, intuitions about simpler mathematical functions will sharpen an understanding of what is inside AI. This Article will take a tour of those intuitions—fully accessible to anyone, no special math skills required—and use them to identify new approaches to evaluating AI under copyright.
A brief summary is as follows. Conventional accounts of AI and copyright law make two separate assumptions about what is “inside” a model. First, models are trained on copyright-protected information—images, texts, sounds, and so forth. Many commentators believe that the training data is “ingested” into, and is consequently inside, the model. Second, AI models have been observed to produce, upon carefully crafted prompting, outputs highly similar to copyrighted works. The apparent implication of this behavior is that the model must have “memorized” the copyrighted material during training, again an assertion that the material is inside the model. These allegations have serious consequences, suggesting that AI models may be infringing contraband, subject to the severe penalties of copyright law. To gain a sharper sense of what is inside the model, this Article presents two basic premises that are well known among the AI research community but rarely explored in the legal literature.
• First, a model is simply a mathematical function, a series of basic operations like multiplication and addition that convert one list of numbers into another. Even the most remarkable of generated AI outputs—art, music, and literature—are just the consequence of simple mathematical operations, different only in scale from the mathematical formulas that anyone is familiar with.
• Second, the AI model is not an arbitrary or incomprehensible black box. Rather, it corresponds to an approximation of a mathematical object called a manifold, a geometric representation of a collection of useful real-world information, like grammatical sentences or comprehensible sounds.
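To make the first premise concrete, here is a toy sketch of a "model" as a function that turns one list of numbers into another using only multiplication and addition. The name tiny_model and the specific parameter values are hypothetical, chosen for illustration; real models chain very many such steps, typically with additional simple operations in between.

```python
def tiny_model(inputs, weights, biases):
    """Convert one list of numbers into another.

    Each output is a weighted sum of the inputs plus a bias term.
    The parameters (weights, biases) are hypothetical toy values.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters: two inputs mapped to two outputs.
weights = [[2, -1],
           [0, 3]]
biases = [1, 0]

print(tiny_model([1, 2], weights, biases))  # prints [1, 6]
```

The parameters here are six numbers; nothing in them resembles the inputs the function is later applied to, which is the sense in which the parameters and the information a model represents can come apart.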
These two concepts upend the implicit assumptions that laws make about what is inside AI models. First, they show a distinction between the numerical parameters of a model and the manifold that a model represents. Information can be inside one without being inside the other, and laws frequently conflate these two meanings of “inside.” Second, the nature of the AI model as an approximation suggests that, at least in an ideal case, the model should discern larger patterns about data rather than merely incorporate that data. Applied to copyright law, the implication is that the broad, simplistic definition of “inside” that the statutory text adopts (a work is “inside” a thing if the work can be “perceived . . . with the aid of a machine”) is logically questionable. Courts and lawmakers will need to revisit seemingly basic questions of copyright law anew in view of the unique nature of AI technology.
Publication Date
9-2025
Book Title
AI & COPYRIGHT The Evolving Legal Landscape
Publisher
Foundation for American Innovation
Keywords
Artificial Intelligence, Copyright Law, Machine Learning Models, Intellectual Property, Generative AI, Data Storage, Algorithmic Governance, Digital Technology Law, AI Regulation, Computational Creativity.
Disciplines
Computer Law | Intellectual Property Law | Internet Law | Law | Science and Technology Law
Recommended Citation
Duan, Charles, "Inside AI" (2025). Contributions to Books. 350.
https://digitalcommons.wcl.american.edu/facsch_bk_contributions/350
Included in
Computer Law Commons, Intellectual Property Law Commons, Internet Law Commons, Science and Technology Law Commons