Comparing AI Prompts to Button-Pushing on a Camera

Plenty is being said about AI systems that generate visual works, written works, music, etc. And plenty more will be said, especially now that lawsuits have been filed against some of the companies behind AI image generators. In this post, I want to address a misconception about authorship in copyright law that may be warping the AI conversation. As I understand the argument, some AI proponents allege that the act of writing prompts is comparable to the act of pushing the button on a camera and, therefore, vests copyright rights in the proverbial “button pusher.”

Although it is possible to conceive a scenario in which this analogy might apply, it is important to first understand that the underlying premise (i.e., that button pushing establishes authorship in a photograph) is wrong. In fact, when photography emerged as the first machine-made work, it posed a challenge to copyright law that still provides an ideal context for discussing what it means to say that copyright protects creative expression the moment the author causes that expression to be fixed in a tangible medium. Note that the key ingredients are expression, an author, and fixation, and inherent to the process binding all three is an interval of human effort enabling the author’s concept (or vision) of the expression to be manifest as fixation.

With photography, the interval of effort may be stately or a mere fraction of a second, but copyright law does not discriminate between the photographer who carries a vision in her mind for weeks of preparation and arrangement and the photographer who captures a fleeting moment from real life. In both cases, triggering the shutter is the proximate cause of fixation,[1] but vesting copyright rights in the photographer is predicated on an assumption that, even in a fraction of a second, she made creative choices sufficient to find a modicum of original expression in the image.

Various Scenarios in Which It Is Not About the Button

In the case of a studio shoot with a lot of preparation, lighting, props, wardrobe, etc., the photographer may not even touch the camera very often. It may be mounted on a tripod with an assistant triggering the shutter from a computer or remote control while the photographer directs all the creative aspects that comprise the resulting images. Copyright holds unequivocally that this individual is the author of the photographs because it is his expression that is being fixed in each image; the “button-pushing” is irrelevant except as a purely mechanical step in fixation.[2]

For the street photographer or photojournalist, the same principles apply, but copyright allows for the arguably metaphysical assumption that even in the tiny interval between seeing the real-life subject and capturing it, the photographer makes subtle choices that imbue the work with sufficient expression to be protected. Again, the button causes fixation but is not the basis of authorship, and this would be evident in the analysis of the content and qualities of the photograph, if it were to become the subject of a copyright infringement lawsuit.

By contrast, if a truly accidental photograph is captured (e.g., by a camera accidentally dropped from the Eiffel Tower), there is no authorship in that image—not because a human did not push the button, but because there is no colorable nexus between the human’s mental conception and the resulting photograph. On the other hand, if a photographer intentionally drops a camera from the Eiffel Tower and triggers the shutter by remote on its way down, copyright attaches to those images—not because a human pushed the button, but because a human conceived of the series of falling photographs and arranged the circumstances by which they could be made.

Although it is important to note that cameras are not machines trained with a corpus of existing photographs, this last example may be the closest analogy to the prompt directing the AI generator (in its current state) to make an image. If the prompt writer has a general sense of the image she wants to produce, but there is still an element of chance about what the machine will make, the prompt writer may argue that she is no less an author than the photographer who intentionally allows some element of chance into the process of making his images.

While this premise sounds reasonable as a general proposition, what it really implies is a case-by-case consideration as to how much human expression exists in the resulting works. Even in the example of the camera tossed intentionally off the Eiffel Tower, the photographer can control certain qualities in the images and may even have a vision for how they are to be used, displayed, or distributed. He knows the characteristics of the camera and lens and can select settings with the intent to control some of the qualitative results in the final photos.

By contrast, the prompter directing the image-generating AI is arguably not in control of enough of the qualitative elements in the final image to claim authorship, at least not at the current state of the technology. Entering the prompt “A mermaid wrestling a sea lion in outer space in the style of Cartier-Bresson” may produce an image that checks each of those boxes, but the prompt writer is not controlling the qualitative choices that comprise the result. Composition, line weight, shading, lighting, texture, scale, proportion, etc. are all “selected” by the AI based on what it has “learned” from the millions of visual works on which it was trained, so there is a critical disconnect between the human’s vision of “A mermaid wrestling a sea lion in outer space in the style of Cartier-Bresson” and the interval of effort that fixes the image in a tangible medium.

At some future state of the technology, the human may prompt a draft image to be made and then prompt changes to the qualitative elements, at which point it may be tough to deny that there is authorship in the resulting work. If these technologies develop in this way—such that the prompter is essentially painting with words instead of a stylus—this anticipates that, for instance, a disabled individual could truly create visual works with her mind akin to the way Stephen Hawking wrote books. But in this paradigm, the AI does not present a unique challenge to the concept of authorship because the human is in control of sufficient expression in the work.

Dynamic Ethical Standards

Of course, this theoretical discussion assumes integrity among individuals who claim authorship in various works. The guy whose camera accidentally snaps a photo does not have to admit he played no role in its making, and AI currently presents a similar challenge. The issue of integrity is a hot conversation we’re having in response to generative AI—especially in academia where ChatGPT is already “writing” papers for students. Notably, few people would question the judgment that the student who turns in a paper “written” by an AI is a cheat deserving the same sanctions as if he were caught plagiarizing. Yet, somehow, when the material is a “creative” work, AI advocates argue that the prompter is an author of a visual work comparable to a photographer using a camera.

This dichotomy can only be reconciled by confronting the fact that certain uses of AIs are not only not authorship but are needlessly destructive to the very purpose of intellectual and cultural endeavor. The student who shirks writing his own paper learns nothing and so potentially graduates from a program unqualified. Likewise, the prompter using an image-generating AI is not an artist and contributes nothing to the purpose of art. Thus, while there may be uses for these systems, their potential cultural value depends on more than technological development for its own sake.

Because these technologies are still new and still primitive relative to their expected capabilities, it is hard to predict where the more serious aspects of the narrative will lead. Some of the generative AIs are barely more than toys at the moment (e.g., turning profile pics into oil paintings), but what they will do a year from now, let alone five years, will inform how we address the issues—cultural, legal, and ethical. For now, though, I insist that no, prompting is not equivalent to button-pushing with a camera, even if button-pushing were as significant as many people think it is.

[1] This is true with digital photography. With film, one could argue that the latent image on the negative is not fixation until it is at least developed because it cannot be perceived by either human or machine reader.

[2] And there are likely to be further steps like retouching or printing, which may fix the final version of the image.

Photo by author.