Gen AI & the Hubris of Data


In almost every discussion I’ve had with creators about generative AI (GAI), I have said that we should not overlook Big Tech’s capacity for exaggeration and total flops. It is possible that some AI products will become the next Google Glass because cultural and/or economic forces reject their business models. For instance, last week, Digital Music News (DMN) reported on a partnership between Amazon and the AI music product Suno for the next generation of Alexa+. DMN quotes Amazon’s Panos Panay, SVP of Devices and Services, thus:

Using Alexa’s integration with Suno, you can turn simple, creative requests into complete songs, including vocals, lyrics, and instrumentation. Looking to delight your partner with a personalized song for their birthday based on their love of cats, or surprise your kid by creating a rap using their favorite cartoon characters? Alexa+ has you covered.

The first time I read about Suno, it struck me as a gimmick that may not attract or sustain enough market interest to be profitable. The example cited above, making personalized birthday songs, seems like the kind of thing a household can only do a few times before it gets stale. “Surprise your kid by creating a rap…” sounds like what the kids call “cringy.” But the broader question posed by Suno is whether consumers want “personalized” music at all, or whether the whole concept is just another hubristic statement about the power of data in the arts.

Theorists and scholars have presented many arguments that consumer data either obviates the need for creators’ rights (copyrights) or justifies substantially limiting those rights. The general premise is that if consumer data tells creators what audiences want, this insight lowers the risk of investing in production. Lowering that risk, say the theorists, implies rethinking copyright protection, or even rethinking the nature and value of creators, as Professors Sprigman and Raustiala proposed in a paper I critiqued in 2018.

As argued in that criticism and elsewhere on this blog, the goal of artists and creators is not necessarily to give audiences what they want. While one cannot dispute the market value of certain “formulas,” there is substantial evidence that when producers strive too hard to meet audience expectations, audiences are often disappointed. In short, risk is inherent to creative expression and audience experience.

In every medium and every genre, consumers want to be surprised by artists, and shifting modes of expression reflect artists’ personal responses to contemporary events. In general, the most successful (i.e., meaningful) works are the ones we didn’t know we wanted until we had them. And once these works become part of the vernacular of our lives, we cannot imagine living without them.

By contrast, theories about the power of data as a predictor of creative success are founded in a techno-centric arrogance that, to me, is exemplified by a product like Suno. The idea that the consumer wants music tailored from a few instructions—“Alexa, make me a punk rock song about a guy who lost his job”—is typical of the kind of “innovation” many technologists develop while ignoring the fundamental reasons we enjoy music in the first place.

As explored in an earlier post about opera, I agree that music, and other expressive media, can be replicated by an AI to provoke emotional responses in human observers. Simply put, if a composer knows that minor chords have a certain effect on the Western listener, then an AI can follow the same rule to produce a “melancholy” tune. But the science of music and human psychology only explains our instinctive, animal-like responses to combinations of sounds, leaving out the rest of the experience.

We cherish our playlists for reasons that transcend the sounds’ effects on our brains, i.e., that transcend mere taste. We relate and return to artists or their messages; we store and recall memories in the songs we replay; and we connect to friends and family through songs we have in common. Suno, outputting a bespoke song like a tepid cocktail, cannot provide any of that. On the contrary, it omits all those aspects of music that make us care about it, suggesting that its outputs are indeed gimmicks destined to become as dull as they are disposable once the short-lived novelty wears off. At least that’s my prediction.

There is, of course, a more insidious question worth asking—namely whether a product like Suno, especially when paired with Amazon, is less significant as a custom jukebox than it is as a new surveillance device. The use of personal data to micro-target and manipulate people and alter the course of major world events is not science fiction anymore. In that light, is it not conceivable that, say, 100 million people expressing their sentiments to an AI “music composer” will add color to data that will only exacerbate surveillance capitalism? That would be one hell of a way to pervert music.


Photo by: Cm2012

Amazon Fades from the ebook Legislation Narrative

Before I let the topic of these state ebook lending bills go for a bit, there is one aspect of this story that should not be overlooked. I was thinking about it when I saw a tweet criticizing Governor Hochul’s December 30th veto of the New York version of the bill. Media professional and professor Dan Gillmor, who has over 46,000 followers, summed up the sentiments of many when he wrote…

Mr. Gillmor’s hyperbole is an example of that blinkered view which finds it sensible to vilify publishers while ignoring authors, as if the interests of the two were not intertwined. But the comment also reminded me that the force still driving this willful blindness is a belief that internet platforms can and should obviate the need for intermediaries like publishers. What’s especially funny about that idea in the context of this story is that it was the monopolistic conduct of one internet platform, Amazon, which served as a major predicate for advocating the ebook bills in the first place. For instance, in Maryland, which passed its bill into law and now faces litigation by the publishers, all the supporting letters in the record contain the following:

For example, Amazon and Audible currently have between them over 20,000 “exclusive” titles. They will license these titles – which include high demand content by J.K. Rowling, Margaret Atwood, Alice Walker, Dean Koontz, Neil Gaiman, and others – to consumers, but not to libraries.

The headline of a Washington Post article from March 2021 (when the ebook bills were still percolating in state legislatures) identified Amazon as the publisher refusing to license titles to libraries. And as the AAP complaint against the State of Maryland notes, “The Maryland act’s legislative history and public statements by state legislators and public officials reveal some very specific concerns about this company.” The complaint adds that legislative sponsors specifically and repeatedly cited Amazon, but then avers, “… there is no contention that publishers more broadly are failing libraries. Nor is there any question that the marketplace for library ebooks and audiobooks is flourishing.”

After passage of the Maryland law, Amazon Publishing signed a deal with the Digital Public Library of America (DPLA) to make its ebooks available to U.S. libraries, and it is reportedly negotiating terms to make its Audible audiobooks available as well. That’s a good thing, but we should not lose sight of the distinction between Amazon and the major publishers, who were not withholding their ebooks from libraries. Whatever drove Amazon’s decision at the time (strategy for global domination?), it must be viewed as an outlier unique to that leviathan of a company and not aligned with the rest of the industry, whose core business is still book publishing.

The library associations highlighted the conduct of one tech industry publisher as a reason to promote legislation that would divest individual authors of their copyright rights. Thus, despite the claims that these bills are not anti-author, a major prong of the argument for them boiled down to this:  Amazon behaved like a monopoly, so authors should pay the price. And this aspect of the story is transformed from the absurd to the grotesque when we remember that Amazon was among the platforms once touted by copyright critics as an antidote to the “gatekeeping” engaged in by publishers.

As the ebook bills gained momentum in four states (NY, MD, MA, RI), the Amazon predicate faded into the background to the extent that now, according to observers like Gillmor, the story is all about the publishing “cartel” quashing “reasonable” legislation solely directed at the price of ebook licenses for libraries. There are a couple of problems with this narrative, though.

The first is that, even if the libraries have a sound complaint about the cost of ebook licenses, that’s a subject for negotiation and not grounds for a futile attempt to legislate away the rights of authors. Second, if the libraries want to make a case for calling the current terms of ebook licensing unreasonable, they need to at least do some math. It is not enough to just compare the ebook purchase price to the library ebook license price and declare the difference extortionate on its face.

Because whatever the ideal cost of ebook licenses should be for libraries, the current rate of approximately three times the consumer price for ebook purchases is not as unreasonable as the library associations make it seem. The simple fact is that lending an ebook to multiple readers is a different market from selling an ebook to one reader. Let’s do a quick, back-of-the-envelope review for context.

A two-year license fee of $65 provides free access to an ebook for roughly 52 readers at a cost to the library system of $1.25 per reader. But based on the rhetoric employed by the library associations, they seem to want the same 52 readers to be provided access at a cost of about $0.29 per reader, and even then, not really. Because digital materials never wear out, what the libraries actually want (i.e., unlimited licenses at consumer purchase rates) is for authors and publishers to make titles available until that per-reader cost approaches zero. Clearly, there is a threshold at which too low a fee would cannibalize ebook sales, which would end the market for ebooks, period.
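The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the library associations’ or the publishers’ own model. The $65 fee and the 52 readers come from the post; the $15 consumer price is an assumption, inferred because it is what the $0.29-per-reader figure implies; and `cost_per_reader` is a hypothetical helper.

```python
def cost_per_reader(price: float, readers: int) -> float:
    """Return what the library pays per patron served by one copy or license."""
    if readers <= 0:
        raise ValueError("readers must be positive")
    return price / readers


# Figures from the post: a $65 two-year license serving roughly 52 readers.
LICENSE_FEE = 65.00
READERS_PER_LICENSE = 52

# Assumption: a consumer ebook price of about $15, which is what the
# post's $0.29-per-reader figure implies for the same 52 readers.
CONSUMER_PRICE = 15.00

licensed = cost_per_reader(LICENSE_FEE, READERS_PER_LICENSE)
purchased = cost_per_reader(CONSUMER_PRICE, READERS_PER_LICENSE)
print(f"license:  ${licensed:.2f} per reader")   # license:  $1.25 per reader
print(f"purchase: ${purchased:.2f} per reader")  # purchase: $0.29 per reader

# A purchased ebook never wears out, so with no term limit the
# per-reader cost keeps falling as the same copy circulates further.
for readers in (52, 520, 5200):
    print(readers, round(cost_per_reader(CONSUMER_PRICE, readers), 4))
```

The loop is the point of the sketch: a term-limited license holds the per-reader cost at a floor, while a perpetual copy bought at the consumer price drives that cost toward zero the longer it circulates.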

For further context, keep in mind that one reason the libraries claim a right to buy, rather than license, ebooks is that they are used to buying hardbound copies and loaning them to their communities. But here, the library associations are comparing apples and oranges and not taking an honest account of cost to the library system for providing its services. Because a physical book requires infrastructure and labor to maintain, a $30 clothbound copy, for instance, may cost the library around $1.44 per reader to serve the same 52 patrons.[1] The broader point is that the two-column argument the ALA et al presented to state legislators is not a full picture.

Finally, I would add that libraries will not stay relevant in a world where they put too many eggs in the digital lending basket. At the point at which one’s “library” experience is little more than tapping a button to access a book through an electronic reader, the relationship with the individual library evaporates rather quickly. If the library associations were to take a serious read of the landscape, they might consider whether Amazon’s original refusal to license its titles has something to do with that company’s strategy to replace publishers, libraries, and any other distribution channel it doesn’t control. That’s the real battle of the digital age, and to that end, the libraries and publishers should be allies.


[1] Based on a staffer making $10/hr and spending three minutes managing a book for a single patron. The actual per-reader cost is likely higher.

What’s in the Box? Counterfeits and Online Marketplaces

In March, Senators Durbin and Cassidy introduced the INFORM Consumers Act, legislation meant to provide us with greater transparency when shopping through large online marketplaces, which is to say Amazon. In a co-authored editorial in Roll Call, the senators state:

It is well documented that third parties are selling massive amounts of counterfeit, stolen and unsafe consumer products on online marketplaces. The Office of the U.S. Trade Representative reported last year that the “rapid growth of e-commerce platforms has helped fuel the growth of counterfeit and pirated goods into a half trillion dollar industry.” Also last year, the Department of Homeland Security stated that such trafficked goods “threaten public health and safety, as well as national security.”

At this point, we probably all have a pretty good intuition that when we order various goods from Amazon, the source of the product may be questionable. If it’s a phone case for ten bucks, there’s probably no great risk, but as Senators Durbin and Cassidy note, if it’s a carbon-monoxide detector that doesn’t work, that’s another matter. Thus, the INFORM Act proposes to mandate a verification process for online marketplaces to certify some degree of legitimacy and accountability by third-party sellers through the collection of bank, tax ID, and physical address information. Any third-party sellers that fall out of compliance would have to be banned from the marketplace, and the process would be enforced by the USTR.

While the legislation strikes me as a good step toward demanding some accountability from the online marketplace, the platforms’ control over the display of information may yield results that are more translucent than transparent, but time will tell. Further, I believe Congress and other governing bodies around the world should be more aggressive with Amazon in particular.

One reason a platform like Amazon provides such fertile opportunity for counterfeiters is that we tend to shop on the platform quickly while looking at two things: a photograph and a price. The photograph is easily deceptive, and only when the price seems unrealistically low do we, perhaps, pause to wonder whether there is any deception afoot. All that text, including the meaningless name of the seller, is probably ignored most of the time.

If this describes the habits of millions of consumers, it seems the task at hand is to require Amazon et al. to do far more to prevent counterfeits from trading on their platforms in the first place. And, of course, one way to achieve that end is to make Amazon or Walmart or Target liable for harm resulting from the sale of dangerous products. Liability does wonders for cleaning up corporate conduct; in fact, it is often the only thing that does. Amazon et al. would say that this is too burdensome, but is it? Durbin and Cassidy write:

… Amazon and the powerful online marketplace lobby say our bill is too onerous. They say that they already do a great job of policing who is selling what from where on their websites and that the best solution is to leave the status quo in place.

Reality couldn’t be further from the rosy picture painted by these companies. We need to take stronger steps to both prevent illicit sales on online marketplaces and to make sure bad actors are held accountable. As The Wall Street Journal recently reported, law enforcement investigators say they struggle to obtain information from Amazon about shady sellers on their marketplace. 

What’s interesting about that reference to Amazon’s opacity and uncooperative posture, if you read the recent story by Aditya Kalra and Steve Stecklow for Reuters, is that one of the “shady sellers” operating on Amazon appears to be Amazon itself. There are, of course, two sides to the counterfeiting narrative: potential harm to consumers and certain economic harm to legitimate manufacturers.

Though it probably comes as little surprise to many, documents obtained by Reuters investigators show that Amazon, at least in India, has been using its proprietary data to track certain brand trends, replicate (knock off) those attributes in its own house brands, and then ensure that its house brands appear in search results above the same brands they copied. Kalra and Stecklow write:

In sworn testimony before the U.S. Congress in 2020, Amazon founder Jeff Bezos explained that the e-commerce giant prohibits its employees from using the data on individual sellers to help its private-label business. And, in 2019, another Amazon executive testified that the company does not use such data to create its own private-label products or alter its search results to favor them.

But the internal documents seen by Reuters show for the first time that, at least in India, manipulating search results to favor Amazon’s own products, as well as copying other sellers’ goods, were part of a formal, clandestine strategy at Amazon – and that high-level executives were told about it.

So, not only should Amazon’s extraordinary data-driven advantage disqualify it from becoming a counterfeiter on its own marketplace, but having demonstrated its effectiveness at doing so, we should also conclude that it has the resources to comply with the INFORM Act, and a lot more. If Amazon has the ability to track specific sizing trends in a brand of men’s shirts for the purposes of copying the products and undercutting the brand’s market, surely it has the ability to connect a few data points to keep products like counterfeit smoke and carbon-monoxide detectors off its pages. Again, Kalra and Stecklow write:

The 2016 document stated a goal: offer Amazon’s own goods in 20% to 40% of all product categories on Amazon.in within two years. Amazon would achieve profitability in its private-brand business by ‘only launching products that will provide more margin than comparable reference brand products’.

We get it. When Amazon calls proposals like the INFORM Act “burdensome,” this is shorthand for the fact that it likes making money better than it likes spending it. No kidding. But as the senators also note, Amazon seems to have plenty of money to burn on rocket fuel. So, it can probably bear the “burden” of protecting buyers and sellers on its platform.