The music AI attribution problem
Bit Rate is our member vertical on music and AI. In each issue, we break down a timely music AI development into accessible, actionable language that artists, developers, and rights holders can apply to their careers, backed by our own original research.
This issue was originally sent out under our collab research vertical known as The DAOnload.
Why attribution is critical in today’s music economy
In today’s TikTok-driven market, music artists increasingly embrace “letting go” of their work as a survival strategy. A single viral, unlicensed remix or clip can unexpectedly reshape an artist’s career, as well as the lives of the creators behind these fleeting, lightning-in-a-bottle moments.
The core idea is that increasing the surface area of a musical work through unrestricted distribution (e.g., pirated remixes, TikTok videos) ultimately benefits the artist’s brand — driving not only music consumption but also ticket sales, merchandise, sponsorships, and other revenue streams by maintaining visibility among casual audiences. In essence, the music industry’s value now lies in artist brands rather than in the music assets themselves, necessitating a more relaxed approach to content in a competitive landscape.
Generally, it’s a win-win scenario: Creators of unlicensed derivative works build their following, while original creators gain follow-on interest as well as the ability to properly monetize the derivative work if they choose to.
But even still, this system relies on one critical aspect, without which the entire structure collapses: Creative attribution, or the ability to identify the author of the original work.
For music sampling or interpolation, attribution is straightforward, with established crediting and clearance systems. However, when emulating or “taking inspiration from” existing works, analysis becomes murkier, potentially involving litigation and conflicting expert opinions (e.g. the Blurred Lines case). Recent initiatives across the music and tech industries, like Spotify’s Songwriter page and TikTok’s Crediting Creators tool, have aimed to enhance attribution visibility in a more free-form media world.
… But now enter generative music models, which may be upending attribution as we know it in the music business. It is tempting to map existing sampling structures onto generative AI tools, but that framework may prove difficult to apply.
Why creative AI makes attribution so hard
In numerous founder interviews, we heard that attribution in music AI is not possible at the level we’d expect elsewhere in the traditional music industry. Unlike with sampling, it’s hard to determine how much training data influences a generative AI model’s output, or how much the originator should be compensated if the model is commercialized.
Current creative AI architectures leave considerable room for debate as to whether an individual work can directly influence an output, especially with the large-scale models needed for high-quality results. As a model’s parameters increase with more artist-contributed data, the final representations become a complex stack of overlapping references. For instance, a large-scale music AI model could generate a piece in a particular EDM artist’s style by using overlapping data from thousands of similar-sounding artists in the training data, with no clear way to determine which ones were — or, equally crucial, were not — included.
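To make that overlap problem concrete, here is a toy sketch in Python. The embeddings and similarity scores are made up for illustration and do not reflect any real model’s internals; the point is simply that when many catalogs genuinely sound alike, a generated track looks almost equally “similar” to all of them, including artists who were never in the training set.

```python
# Toy illustration only (not any real model's internals): when thousands of
# similar-sounding artists overlap in a model's training data, similarity
# scores against a generated output cluster together, leaving no single
# obvious source to attribute.
import numpy as np

rng = np.random.default_rng(0)

# Pretend each artist's catalog is summarized as an embedding near a shared
# "EDM style" centroid, i.e. the catalogs genuinely sound alike.
style_centroid = rng.normal(size=128)
artist_embeddings = style_centroid + 0.1 * rng.normal(size=(5000, 128))

# A generated track in that style also lands near the centroid.
generated_track = style_centroid + 0.1 * rng.normal(size=128)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(generated_track, e) for e in artist_embeddings])
print(f"top score:    {scores.max():.3f}")
print(f"median score: {np.median(scores):.3f}")
# The top match barely beats the median of 5,000 candidates, and an artist
# who was never in the training set at all would score just as well.
```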
In an email to Water & Music, music AI practitioner Mat Dryhurst illustrates this problem by comparing his homegrown Holly+ model, built on Holly Herndon’s voice, to larger-scale models like Stable Diffusion or Midjourney:
“Holly+ is all Holly — easy. [Revenue from] a model with only ten producers can be split ten ways — easy. Billion-parameter model? Forget it.”
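To spell out the payout math behind that quote: a pro-rata split is trivial when each contributor’s share of the training data is known, and undefined when it is not. The sketch below is hypothetical; the split_revenue function and its weights are our illustration, not any company’s actual accounting.

```python
# A minimal sketch of the payout math behind that quote. The split_revenue
# function is hypothetical; no specific product is known to work this way.

def split_revenue(revenue: float, contribution_weights: dict[str, float]) -> dict[str, float]:
    """Pro-rata payout given each contributor's share of the training data."""
    total = sum(contribution_weights.values())
    return {name: revenue * weight / total for name, weight in contribution_weights.items()}

# Holly+ is all Holly: one contributor, one weight. Trivial.
print(split_revenue(1000.0, {"Holly Herndon": 1.0}))

# Ten commissioned producers with known, equal contributions: still easy.
print(split_revenue(1000.0, {f"producer_{i}": 1.0 for i in range(10)}))

# A billion-parameter model trained on scraped data: the weights dictionary
# simply cannot be filled in, because nobody can say how much any one work
# influenced a given output. That gap is the attribution problem.
```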
Importantly, most of the notable music AI models released so far in 2023 are built on extensive amounts of training data. For instance, Google’s MusicLM is built in part on MuLan, which spans 44 million recordings totaling 370,000 hours (~42 years) of audio, and the Free Music Archive dataset, which includes 343 days of Creative Commons-licensed audio. Consequently, Google researchers are likely to face similar attribution challenges in music as their counterparts do in text and visual art. (Compounding the issue, influential AI companies like OpenAI are ironically making the training data behind their generative models more opaque due to competitive market pressures.)
The power of starting small: Case studies
Interestingly, many music AI founders and developers are addressing the attribution issue (for now) by starting small.
Compared to other creative fields, music AI developers are taking ethical sourcing of training data more seriously, to avoid launching a commercial product that is effectively dead on arrival due to rights issues. In our Season 3 research, we found that developers tend to commission artists and producers directly for training data, structuring deals primarily as buyouts (e.g., Mubert Studio pays artists $0.50 per sample for inclusion in their Render tool). This hands-on approach contrasts with those taken by large-scale image and text generators, which currently face potential legal repercussions for scraping copyrighted content from the Internet.
Spawning — an initiative co-founded by Dryhurst, Herndon, and a group of their peers — aims to establish tools and processes for artist consent in AI training data. Dryhurst writes to us that he hopes in part to “remove data from large models so that artists might have a hope to monetize with private models,” adding that there’s another potential future of “bundl[ing] artists together into co-owned models.” Companies like Harmonai and Semilla are working with independent artists to build custom AI models using this private, fine-tuned approach.
Never Before Heard Sounds aggregates creative AI models into a single interface via their new, browser-based take on digital audio workstations (DAWs). One such feature is a library of timbre transfer models (including Holly+) with a “live-linking” approach for attributing specific artists’ AI models within the DAW file itself. (You can view our workshop with the Never Before Heard Sounds team here for more info on how this works.)
Karen Allen, founder of Infinite Album, popped into one of our Season 3 deep-dive workshops and shared that the company was in the process of creating an attribution mechanism for their underlying generative music engine.
Infinite Album targets an underserved niche of Twitch streamers who need interactive, copyright-safe background music. The tool differentiates itself by offering adaptive music that integrates with the game app/mod platform Overwolf. Users select different “vibes” for the music they want to generate, and that music then responds to in-game events: for example, the music can be programmed to sound sad when a character dies, or hyped up when the player achieves a goal.
Infinite Album commissions proprietary music from artists for their model training. Allen explained to Water & Music that the company can identify which artist’s data contributes to each “vibe” that a user chooses for the music they want to generate, allowing them to factor in an artist’s frequency of appearance in compensation models.
Note that Infinite Album combines traditional algorithms with machine learning to achieve a more transparent, controllable generation process, first breaking production down into small, atomic tasks. Each task is then backed by an independent generation system. This approach enables more precise control over the output, as well as detailed attribution of each task’s contributions.
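As a rough illustration of why that architecture helps, here is a hypothetical sketch of the bookkeeping a task-based system makes possible. The vibe names, task breakdown, and frequency-weighted payout rule are our assumptions for illustration, not Infinite Album’s actual implementation.

```python
# Hypothetical sketch of the bookkeeping a task-based system makes possible.
# The vibe names, task breakdown, and payout rule are illustrative assumptions,
# not Infinite Album's actual implementation.
from collections import Counter

# Each atomic generation task logs which commissioned artists' data it drew on.
session_log = [
    {"vibe": "sad",  "task": "melody",   "artists": ["artist_a"]},
    {"vibe": "sad",  "task": "pads",     "artists": ["artist_b"]},
    {"vibe": "hype", "task": "drums",    "artists": ["artist_a", "artist_c"]},
    {"vibe": "hype", "task": "bassline", "artists": ["artist_c"]},
]

# Count how often each artist's data appeared across the session...
appearances = Counter(artist for event in session_log for artist in event["artists"])
total_appearances = sum(appearances.values())

# ...then weight compensation by frequency of appearance.
session_revenue = 10.00
payouts = {
    artist: round(session_revenue * count / total_appearances, 2)
    for artist, count in appearances.items()
}
print(payouts)  # {'artist_a': 4.0, 'artist_b': 2.0, 'artist_c': 4.0}
```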
In contrast, large-scale AI models such as ChatGPT and Midjourney rely solely on deep learning, making attribution an immense challenge. This difficulty is expected to increase over time as large-scale AI models incorporate more user-generated content, allowing anyone to contribute training data.
In conclusion: The attribution advantage
Enhanced attribution functionality and standards for creative AI would offer numerous benefits to music-industry stakeholders. Aside from giving artists a clearer view of their “AI footprint,” attribution mechanisms could enable music AI startups to explore more flexible compensation models for training data providers. Time will tell whether offering differentiated compensation structures will help startups attract better proprietary data than their competitors, and whether that will prove to be a market advantage.
The alternative at the scale that many of these startups may want to operate in the future is simply… no attribution at all. If there is no attribution, there is no way to offer royalties or payments based on usage — in which case we would have to write new, AI-specific licensing frameworks on the backend from scratch. Is the music industry prepared for such a change?
Even more resources
Didn’t have time to drop into our Discord server this week? No worries. Stay up to date right here in your inbox with the best creative AI links and resources that our community members are sharing each week.
This is a truly communal effort — shout-out to @cheriehu, @aflores, @brodieconley, @moises.tech, @BenLondon12, @yung spielburg, @KatherineOlivia, @Brien, @jonlarsony, @itsmi.kee, @k0rn, and @Ragdolltk for curating this week’s featured links:
Music-industry case studies
- Dozens of entertainment trade organizations, including but not limited to A2IM, AIM, ASCAP, BMI, BPI, IFPI, NMPA, RIAA, and SESAC, signed the Human Artistry CampAIgn, laying out a set of principles for supporting human creators in an AI-driven world
- Tencent Music released a new virtual artist, Lucy
- Linkin Park released a music video made with the visual AI tool Kaiber (which we covered in a previous issue of the DAOnload)
- Taiwanese artist Sandee Chan released a new track where all vocals are performed by an AI trained on her voice
- Cyber PR is offering an AI Music Marketing course
- YouTubers are playing around with this Kanye voice AI model
- Developers are playing around with AI-generated covers and mashups of popular songs in the Diff-SVC Discord server
AI tools, models, and datasets
- Okio (new company creating Nendo, an open-source generative AI music tool suite)
- WavTool (GPT-4-powered plugin to interact with a DAW using natural language inputs)
- EDGE (new model to convert music to dance choreography)
- Firefly (Adobe’s new visual AI tool, trained on Adobe stock images)
- GPT-4 (OpenAI’s latest large language model update — which, as we mentioned above, has zero info on its training data)
- Bard (Google’s AI search tool — waitlist only)
- Fireflies (AI tool to transcribe and summarize meetings — not to be confused with Adobe’s Firefly)
Legal and commercial developments
- Generative AI startups raised nearly $6B of funding in 2022, up from $1.5B in 2021 (as reported by Music Ally)
- The US Copyright Office issued a clarifying brief on how they will treat works submitted for registration that were (partly) created by AI — tl;dr, it’s still case-by-case
- The US Federal Trade Commission published official guidance on consumer protection around voice clones and other forms of AI deepfakes
- The UK Government released a report on a pro-innovation approach to AI regulation
Other resources
- [podcast] Ezra Klein’s view on AI
- [paper] How large language models will impact the labor market
- [paper] Generative AI and the Digital Commons
- [newsletter] The AI Copyright Fight: A Guide
- [article] Voice AI is already getting past national security systems
- AI audio panel discussion featuring reps from Google, TikTok, Splice, and Stability/Harmonai
Follow more updates in Discord
- Keep an eye on our #ai-news-bulletin — our read-only channel where our research team curates the latest tweets related to creative AI news, tools, and developments, exclusively for members.
- Drop the coolest audio AI tools you find in #audio-ai-tools.
- Join the general community discussion in #ai-avatars.