How music AI content and copyright detection actually works

When it comes to the value of music AI, data is like oil — and rights holders are determined to control the pumps.

As music AI applications gain millions of users and generate increasingly realistic outputs, the industry faces new challenges in copyright protection and fair artist compensation. Over 350 music industry organizations have signed ethics statements on music AI, emphasizing the importance of data transparency and artist consent in the model training process. Meanwhile, copyright infringement lawsuits and cease-and-desist letters against music AI startups are piling up, involving every major rights holder.

The next six months will define the future of the music business, as we move beyond philosophical and ethical debates to practical solutions for IP protection in an AI-led market.

Below, we outline the emerging supply chain of music AI content and copyright detection — from scanning models’ training data behind the scenes, to detecting music and voice deepfakes on platforms like YouTube and TikTok. We examine how each step works, who is involved, and why it matters.

The field is complex, rapidly evolving in real time — and may never be fully resolved.


I. Auditing AI models’ training data

Nearly every lawsuit filed by music rights holders against AI firms centers on one issue: the alleged use of copyrighted material to train AI models. The RIAA is suing Suno and Udio for copyright infringement, seeking up to $150,000 in damages per work infringed; Universal Music Group and other music publishers are going after Anthropic for ingesting song lyrics; Sony Music Group has warned 700 AI firms not to touch its data; the list goes on.

The challenge here is that we can't definitively know what data was used to train an AI model unless developers disclose it — or allow an audit. The music AI developer community is split on data transparency, with many prioritizing speed and revenue over ethical considerations. Some of the fastest-growing music AI startups, including Suno and Udio, have been opaque about their training data, with investors backing them even though they declined to secure licensing deals before raising funding.

Others see consensual data sourcing as both an ethical necessity and a competitive advantage. For those willing to be transparent, the main technical tool for auditing training data is audio content recognition (ACR) software, which checks the contents of a given training dataset against existing music catalogs and rights databases.
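
Conceptually, such an audit amounts to fingerprinting every file in a training corpus and checking each fingerprint against a registry of known works. The sketch below illustrates that loop only: the SHA-256 "fingerprint" and the in-memory RightsDatabase class are simplified stand-ins for the perceptual fingerprinting and licensed catalog lookups that commercial ACR vendors actually provide.

```python
# Illustrative sketch of an ACR-style training-data audit. The SHA-256
# "fingerprint" and in-memory RightsDatabase are stand-ins only: real ACR
# systems use perceptual audio fingerprints that survive re-encoding, and
# query licensed catalog/rights databases.
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional

@dataclass
class Work:
    title: str
    rights_holder: str

def fingerprint_file(path: Path) -> str:
    """Naive placeholder: hash the raw bytes of an audio file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

class RightsDatabase:
    """Toy lookup table mapping fingerprints to registered works."""
    def __init__(self, registry: Dict[str, Work]):
        self.registry = registry

    def lookup(self, fingerprint: str) -> Optional[Work]:
        return self.registry.get(fingerprint)

def audit_training_set(dataset_dir: Path, db: RightsDatabase) -> Dict[Path, Work]:
    """Scan every audio file in a training corpus and report any file that
    matches a registered, copyrighted work."""
    report = {}
    for path in dataset_dir.rglob("*"):
        if path.suffix.lower() not in {".wav", ".mp3", ".flac"}:
            continue
        work = db.lookup(fingerprint_file(path))
        if work is not None:
            report[path] = work
    return report
```

An audit report like this is only as good as the reference database behind it, which is why an ACR vendor's catalog coverage matters as much as its matching algorithm.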

Beyond ACR, high-level industry certification initiatives are also emerging, such as Fairly Trained, which certifies generative AI companies that train only on licensed or otherwise consented data.

Importantly, the technical rigor of these certifications remains unclear and has drawn critique from the wider AI community. For example, B2B image generator BRIA AI was quietly removed from Fairly Trained's list of certified models after questions arose around the true nature of the startup's training data. To date, the Fairly Trained team has not publicly explained why BRIA was removed, or why its vetting process did not catch the issues in the first place.


II. Scanning user uploads

Some music AI apps like Suno, Udio, and Cassette allow users to upload audio files as inspiration for AI-generated output, a functionality known as “audio prompting.”

While innovative, this feature clearly poses legal risks if users upload copyrighted songs as references. To mitigate this risk, music AI tools can use the same ACR technology to scan user uploads, to ensure they are original and not sourced from copyrighted material. Stable Audio’s closed-source 2.0 model uses Audible Magic for this purpose.
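
A minimal sketch of what that gate might look like inside an upload pipeline is below, assuming a hypothetical acr_client with an identify() method; real services such as Audible Magic expose their own, different APIs, and the block-or-allow logic here is purely illustrative.

```python
# Sketch of a pre-ingestion check for "audio prompting". `acr_client` is a
# hypothetical ACR service wrapper, not a real vendor API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AcrResult:
    matched: bool
    work_title: Optional[str] = None

class UploadRejected(Exception):
    pass

def accept_audio_prompt(audio_bytes: bytes, acr_client) -> bytes:
    """Run an uploaded reference clip through ACR before it reaches the
    generation model; block anything that matches a known recording."""
    result: AcrResult = acr_client.identify(audio_bytes)
    if result.matched:
        raise UploadRejected(
            f"Reference audio matches a copyrighted recording: {result.work_title}"
        )
    return audio_bytes  # safe to pass along as a style/inspiration prompt
```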

The concept of flagging user uploads for copyrighted material isn't new. YouTube has long used its proprietary Content ID system for this purpose, while rights holders use third-party services like Audible Magic and Pex for similar monitoring and flagging across dozens of social media and UGC platforms.

However, rights holders generally encourage user-generated content around music, as long as there are proper licenses in place. In contrast, they take a “default block” stance against AI: No copyrighted material is permitted as a user-uploaded reference for audio prompts.


III. Detecting AI-generated music or vocals

As AI-generated music becomes more prevalent, distinguishing real from fake grows increasingly difficult. A recent YouGov survey found that only one in five Americans are confident they can spot the use of AI in music creation or completion.

In response, several companies, including Deezer Research, IRCAM Amplify, CoverNet, and Pex, are developing tools for detecting AI-generated tracks, including both wholesale AI generations of full songs and voice clones that impersonate existing artists.

These tools target B2B customers like streaming services and distributors that are grappling with the influx of AI-generated music and want to ensure catalog quality by blocking uploads of infringing or low-value AI-generated content. For instance, Deezer recently offloaded 26 million tracks — amounting to 13% of its catalog — as part of a wider shift towards quality over quantity of engagement on its platform. It makes sense that the company would eat its own dog food and build its own deepfake detection tech to keep its platform clean.

Most tools treat AI output detection (“was this track generated by an AI model?”) as separate from music copyright detection (“is this output infringing on existing copyrights?”). Each solution has a distinct technical backbone, and relies on different reference data.

For example, tools like IRCAM Amplify’s AI Generated Music Detector and CoverNet’s AI detection feature can identify not only whether a certain track was AI-generated, but also which specific AI model (e.g. Suno versus Udio) generated it. To accomplish this, companies generate outputs using existing music AI tools, then train a detection model to distinguish those outputs from a benchmark dataset of non-AI music. Deezer built its own detection AI by running 25,000 real songs (from the Free Music Archive) through open-source AI music generators, then training a model to spot subtle imperfections in visual representations of the music, known as spectrograms.
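
The sketch below shows the general shape of that approach, not Deezer's actual system: clips from two local folders (one of real recordings, one of AI generations, both assumed to exist) are converted to mel-spectrograms, and a small convolutional network is trained to separate the two classes.

```python
# Minimal sketch of a spectrogram-based AI-music classifier, assuming two
# local folders of WAV clips ("data/real" and "data/ai_generated"). Model
# size, clip length, and training loop are illustrative, not a production
# detector.
from pathlib import Path

import torch
import torchaudio
from torch import nn

SAMPLE_RATE = 16_000
CLIP_SECONDS = 10
mel = torchaudio.transforms.MelSpectrogram(sample_rate=SAMPLE_RATE, n_fft=1024, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

def load_clip(path: Path) -> torch.Tensor:
    """Load audio, downmix to mono, resample, pad/trim to a fixed length,
    and return a dB-scaled mel-spectrogram of shape (1, n_mels, frames)."""
    wav, sr = torchaudio.load(str(path))
    wav = torchaudio.functional.resample(wav, sr, SAMPLE_RATE).mean(0, keepdim=True)
    target = SAMPLE_RATE * CLIP_SECONDS
    wav = wav[:, :target]
    wav = torch.nn.functional.pad(wav, (0, target - wav.shape[1]))
    return to_db(mel(wav))

class SpectrogramClassifier(nn.Module):
    """Tiny CNN that outputs logits for [real, ai_generated]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),
        )

    def forward(self, x):
        return self.net(x)

def train(real_dir="data/real", ai_dir="data/ai_generated", epochs=20):
    items = [(p, 0) for p in Path(real_dir).glob("*.wav")] + \
            [(p, 1) for p in Path(ai_dir).glob("*.wav")]
    x = torch.stack([load_clip(p) for p, _ in items])
    y = torch.tensor([label for _, label in items])

    model = SpectrogramClassifier()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model
```

A production detector would differ in scale and training data, but the core idea is the same: the hard part is assembling paired real and AI-generated audio, not the classifier architecture itself.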

Separately, Pex is developing a voice AI detection tool to address the growing black market of deepfake AI covers on YouTube, TikTok, and other platforms (à la “Heart On My Sleeve”). Pex’s proposed approach involves matching biometric traits of voices against an existing database of voice fingerprints.


This process is more complex due to the lack of a comprehensive, platform-agnostic database of voice IDs tied to artist names, and the fact that some artists may have multiple distinct vocal styles throughout their careers. Some industry sources claim that specifically trained audio fingerprints can still identify singers, just as fingerprints can be trained to detect covers; for instance, Audible Magic still relies primarily on its cover detection technology (known as Version ID) to detect voice AI clones. But this is not a perfect system, because it can’t explicitly identify whether the voice clone is impersonating a specific artist. In other words, while ACR technology can be used to confirm that a “Michael Jackson AI cover” on YouTube is indeed a cover of an existing Michael Jackson song (i.e. a re-recording of an existing MJ composition), it cannot tell you whether the voice AI clone is impersonating Jackson or another artist entirely.
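
At its core, though, the matching step Pex describes resembles a nearest-neighbor search over voice embeddings. The sketch below assumes a hypothetical embed_voice model and an in-memory registry of per-artist reference embeddings; Pex's actual fingerprinting technology is proprietary, and the threshold is illustrative.

```python
# Sketch of the matching step only. `embed_voice` is a stand-in for any
# trained singer/speaker-embedding model; Pex's real voice fingerprints and
# thresholds are proprietary and unknown.
from typing import Dict, List, Optional

import numpy as np

def embed_voice(vocal_audio: np.ndarray) -> np.ndarray:
    """Placeholder: map an isolated vocal track to a fixed-size embedding."""
    raise NotImplementedError("swap in a real voice-embedding model")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_voice(vocal_audio: np.ndarray,
                   registry: Dict[str, List[np.ndarray]],
                   threshold: float = 0.8) -> Optional[str]:
    """Compare a suspected voice clone against registered artist embeddings.
    Each artist can have several reference embeddings (different eras or
    vocal styles); the best single match wins if it clears the threshold."""
    query = embed_voice(vocal_audio)
    best_artist, best_score = None, threshold
    for artist, references in registry.items():
        score = max(cosine(query, ref) for ref in references)
        if score > best_score:
            best_artist, best_score = artist, score
    return best_artist
```

Keeping multiple reference embeddings per artist is one way to handle the "multiple vocal styles" problem noted above, but it still presupposes the very thing the industry lacks: a comprehensive, consented registry of artist voice IDs.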

Moreover, while these detection tools all claim accuracy rates of 98%–99%, success is a moving target, which makes such figures hard to interpret. As AI improves at making music, detection methods must keep pace with model updates, which occur on a weekly basis. Updating a detection system to accommodate new models can take anywhere from one day to one month, depending on a team's technical and financial resources.

Hybrid content is also tricky. Currently, no tool can identify which specific part of a song is AI-generated if only a portion of it (e.g. just the vocals, backing chords, or percussion) is artificial. Several music companies mentioned in this article are actively working on developing this functionality, as it's in high demand from industry customers.


IV. Watermarking and self-reporting AI-generated works

UGC apps like TikTok, YouTube, and Instagram are implementing their own tools for clearly labeling AI-generated music and visual media on their platforms, including manual tagging and automatic watermarking.

Watermarking has a long, contentious history in digital music, dating back to embedding marks in digital music downloads to help rights holders track unauthorized sharing and usage (remember “DRM”?). In the case of AI, watermarking is driven less by commercial incentives and more by regulatory mandates. In 2023, major tech firms such as OpenAI, Alphabet, Meta, and Adobe made voluntary commitments to the White House to implement watermarking measures for enhancing AI safety.

One year after these pledges, AI content watermarking by tech companies is starting to show real implications for music. For example, Google has implemented SynthID watermarking for outputs from its AI models — including Lyria, which powers music and voice AI tools on YouTube Shorts. For audio AI outputs, Google’s method involves converting the generated sound into a visual representation (a spectrogram), embedding a hidden watermark, and converting it back into audio. SynthID can then scan content on apps like YouTube to detect these watermarks, helping users identify if something was created by Google's AI tools.
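
The toy sketch below illustrates only the general "embed in a spectrogram, then invert back to audio" idea; Google has not published SynthID's audio algorithm, and the key-seeded pattern, strength value, and detection statistic here are all invented for illustration.

```python
# Toy illustration of embedding a watermark in a spectrogram and inverting
# back to audio. This is NOT Google's SynthID algorithm (which is not
# public); the key-seeded pattern, strength, and detection statistic are
# invented for illustration only.
import numpy as np
from scipy.signal import stft, istft

def _pattern(key: int, shape) -> np.ndarray:
    """Key-seeded pseudo-random +-1 pattern over the spectrogram grid."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=shape)

def embed_watermark(audio: np.ndarray, fs: int, key: int, strength: float = 0.02) -> np.ndarray:
    """Nudge STFT magnitudes with the keyed pattern, keep the phase, resynthesize."""
    _, _, Z = stft(audio, fs=fs, nperseg=1024)
    mag, phase = np.abs(Z), np.angle(Z)
    mag *= 1.0 + strength * _pattern(key, mag.shape)
    _, out = istft(mag * np.exp(1j * phase), fs=fs, nperseg=1024)
    return out[: len(audio)]

def detect_watermark(audio: np.ndarray, fs: int, key: int, threshold: float = 0.01) -> bool:
    """Correlate log-magnitudes with the keyed pattern: watermarked audio shows
    a positive bias, clean audio hovers near zero. Assumes the clip has not
    been trimmed, pitch-shifted, or heavily re-encoded."""
    _, _, Z = stft(audio, fs=fs, nperseg=1024)
    log_mag = np.log(np.abs(Z) + 1e-9)
    score = float(np.mean(log_mag * _pattern(key, log_mag.shape)))
    return score > threshold
```

Real schemes are engineered to survive compression, pitch-shifting, and re-recording, which is exactly where naive approaches like this one break down.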

Similarly, Adobe — which has released nearly a dozen different generative AI features across its suite of products, including Photoshop and Illustrator — created its own symbol that can be added to the metadata of images, videos, and PDFs to indicate AI use. Images created with Adobe’s AI features are now labeled as "Made with AI" on Instagram and other Meta platforms. TikTok and Instagram also offer users the option to self-report if their uploaded content incorporates generative AI.

While these solutions are widely adopted — by over 37 million creators, in the case of TikTok’s self-reporting tools — they are incredibly vulnerable to inaccuracy and manipulation. Self-reporting systems rely on user honesty to be effective, while nearly every AI watermarking system has been tampered with by researchers. And labels may unintentionally misrepresent creators who use AI as only a minor part of their process. For example, a single pixel edited with Photoshop’s AI-powered tools could flag an entire post on Instagram as AI-generated, alarming creators who unexpectedly see a "Made with AI" designation on their largely human-driven work.

Now is the time for the music industry to monitor these developments and assess their implications for music AI tools. After all, many music AI features in popular apps like Logic Pro, BandLab, and Ozone treat AI as just one element of a wider creative workflow; it is debatable whether songs made using these tools should be labeled as "AI-generated."


The big picture: Suno and Udio lawsuits

Where do the music industry’s latest lawsuits against Suno and Udio fit into the picture? A closer look at the complaints reveals how the music industry at large is addressing the lack of explicit knowledge about AI training data.

In both suits, the plaintiffs conducted systematic tests of the AI music models — using specific prompts designed to target particular copyrighted songs, then analyzing the outputs for similarities to those songs. Their argument relies on inferential evidence, with rights holders claiming that it would be impossible to achieve the quality and variety of outputs that Suno and Udio allow without training their models on vast amounts of copyrighted music.
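
In code, the automated half of that exercise might look something like the sketch below, which uses chroma features and dynamic time warping as a crude similarity signal. The generate callable is a hypothetical wrapper around a music AI service, and the flagging threshold is arbitrary; the plaintiffs' actual analysis relied on manual, case-by-case comparison.

```python
# Rough sketch of a "prompt, generate, compare" screen using chroma features
# and dynamic time warping as a crude similarity signal. `generate` is a
# hypothetical wrapper around a music AI service, and `flag_below` is an
# arbitrary illustrative threshold.
import numpy as np
import librosa

def chroma(path: str) -> np.ndarray:
    """Pitch-class (chroma) representation of a track, shape (12, n_frames)."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    return librosa.feature.chroma_cqt(y=y, sr=sr)

def similarity_cost(generated_path: str, reference_path: str) -> float:
    """Normalized DTW alignment cost: lower means more harmonically similar."""
    a, b = chroma(generated_path), chroma(reference_path)
    cost_matrix, path = librosa.sequence.dtw(X=a, Y=b, metric="cosine")
    return float(cost_matrix[-1, -1] / len(path))

def probe_model(generate, prompts_and_refs, flag_below: float = 0.15):
    """Prompt the model with text aimed at specific recordings and flag any
    output whose alignment cost to the reference falls below the threshold."""
    flagged = []
    for prompt, reference_path in prompts_and_refs:
        output_path = generate(prompt)  # hypothetical call returning a file path
        cost = similarity_cost(output_path, reference_path)
        if cost < flag_below:
            flagged.append((prompt, reference_path, cost))
    return flagged
```

A low score here would only flag a candidate for the kind of manual, side-by-side comparison described in the complaints; it is evidence of similarity, not proof of what was in the training set.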

This methodology is imperfect, and can lead to ambiguous results. The case-by-case manual inspection of outputs is time-consuming, and can never provide 100% certainty on the contents of a training dataset.

But just because a particular method is imperfect doesn’t mean it won’t hold up in court. Even as the field of music AI continues to evolve, the industry's strategy for litigating against AI companies mirrors previous lawsuits against big tech — namely, apply legal pressure in the absence of transparency. The message is clear: "Cooperate, or we will litigate."

