The new music AI stack: Key takeaways from our NYC event
On November 20, Water & Music hosted an intimate evening in New York exploring practical applications of AI across music creation, voice synthesis, and rights management.
Our speakers — leaders from several market-leading music AI companies — shared concrete examples of how these tools are being integrated into daily workflows, while surfacing critical questions about attribution and value creation in an AI-first music industry.
Who was in the room
We welcomed over 40 industry leaders to the offices of Mach1, a spatial and immersive audio production studio and longtime member of the W&M community.
Our audience included representatives from:
- Music-tech: Spotify, SoundCloud, Amuse, Dolby, SiriusXM, Splice, Pex, and several early-stage startups
- Labels, publishers, and management: Roc Nation, Atlantic Records, gamma
- Big tech: Apple and YouTube
- Investment: J.P. Morgan, Raine, Bain Capital, GoldState Music
Our featured speakers were:
- Rebecca Hu — Product Manager, Suno
- Juliette Rolnick — Music Product, ElevenLabs
- Benji Rogers — Co-President, Sureel / Partner, Lark42
- Yung Spielburg — Head AI Analyst, Water & Music; Grammy-winning producer
TL;DR
- AI tools are maturing beyond novelty features into practical production instruments, with sophisticated control and editing capabilities becoming standard.
- Voice synthesis is finding strong product-market fit in promotional content and international reach, potentially transforming how artists approach global markets.
- Real-world workflows are emerging that blend AI efficiency with human creative direction, though questions of flexibility and rights remain unresolved.
- Attribution technology could fundamentally reshape how value is captured from AI music creation, shifting industry focus from market share to "attribution share."
Full recaps, links, and slides (where applicable) for each talk are included below.
Suno’s evolution in AI music creation
Rebecca Hu, Product Manager at Suno, demonstrated how their AI music generation platform has matured beyond basic creation tools.
Since announcing a $125M funding round in May 2024, Suno’s user base has more than doubled, with over 25M people having made a song on Suno to date. Artist partners like Timbaland are also investing in Suno as a professional-grade creative tool.
Their V4 model, released the day before our event, delivers several key improvements:
- Enhanced audio fidelity and vocal quality
- Improved lyrical coherence
- Better instrumental separation
- The ability to "remaster" older songs to the new model's quality level
New features like "Covers" and "Personas" also enable users to capture and extend specific musical styles. In a real-time demo, Hu used the glitchy pop persona “Sugarbean” to generate a theme song for Water & Music in seconds.
However, the underlying business model and rights structure behind Suno remain key topics of discussion amidst ongoing litigation.
Pro-tier Suno users receive full commercial rights to their generated music — but as our audience questions revealed, this applies only to master rights. The issue of publishing rights remains complex: Can one even hold publishing rights in an AI-generated song? Were any songwriters or publishing rights implicated in the generation process? And if you generate an instrumental and write your own topline over it, what then?
While Hu acknowledged these legal challenges, she noted that Suno is also actively working on their own attribution and monetization solutions behind the scenes.
Listen to Suno’s V4 showcase playlist, and revisit the Water & Music theme song Rebecca made here, using the Sugarbean persona (login required).
ElevenLabs’ creator-friendly voice cloning at scale
Juliette Rolnick discussed how ElevenLabs’ AI voice cloning technology is transforming marketing and fan engagement for companies of all sizes.
The ElevenLabs platform now serves both individual creators and major enterprises, with adoption by 60% of Fortune 500 companies. On the enterprise level, the two main use cases are multilingual content dubbing and conversational AI for customer service.
For creators, Rolnick highlighted several key applications:
- Creating high-fidelity voice replicas from existing recorded content like interviews and podcasts.
- Generating promotional content in 30+ languages while preserving the original vocal qualities.
- Producing content like podcast ads, social media voiceovers, and tour promotions without requiring additional recording sessions (see the sketch after this list).
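For developers, the heart of this workflow is a single API call. Below is a minimal sketch using ElevenLabs' documented text-to-speech REST endpoint; the API key, voice ID, and ad copy are placeholders, and production dubbing workflows use ElevenLabs' dedicated dubbing tools rather than this endpoint.

```python
# Minimal sketch: generating a short promo voiceover with ElevenLabs'
# text-to-speech endpoint. All values below are placeholders; dubbing
# into 30+ languages uses separate, dedicated endpoints.
import requests

API_KEY = "your-api-key"           # from the ElevenLabs dashboard
VOICE_ID = "your-cloned-voice-id"  # a voice built from existing recordings

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "Catch the tour this spring. Tickets on sale now.",
        "model_id": "eleven_multilingual_v2",  # multilingual model
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
    },
)
response.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default)
with open("tour_promo.mp3", "wb") as f:
    f.write(response.content)
```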
Rolnick also shared case studies demonstrating the real-world impact of voice AI:
- Travel creator Drew Binsky's AI-dubbed videos averaged 2.6M views, more than double the 1.1M average for his non-dubbed content. Dubbed versions of his older videos also drove 17% of his total views, showing how voice AI can help creators extend their content's lifecycle.
- Huberman Lab, the world's top health podcast, is partnering with ElevenLabs to reach new audiences in Hindi and Spanish, incorporating cultural nuances into translations with support from ElevenLabs’ in-house team.
- NBA star Luka Dončić's Luk.AI initiative uses voice cloning to create personalized fan experiences and marketing campaigns across multiple languages.
Voice cloning has emerged as one of the most actively pursued AI use cases in music. Major partnerships and deals in this vertical include HYBE’s acquisition of Supertone, Universal Music’s partnership with SoundLabs, Voice-Swap’s partnerships with SoundCloud and BMAT, and Lauv’s partnership with Hooky. (Importantly, translation tech has only been solved for speech; the wider AI community is researching how to translate vocals while maintaining melodic structure and timing, but no commercial solutions exist currently.)
An unexpected area of traction for ElevenLabs' voice cloning has also emerged in electronic music production, where producers are using the platform's voice design capabilities to create custom voiceover or conversational samples (thank you, Fred again..) for tracks.
ElevenLabs is also developing their own generative music tool to compete with platforms like Suno; however, they are taking a more measured approach to its release, with no public launch date yet.
Revisit Juliette’s slides here.
Professional AI production workflows in practice
Grammy-winning producer Yung Spielburg offered a candid perspective on integrating AI into professional music production workflows, demonstrating live how these tools are reshaping his creative process.
Through a real-world case study of creating a podcast theme, Spielburg walked through his multi-tool approach:
- Suno to generate initial ideas and elements, including an evocative slide guitar part
- AudioShake for stem separation to isolate useful components (see the sketch after this list)
- Samplab to convert complex audio elements into MIDI data – particularly useful for matching bass lines to intricate guitar parts
- Ableton Live for combining the above elements and crafting the final composition
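AudioShake is a hosted service, so its API isn't shown here. Purely to illustrate the stem-separation step, here is the same operation using the open-source Demucs model as a stand-in, assuming `pip install demucs` and a hypothetical input file:

```python
# Stem-separation step sketch, with open-source Demucs standing in for
# AudioShake. "suno_idea.mp3" is a hypothetical export of an AI-generated idea.
import subprocess

# Splits the track into vocals, drums, bass, and other stems as WAV files
# under separated/<model_name>/suno_idea/, ready to drop into Ableton Live.
subprocess.run(["demucs", "-o", "separated", "suno_idea.mp3"], check=True)
```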
He also demonstrated a practical solution for content creation challenges, showing how a combination of Midjourney, Runway, and CapCut could produce professional-looking social media content for under $50 per month.
While this workflow enables rapid iteration, Spielburg noted important tradeoffs for producers. When using AI-generated elements, he said, you can sometimes "paint yourself into a corner" with clients: unlike traditionally recorded music, where every element can be adjusted, AI-generated components often offer limited flexibility. A client's request to "push that word back" or "change that chord" might be impossible to honor without regenerating entire sections.
Spielburg’s primary concern for creators' future leverage wasn't AI's creative capabilities, but potential liability. "If I want to use [AI-generated] audio, am I setting myself up to be liable if there are takedowns later on?" he said.
Despite these challenges, Spielburg emphasized the practical necessity of embracing these tools to remain competitive. The focus, he said, should be on understanding each tool's strengths and limitations, while maintaining high standards for the final product – regardless of whether AI is involved.
Revisit Yung’s slides here, and our previous analysis of model controllability and professional AI workflows here.
The future of AI attribution
Benji Rogers of Sureel/Lark42 presented a breakdown of Sureel's AI attribution technology and his vision for the future of music rights in the AI era.
According to Sureel’s projections, 60% of all music created will involve AI components by 2027. Meanwhile, traditional music licensing models are at risk of becoming obsolete, as AI companies increasingly rely on synthetic data for training — a trend we covered in our previous webinar.
This shift makes comprehensive opt-out and attribution systems not just valuable, but essential for rights holders wanting to participate in this new economy — far beyond the string of surface-level AI policy statements published recently. As Rogers put it: "You don't need petitions, you need APIs."
At the core of Sureel's approach is technology that creates machine-readable "DNA" profiles of music — breaking down songs into their constituent elements, including vocals, instruments, MIDI data, and lyrics. This granular understanding enables rights holders to protect their existing works through opt-outs, and track and monetize their influence on AI-generated content when they opt in.
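Sureel hasn't published its data format, so purely as a hypothetical illustration, a machine-readable profile along the lines Rogers described might look something like this:

```python
# Hypothetical illustration only; Sureel's actual "DNA" format is proprietary.
# The idea: decompose a track into constituent elements, each carrying its own
# fingerprint and opt-in/opt-out flag for AI training.
from dataclasses import dataclass, field

@dataclass
class ElementProfile:
    kind: str                 # "vocals", "instrument", "midi", "lyrics"
    fingerprint: bytes        # machine-readable signature of this element
    ai_training_opt_in: bool  # the rights holder's choice for this element

@dataclass
class TrackDNA:
    isrc: str                 # standard recording identifier
    rights_holders: list[str]
    elements: list[ElementProfile] = field(default_factory=list)

    def opted_in(self) -> list[ElementProfile]:
        """Elements that rights holders allow AI models to learn from."""
        return [e for e in self.elements if e.ai_training_opt_in]
```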
To demonstrate the technology's real-world impact, Rogers analyzed 150 of the most-streamed songs on Suno and Udio against three well-known tracks: a famous country song, a hip-hop track containing a sample, and a gangster rap song. The analysis revealed three distinct levels of influence in the AI-generated content:
- Minor influence requiring minimal concern
- Substantial influence warranting closer attention
- Critical influence showing significant attribution that demands investigation
When Sureel specifically prompted these AI platforms to create songs similar to these three tracks, the attribution patterns became even more pronounced.
Notably, this influence crossed genre boundaries in unexpected ways. The famous country song's DNA appeared not just in country-style generations but also in rock and metal tracks; the hip-hop track showed strong attribution patterns across R&B, metal, and reggae. This suggests that rights holders need to think beyond traditional genre-specific licensing when considering their music's value in AI systems.
Sureel aims to make this attribution tracking accessible to creators at all levels — starting with free access for the first 100 tracks, before moving to per-asset pricing for larger catalogs and enterprise tiers for major rights holders.
To close out, Rogers advocated for shifting industry focus from market share to a new AI-centric value stream known as "attribution share." Rather than just tracking direct usage and streams, rights holders need to understand and capture value from how their works influence AI models' overall creative capabilities.
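Rogers didn't present a formula, but as a loose sketch of the concept: if an attribution system assigns each catalog an influence weight on every AI-generated track, a rights holder's "attribution share" is simply its summed influence normalized against everyone else's.

```python
# Loose sketch of "attribution share" (not Sureel's actual method).
def attribution_share(influence_by_holder: dict[str, float]) -> dict[str, float]:
    """Normalize summed influence weights into shares that sum to 1.0."""
    total = sum(influence_by_holder.values())
    return {holder: w / total for holder, w in influence_by_holder.items()}

# Toy example with made-up influence weights aggregated over generated tracks:
print(attribution_share({"Catalog A": 42.0, "Catalog B": 13.5, "Catalog C": 4.5}))
# {'Catalog A': 0.7, 'Catalog B': 0.225, 'Catalog C': 0.075}
```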
Whether this attribution-based revenue will significantly impact industry bottom lines — or disrupt the current, major-dominated power structure — remains to be seen.
Revisit Benji’s slides here, and read more about Sureel’s vision for attribution share over market share here.
Revisit Water & Music’s previous research on music AI:
- Music AI ethics tracker
- Three music AI trends to watch in 2025
- How music AI content and copyright detection actually works
- Suno’s $125M round highlights historic shift in music-tech investment
Have any additional questions or feedback? Please reach out to members@waterandmusic.com!