Breaking down the music AI tech stack
Bit Rate is our member vertical on music and AI. In each issue, we break down a timely music AI development into accessible, actionable language that artists, developers, and rights holders can apply to their careers, backed by our own original research.
This issue was originally sent out under our collab research vertical known as The DAOnload.
Last week, our community got an inside look at bleeding-edge music AI tools from Harmonai, the open-source music AI arm of Stability AI, and Pollinations, a company focused on creating more accessible interfaces for existing AI models.
The first workshop from Harmonai walked us through the Google Colab notebook for their flagship generative music model, Dance Diffusion. We learned how to fine-tune our own music AI models as well, using our own audio files and catalogs. Have you ever wondered what an AI model based on you might generate? There is now a real, concrete path to exploring that question, thanks to models like DD.
Colab notebooks are definitely for the more technically inclined — but Harmonai is open-source, so if anyone is interested in working on front-end designs for Dance Diffusion, their Discord server can be found here.
With Pollinations, founder Thomas Haferlach walked our community through several models that they are hosting on their Replicate page:
- A prompt guide to help “speak the language” of a model
- An image generator, for which Pollinations developed a special interface that allows batch generations
- An upscaler to increase image quality
- An interpolator to seemingly “blend” between two images
You can access recordings of our workshops here:
Harmonai and Pollinations’ workshops showed us different layers of the generative AI tech stack for music.
A recent article on Andreessen Horowitz’s blog outlines three relevant tech layers for generative AI at large:
- Infrastructure — i.e. hardware (GPUs, TPUs) and cloud platforms (Amazon’s AWS, Microsoft’s Azure) for compute power and processing.
- Models — e.g. Stable Diffusion, GPT-3, and DALL·E, which ultimately power third-party AI tools either through APIs or through open-source checkpoints.
- Applications — i.e. consumer-facing products (ChatGPT, RunwayML) using generative AI, running on either proprietary or licensed models.
Each of these layers builds on the one before it, and each has its own questions to answer around business models and defensibility.
How does this breakdown apply to music?
Most of the music AI startups we’ve studied and interviewed operate at the model and/or application layer. For instance:
- Harmonai is focused on building new AI models for music that can power third-party applications behind the scenes.
- In contrast, the likes of Pollinations are more focused on building a better front-end creative UX with the flexibility to tap into multiple different third-party models underneath, instead of building their own models in-house.
A small few operate at additional layers not mentioned in a16z’s original blog post, but which are centrally important to the music industry specifically:
- Content distribution. For instance, Boomy allows users not only to create music with AI, but also to distribute and monetize that music on streaming services like Spotify and social apps like TikTok, participating in the incumbent music economy alongside those songs that are not AI-generated. Because music copyright has so many more moving parts than in the visual and text AI worlds, music distribution is its own separate beast in the AI tech stack — and could be a significant competitive advantage for music AI startups, if they have the resources and industry know-how to get it right.
- Consumer formats and UX (as opposed to just creative UX). For instance, Bronze is hoping to pioneer a new interactive consumer format for AI-generated music, in the same category as the MP3. The format they have in mind will allow artists to provide every listener with a different version of their song each time it is played. (Their web-based release with Jai Paul is a good example of this technology in action.) The MP3 was a patented technology that required licenses to use, and generated enormous value for the format’s creators. Next-gen format patents that drive core consumer products could also be another clear path to defensibility for music AI companies.
The key overarching point to drive home is that not all music AI companies are created equal, or have the same goals or customers. Depending on where in the music AI tech stack they sit, they’ll be prioritizing different features, working with artists differently, and pursuing different kinds of business models.
EVEN MORE RESOURCES
Didn’t have time to drop into our Discord server this week? No worries. Stay up to date right here in your inbox with the best creative AI links and resources that our community members are sharing each week:
Tools and models
- Respeecher (Voice cloning and substitution)
- Perplexity.ai (AI-powered search engine)
- Video2Music (Qosmo demo generating music for videos)
- ROME (avatar creation from just a single image)
- Audio Generation with Diffusion (Flavio Schneider’s mind-blowing examples of audio diffusion and upscaling)
Music-industry case studies
- AI-powered stans — some Ariana Grande fans are using AI voice synthesis/transformation models like DiffSVC to make Ariana sing songs by other celebrities, like “Kill Bill” by SZA.
- drayk.it — thanks to this new tool from mayk.it’s head of product Neer Sharma, you can make Drake sing about anything in under one minute.
- AI and music journalism — TIDAL commissioned this piece from Simon Reynolds about whether AI tools like ChatGPT can replace music writers.
Legal developments
- Responsible AI Licenses (RAIL) — call for participation in working groups around tooling governance, public policy, license development, and more
- Deepfakes in the courts — article explaining how even audio deepfakes can be presumptively valid under UK law
Other articles
- 10 Best AI Voice Generators (Unite.ai)
- AI Art and the Problem of Consent (ArtReview)
- AI in 2023: The Application Layer has Arrived (Digital Native)