Breaking down the music AI tech stack

Bit Rate is our member vertical on music and AI. In each issue, we break down a timely music AI development into accessible, actionable language that artists, developers, and rights holders can apply to their careers, backed by our own original research.

This issue was originally sent out under our collaborative research vertical, The DAOnload.


Last week, our community got an inside look at bleeding-edge music AI tools from Harmonai, the open-source music AI arm of Stability.ai, and Pollinations, a company focused on creating more accessible interfaces for existing AI models.

The first workshop, from Harmonai, walked us through the Google Colab notebook for their flagship generative music model, Dance Diffusion. We also learned how to fine-tune our own music AI models using our own audio files and catalogs. Have you ever wondered what an AI model based on you might generate? There is now a real, concrete path to exploring that question, thanks to models like Dance Diffusion.
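For the technically curious, here’s a rough sense of what that fine-tuning actually involves under the hood. The sketch below is an illustrative approximation, not Harmonai’s notebook code: the model class, sample rate, chunk length, and folder path are all stand-in assumptions, and the real Colab handles the architecture, data prep, and checkpointing for you behind a few form fields.

```python
# Illustrative sketch of fine-tuning a Dance Diffusion-style audio model on a
# personal catalog. Everything here (TinyDenoiser, hyperparameters, paths) is
# a stand-in assumption, not Harmonai's actual code.
import glob
import math

import torch
import torchaudio

SAMPLE_RATE = 48_000  # assumption; actual Dance Diffusion checkpoints vary
CHUNK = 65_536        # fixed-length training chunks in samples (assumption)

def load_catalog(folder: str) -> torch.Tensor:
    """Slice every .wav file in a folder into fixed-length mono chunks."""
    chunks = []
    for path in glob.glob(f"{folder}/*.wav"):
        audio, sr = torchaudio.load(path)
        audio = torchaudio.functional.resample(audio, sr, SAMPLE_RATE)
        audio = audio.mean(dim=0)  # downmix to mono for simplicity
        for start in range(0, audio.numel() - CHUNK + 1, CHUNK):
            chunks.append(audio[start:start + CHUNK])
    return torch.stack(chunks).unsqueeze(1)  # (n_chunks, 1 channel, samples)

class TinyDenoiser(torch.nn.Module):
    """Stand-in for the real U-Net; predicts the diffusion 'v' target."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv1d(2, 32, 9, padding=4),
            torch.nn.GELU(),
            torch.nn.Conv1d(32, 1, 9, padding=4),
        )

    def forward(self, x_t, t):
        # Feed the diffusion time to the network as an extra input channel.
        t_channel = t[:, None, None].expand(-1, 1, x_t.shape[-1])
        return self.net(torch.cat([x_t, t_channel], dim=1))

def v_diffusion_loss(model, x0):
    """One training step of the v-objective used by v-diffusion models."""
    t = torch.rand(x0.shape[0])                  # random diffusion times in [0, 1]
    alpha = torch.cos(t * math.pi / 2)[:, None, None]
    sigma = torch.sin(t * math.pi / 2)[:, None, None]
    noise = torch.randn_like(x0)
    x_t = alpha * x0 + sigma * noise             # noised audio
    v_target = alpha * noise - sigma * x0        # the "v" the model must predict
    return torch.nn.functional.mse_loss(model(x_t, t), v_target)

model = TinyDenoiser()  # a real fine-tune starts from a pretrained checkpoint
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
data = load_catalog("my_catalog")                # hypothetical folder of stems
for step in range(100):
    idx = torch.randint(0, data.shape[0], (8,))  # random minibatch of 8 chunks
    optimizer.zero_grad()
    loss = v_diffusion_loss(model, data[idx])
    loss.backward()
    optimizer.step()
```

The key difference in a real fine-tune is the starting point: you load one of Harmonai’s pretrained checkpoints instead of random weights, so the model only has to adapt to your catalog rather than learn audio from scratch.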

Colab notebooks are definitely for the more technically inclined — but Harmonai is open-source, so if anyone is interested in working on front-end designs for Dance Diffusion, their Discord server can be found here.

With Pollinations, founder Thomas Haferlach walked our community through several models that the company hosts on its Replicate page (see the sketch after this list for how these hosted models can be called programmatically):

  1. A prompt guide to help “speak the language” of a model
  2. An image generator, for which Pollinations developed a special interface that allows batch generations
  3. An upscaler to increase image quality
  4. An interpolator that appears to “blend” between two images
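
To make the “hosted on Replicate” part concrete, here is a minimal sketch of calling a Replicate-hosted model programmatically with the official Python client, batching prompts in the spirit of Pollinations’ interface. The model slug and input fields below are illustrative (a generic public text-to-image model), so check a model’s Replicate page for its actual identifier and input schema.

```python
# Minimal sketch: batch image generation against a Replicate-hosted model.
# Requires `pip install replicate` and a REPLICATE_API_TOKEN environment
# variable. The model slug and inputs are illustrative assumptions; older
# client versions also require pinning a version hash ("owner/name:abc123...").
import replicate

prompts = [
    "a neon-lit recording studio, synthwave style",
    "an orchestra of robots, oil painting",
]

for prompt in prompts:
    output = replicate.run(
        "stability-ai/stable-diffusion",  # example public model
        input={"prompt": prompt},
    )
    print(prompt, "->", output)  # typically a list of generated image URLs
```

The upscaler and interpolator follow the same pattern: each is just another hosted model with its own slug and input schema (an image URL or two, instead of a text prompt).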

You can access recordings of our workshops here.


Harmonai’s and Pollinations’ workshops showed us different layers of the generative AI tech stack for music.

A recent article on Andreessen Horowitz’s blog outlines three relevant tech layers for generative AI at large:

  1. Infrastructure: the compute and tooling (e.g. cloud platforms and GPUs) that models are trained and run on
  2. Models: the generative AI models themselves, whether proprietary or open-source
  3. Applications: the user-facing products built on top of those models

Each of these layers would not exist without the previous one, and each has its own questions to answer around business models and defensibility.

How does this breakdown apply to music?

Most of the music AI startups we’ve studied and interviewed operate at the model and/or application layers. For instance:

A select few operate at additional layers that a16z’s original blog post doesn’t mention, but that are of central importance to the music industry specifically:

The key overarching point to drive home: not all music AI companies are created equal, and they don’t all share the same goals or customers. Depending on where they sit in the music AI tech stack, they’ll prioritize different features, work with artists differently, and pursue different kinds of business models.


EVEN MORE RESOURCES

Didn’t have time to drop into our Discord server this week? No worries. Stay up to date right here in your inbox with the best creative AI links and resources that our community members are sharing each week:

Tools and models

Music-industry case studies

Legal developments

Other articles