This article was originally published on Resident Advisor as part of Water & Music founder Cherie Hu’s guest-edited month of June, featuring specially curated content exploring new possibilities in music and tech.
By now, ChatGPT is a household name and has been integrated into the daily workflows of engineers, researchers and students all over the world. Image generators like Midjourney and visual AI platforms such as RunwayML are changing the way game developers, designers and filmmakers approach their craft. However, it’s only recently that musical applications have entered the headlines.
Why has music AI lagged behind in both development and adoption? One big reason is funding. More than $7 billion was invested in generative AI in 2022, and less than five percent of that went to music-related projects. Another factor is the data bottleneck: finding sufficient volumes of high-quality, cleared data to train on is very difficult. Many of the popular image generators and LLMs (Large Language Models) have been trained on vast swaths of copyrighted material. Even flirting with copyright infringement in the music business can make your product dead on arrival (see Musicfy, which was shut down for rights violations and relaunched with clean models). As a result, many music AI companies interested in commercialising their products have been commissioning proprietary data, licensing from sound libraries and using open-source datasets.
That said, emerging music technologies have danced on the legal and ethical lines before. The iPod was released roughly eighteen months before the iTunes Music Store launched—what did Apple think we were all putting on our iPods?—and YouTube was a cesspool of rights violations for years until Google ensured rights holders would be paid.
To compare current visual and music AI models, we need to line up a well-known image generator (think DALL-E, Stable Diffusion or Midjourney) alongside Google’s MusicLM. The latter was built in part on MuLan, which spans 44 million recordings totaling 370,000 hours (roughly 42 years) of audio, and the Free Music Archive dataset, which includes 343 days of Creative Commons-licensed audio. (We can assume the MuLan data is composed largely of YouTube sources and therefore includes copyrighted works.)
This music model is, in my opinion, the highest-fidelity synthographic music generator currently available. Synthographic means that while the model is trained on previously existing recordings, it doesn’t directly manipulate any of them in the creation of new compositions and audio files. Instead, these generators employ deep-learning techniques such as diffusion and other neural networks to learn elements like structure, chords, melody, instrumentation, tempo, key and genre more indirectly.
Synthographic models can either perform raw audio synthesis or generate MIDI data that triggers sounds in a proprietary DAW. While the former has the greatest potential—giving users the most creative freedom—applications performing raw audio synthesis are largely still prototypes, and many of the technologies reviewed below use the latter process. Generating MIDI that triggers preset sounds means the actual sounds the user hears depend on sound design from producers. Even so, these tools are being used to create music that’s competitive with the songs we listen to every day.
Below is a rundown of some of the currently available AI tools.
Soundful competes with both sound libraries for content creators and sample libraries for producers and artists. The tool allows users to generate full pieces of music (which can later be separated into stems) or individual parts one at a time. The sound quality is comparable to high-quality sound libraries—which is to say, it sounds professional but not wildly inventive. Why highlight Soundful? The compositions. I generated multiple pieces with sophisticated, tasteful and fun harmony and arrangements. I actually found myself transcribing the progressions, and debating shelling out $50 for stems.
To get started, head to the search bar at the top of the page. Soundful is in beta, so you may run into a “coming soon” message from time to time as you narrow your search by common attributes such as genre, mood or artist. The artist list is more of a prototype than a fully realised feature, running from 808 Mafia to Alok in a seemingly partial list. This kind of direct artist reference also raises questions about attribution: even if these artists’ works aren’t part of the dataset used to train the model, what are the legal and ethical implications of using their names as prompts?
Once the user decides which kind of music they want to create, Soundful generates a two-to-four-minute track known as a “preview.” The free tier allows for ten downloads per month of full tracks, but if the user wants stems, that requires a paid subscription. Users can purchase the copyright for their output (the assumption is both the composition and sound recording, but this is unspecified) for $50, but only if they’re already paying $10 per month (or $89 per year).
There are endless boutique sample packs out there—as well as Splice or Loopmasters—but Soundful guarantees users that every generation is unique. While most “previews” will be a pass, producers and artists only really need one big YES to inspire an idea. The fact that I was able to generate anything that might inspire a song is notable.
BandLab is a platform for both music creation and music discovery. Creation happens in BandLab Studio, a proprietary, browser-based DAW that sits between GarageBand and Logic in terms of complexity. BandLab also has a social component that allows users to make profiles for sharing and promoting their music, sortable by skill, location and genre. Discovery follows a familiar format, with “editor’s picks,” “recent releases” and “featured artists,” as well as the ability to follow artists and watch livestreamed shows from top creators on the platform. Producers and artists can fly solo or find collaborators through the “creator connect” feature.
The AI-powered “songstarter” allows a user to select from a menu of genres and generate 30-second previews of tracks that can either be saved for later or discarded. There’s also a pair of digital dice you can roll if you’d rather leave things to chance.
The generated tracks can then be brought into BandLab Studio where they’re separated into individual MIDI tracks that users can manipulate by adding instruments or recording new audio on top. Not only was I able to generate tracks worth building on, some of BandLab’s vocal presets were really good. Its auto-pitch plug-in was a standout. The platform has a few different vocal settings—classic, modern rap (duplicating the vocal recording and pitching it down the octave to play along with the original recording), robot (emulating a talk box), big harmony (stacking multiple voices) and duet (adding a third harmony and adjusting the formant so it sounds like a different voice singing).
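As a toy illustration of what that octave-down doubling does, here is a minimal sketch in NumPy. It is an assumption-laden stand-in, not BandLab’s actual signal processing: a pure sine wave substitutes for the vocal, whereas a real auto-pitch plug-in would pitch-shift the recorded audio itself (typically with a phase vocoder to preserve timing).

```python
import numpy as np

# Toy sketch of octave-down vocal doubling. A 440 Hz sine stands in for
# the lead vocal; a real plug-in would pitch-shift the actual recording.
sr = 44100                                   # sample rate in Hz
t = np.linspace(0, 1.0, sr, endpoint=False)  # one second of time
lead = np.sin(2 * np.pi * 440.0 * t)         # "vocal" at A4
octave_down = np.sin(2 * np.pi * 220.0 * t)  # same note one octave lower (A3)
mix = lead + 0.8 * octave_down               # doubled layer sits under the lead

# The mix now carries energy at both pitches: the two loudest
# frequency bins sit at 220 Hz and 440 Hz.
spectrum = np.abs(np.fft.rfft(mix))
freqs = np.fft.rfftfreq(mix.size, d=1 / sr)
top_two = sorted(float(f) for f in freqs[np.argsort(spectrum)[-2:]])
print(top_two)  # [220.0, 440.0]
```

Halving the frequency is what “down the octave” means in signal terms, which is why the doubled layer reads as the same note with added low-end weight rather than a different melody.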
BandLab’s browser-based DAW is impressive. Latency is one of the hardest variables to solve in these kinds of tools and I didn’t find latency a hindrance to the creative process at all, at least with smaller track counts. There will be some yearning for more detailed editing and organisational capabilities for producers and artists coming from advanced DAWs such as Logic, Ableton Live or Pro Tools, but overall there’s a lot here. The feature of note—the ability to generate full tracks with the click of a button—dramatically shortens the time it takes to start to see a full picture.
I have no intention of fully producing in BandLab regularly, but I would definitely export and process certain files using the tool. That said, this is exactly how I ended up converting from Pro Tools to Ableton a few years ago.
CoSo + Create (Beta) by Splice
Splice’s CoSo is a free mobile app that uses AI to organise sample “layers” from its existing sample library into “stacks.” The technology also employs an audio engine that can manipulate existing samples’ tempo and pitch to match each other. The user selects a starting style, and CoSo browses Splice’s library for a collection of loops that go together. The software defaults to four layers, but users can add up to four more, choosing from any of the 14 categories of instruments offered in Splice’s full browser or desktop tool. Just this month, Splice rolled out Create, a beta release of this functionality for desktop. In addition to all of CoSo’s functionality, Create can export your project as an Ableton Live session: Splice creates an Ableton file with the stems pre-loaded into clip view. I still need to take the extra step of recording the stems into arrangement view, but that’s a small ask compared to the amount of friction these tools eliminate.
CoSo is on this list because I use it daily. The app allows users to save samples that they like directly to the desktop app, which automatically syncs upon launch. Instead of doom-scrolling Twitter or wandering aimlessly on TikTok, Instagram or YouTube, why not browse CoSo for inspiration?
Next time you sit down to write new music, you’ll have a batch of samples pre-screened and ready to go. Anecdotally, I know songwriters and producers who do this in between sessions in their Lyfts and Ubers so when they arrive there’s a little inspiration in their pocket, if needed.
Stem-splitting software allows users to take a single audio file and separate it into individual instruments. This technology has been around for years: Audioshake, LALAL.AI, iZotope RX 9, Acoustica 7.3, XTRAX Stems and Spleeter are just a few examples. I see two main use cases for this technology. One, it can be a lifesaver for artists who need stems in a pinch, either for live shows or for sync requests (it’s not uncommon for original sessions to be lost or for producers to become unresponsive). Two, stem splitters make it easier to do remixes and create mash-ups—DJ software such as Serato DJ has even integrated this kind of technology to enable on-the-spot mash-ups.
Though some get close, I’ve yet to come across a 100 percent flawless stem separation. For my money, Audioshake and LALAL.AI are the highest quality: they have the fewest artefacts, intuitive workflows and straightforward pricing. Stems can be purchased either à la carte or in batches. Audioshake offers nine stem options, most of which are $10 per stem, though some cost more (instrumentals are $25); users can get the full batch of nine stems for $50. LALAL.AI has a similar pricing setup.