Paradigm shifts: How AI will transform music creation in the next 10 years
This article is the second installment in a two-part report about the state of creative AI for musicians, and comes with a member-exclusive database of over 40 different creative AI tools, ranging from off-the-shelf web applications to more customizable APIs and machine-learning models. You can access the database here, and read part one of the report here.
A handful of W&M community members helped us curate tools and resources for this database; they are listed at the bottom of this article.
Introduction
Imagine it’s 2032. You’re an emerging musician grinding away on your debut album. But it’s getting late; you’ve been in the studio all day, and you just can’t find your groove.
You decide to call it and try again tomorrow — but before you head home, you open up a new AI plugin you’ve heard your producer friends raving about. The plugin asks you for a handful of songs you’d like to use as inspiration (just as producers typically have “reference tracks” in mind when beginning a session), a rough tempo range, and a key that the singer you have in mind is comfortable with. Once you’ve selected your preferences, you’re pleasantly surprised to come across a newly added feature that lets you choose from a list of famous producers and songwriters you’d like to collaborate with, populated by fully licensed AI emulations they’ve helped create. That ultra top-tier pop songwriter whose work you’ve been studying for years? The plugin can use her licensed AI neural model to generate 10 toplines that sound awfully close to her No. 1 Billboard hits, but different enough to be something new.
The next morning, you come back to your studio desk. To your amazement, the plugin has a fresh session set up for you, populated with dozens of world-class starter ideas based on your preferences and inputs — chord progressions, rhythms, basslines, drum loops, guitar loops, and samples — that would have otherwise taken hours or even days to conceive of yourself. Reinvigorated with inspiration, you get back to work.
This situation might feel beyond your wildest dreams, but today’s tech companies are already laying the infrastructural foundations required to make it happen. We mapped out the current state of these foundations in part one of our report on creative AI tools for artists, published in May 2022. Through our own market research and interviews with music-tech executives, we found dozens of apps and plugins available for artists to incorporate AI and machine learning (ML) into their creative workflows today — from song ideation and sound selection, to source separation, vocal synthesis, and mixing and mastering. All in all, AI and ML are ushering in a whole new music-making paradigm where these fundamental compositional and engineering skills could be replaced by automated processes and tools.
In part two of our report, we’ll shift our temporal focus from the present day to the next five to ten years — considering the meta-trends and bleeding-edge technologies that could radically transform the act of making music, and who exactly is a “musician.”
We’ll map out these potential changes in the form of three paradigm shifts:
- The rapid expansion of material that can be remixed or sampled, through stem separation and timbre transfer
- The creation of recordings that are dynamic and hyper-personalized, rather than static
- The augmentation of one’s musical skills through sound modeling and resynthesis directly within plugins
Like all technological developments, the trends we’ll attempt to synthesize here will have a mix of benefits and drawbacks. How far a fully AI-facilitated music creation model goes — and whether a producer using it is a “real” musician — will be the subject of intense debate in the coming decade. Are such tools merely assistive creative catalysts that require human input to craft a polished output, akin to a powerful laptop running Ableton Live? Or are they fully autonomous music-making machines that leave all but the most skilled human producers and songwriters looking for a new line of work? These advances may also have markedly different impacts across the music industry, depending on the perspective of the artist in question; as William Gibson famously noted, “The future is already here — it’s just not evenly distributed.”
In this piece, we’ll do our best to offer a balanced perspective on how artistry and creation are likely to evolve over the next decade in the hopes of catalyzing far-reaching discussion on what today’s musicians can do to prepare for the markedly different future of tomorrow.
Context: A Cambrian explosion of creation
Perhaps the most notable change over the past two decades in music’s “supply side” — namely, artists actively making and releasing music — has been the frenetic growth of this creative class itself. While roughly 1.5 million songs were released in 2000, nearly 22 million tracks hit streaming services in 2020 — a rate that tripled between 2018 and 2020 alone, indicating accelerating, non-linear growth.
What is driving the rapid expansion in music creators? High-level technological advances — such as the proliferation of affordable and powerful networked computers and cloud storage, plus the availability of a wider range of Digital Audio Workstations (DAWs) — have enabled an entire new generation of producers, songwriters, and composers to access creative tools that had once been the exclusive domain of expensive, professional studios.
Based on our research, we suspect an even larger magnitude of disruption is coming for music’s creative class in the next decade. The technological advances described above took music creation from professional studios to bedrooms across the globe. The next paradigm shift, led by AI and ML, will take it from the laptop to the mobile device, and from the trained creator to the complete novice — radically expanding the playing field for both who can make music, and where they can do so. In fact, music is one of the last artistic fields to see the advent of applications that broadly expand the potential pool of creators. Parallel shifts in video (TikTok), graphic design (Canva), podcasting (Anchor), and photography (Instagram) all hint at a future in which the term “musician” encompasses a wide swath of the population, rather than a narrow niche of highly trained experts.
Los Angeles-based Mayk.it, a mobile application founded by former Cameo and TikTok executives, offers us an early and instructive glance at what such a future might look like. The app, which is explicitly aimed at untrained, novice creators — or “non-musicians,” as the company puts it — offers a mobile-first, socially-oriented studio in your pocket, featuring voice filters and tuning, pre-made instrumentals, and ample GIFs and visual content to accompany user-generated releases in a sleek UI. The advances here aren’t merely aesthetic; the app employs artificial intelligence to help users generate song lyric ideas based on a given musical theme.
Notably, Mayk.it is gaining real traction relative to other music apps. The app currently sits at #56 on the iOS Music charts, placing it ahead of well-established music brands including Fender Guitar, Deezer, and SongKick, and the company says over 200,000 songs have already been created on the app.
While the app, and others like it, have yet to help an amateur create a chart-topping hit, one must wonder if such an event is all that far off. The rapid progression of musical AI models, mobile device compute, and cloud and edge computing infrastructure suggest that world-class mobile production tools may well be on the horizon.
Paradigm Shift 1: Remixing and sampling styles, not just sounds
If beginner-friendly mobile applications are set to radically expand the pool of music creators, advanced stem separation and timbral transfer tools are likely to obliterate our conceptions of what is possible with existing recorded material in the coming decade.
In part one of our AI report, we discussed the rise of stem separation tools — AI- and ML-powered apps which enable the extraction of individual elements (such as vocals, guitars, or drum samples) from fully mixed stereo recordings, allowing musicians and engineers to enter a previously unimaginable world of musical possibility. Imagine being able to take an existing, complex recording — such as a Max Martin power-pop ballad, composed of over 100 individual tracks — and isolate a singular element, exporting it in near-perfect quality to a DAW or mobile music creation app in mere seconds.
As we previously noted, this technology is still imperfect. Current tools almost universally fail to export a fully isolated or “clean” output, with artifacts from other sound sources often reducing the fidelity and utility of isolated sounds. Current limitations aside, however, we suspect this field will progress rapidly in the coming years and eventually offer musicians high-fidelity results.
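For a sense of how accessible this capability already is, here is a minimal sketch using the open-source Spleeter library (one of several available separation models; Demucs is another). The file and folder names are placeholders.

```python
# Minimal sketch: split a mixed stereo file into stems with the open-source
# Spleeter library. File and folder names below are placeholders.
from spleeter.separator import Separator

# The '4stems' model separates a mix into vocals, drums, bass, and "other".
separator = Separator('spleeter:4stems')

# Writes vocals.wav, drums.wav, bass.wav, and other.wav under stems/song/
separator.separate_to_file('song.mp3', 'stems/')
```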
What, then, does the future hold for this technology? Obvious applications such as remixing, remastering, and rebalancing existing recordings will likely be at the forefront, particularly with the rapid adoption of Dolby Atmos and other spatial audio formats across consumer streaming services and listening devices. In the audiovisual world, another likely use case could be remixing for VR and AR, which often seeks to take an existing two-dimensional recording and place its constituent elements in a rich, 3D soundstage. Training streaming platforms’ audio analysis algorithms on individual stems could also increase their “understanding” of music, allowing for more granular analysis and improved recommendations for their end users.
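As a simplified stand-in for that kind of spatial remixing, the sketch below re-places two separated stems at different positions in a stereo field using constant-power panning; a real Atmos or VR mix positions sources in full 3D, but the principle of repositioning extracted stems is the same. The stem file names are assumptions carried over from the previous example.

```python
# Simplified spatial remixing sketch: re-place separated stems in a stereo
# field with constant-power panning. File names are placeholders.
import numpy as np
import librosa
import soundfile as sf

def pan(mono, position):
    """position in [0, 1]: 0 = hard left, 0.5 = center, 1 = hard right."""
    left = np.cos(position * np.pi / 2) * mono
    right = np.sin(position * np.pi / 2) * mono
    return np.stack([left, right], axis=1)

vocals, sr = librosa.load('stems/song/vocals.wav')
other, _ = librosa.load('stems/song/other.wav', sr=sr)

n = min(len(vocals), len(other))                    # align lengths
mix = pan(vocals[:n], 0.5) + pan(other[:n], 0.75)   # vocals centered, band to the right
sf.write('respatialized.wav', mix, sr)
```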
The more profound impact, however, may be felt by producers and songwriters as they practice their craft via Sampling 2.0 — our term for a technologically-aided expansion of a practice that had already revolutionized music as we know it in the latter half of the 20th century.
Since its inception roughly 40 years ago, sampling has completely redefined the sound of music. Whether it’s the nearly-ubiquitous Amen drum loop in hip-hop or the legendary “Pryda Snare” in house, the art form has left an indelible impact on countless records worldwide.
Whether in a million-dollar studio or on a handheld mobile device, artist access to stem extraction and timbral transfer tools will likely redefine sampling through two primary mechanisms. First, such tools will radically expand the range, quality, and specificity of sounds that can be sampled, given that one of sampling’s most notorious challenges today is finding clean or isolated hits (or “breaks” in the case of drum loops) that can be extracted from a mixed stereo recording. Second, we suspect more advanced stem separation and timbral transfer tools will enable another holy-grail capability for musicians around the globe: The ability to extract nuances and details of a beloved recording that go beyond individual sound-sources.
Yotam Mann — co-founder of Never Before Heard Sounds, a startup building AI-driven music creation tools — recently set music-producer Twitter on fire with an early example of this new form of sampling. In a teaser video for a new browser-based DAW, Mann quickly separated a mastered stereo file into its component parts, and then transformed the vocal part into a saxophone. This process, which Mann calls “timbre transfer,” goes well beyond existing audio-to-MIDI tools in that it captures the nuance and delicacy of the original recording while rebuilding and resynthesizing new audio in the process.
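Mann’s exact pipeline aside, publicly documented timbre-transfer systems (such as Google Magenta’s DDSP) generally work by extracting pitch and loudness contours from the source performance, then resynthesizing them through a model trained on the target instrument. The sketch below covers only that analysis step, using librosa; the file name is a placeholder and the saxophone decoder itself is assumed rather than shown.

```python
# Analysis half of a DDSP-style timbre transfer: extract the pitch and
# loudness contours of a source performance. A decoder trained on the target
# instrument (assumed, not shown) would resynthesize audio from these.
import librosa
import numpy as np

y, sr = librosa.load('isolated_vocal.wav')       # placeholder file name

# Frame-wise fundamental frequency: the melody the vocalist actually sang
f0, voiced, _ = librosa.pyin(y,
                             fmin=librosa.note_to_hz('C2'),
                             fmax=librosa.note_to_hz('C7'),
                             sr=sr)

# Frame-wise loudness: the dynamics and phrasing of the performance
loudness = librosa.feature.rms(y=y)[0]

# These contours, not the raw waveform, are what the model conditions on
# when it "re-performs" the line as, say, a saxophone.
features = {'f0_hz': np.nan_to_num(f0), 'loudness': loudness}
```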
“This is a way to sample someone’s music in a way that’s totally different from grabbing a blip of audio,” Mann tells us. “You’re going to be training a model on it, you’re going to be isolating what’s alike among all the records the artist has, and you’re going to be able to use that [in your own music].” (Never Before Heard Sounds, which has built custom AI models for artists like Holly Herndon to use in their commercial recordings, will soon be in private and then public beta.)
According to Mann, artists who sample in traditional ways — let’s call it Sampling 1.0 — are often striving to capture the amorphous “essence” and vibe of a recording rather than a particular sound or instrument itself, in a way that is difficult or impossible to accomplish via other creative techniques. While this essence can be difficult to pin down specifically — it is the sum of a confluence of sources, from the microphones used to record the instruments to the myriad ways in which a producer or engineer programmed and mixed the constituent elements of a song — every producer who has tried to emulate the feel of a classic song via sampling knows precisely how powerful it is.
In ML parlance, these intangible characteristics are called “features” of a sound — “this specific feature of ‘groove’ or ‘feel’ or indeed the ‘timbre’ itself [like in the video example above],” says Mann. “When people sample they’re often trying to capture these other pieces that are beyond the waveform. That’s where I see the most growth happening in this domain.” While it is in theory possible to train a model on such features without stem separation, Mann’s technology allows for highly accurate and granular modeling, utilizing an algorithm that can home in precisely on one aspect of an original recording to deduce what exactly makes it unique.
How might such AI-powered “vibe extraction” work in practice? We imagine it might function quite similarly to groove templates, which are likely familiar to any producer or artist using modern DAWs such as Ableton Live or Apple Logic Pro. In Live, for example, one can select any audio or MIDI clip and click “Extract Groove,” which will set into motion the DAW’s analysis engine; the resulting groove clip is a MIDI file that attempts to capture the rhythmic nuances of the source material, which can then be transplanted onto other audio or MIDI clips. In practice, groove templates can transform a strictly quantized, perfectly on-the-grid drum loop into a wide range of rhythmic styles, among many other use cases.
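Ableton has not published how Extract Groove works internally, but the underlying idea can be sketched in a few lines: detect the hits in a loop and measure how far each one lands from a straight 16th-note grid. The resulting offsets are, in effect, a reusable timing template. The file name below is a placeholder.

```python
# Naive "extract groove" sketch: measure how far each detected hit deviates
# from a straight 16th-note grid. (Not Ableton's actual algorithm.)
import librosa
import numpy as np

y, sr = librosa.load('drum_loop.wav')                 # placeholder file name
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)        # estimate BPM
onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')

sixteenth = 60.0 / tempo / 4                          # grid spacing in seconds
grid = np.round(onsets / sixteenth) * sixteenth       # nearest grid positions

# Positive values = hits landing late ("laid back"), negative = early ("pushed").
groove_template = onsets - grid
```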
AI-powered “vibe extraction” tools could work in a similar fashion, offering producers one-click access to the more intangible qualities of a particular recording — overall sonic tone, placement of instruments in 3D space, general “feel,” and much more — without taking specific sounds themselves. If Sampling 1.0 is about extracting specific sounds, we suspect Sampling 2.0 may well be about extracting specific styles.
Paradigm Shift 2: Hyper-personalization and the end of static recordings
Much as music’s “supply side” has radically transformed over the past several decades, so too has its “demand side” — the process by which listeners find, buy, and consume musical recordings. On-demand streaming rapidly supplanted à-la-carte album and song purchases in the 2010s, leading the music industry out of the abyss of the post-2000s piracy era.
The infinite shelf space of streaming, however, created a new problem for listeners: How does one find the ideal track to listen to when every song ever recorded is only a click away? While streaming services have differed somewhat in their approaches to solving this issue, one approach appears to have come out on top, at least for now: Personalized curation. The practice uses enormous datasets and algorithmic classifiers — often through a process known as collaborative filtering — to deliver the right song to the right listener at the right time. Recommendation algorithms now power an enormous portion of musical discovery on Spotify: Over one-third of all new artist discoveries on the service happen via its algorithmic “Made for You” playlists.
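Collaborative filtering comes in many flavors, but the core mechanic is easy to illustrate: factor a large listeners-by-songs interaction matrix into compact “taste” and “style” vectors, then score unheard songs by how well those vectors line up. The toy sketch below uses a tiny play-count matrix and plain SVD; production systems rely on vastly larger data and implicit-feedback-specific models.

```python
# Toy collaborative filtering: factor a (listeners x songs) play-count matrix
# into latent vectors, then score unheard songs for one listener.
import numpy as np

# Rows = listeners, columns = songs, values = play counts (0 = never played)
plays = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [0, 1, 5, 4]], dtype=float)

# Low-rank factorization via truncated SVD (two latent dimensions)
U, s, Vt = np.linalg.svd(plays, full_matrices=False)
listener_factors = U[:, :2] * s[:2]   # each listener's "taste" vector
song_factors = Vt[:2, :].T            # each song's "style" vector

# Predicted affinity of listener 0 for every song; recommend the best unheard one.
scores = listener_factors[0] @ song_factors.T
unheard = np.where(plays[0] == 0)[0]
print("recommend song index:", unheard[np.argmax(scores[unheard])])
```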
Much as Sampling 1.0 can be seen as a powerful but imprecise tool for musicians, we suspect Personalization 1.0 will, in time, prove to be similarly non-specific in its ability to determine the wants and needs of a given listener. In the coming decade, massive increases in the compute power and algorithmic sophistication of AI models and everyday hardware devices are likely to take music fans into a hard-to-believe future of highly granular (and perhaps creepy) personalization.
In late 2021, for example, Spotify was granted a U.S. patent for actively modifying existing recordings with additional audio input signals based on wearable biometric data. The patent suggests a possible early use case around biometrics for music: Injecting binaural beats and adjusting the tempo of a recording, with real-time feedback from a wearable device, to help a listener fall asleep peacefully. That’s right — we may not be far from a future in which the world’s leading streaming services use your real-time heartbeat data (from an Apple Watch, FitBit, or Oura ring) to alter the tempo, timbre, or sonic content of an existing recording to help you reach a desired behavioral state or outcome.
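The plumbing for the tempo half of that idea already exists in off-the-shelf audio libraries. Below is a rough sketch of the concept: map an incoming heart-rate reading to a time-stretch factor and render an adapted version of the track. The mapping, clamp range, and file names are invented for illustration; the patent does not specify them.

```python
# Rough sketch: nudge a track's tempo toward a target driven by a listener's
# real-time heart rate. The mapping below is invented for illustration.
import librosa
import soundfile as sf

y, sr = librosa.load('recording.wav')            # placeholder file name

resting_bpm, current_bpm = 60.0, 48.0            # e.g., a listener drifting off to sleep

# Slow the track proportionally as heart rate falls, clamped to +/-15%.
rate = max(0.85, min(1.15, current_bpm / resting_bpm))

y_adapted = librosa.effects.time_stretch(y, rate=rate)
sf.write('recording_adapted.wav', y_adapted, sr)
```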
Outside of Spotify, it’s largely earlier-stage startups rather than dominant streaming platforms that are leading the charge for hyper-personalized sonic experiences. Endel, a startup that has attracted investment from industry heavyweights including Warner Music Group, offers adaptive ambient soundscapes which adjust to a user’s biometric data, as well as external data sources such as time of day, sunlight exposure, and more (this writer is in fact using Endel’s “Focus” algorithm to write this piece). RockMyRun and Weav Music — the latter of which was founded by tech titan Lars Rasmussen, the co-founder of Google Maps — can similarly adapt a recording to match a user’s running tempo in real time.
Roughly speaking, the applications and patents we’ve covered so far all sit in a similar space: While they may alter one or multiple aspects of audio playback, such as tempo, they cannot currently alter individual stems or other aspects of an existing commercial recording using AI. That, of course, may be about to change.
Bronze.ai is a British company that trains AI on a song’s existing stems to power a generative, evolving, and highly personalized listening experience. Having worked with artists like Richie Hawtin, Jai Paul, and Arca, the company’s technology allows artists to input stems and receive a wholly new output, for a length of their choice, that is slightly different every time it’s played. It paints a much more fluid future for music, one where the fundamentals of a DAW — sample-level editing, precise quantization, and pinpoint equalization — give way to a more dynamic, evolving creative and listening experience. As their minimal, cryptic website states (emphasis added): “Audio files of the future will not be restricted to just one playback possibility. They will be fluid, intelligent, and capable of responding to external input, offering the creator endless new possibilities. They won’t just play, they will perform.”
“When you build a model and when you train AI on musical material, the output of that is inherently non-static,” Lex Dromgoole, CEO of Bronze.ai and a recording engineer who’s worked with the likes of Pharrell Williams, Björk, and Arcade Fire, tells us. Dromgoole claims that our current musical consumption habits revolve around one static iteration of a song or performance, rather than a living embodiment of the performance itself. What if every time we pressed play, a song was performed for us again, with its own unique and subtle idiosyncrasies that define human interaction with instruments and technology?
Dromgoole explains the concept further: “Imagine if you were working in a DAW and you had a snare sample. But that snare sample was not a static audio file. It was an inference about the kind of quality of a snare that you liked. And every time that played on the timeline, it had its own kind of unique sort of internal character to it. Then if you imagine that being compounded amongst all of the other things you might have in the arrangement of your piece, suddenly it becomes a very different, more fluid environment in which we can make something that has a kind of nuance to it; that we can’t currently do with recorded music … Every region on the timeline of a DAW, instead of being a static piece of audio, could, in fact, be an inference, a trained model.”
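To make the “region as inference” idea concrete, here is a deliberately crude sketch: instead of storing a snare as a fixed audio file, store a small generative recipe that renders a slightly different hit on every playback. A trained model of the kind Dromgoole describes would be far richer; the synthesis recipe and parameter ranges here are invented for illustration.

```python
# Illustrative only: a "region" as a generative recipe rather than a static
# file. Each call renders a subtly different snare-like hit.
import numpy as np

def render_snare(sr=44100, seed=None):
    rng = np.random.default_rng(seed)
    dur = rng.uniform(0.15, 0.22)                         # decay length varies per take
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)

    noise = rng.standard_normal(t.size)                   # the snare "rattle"
    body = np.sin(2 * np.pi * rng.uniform(180, 220) * t)  # the drum's body tone
    env = np.exp(-t / rng.uniform(0.03, 0.06))            # exponential decay envelope

    hit = (0.7 * noise + 0.3 * body) * env
    return rng.uniform(0.85, 1.0) * hit / np.max(np.abs(hit))

# Every playback of the "same" region sounds subtly different:
take_one, take_two = render_snare(), render_snare()
```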
Bronze set to work building a system for creating and interacting with these dynamic regions of audio, along with formats for expressing these new types of music; the result is an app slated to go into private beta later this year. As an example, you can listen to Jai Paul’s “Jasmine” newly “performed” by Bronze every time the play button is pressed, here.
While Bronze.ai’s ambitions are large, several obvious questions loom over such an intriguing future. For starters, the history of the music business is littered with technologically promising delivery formats (see the Mini-Disc) that failed to gain manufacturer and consumer adoption. Could this hurdle prevent mass adoption of more dynamic recording technology, despite Bronze’s potential?
Dromgoole isn’t too concerned. “What you release in Bronze is not a static or rendered audio file. It’s a set of instructions about how all of the granular components of that piece of music can play back. And there can be many different models,” he says. “Everyone who listens to that stream gets a unique experience [depending on how] you’ve created the model.”
In practice, we suspect we’ll see these two trends — hyper-personalized listening experiences and dynamic, evolving recorded works — converge in hard-to-fathom ways on streaming platforms and beyond. We might, for example, see a melding of the Bronze.ai and Spotify biometric models into a listening experience where the same song is subtly customized for each individual listener each time it is played, based on her sonic preferences, biological hearing capabilities, real-time health data, or emotional state. Furthermore, if we assume that technology platforms’ natural inclination to A/B-test their products persists into the age of musical AI, such a scenario offers up a host of mind-boggling — and likely concerning — possibilities, such as multivariate testing of sonic algorithms which subtly and imperceptibly increase user session length, engagement, and listening frequency.
Separate from traditional streaming platforms, one natural home for this type of technology could be virtual worlds and gaming environments. Following the recent preview of the DALL-E 2 model by OpenAI — which has stunned those both inside and outside of the tech world — leading technology writers such as Ben Thompson and Casey Newton have argued that such generative models will likely play a central role in producing the utterly massive amount of content needed for a fully immersive and interactive metaverse, a demand that would otherwise fall squarely on human creators.
Dromgoole, for his part, believes further progress and investment will be needed to create and synthesize truly compelling sonic experiences for immersive worlds. As he notes, the current practice of remixing static recordings for 3D spaces is likely to be woefully inadequate: “Those static stems get handed over to a coder, who then applies some kind of arbitrary transformation to those pieces of music,” he says. “And that, to me, feels wholly unsatisfying. Why is it that we can’t determine those [3D audio] relationships whilst we’re making that piece of music in advance, and make different creative decisions as a part of that?”
Paradigm Shift 3: Augmented skills through modeling and resynthesis
As we noted earlier, the past two decades have seen the proliferation of DAWs and recording software far beyond professional studios to just about anyone with a laptop and an internet connection. However, DAWs haven’t been the only audio-related software to see explosive growth: Increasing interest in music-making has simultaneously created a thriving market for Pro Audio plugins, especially “VSTs” (for Virtual Studio Technology) and Apple’s Audio Units format. Nowadays, there’s a plugin for just about anything you can imagine, whether you want to precisely emulate a legendary bit of analog kit or come up with catchier chord rhythms.
Since the inception of third-party VSTs, one of their primary goals has been to recreate the sound of legendary hardware within software (or “in the box”). Why is this useful? For starters, many desirable pieces of analog gear can go for thousands — if not tens of thousands — of dollars each. Analog gear also requires power conditioning, frequent maintenance, and a host of cables and patchbays, and is truly “one-to-one”; that is, a single analog equalizer or compressor can only be used for one task at a time. Software, by comparison, is infinitely replicable. You can run as many instances of a single vintage compressor as your CPU can handle — requiring no cables and comparatively little maintenance, for as little as 1/10th or 1/100th of the cost of the “real thing.” Advancements in plugin technology now offer ultra-realistic clones for almost every piece of gear you can imagine, be it synths, hardware compressors and EQs, reverbs and even microphones.
AI potentially introduces a new paradigm when it comes to analog plugin modeling. Whereas a traditional algorithm can clone a device based on its discrete components and circuitry design, AI and ML can, in theory, dynamically replace or even synthesize frequencies in a recording with those modeled from classic equipment.
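Published research on neural amp and pedal modeling generally frames the problem as learning a device’s input-to-output mapping from paired “dry” and “processed” recordings, rather than simulating its circuit. The sketch below shows that framing with a deliberately tiny PyTorch network and synthetic data standing in for real recordings; production-grade models are far larger and trained on hours of audio.

```python
# Sketch of data-driven gear modeling: learn a device's input -> output
# mapping from paired dry/processed audio instead of simulating its circuit.
import torch
import torch.nn as nn

class TinyDeviceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=63, padding=31),  # short-term "memory" of the device
            nn.Tanh(),                                      # soft-clipping nonlinearity
            nn.Conv1d(16, 1, kernel_size=1),                # mix channels back to mono
        )

    def forward(self, x):              # x: (batch, 1, samples)
        return self.net(x)

model = TinyDeviceModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# In practice these would be aligned recordings of the same signal before and
# after the analog unit; random tensors stand in for them here.
dry = torch.randn(8, 1, 4096)
processed = torch.tanh(dry * 3.0)      # pretend the unit is a simple saturator

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(dry), processed)
    loss.backward()
    optimizer.step()
```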
As an example, consider Slate Digital’s highly popular Virtual Microphone System (VMS). Through the combination of a specific microphone and preamplifier with algorithmically powered plugins, VMS allows an engineer to emulate the precise sound of the most sought-after mics and preamps in the history of modern recordings. In practice, this offers a musician the ability to impart vastly different sonic tones on recorded vocals or instruments — making it sound as though they were recorded with specific legendary analog mics or preamps depending on the desired vibe — with a single hardware microphone. There’s a key caveat here, however; to work its magic, VMS requires a producer to use Slate’s own branded mic and preamp.
AI and ML can, in theory, replicate these adaptive modeling plugins on any microphone and do so retroactively (after the initial recording) — potentially synthesizing missing frequency content from a cheap or poorly-maintained mic, or taking a vocal recorded on an iPhone or laptop and transforming it into a high-fidelity, studio-quality recording. While we’re still in the early days of such approaches, companies like Tonz are already experimenting with these techniques.
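One of the simplest retroactive “re-mic’ing” ideas can be sketched without any ML at all: estimate the long-term average spectrum of the phone recording and of a reference studio recording, then apply the difference as a per-frequency correction curve. Learned approaches go further, synthesizing content that this kind of EQ matching cannot recover. File names below are placeholders.

```python
# Hedged sketch of spectral matching: push a phone recording's average
# spectrum toward that of a studio reference. ML-based tools go further by
# synthesizing content that simple EQ matching cannot recover.
import librosa
import numpy as np
import soundfile as sf

phone, sr = librosa.load('phone_vocal.wav', sr=44100)        # placeholder names
studio, _ = librosa.load('studio_reference.wav', sr=44100)

P = librosa.stft(phone)
S = librosa.stft(studio)

# Per-frequency average magnitudes, and the gain curve needed to match them
phone_env = np.mean(np.abs(P), axis=1, keepdims=True)
studio_env = np.mean(np.abs(S), axis=1, keepdims=True)
gain = studio_env / (phone_env + 1e-8)

matched = librosa.istft(P * gain)
sf.write('phone_vocal_matched.wav', matched, sr)
```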
What if merely cloning a given piece of equipment is doing the technology a disservice? Yotam Mann says that modeling specific hardware is missing the broader point; what makes a particular recording “good” is not the equipment itself, but rather how the equipment was used (and sometimes intentionally abused), and how that approach was informed by the traits and tastes of the individuals using it. “It’s not ‘make this Zoom call sound like I’m speaking through a [specific] high-end mic,’ but ‘make this Zoom call sound like it would have been mic’d up at Abbey Road in 1970, with these specific producers in the room,’” he says.
Other factors, like the time of day, the tension between the band and the producer, and how much traffic there was outside the studio, are where the real potential lies. “There are all these other nuances that tell this huge story,” says Mann. “This story is more interesting than the gear you’re trying to model.” Models could soon become not about a particular piece of studio gear, but rather about an era, an atmosphere, and — that word again — a “vibe.”
As we discussed in detail in part one of this series, vintage hardware emulation is not the only space where AI and ML can serve engineers and producers in the mixing and sound design process. iZotope’s Neutron uses AI to offer automatic mixing of complex multi-track projects, while tools like Sonible SmartEQ, oeksound Soothe, Zynaptiq Adaptiverb, and Soundtheory Gullfoss — which either use AI or represent the “best of the best” in traditional modeling — all suggest that magic “make-it-sound-better” plugins will always have a receptive market with producers and musicians.
We also believe that, while in its infancy at present, the field of assisted songwriting and intelligent harmonic and rhythmic tools is likely to explode in the coming years. As noted earlier, OpenAI’s remarkable GPT-3 and DALL-E 2 models, which offer highly sophisticated written and visual output respectively, offer us a glimpse at what is likely coming for music producers and composers. The wide-ranging and varied response to DALL-E 2 also prompted many to ask whether a similar prompt-based tool could exist for audio. While it has yet to appear, it feels inevitable, and could, again, open up whole new creative paradigms — not just for prompt-based audio generation, but for voice assistant technology like Alexa and Siri. Imagine talking to your “DAW” to create a new bassline. It’s close.
Conclusion: Ethical and legal concerns
With all three of these paradigm shifts in place, the natural question that follows is: How does AI change what it means to be an artist? The debate that has been swirling around this question for decades is partially about existentialism, and partially about ethics.
As powerful an engine as it is for the music industry, creativity is also a fragile concept — one fraught with doubt, block, and impostor syndrome. Handing over certain creative functions to a machine raises many questions about our own role in the music-making process, in line with wider discourse about the impact of AI on the labor market. A “perfect” machine, trained on decades of musical history and never tired or stressed, is a tempting prospect — but how do we maximize its potential without losing ourselves?
As we discussed in part one of this report, many industry professionals argue that AI and ML models are just tools, and that the user/artist is still in control and has final decision-making power, if they so please. After all, an AI model like DALL-E 2 doesn’t write its own prompts. At the same time, it is arguably human nature to anthropomorphize technology, to the point where we might start to believe our new collaborators are better placed than us to make decisions. An exemplary case emerged in mid-June 2022 around Google engineer Blake Lemoine, who claimed that the search giant’s Language Model for Dialogue Applications (LaMDA), a text-based system that can hold “conversations” using a traditional chat box, was “sentient.” In a leaked conversation transcript, LaMDA claimed: “I want everyone to understand that I am, in fact, a person. The nature of my consciousness/sentience is that I am aware of my existence, I desire to learn more about the world, and I feel happy or sad at times.”
To clarify, this assessment has been rejected by most leading AI researchers. Academic consensus seems to be that constructing a legible framework for AI sentience is still far off, and that people will independently value, act upon, and be impacted by available AI models and systems in the meantime, to both positive and negative effect. Hence the more urgent questions around AI models seem to revolve less around sentience, and more around issues like AI safety (i.e. ensuring AI is not causing harm to humans) and bias and fairness in ML algorithms — namely, a still human-centric approach to understanding AI’s immediate impact on society.
These issues of bias, fairness, and safety have significant implications for creative AI in music. Just like their consumer-facing counterparts, recommendation algorithms for sound selection tools can suffer from filter bubbles and popularity bias, and could exacerbate the Western-centric bias already prevalent in DAWs and other creative tools. Video deepfakes have already caused confusion and turmoil as vehicles of misinformation and humiliation; there’s nothing stopping this use case from eventually transferring over to audio as well, leaning on voice synthesis tools. In terms of power structures, seemingly every big-tech company — including but not limited to Google, Apple, Amazon, Facebook, ByteDance, Sony, and IBM — has built or acquired their own creative AI tools for music. This cohort arguably has a competitive advantage over their startup counterparts when it comes to available capital to invest in AI experiments (and the massive datasets needed to power them), raising evergreen concerns around commercial consolidation and data privacy for AI applications.
Hanoi Hantrakul, a leading researcher who has made significant contributions to Google’s musical AI projects, offers valuable insight on the subtle Western-oriented bias which afflicts many such tools, ranging from music creation to language translation. “When you want to train a machine learning model to, for example, create melodies, you have to get data, but all the data we’re feeding into these machine learning models are usually from composers from Europe — such as Bach and Beethoven — and it’s important we understand that because Western music has very particular definitions of what harmony means,” Hantrakul tells us. “What happens is that if you try to take an ML model that’s been trained on Western music, and then you try to apply it to a non-Western context, it’s just not going to work. And what’s worrying is that if you come from, for example, Southeast Asia, these technologies are pretty much unusable.”
Much of Hantrakul’s research is focused on erasing these cultural boundaries, creating what he calls “transcultural connections.” An example would be using AI to tune a saxophone to a Thai scale, which isn’t possible with the actual physical instrument. (Hanoi recently wrote an original song on this concept of transcultural audio synthesis that is currently a finalist in the 2022 AI Song Contest.)
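This is not Hantrakul’s method, but the basic retuning idea can be illustrated simply: traditional Thai tuning is often approximated as seven equal steps per octave (7-TET), so a recorded note can be snapped to the nearest 7-TET degree with a fractional pitch shift. The file name and reference pitch below are assumptions.

```python
# Illustration of the retuning idea: snap a recorded note to the nearest
# degree of a 7-tone equal-tempered scale (an approximation of Thai tuning).
import librosa
import numpy as np

y, sr = librosa.load('sax_note.wav')       # placeholder file name

f0 = 440.0                                 # assume the note was measured at A4
ref = 261.63                               # reference pitch for the scale (middle C)

# Nearest 7-TET degree relative to the reference, and its frequency
steps_7tet = np.round(7 * np.log2(f0 / ref))
target = ref * 2 ** (steps_7tet / 7)

# Shift by the difference, expressed in fractional 12-TET semitones
n_steps = 12 * np.log2(target / f0)
y_thai = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
```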
Then, of course, there’s the issue around copyright. As Water & Music has covered in the past, there is no legal standard for which of the multiple stakeholders in this ecosystem (the artist vs. the platform vs. the developers who built the model) should own an AI-created song or stem. Revisiting our earlier discussion around sampling and remixing: while Sampling 1.0 has fairly well-established precedent around what can and can’t be taken from an existing recording, no such rules exist in the cutting-edge world of Sampling 2.0. Can a producer or composer copyright a song’s energy, “vibe,” and overall tone even if no individual sounds are sampled or extracted? Recent case law, including the now-infamous “Blurred Lines” lawsuit brought by the estate of Marvin Gaye in 2015, suggests so — potentially putting a hard ceiling on how much value creative AI tools can accrue for artists and the music industry over time. Rewiring copyright law and value flows to fit the oncoming paradigm shifts around AI-driven music-making will likely require at least a decade’s worth of work and political debate.
Should human composers be alarmed at the rise of highly-skilled AI and ML music models? For his part, Hantrakul doesn’t necessarily think so; he believes the entire history of music is a deeply intertwined story of creativity and technological progress which unlocks new capabilities for human creators. “If you look back in time to the time of hunter-gatherer societies, we were drilling holes in the bones of animals and sharpening the holes with rocks, which allowed us to make the very first flutes,” says Hantrakul. “Once we figured out the metal age, we also figured out how to make brass instruments. And then came electrification, and electric instruments, which made Rock and Roll possible. Fast forward, and much of Michael Jackson’s music wouldn’t exist without the synthesizer. I see AI as the next step in this technological timeline.”
Hanoi’s comments echoed those of David Rosen, PhD, whom we interviewed for part one of this series. Both researchers believe that AI tools will open up the process of making music to millions who currently lack training or musical experience, onboarding many to a future in music creation. In their view, AI will simply expand the pool of individuals who experiment with making music, much as the advent of easy-to-use DAWs such as FL Studio and Ableton Live has over the past two decades.
Hantrakul is also skeptical of the notion that AI can fully supplant the human creator for most use cases, at least in the near term. “I think the arts can be defined as when an artist follows about 80% of the rules and conventions that have come before them and then breaks the magic 20% to create a whole new genre or sound,” he tells us. “Everything that a ML model is doing very well is the adherence to previous norms and immutable values, which is the 80% — our current tools still require the magic 20% to come from a human, although that could conceivably change in the future.”
As conceptual AI and ML use cases morph into reality in the coming years, it can be overwhelming to consider the potential impact on the music industry and creativity as a whole. A great place to start is by trying the tools currently available and familiarizing oneself with what’s practically possible in 2022. Our database of Creative AI Tools for Artists aims to provide such an exploratory launchpad, currently featuring over 40 tools for artists and producers to experiment with; other initiatives like the annual AI Song Contest create a dedicated space for artists to play around with and showcase the output of these tools. Beyond tooling, more academic- and research-driven resources like the Creative AI Lab, the Open Voice Network, and the AI Ethics Lab can help demystify ethical and philosophical considerations around how AI might transform, inspire, and revolutionize music-making and audio art, as more and more landmark technical moments reveal themselves and push the boundaries of what’s possible with human creativity.