Inside Sony Computer Science Laboratories' vision for music AI
This is a recap of a workshop that took place at Water & Music’s Wavelengths Summit on May 6, 2023. You can read our overall recap of the Summit experience here.
In today’s rapidly evolving music AI landscape, much of the spotlight falls on what major rights holders and tech conglomerates will do next. After all, these powerhouses possess the vast amounts of training data (read: music catalog) and the computing resources that are crucial to supporting the next wave of innovation in music AI models and tools. Recent announcements like YouTube’s new music AI incubator with Universal Music Group, Meta’s open-source audio generation suite AudioCraft, and ByteDance’s new AI-powered music creation tool Ripple suggest that momentum around large-scale music AI deals is accelerating at an unprecedented commercial pace.
Yet, amidst all this activity, one pioneer that has been silently pushing boundaries for years remains surprisingly under-discussed.
Sony Computer Science Laboratories (SonyCSL), a research organization under Sony Corporation, has been quietly experimenting with creative AI tools for over three decades. Crucially, they are also one of the few music AI research groups that already has a direct collaborative relationship with a major label, in a landscape where negotiations between rights holders and AI companies seem to be getting more heated.
Founded in 1988, SonyCSL at large uses complexity science, data science, and AI to investigate fundamental issues in areas such as the understanding and creation of music, language and communication systems, and the dynamics of innovation and creativity. Their Music Team, based in Paris, France, works with musicians and rights holders (especially Sony Music) to push the boundaries of creativity and understand the complexity of modern music production processes using cutting-edge AI research.
It’s crucial to note that while Sony Music and SonyCSL both call Sony Corporation their parent company, they function as independent entities with distinct leadership. Yet, their intertwined collaboration history sheds light on the myriad opportunities and challenges that arise when integrating AI into the music industry, especially around rights.
SonyCSL was one of the flagship sponsors of Water & Music’s inaugural Wavelengths Summit on music and tech, which took place in Bushwick, Brooklyn on May 6, 2023. As a partner, SonyCSL hosted a rare, intimate workshop for Summit attendees, offering an exclusive look at their music AI tools, their philosophy around AI development at large, and the trends they’re excited to expand on this coming year.
Featured speakers included Michael Turbot (Technology Promotion Manager at SonyCSL; previously Head of Innovation at Sony Music France) and Stefan Lattner (Managing Researcher, SonyCSL Music), who balanced extensive academic research expertise with real-world knowledge of music-business complexities in our discussion.
Below is a recap of the main themes and takeaways from our Summit workshop. Quotes have been condensed and edited for clarity.
The rocky history of “Daddy’s Car”
SonyCSL’s work in music AI hasn’t always been accepted with open arms.
In September 2016, SonyCSL’s music project Flow Machines released “Daddy’s Car,” a song composed in the style of The Beatles with the help of AI. Crucially, Flow Machines’ AI systems handled only the musical composition layer (i.e. melody and harmony), with a human being — French musician Benoît Carré (a.k.a. SKYGGE) — stepping in to write the lyrics and produce and arrange the song as a whole.
Several major publications covered the launch at the time — framing the song as a monumental, and perhaps existential, milestone for music AI development, especially in terms of the tech’s ability to mimic well-known celebrity styles.
What was not captured in the media, though, was that “Daddy’s Car” also represented what Turbot described as a “catastrophic” misstep in communications between SonyCSL and the music industry.
For one, Flow Machines did not ask for Sony Music’s permission to use The Beatles’ IP to train a music composition AI model for “Daddy’s Car.” When the song was released commercially, it came as a surprise to both the label and publishing sides of Sony. The surrounding media hype also led the public to believe the song’s construction was 100% automated, when in reality AI had contributed only select musical elements.
Internally at Sony, this damaged SonyCSL’s reputation and caused Sony Music to distance itself from the research group, due to concerns about AI competing with its IP interests.
“My CEO came to me saying, ‘I have artists calling me saying that we are replacing them, so don’t speak to SonyCSL anymore,’” said Turbot, who was working at Sony Music France at the time.
Turbot ultimately joined the SonyCSL team in January 2020 to help rebuild the research group’s music-industry relations strategy — with a focus on building clear guidelines for training data usage and ethics, and clearly articulating the group’s vision for generative AI as a collaborative tool for artists, rather than a direct replacement.
In terms of training data, SonyCSL Music today relies on public-domain music, purchased datasets from music libraries, and direct, vetted artist contributions to train their AI models, collaborating with artists on model development at every step.
“We have been audited by Sony Music Publishing and Sony Music just to make sure they could be safe sending us artists to use our tools,” said Turbot. “The good news is that they are sending more and more artists to us now, but we don’t use Sony Music or Sony Music Publishing data. We hope one day it will be possible, but it’s going to be a very big legal discussion. We have to be extremely careful with our use of data, because otherwise we know that the industry would instantly close those doors for us.”
SonyCSL’s music AI tools and philosophy
To date, SonyCSL Music has built a wide range of AI tools, from desktop apps to DAW plugins, aimed at helping professional musicians augment their creative processes.
Examples that their team showcased at our Summit include:
- GANstrument — a standalone app that uses AI to streamline sound synthesis, especially by creating unique blends of musical and everyday sounds uploaded by the user.
- Piano Genie — an Ableton plugin that can propose missing parts of a piano composition in real time, a task that SonyCSL calls “piano inpainting.”
- DrumNet — a standalone app and VST/AU plugin that can generate drum rhythms on top of existing songs, with users indicating the drum style by positioning a point on a 2D plane.
- BassNet — a standalone app that can generate bass lines in response to an audio input, with the ability for artists to control factors including note density, articulation, and timbre.
At large, SonyCSL Music’s philosophy around their tools keeps professional artists in the driver’s seat — an example of the “human-in-the-loop” paradigm of machine learning, which envisions continuous feedback loops between humans and computers.
Instead of 100% autonomous models that generate fully-formed music without human interaction, SonyCSL’s tools are more lightweight and specific, usually covering only one instrument or step of the production process. The intention with this approach is to have these models serve as assistants or companions within a wider, established professional workflow, without disrupting or constraining artists’ creativity.
“If you have AI tools that are to some degree autonomous, but I as the artist still have the last word in it, and I have control over it, it can lead me to explore a bigger space of possibilities in a shorter amount of time,” Lattner said during our Summit workshop. “So efficiency does not necessarily mean that we need fewer people. Efficiency could also mean that people working on music have the possibility to be even more creative and try new things they wouldn’t have tried, because it would’ve been too cumbersome otherwise.”
For these models, smooth workflow integration requires optimizing for ease of use and real-time performance within DAWs and other live music production environments. Perhaps counterintuitively, this sometimes means scaling back on the model’s power and range of expression to make it more usable and adaptable to artists’ needs.
For instance, with DrumNet, SonyCSL “started to train the model in a multidimensional latent space, but quickly realized that’s actually very cumbersome to navigate,” said Lattner. “So I was like, ok, we can sacrifice a bit of expressivity of the model and just train it on a two-dimensional latent space. Then it’s also very direct how to interact with the model. I made it weaker on purpose, but then in the end the model was easier to use.”
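To make that tradeoff concrete, here is a minimal, hypothetical sketch of what a DrumNet-style control surface could look like: a generator conditioned on a single 2D “style” point rather than a high-dimensional latent code. The class name, dimensions, and architecture below are illustrative assumptions for this recap, not SonyCSL’s actual implementation.

```python
# Hypothetical sketch of a DrumNet-style interface: a generator conditioned on a
# 2D "style" coordinate that the artist sets by dragging a point on a plane.
# All names and dimensions here are illustrative assumptions, not SonyCSL code.
import torch
import torch.nn as nn

class TwoDStyleDrumGenerator(nn.Module):
    def __init__(self, audio_feat_dim: int = 128, steps: int = 16):
        super().__init__()
        # A 2D style point stands in for a higher-dimensional latent code,
        # trading expressivity for a control surface that is easy to navigate.
        self.net = nn.Sequential(
            nn.Linear(audio_feat_dim + 2, 256),
            nn.ReLU(),
            nn.Linear(256, steps),  # one onset probability per 16th-note step
            nn.Sigmoid(),
        )

    def forward(self, song_features: torch.Tensor, style_xy: torch.Tensor) -> torch.Tensor:
        # song_features: (batch, audio_feat_dim) summary of the existing track
        # style_xy: (batch, 2) point the artist picks on the 2D plane
        return self.net(torch.cat([song_features, style_xy], dim=-1))

# Usage: as the artist drags the point, the drum pattern regenerates in real time.
model = TwoDStyleDrumGenerator()
features = torch.randn(1, 128)  # stand-in for analyzed audio of the existing song
pattern = model(features, torch.tensor([[0.2, 0.8]]))
print(pattern.shape)  # torch.Size([1, 16]) -> onset probabilities per step
```

The point of the sketch is simply to show why a two-axis control maps so directly onto an XY-pad-style workflow: the interaction surface is exactly as big as the model’s conditioning input.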
New opportunities in music AI research
As for what is coming next for music AI research, SonyCSL’s team touched upon three trends during our Summit workshop:
Lightweight, customizable AI models. As discussed above, SonyCSL is interested in further exploring more lightweight, instrument-specific models that artists can more easily tailor to their individual styles. Aside from being more immediately useful in a professional workflow context, this approach also ends up being easier from a legal and logistical perspective, in terms of gathering and vetting the underlying training data.
Outside of SonyCSL, other companies like Harmonai and Semilla are taking a similar approach to making lightweight model development more accessible, helping artists fine-tune AI models on their own voices and instruments.
Iterative text-to-audio models. Building off their philosophy of integrating into artists’ existing creative workflows, Turbot and Lattner shared that they were exploring more iterative approaches to text-to-music and text-to-audio interfaces, ones that allow for targeted generation and modification of specific sounds rather than producing a fully-formed song clip from the jump with no further customization options (what Water & Music previously described as the “time-to-banger” tradeoff).
“Currently, text-to-audio is a very absolute thing,” said Lattner. “You write something in there, and you get a full output … you could maybe think about a more iterative approach to reshaping sound over time.” Aside from SonyCSL, companies like WavTool are experimenting with integrating text-to-audio prompting into DAWs in more iterative, seamless ways.
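As a rough illustration of that contrast, the snippet below mocks up a one-shot prompt versus an iterative session in which the artist keeps the clip and regenerates only selected regions. The generate and refine functions are stand-ins invented for this example; they do not correspond to any existing text-to-audio API, whether SonyCSL’s or WavTool’s.

```python
# Hypothetical contrast between one-shot and iterative text-to-audio workflows.
# generate() and refine() are illustrative stand-ins, not a real library API.
import numpy as np

SR = 22050  # sample rate used throughout this sketch

def generate(prompt: str, seconds: float = 8.0) -> np.ndarray:
    """Stand-in for a text-to-audio model: returns a mono audio buffer."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return (rng.standard_normal(int(seconds * SR)) * 0.1).astype(np.float32)

def refine(audio: np.ndarray, prompt: str, region: slice) -> np.ndarray:
    """Stand-in for an edit step: regenerate only the selected region of the clip."""
    patched = audio.copy()
    patched[region] = generate(prompt, seconds=(region.stop - region.start) / SR)
    return patched

# One-shot: a single prompt yields a finished clip, take it or leave it.
clip = generate("lo-fi drum loop with tape hiss")

# Iterative: keep the clip, then reshape specific parts over successive passes.
clip = refine(clip, "softer hi-hats, more swing", region=slice(0, 2 * SR))
clip = refine(clip, "add a vinyl crackle tail", region=slice(6 * SR, 8 * SR))
print(clip.shape)  # still an 8-second buffer, edited in place over two passes
```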
Preservation of traditional cultures with AI. In the same spirit of building lightweight, custom models, Turbot and Lattner shared their recent experiences of using AI to help preserve, expand, and reinterpret traditional musical and cultural heritage.
“We had a residency with a traditional Galician band in our studio, and they were talking to the family of a famous feminist singer from this area of Spain who passed away,” said Turbot. “Her family wanted this band to reproduce her voice for political reasons. The whole CSL team understood how AI can be positive in this way … AI can be really impactful for niche creative communities, who will be able to train their own models and perpetuate their cultural heritage in a way that they didn’t have the opportunity to do otherwise. These communities will be able to make their own instruments and mix and modernize them with AI.”
Overall, Turbot and Lattner expressed optimism about the future impact that AI could have on music creation, emphasizing that the tools they’re developing are still works in progress and require both sides — technology and rights, human and machine — to adapt to and learn from each other.
“We have to work a lot to explain to people that technology alone is not a problem,” said Turbot. “What’s good or bad is how people use it.”