Three parallel futures for social audio
Clubhouse is celebrating its one-year anniversary with equal parts hype and pressure.
Since its launch in April 2020, the drop-in audio chat app has been downloaded over 13 million times; raced its way to a Series C funding round from the likes of Andreessen Horowitz, Tiger Global and DST Global at a reported $4 billion valuation; and rolled out features like direct tipping and a creator accelerator program. Celebrities ranging from Mark Zuckerberg and Elon Musk to Lindsay Lohan and 21 Savage have graced the app’s stages and hallways, in addition to a wide range of music-industry executives. Several of the app’s most active users have even started their own “Audio Collective” specifically for entrepreneurs looking to use Clubhouse to grow their businesses.
And now, big-tech companies want Clubhouse’s lunch.
Spotify. Twitter. Facebook. Reddit. Discord. LinkedIn. Slack. This is the growing laundry list of corporations that have launched their own Clubhouse lookalikes so far this year — embedding social audio as a feature within their existing products, rather than building a standalone app as Clubhouse is trying to do. In Spotify’s latest earnings call, CEO Daniel Ek compared social audio to the “Stories” format, in how both are now table stakes for tech and social-media companies to compete for share of daily user engagement. Amidst all this activity, Clubhouse’s monthly downloads reportedly fell by over 70% in March.
Understandably, a lot of the hype and media coverage around the overcrowding of social audio has focused on inter-company competition, and the elusive question of who is going to “win.” But I think people are spending so much time trying to predict social audio’s winner that there isn’t nearly enough clarity about social audio’s actual use cases.
Today, I want to focus on a more fundamental question: What is social audio even good for in the first place? How can we use it to improve our social and creative experiences online, instead of just adding more to the existing noise?
Over the past several months, I’ve been observing and digging a bit deeper into the specific features that each of the above companies — as well as smaller startups like Stationhead, Sonar, Capiche FM and Fireside — are building around social audio. In the process, I’ve come to an important realization: Just like with online communities in general, social audio is not a monolith. Different kinds of communities, artists, entrepreneurs and businesses will want to use social audio for vastly different purposes, in ways that will arguably require different kinds of products.
Below is my three-part framework for understanding the wide variety of value propositions for social audio, depending on who is building it and whom they are targeting. To clarify, these use cases don’t have to be mutually exclusive — hence my titling this piece “parallel futures,” rather than just “possible futures.” That said, each of these use cases ultimately involves a different kind of industry, user behavior and underlying business model.
Yes, LinkedIn, Discord, Slack, Facebook, Twitter, Reddit, Spotify, Clubhouse, Sonar, etc. are all investing in social audio at the same time — but they’re not all trying to accomplish the same thing.
Before moving on… a clarification
I’ve noticed that when people use the phrase “social audio,” they usually really mean “social and live audio” — i.e. audio that is consumed socially and communally in real time.
In contrast, there are several other apps and features — such as Cappuccino, Racket or audio DMs in WeChat — that can be considered “social audio” in that they leverage audio in highly social and connective ways, but are not livestreamed and/or are primarily intended to be consumed on-demand. Most conversations about “social audio” today leave these kinds of companies out, opting instead to focus on live-first audio options like Clubhouse, Twitter Spaces and Discord voice channels.
For the avoidance of doubt, this piece will focus on the “social and live” subset of social audio. On-demand social audio is also certainly an interesting and high-growth area, but that is beyond the scope of this discussion.
I. Social audio = workflow tool for podcasters
Companies executing on this vision: Spotify, Capiche FM, Fireside, Riverside.fm, Stationhead
I find it fascinating that it’s mostly social media companies, not podcast or radio companies, that are leading the investment and hype around social audio. This imbalance in the market risks distracting us from the bigger picture, because we can’t talk about the future of social audio without talking about the future of podcasts.
Let’s zoom out a bit. Right now, the podcast industry is seeing faster, more aggressive consolidation than any other entertainment industry — including, yes, music. All of these headlines are from the last six months:
- Libsyn buys podcast creation/hosting platform Auxbus, podcast advertising marketplace Advertisecast and podcast membership platform Glow.
- Spotify buys podcast ad-tech company Megaphone and live/social audio startup Betty Labs (the company behind Locker Room, now Greenroom).
- Audacy (f.k.a. Entercom) acquires podcast ad marketplace Podcorn.
- Acast acquires podcast hosting and monetization platform RadioPublic.
- iHeartMedia acquires podcast ad-tech company Triton Digital from Scripps.
- Vox Media buys Cafe Studios.
- SiriusXM buys the show 99% Invisible.
- Amazon Music buys Wondery.
This is not just consolidation for consolidation’s sake; there’s a clear pattern in terms of what today’s biggest podcast companies are trying to build. In particular, two of the world’s biggest podcasting companies, Spotify and Libsyn, have both bought their way to a vertically-integrated podcast ecosystem in-house — from creation, to hosting and distribution, to monetization, advertising and direct-to-fan memberships. See the diagram below:
(Note that many of the above platforms can fall into more than one category — e.g. Anchor is approaching an all-in-one platform combining advertising and community engagement/monetization tools, in addition to podcast creation and distribution.)
This vertical integration parallels how major labels in music are also shifting from merely owning copyrights into building their own in-house, “independent” services companies for artists. Just like major labels, today’s podcast behemoths want to own the process as much as the product, all the way down the long tail of podcast creators.
Where does social audio fit in? The few podcast companies that are interested in the format are investing in social audio as just another engagement and monetization tool for podcast producers, embedded within these companies’ already-existing, vertically integrated workflows. It’s less about retaining listeners, and more about retaining creators.
The clearest example of this in practice is Spotify’s new ownership of Locker Room / Greenroom. Say you’re Bill Simmons, Danyel Smith, David Chang or any other host in The Ringer’s Podcast Network, which is owned by Spotify. Thanks to the Betty Labs acquisition, you can now, if you want to, record a podcast interview live for your audience in Locker Room / Greenroom, and even turn on ticketing, tipping or other monetization features if you’d like (these are forthcoming). Then, you can edit the episode through Anchor (if you don’t already use another DAW); handle advertising on Anchor or Megaphone; and distribute the episode globally onto Spotify (and other podcast platforms). From a strategic perspective, social audio helps Spotify get closer to its goal not only of competing with terrestrial radio for the largest share of audio listening time, but also of owning the full stack of creative and business tools for podcasters.
There are many smaller startups trying to execute on a similar vision, but without the added benefit of Wall Street money or millions of subscribers. For instance, Fireside, Capiche FM and Riverside.fm allow podcasters to record their episodes in real time for a live audience, including real-time text chat, and then distribute those recordings for on-demand consumption after the fact. Capiche also allows podcasters to splice clips from their live recordings as episode previews, similar to clips on Twitch. Stationhead, which operates on top of Spotify and Apple Music’s APIs, allows people to host real-time, music-focused audio shows for a live audience, with on-demand listening after the fact available exclusively within the app. In all of these examples, it’s clear that the target audience is specifically the community of semi-pro to pro podcasters, DJs and radio hosts, not so much everyday people.
The main takeaway from this model is that social audio turns the podcast recording process itself into a social, community-driven experience, with an opportunity for real-time listener input and monetization. In fact, this is similar to how many artists and producers already use Twitch or Discord: livestreaming their production sessions while taking real-time input (and, in many cases, direct tips and subscriptions) from fans, then uploading the end result to DSPs for on-demand consumption after the fact.
Ironically, I don’t think companies in this category are trying to build entirely new social graphs through social audio. Livestreaming a podcast recording is still ultimately a traditional, top-down, broadcast-like experience. Sure, it is social in the sense that people are in the same virtual place together, listening to the same thing at the same time. But its ultimate strategic value for companies is as a B2B creative tool, more than as a social tool.
This brings us to the next use case…
II. Social audio = embedded community engagement tool for social media platforms
Companies executing on this vision: Discord, Slack, Facebook, Twitter, Reddit
This use case is getting the most hype simply because some of the world’s biggest social platforms are behind it.
Under this approach, social platforms invest in audio as a synchronous layer on top of primarily asynchronous interactions (e.g. messages/DMs, emails, forums, blog posts, newsfeeds) within existing social groups. This use case is less of a creative workflow tool, and more of an explicitly social tool for everyday use; the hope is that audio can help retain members of these online communities and give them a richer window into each other’s perspectives, interests and ideas than what text-only communication can provide.
To its credit, Discord helped usher live group voice chat into existing, asynchronous gamer communities (and, nowadays, mainstream Internet culture) several years before Clubhouse entered the scene. Now, other social platforms are jumping on the bandwagon and embedding audio in a way that’s more tailored to their respective interfaces and business goals. For instance, Twitter Spaces embeds and delivers social audio within speakers’ existing social graphs and news feeds on Twitter; Reddit Talk embeds social audio within specific subreddits; and Facebook’s upcoming “Live Audio Rooms” will embed social audio within Facebook Groups and Facebook Messenger.
To reiterate, all of these companies are building social audio within preexisting groups, for use on preexisting products. We can debate whether this approach is or is not “innovative.” But I think this integration can mitigate three core problems that I’ve experienced on Clubhouse as a frequent user over the last year:
A. Context collapse. I wish I had a dollar for all the times I entered a Clubhouse room and was utterly confused about the background and credibility of who was speaking, what they were even talking about in the first place and/or what points they had or had not yet addressed in the conversation. This is different from, say, a Twitter thread or a Slack message thread, where you can easily scroll up to earlier in the conversation to catch up.
Embedding social audio within existing asynchronous communities doesn’t get rid of context collapse 100%, but it can alleviate the issue significantly. I’m biased, but take our Water & Music Discord hangouts as an example: We already have the established context that everyone in our Discord server is a paying Water & Music member, is at least a bit knowledgeable about the music industry and is interested in tracking emerging trends in music and tech. This shared context allows us to cut through a ton of noise during our hangouts and interviews that you might otherwise encounter on apps like Clubhouse, and get right to the good stuff. Our #hangout-chat text channel also allows speakers and listeners to share links that offer additional, real-time context to our live conversations, instead of leaving people hanging or confused on what exactly we’re talking about or referencing.
Similarly, Twitter Spaces allows hosts to display tweets at the top of their rooms, to provide additional context that is native to the social platform, and that listeners can browse simultaneously while the live Spaces audio plays in the background. See, for instance, how Billboard embedded tweets at the top of their recent Spaces discussion about the Billboard Music Awards finalists:
B. Lack of active, participatory listening. In Clubhouse, you are either onstage and ready to speak, or in the audience with no opportunity for interaction. There’s no in-between space for live engagement or feedback like what we would normally expect during a stream on Twitch, YouTube, Instagram or Discord. Real-time audience comments, questions and reactions are as much a part of the entertainment experience around a livestream as the content of the stream itself. Existing social platforms arguably have the user interfaces to incorporate this active listening into social audio more easily, as opposed to starting from scratch.
C. Fragmented follow-up engagement. Right now on Clubhouse, if you want to keep talking and connecting with people you met in a room after the fact, you either have to start a new live room with them, or go off-platform to another social app like Twitter or Instagram. In contrast, if you start a social/live audio conversation on Twitter, Slack, Discord or Facebook and want to keep in touch with people who spoke or were in attendance, you can simply follow or message them directly on that respective social app. The follow-up engagement experience is less fragmented, and helps position those social platforms as the central, go-to hubs for engaging with certain communities or groups of people.
That said, it does depend on the community you’re trying to speak to. For instance, you may have noticed that there are a ton of NFT-related rooms that take place on Clubhouse every day now. While this may feel like too much hype, the truth is that Clubhouse is a better, more central hub to spontaneously connect with other people interested in NFTs than, say, Twitter Spaces or Locker Room. This is in part due to the early-adopter and power-user base behind Clubhouse, which is much more tech-savvy and interested in entrepreneurship.
In contrast, Nick Jonas recently hosted a live Q&A for his latest album Spaceman on Twitter Spaces. This choice makes so much more sense for Jonas than Clubhouse, because a sizable chunk of his tens of millions of fans around the world are already active on Twitter. Why redirect them to a new app and add more friction to a process that these fans are so eager to participate in?
It might feel like incumbent social-media companies are investing in social audio features just because it’s buzzy — a strategy that Hot Pod’s Nick Quah recently described as “toss[ing] a bunch of in-category clones and incentives at the wall to see what sticks.” To really cut through the noise and make these investments worthwhile, I think companies and artists alike can start by asking themselves how they will really address the above three issues. Namely, how will you use social audio in your existing communities to add rather than strip context? How can you make the listening experience more active and participatory? And how can you optimize for follow-up engagement after the fact — such that social audio is just one part of a longer, more far-reaching communal journey?
III. Social audio = standalone format for real-time creativity
Companies executing on this vision: Clubhouse(?), Sonar
The previous two sections — social audio as a podcast workflow tool, and social audio as an embedded community-engagement tool — assume that social audio is merely a feature, not a standalone platform.
I think this assumption holds only if we limit our imagination of what’s creatively possible.
I can’t speak for Clubhouse; I have no idea if they expect to get acquired in the next year, or have ambitions to continue developing a standalone social/live audio app. But one thing I hope people take to heart is that the most exciting part about the potential future of Clubhouse (and any standalone social audio platform) is not just about more conference calls with Elon Musk, Mark Zuckerberg and the Andreessen Horowitz team. Nor is it just about professional networking, or flaunting your awards or credentials regardless of whether they’re legit.
There are real opportunities for experimentation and development around social audio as a creative format that could warrant its own separate space — especially around spatial/360 audio and real-time, low-latency creative collaboration — if companies want to invest in it.
For instance, something that I’m shocked is missing from most conversations on the “future” of Clubhouse is the fact that many of the app’s most talked-about events have treated Clubhouse not as a professional networking stage, but as theater and performance art:
- The weekly “Cotton Club” room required everyone onstage to change their avatars to a black-and-white image of a famous jazz musician, after which point you could request songs to play in the background and even order virtual drinks and hors-d’oeuvres from the hosts.
- A crowdsourced, audio-only, live staging of The Lion King, with a 40-member cast and real-time instrumentation, drew over 5,000 concurrent listeners on the app.
- The comedian Roy Wood, Jr.’s humorous “Chaos Room” invited hundreds of people onstage and encouraged them to unmute their mics and all talk at the same time, rendering the room intentionally unintelligible.
- The interactive, shoot-your-shot bar simulation “NYU Girls Roasting Tech Guys” has humiliated everyday tech bros and international celebrities alike, and the women behind the series are now repped by WME as they aim to expand the brand across podcasts, books, TV, film and more.
In all of these examples, the hosts weren’t just creating another podcast or hosting a discussion that could have taken place on Twitter or IG Live; they were taking unique advantage of the creative and moderation features on Clubhouse in a way that wouldn’t make sense on any other app.
Unsurprisingly, there are more startups than incumbent tech corporations that are interested in innovating around social audio as a creative format in this way. For instance, Sonar (pictured below) is a social app built not just on live audio, but also on spatial audio (i.e. the audio you hear will depend on where you are spatially in a given room; each user is represented by a dot and can move around the room freely), as well as on a visual layer that anyone can customize with emojis. I’ve seen people use Sonar for all-out emoji-based art exhibitions, theater auditions with blocked-off “waiting rooms” and legit language lessons in virtual classrooms complete with chairs and whiteboards. There is a strong sense of exploration, discovery and spontaneous connection in Sonar that Clubhouse seems to be clinging onto right now.
Trying to capture and build for these creative experiences can sometimes feel like trying to catch lightning in a bottle. But incumbent social networks already dominate our existing social graphs, and incumbent podcast companies will likely dominate the transition from live to on-demand audio. For standalone social audio apps, a clear, deep understanding of how live audio stands out as an emergent creative tool is perhaps the only strategy left — and a really powerful and underrated one at that.
So… what does this all mean for the music industry?
Clearly, there’s a lot going on in the world of social audio right now. Where does that put artists, music fans and the music industry at large?
In my mind, there are two main takeaways:
A. “Linear dies and on-demand wins” is bullsh*t
The rise of social audio is debunking one of the biggest myths that one of the world’s largest music streaming companies has been pushing to the public for the past year.
Almost exactly a year ago (April 2020, the same month Clubhouse launched), Spotify CEO Daniel Ek was wrapping up a quarterly earnings call when an analyst asked him about the company’s stance on livestreaming. At the time, the COVID-19 pandemic was driving unprecedented activity around livestreaming — both from incumbents like Instagram and YouTube, and from a burgeoning ecosystem of dozens of new video and audio livestreaming startups, including but not limited to Clubhouse.
Ek’s response to the analyst was telling, and ultimately inaccurate (emphasis added):
… while most focus on the competition between streaming services, we continue to be focused on the billions of users that are listening to linear radio. The 20-year trend is that everything linear dies and on-demand wins. This is a trend that we suspect will be accelerated by the COVID pandemic.
Ever since Spotify went public in 2018, its executives have been pushing similar rhetoric to investors: It’s all about providing a better, more personalized and more convenient offering to fans (and a better financial deal to artists and music rights holders) than terrestrial radio.
Fast-forward to today, and it’s amusing how Spotify seems to have done a complete 180 on this stance. Yes, they’re still interested in competing directly with terrestrial radio — but they’re now using the exact linear techniques that they shunned just a year ago to do the job. In fact, according to Bloomberg, they’re hiring over 100 people right now to expand their live audio strategy, hot on the heels of their Betty Labs acquisition.
And it’s not just that Spotify reversed course on its strategy; if the evolution of the tech industry at large is any indication, “linear dies and on-demand wins” is just plain wrong.
Ever since Facebook launched its native livestreaming features in 2015, tying together live and on-demand media has been table stakes for big-tech companies and social platforms to stay competitive — and for Internet creators and influencers to build sustainable audiences:
- Twitch — artists can monetize not only in real time during livestreams on Twitch, but also on-demand via clips on Twitch, YouTube, Instagram, TikTok and more.
- Instagram — artists can go live via IG Live and then maintain an archive of these live shows on IGTV and the standard IG feed, and/or sell these archives to a larger rights holder (e.g. Verzuz selling to Triller).
- YouTube – artists benefit not only from on-demand media distribution, but also from a growing suite of native livestreaming features like Premieres and Live Redirects.
- TikTok — artists can not only benefit from the viral distribution of on-demand song clips and videos on the app, but can also livestream to and monetize their followers in real time.
- The list goes on.
So, in the playing field of big tech and live media generally, Spotify is actually catching up to everyone else. The future of social and live audio, just like the future of social and live video, is about more diverse ecosystems for creation and engagement — not about one format taking over the other.
B. Social audio remains an untapped direct-to-fan marketing tool
Perhaps the good news for the music industry is that most of the platforms where music fans are active are now investing in native social audio, and that a much wider and more mainstream audience is aware of the power of apps like Clubhouse. This means that, both right now and post-pandemic, social audio can serve as a crucial part of many artists’ direct-to-fan marketing and monetization toolkits.
In fact, a handful of social audio applications have already been partnering extensively with musicians to prove early use cases. 2020 saw an unprecedented number of producers, labels and other music brands launch their own Discord servers, hosting live audio meetups, production sessions and beat battles exclusively for server members. Indie artist Axel Mansoor was Clubhouse’s official icon for several months, and other artists like Fat Tony have used the app to host virtual album release parties. Aside from Nick Jonas, celebrities like Taylor Swift and NCT have also used Twitter Spaces to host fan conversations and premiere new music.
To reiterate everything discussed above: it’s important to keep expanding our sense of what’s creatively possible with the social audio format, and not to limit ourselves to a top-down conference call, webinar, interview or panel discussion. With music in particular, there’s so much room to experiment, especially with newer rather than incumbent apps:
- Immersive, interactive audio installations and audio worlds (especially relevant for apps like Sonar that incorporate spatial live audio)
- Listening parties
- Live superfan Q&As and game shows
- Gaming meetups
- Real-time jam sessions and collaborations
Of course, artists should only invest in social audio if they’re comfortable with the format and with open conversation, and if it makes sense based on the channels on which they’re already active. Just like at the company level, individual artists and their teams should always have a clear answer to what value social audio will really provide fans at the end of the day, beyond just riding the current hype train. But from that point of clarity, the possibilities are endless.