AI Sound Effects

A new AI tool launched a few days ago that can generate sound effects from a text prompt:
https://elevenlabs.io/app/sound-effects

For all my posts criticising AI hype, I do try to keep an open mind while keeping the ‘critical thinking’ gears engaged.
With any of these AI projects, my first two questions are:
1. What was it trained on? Did they have permission?
2. What are the rights for anyone who uses it?

So for Q1: first I hunted through their website and could find no mention whatsoever of what it was trained on, which seems strange. But checking their Twitter feed, I found they had announced: “Thank you to our partners @Shutterstock who provided licensed tracks from their expansive and diverse audio library to help create our model.”

Interesting.
More on that in a second, but for Q2 there is nothing in their terms about copyright. Who owns a sound effect that is generated? This matters because if a film soundtrack is audited, or a DMCA notice is submitted to a production company over a possible copyright breach, there has to be proof that a legitimate license exists for the use case. Without it, the sound effects are worthless for professional work.

I eventually found a contact for legal questions and submitted a request:

“I have searched your terms & conditions but cannot find the answer to a very simple question. Can you please advise? When I generate a sound effect with your new AI product, who owns the copyright of the generated sound?”

This morning I got an initial response:

“Thank you for reaching out with your question about copyright for sound effects generated with our AI product. I understand that clear information about ownership is important, and I appreciate your patience as I look into this for you.
 I’m going to escalate this question with our legal department to get the specific details and ensure you have an accurate answer on this as it’s a new feature. I’ll get back to you as soon as possible with the information you need.”

So they launched a product without clear legal info on the rights of use.
Interesting.

While waiting for an answer from the legal team, I kept searching.
First I found this:

Wow, ok. So you “can” use it for commercial purposes, BUT you are the one who is liable for it.
I kept reading. Then I found this:

Interesting.
While you, the user, are liable to the full extent of the law, their liability is capped at US$100.

Next I thought I’d check out what Shutterstock has to say about AI, since it is their sounds being mashed up by the AI.
With regard to compensation for people who upload media to their site, their licensing terms state:

[Screenshot: Shutterstock’s contributor terms on compensation]

Wow.
It feels like a pattern is forming: “all care, no responsibility.”

I’ll update this when the legal team clarifies the actual terms of use for their “text to sound effects generator”.

Next I thought I’d try it out: how usable is it?
The hype is certainly there in large letters.

Now I don’t know if this is the marketing department having too much coffee, or their example users having very small imaginations, but one thing I can assure you is that no, they cannot generate any sound imaginable. When I’ve mentioned the pathetic hype around AI on this forum, this is what I mean. Sure, make marketing claims. But don’t promise the entire world of sound when (a) you can’t deliver it and (b) you don’t even have the legal framework resolved.

From my tests, the sounds it generated were very low fidelity, worse than the quality of a $200 handheld recorder. And the specificity is non-existent: a very explicit text description might get vaguely close one or two times out of ten. Now I can imagine the typical response is ‘but wait, it will get better’, but that idea has some issues. It only gets better with a combination of (a) more data and (b) user feedback. For example, if you ask for a “brick thrown on the bonnet of a car” and you choose the 10th version, you are training the AI with user feedback. Good luck with that, you’ll need a lot of people with spare time on their hands…

But to that point, what I really discovered, or realised, was that “AI generation” is incredibly unreliable, and the only thing that makes it useful is a human sifting through the useless crud looking for a gem. And by a gem, I do not mean a pearl or a diamond. I mean a bit of coal, or a stone, or something even remotely useful. Why is that a problem? Well, as the quote goes, “Time is the school in which we learn, time is the fire in which we burn.” The one thing all humans have in common is that time is their most valuable commodity.

When I think of how a sound effects editor or sound designer works, they have a huge resource right in front of them: their sound library. When they put a “text prompt” into their sound library app, e.g. “brick thrown onto a bonnet”, any matches are shown immediately. They audition them and begin working with them. Some sounds require many components, and some of the best components have nothing to do with the first search term at all. People have been putting lion roars into explosions for a very long time. We love that!

But AI reduces you to someone auditioning sounds from an unreliable, low resolution source, where even explicit descriptions do not guarantee anything even vaguely close. It really is like the ‘use glue on your pizza’ Google AI search.

I’m sure they will continue receiving VC funding to progress whatever it is they aim to achieve. But the residual thoughts I have at this stage are:

– It’s a solution looking for a problem. Casual use on their free tier does not keep the lights on. They have to find a paid commercial use.

– AI use like this will die a death of a thousand cuts. Just as there are legal ambulance chasers, there will be AI copyright infringement lawyers, who will gum up the aspirations of these companies before they ever get to the point of being fully functional. It’s a whole new industry for them, and ironically they will be using AI to do it!

I’ll update this when their team clarifies the legality of use…

Ok thanks for coming to my TED talk

_________________________________________________________________________

And today, another:
https://stability.ai/news/introducing-stable-audio-open

“The new model was trained on audio data from FreeSound and the Free Music Archive. This allowed us to create an open audio model while respecting creator rights.”

I have submitted a request to them too, for clear legal guidance on use.

FreeSound’s terms are here:

“License restrictions when publishing new sounds that include/modify/remix other sounds…”

I have also submitted a request to Freesound, asking if they are aware and whether this follows allowed use, and also whether there is an OPT OUT button for FreeSound users.


_________________________________________________________________________

20240606
Update 1: from Free Music Archive
“We did not give Stability.ai permission.
To be continued.
Team Tribe of Noise”
_________________________________________________________________________

20240606
Update 2:
stable-audio-open:
“All audio files are licensed under CC0, CC BY, or CC Sampling+”

I found this info here:
https://news.ycombinator.com/item?id=40587685#40588214

blargey
If you look at the repo where the model is actually hosted they specify
> All audio files are licensed under CC0, CC BY, or CC Sampling+.
These explicitly permit derivative works and commercial use.
> Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.
So it’s not being glossed over, and licenses are being abided by in good faith imo.
I wish they’d just added a sentence to their press release specifying this, though, since I agree it looks suspect if all you have to go by is that one line.
(Link: https://huggingface.co/stabilityai/stable-audio-open-1.0#dat… )

https://huggingface.co/stabilityai/stable-audio-open-1.0#datasets-used

Datasets Used
Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model (t5-base) for text conditioning.

Attribution
Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.
FreeSound attribution [csv]
FMA attribution [csv]
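Since those attribution CSVs are public, the licence claim can be spot-checked by anyone. A minimal sketch of that kind of audit, using a hypothetical inline CSV (the real files’ column names and layout may differ), which tallies the licences and flags any row outside the three permitted types:

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an attribution CSV; the real Stability AI
# files may use different column names.
sample_csv = """id,author,license
101,alice,CC0
102,bob,CC BY
103,carol,CC Sampling+
104,dave,CC BY-NC
"""

# The three licence types the model card claims cover all training audio.
ALLOWED = {"CC0", "CC BY", "CC Sampling+"}

def audit_licenses(csv_text):
    """Tally licences per type and collect rows outside the allowed set."""
    counts = Counter()
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["license"]] += 1
        if row["license"] not in ALLOWED:
            flagged.append(row)
    return counts, flagged

counts, flagged = audit_licenses(sample_csv)
print(counts)
print(flagged)  # any rows here would contradict the stated licence claim
```

If every row in the real CSVs lands in the allowed set, the “All audio files are licensed under CC0, CC BY, or CC Sampling+” claim at least matches the published attribution data; anything flagged would warrant the kind of follow-up questions asked above.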
_________________________________________________________________________

20240607
Update 3
Freesound has written a blog post discussing the issue.

