AI Music Just Had Its ChatGPT Moment (Udio & More)


Alright, so AI music is going through its own renaissance right now. Some people are saying it’s having its ChatGPT moment. While that’s a subjective statement, objectively we’ve gotten many new tools, new versions of existing tools, and some techniques that make possible things you couldn’t have done as a consumer a few months ago. Mainly Udio, which just launched, and the whole internet is going crazy about it because some of its voices are so darn impressive.

In many cases, they’re better than Suno’s, but there are other limitations. We’ll talk about all of this soon, as there is a lot to explore and learn here. That’s why today I’ll be showing you the most interesting and revolutionary AI audio tools I’ve discovered over the last few months. This is a topic I haven’t covered much, but I’ve been following it with great passion. To be honest, some of these apps are the most fun I’ve had with AI since the release of GPT-4, and they got me just as excited as seeing what Sora can do, or is going to be able to do. The good thing is that these tools are available now. So without further ado, let’s look at the AI music landscape and which apps you need to know about, and as usual I’ll leave you with a few tips, tricks, and recommendations.

Alright, first things first: what prompted me to create this video today? A lot of things have come out over the past few weeks, but the reason I’m making it now is this: udio.com. As we’ll discuss in this video, it’s a direct competitor to Suno, and both create songs from next to nothing for people with no musical knowledge whatsoever. Apparently Udio is a more capable version of Suno, which is a wild statement, because Suno recently came out with its V3. If you’re not familiar, you type in one word and it gives you a full-fledged track ready to go, with vocals, music, everything. Very nice. In a second, we’ll look at some of the best examples our community has created. Basically, everybody was blown away by Suno a few weeks ago, and now Udio comes out and raises the bar. It does stereo audio and multiple vocals at once, and the quality is higher than anything we’ve seen before. Just have a quick listen. If you want to hear the full thing, the link is below. Notice how there were multiple voices at the same time. Incredibly effective.
And if I heard this on the radio, I don’t think I would immediately realize this is AI, right?

As a matter of fact, if you want to create a song, all it takes is this little box up here. You can simply type something like “a song about a cat with a hat.” Excellent, right? Then you can pick a genre here. There’s a bit more customization if you want it, but we’re going to go with country. I’m going to have the lyrics auto-generated. You could also craft them with ChatGPT, go into custom, and insert your lyrics there, but for now we’ll stick with auto-generated. As for manual mode, it essentially turns off the prompt generator that runs in the background. So if you have a more detailed prompt, or you crafted your prompt very carefully and want it respected exactly as written, turn on manual mode. Usually you can keep it off, as the prompt generator works pretty well. Then I just click create, and here’s the thing, which we’re going to keep in the video: right now the service is very, very overloaded. It works here and there. Sometimes it takes 20 minutes, sometimes an hour, and sometimes it doesn’t work at all. So many people are using these tools right now that they seem not to have prepared for all the incoming traffic, so a lot of the time you’ll just get errors. It might take a little patience. As you can see, when you’re creating these, you have 600 tracks and another 600 tracks, so 1,200 in total, and 600 of those are priority generations, meaning they generate faster. So let’s have a quick listen to the cat-with-a-hat song we just created. Okay, that’s pretty good. From here, you could go ahead and remix it, which I really like. You can customize the lyrics, change the style, and adjust the variance to make it very similar to the original track or very different. This is something Suno has been lacking.

So I really like this. Once you have a track you’re happy with, you can simply click share and download the audio, or share it with people. This is on a free account; all I did was log in with my Google account. Depending on when you watch this video, this might have changed, but as of now this is just a beta. When you go to the pricing page, it doesn’t even show a price; it just says, hey, we just got started.

So I think I illustrated how to create songs pretty well. It’s really not that complicated; it’s made to be very simple. For now, I just want to show you some samples of what Udio sounds like, and then talk about what I subjectively feel is better here than in any other app before. Let’s have a listen. First things first, this is an instrumental, and what’s exceptional here is the clarity of the instruments. A lot of people across the internet have already said these are the best AI-generated guitars we’ve heard so far. In this case, the banjo is absolutely incredible. Now let’s look at another one to highlight its voice-generation capabilities, which are also very advanced. That is really great. I’m not sure how to put this objectively or how to measure it, but simply put, the voice generation here is a lot better than in Suno. In Suno, if you pay attention, it simply sounds AI-generated. Here, for me, this might have just crossed the uncanny valley. The voice generation is next level compared to what we’ve had so far. Have a listen for yourself. But look, a lot of this is subjective, and honestly, I’ll keep using both. I think Suno might be better for certain genres. Udio has more clarity, and the voices are generally quite good, but I don’t think this is the night-and-day situation that most people make it out to be. And there’s one major limitation here.
The tracks you create are only 30 seconds long. But here’s an important button that you could easily miss when you first open the app: this extension button. What it does is add another 30 seconds to your track. So let’s extend this one right here. We say add a section, auto-generate lyrics again, and extend. Again, this might take some time, but once it’s done, your track is a minute long, and you can keep doing this up to four minutes in total.
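As a quick back-of-the-envelope check, here’s a small Python sketch of that math. It assumes every generation and every extension adds exactly 30 seconds and that the cap is four minutes, as described above. The real app may trim or vary segment lengths, and the function name is just for illustration. It counts how many times you’d hit the extend button to reach a target length.

```python
import math

CLIP_SECONDS = 30        # assumed length of the first clip and of each extension, per the video
MAX_TOTAL_SECONDS = 240  # assumed cap: "up to four minutes in total", per the video

def extensions_needed(target_seconds: int) -> int:
    """Hypothetical helper: 30-second extensions needed after the first clip to reach target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target length must be positive")
    if target_seconds > MAX_TOTAL_SECONDS:
        raise ValueError("target exceeds the four-minute cap described in the video")
    clips = math.ceil(target_seconds / CLIP_SECONDS)  # total 30-second segments required
    return clips - 1  # the first segment comes from the initial generation

for target in (30, 60, 150, 240):
    print(f"{target:>3} s -> {extensions_needed(target)} extension(s)")
```

So a full four-minute track means the initial 30-second generation plus seven extensions, each of which may take a while while the servers are overloaded.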

But here’s the thing: Udio is just one tool. You’ll often want to use several of these together, so it helps to know the strengths and weaknesses of each. That’s why in this video we’re exploring multiple tools, so you can use whichever one you need while creating music with AI, or even combine them. Besides Udio, I want to talk about Stable Audio V2, audioshake.ai, and suno.com. These are different tools for different purposes, so what do you need to know? Suno creates full-fledged songs, including vocals, just like Udio. They’re essentially competitors trying to do the same thing. This is really made for people who have no idea what they’re doing with music and who want to create tracks, which makes it an incredible toy to play with. As a matter of fact, it’s so good that we challenged our community members to create different tracks with it, and I’ll show you some results at the end so you can see what worked best for them.

But the other ones are quite different. For example, Stable Audio is the safest one when it comes to copyright. All of these tools give you the rights to your tracks, in the sense that they claim no copyright on them because they’re original works. Now, there’s a whole discussion around that, so let’s not get into it, but basically, Stable Audio takes it a step further, because they trained their entire AI on licensed tracks that they already own. Meaning, if you want something that is 100% future-proof, Stable Audio is going to be your friend. But it doesn’t generate vocals. It creates all these different styles and more, and it’s best used for background music. As you can see, sounds like this can easily be generated in here. Plus, there’s another function where you can upload your voice, and it comes back with a track.
This is really fun to play with, as you can upload one track and transform it into another, which is something you can’t do in some of the others. Moving on to the next one I mentioned: AudioShake. This one was actually brought to my attention by a Grammy Award-winning music producer I met while attending the Imagine AI Live conference in Vegas two weeks ago. His name is Young Spielberg, and he gave an entire presentation on AI audio tools.

Now, this is the one that stood out to me, because I hadn’t used or seen it before: AudioShake. Apparently, it’s by far the best stem separator in the whole game. So this is the other end of the spectrum. On one end, you have Suno, which is for people who know nothing about music. On the other, you have AudioShake, which takes a track and splits it into stems. By the way, stems are the individual tracks: drums, guitar, vocals, etc. If you know what you’re doing, you can create new music from those, all with the power of AI. Now, the thing about AudioShake is that it’s prohibitively expensive. Look at that: on the lowest plan, one stem is $5. So if you want four or five stems from one track, you’re paying $20 to $25 per track right there. And that’s for MP3; if you want the high-quality version, it’s twice that. Here’s an example of what that sounds like. So here’s the full track. Here’s just the vocals. You get the point. According to him, this is the best one out there for splitting stems.

But now we get to the last one, which is Suno. This is the one I want to spend a little more time on, because it has been so much fun. Plus, I’ve got to say, many people who are not that deep into AI have brought up Suno in conversations with me recently because somebody recommended it to them. They tried it and had an incredibly fun experience playing around with it. This has been a recurring theme, and I can confirm it.
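To make that pricing concrete, here’s a rough Python cost estimate using the numbers quoted above: $5 per stem on the lowest plan for MP3, and double that for the high-quality version. Treat these prices as a snapshot from when the video was made. The function is a hypothetical illustration, not anything AudioShake provides.

```python
PRICE_PER_STEM_MP3 = 5.00  # lowest plan, MP3 quality, as quoted in the video (may have changed)
HQ_MULTIPLIER = 2          # "if you want the high-quality version, it's twice that"

def cost_per_track(num_stems: int, high_quality: bool = False) -> float:
    """Hypothetical estimate of the USD cost to split one track into num_stems stems."""
    price_per_stem = PRICE_PER_STEM_MP3 * (HQ_MULTIPLIER if high_quality else 1)
    return num_stems * price_per_stem

for stems in (4, 5):
    print(f"{stems} stems: ${cost_per_track(stems):.2f} MP3, "
          f"${cost_per_track(stems, high_quality=True):.2f} high quality")
```

With four or five stems, that works out to $20 to $25 per track in MP3 and $40 to $50 in high quality. That’s why it makes more sense for producers than for casual experimenting.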
My sessions creating music with Suno are just as fun as, maybe even more fun than, listening to some of my favorite music, because you’re more involved and you get to direct how these songs are created. Kind of, I mean; the control you have over the creation is very limited. But after having played with this app a lot, I’ve found that a few things work and a few things don’t. So now that I’ve given you an overview of which apps exist and what you’d want to use them for, I’m going to show you what actually works.

First things first, there are two versions: V2 and V3. With V2, as of now, you get some free credits when you sign up and can start playing around immediately, but V3 is where they raised the bar, and the quality is at a level where you’re like, whoa. Don’t get me wrong, V2 is really good and fun to play with, but V3 is the mind-blowing one. As you can see here, all it takes is a short description, and you pick which model you want to use. That’s how simple this is. Sure, you can turn on custom mode and plug in your own lyrics; this is what I did a lot. You can write the lyrics with ChatGPT and then insert them here. You can also pick a style of music, and a simple Google search should give you some ideas. But here’s the thing: not all of these are created equal, and this is the main lesson I want to get across in this video. The quality of these Suno songs depends on what the model was trained on, and we don’t know exactly what that is. All we can do is experiment with it and get a feel for what works better and what doesn’t work so well.

So now I’m going to show you the tracks that the community and I created that worked really well, so you know which genres to pick for your own creations. And here’s the thing: from my testing so far, everything that works inside Suno also works in Udio, so the same principles we talk about in this video apply to both. Pay close attention. I’ll start with my own creation, because it illustrates what I found: more classical genres, like a classical orchestra, possibly with a female solo singer added, always work better. A lot of that music is in the public domain, meaning it’s so old that it’s no longer copyrighted, so there’s a lot of it they could train on, and the results are really good. Here I created something I call the AI Advantage Anthem. Let’s have a listen. Incredible, right? The reason is that the slightly grainy feel of the voice makes it easier for the model to generate, plus there are many examples of classical music like the one in the background. That’s why it does so well here. One minor tip: if you want the lyrics to include something like ChatGPT, write the letters separately, and it will pronounce it correctly. Same thing with AI over here. This is something I discovered while crafting the lyrics for this. Okay, here’s another one, created by community member CJ: “True Hearts,” with a tribal rhythm. Chillhop, the ambient, lo-fi type of hip-hop, works really well because the internet is full of those tracks.

So again, if you think about what kind of music they had a lot of access to, that’s also going to work really well here. Let’s have a quick listen. That’s really good. Now, you might question the quality of the vocals compared to a real-life singer, but there we’re relying on an AI voice synthesizer, and I think that’s a good way to think about it. Think about the music, which is what we’ve talked about for the past few minutes, but also think about the vocals, and keep in mind that those are AI-generated. Here, I think the voice is really good, but the beat is almost flawless. That’s just not going to be the case with every single genre; some really don’t work that well. I know a lot of it is subjective, but to me personally, the pop songs and techno songs don’t really work. There’s just so much high-quality pop music out there that these don’t stand up to direct comparison. Nevertheless, if you combine multiple musical styles, like you can see right here, with some smart lyrical writing, like Romany did, you can get great results even with V2, the free version. She created something that really captured the imagination of many community members, and she was voted the winner of the challenge.

About Anushka Agrawal
