At the end of last month I released a game called "Isekaing: from Zero to Zero" - a musical parody adventure. For anyone interested in seeing what it looks like, here is the trailer: https://youtu.be/KDJuSo1zzCQ
Since I am a solo developer with disabilities that prevent me from learning certain professions, and with no money to hire a programmer or an artist, I had to improvise a lot to compensate for the things I am unable to do. AI services proved to be very useful, almost like having a partner who deals with certain issues but needs constant guidance - and I wanted to talk about those.
Audio.
Sound effects:
ElevenLabs (11labs) can generate a good range of effects, and some of them are as good as natural recordings. But it often fails, especially with less common requests. The generation process is very straightforward - type and receive. However, it uses so many credits for this task that it's often easier to just search for free sound effect packs online. So I used it only in cases where I absolutely could not find a free resource.
Music:
Suno is good for BGM since it generates a long track from the start. It also seems to have the most variety of styles, voices and effects. The prolong (extend) function often deletes a bit of the previous audio, so you have to be careful about that and test it right after the first generation.
Udio generates 30-second parts, so it takes a lot more generations to make a song. It is also less varied. But, unlike Suno, it lets you edit any part of the track, which helps in situations where you have a cool song but the intro was bad - you just go and recreate it. The other cool thing is that you get commercial rights even without a subscription, so it is good for people low on cash.
Loudme is a newcomer on this market. It appeared after I was done making the game, so I haven't tested it. It looks like a completely free service, but there is an investigation suggesting that it might just be a scam leeching data from Suno. Nothing has been confirmed or denied yet.
If you want to create a really good song with the help of AI, you will need to learn the following:
Text. Of course you can let AI write it as well, but the result will always be terrible. Also, writing the lyrics is only half the task, since the system often refuses to sing them properly. When facing this, you have two choices - keep generating variations, marking even slightly better ones with upvotes so the system has a chance to finally figure out what you want, or change the lyrics to something else. Sometimes your lyrics will also be censored. The solution to that is to search for similar-sounding letters, even in other languages, for example: "burn every witch" -> "bёrn every vitch".
Song structure. It helps you avoid a lot of randomness and format your song the way you want - marking the verse, chorus, new instruments or instrument solos, backing vocals or vocal changes, and other details. The system may and will ignore many of your tags, and the solution is the same as above - regenerating or restructuring. There is a little workaround as well - if tags at a specific point in time are ignored entirely, you can place any random tag there, followed by the tag you actually need, and chances are the second one will trigger properly. Overall, it sounds complicated, but in reality it is not very different from assembling a song yourself, just with a lot more randomness.
Post-editing. You will often want to add specific effects, instruments, whatever. You might also want to glue together parts of different generations. Your best friends here will be pause, a cappella, pre-chorus and other tags that silence the instruments, allowing a smooth transition to the other part of the song. You might also want to normalize the volume after merging.
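For anyone curious what "normalize the volume" can mean in practice, here is a minimal Python sketch. It is not the tool I used, just an illustration under a few assumptions: each part is already decoded into a list of signed 16-bit PCM samples, the function names `peak_normalize` and `merge_clips` are made up for this example, and "normalize" is read as simple peak normalization (scaling each part so its loudest sample hits the same level). Perceived loudness is a different thing, so a real audio editor may give better results.

```python
from typing import List, Sequence

FULL_SCALE = 32767  # largest magnitude of a signed 16-bit PCM sample


def peak_normalize(samples: Sequence[int], target: float = 0.9) -> List[int]:
    """Scale samples so the loudest one sits at target * FULL_SCALE."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # empty or silent clip: nothing to scale
    gain = target * FULL_SCALE / peak
    # Clamp after rounding so we never leave the 16-bit range.
    return [max(-FULL_SCALE, min(FULL_SCALE, round(s * gain))) for s in samples]


def merge_clips(clips: Sequence[Sequence[int]], target: float = 0.9) -> List[int]:
    """Bring every clip to the same peak level, then join them end to end."""
    merged: List[int] = []
    for clip in clips:
        merged.extend(peak_normalize(clip, target))
    return merged


if __name__ == "__main__":
    quiet_part = [100, -200, 50]      # hypothetical quiet generation
    loud_part = [10000, -20000, 500]  # hypothetical loud generation
    print(merge_clips([quiet_part, loud_part]))
```

After this, both parts peak at the same level, so the jump in volume at the seam is less noticeable. It does nothing about the transition itself, which is where the silence-producing tags come in.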
VO: Again, ElevenLabs is the leader. Some of its voices are bad, especially when it comes to portraying strong emotions like anger or grief. Others can hardly be distinguished from real acting. I guess it depends on how much training material they had. Another good thing is that every actor who provides a voice to the company is compensated based on the amount of audio generated. Regenerating and changing the model often gives you entirely different results with the same voice. The text is also case-sensitive, so you can use that to help the model pronounce words the way you want.
However, there is a problem with this service. Some of the voices get deleted without any warning. Sometimes they have special protection - you can see how long they will stay available after being deleted, but ONLY if you added them to your library. And there is a catch - if your subscription runs out, your extra voice slots get blocked and you lose whatever voices you had there, even if you subscribe again. So I would recommend creating the VO only when you have finished your project - this will allow you to do it in one go, without losing access to the actors you were using.
Images.
There are a lot of options when it comes to image generation. But do not expect an ideal solution.
Midjourney is the most advanced and the easiest to use. But it is also the most expensive. With the pro plan costing my entire monthly income, I could not use it.
Stable Diffusion is the most popular. But it is also the hardest to use. There are a lot of services that provide some kind of SD variation, and some of them are a bit easier than others. Also, some of the models have no censorship, so if you struggle to create a specific art piece due to censorship - SD is your solution.
DALL-E 2 is somewhere in between. Not as hard as SD, not as good as MJ. It also has a TON of censorship - even quite innocent words describing characters, like "fit", can get the request blocked. Also, do not use it through Bing if you want to go commercial - for some unknown reason Bing does not allow that, but it is allowed if you use the platform directly.
Adobe's generative tools are quite meh, and I would not recommend them, except for two purposes. The first is Firefly's generative fill. It might allow you to place certain objects in your art. It fails far more often than it works, but it's there.
The second is a tool you might not know about, but it's CRUCIAL when working with AI. Have you ever gotten a perfect generation that is spoiled by an extra finger, a weird glitch on the eye, unnecessary clothing details, etc.? The Photoshop tool "spot healing brush" (or its various knockoffs in other programs) will allow you to easily delete any unwanted details and automatically generate something in their place. It is what will allow your AI-generated art to look perfectly normal - of course, with enough time spent on carefully fixing all the mistakes. Highly recommended for anyone who wants to produce quality output.
Thanks to all that, I was able to create a game with acceptable art, songs, and full voiceover on a minimal budget, most of which went to subscriptions to those AI services. Without them, I would have had no hope of producing something at this level of quality. However, there is a negative side as well - there were "activists" who bought my game with the intention of writing a negative review and refunding it afterwards, because of the use of AI that they consider "morally wrong". Still, considering that all other feedback has been positive so far, I think I have met my goal of creating something that will entertain people and make them laugh. Hopefully, my experience will help someone else add new layers of quality to their projects. I have every reason to believe that this will soon become a new industry standard.