The journey to episode one

I am writing this as my final version of episode one exports from my video editor, and I am about to start uploading it to platforms. The first episode is now more of a short introduction to the series and the concepts I hope to cover. Future episodes are more likely to be deeper dives on a single topic.

I have learnt quite a lot to get this far. I’d say most of my time has been spent trying to find the right ‘text to speech’ (TTS) system. I am not a big fan of the rolling monthly subscription model, which is what most of the online systems offer. Ideally I would like to run my own offline TTS system, but after trying quite a few I have not yet found one that I like. I may switch to one in the future if I do get one up and running locally.

Recently, once I had finally finished the text of episode one, I searched again and discovered speechgen.io, which offers per-word billing with no time restrictions. You just top up your account when you run out of credits, which works much better for my needs as my usage will not be regular yet.

The voice I chose is OK. There are certainly a few times you notice that it’s not real, especially if you listen carefully. The one thing I have done that makes it sound much more human is to add in breathing pauses. The original output, like that of most TTS systems I have tried, has constant spacing between sentences. In real life, if you say a long rambling sentence you will take a longer breath than if you are listing short phrases.

So putting in a few dramatic pauses makes it sound more natural. I feel it also gives the listener more time to process the words spoken. I guess the GPT systems will crack this soon, if they have not already, but till then I highly recommend adding a few bits of silence after things you want people to take note of. It seems to make them land better.
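As a rough illustration of the idea that longer sentences deserve longer breaths, here is a minimal Python sketch. It is not how episode one was made; the function names and the timing values (`base_ms`, `per_word_ms`, `max_ms`) are assumptions picked for the example and would need tuning by ear.

```python
import re

def pause_after(sentence, base_ms=300, per_word_ms=40, max_ms=1200):
    """Return a pause length in milliseconds that grows with sentence length.

    All three timing values are illustrative assumptions, not measured figures.
    """
    words = len(sentence.split())
    return min(base_ms + per_word_ms * words, max_ms)

def plan_pauses(text):
    """Split text into sentences and pair each one with a suggested pause."""
    # Naive split on ., ! or ? followed by whitespace; good enough for a draft.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, pause_after(s)) for s in sentences]

if __name__ == "__main__":
    script = "Short one. This is a much longer rambling sentence that goes on and on."
    for sentence, pause_ms in plan_pauses(script):
        print(f"{pause_ms:>5} ms after: {sentence}")
```

The resulting list can be used as a guide when dropping silence into the audio track by hand in an editor.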

I really got into the TTS way of writing, though. There are obvious problems that really jump out when you hear someone read back what you have written, from those annoying spell check errors that you fail to pick up to the overly long sentences that just sound a bit convoluted when spoken.

During the writing process this TTS was done on my phone, where I have a very basic free app that reads out my drafts, so I can get the text right before paying for the final version. It’s also a good way of gauging how long the final version will run for.
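Listening to the draft is the method described above. As a complementary rough check, run time can also be estimated from word count. The sketch below assumes a speaking rate of 150 words per minute, which is a guess and not a figure from any particular TTS voice.

```python
def estimate_runtime_seconds(text, words_per_minute=150):
    """Estimate spoken run time from word count.

    words_per_minute is an assumed speaking rate; adjust it to match the voice.
    """
    words = len(text.split())
    return words * 60 / words_per_minute

if __name__ == "__main__":
    draft = "word " * 300  # stand-in for a 300-word script
    seconds = estimate_runtime_seconds(draft)
    print(f"About {seconds / 60:.1f} minutes")
```

Any added pauses sit on top of this estimate, so the real run time will be somewhat longer.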

While doing this I made a little looping background from footage I took, so no AI there. I put an overlay over that with a much higher loop rate, which hopefully makes it less monotonous. In a similar vein, under the audio is a barely audible sound from a video I shot of a stormy beach with crashing waves. I slowed it down by a factor of 4 and turned it down so you can barely hear it. Again, I’m hoping this tricks a few people’s brains into not realising that it’s an artificially generated voice.

I also discovered PeerTube, and I will be publishing there first, then filtering it out to other platforms from there.

https://spectra.video/a/eewbf/video-channels

I also intend to host audio for the podcast version here on this site so you can subscribe via RSS, though I will get round to adding the audio and video to any platform that will have it.
