PATRICK JOSEPH BESONG: July 2009 Archives


Now through the magic of Photoshop, the secret is finally revealed...

I was going to write a blog post about MacSpeech Dictate, a piece of speech-to-text software that I got today, but I thought "Why type it when I can just say it?" So, that's what I'm doing right now, and you get to see how well MacSpeech Dictate works after a short training session. I was really surprised at how well it interpreted almost every word that I said.
We did a project last year for Philosophy 12 that required us to caption many videos. We used Dragon NaturallySpeaking to make a first pass at the text of the videos. Trying to capture the text directly from the videos will not yield very good results, since the speaker has not done any training in MacSpeech Dictate. The trick, however, is to wear a headset microphone so you can hear what is being said in the video, and repeat it into the microphone. Since your voice is the one the speech-to-text engine has been trained to recognize, you get much better results. It is still pretty tedious, however, as you need to keep replaying the video to make sure you got the words right. But once you get through the whole video, you have a pretty good text transcription of it.
Now, to make captions, I can just import the text file created by MacSpeech Dictate into my Parity software. In the Parity preferences I set how many characters per line I want, and when I import the text, it will try to break the text up into that many characters per caption, and also try to break at commas or periods. Then I just need to load my movie, click the Set Timecode button once, and hit the Return key as each phrase is spoken in the video. Once I get to the end of the video, I'm done. I can export to one of 14 different formats, including SRT, which can be uploaded to YouTube and used as the captions.
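
For anyone curious what that import and export step might look like in code, here is a rough Python sketch. This is not Parity's actual code, just my guess at the general idea: pack words into captions up to a character limit, break early at a comma or period when the caption is already at least half full, and then write the captions out as SRT. The names `split_captions`, `srt_time`, and `to_srt` are made up for this example, the "half full" rule is an assumption, and `start_times` stands in for the moments you would hit the Return key.

```python
def split_captions(text, max_chars=32):
    """Greedily pack words into captions of at most max_chars characters.

    A caption is closed early when it ends in a comma or period and is
    already at least half full (an assumed heuristic, not Parity's rule).
    A single word longer than max_chars still gets its own caption.
    """
    captions = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > max_chars:
            # Adding this word would overflow, so start a new caption.
            captions.append(current)
            current = word
        else:
            current = candidate
        # Prefer to break at punctuation once the caption is reasonably full.
        if current.endswith((",", ".")) and len(current) >= max_chars // 2:
            captions.append(current)
            current = ""
    if current:
        captions.append(current)
    return captions


def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(captions, start_times, end_time):
    """Build SRT text from captions and the time each one starts.

    start_times holds one start time (in seconds) per caption, standing in
    for the Return key presses. Each caption ends when the next one starts,
    and the last caption ends at end_time.
    """
    bounds = list(start_times) + [end_time]
    blocks = []
    for i, caption in enumerate(captions):
        start, end = srt_time(bounds[i]), srt_time(bounds[i + 1])
        blocks.append(f"{i + 1}\n{start} --> {end}\n{caption}\n")
    return "\n".join(blocks)
```

You would call `split_captions(transcript, max_chars=32)` on the dictated text, collect one start time per caption while the video plays, and pass both to `to_srt` to get text you can save as a `.srt` file.
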
As I was using this process to caption Cole's latest YouTube video, "A Response to a Call for Amazing Stories of Openness from PSU", I got to thinking "What if I got Cole to record himself reading the training script for MacSpeech Dictate?" Then I could play his recording of the voice training script back into MacSpeech and create a new speech profile for Cole's videos. I'm thinking I could possibly get a more accurate direct transcription from the videos this way, and I wouldn't have to go through the process of repeating everything he says. I could just let the video run and let MacSpeech transcribe it for me. It probably won't be as good as live training, but I think it might be worth a shot.
So, with very little editing (I'd say it got 95% correct), I've written my latest blog post without typing it at all. What do you think?
