MacSpeech Dictate for Speech-to-text Video Captioning
I was going to write a blog post about MacSpeech Dictate, a piece of speech-to-text software that I got today, but I thought "Why type it when I can just say it?" So, that's what I'm doing right now, and you get to see how well MacSpeech Dictate works after a short training session. I was really surprised at how well it interpreted almost every word that I said.
Last year we did a project for Philosophy 12 that required us to caption many videos. We used Dragon NaturallySpeaking to make a first pass at the text of the videos. Trying to capture the text directly from the videos will not yield very good results, since the speaker in the video has not done any training in MacSpeech Dictate. The trick, however, is to wear a headset microphone so that you can hear what is being said in the video and repeat it into the microphone. Since the speech-to-text engine recognizes your voice, you get much better results. It is still pretty tedious, however, as you need to keep replaying the video to make sure you got the words right. But once you get through the whole video, you have a pretty good text transcription of it.
Now, to make captions, I can just import the text file created by MacSpeech Dictate into my Parity software. In the Parity preferences I set how many characters per line I want, and when I import the text, it tries to break the text up into captions of that many characters, and also tries to break at commas or periods. Then I just need to load my movie, click the Set Timecode button once, and hit the Return key as each phrase is spoken in the video. Once I get to the end of the video, I'm done. I can export in any of 14 different formats, including SRT, which can be uploaded to YouTube, where it is used as captions.
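For the curious, here is a rough sketch in Python of what that import-and-export step might look like. This is not Parity's actual code; the function names (split_captions, srt_time, to_srt), the greedy word-packing rule, and the "break at a comma or period once the caption is at least half full" threshold are all my own assumptions for illustration. The start times stand in for the moments you would hit the Return key.

```python
def split_captions(text, max_chars=40):
    """Pack words into captions of at most max_chars characters.

    Assumed rule (not Parity's): end a caption early at a comma or
    period once it is at least half of max_chars long. A single word
    longer than max_chars is kept whole on its own caption.
    """
    captions, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            # Adding this word would overflow, so start a new caption.
            captions.append(current)
            current = word
        else:
            current = candidate
        if current.endswith((",", ".")) and len(current) >= max_chars // 2:
            captions.append(current)
            current = ""
    if current:
        captions.append(current)
    return captions


def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(captions, start_times, end_time):
    """Build SRT text. Caption i starts at start_times[i] and ends when
    the next caption starts; the last one ends at end_time."""
    bounds = list(start_times) + [end_time]
    blocks = []
    for i, caption in enumerate(captions):
        timing = f"{srt_time(bounds[i])} --> {srt_time(bounds[i + 1])}"
        blocks.append(f"{i + 1}\n{timing}\n{caption}\n")
    return "\n".join(blocks)
```

To try it, pass the dictated transcript to split_captions, collect one start time per caption, and write the result of to_srt to a file with an .srt extension.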
As I was using this process to caption Cole's latest YouTube video, "A Response to a Call for Amazing Stories of Openness from PSU", I got to thinking "What if I got Cole to record himself reading the training script for MacSpeech Dictate?" Then I could play back his recording of the voice training script into MacSpeech and create a new speech profile for Cole's videos. I'm thinking I could possibly get a more accurate direct transcription from the videos this way, and I wouldn't have to go through the process of repeating everything he says. I could just let the video run and let MacSpeech transcribe it for me. It probably won't be as good as live training, but I think it might be worth a shot.
So, with very little editing (I'd say it got 95% correct), I've written my latest blog post without typing at all. What do you think?

Wow! Great stuff, Pat. Blog and Twitter entries saved as text from speaking into an iPhone can't be far off...
Yes, that would be cool. I'm thinking the speech-to-text engine could be put on a server, and you could send recordings to a folder in your personal space. That could perhaps trigger the speech-to-text engine to analyze the recording and provide you with a text file, much like YouTube does with videos. To take part in the service, you would have to submit a recording of the training script so the engine could analyze your voice and create a profile for your userid. That would be pretty slick.
I love this idea, Pat. I can imagine setting up a global profile that stores my voice training script and can be applied automatically to anything I produce. I'll record and send you the script you sent me so we can try this thing out. Great idea!
Pat, this is neat. Do you have information on organizations that are using it and how they are using it?
When I was at NMC this year, Morgan Reed from the University of British Columbia did a demo at the 5 Minutes of Fame on transcribing digital recordings. If you go to http://www.nmc.org/2009-summer-conference/videos and select the NMC Five Minutes of Fame video from the list, you can skip to about 46:20 to see his demo using MacSpeech. He is using headphones and a mic to repeat what he hears.