Recently in Cognition/Linguistics Category

Pack and Play Brownbag on Social Network Analysis


ETS is happy to announce a "Pack and Play" brownbag session on Social Network Analysis in which Elizabeth Pyatt will introduce general concepts of social network analysis.

About Pack and Play

"Pack and Play" brownbags are a new brownbag series designed to explore different topics and facilitate creativity/problem solving. They are currently being administered by Kate Miffitt (

About Social Network Analysis

Social network analysis (SNA) is the study of social connections between individuals and how they contribute to the overall community structure. This session will introduce concepts of social network analysis such as centrality, outliers and brokers, and applications of SNA in fields such as sociology, politics, linguistics and epidemiology. The session will include a brainstorming discussion of how SNA can be incorporated into educational technology, particularly analytics.
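To make "centrality" a little more concrete, here is a minimal sketch of degree centrality, one of the simplest centrality measures: each person's score is the share of other people in the network they are directly connected to. The names and connections are made up for illustration, and the function name `degree_centrality` is just a label chosen here. Note that degree alone does not identify brokers; that usually calls for a different measure such as betweenness.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each person's share of possible direct connections (0.0 to 1.0)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        # Connections are treated as mutual (undirected)
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {person: 0.0 for person in neighbors}
    return {person: len(friends) / (n - 1) for person, friends in neighbors.items()}


# Hypothetical network: two tight groups linked only through "Dana"
edges = [
    ("Ann", "Bob"), ("Ann", "Cy"), ("Bob", "Cy"),
    ("Cy", "Dana"),
    ("Dana", "Eve"), ("Eve", "Fay"), ("Eve", "Gus"), ("Fay", "Gus"),
]

scores = degree_centrality(edges)
for person, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{person}: {score:.2f}")
```

In this toy network Dana has only two connections, so her degree score is unremarkable even though she is the only link between the two groups. That gap between "well connected" and "well placed" is the sort of distinction SNA tries to capture.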


Possible Improvement in Speech Recognition?


One of the challenges of video captioning is that it relies on human intervention to achieve the most accurate results. That's because speech recognition is only reliable in certain circumstances, usually when the speaker has set up a profile on a Dragon speech recognition engine (this could include instructors BTW).

To achieve the best transcription in other circumstances though (and human listeners require 96-98% accuracy), you usually need a person to do one of the following:

  1. Watch and transcribe a video
  2. Watch a video and correct speech recognition errors (e.g. "Rest in Peas" for "Rest in Peace")
  3. Have a videographer watch and repeat the words on the video through her or his trained speech recognition system

Note that all of the above assume that someone is spending time re-watching the video. Ugh!

Could an Easy Button be Coming?

What we are all waiting for is the captioning "Easy Button" that will allow us to upload any video file and presto - get back a reasonably accurate transcription regardless of the speaker.

The good news is that the Norwegian University of Science and Technology (NTNU) has been working on new speech recognition algorithms. Unlike previous systems, it appears that this one will include a little more old-fashioned phonetic and phonological information and won't be quite as reliant on statistical models.

It still might not be perfect. As with current systems, you will need high quality recordings so the right amount of phonetic information can be retrieved. I suspect that any speaker outside known linguistic parameters (e.g. a speaker with an undocumented accent) will still be able to throw off the system.

But I am glad that linguistics is being included in the solution.

Why "Accessify" is a Word


The accessibility community has been using the verb "accessify" in recent months, and a question that has come up is - "Is accessify a word?" My answer is yes, and most linguists would agree. Here's why:

It Sounds Like English

With very rare exceptions, a word can enter a language only if it follows the rules for permissible combinations of consonants and vowels. For instance "accessify" and "access" follow the rules, but something like "bcess" or "bccefmgi" would not.

It Uses an English Word Formation Rule

The suffix -ify is used to make new verbs out of nouns like accessibility or mystery (mystify), acid (acidify) and even class (classify). The suffix -ify isn't the only option. Another more common suffix is -(r)ize, but "accessorize" is too ambiguous to be useful.

It's on Google!

The previous two characteristics apply to possible words, but the question "Is accessify a word?" is also asking if it's used, preferably by someone who knows about accessibility. And the Google search results confirm that people are using "accessify" quite a bit, especially the people at accessify.com and Accessify Forum.

Still the Doubt...

And yet...this word is not accepted everywhere. For instance, it is not yet listed in the Oxford English Dictionary. The word "accessify" is still working its way through social channels.

I have faith in it though. If there's one thing accessibility can use it's a verb that explains the process of optimizing documents and tools for accessibility concisely and clearly.

Understanding Speech Recognition


One of the dreams of solving the captioning backlog is to rely on speech recognition. I do have to say that speech recognition is far more effective at times than I would have dreamed, but still my intuition has told me it's not entirely working. A fascinating article from Robert Fortner, "The Unrecognized Death of Speech Recognition", essentially backs up the intuition with some hard numbers. He notes that accuracy has not improved much since the early 2000s and that in most cases, the rate is not within human tolerance (humans apparently have about a 2% error rate and even that can lead to some pretty ridiculous arguments).
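For readers wondering how an "error rate" like this can be calculated, here is a minimal sketch of word error rate (WER), a common way to score a transcript against a correct reference: count the fewest word substitutions, insertions and deletions needed to turn one into the other, then divide by the length of the reference. This is an illustration only, not necessarily the exact measure behind the numbers in Fortner's article, and the function name `word_error_rate` is just a label chosen here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] = edits needed to turn the reference words seen so far
    # into the first j words of the hypothesis
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # reference word was dropped
                current[j - 1] + 1,                   # extra word was inserted
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# One wrong word out of three
print(word_error_rate("rest in peace", "rest in peas"))
```

By this measure a 2% error rate works out to roughly one wrong word in every fifty, which gives a sense of how little slack a captioning system has.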

When Speech Recognition Works

Speech recognition can be effective in two situations:

  1. Specific context (airport kiosk, limited menu commands) - even here though it should be noted that it's pretty darn easy to frustrate the average health insurance voice recognition system so that they give up.
  2. Specific speaker - Speech recognition is effective when trained on a single voice, and the training time is shorter than it used to be. For captioning purposes, this means that if a single speaker makes the original audio (e.g. faculty lecture) or someone else repeats what's on the audio (the captioner), speech recognition is pretty effective.

By the way, in the recent Second Accessibility Summit, Glenda Sims noted that correcting an inaccurate transcript is more difficult than starting from scratch.

What Speech Recognition Is

To understand why speech recognition isn't improving, you should consider the task it's trying to perform. When human ears listen to language, they hear a stream of separate words and sounds and group those into words and sentences. The reality is that speech is a continuous sound wave with very subtle acoustic transitions between different sounds (see images below; the bottom ones are the spectrograms that phoneticians use). Your ears and brain are doing a lot of processing to help you understand what that person just said.

Two Wave Forms for Two words

Your brain not only breaks up sound waves, it also accounts for the acoustics of different genders and different regional accents, filters out different types of background noise and probably includes some "smart guessing" on what a word is as well (which doesn't always work). It's no wonder that replicating the functionality of the mechanism is taking time.

Ignoring the Linguists

There's one factor that Robert Fortner points to - speech specialists are not always involved. As one IBM researcher claimed, "Every time I fire a linguist my system improves"...but apparently there is an upper limit to this without more information. Maybe it's time to start rethinking the problem and whether the programming team might need some outside experts.

Book Review: The Wisdom of Crowds (via Clickers?)


In the spirit of continuing to clean my desk, my next book to review is The Wisdom of Crowds by James Surowiecki. I think a lot of people at ETS are familiar with the book, but I think it's worth explaining exactly how the wisdom is generated.

The term "Wisdom of Crowds" seems to suggest a scenario where people make decisions as committees, but that's not what it really is. Rather the "wisdom" comes from being able to tap into the results of multiple individual decisions rather than relying on a single committee or expert.

A classic example is a contest to guess the weight of an ox. Individually, the guesses varied widely, but the average of the guesses was within one pound of the actual weight. It wasn't the case that the group decided the weight of the ox, but rather that the individual guesses added up to the correct answer.
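The arithmetic behind this is easy to try out. Below is a small sketch with made-up numbers (the guesses and the "true" weight are hypothetical, not the figures from the book) showing how the average of the guesses can land much closer to the truth than the typical individual guess does.

```python
from statistics import mean

# Hypothetical contest: both the true weight and the guesses are invented
true_weight = 1200
guesses = [950, 1050, 1100, 1150, 1180, 1220, 1260, 1300, 1350, 1450]

# The "crowd" answer is simply the average of the independent guesses
crowd_estimate = mean(guesses)
crowd_error = abs(crowd_estimate - true_weight)

# How far off the individual guesses are, on average
typical_individual_error = mean(abs(guess - true_weight) for guess in guesses)

print(f"Crowd estimate: {crowd_estimate} (off by {crowd_error})")
print(f"Typical individual guess is off by {typical_individual_error}")
```

The cancellation only works here because the invented guesses miss in both directions. If everyone leaned the same way, say because they heard each other's answers first, the average would inherit that lean.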

I admit that I've always been a little skeptical of "collaboration" because I often equate it with groupthink, but this kind of collective wisdom still values individual diversity. In fact, Surowiecki argues that you get the best results specifically when you can factor in individual input.

There are a lot of interesting applications of this concept in the book, but I think one of the most important is ensuring that you really ARE getting a diversity of opinion. One reason that anonymous voting is so important is that it does ensure you are getting an accurate opinion from individuals and not votes partially based on social pressure.

Another situation this applies to is getting feedback from your students. I think a lot of us have experienced the eerie silence that follows the instructor's request for an answer, not to mention the awkward nods of agreement with slightly puzzled faces. Are the students agreeing with you or just trying to mirror your opinion?

One reason I like the concept of clickers is that it does enable the kind of high volume individual input needed to assess your students' actual thinking. We talk about how it can assess misconceptions (true), but sometimes it can access a wisdom you didn't know was there.

Earlier this week, I was talking about gender stereotypes in language and asking students if they could identify some stereotypes. In more than one case though, I saw some puzzled looks. I began to realize that some of my research may be getting out of date, at least in their circles.

I'm also reminded of my personal guideline of multiple tabloid sources. If one tabloid claims a movie star is an alien spy, it's probably a lie. But if two or more tabloids independently have the same story, it is probably true.

New Media Seminar Week Final!: Comics and Design


We ended this seminar with a reading from Scott McCloud's Understanding Comics which was presented in...comic format.

Frames and Time

The topic was "Frames" or how we interpret the passing of time based on the sequence of panels. Perhaps the most interesting observation is that even though most panels are still images, very few are actually single moments in time. McCloud points out that if there are 2 or more dialogue balloons in a panel, we have to infer a time/sequence in which the characters would convey the dialogue. Other panels may also feature motion lines or other conventions to convey the passage of time in a single image. In other words, comics have to compress reality a bit in the images in order to push the narrative forward.

Design Question: How do we learn this?

I actually read the whole thing many years ago, and I recall thinking "Duh". It's not that McCloud is not accurate, but that these conventions are so well designed that comic readers tend to pick them up unconsciously just by reading them. In other words, there are no comic book literacy lessons that readers have to learn beforehand. Most of us understand that THWACK! is a sound effect, that dialogue takes time, and the difference between omniscient narration in boxes at the edges of the panels and character dialogue balloons.

Not even Twitter and Facebook are this easy.

How did this happen? Partly because comics do adapt from other conventions like text. For instance both Western comics (images & dialogue) and Western text are read left-to-right, top-to-bottom. In Japan though, manga comics might be published so that images and text are scanned right to left (they are reversed when they get translated to English).

What I think is more interesting are the new conventions that were introduced with minimal fuss. Illustrators drew in some lines to simulate motion, and readers generally got it. We also figured out dialogue balloons and that the line pointing to a character meant that the character is the speaker. More interestingly, these conventions have been translated across cultures into places like Japan, China and Brazil.

Are comic book artists tapping into hard-wired visual processing algorithms? Or is it just that they understand our cultural visual vocabulary so well? I think you can debate either side, but we can learn a lesson in adaptation here. Comic book illustrators, for the most part, have been able to develop a visual vocabulary that is easily learned. I'm sure there are lots of lessons here if we could expand on this study from creating better diagrams to understanding how to make new interfaces.

The S word - Semiotics

Another interesting point for me is that McCloud is delving into a lot of semiotic theory...without ever once using the word semiotics. He really does an excellent job of explaining the mechanics of delivering the narrative in comic form without ever getting too technical (it wasn't just the images - it was the combination of images and pithy explanatory text that worked). One of the target audiences may be instructors, but it really does work for a comic book reader wanting to learn more about the craft.

This is a great example of making an esoteric topic accessible to general audiences. And that's a skill we all wished were a little more common this semester.