
Understanding Speech Recognition


One of the dreams for solving the captioning backlog is to rely on speech recognition. I do have to say that speech recognition is far more effective at times than I would have dreamed, but my intuition has still told me it's not entirely working. A fascinating article by Robert Fortner, "The Unrecognized Death of Speech Recognition," essentially backs up that intuition with some hard numbers. He notes that accuracy has not improved much since the early 2000s and that in most cases the error rate is not within human tolerance (humans apparently have about a 2% error rate, and even that can lead to some pretty ridiculous arguments).
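For anyone wondering what an "error rate" like that actually counts: speech recognition accuracy is commonly reported as word error rate (WER), the number of word substitutions, deletions and insertions needed to turn the recognizer's output into a correct reference transcript, divided by the number of words in the reference. I am assuming that is the kind of figure meant here; Fortner's article may define it differently. At 2%, that works out to roughly 2 wrong words in every 100. Below is a minimal Python sketch using a standard word-level edit distance; the function name and the sample sentences are my own illustrations, not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "it is easy to recognize speech"
hypothesis = "it is easy to wreck a nice beach"
# 2 substitutions + 2 insertions over 6 reference words
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")  # prints "WER: 67%"
```

Note that WER can exceed 100% when a recognizer inserts lots of extra words, which is one reason a single percentage can hide how unusable a transcript really is.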

When Speech Recognition Works

Speech recognition can be effective in two situations:

  1. Specific context (airport kiosk, limited menu commands) - even here, though, it's pretty darn easy to frustrate the average health insurance voice recognition system to the point that it gives up.
  2. Specific speaker - Speech recognition is effective when trained on a single voice, and the training time is shorter than it used to be. For captioning purposes, this means that if a single speaker makes the original audio (e.g. a faculty lecture) or someone else (the captioner) repeats what's on the audio, speech recognition is pretty effective.

By the way, in the recent Second Accessibility Summit, Glenda Sims noted that correcting an inaccurate transcript is more difficult than starting from scratch.

What Speech Recognition Is

To understand why speech recognition isn't improving, consider the task it's trying to perform. When we listen to language, we seem to hear a stream of separate words and sounds, which we group into sentences. The reality is that speech is a continuous sound wave with very subtle acoustic transitions between different sounds (see the images below; the bottom ones are the spectrograms that phoneticians use). Your ears and brain are doing a lot of processing to help you understand what that person just said.

Two Waveforms for Two Words

Your brain not only breaks up sound waves; it also accounts for the acoustics of different genders and regional accents, filters out different types of background noise, and probably does some "smart guessing" about what a word is (which doesn't always work). It's no wonder that replicating the functionality of this mechanism is taking time.

Ignoring the Linguists

There's one factor that Robert Fortner points to - speech specialists are not always involved. As one IBM researcher claimed, "Every time I fire a linguist, my system improves"... but apparently there is an upper limit to how far that goes without more information. Maybe it's time to start rethinking the problem and asking whether the programming team needs some outside experts.

Embroidery Wins at NMC!


If you haven't seen my Twitter or Facebook post...I am very excited to announce that one of my embroidery designs won an award at the 1st NMC Science Art show. I will also take this opportunity to show off my pictures:

I always find it amazing that women who "don't do math" can do this (and knitting, quilting...).

Pie Chart Accessibility


Like pie charts but worry about screen reader accessibility? There is a simple workaround: provide both the chart and the numbers in text form. WebAIM's latest screen reader survey results provide a great example of chart and graph accessibility.

While a pie chart (embedded in an image) is provided for each result, a table with the data is shown immediately below it. The image's alt attribute simply identifies the chart being shown (e.g. alt="Chart showing mobile screen readers used"), and the actual numbers can be read in the table beneath the image. Not only does this method avoid a lengthy image description, it also provides the numeric data to low-vision users and simplifies the pie chart itself (no tiny numbers needed in the image).
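If you generate report pages from data, the same pattern is easy to automate. Here is a minimal Python sketch: the image gets a short alt text that only names the chart, and the numbers go into a real data table right below it, with row and column headers so a screen reader can announce each label with its value. The function name, the image file name and the categories and percentages in the usage example are placeholders I made up for illustration, not WebAIM's actual markup or survey results.

```python
from html import escape


def chart_with_data_table(image_src, alt_text, caption, rows):
    """Return HTML for a chart image followed by a table holding the same numbers.

    rows is a list of (label, percent) pairs. The alt text only names the chart;
    the table carries the actual data for screen reader and low-vision users.
    """
    parts = [
        f'<img src="{escape(image_src)}" alt="{escape(alt_text)}">',
        "<table>",
        f"  <caption>{escape(caption)}</caption>",
        '  <tr><th scope="col">Response</th><th scope="col">Percent</th></tr>',
    ]
    for label, percent in rows:
        # A row header lets a screen reader read the label along with its value
        parts.append(f'  <tr><th scope="row">{escape(label)}</th><td>{percent}%</td></tr>')
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical example data; not real survey results
html_snippet = chart_with_data_table(
    "mobile-screen-readers.png",
    "Chart showing mobile screen readers used",
    "Mobile screen readers used (placeholder data)",
    [("Reader A", 50), ("Reader B", 30), ("Other", 20)],
)
print(html_snippet)
```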

Another good lesson from these pie charts is that they are usable by people with color-deficient vision because they rely on differences in lightness and darkness. They are still colorful, but each slice color is distinguished by lightness/darkness, not by color (i.e. hue).
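One rough way to check your own palette for this property is to compare the lightness of every pair of slice colors. The sketch below uses the relative luminance and contrast ratio formulas from WCAG 2.x as a stand-in for "lightness"; that choice, the function names and the sample blue palette are my own assumptions, not how WebAIM built its charts. How different is different enough is a judgment call: WCAG 2.1's non-text contrast criterion (1.4.11) asks for 3:1 against adjacent colors in graphics, which is a demanding bar for a pie with many slices.

```python
from itertools import combinations


def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color written like '#2171b5'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve so each channel is in linear light
    r, g, b = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Ratio from 1:1 (same lightness) up to 21:1 (black vs. white)."""
    lighter, darker = sorted((relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def weakest_pair(palette):
    """Return (ratio, color_a, color_b) for the two colors closest in lightness."""
    return min((contrast_ratio(a, b), a, b) for a, b in combinations(palette, 2))


# Hypothetical pie-chart palette: light-to-dark steps of a single blue
palette = ["#08306b", "#2171b5", "#6baed6", "#c6dbef"]
ratio, a, b = weakest_pair(palette)
print(f"Closest pair: {a} and {b} at {ratio:.2f}:1")
```

Comparing every pair (not just neighbors) is deliberately conservative, since slices that look alike can end up next to each other once the data changes.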

Some Key Accessibility and Captioning Resources


I was going to make a list, but I realized I could point you to some pre-existing lists I have already created at

If you are especially concerned about captions, you may also want to see

MidAtlantic Educause Report


Last week, I got a chance to attend part of the Mid Atlantic Educause conference, which happens every year in January. This year it was in Baltimore, close enough that Penn State sent attendees from several campuses. Between lunch and the breaks, I had a good chance to compare notes with nearby institutions.

I went to several sessions, but the ones I got the most out of were a demo of augmented reality (AR) and the policy overview session.

Policy Overview

One of the functions of the Educause consortium is to be a voice for the educational community in Washington, so it has an office in DC to monitor legislative and regulatory issues. The session was led by Senior Policy Director Steven Worona, and he covered issues that a lot of us are concerned about, including copyright, student privacy, accessibility and security. I was also able to sit with him at a Birds of a Feather lunch, and he gave some very helpful insight into several issues. If you are attending Educause, this is a session I would definitely recommend.

Augmented Reality

Augmented reality is a buzzword I have been hearing, but not one I had seen in practice yet. There are lots of variations, including iPad tours of museums (similar to the enhanced audio tours of the 80s) and 3D pop-up books that use special 3D glasses. The speaker, Jonathan Cabiria, predicted that we may all be walking the streets with special glasses to take advantage of enhanced information from different vendors and locations (I think I saw this in a movie).

Some of the more interesting demos are programs that connect paper objects with your webcam, allowing the computer to "think" that your piece of paper does something. Olympus has a demo of this where you print a paper camera, hold it up in front of your webcam and click the "buttons" on it. I tried this in my cubicle, but I do have to report that I couldn't get the webcam to recognize my paper camera. However, it's still strange to see a live picture of yourself holding a paper camera on the Web.

I do think AR and 3D are about to take off, and here's my anecdote to explain why. Over the break I went to see Tron in 3D so I could experience the cool graphics. Ever since Avatar came out, I've gotten used to going to the theater, getting the 3D glasses, then handing them back at the end of the show. This time, though, a notice asked us not to steal the glasses for use with home theaters. Apparently some people are finding enough 3D experiences at home that they are stealing the glasses.

What was confined to an engineering lab or a handful of movies 10 years ago is becoming commonplace.

New Media Seminar Week Final!: Comics and Design


We ended this seminar with a reading from Scott McCloud's Understanding Comics which was presented in...comic format.

Frames and Time

The topic was "Frames," or how we interpret the passing of time based on the sequence of panels. Perhaps the most interesting observation is that even though most panels are still images, very few are actually single moments in time. McCloud points out that if there are two or more dialogue balloons in a panel, we have to infer the order and timing in which the characters deliver the dialogue. Other panels may feature motion lines or other conventions to convey the passage of time in a single image. In other words, comics have to compress reality a bit in each image in order to push the narrative forward.

Design Question: How do we learn this?

I actually read the whole thing many years ago, and I recall thinking "Duh." It's not that McCloud is inaccurate, but that these conventions are so well designed that comic readers tend to pick them up unconsciously just by reading. In other words, there are no comic book literacy lessons that readers have to take beforehand. Most of us understand that THWACK! is a sound effect, that dialogue takes time, and how omniscient narration in boxes at the edges of the panels differs from character dialogue balloons.

Not even Twitter and Facebook are this easy.

How did this happen? Partly because comics adapt conventions from other media like text. For instance, both Western comics (images and dialogue) and Western text are read left-to-right, top-to-bottom. In Japan, though, manga may be published so that images and text are scanned right to left (the pages are sometimes flipped when they are translated into English).

What I think is more interesting are the new conventions that were introduced with minimal fuss. Illustrators drew in some lines to simulate motion, and readers generally got it. We also figured out dialogue balloons, and that the line pointing to a character means that character is the speaker. Better still, these conventions have been carried across cultures into places like Japan, China and Brazil.

Are comic book artists tapping into hard-wired visual processing algorithms? Or do they just understand our cultural visual vocabulary very well? I think you can debate either side, but we can learn a lesson in adaptation here. Comic book illustrators, for the most part, have been able to develop a visual vocabulary that is easily learned. I'm sure there are lots of lessons here, from creating better diagrams to designing new interfaces, if we expanded on this study.

The S word - Semiotics

Another interesting point for me is that McCloud is delving into a lot of semiotic theory...without ever once using the word semiotics. He really does an excellent job of explaining the mechanics of delivering the narrative in comic form without ever getting too technical (it wasn't just the images - it was the combination of images and pithy explanatory text that worked). One of the target audiences may be instructors, but it really does work for a comic book reader wanting to learn more about the craft.

This is a great example of making an esoteric topic accessible to general audiences. And that's a skill we all wished were a little more common this semester.

Tailgate: Considering Un-Narrated Media


It's Monday after an excellent Media Commons Tailgate, and it's time to contemplate any insights I've had.

Where we are in the Story

My first thought is that the Media Commons staff and faculty have come a long way in understanding how to build media assignments. There are lots of great examples of video projects in the classroom, along with good resources for free media and thoughtful ways to use and expand learning spaces.

Another high note was the keynote from Chris Long, which did justice to the notion of how a new medium evolves. I particularly liked Plato's term pharmakon (φάρμακον) because it has a host of rich meanings implying danger as well as opportunity. Our job is to present the opportunities, but it's always good to watch for the dangers.

Narrated vs. Un-Narrated Media

So on that note...I am wondering whether, in focusing so much on developing narration, we are forgetting the possibilities of media without narration. It is good educational practice to expose students to the process of creating a video short like a PSA or a documentary, because I think many of them will be asked to transmit ideas in various media. But the truth is that this type of assignment doesn't match every learning objective.

There are times when the objective is to collect and analyze loosely related artifacts and maybe (or maybe not) construct an analytic narrative around them. Consider the anatomical movement analysis project done by Renee Borromeo's kinesiology class. This assignment, unlike others, requires only that students shoot a video of a movement in a certain way.

I think there are immediate applications in disciplines not represented at the Tailgate, such as the sciences, but there are just as many applications even in the social sciences and history. Consider a historical problem such as the identity of Jack the Ripper. An exercise like this requires a student to study contemporary news articles, police reports, photos and forensics in light of what was known then and what we can interpolate now. Based on this, a student may be able to build a partial narrative, but it's unlikely one will be fully developed unless it truly becomes a piece of fiction.

This weekend, I was reminded that transmedia is a perfect tool for this kind of analytic assignment. Instead of constructing a narrative beforehand, we can use it to compile our artifacts and represent them in ways that could help a student understand the context. The timeline approach to the collapse of Arthur Andersen is a perfect example of that.

I think the use of video and transmedia to construct new ways of telling a story is exciting, but I am even more intrigued by the possibility that it can help students really understand what "narrative" means.

New Media Seminar Week 7: What Did I Learn This Week?


We had an interesting discussion this week in the Penn State New Media Seminar about our satisfaction with the program. My own take is that even though I have questions about some aspects, the experience has had enough positives to be worthwhile.

This week's reading was "Will There Be Condominiums in Data Space?" by video artist Bill Viola (whose material is on YouTube). As usual, it sparked a lot of interesting discussion, but the reading itself left me somewhat detached.

I'd love to summarize the reading, but I truly found it hard to follow. At one point he discusses the contrast between communicating through modern Japanese technology and communicating with the dead in an ancient Japanese ritual. Later he discusses memory, space and time, comparing Western realism (capturing a moment exactly) with "Oriental" (East Asian) art, which captures an essence that remains timeless. At some point "notation systems" entered the picture, and then non-linear presentation.

Maybe the most interesting quote from Viola was this:

When I edited a tape with the computer, for the first time in my life I saw that my video piece had a "score", a pattern, a structure that could be written on paper.

As my colleague Dave Stong pointed out, this apparently "blew his mind." And I do understand Viola's point that technology's ability to capture moments in time as they happen allows us to understand our actions better. When I look at my blog entries from several years ago, I do notice how some patterns have changed and others have remained the same. Sometimes I revisit things that I had forgotten.

This kind of record has been available to anyone able to document events as they happened (e.g. diarists, annual chronicles and surveys), but it must have been new to Viola, and perhaps to many of us as well. This point alone is worth investigating.

So why am I so detached from this article and so many others? I'm sure it's mostly due to my desire for a more analytic, scientific paper. It's been suggested that I should be more patient and give the paper more time and effort. Perhaps.

But a fascinating thing I noticed is that we all pulled different ideas from the paper based on our past experiences and perceptions. Some felt Viola was heading towards the type of interactivity found in video games, and others felt he was talking about perception of time. A few thought some of us were in outer space (and they had a point because I knew I was wildly extrapolating).

If this paper were "art," this would be a very good reaction. But should the theory of New Media be "art," or should it be more like traditional science? What I mean is that in the sciences, there is a genuine attempt to build a shared understanding through common methodologies. There are plenty of debates about both results and methodologies, but the results seem much more accessible to me.

But I don't think any of the readings I have completed have been that. Viola's paper seems to be a stream of consciousness with very academic terminology. Yes, Viola touches on some interesting ideas, but his anecdotes seem very random and are not systematically compared.

Yes, the Japanese have rites to communicate with the dead, but so does our culture. Why assume otherwise? The answer is that people tend to have a blind spot for their own culture, which can be dangerous. And why bring it up? Is it to mention the contrast between cultures? Between communication media? OK, but what's the next step? If the issue is managing new tech, wouldn't it make sense to look at what happened before and how it was resolved?

I think that's where a lot of frustration is coming from with the readings, at least with me. There seems to be a lot of "vision", but almost a vision without a context. We can tell from the papers that they didn't always predict the future, but I think I knew that as well.

One challenge that has come up is to suggest what we would prefer. I expect not everyone will agree, but here is some of my list.

None of this means I will be dropping out of the experience, but I will likely remain skeptical.

New Media Seminar Week 6: "The Medium is the Message"


This semester I have been participating in a seminar on new media, and we did an overview of the meme "The medium is the message" originated by Marshall McLuhan. I should admit up front that I only had time to read the Wikipedia Cliff notes version, but it touches on issues that come up in linguistics and related subjects.

What is "Medium" and "Message"?

One issue is what McLuhan means by "medium" and by "message." Even those who read the article weren't too clear on it. Mark Federman from the McLuhan Program in Culture and Technology argues that the general interpretation, that the communication channel is more important than the message, is not correct.

Instead, Federman argues that McLuhan meant the following:

Marshall McLuhan was concerned with the observation that we tend to focus on the obvious. In doing so, we largely miss the structural changes in our affairs that are introduced subtly, or over long periods of time. Whenever we create a new innovation - be it an invention or a new idea - many of its properties are fairly obvious to us. We generally know what it will nominally do, or at least what it is intended to do, and what it might replace. We often know what its advantages and disadvantages might be. But it is also often the case that, after a long period of time and experience with the new innovation, we look backward and realize that there were some effects of which we were entirely unaware at the outset. We sometimes call these effects "unintended consequences," although "unanticipated consequences" might be a more accurate description.

Many of the unanticipated consequences stem from the fact that there are conditions in our society and culture that we just don't take into consideration in our planning. These range from cultural or religious issues and historical precedents, through interplay with existing conditions, to the secondary or tertiary effects in a cascade of interactions. All of these dynamic processes that are entirely non-obvious comprise our ground or context. They all work silently to influence the way in which we interact with one another, and with our society at large. In a word (or four), ground comprises everything we don't notice.

I can't argue with that, but how "profound" is this? This is something almost any historian or anthropologist could tell you. McLuhan was admittedly writing in the mid-60s, but I suspect that "unanticipated consequences" was already a familiar idea among historians.

And what is a "message"? According to Federman (quoting McLuhan), a message is "the change of scale or pace or pattern," while a medium is any extension of ourselves. Federman indicates that this includes tools such as a hammer or a wheel, as well as "language" (on a side note, many linguists would say that "language" is more like an expressive act, such as singing and dancing, than a "tool").

At this point, I am going to translate as follows:

McLuhan: "The medium is the message."

Pyatt: "A tool [medium] sets up an unconscious pattern of behavior [message]," and thus "A new tool triggers a change, usually unanticipated," or even "Societies unconsciously adjust around the affordances of a tool."

FYI - affordance means the functions enabled by a tool.

Why this Statement?


My question is this: is McLuhan deliberately trying to confuse us? My colleague Dave Stong was alluding to this in our discussion, and maybe he was on to something. The "medium is the message" statement is much more provocative than the one I made...and potentially more misunderstood.

Do you want to know where our discussion returned to many times? The impact of a communication channel (e.g. text, TV, radio...) on the content of what the person was trying to say. But apparently that may not be what McLuhan meant at all. This is, shall we say, disappointing.


OK - I really don't think McLuhan was trying to pull a fast one on us. First, I wonder if he was borrowing from semiotic theory where a major theme is how the composition of a "message" or "transmission" (art, text, TV show..) affects how it is interpreted by the receiver. This is something I have always found interesting, yet I have never been satisfied with that theory (see later discussion). I also wonder if he was "thinking aloud" somewhat for our benefit. I doubt he meant to be confusing, but he sure did an excellent job of it.

No matter what you may think of semiotics, though, I think almost everyone would agree that many presentations carry both a conscious message and a "subtext." For instance, designers of book covers and album covers will spend a LOT of time on font selection, text placement, art placement/selection and so forth in order to "stand out" as well as appeal to a particular demographic. Again, I am not sure how profound this revelation is. I am much more interested in the mechanics of it.

Ultimate Dissatisfaction

I think my ultimate dissatisfaction with McLuhan and others is that they are saying something basic in a very confusing way. Hasn't everyone heard of an unintended consequence of a new invention? I know about the impact of the cotton gin in accelerating slavery in the South, and I doubt that was Eli Whitney's intention. Similarly, does any designer not know that changing the presentation affects how the message is perceived? I am not sure that McLuhan is adding to this (actually, R.K. Logan argues that this concept was new to communication studies, which makes me feel that we need to emphasize a broad liberal education).

What I would like is a more systematic investigation of how these "messages" and "media" interact (both McLuhan's kind and the conventional kind). We all use metaphors of music carrying meaning, but how does music really differ from language? And how do you know which font to use? I'm sure our designer friends could tell us intuitively, but it's nice to supplement that with usability studies (even if you want to dispute some findings).

I know I am speaking somewhat as a science geek here, but I do wonder when we will get past the obvious statements dressed in ambiguously defined language.

I genuinely think these are interesting topics to explore.

Post Script: Alternate "Message"?

Still thinking about "The medium is the message": is McLuhan's actual meaning (i.e. message) that we need to pay attention to the medium (as in "The economy is the message...dummy," to paraphrase James Carville)? I like this interpretation since it makes McLuhan's thesis much more comprehensible. But I could still be off base.

Post Script 2:

In an ironic example of how the medium affects the message, I started watching some archival video footage of Marshall McLuhan (per Brian Young's suggestion) from various TV shows. It's a lot less technical and a lot more accessible. There's even a recording of him analyzing the Nixon vs. Kennedy debates, which Nixon won on radio and Kennedy won on TV. I will agree that this concept was very revolutionary at the time. Another good example was his discussion of how most political debates are rigged to be boring.

What I still find interesting is that none of us in yesterday's seminar realized that McLuhan discussed presidential debates or even coined the term "global village." Why? What's the communication gap? If we had known, I think we would have had a very different reaction to the reading.

New Media Seminar Week 5: Behold the Personal Computer


The good news is that I finally made it to a New Media discussion in ETS. This week's discussion focused on "Personal Dynamic Media," a paper by Alan Kay and Adele Goldberg documenting a mid-1970s project on children's computing. In it we see a personal computer with a mouse, monitor and keyboard, plus "Smalltalk" programs built for drawing, musical play and text editing (with font changes).

Where we were

This all seems very standard now, but I wouldn't see a mouse or change a font until 1986, on the Mac. Believe me, it was hot stuff when I was a college freshman. Personal (standalone) computers had arrived by the 80s, but they were limited to green-screen or black-and-white text monitors (with a third "amber" option coming in the late 80s). You could do some graphics, but it helped to know a programming language, and inserting a Spanish ñ required entering a numeric control code (fixed on the Mac).

Before that, though, computing meant working with mainframes, possibly via punch cards. You needed specialized training to run these beasts and a high tolerance for memorizing obscure command sequences.

In this paper, though, Kay and Goldberg were trying to demonstrate that computers could move beyond executing calculations to become tools for manipulating digital data in a wide set of applications. In fact, the data didn't have to be presented as numbers; it could be presented as an audio signal or a set of pixels. And this is what we love about computers. Some of us are still manipulating numbers, but many of us are manipulating text, images and video. We may even be manipulating maps, bibliographies or knitting designs. And we use computers to talk to each other, which not even Kay and Goldberg realized would happen. It's an amazingly wide range of functions for a machine that processes ones and zeros.

How Much "Programming"?

A lot of us use computer applications, but relatively few of us really contemplate the calculations behind the scenes. Again, this is not what computer engineers had in mind, but what are the implications?

Intuitively, we all realize that being able to hand-build a computer is NOT a prerequisite for sending e-mail or checking in on Facebook. And yet, a lot of us in ETS are better able to maximize our technology experience BECAUSE we understand some underlying aspect of the technology. Knowing "code" definitely gives you more ability to manipulate your data. On the other hand, code requires you to understand the obscure commands computer civilians detested.

This paradox is important to instructional designers because our customers (faculty and students) want more power over their digital environment, usually in the form of a tool that does "something" that isn't available now. But sometimes getting that function requires specialized knowledge to deliver. That's why programmers and other technology specialists are still in business.

In an ideal world, everyone would have the ability to create the tools they need...but we aren't there yet and may well never be. The computer has given us a lot of opportunities we never had before, but there are still more opportunities for those who know code. It could be a good skill to know, kind of like math, public speaking or understanding the Constitution.

And that paradox has also led to a dilemma for me as an instructional designer. One school of thought asks us to focus more on pedagogy and project management and less on the "guts" of technology. It does seem impossible that we could ever keep up with all the new developments.

Yet I have always been afraid of moving too far away from the technology. I don't think I could ever know everything, but do I want to understand the backbone of where we're going? Absolutely, because how could you manage a technology you don't understand? Not understanding it leads to the kind of flaky decisions that even low-level techies would never make.

Why these readings?

A comment I've seen in other blogs about this course asks what the "point" of the articles is. Sometimes they're so archaic that they seem laughable (this article assumes that a 72dpi monitor is "high resolution" like newsprint).

But I confess that I don't think the articles are all that enlightening by themselves. I'm getting a lot more out of them by discussing different issues with the reading group. We have the technology, but we are still evolving our culture around it in ways we haven't predicted. Sometimes it's nice to go back to where it all began.