Visualizing mental representations: High-dimensional semantic space

  Roy B. Clariana (rclariana@psu.edu)

Clariana, R.B. (2000). Feedback in computer-assisted learning. NETg University of Limerick Lecture Series.
See http://www.ul.ie/techcomm/NETgLectureSeries.htm

Mental representations can be visualized in several ways.  Collins and Quillian (1972) proposed semantic networks that use nodes to represent concepts and lines between nodes to show the relationships between concepts.  For example, relationship data from McClelland (1981) describing two fictitious gangs, the Sharks and the Jets, can be displayed as a semantic network (see left panel of Figure 1).  For descriptive purposes, information about Gangs and about the Sharks and Jets that is not in the original data set was added to fill out the semantic network.  The Sharks gang member named Dave is a divorced drug pusher.  Semantic networks have been used to describe many aspects of information, such as its hierarchical structure.

 

 

Figure 1.  A semantic network (left) and an MDS (right) of associations for two fictitious gangs (from McClelland, 1981).

 

Graphical displays of psychological space have also been called high-dimensional semantic space (HDSS; Foltz, Kintsch, & Landauer, 1998) and cognitive maps (Diekhoff & Wigginton, 1982).  Several recent computational approaches show similarity in mental representations as distances in psychological space rather than as nodes and links (Diekhoff & Wigginton, 1982; McLeod, Plunkett, & Rolls, 1998).  These approaches provide another way of displaying and of thinking about an individual’s mental representation of information.

A scaling procedure, such as multidimensional scaling (MDS), can display relationship data visually in fewer dimensions.  For example, to use MDS to display HDSS, the relationship data are first described in a weight matrix (see Table 1).  A 1 in the matrix indicates an association between the column and row instances, a 0 indicates no association, and a -1 indicates an inhibitory association between mutually exclusive instances within a category (e.g., a gang member cannot be both 20-ish and 30-ish).  Each row (or column) in the table is a high-dimensional vector.  MDS can then be applied to that matrix.  Here, MDS was conducted on the Table 1 data using SPSS 9.0 with the standard default values, except that "create Euclidean distances" and "display group plots" were selected (see right panel of Figure 1).
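The same analysis can be sketched without SPSS.  The following Python fragment is an illustration under assumed tooling, with scikit-learn's MDS standing in for the SPSS routine and only three of the 22 row vectors of Table 1 shown; it derives Euclidean distances between row vectors and projects them into two dimensions:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.manifold import MDS

    # Each row of the weight matrix is one instance's high-dimensional vector
    # (three of the 22 rows of Table 1 are shown; fill in the rest the same way).
    names = ["Alan", "Art", "Clyde"]
    W = np.array([
        [1,0,0,0,0,0,0,0, 1,0, 0,1,0, 1,0,0, 1,0,0, 1,0,0],  # Alan
        [0,1,0,0,0,0,0,0, 1,0, 0,0,1, 1,0,0, 0,1,0, 0,1,0],  # Art
        [0,0,1,0,0,0,0,0, 1,0, 0,0,1, 1,0,0, 0,1,0, 0,0,1],  # Clyde
    ])

    # Analogous to selecting "create Euclidean distances": pairwise distances
    # between the row vectors, then a two-dimensional MDS layout of the distances.
    distances = squareform(pdist(W, metric="euclidean"))
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(distances)
    for name, (x, y) in zip(names, coords):
        print(f"{name:8s} x = {x:+.2f}   y = {y:+.2f}")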

 

Table 1.  Weight matrix of some of McClelland’s (1981) data (with 8 examples of gang members).

 

 

 

(Columns are the same 22 instances in the same order as the rows; column headers are abbreviated: Al = Alan, Ar = Art, Cl = Clyde, Da = Dave, Dn = Don, Dg = Doug, Ea = Earl, Fr = Fred, Je = Jets, Sh = Sharks, Dv = Divorced, Bu = Burglar, Pu = Pusher, Bk = Bookie.)

                        Examples                  Gangs    Ages        Education   Status      Profession
                        Al Ar Cl Da Dn Dg Ea Fr   Je Sh    20 30 40    JH HS Co    Ma Si Dv    Bu Pu Bk
Examples    Alan         1  0  0  0  0  0  0  0    1  0     0  1  0     1  0  0     1  0  0     1  0  0
            Art          0  1  0  0  0  0  0  0    1  0     0  0  1     1  0  0     0  1  0     0  1  0
            Clyde        0  0  1  0  0  0  0  0    1  0     0  0  1     1  0  0     0  1  0     0  0  1
            Dave         0  0  0  1  0  0  0  0    0  1     0  1  0     0  1  0     0  0  1     0  1  0
            Don          0  0  0  0  1  0  0  0    0  1     0  1  0     0  0  1     1  0  0     1  0  0
            Doug         0  0  0  0  0  1  0  0    1  0     0  1  0     0  1  0     0  1  0     0  0  1
            Earl         0  0  0  0  0  0  1  0    0  1     0  0  1     0  1  0     1  0  0     1  0  0
            Fred         0  0  0  0  0  0  0  1    1  0     1  0  0     0  1  0     0  1  0     0  1  0
Gangs       Jets         1  1  1  0  0  1  0  1    1 -1     0  0  0     0  0  0     0  0  0     0  0  0
            Sharks       0  0  0  1  1  0  1  0   -1  1     0  0  0     0  0  0     0  0  0     0  0  0
Ages        20s          0  0  0  0  0  0  0  1    0  0     1 -1 -1     0  0  0     0  0  0     0  0  0
            30s          1  0  0  1  1  1  0  0    0  0    -1  1 -1     0  0  0     0  0  0     0  0  0
            40s          0  1  1  0  0  0  1  0    0  0    -1 -1  1     0  0  0     0  0  0     0  0  0
Education   JH           1  1  1  0  0  0  0  0    0  0     0  0  0     1 -1 -1     0  0  0     0  0  0
            HS           0  0  0  1  0  1  1  1    0  0     0  0  0    -1  1 -1     0  0  0     0  0  0
            Col          0  0  0  0  1  0  0  0    0  0     0  0  0    -1 -1  1     0  0  0     0  0  0
Status      Married      1  0  0  0  1  0  1  0    0  0     0  0  0     0  0  0     1 -1 -1     0  0  0
            Single       0  1  1  0  0  1  0  1    0  0     0  0  0     0  0  0    -1  1 -1     0  0  0
            Divorced     0  0  0  1  0  0  0  0    0  0     0  0  0     0  0  0    -1 -1  1     0  0  0
Profession  Burglar      1  0  0  0  1  0  1  0    0  0     0  0  0     0  0  0     0  0  0     1 -1 -1
            Pusher       0  1  0  1  0  0  0  1    0  0     0  0  0     0  0  0     0  0  0    -1  1 -1
            Bookie       0  0  1  0  0  1  0  0    0  0     0  0  0     0  0  0     0  0  0    -1 -1  1

 

First, note that Art is nearer to Clyde than to Alan (see right panel of Figure 1), showing that Art and Clyde (with four overlapping vector elements) have more similar weight vectors (see Tables 1 and 2) than Art and Alan (with only two).  Simple correlations of the vector elements in Table 2 for Art, Clyde, and Alan provide the same information: Art vs. Clyde, r = .54; Art vs. Alan, r = .08; and Clyde vs. Alan, r = .08 (the sketch following Table 2 reproduces these correlations).

 

Table 2.  Vector elements for Alan, Art, and Clyde.

Alan    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

Art     [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]

Clyde   [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1]
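These correlations can be verified directly from the Table 2 vectors, for example with a brief Python sketch:

    import numpy as np

    alan  = np.array([1,0,0,0,0,0,0,0, 1,0, 0,1,0, 1,0,0, 1,0,0, 1,0,0])
    art   = np.array([0,1,0,0,0,0,0,0, 1,0, 0,0,1, 1,0,0, 0,1,0, 0,1,0])
    clyde = np.array([0,0,1,0,0,0,0,0, 1,0, 0,0,1, 1,0,0, 0,1,0, 0,0,1])

    print(np.corrcoef(art, clyde)[0, 1])   # .54 -- four overlapping 1-elements
    print(np.corrcoef(art, alan)[0, 1])    # .08 -- two overlapping 1-elements
    print(np.corrcoef(clyde, alan)[0, 1])  # .08 -- two overlapping 1-elements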

 

Next, Art is located near Single, Pusher, and Jets, as well as Junior High School and 40-ish.  Note also that in this MDS representation (see Figure 1), Dave is nearest Divorced, Pusher, and Sharks.  Category data from the original data set, such as age 20-ish and High School graduate, are shown as instances along with the gang-member instances.  Information that all Sharks share, such as hanging out on the north side and wearing blue, stacks precisely on top of Sharks in the MDS, since Sharks, blue, and north side all have identical weight vectors.  In this sparse MDS with only eight gang members, instances of gang members and characteristics of those instances are all mixed together.  Note how the MDS changes when more gang members are included in the matrix (in this case 27 instances; see the resulting MDS in Figure 2).

 

 

Figure 2.  MDS of McClelland's (1981) data including 27 gang members.  The top and bottom panels are identical, except that the bottom panel has been stretched to display details.

 

The gang-member instances group together in a cluster or cloud because of the relative similarity of their weight vectors, while the vectors that characterize the instances, such as Pusher, Bookie, and Burglar, move away on the dimensional scales (see upper panel of Figure 2).  The concept of gang member is represented in this HDSS as a cloud or cluster of instances of gang members, even though there is no weight vector called gang member in the data set to cause this grouping.  The cloud of instances is highly structured: similar gang members lie nearer each other because their vectors are similar.  For example, why are Doug and Nick near each other (see center of the bottom panel of Figure 2)?  At first glance they seem dissimilar, since Doug is a Jet bookie and Nick is a Shark pusher.  However, since Doug and Nick are both 30-ish, high school graduates, and single, they actually have a lot in common.

 

HDSS as Structural Knowledge

The association between instances in HDSS looks like structural knowledge, the knowledge of the interrelationships among knowledge elements.  It has been suggested that developing structural knowledge of a content area is necessary in order to use that knowledge base flexibly (Jonassen & Wang, 1992).  This MDS approach visually depicts structural knowledge (Diekhoff & Wigginton, 1982) and demonstrates that hierarchical structural knowledge can emerge in HDSS when enough cases or examples (vectors) are included in the weight matrix.  This visual approach further suggests that if structural knowledge relationships are taught directly before multiple instances are established, then those associations would simply be another vector, like gang member, Pusher, or Alan.  Thus, teaching structural knowledge relationships too soon would produce relatively weak, and possibly meaningless, one-vector effects on HDSS rather than the regional (cloud) multi-vector effects that would be quite meaningful to the learner.

If structural knowledge emerges in actual mental representations in a way similar to this MDS model, then computer-assisted learning (CAL) should present many cases or examples per coherent knowledge base (in this example, the cluster did not emerge until at least 22 gang members were included in the MDS matrix).  Human neural networks are probably more capable of deriving structural knowledge than this MDS approach, suggesting that fewer cases would be required for structural knowledge to emerge.  Nevertheless, learners can only interact with existing instances in order to generate structural knowledge (Jonassen & Cole, 1993).  Given the likely importance of structural knowledge, it is reasonable for CAL to include many cases and examples and then require the learner to interact with those cases.

Note that HDSS is a model of association weights, not of activation weights.  This is an important distinction.  The entire weight matrix is probably never activated at the same time; only parts are activated.  An input pattern, such as a question, interacts with the entire weight matrix to produce a network of activation that is a subset of HDSS, and the HDSS response to the question would be one or several instances within that subset.  For example, if the learner were asked to list all Jets who are burglars, first Jets and Burglar would activate (see Figure 2).  Note that Jets and Burglar, besides being instances in HDSS, are also elements of every other vector in the matrix.  Any other instance in the matrix that has a 1 in the Jets and/or Burglar element of its vector also activates to some degree.  This process has been referred to as spreading activation and will include nearest neighbors in HDSS, such as George, Lance, John, Jim, and possibly Alan (also a correct response) and Ken (an incorrect response).  This is a likely subset of HDSS activated by the question.
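This one-pass spread can be illustrated with the eight-member matrix of Table 1 (a minimal sketch, not the 27-member matrix plotted in Figure 2; with only eight members, Alan is the sole Jet burglar, so he receives the most activation):

    import numpy as np

    names = ["Alan", "Art", "Clyde", "Dave", "Don", "Doug", "Earl", "Fred"]
    W = np.array([  # the eight example rows of Table 1
        [1,0,0,0,0,0,0,0, 1,0, 0,1,0, 1,0,0, 1,0,0, 1,0,0],  # Alan
        [0,1,0,0,0,0,0,0, 1,0, 0,0,1, 1,0,0, 0,1,0, 0,1,0],  # Art
        [0,0,1,0,0,0,0,0, 1,0, 0,0,1, 1,0,0, 0,1,0, 0,0,1],  # Clyde
        [0,0,0,1,0,0,0,0, 0,1, 0,1,0, 0,1,0, 0,0,1, 0,1,0],  # Dave
        [0,0,0,0,1,0,0,0, 0,1, 0,1,0, 0,0,1, 1,0,0, 1,0,0],  # Don
        [0,0,0,0,0,1,0,0, 1,0, 0,1,0, 0,1,0, 0,1,0, 0,0,1],  # Doug
        [0,0,0,0,0,0,1,0, 0,1, 0,0,1, 0,1,0, 1,0,0, 1,0,0],  # Earl
        [0,0,0,0,0,0,0,1, 1,0, 1,0,0, 0,1,0, 0,1,0, 0,1,0],  # Fred
    ])

    # "List all Jets who are burglars" activates two vector elements:
    # element 8 (Jets) and element 19 (Burglar), zero-indexed in column order.
    question = np.zeros(22)
    question[[8, 19]] = 1.0

    # One spreading-activation pass: every instance with a 1 in either activated
    # element receives some activation; an instance matching both scores highest.
    activation = W @ question
    for name, a in sorted(zip(names, activation), key=lambda p: -p[1]):
        print(f"{name:6s} {a:.0f}")  # Alan 2; other Jets and other burglars 1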

Next, this subset is reduced to one or just a few instances.  Kintsch (1994) has described how the activation level of one or a few instances in a weight matrix increases as the question vector cycles over the weight matrix, while the activations of all the other instances decrease (relatively).  The small number of instances with the greatest activation levels when the pattern stabilizes, such as George, John, and Lance, are the HDSS response to the question.
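This cycling can be caricatured as repeated multiplication of the activation pattern by an instance-to-instance association matrix, renormalizing on each cycle (a hypothetical sketch, not Kintsch's implementation; the function name and parameters are mine):

    import numpy as np

    def settle(C, activation, cycles=100, tol=1e-6):
        # Cycle an activation pattern over association matrix C until stable.
        a = activation / np.linalg.norm(activation)
        for _ in range(cycles):
            new = C @ a
            new /= np.linalg.norm(new)  # keep total activation bounded
            if np.allclose(new, a, atol=tol):
                break                   # the pattern has stabilized
            a = new
        return a

    # Usage with the previous sketch: C = W @ W.T links instances that share
    # vector elements, and the one-pass activation is the starting pattern:
    # settled = settle(W @ W.T, activation)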

In addition, a vector with a one-element association is established between the question pattern "list all Jets who are burglars" and the response pattern "George, John, and Lance."  If the same question is asked later, the response will be considerably quicker because it involves vectors with fewer elements, which cycle faster.  This process could be called rote responding.  It is also referred to as the familiarity-based process of recognition, which stabilizes in about 200 ms (Rotello & Heit, 1999), compared to the slower (800 ms to 2,000 ms) recall-like or recollection process, which involves many more vectors representing more qualitative information and requires more cycles for the pattern to stabilize (Dobbins, Kroll, Yonelinas, & Liu, 2000).  If the system does not immediately identify an input pattern, then it resorts to the slower recollection process.  It is not surprising that our neural system would act this way, since there is an obvious and powerful survival benefit in shortening the response time to danger.

This vector-based approach is easy to apply and understand and is highly explanatory.  However, it oversimplifies how a question is handled by an individual's mental representation.  First, it reduces weight-matrix associations between instances to 1, 0, or -1, while there is probably a gradation in associations.  Second, the model assumes a relatively naïve HDSS, since it tends to disregard preexisting association data.  Regarding gradation in association, many neural systems rely on sigmoid activation functions that drive activation towards 1, 0, or -1 (McLeod, Plunkett, & Rolls, 1998); therefore, this simplification may not be a problem.  Regarding the naïveté of the HDSS representation, a relatively uncluttered area of HDSS can be established for new content simply by adding vectors of unique instances, such as context characteristics.  The more unique context vectors that are included, the further the entire content representation is driven away from preexisting content instances and thus into clear HDSS space (new dimensional space).  This easily accounts for context effects in recall memory, such as the finding that a list learned underwater is not remembered as well on land, and vice versa (Godden & Baddeley, 1975, 1980).  The underwater context variables, which are many, compartmentalize the list words together within an area of HDSS that is generally less accessible when recalling the list on land.  Reestablishing context variables, such as wearing a mask and breathing through a mouthpiece while on land, should tend to bring up the underwater list.
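The context argument can also be sketched numerically (a toy illustration with made-up vectors, not data from the cited studies): appending shared underwater-context elements to the list-word vectors raises their similarity to one another and lowers their similarity to items that lack that context.

    import numpy as np

    word_a    = np.array([1,0,1,1,0,0,1,0,1,0])  # a list word (made-up elements)
    word_b    = np.array([0,1,1,0,1,0,1,0,0,1])  # another list word
    land_item = np.array([1,1,0,0,1,1,0,1,0,0])  # preexisting land-context content

    underwater = np.ones(6)    # context elements shared by the list words
    no_context = np.zeros(6)

    a     = np.concatenate([word_a, underwater])
    b     = np.concatenate([word_b, underwater])
    other = np.concatenate([land_item, no_context])

    # The shared context block raises r(a, b) and lowers r(a, other),
    # compartmentalizing the list words in their own region of the space.
    print(np.corrcoef(word_a, word_b)[0, 1], "->", np.corrcoef(a, b)[0, 1])
    print(np.corrcoef(word_a, land_item)[0, 1], "->", np.corrcoef(a, other)[0, 1])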

In addition, instances and associations (the elements in an instance) in HDSS may have idiosyncratic activations.  For example, if Ken had just been mentioned, the residual activation from that thought would add to the weak activation resulting from the "Jets who are burglars" question.  Thus, Ken would become active enough to emerge as an answer, even though Ken is not a Jet.  Also, specific instances in HDSS may have higher baseline activation than other instances, probably due to individual familiarity with those instances.  Idiosyncratic activations would account for idiosyncratic responses to text, questions, and feedback.
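A toy continuation of the spreading-activation sketch makes the point (hypothetical values throughout): idiosyncratic activation is simply an additive term on top of the question-driven activation, so a recently mentioned instance can intrude on the response.

    import numpy as np

    # Question-driven activation for four instances (hypothetical values):
    # three Jet burglars score 2; Ken, a burglar but not a Jet, scores 1.
    names           = ["George", "John", "Lance", "Ken"]
    question_driven = np.array([2.0, 2.0, 2.0, 1.0])
    residual        = np.array([0.0, 0.0, 0.0, 1.5])  # Ken was just mentioned
    total = question_driven + residual
    print(dict(zip(names, total)))  # Ken (2.5) now outstrips the correct answers
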
References

Collins, A., & Quillian, M. (1972). Experiments on semantic memory and language comprehension. In L. Gregg (Ed.), Cognition in learning and memory. New York: Wiley.

Diekhoff, G. M., & Wigginton, P. (1982). Using multidimensional scaling-produced cognitive maps to facilitate the communication of structural knowledge. Paper presented at the Annual Meeting of the Southwestern Psychological Association. (ERIC Document Reproduction Service No. ED 218 245)

Dobbins, I. G., Kroll, N. E. A., Yonelinas, A. P., & Liu, Q. (2000). Distinctiveness in recognition and free recall: The role of recollection in the rejection of the familiar. Journal of Memory and Language, 38, 381-400.

Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285-307.

Godden, D. R., & Baddeley, A. D. (1975). Context-dependent memory in two natural environments: On land and underwater. British Journal of Psychology, 66, 325-331.

Godden, D. R., & Baddeley, A. D. (1980). When does context influence recognition memory? British Journal of Psychology, 71, 99-104.

Jonassen, D. H. (1985). Generative learning versus mathemagenic control of text processing. In D. H. Jonassen (Ed.), The technology of text: Principles for structuring, designing, and displaying text (pp. 9-45). Englewood Cliffs, NJ: Educational Technology.

Jonassen, D. H., & Cole, P. (1993). Learner-generated vs. instructor-provided analysis of semantic relationships. In Proceedings of Selected Research and Development Presentations at the 15th Annual Convention of the Association for Educational Communications and Technology. (ERIC Document Reproduction Service No. ED 362 170)

Jonassen, D. H., & Wang, S. (1992). Acquiring structural knowledge from semantically structured hypertext. In Proceedings of Selected Research and Development Presentations at the 14th Annual Convention of the Association for Educational Communications and Technology. (ERIC Document Reproduction Service No. ED 348 000)

Kintsch, W. (1994). Discourse processing. In d'Ydewalle, Eelen, & Bertelson (Eds.), International perspectives on psychological science: Vol. 2. The state of the art (pp. 135-155). Hove, UK: Lawrence Erlbaum Associates.

Kintsch, W. (2000). Metaphor comprehension: A computational theory. Psychonomic Bulletin and Review, in press.

McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge of specifics. In Proceedings of the Third Annual Meeting of the Cognitive Science Society (pp. 170-172).

McLeod, P., Plunkett, K., & Rolls, E. T. (1998). Introduction to connectionist modeling of cognitive processes. Oxford, UK: Oxford University Press.

McNamara, D., Kintsch, E., Butler, S. N., & Kintsch, W. (1996). Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1-43.

Rotello, C. M., & Heit, E. (1999). Two-process models of recognition memory: Evidence for recall-to-reject. Journal of Memory and Language, 40, 432-453.