painaa text summary utility (Pathfinder analysis of inferential aggregated associations)

I named this algorithm painaa both for its Pathfinder roots and because the word means "weigh" and "print" in Finnish (I think), which is pretty much what the utility does.

The utility converts each sentence in a written text into a co-occurrence vector (see Figure 1). If a key term occurs in the sentence, a 1 is entered into the corresponding vector cell; otherwise a 0 is entered. For example, sentence "A" contains the key terms cat, dog, and pet, so the sentence A co-occurrence vector contains three 1's.

 
 
Term co-occurrence vectors

Three sentences in an essay                           cat  dog  car  pet  truck
A. contains the key terms "cat" "dog" and "pet"   A   1    1    0    1    0
B. contains the key terms "dog" and "truck"       B   0    1    0    0    1
C. contains the key terms "car" and "truck"       C   0    0    1    0    1

Figure 1. Three sentences and the three resulting term co-occurrence vectors.
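The step in Figure 1 can be sketched in a few lines of Python. This is only an illustration, not the utility's actual code: the function name `cooccurrence_vector`, the three example sentences, and the choice of exact whole-word matching (no stemming or synonym handling) are all assumptions made for the sketch.

```python
import re

# Key terms, in the column order used in Figure 1.
KEY_TERMS = ["cat", "dog", "car", "pet", "truck"]

def cooccurrence_vector(sentence, terms=KEY_TERMS):
    """Return a 0/1 vector marking which key terms occur in the sentence.

    Assumption: a term "occurs" only if it appears as a whole word,
    ignoring case. The real utility may match terms differently.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if term in words else 0 for term in terms]

# Hypothetical sentences that contain the key terms listed in Figure 1.
sentences = [
    "My cat and my dog are each a good pet.",  # A: cat, dog, pet
    "The dog rode in the truck.",              # B: dog, truck
    "A car passed the truck.",                 # C: car, truck
]

vectors = [cooccurrence_vector(s) for s in sentences]
for label, vec in zip("ABC", vectors):
    print(label, vec)
```

Whole-word matching keeps "car" from being counted inside a longer word such as "carpet", which matters when key terms are short.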

Next, the utility converts each co-occurrence vector into its resulting propositions, one for every pair of key terms that occur together in the sentence (see Figure 2). For example, sentence A contains cat and dog, which form the proposition cat-dog; cat and pet, which form the proposition cat-pet; and dog and pet, which form the proposition dog-pet (see the left panel of Figure 2).

       A propositions                   B propositions                   C propositions
       cat   dog   car   pet   truck    cat   dog   car   pet   truck    cat   dog   car   pet   truck
cat    -                                -                                -
dog    1     -                          0     -                          0     -
car    0     0     -                    0     0     -                    0     0     -
pet    1     1     0     -              0     0     0     -              0     0     0     -
truck  0     0     0     0     -        0     1     0     0     -        0     0     1     0     -

Figure 2. Propositions for each co-occurrence vector.
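As a rough sketch of this step, the propositions for one sentence are just every unordered pair of key terms whose vector cell is 1. The function name `propositions` is hypothetical and used only for this illustration.

```python
from itertools import combinations

# Key terms, in the column order used in Figures 1 and 2.
TERMS = ["cat", "dog", "car", "pet", "truck"]

def propositions(vector, terms=TERMS):
    """List every pair of key terms that co-occur in one sentence vector."""
    present = [term for term, flag in zip(terms, vector) if flag]
    # combinations() yields each unordered pair exactly once.
    return list(combinations(present, 2))

print(propositions([1, 1, 0, 1, 0]))  # sentence A
print(propositions([0, 1, 0, 0, 1]))  # sentence B
print(propositions([0, 0, 1, 0, 1]))  # sentence C
```

A sentence with k key terms yields k(k - 1)/2 propositions, so sentence A (three terms) gives three pairs, while sentences B and C (two terms each) give one pair each.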

Then my utility aggregates the propositions into a lower-triangle association array by simply summing them across all co-occurrence vectors (see the left panel of Figure 3). This lower triangle can be converted into a Pathfinder proximity file (filename.prx) and analyzed using the Pathfinder KNOT software (see the right panel of Figure 3). However, Pathfinder analysis is already built into my utility (with q = n - 1 and Minkowski r = infinity), so the scores that you see and print out are the participant's common scores (e.g., KNOT Cmn scores), that is, the links shared by the participant's essay PFNet and the expert's referent essay PFNet.

       cat   dog   car   pet   truck
cat    -
dog    1     -
car    0     0     -
pet    1     1     0     -
truck  0     1     1     0     -

Figure 3. The lower triangle array for the propositions shown in Figure 2 and its PFNet.
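The aggregation and scoring steps might look something like the sketch below. It is not the utility's code. Two things in it are assumptions: Pathfinder works on distances, so the summed counts are converted with distance = (largest count + 1) - count, with never-linked pairs treated as infinitely far apart; and the expert vectors are invented purely to show the common-links count. With q = n - 1 and r = infinity, a link survives only if no other path between its two terms has a smaller largest-link distance.

```python
import math
from itertools import combinations

TERMS = ["cat", "dog", "car", "pet", "truck"]

def aggregate(vectors):
    """Sum the propositions of every sentence into one count per term pair."""
    counts = {}
    for vec in vectors:
        present = [i for i, flag in enumerate(vec) if flag]
        for pair in combinations(present, 2):  # pair = (i, j) with i < j
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def pfnet(counts, n):
    """Links kept by Pathfinder with q = n - 1 and r = infinity."""
    if not counts:
        return set()
    # Assumed similarity-to-distance conversion (not from the source).
    top = max(counts.values()) + 1
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for (i, j), count in counts.items():
        dist[i][j] = dist[j][i] = top - count
    # Minimax path distances: with r = infinity a path costs as much as
    # its largest link, so this is Floyd-Warshall with max in place of +.
    best = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(best[i][k], best[k][j])
                if via < best[i][j]:
                    best[i][j] = via
    # Keep a direct link only if no indirect path beats it.
    return {(i, j) for (i, j) in counts if dist[i][j] <= best[i][j]}

essay = [[1, 1, 0, 1, 0], [0, 1, 0, 0, 1], [0, 0, 1, 0, 1]]  # Figure 1
expert = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 1]]  # hypothetical referent essay

essay_net = pfnet(aggregate(essay), len(TERMS))
expert_net = pfnet(aggregate(expert), len(TERMS))
common = len(essay_net & expert_net)  # links shared by the two PFNets
print(sorted((TERMS[i], TERMS[j]) for i, j in essay_net), common)
```

Because every count in the Figure 3 array is 1, all five links tie and all five stay in the PFNet; pruning only shows up once some propositions occur more often than others.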

I only have data from one meager study so far (16 essays). The content of the essays was the structure and function of the human heart and circulatory system. In that study, the painaa utility scored the essays as well as the human raters did and was about equivalent to latent semantic analysis (see LSA).

Here is a list of experimental studies so far:
CMC_2004.doc

CMC_2006.htm

updated April 22, 2004