APLNG
Statistical Analysis of Qualitative
and Corpus Data
Spring 2008
General Information
Instructors: Xiaofei Lu & Robert Schrauf
Mailbox: 305
Office: 301 / 207
Phone: 865-4692 / 865-9622
Email: xxl13 / rws23 @ psu.edu
Webpage: All additional course information is posted in ANGEL
Lectures: Monday & Wednesday, 2:30pm-3:45pm, 009 Sparks
Office hours: By appointment
Required Textbook
Oakes, M. P. (1998). Statistics for Corpus Linguistics.
Course Description
Qualitative data and corpora include transcripts of interviews, narratives, conversations,
and print materials, and working with such data requires coding and
interpreting these texts. This course is designed to equip the student with the
basic statistical skills necessary for testing theories and drawing conclusions
from textual data and for designing visual presentations of that data.
Course Outline
A. Introduction: the qual/quant continuum; mixed methods research; using
statistics to analyze qualitative data.
B. Transcripts as “Corpora”: the
concept of corpus, criteria for building good corpora, and issues in treating
transcripts, including interviews, narratives, conversations, and published
texts, as corpora.
C. Kinds of Analysis and Associated Software: introduction to software that
is useful for statistical analysis of qualitative and corpus data.
   a. For linguistic analysis at the levels of word, phrase, collocation,
      sentence, paragraph, document, and genre:
      i. AntConc / WordSmith: examines how words, word clusters, or phrases
         behave in texts, such as their frequencies, contexts of occurrence,
         associations with other words, and keyness in texts.
      ii. Coh-Metrix: produces indices of the linguistic and discourse
          representations of a text. These values can be used in different
          ways to investigate the cohesion of the explicit text and the
          coherence of the mental representation of the text.
   b. For computerized coding and analysis of transcripts in behavioral science:
      i. Review of available software
      ii. Linguistic Inquiry and Word Count (LIWC): examines standard linguistic
          items (nouns, pronouns, articles), psychological processes (emotions,
          agency), relativity (temporal relations), and personal concerns
          (e.g. school, religion, sexuality).
   c. For human coding of qualitative data in the social sciences:
      i. Atlas.ti / NVivo / Ethnograph: software facilitating the coding of
         transcripts, developing code families, testing relationships, etc.
         (used especially with Grounded Theory approaches).
      ii. Traditional paper-and-pencil (and highlighters and post-its and
          colored files, etc.) approaches.
D. Statistics in Data Collection and Preparation: In the data collection/data
preparation stage, several important statistical issues arise:
   a. Sample size (i.e. determination of how many interviews or how many
      ‘texts’ are necessary for generalizability).
   b. Data matrices for data preparation, including respondent-by-item
      matrices, item-by-item matrices, respondent-by-respondent matrices, and
      unit-by-theme matrices.
   c. Intercoder reliability (i.e. methods for assessing agreement among
      coders in applying the codes to the text).
E. Basic Statistical Concepts and Methods that are necessary and useful
for statistical analysis of qualitative and corpus data, including describing
data, comparing groups, describing relationships, log-linear modeling, and
Bayesian statistics.
F. Statistics for Analyzing Data and the Visual Presentation of Results
   a. Analysis of cross-classified data: two-by-two and more complex
      contingency tables, odds ratios and the log-linear model, and ways to
      graph the results.
   b. Metric scaling: co-occurrence of words or codes; analysis and
      visualization, including principal components analysis,
      multidimensional scaling (the group maps and individual maps), and
      correspondence analysis.
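As an illustration of the two-by-two analysis listed under F.a, the sketch below computes an odds ratio and an approximate 95% confidence interval from a single contingency table. It is a minimal example, not course material: the function names and the counts are hypothetical, the interval uses the standard log-based (Woolf) approximation, and in class the same calculation would be carried out in Excel or SPSS.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. two sets of transcripts); columns are
    outcome present / outcome absent (e.g. a code applied or not).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval via the log odds ratio (Woolf method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: a code appears in 30 of 100 segments from group 1
# and in 10 of 100 segments from group 2.
a, b, c, d = 30, 70, 10, 90
ratio = odds_ratio(a, b, c, d)
low, high = odds_ratio_ci(a, b, c, d)
print(f"odds ratio = {ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

An interval that excludes 1 suggests the code is associated with group membership; with small cell counts an exact test is usually the safer choice.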
Course Requirements
Class meetings will involve hands-on treatment of data sets, either provided by the
instructor, collected by the group, or volunteered by students. For each
procedure, the instructors will offer “big picture” explanations, followed by
step-by-step examples in the appropriate software (e.g. SPSS, Excel, or one of
the specialized programs listed in this syllabus), plus assigned problem
solving for homework between classes. The course will include three take-home
exams to be worked in Excel and/or SPSS. During the course, students will
be encouraged to set up a data set and analyze it using one of the methods that
interest them in particular. In the last several classes, students will
make presentations of these projects.
Grading
Exams count for 75% of the grade (each exam contributing 25%), and the final
presentation counts for the remaining 25%.
Academic Misconduct
Tentative Schedule
| Week | Date | Topic | What’s due |
| --- | --- | --- | --- |
| 1 | M 01/14 | Introduction | |
| | W 01/16 | Transcripts as “Corpora” | |
| 2 | M 01/21 | Martin Luther King Day - No Classes | |
| | W 01/23 | AntConc / WordSmith | |
| 3 | M 01/28 | Coh-Metrix | |
| | W 01/30 | Review of Computer Coding Software; Linguistic Inquiry and Word Count | |
| 4 | M 02/04 | Atlas.ti / NVivo / Ethnograph | |
| | W 02/06 | Sample Size | |
| 5 | M 02/11 | Data Matrices | |
| | W 02/13 | Describing Data | |
| 6 | M 02/18 | Describing Data/Comparing Groups | |
| | W 02/20 | Comparing Groups | |
| 7 | M 02/25 | Comparing Groups | Exam 1 |
| | W 02/27 | Describing Relationships | |
| 8 | M 03/03 | Describing Relationships | |
| | W 03/05 | Intercoder Reliability | |
| 9 | | Spring break | |
| 10 | M 03/17 | Loglinear Modeling | |
| | W 03/19 | Loglinear Modeling | |
| 11 | M 03/24 | Analysis of Cross-Classified Data | Exam 2 |
| | W 03/26 | Analysis of Cross-Classified Data | |
| 12 | M 03/31 | Bayesian Statistics | |
| | W 04/02 | Bayesian Statistics | |
| 13 | M 04/07 | Metric Scaling | |
| | W 04/09 | Metric Scaling | |
| 14 | M 04/14 | Metric Scaling | Exam 3 |
| | W 04/16 | Metric Scaling | |
| 15 | M 04/21 | Catch up/Final Presentations | |
| | W 04/23 | Final Presentations | |
| 16 | M 04/28 | Final Presentations | |
| | W 04/30 | Final Presentations | |