APLNG 596D: Computational and Statistical Methods for Corpus Analysis
2009 Summer Institute in Applied Linguistics
Pennsylvania State University
General Information
Instructor: Xiaofei Lu
Office: 301 Sparks Building
Mailbox: 305 Sparks Building
Phone: (814) 865-4692
Email: xxL13 AT psu DOT edu
Meetings: MTRF 4:15-6:15pm, 15A Sparks
Course Description
This course provides a hands-on introduction to the core and advanced computational
and statistical methods for analyzing corpus data. We will first introduce some
of the state-of-the-art computational tools for text processing and linguistic
annotation and demonstrate tools that can be used to query raw and
linguistically annotated corpora to extract occurrences of specific linguistic
patterns and grammatical structures. Next, we will cover some of the most
essential statistical methods used in analyzing and interpreting information
extracted from text corpora. We will conclude with a discussion of how these
methods have been combined in recent corpus-based studies, and how they may be
implemented in student-proposed research projects. This course will be highly
applied, and there will be substantial opportunities for demonstrations,
exercises, and discussions. By the end of the course, students are expected to
have a good grasp of the computational and statistical techniques necessary for
processing, annotating, and analyzing corpus data.
Course Requirements
For students who register for graduate credit, evaluation will be
based on participation and a short take-home assignment to be distributed on
Friday 7/10 and due on Friday 7/17.
Tentative Schedule
| Session | Day | Topic | Resources | Readings |
|---|---|---|---|---|
| 1 | M, 7/6 | | | |
| 2 | T, 7/7 | Analyzing raw data | | Lu (in press) |
| 3 | R, 7/9 | | | Granger (2003) |
| 4 | F, 7/10 | | | Biber (2006): Ch. 3 |
| 5 | M, 7/13 | | | Lu (2009) |
| 6 | T, 7/14 | | | |
| 7 | R, 7/16 | Statistical analysis | | |
| 8 | F, 7/17 | | | |
Recommended Readings
1. Biber, Douglas (2006). University Language: A Corpus-Based Study of Spoken and Written Registers. Amsterdam: John Benjamins.
2. Granger, Sylviane (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO Journal, 20(3): 465–80.
3. Lu, Xiaofei (2009). Automatic analysis of syntactic complexity in child language acquisition. International Journal of Corpus Linguistics, 14(1): 3–28.
4. Lu, Xiaofei (in press). What can corpus software reveal about language development? In Michael McCarthy & Anne O'Keeffe (Eds.), Routledge Handbook of Corpus Linguistics. Oxfordshire, UK: Routledge.
5. Wynne, Martin (Ed.) (2005). Developing Linguistic Corpora: A Guide to Good Practice. Oxford, UK: Oxbow Books.