L2 Syntactic Complexity Analyzer

Xiaofei Lu


The L2 Syntactic Complexity Analyzer automates syntactic complexity analysis of written English language samples produced by advanced learners of English, using 14 measures proposed in the second language development literature. It takes a written English language sample in plain text format as input and generates 14 indices of the sample's syntactic complexity. This software is an implementation of the system described in:

The analyzer is implemented in Python and runs on Unix-like (Linux, Mac OS X, or UNIX) systems with Java 1.6 and Python 2.5 or higher installed. A minimum of 2 GB of memory is recommended.

The analyzer takes a plain text file as input and counts the frequency of the following 9 structures in the text: words (W), sentences (S), verb phrases (VP), clauses (C), T-units (T), dependent clauses (DC), complex T-units (CT), coordinate phrases (CP), and complex nominals (CN). From these counts it computes the following 14 syntactic complexity indices of the text: mean length of sentence (MLS), mean length of T-unit (MLT), mean length of clause (MLC), clauses per sentence (C/S), verb phrases per T-unit (VP/T), clauses per T-unit (C/T), dependent clauses per clause (DC/C), dependent clauses per T-unit (DC/T), T-units per sentence (T/S), complex T-unit ratio (CT/T), coordinate phrases per T-unit (CP/T), coordinate phrases per clause (CP/C), complex nominals per T-unit (CN/T), and complex nominals per clause (CN/C).

The analyzer calls the Stanford parser (Klein & Manning, 2003) to parse the input file and Tregex (Levy & Andrew, 2006) to query the parse trees. Both the Stanford parser and Tregex are bundled in this download, along with the appropriate licenses.
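
Each of the 14 indices named above is a ratio of two of the 9 structure counts. The arithmetic can be sketched in Python as follows, assuming the counts are already available in a dictionary keyed by the abbreviations above. The function name compute_indices, the handling of a zero denominator, and the sample counts are illustrative assumptions and are not taken from the analyzer's own code.

```python
# Illustrative sketch only: this is NOT the analyzer's source code.
# Each entry is (index name, numerator count, denominator count),
# following the definitions of the 14 indices given in the text.
INDICES = [
    ("MLS", "W", "S"),    # mean length of sentence
    ("MLT", "W", "T"),    # mean length of T-unit
    ("MLC", "W", "C"),    # mean length of clause
    ("C/S", "C", "S"),    # clauses per sentence
    ("VP/T", "VP", "T"),  # verb phrases per T-unit
    ("C/T", "C", "T"),    # clauses per T-unit
    ("DC/C", "DC", "C"),  # dependent clauses per clause
    ("DC/T", "DC", "T"),  # dependent clauses per T-unit
    ("T/S", "T", "S"),    # T-units per sentence
    ("CT/T", "CT", "T"),  # complex T-unit ratio
    ("CP/T", "CP", "T"),  # coordinate phrases per T-unit
    ("CP/C", "CP", "C"),  # coordinate phrases per clause
    ("CN/T", "CN", "T"),  # complex nominals per T-unit
    ("CN/C", "CN", "C"),  # complex nominals per clause
]


def compute_indices(counts):
    """Return a dict of the 14 indices computed from the 9 structure counts."""
    result = {}
    for name, numerator, denominator in INDICES:
        divisor = counts[denominator]
        # Assumption: report 0.0 when the denominator count is zero
        # (for example, an empty text) rather than raising an error.
        result[name] = float(counts[numerator]) / divisor if divisor else 0.0
    return result


# Invented counts, for illustration only.
sample_counts = {"W": 100, "S": 5, "VP": 15, "C": 12, "T": 8,
                 "DC": 4, "CT": 3, "CP": 2, "CN": 10}
sample_indices = compute_indices(sample_counts)
```

With these invented counts, sample_indices["MLS"] is 100 / 5 = 20.0 and sample_indices["CT/T"] is 3 / 8 = 0.375.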


Frequently asked questions