ZPar

Overview

ZPar is a statistical natural language parser that performs syntactic analysis tasks, including word segmentation, part-of-speech tagging and parsing. It supports multiple languages and multiple grammar formalisms. It has been developed most heavily for Chinese and English, and provides generic support for other languages; for example, a Romanian model has been trained for ZPar 0.2. ZPar currently supports context-free grammar (CFG), dependency grammar and combinatory categorial grammar (CCG).

The Chinese, English and generic (language-independent) versions of ZPar are compiled separately into three independent programs: zpar, zpar.en and zpar.ge, respectively. Each program must be run with a corresponding set of statistical models. Some example model sets are released together with the ZPar source so that the public release can be used off the shelf.

The current version of ZPar is 0.5. Its release contains a model set for zpar.en, which supports labeled dependency parsing and context-free-grammar parsing.

Download and installation

The source code and models can be downloaded from SourceForge. Unzip the source zip file into the source directory, and unzip each model file into its own model directory. To compile ZPar, type make in the zpar source directory. The binary file zpar will be placed in the dist folder. Type make zpar.en and make zpar.ge to build ZPar for English and ZPar generic.
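The build steps above can also be scripted. The following is a minimal sketch, assuming make is on the PATH and that the source was unpacked into a directory you pass in. The names BUILD_COMMANDS and build_all are illustrative only and are not part of ZPar.

```python
import subprocess
from pathlib import Path

# Build commands as described in the text: plain `make` builds zpar (Chinese),
# `make zpar.en` builds the English parser, `make zpar.ge` the generic one.
BUILD_COMMANDS = [
    ["make"],
    ["make", "zpar.en"],
    ["make", "zpar.ge"],
]


def build_all(source_dir, runner=subprocess.run):
    """Run each build command inside source_dir, stopping at the first failure.

    `runner` defaults to subprocess.run; it is a parameter only so that the
    function can be exercised without actually invoking make.
    """
    source_dir = Path(source_dir)
    for command in BUILD_COMMANDS:
        # check=True makes subprocess.run raise if make exits with an error.
        runner(command, cwd=source_dir, check=True)
    # The text states that the zpar binary is placed in the dist folder.
    return source_dir / "dist"
```

Calling build_all("zpar") would run the three builds in the zpar directory and return the path zpar/dist.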

Usage of ZPar for Chinese

Suppose that the source files are saved in the folder zpar and the models are saved in models. To run zpar, type zpar/dist/zpar models, and wait for the models to be loaded. After all models are loaded, type in Chinese sentences, and the parses will be printed out on the screen. Alternatively, type zpar/dist/zpar models input output to read Chinese sentences from the input and write the corresponding parses to output.

In the following example, each input sentence typed by the user is followed by the parse that zpar prints for it.
bash$ zpar/dist/zpar models
Parsing started
Loading scores ... done.
Loading scores... done.
这是一个例子。
(IP (NP (PN 这)) (VP (VC 是) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 例子)))) (PU 。))
输入一个句子，程序会给出它的句法分析。
(IP (IP (VP (VV 输入) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 句子))))) (PU ，) (NP (NN 程序)) (VP (VV 会) (VP (VV 给出) (NP (DNP (NP (PN 它)) (DEG 的)) (ADJP (JJ 句法)) (NP (NN 分析))))) (PU 。))
ZPar通过机器学习获得知识；虽然大多情况正确，但是也会有分析失误。
(IP (IP (NP (NN ZPar)) (VP (PP (P 通过) (NP (NN 机器))) (VP (VV 学习) (IP (VP (VV 获得) (NP (NN 知识))))))) (PU ；) (CP (ADVP (CS 虽然)) (IP (ADVP (AD 大多)) (NP (NN 情况)) (VP (VA 正确)))) (PU ，) (VP (ADVP (AD 但是)) (ADVP (AD 也)) (VP (VV 会) (VP (VE 有) (IP (NP (NN 分析)) (VP (VV 失误)))))) (PU 。))
^D
Parsing has finished successfully.
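The bracketed parses printed above are easy to post-process. The following is a minimal sketch of reading one parse into a nested structure and listing its words with their POS tags. It assumes the format seen in the example (every leaf has the form "(TAG word)" and words contain no whitespace or parentheses); parse_tree and tagged_words are hypothetical helper names, not part of ZPar.

```python
def parse_tree(text):
    """Parse one bracketed parse into nested (label, children) tuples.

    A leaf such as "(NN 例子)" becomes ("NN", "例子"); any other node
    becomes (label, [child, child, ...]).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":
            word = tokens[pos]
            pos += 2  # skip the word and its closing ')'
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1  # skip the closing ')' of this node
        return (label, children)

    return node()


def tagged_words(tree):
    """Return the (word, tag) pairs of a parsed tree, in sentence order."""
    label, rest = tree
    if isinstance(rest, str):
        return [(rest, label)]
    pairs = []
    for child in rest:
        pairs.extend(tagged_words(child))
    return pairs


# The first parse from the example session above.
example = "(IP (NP (PN 这)) (VP (VC 是) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 例子)))) (PU 。))"
for word, tag in tagged_words(parse_tree(example)):
    print(word, tag)
```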

Usage of ZPar for English

Suppose that the source files are saved in the folder zpar and the models are saved in models.en. To run zpar.en, type zpar/dist/zpar.en models.en, and wait for the models to be loaded. After all models are loaded, type in English sentences, and the parses will be printed out on the screen. Alternatively, type zpar/dist/zpar.en models.en input output to read English sentences from the input and write the corresponding parses to output.

Run zpar.en without command-line arguments to show options. In particular, the -o option controls the type of output. Use -ot to produce POS-tagged sentences, and -oc to produce constituent structures (brackets). The default option is -od, which produces dependency structures.
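When zpar.en is called from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only the options named above. The dictionary keys ("tagged", "constituent", "dependency") and the function names are made up for illustration. Placing the option before the model directory follows the example session in this section; combining an option with input and output files in one call is an assumption, so run zpar.en without arguments to confirm the accepted forms.

```python
import subprocess

# Output options named in the text for zpar.en. The keys are illustrative.
OUTPUT_OPTIONS = {
    "tagged": "-ot",        # POS-tagged sentences
    "constituent": "-oc",   # constituent structures (brackets)
    "dependency": "-od",    # dependency structures (the default)
}


def zpar_en_argv(models_dir, output="dependency", input_path=None,
                 output_path=None, binary="zpar/dist/zpar.en"):
    """Build the argument list for one zpar.en invocation."""
    # The text documents either no files (interactive use) or both files.
    if (input_path is None) != (output_path is None):
        raise ValueError("give both input_path and output_path, or neither")
    argv = [binary, OUTPUT_OPTIONS[output], models_dir]
    if input_path is not None:
        argv += [input_path, output_path]
    return argv


def run_zpar_en(models_dir, input_path, output_path, output="dependency"):
    """Parse input_path into output_path; raises if zpar.en exits with an error."""
    argv = zpar_en_argv(models_dir, output, input_path, output_path)
    return subprocess.run(argv, check=True)
```

For instance, zpar_en_argv("models.en", output="constituent") yields the same command as the example session in this section.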

In the following example, each input sentence typed by the user is followed by the parse that zpar.en prints for it.
bash$ zpar/dist/zpar.en -oc models.en
Parsing started
[tagger] Loading scores ... done.
[parser] Loading scores... done.
ZPar is a parser .
(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))
Given a natural language sentence, ZPar produces its syntactic structure .
(S (VP (VBN Given) (NP (NP (DT a) (JJ natural) (NN language)) (SBAR (S (NP (VBN sentence,) (NNP ZPar)) (VP (VBZ produces) (NP (PRP$ its) (NN syntactic) (NN structure))))))) (. .))
ZPar works by training a model from annotated data , and making analysis using the model .
(S (NP (NNP ZPar)) (VP (VBZ works) (PP (IN by) (S (VP (VP (VBG training) (NP (DT a) (NN model)) (PP (IN from) (NP (VBN annotated) (NNS data)))) (, ,) (CC and) (VP (VBG making) (NP (NP (NN analysis)) (VP (VBG using) (NP (DT the) (NN model))))))))) (. .))
^D
Parsing has finished successfully.

Usage of submodels

ZPar consists of various implementations of a word segmentor, a POS-tagger, a joint segmentation and tagging system, a dependency parser and a constituency parser. To compile a submodel, run make [submodel], where [submodel] can be segmentor, [language].postagger, [language].depparser or [language].conparser, and [language] can be chinese, english or generic. For example, to compile the Chinese dependency parser, type make chinese.depparser. To change the implementation of a particular submodel, modify the corresponding setting in the Makefile. For example, the macro SEGMENTOR_IMPL in the Makefile defines the implementation of the segmentor. The corresponding code can be found at src/chinese/segmentor/SEGMENTOR_IMPL/.
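The naming scheme for submodel targets can be captured in a few lines. The following is a minimal sketch that composes a make target from a submodel and a language, using only the names listed above; make_target is a hypothetical helper, and the sketch does not check that every language and submodel combination is actually defined in the Makefile.

```python
# Names taken from the text above.
LANGUAGES = ("chinese", "english", "generic")
LANGUAGE_SUBMODELS = ("postagger", "depparser", "conparser")


def make_target(submodel, language=None):
    """Return the make target for a submodel, e.g. "chinese.depparser"."""
    if submodel == "segmentor":
        # The segmentor target carries no language prefix.
        return "segmentor"
    if submodel not in LANGUAGE_SUBMODELS:
        raise ValueError("unknown submodel: %r" % (submodel,))
    if language not in LANGUAGES:
        raise ValueError("unknown language: %r" % (language,))
    return "%s.%s" % (language, submodel)


# The example from the text: the Chinese dependency parser.
print("make", make_target("depparser", "chinese"))
```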

Scripts

A pretty-print script, prettyprint.sh, is available for the output of the constituent parser. Usage: prettyprint.sh conparser_output. Thanks to Silas S. Brown for providing the script.
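To illustrate what pretty-printing a constituent parse involves, the following is a minimal sketch that indents one bracketed parse by depth. It is not the prettyprint.sh script, whose implementation and output format are not described here; indent_tree is a hypothetical helper, and it assumes every leaf has the form "(TAG word)" with no whitespace or parentheses inside words.

```python
def indent_tree(text, step=2):
    """Return a bracketed parse with one node per line, indented by depth."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    lines = []
    depth = 0
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            label = tokens[i + 1]
            pad = " " * (depth * step)
            if tokens[i + 2] != "(":
                # Leaf "(TAG word)": keep it on a single line.
                lines.append("%s(%s %s)" % (pad, label, tokens[i + 2]))
                i += 4
            else:
                # Internal node: open it and indent its children.
                lines.append("%s(%s" % (pad, label))
                depth += 1
                i += 2
        else:
            # A closing ')' ends the current internal node.
            depth -= 1
            lines[-1] += ")"
            i += 1
    return "\n".join(lines)


print(indent_tree("(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))"))
```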

Reference

Yue Zhang and Stephen Clark. 2011. Syntactic Processing Using the Generalized Perceptron and Beam Search. In Computational Linguistics, 37(1), March.

License

The software source is released under the GPL (v.3); a separate commercial license, issued by Oxford University, is available for non-open-source use. The various models available for download were trained on different text resources, which may require further licenses.