
2  Quick User Guide to mcfgcky

Mcfgcky is built to work from the command line, and its various functions are triggered by distinct ‘modes’. Each mode has its own syntax, and command-line parsing is currently rigid about that syntax.

2.1  Debugging Mode

Usage: mcfgcky -d grammar-file sentence

Given an mcfg file at location ’grammar-file’ and a sentence set in double quotes, attempt to recognize the sentence.

EXAMPLES:

./mcfgcky -d grammars/parsingtest/bever.mcfg "the horse raced past the barn *"

Recognize the Kleene closure of the prefix ’the horse raced past the barn’ according to bever.mcfg

./mcfgcky -d grammars/parsingtest/bever.mcfg "the horse raced past the barn fell"

Recognize the sentence ’the horse raced past the barn fell’ according to bever.mcfg
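When the recognizer is driven from a script, the sentence has to reach mcfgcky as one argument, which is what the double quotes do in the shell examples above. The following is a minimal sketch of building that argument list in Python. The helper names `debug_argv` and `as_shell_line` are hypothetical, and the only flag used is `-d` as documented here.

```python
import shlex


def debug_argv(grammar_file, words, prefix=False, binary="./mcfgcky"):
    """Build the argument list for debugging mode (hypothetical helper).

    The sentence is passed as a single argument. With prefix=True a
    trailing ' *' is appended, as in the prefix example above.
    """
    sentence = " ".join(words)
    if prefix:
        sentence += " *"
    return [binary, "-d", grammar_file, sentence]


def as_shell_line(argv):
    """Render the argument list as a shell command with safe quoting."""
    return shlex.join(argv)
```

The list returned by `debug_argv` can be handed to `subprocess.run` directly, which avoids shell quoting altogether.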

2.2  Parsing Mode

Usage: mcfgcky -p grammar-file dict-file <-s sentence / -f in-file> <<-l/-pp> <-mcfg/-mg/-mgd> / -dcfg> outfile

Given an mcfg file at location ’grammar-file’ and either a sentence set in double quotes or a corpus of sentences, attempt to parse. Return a sample of derivation trees in mcfg or mg output. MG output requires a dictionary file from the Guillaumin compiler; for mcfg output, this file is never checked.

Options: -s/-f: if -s, parse the sentence given in double quotes. If -f, parse the unweighted corpus of sentences at location ’in-file’. Each sentence in this corpus is double-quoted and terminated by a semicolon and a newline. The corpus format is sensitive: leave at least one line of whitespace at the end, and prepare the corpus in a plain-text editor. Some Mac (TextEdit) and Windows text editors leave carriage returns, which confuse filereader.ml, the module that parses the corpus into a useful format.

Example:

"the horse raced past the barn";
"the dish ran away with the spoon";
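Since the corpus format is sensitive to line endings and to the trailing blank line, it may be safer to generate the file from a script than to edit it by hand. The sketch below writes an unweighted corpus in the format described above. The function names are hypothetical, and the exact amount of trailing whitespace that filereader.ml needs is an assumption (one empty line is written here).

```python
def format_corpus(sentences):
    """Return corpus text: one double-quoted sentence per line, each ended by ';'."""
    lines = []
    for sentence in sentences:
        # split() drops stray whitespace, including carriage returns.
        words = sentence.split()
        if not words:
            continue  # skip empty entries
        lines.append('"' + " ".join(words) + '";')
    # Bare LF line endings, followed by one empty line at the end.
    return "\n".join(lines) + "\n\n"


def write_corpus(path, sentences):
    # newline="\n" keeps Python from translating LF to CRLF on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(format_corpus(sentences))
```

A file produced this way can then be passed to parsing mode with -f.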

-l/-pp: if -l, LaTeX output for qtree. If -pp, relatively horrible plain-text output.

-dcfg: output a product-sum graph of the situated context-free grammar, for dot. Requires an outfile name.

-mcfg/-mg: if -mcfg, mcfg derivation tree output. If -mg, mg derivation tree output. -mgd is intended to produce an X-bar style MG-derived tree, but it is not yet implemented.

EXAMPLES:

./mcfgcky -p grammars/unergunacc/unergunacc4.mcfg grammars/unergunacc4.dict -s "the horse raced past the barn *" -l -mg unergunacc.out

Parse the Kleene closure of the prefix ’the horse raced past the barn’ according to unergunacc4.mcfg, returning a sample of mg derivation trees in qtree latex format.

./mcfgcky -p grammars/unergunacc/unergunacc4.mcfg grammars/bever.dict -f unerg.test.txt -l -mcfg unergunacc.out

Parse the corpus located at unerg.test.txt and return mcfg trees in latex. A dummy dictionary file is still required, but it will not be read.

./mcfgcky -p grammars/unergunacc/unergunacc4.mcfg grammars/unergunacc4.dict -s "the horse raced past *" -dcfg unergunacc.out

Parse the Kleene closure of the prefix ’the horse raced past’ according to unergunacc4.mcfg, and return the and-or graph of the grammar conditioned on this prefix.

2.3  Statistical Mode

Usage: mcfgcky -i/-ii trainingcorpus/pcfg test-file -l/-q out-file

Generate or read in a probabilistic model for cfg or mcfg, and compute prefix probabilities, surprisals, and entropies for each prefix.
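To make the relation between these quantities concrete, here is a minimal sketch of the standard definition of surprisal in terms of prefix probabilities: the surprisal of word i is the log ratio of the probability of the prefix ending at word i-1 to that of the prefix ending at word i. This is the textbook formula, not a transcription of mcfgcky's code. The function name and the choice of base-2 logarithms (bits) are assumptions, and entropy is not covered.

```python
import math


def surprisals(prefix_probs):
    """Per-word surprisal in bits from a list of prefix probabilities.

    prefix_probs[i] is the probability of the prefix ending at word i+1.
    The empty prefix is taken to have probability 1.
    """
    result = []
    previous = 1.0
    for prob in prefix_probs:
        if prob <= 0.0:
            raise ValueError("prefix probability must be positive")
        result.append(math.log2(previous / prob))
        previous = prob
    return result
```

For example, prefix probabilities of 0.5, 0.25 and 0.25 give a surprisal of one bit for each of the first two words and zero bits for the third, since the third word does not reduce the prefix probability further.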

Options: -i/-ii: if -i, induce a P(M)CFG from a weighted corpus by building a mini-treebank and using Weighted Relative Frequency Estimation.

In addition to the above cautions regarding corpora, this corpus should be formatted as follows:

(0.379, "the horse raced past the barn");
(284e-6,"the dish ran away with the spoon");

Corpreader.ml should have no problem with floats in standard or scientific notation to any reasonable number of places.
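The idea behind weighted relative frequency estimation can be sketched as follows: every rule used in a derivation contributes the weight of its sentence, and a rule's probability is its accumulated weight divided by the accumulated weight of all rules with the same left-hand side. The sketch assumes one derivation per weighted sentence and a hypothetical in-memory representation of rules as (lhs, rhs) pairs. How Train.ml actually builds and counts its mini-treebank is not documented here.

```python
from collections import defaultdict


def weighted_rfe(weighted_derivations):
    """Estimate rule probabilities from weighted derivations.

    weighted_derivations: iterable of (weight, rules), where rules is the
    list of (lhs, rhs) pairs used in one derivation and rhs is a tuple.
    Returns a dict mapping (lhs, rhs) to its estimated probability.
    """
    rule_mass = defaultdict(float)
    lhs_mass = defaultdict(float)
    for weight, rules in weighted_derivations:
        for lhs, rhs in rules:
            rule_mass[(lhs, rhs)] += weight
            lhs_mass[lhs] += weight
    # Normalize each rule by the total mass of its left-hand side.
    return {rule: mass / lhs_mass[rule[0]] for rule, mass in rule_mass.items()}
```

By construction, the probabilities of all rules sharing a left-hand side sum to one.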

If -ii, read in a pcfg. (Train.ml, Pcfgreader.ml)

This PCFG should be formatted as follows.

“0.00064636 SUBJP → PRON NBAR”

“0.39936363 SUBJP → PRON”

Leave a line of whitespace at the end. At present, this mode is redundant in that it must read in the pcfg and the derivative mcfg separately. Great confusion can result if the two do not match up; the typical method of late has been to develop the pcfg first and then derive the trivial cfg from it. Also, not much is known about how this mode works on non-context-free grammars. There should be no problem, but it has not really been tested.
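Because mismatches between the pcfg and the derived grammar are easy to introduce, a small check before running this mode may help. The sketch below parses rules of the shape shown above, reports left-hand sides whose probabilities do not sum to one, and strips the probabilities to obtain the bare rules. It assumes that the arrow in the file is the literal character → and that any surrounding quotes are optional; both are guesses from the example. All function names are hypothetical, and the output is not in mcfgcky's grammar file format.

```python
from collections import defaultdict


def parse_pcfg(text):
    """Parse lines of the form '<prob> <LHS> → <RHS symbols>' into tuples."""
    rules = []
    for raw in text.splitlines():
        # Drop surrounding whitespace and optional straight or curly quotes.
        line = raw.strip().strip('"\u201c\u201d').strip()
        if not line:
            continue
        left, arrow, right = line.partition("→")  # assumed arrow token
        if not arrow:
            raise ValueError(f"no arrow in rule: {raw!r}")
        prob_text, lhs = left.split()
        rules.append((float(prob_text), lhs, tuple(right.split())))
    return rules


def unnormalized_lhs(rules, tol=1e-6):
    """Return {lhs: total} for left-hand sides whose rules do not sum to 1."""
    totals = defaultdict(float)
    for prob, lhs, _ in rules:
        totals[lhs] += prob
    return {lhs: total for lhs, total in totals.items() if abs(total - 1.0) > tol}


def strip_probabilities(rules):
    """Return the bare (lhs, rhs) rules, i.e. the trivial cfg."""
    return [(lhs, rhs) for _, lhs, rhs in rules]
```

Comparing the output of `strip_probabilities` against the rules of the grammar file actually passed to mcfgcky is one way to catch the mismatch described above.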

TESTING

