Assignment 5, Ling 645/CMSC 723, Fall 1997
Assignment 5, Ling 645/CMSC 723, Fall 1997
----------------------------------------------------------------
Assignment 5: Context-free grammars augmented with features
----------------------------------------------------------------
OVERVIEW
In this assignment you'll run the bottom-up chart parser that Allen
describes in his book. (Yes, the code comes from Allen himself.) The main
goals of this assignment are for you to have a chance to explore the
parser, to walk through its operation, and ultimately to get a feeling for
what it's like to do a bit of writing of grammar rules.
I very HIGHLY recommend that you READ THROUGH this entire assignment before
you do anything else.
A. GETTING THE PARSER
1. Create a directory called hw5 and go into it.
mkdir hw5
cd hw5
2. Get the software via anonymous ftp to umiacs.umd.edu;
the file you want is pub/resnik/ling645/hw5/CourseParser1.2.tar.gz
(see previous assignment for how to get files by anonymous ftp).
Then unzip and untar the file:
gunzip CourseParser1.2.tar.gz
tar xvf CourseParser1.2.tar
This will create a subdirectory called NLPcode. Go into it:
cd ParserCode
3. Get the full path name for the directory you are in using 'pwd'
(short for "print working directory"):
pwd
For example, the output of 'pwd' for where I am right now is:
/home/research2/resnik/ling645/lisp/hw5/ParserCode
4. Edit file loadFunction and move down to the line
where it says
(let ((*PATH* "/u/james/Code/CourseParser/ParserCode")
Change what's between the double quotes to be the directory you
got when you said 'pwd'. For example, in my case, the
it becomes:
(let ((*PATH* "/home/research2/resnik/ling645/lisp/hw5/ParserCode")
Then save the file.
B. RUNNING THE PARSER
5. In your ParserCode directory, go into LISP and load the
loadFunction.lisp file:
(load "loadFunction")
Then load up the parser:
(load "LoadParser")
Don't worry about the various warnings.
6. Load the lexicon and grammars from Allen, Chapter 4, by
executing the following:
(loadChapter4)
This defines the grammars and lexicons in Chapter 4.
7. Now try parsing a simple sentence:
(bu-parse '(a fish saw the man))
What you're seeing is a trace of the operation of the parser
Allen describes in his chapter, including the way it enters
constituents into the chart, and then, at the end, all the
constituents it found.
To see the "best" parse tree (or parse trees), type
(show-answers)
You'll get the following information, which we'll look at in some
detail now.
S57:<S ((INV -) (AGR |3S|) (1 NP51) (2 VP54))>
from 0 to 5 from rule -1>
NP51:<NP ((AGR |3S|) (1 ART44) (2 N45))>
from 0 to 2 from rule -2>
ART44:<ART ((LEX A) (AGR |3S|) (ROOT A1))>
from 0 to 1 from rule NIL
N45:<N ((LEX FISH) (ROOT FISH1) (AGR |3S|) (IRREG-PL +))>
from 1 to 2 from rule NIL
VP54:<VP ((VFORM PAST) (AGR |3S|) (1 V46) (2 NP53))>
from 2 to 5 from rule -5>
V46:<V ((LEX SAW) (ROOT SEE1) (VFORM PAST) (SUBCAT _NP) (AGR ?A4))>
from 2 to 3 from rule NIL
NP53:<NP ((AGR |3S|) (1 ART49) (2 N50))>
from 3 to 5 from rule -2>
ART49:<ART ((LEX THE) (ROOT THE1) (AGR |3S|))>
from 3 to 4 from rule NIL
N50:<N ((LEX MAN) (ROOT MAN1) (AGR |3S|))>
from 4 to 5 from rule NIL
This is telling you that the top level constituent is an S
with the following list of features:
((INV -) (AGR |3S|) (1 NP51) (2 VP54))
Recall that the "INV" feature concerns whether this is an inverted
sentence (e.g. a wh-question), which it's not, hence the "minus"
value. The AGR (agreement) feature here is set to 3rd person
singular (3s). And the two sub-constituents used to build
this S are a noun phrase, labeled NP51, and a verb phrase,
labeled VP54. Finally, since a chart item (corresponding to
a "state" in the Earley parser) identifies not only the rule
itself, but also the span, in this case from position 0 to position 5.
What about, say, constituent VP54? You can see that it's a
verb phrase spanning positions 2 to 5 ("saw the fish"). You
can also see that its agreement feature is 3s and its VFORM
feature (form of the verb), is PAST. How did the VP get those
feature values? When rule
VP(...) -> V(...) NP(...)
was invoked (Allen, p. 96, rule 5), the features inside
the VP were determined by matching the right hand side of
the rule against the V and NP constituents. The AGR and VFORM
features came from the V, which itself got those features from
the lexicon entry for the verb.
What does that lexicon entry look like? Well, you can either
go into subdirectory Grammars and look at file Chapter4.lisp,
where you'll see that one entry for "saw"is:
(saw (v (root SEE1) (VFORM past) (subcat _np) (agr ?a)))
Or, you could just parse a one-word "sentence" containing
the word you're interested in:
(bu-parse '(saw))
and look at the chart entries that come out, one of which will be:
V58:<V ((LEX SAW) (ROOT SEE1) (VFORM PAST) (SUBCAT _NP) (AGR ?A4))>
from 0 to 1 from rule NIL
C. PARSING WITH MORPHOLOGICALLY COMPLEX WORDS
8. The parser isn't doing much morphology for you, so you must
give it morphologically pre-analyzed input sometimes. That is,
sometimes you'll have to separate out the roots from the suffixes.
For instance, if you call
(bu-parse '(The man cries))
the parser will complain about an unknown word "cries"
The right call is:
(bu-parse '(The man cry +s))
Try it, and you'll see that one of the constituents built
is the V
V79:<V ((AGR |3S|) (VFORM PRES) (ROOT CRY1) (SUBCAT _NONE)
(1 V73) (2 |+S74|))>
from 2 to 4 from rule -LEX1>
That is, a V was built from V and +S using lexical rule 1
(Allen, p. 92) -- the feature-based parser is doing the
combining of the base form "cry" with morpheme "+s" to
get a V constituent corresponding to the following parse tree:
V
((AGR |3S|)
(VFORM PRES)
(ROOT CRY1)
(SUBCAT _NONE))
/ \
/ \
V +S
((LEX CRY)
(ROOT CRY1)
(VFORM BARE)
(SUBCAT _NONE))
If you look at the lexicon on Allen p. 93, you'll recall
that past tenses are built up this way in his system for verbs
having regular past tense morphology (e.g. "cry"), since you
could expect a morphological analyzer to break them up properly
(e.g. breaking "cries" into "cry +s"), but irregular past tenses
are handled by separate entries in the lexicon (e.g. the entry
for "saw" whose VFORM is PAST).
D. ADDING TO THE GRAMMAR AND LEXICON
9. To get a list of all the defined words, type
(defined-words)
That'll give you the following set, which I've re-ordered and
modified a tiny bit to make it clearer:
(
THE A ; determiner
HE ; pronoun
MAN DOG MEN SAW SEED FISH ; noun
CRY WANT SEE SAW ; verb
WERE BE WAS IS ; form of "to be"
+ING +S +ED +EN ; inflectional suffix for verbs
TO ; infinitival "to"
HAPPY ; adjective
JACK) ; name
Now look at the lexicon entry for "Jack":
(bu-parse '(jack))
or equivalently, see the lexicon entry on Allen, p. 93.
(Jack (name (agr 3s) (root JACK1)))
Notice that the word "Jack" is defined to be a member
of the category NAME -- it's not just any old noun.
Otherwise "the Jack" would be licensed by rule 2 of
the grammar as a noun phrase!
10. In the Grammars subdirectory, look at file Chapter4.lisp, and
in particular, look at the NP rules. (Equivalently, look
at the NP rules on Allen, p. 96.) Do any of the rules
say anything about the NAME constituent? Hmmm. So what
will happen when you type the following?
(bu-parse '(Jack saw the man))
That's right: if you take a look at the constituents that
were created, there's no NP for "Jack", because no grammar
rule says that NP's can be built from names. You get the
VP spanning positions 1 to 4 ("saw the man"), but there's
no NP from 0 to 1, so you can't get an S from 0 to 5.
Let's fix that, and in the process learn how to make changes
to both the grammar and the lexicon!
11. In the ParserCode directory, create a subdirectory called "new":
and go into that directory
mkdir new
cd new
Create a new file called "new1.lisp", and put the following
into it:
(setq *grammar-new1*
'((headfeatures
(np agr)
(vp vform agr))
((np)
-12>
(head (name)))))
This is a little mini-grammar containing just a single rule.
The context-free part of the rule is
NP -> NAME
with NAME declared as the head of the constituent,
and because AGR is a headfeature for NP's, that's
equivalent to
(NP (AGR ?a)) -> (NAME (AGR ?a))
Also add the following to new1.lisp:
(setq *lexicon-new1*
'((many (art (agr 3s) (root MANY1)))
(Edgar (name (agr 3s) (root EDGAR1)))))
This creates a little mini-lexicon with entries for
the words "many" and "Edgar". (Yes, I know the lexicon
entry for "many" is incorrect!)
Finally, to the bottom of the file, add the following
two lines:
(augment-grammar *grammar-new1*)
(augment-lexicon *lexicon-new1*)
The first is a command that says that the new grammar
should be added to the existing grammar. The second
adds the new lexicon entries to the existing lexicon.
E. LOADING UP YOUR ADDITIONS OR CHANGES
12. Now, get back to the ParserCode directory, get into LISP,
and get the Chapter 4 grammar and lexicon all loaded up --
if you're starting fresh, that's done by doing
Steps 5 and 6, above. (If you were working on a system
with multiple windows and kept LISP alive while editing
new1.lisp, obviously this part is unnecessary.)
If you feel like it, execute
(defined-words)
so you can see the current list of words in the lexicon.
13. Now load the additional grammar and lexicon entries you've
created by typing:
(loadf "new" "new1")
This tells the parser to look in subdirectory "new"
and load the file called "new1". Execute (defined-words)
again, and this time you should see "Edgar" and "many" on
the list!
14. To see if our lexical entry for "many" is correct, try
(bu-parse '(many men))
Notice that we don't get an NP spanning 0 to 2. Why is that?
Yup, it's because in our new lexical entry, the AGR feature for
"many" is incorrect -- it should have been 3p (third person PLURAL!)
not 3s. Let's fix that now.
15. Go back to file new1.lisp, and change the 3s in the entry for
"many" to 3p.
16. Now get back to LISP, grammar and lexicon all loaded up, etc.,
just like before -- i.e. the same state you were in at the beginning
of Step 12.
Note that if you never exited LISP, you can't just do
(loadf "new" "new1.lisp")
to get the changed lexicon entries -- you'll get a warning
saying that you tried to define rule "-12>" twice. To
get back to the state you were in at the beginning of Step 12,
you can execute
(loadChapter4)
again, and that will re-load the original grammar and lexicon,
wiping out the changes you made using new1.lisp.
17. Repeat Steps 13 and 14 -- this time the lexicon you loaded should
contain the CORRECT entry for "many" (since you've just fixed it)
and the parse should give you an NP spanning 0 to 2. Congratulations,
you've just succeeded in writing your first lexicon entry!
(A typical collegiate dictionary of English has on the order of
100,000 entries, so that means you only have 99,999 to go... :-)
18. Now let's verify that the new grammar rule also works the way
we want it to:
(bu-parse '(Jack saw the man))
As usual, you can type
(show-answers)
to see the "best" parse tree. Voila! Notice that the subject
of the sentence is an NP with a NAME as its head. Congratulations,
you've just succeeded in writing your first grammar rule!
(Any guesses as to how many more of *those* you have to write?...)
F. THE ASSIGNMENT
----------------------------------------------------------------
You should do:
Problem 5.1
Problem 5.2
EITHER problem 5.3 OR problem 5.4
(if you do both, I'll take the higher score of the two)
----------------------------------------------------------------
Problem 5.1 [30 points]
In new1.lisp, add lexical entries for the following words:
restaurant
admire
most
The easiest way to do this will probably be to look at similar
lexical entries in Chapter4.lisp, and to copy and modify them.
TURN IN: the listing of new1.lisp (or just the new entries),
along with a dribble of
(bu-parse '(edgar admire +s the restaurant))
(bu-parse '(edgar admire +s most restaurant +s))
If you want, you can wait until after you've done Problem 5.2
and turn in the final listing of new1.lisp after you've solved
both problems.
Problem 5.2 [40 points]
In new1.lisp, add lexicon and grammar rules so that the
parse
(bu-parse '(edgar saw the fish in the restaurant))
leads to a complete parse (i.e. a completed S spanning the
entire sentence) with "the fish" as the object of the verb
and "in the restaurant" as an ADJUNCT to the VP. That is,
the overall structure should be consistent with:
(S
(NP Edgar)
(VP (VP saw
(NP the fish))
(PP (PREP in)
(NP the restaurant))))
You do NOT need to be able to parse "the fish in the restaurant"
as an NP (although you can add the rule to do so if you'd like).
Here are three things that might help you as you do this. First,
remember that since it's a bottom-up parser, you can parse
constituents as you debug, rather than the whole sentence, e.g.
using
(bu-parse '(the restaurant))
to make sure that it parses properly as an NP, using
(bu-parse '(in the restaurant))
to make sure a PP gets formed, etc. That way you don't
have to focus on the whole sentence at once. When debugging
a program, it's always best to work in little pieces, checking
how things work each step of the way, rather than trying to do
it all at once.
Second, if you execute
(verboseon)
that will enable verbose tracing, showing you when dotted rules
(incomplete constituents) are added to the chart. E.g.
with verbose tracing on, when you do
(bu-parse '(a men))
you will see
Adding active arc:
<NP ((AGR |3S|) (1 ART603))>
-2>
(<ART ((AGR |3S|))>)
*
(<N ((AGR |3S|))>) from 0 to 1
which tells you that the rule
NP -> ART . N
has been introduced into the chart spanning positions 0 to 1,
with the agreement feature set to 3s. This makes it clear that
the features in the ART constituent succeeded in matching the
first part of the NP -> ART N rule, and now the rule is looking
to be completed by an N with the AGR feature also set to 3s.
In this case, that NP will never be completed, because the N
for "men" has AGR feature 3p.
You can turn off the verbose tracing by executing
(verboseoff)
Third, if you use emacs, you can use LISP from within emacs,
which will let you easily keep a record of what you're doing,
including the ability to scroll around. In emacs, type
M-x shell
where M is the meta key, or, equivalently
ESC x shell
This will put you in a shell within emacs. You can then type
lisp
as usual, and get LISP running. Except now you can do all
your LISP work within emacs, including scrolling, cutting and
pasting, and switching buffers (e.g. to edit files!). To kill
the buffer where the shell is, you can go into that buffer and
type
ctl-x k
TURN IN: the listing of new1.lisp, together with a dribble of:
(verboseoff)
(bu-parse '(edgar saw the fish in the restaurant))
Problem 5.3 [20 points]
[Remember, if you do this problem, you DON'T have to do 5.4!]
a. Why would the VP rule for subcategorized PP's (Allen, p.89) be
the wrong rule to use for the sentence in the previous problem?
b. Consider the following data:
*Sincerity admires John.
Edgar admires John.
John admires sincerity.
*Sincerity admires John.
Mary fed the cat.
*The cat fed Mary.
Describe in words what one might try WITHIN the grammar and
lexicon formalism given so far, in order to account for these
data. What, if any, problems or difficulties would be posed
by working within the current formalism?
(Assume Mary is human, not a cat! Ditto for all others
with human names.)
Problem 5.4 [20 points]
[Remember, if you do this problem, you DON'T have to do 5.3!]
a. Describe, either in plain English or by writing lexicon/grammar
entries by hand, changes you would need to make to
the lexicon and/or grammar in order to account for the
following data:
The beer was expensive.
A beer was expensive.
Beer was expensive.
*Restaurant was expensive.
*Beer were expensive.
Beers were expensive.
Beer in the restaurant was expensive.
*Beer in the restaurant were expensive.
Beers in the restaurant were expensive.
b. Based on the data in 5.4a, explain in words the ambiguity of
Edgar bought the beer.
G. OPTIONAL PART OF THE ASSIGNMENT
I'd really encourage you to try playing with the system -- adding
other lexical items you think might be interesting to play with, and
trying adding new grammar rules.
To that end, I'll offer 10 points extra credit on this assignment for
implementing an answer to 5.4a, illustrated by showing the new grammar
and lexicon. Your lexicon/grammar changes must lead to the right
results for
(bu-parse '(the beer was expensive))
(bu-parse '(a beer was expensive))
and the other 7 examples above, but you need NOT include the dribble
files to prove it. (Don't waste the paper!)
Return to the course home page.