Midterm 1 Solutions, Ling 645/CMSC 723, Fall 1997



1a.  Examples

  the cat chased bit died
  the cat that the dog that the mouse died

1b.  Recognition vs. parsing

A recognition algorithm answers a yes-or-no question:  whether or not
the string is in the language.

A parsing algorithm answers the same yes-or-no question, and if the
answer is yes, it also yields the structure of the input string.
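
To make the distinction concrete, here is a minimal sketch in Python,
using an assumed toy pattern (not the exam language): the recognizer
returns only yes or no, while the parser returns a structure on
success.

```python
# Toy language (assumed for illustration): "the" NOUN VERB.

def recognize(tokens):
    """Recognizer: True iff tokens form 'the' NOUN VERB."""
    return (len(tokens) == 3 and tokens[0] == "the"
            and tokens[1] in {"cat", "dog"}
            and tokens[2] in {"died", "slept"})

def parse(tokens):
    """Parser: same yes/no answer, but on success also returns structure."""
    if not recognize(tokens):
        return None
    return ("S", ("NP", tokens[0], tokens[1]), ("VP", tokens[2]))
```

So parse(["the", "cat", "died"]) yields the tree
("S", ("NP", "the", "cat"), ("VP", "died")), while the recognizer
would only report True.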


1c.  FSA analysis for center embedded sentences

No finite-state machine can provide a useful syntactic analysis for
the examples cited.  Any useful syntactic analysis will relate the
nouns to their corresponding verbs.  However, the relationship between
nouns and verbs in this language involves center-embedded/nested
dependencies, and no FSA is capable of keeping track of center
embeddings of arbitrary depth (which would be required since the
grammatical subset of L is infinite).
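
One way to see the counting requirement: stripped of the words
themselves, the grammatical subset looks like a^n b^n (n nouns, then
the n verbs that match them, innermost first).  Here is a sketch of a
recognizer for that skeleton; note that it needs an unbounded counter,
which is exactly what a finite-state machine lacks.

```python
# Sketch: recognize the skeleton noun^n verb^n (n >= 1) of the
# grammatical subset.  The counter "depth" is a degenerate stack;
# no fixed, finite set of states can do this job for arbitrary n.

def balanced(tokens):
    """True iff tokens is n 'noun's followed by n 'verb's, n >= 1."""
    depth = 0
    i = 0
    while i < len(tokens) and tokens[i] == "noun":
        depth += 1                      # one pending dependency per noun
        i += 1
    if depth == 0:
        return False
    while i < len(tokens) and tokens[i] == "verb":
        depth -= 1                      # each verb discharges one noun
        i += 1
    return i == len(tokens) and depth == 0
```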

1d.  Generative capacity arguments and competence/performance

The argument that English is not a regular language, using sentences
like those in part (a), hinges on the fact that center embedding in the
language can be *arbitrarily* deep.  If the number of nested
dependencies is finite, no matter how large, the argument does not
hold.  Thus to make the argument, it is necessary to claim that the
English language permits arbitrarily deep center embeddings.  Since
nobody actually utters or understands center embedded sentences beyond
a fairly small finite depth (say, 4 or 5 levels of nesting), the only
way to say that deeper embeddings are a part of English is to claim
that what we care about is an idealized version of the language,
independent of what people actually say or understand in practice.
And this is precisely the distinction between (idealized) competence
and (real-world) performance.

(Admittedly, even if English were regular, it might be very clumsy to
handle it using finite-state methods -- context-free grammars,
features, etc. would lead to a more elegant or perspicuous
characterization of English even if they were not strictly required in
terms of generative capacity.)

A variant on the argument goes as follows. Any *finite* language is a
regular language.  If you don't distinguish performance and
competence, then English as a language certainly couldn't contain any
sentence longer than the number of words a human being could utter in
a lifetime.  (This assumes human lifetimes are finite, but that seems
uncontroversial.) This may be a HUGE number, but it is definitely
finite, and so without the distinction English is formally a finite
language and therefore regular.
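
To illustrate with a toy stand-in for that HUGE set: any finite set of
strings can be written as a regular expression, namely the union of
its members.

```python
import re

# Toy "finite English" (assumed example).  The real set would be
# astronomically larger, but the construction is the same: a finite
# union of literal strings is a regular expression.
LANG = {"the cat died", "the dog died"}
pattern = re.compile("|".join(re.escape(s) for s in sorted(LANG)))

pattern.fullmatch("the cat died")    # matches: in the language
pattern.fullmatch("the mouse died")  # no match: not in the language
```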


2a.  Relative clauses in a CFG framework

The key here was to distinguish transitive from intransitive verbs;
having done that, the solution "passes the gap" indicating a filler
seeking a gap into the relative clause and down into a verb phrase
containing a transitive verb with no object.

  S   ->  NP VPi
  NP  ->  NP that St
  St  ->  NP VPt
  VPi ->  died
  VPt ->  chased | bit
  NP  ->  the dog | the cat | the mouse
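
As a sanity check, here is a sketch of a recognizer for this grammar
(not required on the exam).  The left-recursive rule NP -> NP that St
is rewritten as iteration so that plain recursive descent terminates.

```python
# Hand-coded recognizer for the grammar above.
NOUNS = {"dog", "cat", "mouse"}
VT = {"chased", "bit"}   # transitive verbs (VPt)
VI = {"died"}            # intransitive verbs (VPi)

def recognize(tokens):
    """S -> NP VPi, where the NP must span everything before the VPi."""
    pos = _np(tokens, 0)
    return (pos is not None and pos + 1 == len(tokens)
            and tokens[pos] in VI)

def _np(tokens, i):
    """Match an NP starting at i; return the position after it, or None."""
    if i + 1 >= len(tokens) or tokens[i] != "the" or tokens[i + 1] not in NOUNS:
        return None
    i += 2
    while i < len(tokens) and tokens[i] == "that":   # NP -> NP that St
        j = _np(tokens, i + 1)                       # St -> NP VPt
        if j is None or j >= len(tokens) or tokens[j] not in VT:
            return None
        i = j + 1
    return i
```

For example, recognize("the mouse that the cat bit died".split())
succeeds, while the part (a) strings fail.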

In case you're interested, this treatment is roughly what you'd find
in a GPSG framework.

  S      ->  NP VP
  S/NP   ->  NP VP/NP        (symbol VP/NP indicates "a VP missing an NP")

  NP     ->  NP that S/NP    (symbol S/NP indicates "an S missing an NP")

  VP     ->  Vtrans NP
  VP     ->  Vintrans
  VP/NP  ->  Vtrans NP/NP    (symbol NP/NP indicates "an NP missing an NP")

  NP/NP  ->  epsilon         (or "trace")

This is handled much more *neatly* using features, but as the grammar
above demonstrates, it can be done in a strictly context-free way.

2b.  Parse of "the mouse that the cat bit died"

                          S
                __________|_______
               NP               VPi
             /  |  \             |
          NP   that  St         died
         /\          / \
      the mouse     NP  VPt
                    /\   |
                 the cat bit   

3a.  Parse of "mother 's brother 's car"

Most people got this right, so I'm omitting the solution here.


3b.  How parsing algorithms handle nondeterminism

"Nondeterminism" describes a situation in which an algorithm is faced
with multiple valid alternatives at some step.  Notice that even if a
grammar is unambiguous, a parser might still face nondeterministic
choices.

The stack-based parsers (top-down, shift-reduce, and left-corner), as
we covered them in class, all handle nondeterminism via backtracking.
That is, they consider one alternative at a time, keeping other
alternatives on a possibilities list and going back to them when done
exploring the current alternative.
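
Here is a sketch of that backtracking regime, using an assumed toy
grammar (not the exam's): untried alternatives sit on a possibilities
list, and the recognizer returns to them when the current path dies
out.

```python
# Top-down recognizer with backtracking via a possibilities list.
# Grammar is an assumed toy example; terminals are plain words.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "cat"], ["the", "dog"]],
    "VP": [["died"], ["chased", "NP"]],
}

def recognize(tokens):
    agenda = [(["S"], 0)]               # possibilities list: (stack, position)
    while agenda:
        stack, i = agenda.pop()         # explore one alternative at a time
        if not stack:
            if i == len(tokens):        # stack empty, input consumed: success
                return True
            continue                    # dead end; backtrack
        top, rest = stack[0], stack[1:]
        if top in GRAMMAR:              # nondeterministic choice point:
            for rhs in GRAMMAR[top]:    # record every expansion to try
                agenda.append((list(rhs) + rest, i))
        elif i < len(tokens) and tokens[i] == top:
            agenda.append((rest, i + 1))  # terminal matched; scan forward
    return False                        # possibilities exhausted
```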

The table-based parsers (Earley's and CKY) handle nondeterminism using
a chart or table.  They work exhaustively, constructing all
possibilities simultaneously, using the chart/table in order to avoid
re-doing work that has already been done.
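
A minimal CKY sketch (with an assumed CNF grammar, not the exam's):
each chart cell records every nonterminal deriving that span of the
input, so a shared sub-analysis is computed once and reused.

```python
from collections import defaultdict

# Assumed toy grammar in Chomsky normal form.
LEX = {"the": {"Det"}, "cat": {"N"}, "dog": {"N"}, "died": {"VP"}}
BINARY = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}

def cky_recognize(tokens):
    n = len(tokens)
    chart = defaultdict(set)            # chart[i, j]: nonterminals over tokens[i:j]
    for i, w in enumerate(tokens):
        chart[i, i + 1] = set(LEX.get(w, ()))
    for span in range(2, n + 1):        # fill shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):   # every split point
                for b in chart[i, k]:
                    for c in chart[k, j]:
                        chart[i, j] |= BINARY.get((b, c), set())
    return "S" in chart[0, n]
```

Because each cell is filled exactly once, all analyses of a span are
shared by every larger constituent that uses it, which is how the
table avoids re-doing work.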


