Boston University Noun Phrase Corpus
Welcome to the Boston University NP Corpus. This is a thoroughly coded corpus of noun phrases that is freely accessible to the public. For more information, see below.
Click here to search the corpus
The Corpus

The Boston University NP Corpus is specifically a corpus of 'possessive' noun phrases. The tokens are therefore all pairs of nominals (including pronouns), combined either via premodification or via postmodification:
Obviously, these are not all true possessives; rather, they constitute a superset that includes all possessives. The corpus does not count tokens of noun-noun modification (compounding), such as "the garbage man", as 'possessive' noun phrases, although such compounds often occur within a possessive. The coding attempts to remain theory-neutral with regard to syntax.

The corpus contains 10,008 tokens of such 'possessive' noun phrases, and therefore 20,016 individual nominal tokens. All of the tokens are taken from sections of the Brown Corpus, specifically from the following genres:
Both NPs in each of the 10,008 tokens have been annotated for many features, including the following:
The tokens are available with the full co-text of each example, up to the limits of the samples that comprise the Brown Corpus. All text has been part-of-speech tagged using Fred Karlsson's English Constraint Grammar system.
The Search Engine

The corpus is accessed by means of a complex search engine which, when fully developed, will allow the following operations:
Who We Are

This corpus and website are two of the products of the NSF-funded project Optimal Typology of Determiner Phrases (BCS-0080377), with the following members:
This site is created and maintained by Gregory Garretson. Please direct all correspondence to him.