CQPweb search interface

The English text of NewsScape and related Red Hen holdings is now available in CQPweb at https://corpora.linguistik.uni-erlangen.de/newsscape. This database is updated at irregular intervals, typically every six months. Access is limited to participants in Red Hen research projects.

Documentation

The tagset used is the Penn Treebank PoS tagset.

Related resources

Tutorials

Functionality updates

Some recent updates on the functionality:

    1. You can now use lemma queries of the form {match/N} for all instances of match (match, matches) used as a noun, or {match/V} for all instances of match (match, matches, matching, matched) used as a verb.
    2. You can now use restricted queries (second item on the left) to limit your queries to certain channels and certain years.
    3. If you want to limit the search even further (say, to a certain day or to Jay Leno only), you will have to create a subcorpus first. This is quite simple; follow these steps:
      1. “Create/edit subcorpora” (last item under “User controls” on the left)
      2. Under “Define new subcorpus via”, select “Scan text metadata” and hit “Go!”
      3. Which metadata field do you want to search? -> for instance, select “title”
      4. Search for texts where this metadata field … contains “Jay Leno”
      5. Hit “Get list of texts”
      6. Enter the name of the New subcorpus: “My_Jay_Leno_collection” and tick “include all texts” or select the texts you want to have
      7. Hit “Add texts”
      8. Now go to the “Standard query”, type in your search string and in the “Restriction” field select the subcorpus you have just created.

Example searches

  • The novel construction "because NOUN", as in "because science":
  • because _N* _{$} </s>

Tracking discourse on "Obamacare" or the "Affordable Care Act"

Let us start with a simple query:

If we want to know where this is used with what frequency, we can use the Distribution feature:

Here, we can sort by relative frequency by clicking on the arrow in the table heading:

We can see that MSNBC uses the phrase more than twice as often as Fox News and five times as often as CNN.

We can also take a look at the distribution over the years. This can be done in a table or in a bar chart as in the following screenshot:

We can see that the phrase "affordable care act" peaks in 2013 and rapidly loses ground.

Let us now compare these results with the results for the term "obamacare", which has the same extralinguistic referent as the phrase "Affordable Care Act":

Comparative Correlatives

The simple name for the comparative correlative grammatical construction is "the Xer, the Yer," as in "The bigger the better," "the more the merrier," and "the bigger they are, the harder they fall." It is quite a complex construction, studied early in the development of construction grammar in a classic paper: Charles Fillmore, Paul Kay, and Mary Catherine O'Connor, "Regularity and Idiomaticity in Grammatical Constructions: The Case of Let Alone," Language, Vol. 64, No. 3 (Sep., 1988), pp. 501-538.

The construction prompts for the mental construction of two linear scales and a correlation between them, and naturally is an excellent tool for explanation and persuasion. Fillmore, Kay, and O'Connor write:

"This structure is used for expressing a correlation between an independent variable and a dependent variable. The propositions participating in the statement of correlation can be derived from the lexico-syntactic form of the sentence's two main components. . . . This use of the comparative construction is unique; the use of the definite article that we find in this construction is not, so far as we can tell, found generally elsewhere in the language; nor is the two-part structure uniting the two atypical the-phrases found in any of the standard syntactic forms in English. . . . In spite of the fact that it is host to a large number of fixed expressions, the form has to be recognized as fully productive. Its member expressions are in principle not listable: unlimitedly many new expressions can be constructed within its pattern, their meanings constructed by means of semantic principles specifically tied to this construction."

Some researchers, such as Thomas Hoffmann, have proposed that the construction comes with not only a prompt for a clause-internal computation of scalar differentials and an apodosis-protasis relationship between the two clauses, but also associated gestures and a distinct intonation contour, namely a rise on the first filler and a fall on the second filler.


In Red Hen, with CQPweb, one can search for the construction and investigate the gestures and intonation.

CQPweb uses these Penn Treebank tags for comparatives:

RBR Adverb, comparative

JJR Adjective, comparative

In simple query syntax, one can search for the Xer the Yer with a quick and inexact workaround:

the (_JJR | _RBR) *************** the (_JJR | _RBR)

Changing the Query mode to “CQP syntax” makes it possible to run a more exact search:

"the" [pos="JJR|RBR"] []* "the" [pos="JJR|RBR"] []* [pos="\."] within s
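To make explicit what the CQP-syntax query above asks for, here is a minimal Python sketch that applies the same pattern to a single, already tagged sentence. It is only an illustration and not how CQPweb evaluates queries: the `Token` representation, the function name `has_comparative_correlative`, and the hand-tagged example sentence are assumptions. The sketch also compares word forms case-insensitively, which may differ from how the query as written treats case.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, Penn Treebank PoS tag); hypothetical representation
COMPARATIVE_TAGS = {"JJR", "RBR"}


def has_comparative_correlative(sentence: List[Token]) -> bool:
    """Return True if one sentence fits: the + comparative ... the + comparative ... '.'"""
    # Indices where "the" is immediately followed by a comparative adjective or adverb.
    hits = [
        i
        for i in range(len(sentence) - 1)
        if sentence[i][0].lower() == "the" and sentence[i + 1][1] in COMPARATIVE_TAGS
    ]
    if not hits:
        return False
    first = hits[0]
    # The second "the" must start after the first comparative has ended.
    later = [j for j in hits if j >= first + 2]
    if not later:
        return False
    second = later[0]
    # Sentence-final punctuation (tag ".") must follow the second comparative.
    return any(tag == "." for _, tag in sentence[second + 2:])


# Hand-tagged example sentence (an assumption, not corpus output).
example = [
    ("the", "DT"), ("bigger", "JJR"), ("they", "PRP"), ("are", "VBP"), (",", ","),
    ("the", "DT"), ("harder", "JJR"), ("they", "PRP"), ("fall", "VBP"), (".", "."),
]
print(has_comparative_correlative(example))  # prints True
```

Because the function receives one sentence at a time, it plays the role of the "within s" restriction: a match can never span a sentence boundary.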

The CQP-syntax query returns 39,198 matches in 33,770 different texts (out of 1,766,766,393 words in 263,445 texts), a frequency of 22.19 instances per million words.
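The per-million figure follows directly from the two counts reported for this query. As a quick check, a few lines of Python (the variable names are illustrative only) reproduce the normalization:

```python
# Counts as reported by CQPweb for the comparative correlative query.
matches = 39_198
corpus_words = 1_766_766_393

# Relative frequency, normalized to instances per million words.
per_million = matches / corpus_words * 1_000_000
print(f"{per_million:.2f} instances per million words")
```

The same normalization is what makes frequencies comparable across channels or years of different sizes.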

Here is a video clip showing one such example:

http://babylon.library.ucla.edu/redhen/tweet?id=2015-12-17

Other search utilities are available. Using peck or the standard Unix command grep, one could seek

POS_02 THE\/DT\|[A-Za-z]*\/(RBR|JJR).*THE\/DT\|[A-Za-z]*\/(RBR|JJR)

or

egrep -i "THE\/DT\|[A-Za-z]*\/(RBR|JJR).*THE\/DT\|[A-Za-z]*\/(RBR|JJR)"
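For readers who prefer to script the search, the egrep pattern above carries over to Python's `re` module almost unchanged. The sketch below is an assumption-laden illustration: the two sample lines are made up, and their pipe-separated word/TAG layout is inferred from the pattern itself rather than taken from an actual Red Hen file.

```python
import re

# Same pattern as the egrep command; "/" needs no escaping in Python,
# and "\|" matches a literal pipe character.
PATTERN = re.compile(
    r"THE/DT\|[A-Za-z]*/(RBR|JJR).*THE/DT\|[A-Za-z]*/(RBR|JJR)",
    re.IGNORECASE,  # mirrors egrep -i
)

# Made-up annotation lines; the pipe-separated word/TAG layout is an assumption.
lines = [
    "POS_02|The/DT|bigger/JJR|they/PRP|are/VBP|,/,|the/DT|harder/JJR|they/PRP|fall/VBP|./.",
    "POS_02|The/DT|dog/NN|is/VBZ|bigger/JJR|./.",
]

# Keep only the lines in which "the" + comparative occurs twice.
matching = [line for line in lines if PATTERN.search(line)]
print(len(matching))  # prints 1
```

Like the egrep version, this works line by line, so it finds both halves of the construction only when they appear on the same annotation line.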

[Superlative adjective] [positive integer] [Noun Phrase]

We say, "I know the fastest two swimmers," but not, it seems, "the fastest one swimmer." Why not? But wait! Is the claim true? Now that linguistics has turned to theories of language as usage-based, hypotheses about language (however they arise, including through adventitious encounters with data, or introspection, or dreaming, or . . . ) are expected to be tested against vast datasets of ecologically valid attested usage. So, let's query Red Hen's CQPweb, putting it through a sequence of searches. Note that we finally find "the most important one issue," which is interesting! So what happens to our hypothesis now?

theFastestOneRunner.docx