The English text of NewsScape and related Red Hen holdings is now available in CQPweb at https://corpora.linguistik.uni-erlangen.de/newsscape. The database is updated at irregular intervals, typically every six months. Access is limited to participants in Red Hen research projects.
The tagset used is the Penn Treebank PoS tagset.
Some recent updates on the functionality:
Let us start with a simple query:
If we want to know where this is used with what frequency, we can use the Distribution feature:
Here, we can sort by relative frequency by clicking on the arrow in the table heading:
We can see that MSNBC uses the phrase more than twice as often as Fox News and five times as often as CNN.
We can also take a look at the distribution over the years. This can be done in a table or in a bar chart as in the following screenshot:
We can see that the phrase "affordable care act" peaks in 2013 and rapidly loses ground.
Let us now compare these results with the results for the term "obamacare", which has the same extralinguistic referent as the phrase "Affordable Care Act":
The simple name for the comparative correlative grammatical construction is "the Xer, the Yer," as in "the bigger the better," "the more the merrier," and "the bigger they are, the harder they fall." It is quite a complex construction, studied early in the development of construction grammar in a classic paper: Charles Fillmore, Paul Kay, and Mary Catherine O'Connor, "Regularity and Idiomaticity in Grammatical Constructions: The Case of Let Alone," Language, Vol. 64, No. 3 (Sep. 1988), pp. 501-538. The construction prompts for the mental construction of two linear scales and a correlation between them, and naturally is an excellent tool for explanation and persuasion. Fillmore, Kay, and O'Connor write, "This structure is used for expressing a correlation between an independent variable and a dependent variable. The propositions participating in the statement of correlation can be derived from the lexico-syntactic form of the sentence's two main components. . . . This use of the comparative construction is unique; the use of the definite article that we find in this construction is not, so far as we can tell, found generally elsewhere in the language; nor is the two-part structure uniting the two atypical the-phrases found in any of the standard syntactic forms in English. . . . In spite of the fact that it is host to a large number of fixed expressions, the form has to be recognized as fully productive. Its member expressions are in principle not listable: unlimitedly many new expressions can be constructed within its pattern, their meanings constructed by means of semantic principles specifically tied to this construction."
Some researchers, such as Thomas Hoffmann, have proposed that the construction comes with not only a prompt for a clause-internal computation of scalar differentials and an apodosis-protasis relationship between the two clauses, but also associated gestures and a distinct intonation contour, namely a rise on the first filler and a fall on the second filler.
In Red Hen, with CQPweb, one can search for the construction and investigate the gestures and intonation.
CQPweb uses these Penn Treebank tags for comparatives:
RBR Adverb, comparative
JJR Adjective, comparative
In simple query syntax, one can search for "the Xer, the Yer" with a quick and inexact workaround:
the (_JJR | _RBR) *************** the (_JJR | _RBR)
Changing the Query mode to “CQP syntax” makes it possible to run a more exact search:
"the" [pos="JJR|RBR"] []* "the" [pos="JJR|RBR"] []* [pos="\."] within s
This query returns 39,198 matches in 33,770 different texts (in 1,766,766,393 words [263,445 texts]; frequency: 22.19 instances per million words).
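The normalized frequency reported above can be checked by hand. The following is a minimal sketch in Python that uses only the counts quoted in the query result; the function name `per_million` is introduced here for illustration and is not part of CQPweb.

```python
def per_million(matches: int, corpus_words: int) -> float:
    """Normalize a raw hit count to instances per million words."""
    return matches / corpus_words * 1_000_000


# Counts copied from the CQPweb result line above.
matches = 39_198
corpus_words = 1_766_766_393
texts_with_hits = 33_770
total_texts = 263_445

freq = per_million(matches, corpus_words)
share_of_texts = texts_with_hits / total_texts

print(f"{freq:.2f} instances per million words")  # 22.19, as CQPweb reports
print(f"{share_of_texts:.1%} of texts contain at least one match")
```

Normalizing in this way is what makes counts comparable across subcorpora of different sizes, such as the per-network and per-year figures in the Distribution view.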
Here is a video clip showing one such example:
http://babylon.library.ucla.edu/redhen/tweet?id=2015-12-17
Other search utilities are available. Using peck or the standard Unix command grep, one could seek
POS_02 THE\/DT\|[A-Za-z]*\/(RBR|JJR).*THE\/DT\|[A-Za-z]*\/(RBR|JJR)
or
egrep -i "THE\/DT\|[A-Za-z]*\/(RBR|JJR).*THE\/DT\|[A-Za-z]*\/(RBR|JJR)"
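For readers who prefer to script this outside the shell, the same pattern can be tried with Python's `re` module. This is only a sketch: the sample lines below are hypothetical and assume the `word/TAG|word/TAG` layout that the grep pattern implies, not a verified excerpt from a Red Hen file, and the function name is introduced here for illustration.

```python
import re

# Python translation of the egrep pattern above: "the/DT", a pipe, then a
# word tagged RBR or JJR, occurring twice on the same line. Like the grep
# version, it is inexact: ".*" allows anything between the two halves.
PATTERN = re.compile(
    r"the/DT\|[A-Za-z]*/(?:RBR|JJR).*the/DT\|[A-Za-z]*/(?:RBR|JJR)",
    re.IGNORECASE,  # mirrors egrep -i
)


def has_comparative_correlative(tagged_line: str) -> bool:
    """Return True if the tagged line contains a 'the Xer ... the Yer' candidate."""
    return PATTERN.search(tagged_line) is not None


# Hypothetical tagged lines, assuming pipe-separated word/TAG tokens.
sample_hit = "the/DT|bigger/JJR|they/PRP|are/VBP|,/,|the/DT|harder/JJR|they/PRP|fall/VBP"
sample_miss = "the/DT|big/JJ|dog/NN|barked/VBD"

for line in (sample_hit, sample_miss):
    print(has_comparative_correlative(line), line)
```

Because the pattern does not require the two halves to be in the same sentence, hits found this way are candidates that still need manual inspection, unlike the CQP query restricted with `within s`.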
We say, "I know the fastest two swimmers," but not, it seems, "the fastest one swimmer." Why not? But wait! Is the claim true? Now that linguistics has turned to theories of language as usage-based, hypotheses about language (however they arise, including through adventitious encounters with data, or introspection, or dreaming, or . . . ) are expected to be tested against vast datasets of ecologically valid attested usage. So, let's query Red Hen's CQPweb, putting it through a sequence of searches. Note that we finally find "the most important one issue," which is interesting! So what happens to our hypothesis now?