We want to understand what is quantitatively implied by English-language modifiers associated with numerical expressions, such as ‘about’, ‘almost’, and ‘around’, which are collectively known as hedges. Many hedges are used in English. We have distributed the questionnaire below (in red text) to students and have received a couple dozen responses, but this method of interrogation is tiresome and perhaps dizzying for the informant.
English phrases for uncertain numeric quantities
In the list below, each line is an English-language phrase characterizing a numerical quantity. To the right of each phrase, indicate the smallest possible value that is consistent with the phrase and the largest possible value consistent with the phrase. If there are no single smallest and largest values, but rather gradations of consistency over values, please indicate how you would characterize values consistent with the phrase. Feel free to use words, fractions, or decimal numbers with as many decimal places as you need in your answers to convey precisely what you think. If it is not the case that any value between the smallest and largest would also be consistent, then please indicate this fact.
This is not a test. What is important to us is how you interpret the phrases. The phrases are given without context that might help you to determine their meanings, so just give the most reasonable interpretation. If there are multiple interpretations, please indicate what they are. We’re not really interested in humorous or fanciful interpretations (like the quote below from the show Dead Like Me). If a phrase does not have a reasonable interpretation, please say so. It is not crucial that your responses are absolutely consistent with each other. Instead, it would be most useful if you convey, for each phrase, what you soberly think is really implied by that phrase.
Thanks for any answers you can supply. Please return your answers to scott@ramas.com.
A potentially much better approach might be to design a von Ahn game (see http://video.google.com/videoplay?docid=-8246463980976635143 or http://www.youtube.com/watch?v=dtFroEJN1nI) to elicit the quantitative meanings of the hedge words.
The UC game is functioning now, and I've used up all of my email aliases creating ghost Facebook accounts. Next time you're on the Facebook, go to this link and give her a go:
http://apps.facebook.com/ucchallenge/main.php
Currently, the Facebook app presents a statement and a question and requests a response in the form of a number, as in the example below.
The response is restricted to numeric characters (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, period, plus, minus).
The response “>15” is not accepted, but “+15” is accepted, which a player might have intended to mean the same thing.
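The difference between a character-level restriction and a real numeric check can be sketched as follows. This is only an illustration, not the app's actual code: the function names `is_accepted` and `parse_number` are hypothetical, and the allowed character set is assumed to be exactly the one listed above.

```python
import re

# Assumption: the allowed characters are exactly the digits, period, plus, and minus.
ALLOWED_CHARS = re.compile(r"[0-9.+\-]+")

# A stricter pattern: an optional sign followed by one well-formed decimal number.
SIGNED_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def is_accepted(response: str) -> bool:
    """Character-level check: True if every character is in the allowed set."""
    return ALLOWED_CHARS.fullmatch(response) is not None


def parse_number(response: str):
    """Stricter check: return a float for a single signed decimal, else None."""
    if SIGNED_DECIMAL.fullmatch(response):
        return float(response)
    return None


print(is_accepted(">15"))     # False: '>' is not an allowed character
print(is_accepted("+15"))     # True: '+' is allowed
print(parse_number("+15"))    # 15.0
print(is_accepted("1.2.3"))   # True at the character level
print(parse_number("1.2.3"))  # None: not a well-formed number
```

A character-level filter alone lets through strings such as "1.2.3", and it cannot tell whether "+15" was meant as "more than 15", so a player's intent may be lost either way.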
For instance, one statement mentions that “Taco Bell’s revenue jumped by around $1.1 million” then asks for a number in answer to the question “By how many millions did Taco Bell’s revenue jump?”
It seems likely that only a subtle respondent would thoughtfully answer anything other than “1.1”. Most readers probably would not even notice the presence of the “around” hedge in front of the dollar sign in the statement.
Do some words tend to invoke the ambiguity detector and other words invoke the risk processor?
I've got a program that automates the question-finding process by searching for new articles online that have uncertain words and quantities in them. There are only 13 questions for now because it was easier to test it that way, but I've got a whole slew of them for later.
It won't be a stretch to include interval responses too now that the crazy FB part is out of the way.
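One way interval responses might be read is sketched below. The input format (two numbers separated by a comma) and the function name `parse_interval` are assumptions made for illustration. The app has not settled on a format, and a comma is not among the characters it currently accepts.

```python
def parse_interval(response: str):
    """Return (low, high) for a response like '1.0, 1.2', or None if malformed.

    Assumed format: exactly two numbers separated by a comma, low first.
    """
    parts = [p.strip() for p in response.split(",")]
    if len(parts) != 2:
        return None
    try:
        low, high = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Reject reversed intervals rather than silently swapping the endpoints.
    if low <= high:
        return (low, high)
    return None


print(parse_interval("1.0, 1.2"))  # (1.0, 1.2)
print(parse_interval("1.1"))       # None: a single number is not an interval
print(parse_interval("1.2, 1.0"))  # None: endpoints are reversed
```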
I just tried it on Internet Explorer 8 and was able to replicate the trouble Scott described. Everything explodes in IE8 but works fine in Firefox, Safari, Chrome, Opera, and Konqueror. No surprise there.
The app currently compares your results to everyone who's ever played the game, not just your friends. This can be easily changed, but I left it like this for initial testing so I didn't have to friend all of my aliases.
The content grabber downloads a day's worth of new articles via RSS from Yahoo's Top Stories feed. Then the parser searches for article sentences that contain any words in a predefined list of uncertain words (see attached "ucwords.txt"). If the matching sentence contains a numeric value after the uncertain word, then the sentence is tagged for manual verification as a candidate. I still have to verify the statements and create the questions manually, but this can also be automated. It won't be easy, though, as the questions depend on the context of the statements, and there are many statement forms, sometimes with multiple references to numeric values. I also excluded several statements related to the Japan tsunami to remove the downer element.
about
approximately
around
nearly
almost
bordering on
close to
closely
in the ballpark
in the neighborhood of
in the range
not far from
not quite
roughly
upwards of
more than
less than
larger than
smaller than
on the order of
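The candidate-tagging step described above (find a sentence with an uncertain word, then check for a numeric value after it) can be sketched roughly as follows. This is a minimal illustration, not the actual parser: the names `HEDGES` and `find_candidates` are hypothetical, only a subset of the word list is included, and the sentence splitter is deliberately naive.

```python
import re

# A subset of the phrases listed above; the real list would be read from ucwords.txt.
HEDGES = ["about", "approximately", "around", "nearly", "almost",
          "close to", "roughly", "more than", "less than"]

# Try longer phrases first so multiword hedges are matched whole.
_hedge_alt = "|".join(re.escape(h) for h in sorted(HEDGES, key=len, reverse=True))

# A hedge, then any non-digit text, then a number such as 300, 1,200, or 1.1.
PATTERN = re.compile(
    r"\b(?P<hedge>" + _hedge_alt + r")\b\D*?(?P<value>\d+(?:,\d{3})*(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_candidates(text: str):
    """Return (hedge, value, sentence) for each sentence with a hedge before a number."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            candidates.append((match.group("hedge").lower(), match.group("value"), sentence))
    return candidates


sample = ("Taco Bell's revenue jumped by around $1.1 million. "
          "The weather was nice. Nearly 300 people attended.")
for hedge, value, sentence in find_candidates(sample):
    print(hedge, value, "|", sentence)
```

Each tagged sentence would still need manual verification, since a pattern like this cannot tell whether the number it found is the quantity the hedge actually modifies.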
quantitative linguistics
glottometrics
The infrastructure seems to be in place. Great job! I could help by getting IRB approval here to use the data for research purposes. We do need to make some improvements: specifying a unit, allowing participants to provide intervals, providing some sort of wall post to show performance/skill, and, perhaps most importantly, making it a bit more gamey. We might also have to do some semantic analysis to assess the polarity of a statement (for correlating beliefs with tone/polarity); there are a number of NLP tools for this. For now, though, it is very cool. I will give it tons of thought.