
Notes from Predictive Search II


A few things to start us out:

1) We are trying to learn more about our audience for these webinars. If you have a moment, please press the pause button on this playback and go to bit.ly/predictive2 and take a brief survey on our operators. Don’t use Google to test the questions, just give your best answer from reading the survey, and please click submit when you are done! Thanks!

2) Follow what others have said, and add your comments on favorite uses of operators, etc., with our Twitter hashtag: #predictivesearch.

3) We are about to talk about a few of the most useful operators Google has to offer. However, it might still be a lot to take in at one sitting. Please feel free to watch this recording in pieces--just remember where you stopped and use the timeline posted on our site and the slider bar you see at the bottom of the interface to pick it up again at a later date. You can review portions of it in the same manner.

I will pose challenges along the way; when I do, press pause and try them yourself. (Machine learning may change results over time, so some of these searches may eventually not require operators.)

Review of predictive search: stop and think
Search for the answer, not the question
You can review the notes and recording of the last webinar

The AASL & ACRL standards speak to students' need to be able to “choose a communication medium and format that best supports the purposes of the product or performance and the intended audience.” (ACRL) Good online search skills rely heavily on the flip side of that standard: a literate student should be able to think critically about an information need and envision the “communication medium and format” in which the desired information would be communicated if it were appropriate to his or her current situation.

I find that in the context of this process—if I predict in some detail what my perfect source will look like—suddenly advanced operators make sense, gain inherent meaning, and come much more frequently into use.

I also find that, while students vary in how visual they are, many students—even in elementary school—can map out or describe details of various web page formats once they are made aware that differences exist among blogs, wikis, discussion lists, etc.

So, when beginning a search process:

Stop and think for a moment about what your perfect answer will look like:

1. WHO cares about what I care about? Who do I trust to give me the information I need?

2. WHAT words would I use to describe it, what words would my trusted source use? Which would be most common?

3. WHERE would my trusted source publish this information?

4. WHY would someone use one format or another to communicate this information?

5. HOW will I know when I have found what I want?

As long as I am thinking about these questions, they naturally give rise to using operators.

For example:

A business student emulating an African tour company owner wants to check which national parks in Kenya get the most traffic.



[visitors parks kenya]

The problem with this search is that it does not bring up the answer: it brings up lots of tourist sites. I can see their opinions, but right now I want actual numbers of visitors, per park.

So--I think about those questions we asked before:

WHO: The Kenyan Ministry of Tourism, which has both the motive to create this information and access to the best data

WHAT: arrivals, park names (e.g., Mt. Kenya, Animal Orphanage, Amboseli), maybe statistics for a range of years

WHERE: Ministry of Tourism website

WHY: Tables and spreadsheets organize numbers nicely. (I think about it: in a table or spreadsheet, one axis has the names of parks and the other has years, so each cell gives the number of visitors to a specific park in one year. The information might exist in text form, but it would be really difficult for me to use, and text is therefore a less logical source to look for. This is much as we would teach students to choose some sort of table or graph over text when communicating this information to others.)

HOW: When I have the number of visitors in each year to each of the national parks in Kenya, so I can easily compare them.

[visitors parks kenya site:tourism.go.ke filetype:xls]

Look at the result to confirm that it is what I needed. What did I do?

site: looks within the web address I specify. Here, I have said that I want to look for pages only within the site tourism.go.ke. Notice there is no space between the site: operator and the web address itself.

filetype: tells Google I only want to see a particular type of file, namely ones that are created in a specific application.
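
To make the no-space rule concrete, here is a minimal Python sketch that assembles a query string from plain terms plus optional site: and filetype: restrictions. The helper name `build_query` is hypothetical; it only builds text and does not contact Google.

```python
def build_query(terms, site=None, filetype=None):
    """Join search terms with optional site: and filetype: operators.

    Each operator is written with no space between it and its value,
    following the rule described above.
    """
    parts = list(terms)
    if site:
        parts.append(f"site:{site.strip()}")
    if filetype:
        # Accept "xls" or ".xls"; the operator takes the bare extension.
        parts.append(f"filetype:{filetype.strip().lstrip('.')}")
    return " ".join(parts)


print(build_query(["visitors", "parks", "kenya"], site="tourism.go.ke", filetype="xls"))
# visitors parks kenya site:tourism.go.ke filetype:xls
```

Generating the operator text in one place is simply a way to guarantee the operator and its value always touch.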

To explore further:

[study kenya parks visitors]

Notice that most of the compelling sources are in PDF format, so I can narrow the search:

[study kenya parks visitors filetype:pdf]

If you look at our advanced search page, you will see a file type drop-down box listing a number of file types that we think you might want to look for. However, you can actually use any 3- or 4-character file extension.

[morse code “i love you” filetype:ogg] for Wikimedia’s sound files

[captain cook filetype:kmz] for Google Earth

You can even look for more than one file type at once with [captain cook filetype:kmz OR filetype:kml], since those are both Google Earth file types.

OR is an operator that tells Google to look for Term A OR Term B (here, Filetype A OR Filetype B) on a page. Usually, all the terms you type in are bound together with invisible ANDs, which means that we look for pages with all of your search terms, unless you use OR to build in flexibility.

You can see how OR is working clearly with this Google Earth file search, since it would be impossible for one file to be both KMZ and KML. Using it, you can search for more than one filetype, or within more than one site, at once.
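
The difference between the invisible AND and an explicit OR can be modeled in a few lines of Python. In this sketch the file names are hypothetical stand-ins for search results, and each filetype: condition is checked against the file's extension; it illustrates the logic only, not how Google actually evaluates a query.

```python
# Hypothetical file names standing in for search results.
files = ["cook_voyages.kmz", "cook_routes.kml", "cook_biography.pdf"]


def extension(name):
    """Return the lowercase extension after the last dot."""
    return name.rsplit(".", 1)[-1].lower()


def filter_all(names, types):
    """Implicit AND: every filetype condition must hold for the same file."""
    return [n for n in names if all(extension(n) == t for t in types)]


def filter_any(names, types):
    """OR: a file qualifies if it matches at least one listed filetype."""
    return [n for n in names if any(extension(n) == t for t in types)]


print(filter_all(files, ["kmz", "kml"]))  # [] because no file has two extensions
print(filter_any(files, ["kmz", "kml"]))  # ['cook_voyages.kmz', 'cook_routes.kml']
```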

We will look more at OR in a few minutes.

[chore chart filetype:pdf]--A co-worker was looking for chore charts, and most of the ones she found were not printable. So she tried this search (and also found some with filetype:xls).

I played with this concept and ended up looking for dumbbell workout charts, and I encountered tons of malware in my results: cloaked sites that look like the proper results but take me to malicious pages. Luckily, Google puts up an interstitial warning page if we think a site might have malware, but I still like this solution. Malicious websites might still come up in a filetype: result, but we do ping the site to make sure that a page showing as a PDF is actually a PDF, so it strikes me as a safer way to search for subjects that are a target for malicious code.

I hear from educators that searching for worksheets is a great way to inadvertently stumble upon malware, so this might be a partial solution to that problem, as well.

The site: operator can also go more in depth:

I am researching Henry May's role in the Free Speech Movement, and found the Free Speech Movement database in the Internet Public Library.

Now, I want to search within it for all documents naming Henry May. The site does not have its own search bar.

Try pausing and creating this search yourself, then come back to hear my answer:

You can use the site: operator to search within a directory--one specific portion of a website.

[site:bancroft.berkeley.edu/FSM/ henry may]

You can also search across a specific top-level domain (.com, .gov, .mil, .za, .ca, etc.).

A student is preparing to do some research on the role of women in the workforce. She wants to include some statistics, and does not want to have to gather them from a bunch of different sources and be bothered with citing them all—she’s also been told that she is supposed to pay attention to the authority of her sources. She figures that the government must be pretty authoritative as a source, and might have a bunch of statistics in one place. However, she knows that she has had trouble in the past when she has tried to work her way through government statistics—some obvious ideas have really weird names.

To save herself time, she decides to spend five minutes skimming lists of vocabulary terms for government sources about labor.

What search would you do? Pause and try it out!

I would use:

[glossary labor site:gov]

(FYI, this would also work with something like site:go.ke.)
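
The site: examples above cover three shapes of restriction: a whole site, a directory within a site, and a top-level domain. The Python sketch below is a rough model of checking a URL against each shape. It rests on my own assumptions (suffix matching on the host name, prefix matching on the path), it is not a description of how Google actually filters pages, and the URLs in the usage lines are hypothetical.

```python
from urllib.parse import urlparse


def in_site(url, site_spec):
    """Rough model of site: matching.

    site_spec may be a domain ("tourism.go.ke"), a top-level or
    second-level domain ("gov", "go.ke"), or a domain plus a
    directory ("bancroft.berkeley.edu/FSM/").
    """
    domain, slash, directory = site_spec.partition("/")
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    domain = domain.lower()
    # The host must be the domain itself or end with "." + domain.
    if host != domain and not host.endswith("." + domain):
        return False
    # If a directory was given, the URL path must sit inside it.
    if slash:
        return parsed.path.startswith("/" + directory)
    return True


print(in_site("https://bancroft.berkeley.edu/FSM/people/may.html", "bancroft.berkeley.edu/FSM/"))  # True
print(in_site("https://bancroft.berkeley.edu/ROHO/index.html", "bancroft.berkeley.edu/FSM/"))      # False
print(in_site("https://www.bls.gov/glossary.htm", "gov"))                                          # True
```

Requiring the leading dot in the host suffix is what keeps, in this model, site:go.ke from matching a hypothetical host such as cargo.ke.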

Now, this student wonders if there are other synonyms for glossary, which might make her miss a useful source. Using predictive search, this student realizes that sometimes people call online glossaries “dictionaries,” “list of terms,” or even “vocabulary.”

Does she need to use lots of ORs in her search: [glossary OR dictionary OR terms OR vocabulary labor site:gov]?

What would you do? Pause and try it out, then come back and hit play to see my solution:


I would use the ~ operator.

The tilde (~) is usually on the key to the left of the 1. You will need to hit shift to get the tilde.

The tilde operator triggers Google’s automatically generated thesaurus or related terms--they are not synonyms, but words we find people tend to use within similar contexts to each other.

[~glossary labor site:gov]

It happens that .gov sources are pretty controlled in their vocabulary, so it does not make much difference.

But try taking out the .gov restriction and doing this search:

[~glossary poetry]

Notice all the different terms appearing in bold?

I particularly like using the ~ with the search term glossary, since I often need terms of art. However, when I use the tilde operator, I do pay careful attention to the resulting terms. I use it most on certain words where I know I am looking for a range of related terms, but where I have used it before, and know what to expect.
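
The ~ operator behaves like an automatically built OR group. As a toy model, here is a Python sketch that rewrites ~word into such a group using a small related-terms table. The table is a hypothetical stand-in built from the glossary alternatives mentioned earlier (dictionary, terms, vocabulary); Google's own related terms were generated automatically and may well differ, and the parentheses only mark the group for readability.

```python
# Hypothetical related-terms table; Google's real list is generated
# automatically and is not reproduced here.
RELATED = {"glossary": ["dictionary", "terms", "vocabulary"]}


def expand_tilde(query):
    """Rewrite each ~word as an OR group of the word and its related terms."""
    out = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + RELATED.get(word.lower(), [])
            out.append("(" + " OR ".join(alternatives) + ")")
        else:
            out.append(token)
    return " ".join(out)


print(expand_tilde("~glossary labor site:gov"))
# (glossary OR dictionary OR terms OR vocabulary) labor site:gov
```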

~ helps expand—but what about when words get in the way? Like the elementary school student who wants to learn about coyotes and gets the Phoenix Coyotes. It is only the top result, but for a child it can disrupt her ability to attend to the other results.

What would you do? Press pause and try it out, then come back and press play to see what I did.


I used:

[coyotes -phoenix]

The minus sign tells Google NOT to bring back any pages that have the word phoenix. As with the tilde, filetype:, and site:, please do not leave a space between the minus sign and the term you want to get rid of; otherwise Google will simply drop the minus sign as an extraneous character, AND the terms together, and search for [coyotes phoenix].

Sometimes you would need too many minus signs, and you need a different strategy. For example, for a 5th-grade student of mine who loved mythology and wanted to read about giants, I built:

[giants -baseball -football -soccer], and I could have continued with -grocery and so on.

It is much more efficient, in instances like this, to narrow positively, with a search like [giants mythology].

With operators like -, ~, site:, and filetype:, the term needs to be touching the operator: [giants -baseball] finds pages about giants that do not mention baseball, but in
[giants - baseball] the - is not touching the search term, so Google ignores it completely.

For the most part, these operators (-, ~, OR) act on the one word they are touching. site: is slightly different because it works on web addresses rather than terms, and so does filetype:.
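
Here is a toy Python model of the minus sign and the spacing rule: a token that starts with - marks a word to exclude, while a minus sign standing on its own is dropped, so the word after it becomes an ordinary required term. The page texts are made up, and real matching is far more involved than comparing sets of words.

```python
def parse_terms(query):
    """Split a query into required words and excluded (-word) words."""
    include, exclude = [], []
    for token in query.lower().split():
        if token == "-":
            # A minus sign that is not touching a word is dropped.
            continue
        if token.startswith("-"):
            exclude.append(token[1:])
        else:
            include.append(token)
    return include, exclude


def matches(page_text, query):
    """True if the page has every required word and no excluded word."""
    words = set(page_text.lower().split())
    include, exclude = parse_terms(query)
    return all(w in words for w in include) and not any(w in words for w in exclude)


print(matches("giants of norse mythology", "giants -baseball"))    # True
print(matches("giants win baseball opener", "giants -baseball"))   # False
print(matches("giants win baseball opener", "giants - baseball"))  # True: the lone minus was dropped
```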

If we want these operators to act on a phrase, we need to put quotation marks around it. For example, compare:

[hunger games OR harry potter] vs. [“hunger games” OR “harry potter”]

Note that in the first search, the term harry is essentially ignored when hunger, games, and potter are found, because I essentially asked for the words hunger and potter with either games or harry. Putting in quotation marks tells Google that I want one phrase or the other: hunger games or harry potter.

Also please note that the OR operator has to be in ALL CAPS, or it will be interpreted as a search term.
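
The way OR grabs only its immediate neighbors, and the way quotation marks keep a phrase together, can be sketched as a small parser. This is a simplified Python model under my own assumptions (OR binds just the two adjacent tokens, and only the all-caps form counts as an operator). It uses straight double quotes because Python's `shlex` module splits on those, not on curly quotes.

```python
import shlex


def parse_clauses(query):
    """Group query tokens into AND-ed clauses; OR joins its two neighbors.

    Each clause is a list of alternatives. Quoted phrases stay whole,
    and only an all-caps OR is treated as an operator.
    """
    tokens = shlex.split(query)
    clauses = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "OR" and clauses and i + 1 < len(tokens):
            # Attach the next token as an alternative to the previous clause.
            clauses[-1].append(tokens[i + 1])
            i += 2
        else:
            clauses.append([token])
            i += 1
    return clauses


print(parse_clauses('hunger games OR harry potter'))
# [['hunger'], ['games', 'harry'], ['potter']]
print(parse_clauses('"hunger games" OR "harry potter"'))
# [['hunger games', 'harry potter']]
print(parse_clauses('hunger games or harry potter'))
# [['hunger'], ['games'], ['or'], ['harry'], ['potter']]
```

The first result reads as hunger AND (games OR harry) AND potter, which is why harry barely matters there; the lowercase "or" in the last query is just another search word.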

An advanced high school or college student who was trying to research the properties of pig skin may need quotation marks in the following way:

[pig skin]
[pig skin -“pigskin”]
[pig latin name]
[pig “latin name”]
[domestic pig “latin name”]
[sus scrofa skin]
--You can filter for academic articles through scholar.google.com or by using the “reading level” filter in the left-hand panel.

I find that I use quotes a lot in combination with other operators and tools:

Last time, I demonstrated how I use “associated search terms,” visualizing common sentence structures in sources such as newspaper articles and using them to find less easily searchable information. I used, for example:

[“childhood obesity” “by 2015..2050” OR “by the year 2015..2050”]

The #..# is a number range search. It looks for any number from the first to the last or anything in between. It substitutes for a whole lot of ORs--which would not only be annoying, but would often take you over our 32-word limit.

You can also use #.. for a greater-than search, though, sadly, there is no less-than.

The number range is endlessly useful:
[hybrid "50.. mpg" OR "50.. miles per gallon"]
["the senate voted 50.. to 1..49"] (limit results to the past month)
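
To see why #..# saves so much typing, here is a small Python sketch that spells out "by 2015..2050" as the OR list it replaces and counts the words. The count simply splits on spaces and skips the ORs; I am not claiming this is exactly how Google counts toward its limit. The `in_range` helper is a hypothetical model of the inclusive #..# form and the open-ended #.. form.

```python
def expand_range(prefix, low, high):
    """Spell out low..high as quoted phrases joined by OR."""
    return " OR ".join(f'"{prefix} {n}"' for n in range(low, high + 1))


def word_count(query):
    """Count words in a query, not counting the OR operators themselves."""
    return sum(1 for word in query.split() if word != "OR")


def in_range(n, low, high=None):
    """Model of low..high (inclusive) and of the open-ended low.. form."""
    return n >= low and (high is None or n <= high)


spelled_out = expand_range("by", 2015, 2050)
print(spelled_out.count(" OR ") + 1)       # 36 alternatives
print(word_count(spelled_out))             # 72 words, versus 2 for "by 2015..2050"
print(in_range(62, 50), in_range(45, 50))  # True False
```

Even ignoring the ORs, the spelled-out version runs to 72 words, well past the 32-word limit mentioned above.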

When you need a range of words, not just numbers, you can also use the asterisk:
["spending on * education has * from * to"]
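
The asterisk can be thought of as a placeholder for one or more words inside a quoted phrase. The Python sketch below models that idea by converting such a phrase into a regular expression; treating * as "one or more whole words" is my simplifying assumption, and the sample sentences are made up.

```python
import re


def wildcard_to_regex(phrase):
    """Turn a phrase containing * into a case-insensitive regex.

    Each * is modeled as one or more whole words; every other
    character in the phrase is matched literally.
    """
    pieces = [re.escape(piece) for piece in phrase.split("*")]
    return re.compile(r"\w+(?:\s+\w+)*".join(pieces), re.IGNORECASE)


pattern = wildcard_to_regex("spending on * education has * from * to")
print(bool(pattern.search("Spending on public education has grown from 2010 to 2012")))  # True
print(bool(pattern.search("Spending on education has grown from 2010 to 2012")))         # False
```

The second sentence fails because, in this model, each * must stand in for at least one word.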

Finally, what happens when Google’s spell checking and synonymizing are too smart for my search’s own good?

Take the student who is trying to read about the [helgo bog buddha].

What happens when you run this search? How do you fix the problem? How would you keep it from happening the next time you run the search? Pause here and try it out, then come back to hear my answer:


(NOTE: As of October, 2011, we have changed the "freezing" operator to be even easier to use. Now, where these notes tell you to use a +, you would instead surround a single word with quotes: e.g., ["word"].)

I would use the + operator. Again, put it in front of the words that Google “corrected” for me, with no spaces at all.

Google thought that instead of Helgo I might have meant Helga, and that instead of bog I might have meant big.

[+helgo +bog buddha] (Now, this would be ["helgo" "bog" buddha])
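
Here is a toy Python model of why freezing works: a made-up correction table (built from the Helga and big guesses described above) is applied to every word except those the searcher has frozen, either with the older + prefix or with the newer quotation marks. Google's real spell checking is not a dictionary lookup like this; the sketch only illustrates which words get protected.

```python
# Hypothetical correction table based on Google's guesses described above.
CORRECTIONS = {"helgo": "helga", "bog": "big"}


def is_frozen(token):
    """A word is frozen by a leading + (older style) or by surrounding quotes."""
    return token.startswith("+") or (len(token) > 2 and token[0] == '"' and token[-1] == '"')


def apply_corrections(query):
    """Auto-correct unfrozen words; frozen words pass through unchanged."""
    out = []
    for token in query.split():
        if is_frozen(token):
            out.append(token.lstrip("+").strip('"'))
        else:
            out.append(CORRECTIONS.get(token.lower(), token))
    return " ".join(out)


print(apply_corrections("helgo bog buddha"))      # helga big buddha
print(apply_corrections("+helgo +bog buddha"))    # helgo bog buddha
print(apply_corrections('"helgo" "bog" buddha'))  # helgo bog buddha
```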

Big picture: once you have command of the operators and of the structure of information on the web, you can start thinking really creatively about how to expose the content you want:

I can find blogs and other related sources, where someone writes an article and communities of interest debate the ideas in it. What do these sources have in common? They display a comment count, such as “Comments (#)”. So I can search:

[nanotechnology debate "comments 50.." -site:youtube.com]

Other parts of pages that we don’t usually think to search are also searchable. For example, I teach students to edit Wikipedia, as part of lessons on information quality. It turns out that I can search for those warnings that appear within Wikipedia articles, called templates, like this:

[site:wikipedia.org "insufficient inline citations."]

I can even look for examples of pages that have date-created and date-updated information at the bottom:

[octopus “created * 2000..2011” “updated * 2010..2011”]

These are just a few examples, but the big idea is this: as we challenge ourselves and our students to pay attention to the actual presentation and packaging of online information, it turns out that there are many, many searchable elements on any given page. By predicting what our source will look like before we try to find it, we can identify truly unique characteristics that are likely to appear on the page, and by using advanced operators we can actually functionally search for them.

In my experience, slowly introducing operators in this context gives students something internally meaningful to remember about them, and creates a stickier learning experience.

--You still need to practice, both with operators and with visualizing sources.
--See attached links for more info.

Here is the link to the after-course survey: