Big Data

IBM reported in 2012 that 90% of the world's data had been created since 2010. Written on DVDs, all that data would form a stack reaching to the moon and back. What does this data mean? How can it be used? What sorts of tools can we use to visualize it?

Examples:

Google Flu Trends

The Centers for Disease Control and Prevention (CDC) tracks data from 3,000 doctors' offices, 140 labs, 3,000 outpatient health-care providers, vital-statistics offices in 122 cities, and epidemiologists at health departments in every state to come up with a percentage of flu-related doctor and emergency-room visits every week.

Or, you can simply count Google search terms such as "sore throat" and "cold chills."

Check out the comparison from the Popular Science graph shown below. The light-purple spike is from the CDC, while just below it the darker-purple spike reflects the increase in Google search queries.

See the Google Flu Trends page for more information and access to the data. The Google Trends page shows which searches are hot right now, with an option for finding what was hot on a particular day. It gives a sense of what is on America's mind.

Google Flu Tracking

To be fair, you should also look at the March 2014 New York Times article on the limitations of the Google Flu Trends predictions and the cases where they have been wrong. It points out the problem with depending on "Big Data" from companies that keep some of the details private.

Microsoft Photosynth and ICE photo stitching

Get the big picture, and zoom in to get the small picture, within the same application using Microsoft's Photosynth. Create a Photosynth account and you can upload your own pictures to have Photosynth create a 3D model for you.

Photosynth

Download and use Microsoft ICE (Image Composite Editor), a free product, to stitch your photos together into panoramas.

Baby Name Voyager

A great deal of numeric data has been turned into visual information in the Baby Name Voyager, which shows the popularity of names over the last 130 years. For instance, consider the popularity of the name "Jesus", which illustrates immigration trends from Spanish-speaking countries:

Baby Names

Try typing in "Mohammed" or "Forrest" and note the sudden changes. Ask your students why they think there might have been these sudden changes. Ask them to forecast what they think the graph for the girl's name "Sunshine" looks like, and why.

If you are teaching CS1 or CS2, consider retrieving the data and giving something similar as a programming assignment.
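One possible shape for such an assignment is sketched below. It assumes the data arrives as one list of `name,sex,count` lines per year, which is the layout of the U.S. Social Security Administration's public baby-name files; that data source is an assumption here, not something this page specifies. The function names are hypothetical and the sample counts are invented placeholders, not real statistics.

```python
def parse_year(lines):
    """Parse 'name,sex,count' lines into a {(name, sex): count} dictionary."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sex, count = line.split(",")
        counts[(name, sex)] = int(count)
    return counts


def popularity_per_million(data_by_year, name, sex):
    """Return {year: births with this name per million recorded births of that sex}."""
    trend = {}
    for year, counts in sorted(data_by_year.items()):
        total = sum(c for (_, s), c in counts.items() if s == sex)
        hits = counts.get((name, sex), 0)
        trend[year] = 1_000_000 * hits / total if total else 0.0
    return trend


# Invented placeholder data (assumption): NOT real birth counts.
SAMPLE = {
    1990: ["Sunshine,F,10", "Mary,F,990"],
    2000: ["Sunshine,F,40", "Mary,F,960"],
}

if __name__ == "__main__":
    data = {year: parse_year(lines) for year, lines in SAMPLE.items()}
    for year, rate in popularity_per_million(data, "Sunshine", "F").items():
        print(year, round(rate))
```

Normalizing by the total number of births each year, rather than plotting raw counts, keeps population growth from masquerading as a rise in a name's popularity. Students could extend the sketch by reading real files and plotting the result.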

Google's Ngram Viewer

Google's Ngram Viewer is a time machine for words that allows you to explore 5 billion words from 5 million books over 5 centuries. Take a look at: http://books.google.com/ngrams

Below is the result of a search on the phrase "computational thinking". Selecting the year ranges below the graph takes you to the books where this phrase was found.

Ngram Viewer
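The idea behind a viewer like this can be sketched in a few lines: for each year, count how often a phrase occurs relative to the number of same-length word sequences printed that year. The code below is a toy illustration, not Google's implementation. The function names are hypothetical, and the `corpus` dictionary with its sentences is an invented placeholder.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequency_by_year(corpus, phrase):
    """Return {year: share of n-grams in that year's texts that match the phrase}."""
    target = tuple(tokenize(phrase))
    n = len(target)
    if n == 0:
        raise ValueError("phrase must contain at least one word")
    result = {}
    for year, texts in sorted(corpus.items()):
        total = 0
        hits = 0
        for text in texts:
            grams = ngrams(tokenize(text), n)
            total += len(grams)
            hits += sum(1 for gram in grams if gram == target)
        result[year] = hits / total if total else 0.0
    return result


# Invented placeholder corpus (assumption): year -> list of text snippets.
corpus = {
    1980: ["Children learn by building things."],
    2010: [
        "Computational thinking is a skill.",
        "We teach computational thinking early.",
    ],
}

frequencies = phrase_frequency_by_year(corpus, "computational thinking")
for year, share in frequencies.items():
    print(year, share)
```

Dividing by the yearly total matters because far more text is published in recent years, so raw counts alone would make almost every phrase look as if it were rising.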

GoodFilms

See which movies are rewatched the most (x-axis) and which are the most critically acclaimed (y-axis) at Goodfilms. You can filter by where you can watch the movie (Netflix, Hulu, etc.). The bigger the dot, the more reviews it has. Hovering over a dot reveals which movie it is, along with a small graph of other movies that are similar in terms of being rewatched and critically acclaimed.

GoodFil.ms

Google's Earth Engine

See Google's Earth Engine for a time-lapse sequence of satellite images since 1984 that shows changes to the planet's surface. There are some set locations, such as Las Vegas and the man-made islands of Dubai, that show drastic changes, but you can also zoom in on your own location.

Google Earth Engine Dubai

Live Flights Radar

At www.flightradar24.com you can see the pattern of flights in the air right now, along with details for each one.

Real-time flights radar

Music Timeline

Double-click on a genre to further zoom in.

The top part is:

Lines of Code

Educational Attainment interactive map

Educational Attainment

Hacking OKCupid to find true love.

Sources

  1. Steve Ballmer, the former chief executive of Microsoft, has put together a dataset that shows where all government funding (local, state, federal) goes. See usafacts.org

  2. Project Gutenberg. See the related project Cyberwill (modeled after Joe Zachary's Random Writer) for a PC executable and the data files to generate random text in the style of other texts.

  3. Hans Rosling's GapMinder.org. Graph real-world data. See Rosling's TED talks. [Doesn't work with Google Chrome as of 11/3/13]

  4. IBM's Many Eyes. Many datasets, many types of visualizations.

  5. The Google Big Picture group

  6. Government Data:

    1. City of Chicago public data portal

    2. Data.gov

    3. US Census data

  7. Divvy Bikes usage data

  8. See uses of Big Data visualizations through the centuries, along with some modern equivalents

  9. Online book describing how to best visualize data.

  10. DataCounts! gives databases and online tools

See also these examples of how not to represent data:

    1. http://cas.illinoisstate.edu/jpda/charting_data/badcharts.shtml

References

  1. See Alex Pentland's (MIT) 20-minute talk about the implications of Big Data.

Privacy


Latanya Sweeney's site at dataprivacylab.org/people/sweeney has great examples of how data affects privacy. See how unique you are (or not) at aboutmyinfo.org.

On FamilyTreeNow.com, type in your first and last name and state, and see whether it can identify your relatives.


  1. See the April 6, 2014 NY Times article on 9 potential problems with big data. For instance, "from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two."
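The point of that quoted example can be checked numerically: any two series that both trend downward will show a high correlation coefficient, whether or not one has anything to do with the other. The sketch below computes a Pearson correlation from scratch. The two series are invented placeholders, not real murder-rate or browser-share figures, and the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Invented placeholder series (assumption): two unrelated quantities
# that both happen to decline over six consecutive years.
series_a = [10.0, 9.1, 8.3, 7.2, 6.4, 5.0]
series_b = [60.0, 52.0, 47.0, 39.0, 30.0, 26.0]

r = pearson(series_a, series_b)
print(f"correlation = {r:.3f}")
```

A coefficient close to 1 here says only that the two lists fall together. It is a good classroom prompt for discussing why correlation alone cannot establish a causal relationship.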