Polluted Data

  1. Your Current Personal Data Pool Is Polluted

  2. Survivorship Bias and Incorrect Data

  3. History Is Full of Racism, Sexism, and Terrible Things, Therefore Our Big Data Set is Racist, Sexist, and Terrible

  4. Click Bait Titles Appeal to Emotion

  5. Cherry Picking

  6. Nutpicking

  7. Media Consolidation

Your Current Personal Data Pool Is Polluted

Video: 10 Things You have Heard and Re-told but are Completely False - Neil deGrasse Tyson - Cosmology Today - Jun 14, 2016

https://en.wikipedia.org/wiki/List_of_common_misconceptions

  • Color of the sun is . . . ?

  • What goes up must come down?

  • The brightest star in the sky is called . . . ?

  • Days get longer in the summer and shorter in the winter?

  • Sun rises in the east and sets in the west?

  • Solar eclipses are rare?

  • A day is about 24 hours?

How Many Glasses of Water is a Person Supposed to Drink a Day?

  • Article: The Water Myth - McGill - Christopher Labos MD, MSc - May 31, 2018

When You Look At Data From A Point of View, There Will Be Distortion

How Big Is Greenland Compared to Africa?

Why is the Mercator projection a distortion of truth and how does it affect our understanding of the Earth?
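
One way to put rough numbers on the two questions above: on a Mercator map, lengths at latitude φ are stretched by 1/cos(φ), so areas are stretched by 1/cos²(φ). The sketch below is only an illustration. The land areas are approximate figures, 72 degrees north is an assumed "representative" latitude for Greenland, and the function name is made up for this example.

```python
import math

def mercator_area_inflation(latitude_deg):
    """How many times larger an area appears on a Mercator map at this
    latitude, compared with the same area drawn at the equator."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2

# Approximate land areas in square kilometers (assumed, rounded figures)
GREENLAND_KM2 = 2_170_000
AFRICA_KM2 = 30_400_000
GREENLAND_LATITUDE = 72  # assumed representative latitude, degrees north

true_ratio = AFRICA_KM2 / GREENLAND_KM2
inflation = mercator_area_inflation(GREENLAND_LATITUDE)

print(f"Africa is really about {true_ratio:.0f} times the size of Greenland.")
print(f"At {GREENLAND_LATITUDE} degrees north, Mercator inflates areas about {inflation:.0f} times.")
```

Africa straddles the equator, where the inflation factor is close to 1, so the map stretches Greenland while leaving Africa nearly untouched. That is why the two can look similar in size.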


Data/Charts?

https://www.youtube.com/watch?v=E91bGT9BjYk


Data Needs to Be Presented In A Human Context

A little, a good amount, enormous? Feel free to pick your own words.

  • How would you describe 0.5%?

  • How would you describe 5%?

  • How would you describe 50%?

  • How many is 50?

  • How many is 5 thousand of a thing?

  • How many is 5 million of a thing?

In math, one always means one. In context, however, one can mean different things.

  • As a result of his foolishness, one cookie fell on the floor.

  • As a result of his foolishness, ten cookies fell on the floor.

  • As a result of his foolishness, all of the cookies fell on the floor.

  • As a result of his foolishness, all of the Oreos fell on the floor.

  • As a result of his foolishness, all of the home-baked cookies fell on the floor.

When you bring people into it, the context can be wildly different.

  • Today, a Sacramento man died of the flu.

  • Today, your mother died of the flu.

  • Today, ten people in the United States died of the flu.

  • Today, ten people in California died of the flu.

  • Today, ten people in the Natomas community died of the flu.

  • Today, a thousand people in the United States died of the flu.

  • Today, a million people in the United States died of the flu.

  • Today, a thousand Natomas High Schools' worth of people died of the flu.

How we present information can change how we consider it. A number without human context often gets dismissed as a cold, emotionless data point. Sometimes, to get attention and help people grasp the impact of a number, you have to put the number in a different human context.

"A single death is a tragedy; a million deaths is a statistic." - Author uncertain. This quote is attributed to many people in several forms.


What Data You're Not Getting

Survivorship Bias

      • Survivorship Bias: https://www.youtube.com/watch?v=P9WFpVsRtQg

      • Autism Prevalence Unchanged in 20 Years (Steven Novella): There is no autism epidemic. The number of diagnoses has increased, but the evidence strongly suggests this is due to better diagnosis, changing definitions, and greater acceptance. A new study looked at autism prevalence around the world; it showed no change from 1990 to 2010.

      • http://www.usccb.org/issues-and-action/child-and-youth-protection/upload/2019-Annual-Report-Final.pdf Why are there so many new cases against the Catholic Church in 2019? There were 700-1,400 allegations per year leading up to 2019, but 2019 has 4,434 allegations.

          • Page 27: "Compared to 2018, the number of allegations increased significantly. This is in part due to the additional allegations received as a result of lawsuits, compensation programs, and bankruptcies, making up approximately 37% of allegations. These programs allow those who have previously reported allegations as well as those who have not yet come forward, to be considered for some type of monetary compensation."

Always Think About What Data You're Not Getting Right and Why

  • Image: Teen Pregnancies - TruthFacts - Wolff and Morgenthaler

  • Article: This Is How 'False Positives' And 'False Negatives' Can Bias COVID-19 Testing - Ethan Siegel - May 7, 2020

      • Backup Copy: This Is How 'False Positives' And 'False Negatives' Can Bias COVID-19 Testing - Ethan Siegel - May 7, 2020

      • Disease Tests have False Positives and False Negatives

      • Let's say 1 in 10,000 people have a disease, and the test for it has a false positive rate of 1%.

      • If you test 1,000,000 people and the test catches every real case, you will find 100 with the disease and about 10,000 false positives. Health statistics will report that there are about 10,100 cases of the disease in the population. About 10,000 healthy people will be quarantined, drugged, poked, injected, and highly inconvenienced. The population may be scared, behave irrationally, and buy incredible amounts of toilet paper without being able to explain why.

      • False negatives are also a problem. If a test comes back negative for someone who actually has the disease, that person may spread the disease by behaving normally. Even in tests that do not relate to diseases, this has consequences. If there are 6.2 million pregnancies a year in the United States and inexpensive home pregnancy tests give a false negative up to 5% of the time, that makes up to 310,000 women a year who are pregnant, think they are not, and may unknowingly harm their future child through casual alcohol and drug use.
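
The arithmetic in the bullets above can be checked with a short sketch. The function and its parameter names are invented for this illustration, and it assumes the disease test has no false negatives unless you pass a rate in. The exact false positive count comes out to 9,999 (1% of the 999,900 healthy people), which the notes round to 10,000.

```python
def screening_outcomes(population, prevalence, false_positive_rate,
                       false_negative_rate=0.0):
    """Split a tested population into true and false positives."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * (1 - false_negative_rate)
    false_positives = healthy * false_positive_rate
    reported_cases = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "reported_cases": reported_cases,
        # Chance that a person who tests positive is actually sick
        "chance_positive_is_real": true_positives / reported_cases,
    }

# Disease example: 1 in 10,000 sick, 1% false positive rate, 1,000,000 tested
result = screening_outcomes(1_000_000, 1 / 10_000, 0.01)
for name, value in result.items():
    print(f"{name}: {value:,.4f}")

# Pregnancy example: 6.2 million pregnancies, up to 5% false negatives
missed_pregnancies = 6_200_000 * 0.05
print(f"missed pregnancies (upper bound): {missed_pregnancies:,.0f}")
```

The last value in the dictionary is the one that tends to surprise people: under these assumptions, a positive result means roughly a 1% chance of actually having the disease.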


History Is Full of Racism, Sexism, and Terrible Things, Therefore Our Big Data Set is Racist, Sexist, and Terrible

Search images.google.com for each phrase below. How many of the first 50 results are White? Persons of Color?

  1. Beautiful Woman

  2. Successful Woman

  3. Intelligent Woman

  4. Expressive Woman

  5. Woman

  6. Convicted Woman

  7. Beautiful Man

  8. Successful Man

  9. Intelligent Man

  10. Expressive Man

  11. Man

  12. Convicted Man

  13. Farmworkers, Farmers

  14. Teacher, Professor

Charts Can Reveal and Also Obscure Data

https://www.youtube.com/watch?v=O-3Mlj3MQ_Q

Arithmetic Growth adds a fixed amount (+10, +10, +10)

Logarithmic Growth rises quickly at first and then more and more slowly (the reverse of exponential growth)

Exponential Growth multiplies by a fixed factor (x1.2, x1.2, x1.2...)
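
To see how differently these behave over time, here is a small sketch that applies the +10 and x1.2 rules from the definitions above for 30 steps. The starting value of 100 and the function names are assumptions chosen only for illustration.

```python
def arithmetic_growth(start, step, periods):
    """Add the same fixed amount every period."""
    values = [start]
    for _ in range(periods):
        values.append(values[-1] + step)
    return values

def exponential_growth(start, factor, periods):
    """Multiply by the same fixed factor every period."""
    values = [start]
    for _ in range(periods):
        values.append(values[-1] * factor)
    return values

periods = 30
added = arithmetic_growth(100, 10, periods)         # +10, +10, +10 ...
multiplied = exponential_growth(100, 1.2, periods)  # x1.2, x1.2, x1.2 ...

for n in (0, 5, 10, 20, 30):
    print(f"period {n:2d}: arithmetic {added[n]:8.1f}   exponential {multiplied[n]:10.1f}")
```

On a chart with a normal (linear) axis the exponential series quickly dwarfs the arithmetic one, which is one reason the choice of axis scale can reveal or hide what the data is doing.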

Click Bait Titles Appeal to Emotion

Today (6/30/2020) I loaded the YouTube home page and got eight recommendations. A couple of them were political. Five of the eight had a word in all capital letters. There were a few screamed verbs ("FOOL", "REACT", and "HAMMERS") and a couple of screamed noun phrases ("SUPER SNOWFLAKE" and "IMPOSSIBLE card trick").

I reload the page and get "POWERFUL EPIC DEBATE" and "UNCLEARED GLITCH." I reload again and see "WORST TROLL LEVEL" and "LIARS." Again and again: "WRONG", "DETAINED", "VERY", "SUPER TRIGGERED", "REAL REASONS", "THREATENED", "INSANE", "DESTROYS." A few titles are fully capitalized in every word.

Looking at the other titles, you see plenty of editorializing: titles that tell you what to think and how to think about the topic before you watch the video. "Most brutal fails", "Tyrant", "Most terrifying", "Dishonest." The emotionally charged words and the capitalizing are doing something to the reader.

Titles with capitals and charged words are appealing to your base self. Your emotional reactive self. Your instant gratification self. Your unconscious non-thinking self. The charged up words interrupt a normal flow of reading from left to right. A reader instead will see those one or two words before processing the sentence. The mind will see "------- ------ ------- ----- -- --- ------- EXPOSED!" before you process the topic or person involved. You are being provoked over and over again.

Good Titles Respect The Audience

      • A title that respects the audience does not have fully capitalized words other than acronyms (FBI, CIA, USAF).

      • A title that respects an audience provides information without telling you what to think about it.

      • A respected audience member is not screamed at to make them listen.

      • A respected audience is treated like they are capable of making their own decisions.

      • Articles and videos and sources that use normal language and approach you informationally might not get more hits, but they are far more likely to be quality sources.
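
The first rule in the list above is simple enough to check automatically. The sketch below is one possible way to do it, not an established tool. The function name is made up, the acronym list contains only the three examples from the rule, and the third sample title is invented for the demonstration.

```python
import re

# Acronyms that are allowed to be fully capitalized (examples from the rule above)
KNOWN_ACRONYMS = {"FBI", "CIA", "USAF"}

def screamed_words(title, acronyms=KNOWN_ACRONYMS):
    """Return the fully capitalized words in a title, ignoring known
    acronyms and one-letter words such as 'A' and 'I'."""
    words = re.findall(r"[A-Za-z']+", title)
    return [w for w in words
            if len(w) > 1 and w.isupper() and w not in acronyms]

sample_titles = [
    "POWERFUL EPIC DEBATE",
    "IMPOSSIBLE card trick",
    "FBI releases annual report",  # hypothetical title for comparison
]

for title in sample_titles:
    found = screamed_words(title)
    verdict = "screams at you" if found else "respects the audience"
    print(f"{title!r}: {verdict} {found}")
```

A real check would need a much longer acronym list, and it only tests capitalization. It says nothing about charged words such as "brutal" or "terrifying", which still need a human reader.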

Now let's go to some places that are generally respectful with their titles. Here's where I tested and what I found today:

Now I'm going to a few places that are probably pretty bad at this:

      • www.infowars.com: 20 different articles on the home page. Every word of every title is fully capitalized.

      • www.breitbart.com: Over 20 different articles on the home page. Every word of every title is fully capitalized.

      • https://tyt.com: All of the section titles are fully capitalized. All of the shows they produce are fully capitalized. A surprising lack of titles. The site labels everything with the show's name: "THE YOUNG TURKS", "THE DAMAGE REPORT WITH JOHN IADAROLA", and "TYT INVESTIGATES." Prioritizing the medium over the content?

      • www.tmz.com: "BALTIMORE DISCRIMINATION MOTHER AND SON ANNOUNCE LAWSUIT After Denial Over Dress Code", "POOR JUDGEMENT", "RAYSHARD BROOKS MURDER: JUDGE SETS $500 BOND FOR GARRETT ROLF... Brooks' Widow Gives Emotional Impact Statement."

Please keep in mind this is a focused analysis of source titles. This analysis is not addressing other forms of journalistic credibility, political slant, or quality of content of any of these sites. This is just one evaluation method you can use to avoid the worst sources.



Spin, Propaganda, and Polluted Data

Because we live in a human system, everything becomes political. There are political agents who will take data and see it their way. They will only present the parts they agree with. They will hide and delete information that goes against their stance. All of this is bad science. The truth is often lost in human power struggles.

Cherry Picking

Cherry picking is the strategy of collecting data from an experiment and then using only the data that matches what you are trying to prove. A true experiment should have a hypothesis, but a good scientist does not force the data to fit that hypothesis. A good scientist reports all processes and data so that the experiment can be peer reviewed. Cherry-picked conclusions will not be reproducible.

    • Video: BEST TRICK SHOT EVER (LIGHTHOUSE to SHIP)!! - Excerpt starts at 7m40s - How Ridiculous - Jun 2, 2017

    • Video: We Spent 6 Days Attempting a 200m Basketball Shot in Lesotho, Africa - How Ridiculous - Apr 6, 2018

    • Example: Five surveys are ordered to be given in different parts of the country. The surveys ask whether a politician is a good leader. Four of the five surveys come back very negative. The one survey from the politician's home state comes back decently positive. The campaign then issues a press release using that one survey as evidence that people like the politician.

    • Example: The Texas Sharpshooter fallacy is a special form of cherry picking that happens when a tester goes searching for patterns and trends. Imagine a sharpshooter firing at the unpainted side of a barn. The shooter takes a bunch of shots, then walks up to the wall and finds a group of three bullet holes that are very close together. The shooter paints an archery-style target around those three holes, poses next to the target, and posts on social media about how precise the shots were. None of the other shots are in the picture. "To be sure of hitting the target, shoot first, and call whatever you hit the target."
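
The survey example above can be sketched in a few lines. All of the approval numbers and region names here are hypothetical, invented only to show how far a cherry-picked figure can sit from the full picture.

```python
from statistics import mean

# Hypothetical approval ratings (percent) from the five surveys
surveys = {
    "Region A": 31,
    "Region B": 28,
    "Region C": 35,
    "Region D": 30,
    "Home state": 62,
}

# Honest reporting uses every survey
overall_approval = mean(surveys.values())

# Cherry picking reports only the most flattering survey
best_region = max(surveys, key=surveys.get)
cherry_picked_approval = surveys[best_region]

print(f"All five surveys: {overall_approval:.1f}% approval")
print(f"Press release ({best_region} only): {cherry_picked_approval}% approval")
```

Both numbers are "real data." The press release is misleading because of what it leaves out, which is why a reader should always ask how many surveys were run and what the rest of them said.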

Nutpicking

Nutpicking is the strategy of finding the stupidest, craziest, most out-there members of a group and then using those nuts to represent everyone in that group. A claim made by nutpicking rests on polluted, negatively focused data: it cherry-picks the worst examples and mixes in an ad hominem attack that mischaracterizes the rest of the people in the group.

Discovery Abuse: The Document Dump

Imagine asking a teacher a question, and the teacher replies by pointing at the library and saying "good luck." Imagine this scenario with no search engines, no indexes, no glossaries, no tables of contents, and no reference librarians. How does a person acquire truth while they are drowning in data?

  • "If you don't find it in the index, look very carefully through the entire catalogue." - Sears, Roebuck Co. Catalogue: A Window to Turn-of-the-Century America - 1897

During a trial the plaintiffs and defendants are able to demand information from the other side. Often a side is ordered to give documents that they don't want to give because it would hurt their side of the case. The process of requesting and obtaining evidence from the other side is called discovery.

Example: A prosecution thinks a company's financial documents will reveal foul play and files with the court to receive them. The company does not want that information revealed but fails to get the court to deny the subpoena. The company is now ordered to provide the documents but still wants to hide things. The company realizes that humans are limited by time and that lawyers and paralegals cost money. So the company provides its full financial records and all material related to the request: thousands upon thousands of documents and files. The document dump turns the prosecution's search into a hunt for a needle in a haystack, which the company hopes will waste the prosecution's time and resources.

Media Consolidation

https://en.wikipedia.org/wiki/Mayflower_doctrine (1941-1949)

https://en.wikipedia.org/wiki/FCC_fairness_doctrine (1949-1987)


https://www.youtube.com/watch?v=hWLjYJ4BzvI Sinclair's script for stations


https://www.youtube.com/watch?v=x6U2Un5kEdI 11 Local TV Stations Pushed the Same Amazon-Scripted Segment

The package, for which Amazon provided a script to news stations, was produced by Amazon spokesperson Todd Walker. Only one station, Toledo ABC affiliate WTVG, acknowledged that Walker was an Amazon employee, not a news reporter, and noted that Amazon had supplied the video. Other stations that ran the Amazon-provided content as a news package include:

  • WTVJ-NBC, Miami, FL

  • WKRN-ABC, Nashville, TN

  • WLEX-NBC, Lexington, KY (ran twice)

  • WVVA-NBC, Bluefield, WV

  • WTVM-ABC, Columbus, GA (ran twice)

  • KMIR-NBC, Palm Springs, CA (ran three times)

  • WBTW-CBS, Myrtle Beach, SC

  • WOAY-ABC, Bluefield, WV (ran twice)

Video: MSNBC interrupts Congresswoman for report on Justin Bieber - Host Andrea Mitchell interrupts former Congresswoman Jane Harman (D-CA) to report breaking news regarding the arrest of popstar Justin Bieber. Aired on Andrea Mitchell Reports on MSNBC, 22 January 2014.