"data is the new oil of our new digital economy"
Uses of data
Data mining:
definition- discovering patterns and trends from large, unrefined data sets
purpose- to extract (or mine) knowledge from data
example- Shopee
collect personal data of customers (purchases, views, price, frequency, shopping cart)
use code to mine the data
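A minimal sketch of what "use code to mine the data" could look like: counting which items are bought together most often. The cart contents and the function name most_common_pairs are made up for illustration; this is not how any real shop does it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping carts; each inner list is one customer's order.
carts = [
    ["phone case", "charger", "earbuds"],
    ["phone case", "charger"],
    ["earbuds", "charger"],
    ["phone case", "charger", "screen protector"],
]

def most_common_pairs(orders, top_n=3):
    """Count how often each pair of items appears in the same order."""
    pair_counts = Counter()
    for order in orders:
        # sorted() makes ("a", "b") and ("b", "a") count as the same pair
        for pair in combinations(sorted(set(order)), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(top_n)

# The pattern found here: chargers and phone cases are often bought together.
print(most_common_pairs(carts, top_n=1))
```

The pattern (which pair is most frequent) is the 'knowledge' that was not visible in any single order.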
Data matching:
definition- comparing data from two or more distinct data sets
purpose- identifying a key link between two data sets
example- cancer diagnosis
record data on skin cancer patients from all over the world
find similarities in relevant data (time spent in sun, # of cigarettes smoked, etc)
suggest potential causes of skin cancer
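A rough sketch of data matching, assuming two separate data sets that share a patient ID field. All records, field names and the function name match_records are invented for illustration.

```python
# Hypothetical data set A: lifestyle survey, keyed by patient ID.
survey = {
    "P1": {"sun_hours_per_week": 30},
    "P2": {"sun_hours_per_week": 5},
    "P3": {"sun_hours_per_week": 25},
}

# Hypothetical data set B: clinic records, keyed by the same patient ID.
clinic = {
    "P1": {"diagnosis": "skin cancer"},
    "P3": {"diagnosis": "skin cancer"},
    "P4": {"diagnosis": "skin cancer"},
}

def match_records(a, b):
    """Return merged records for every key that appears in both data sets."""
    shared_keys = sorted(a.keys() & b.keys())
    return {key: {**a[key], **b[key]} for key in shared_keys}

# Only P1 and P3 appear in both sets, so only they can be matched.
matched = match_records(survey, clinic)
print(matched)
```

The shared patient ID is the key link: without it, the two data sets could not be compared record by record.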
Data brokers:
definition- a company that specializes in personal data collection (income, age, location, etc.)
these companies mine and match this data, and sell it as ‘knowledge’ to other companies
how do they collect this data?
public records (court filings, property assets, etc.)
from apps that collect our data
predictions - using an algorithm to guess your income based on address and purchase history
third party cookies
DATA BROKER CASE STUDY
infographic
stakeholders
how was data collected, mined, or matched
to whom was the data sold
legality: were any laws broken
discussion about values and ethics
Warm Up:
identify two stages of the data life cycle
data creation
storage
distinguish between data matching and data mining
data mining is discovering patterns and trends in large, unrefined data sets to gain knowledge
for example: companies collect search history and other relevant information to find out users’ preferences
data matching is comparing two or more data sets to find a key link between them
for example: a company finds a relation between a specific age group and its taste in music, such as people in their 40s tending to prefer ballads
explain how data brokers collect data
from apps that collect our data.
for example: a business might buy our ‘information’ or ‘data’ from other companies
predictions
for example: TikTok collects data on the kind of content we watch the most in order to suggest similar videos
Ways To Collect And Organize Data
primary data: original data created/collected for the first time for a specific purpose
secondary data: data that has already been created by another source
Data storage: The database
database: a collection of data, organized into tables by field names and records about a specific entity
primary key: a field that contains a unique value for each record, so it can be used to identify that record
relational database: a database that connects multiple tables
The tables are linked by key fields. We can use an entity-relationship diagram to visualize relational databases
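A small sketch of two tables linked by a key field, using Python's built-in sqlite3 module with an in-memory database. The table names, field names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# customer_id is the primary key: every customer gets a unique value.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# purchase.customer_id is the key field that links each purchase to a customer.
conn.execute(
    """CREATE TABLE purchase (
           purchase_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
           item TEXT NOT NULL)"""
)

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "An"), (2, "Binh")])
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(10, 1, "charger"), (11, 1, "earbuds"), (12, 2, "phone case")],
)

# JOIN follows the link between the two tables.
rows = conn.execute(
    """SELECT customer.name, purchase.item
       FROM customer
       JOIN purchase ON purchase.customer_id = customer.customer_id
       ORDER BY purchase.purchase_id"""
).fetchall()
print(rows)
```

Trying to insert a second customer with customer_id 1 raises sqlite3.IntegrityError, because primary key values must be unique.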
To ensure database accuracy, the following strategies are often used:
validation: database design that ensures only valid data is entered to minimize error during data entry
verification: checking that data is accurate and up to date, both during and after data entry, to remove errors
example: email verification: clicking a verification link to confirm that an email address is active before it is added to a database
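A possible sketch of validation at data entry. The field names, the age range 0 to 120 and the simple email pattern are assumptions chosen for illustration. Note this only checks that the data looks valid; it does not verify that the email address really exists (that is what the verification link is for).

```python
import re

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record):
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")
    if not EMAIL_PATTERN.match(record.get("email", "")):
        errors.append("email format is invalid")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be a whole number between 0 and 120")
    return errors

print(validate_record({"name": "An", "email": "an@example.com", "age": 16}))
print(validate_record({"name": "", "email": "not-an-email", "age": 200}))
```

A record would only be saved to the database when the returned list is empty.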
Encryption: the process of converting readable data (plaintext) into unreadable characters (ciphertext) to prevent unauthorized access to the data
How can we ensure that the intended recipient of the data is able to decrypt the message?
Decryption is possible thanks to a key that can ‘unlock’ (or decrypt) ciphertext back into plaintext
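A toy sketch of the key idea, using XOR with a random key as long as the message (a one-time pad). The function names are made up, and this is for understanding only; real systems should use a vetted encryption library, not this code.

```python
import secrets

def generate_key(length):
    """Create a random key with the same number of bytes as the message."""
    return secrets.token_bytes(length)

def xor_bytes(data, key):
    """XOR each data byte with the matching key byte.

    Applying it once encrypts; applying it again with the same key decrypts.
    """
    if len(key) != len(data):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = "meet at noon".encode("utf-8")
key = generate_key(len(plaintext))

ciphertext = xor_bytes(plaintext, key)   # encrypt: unreadable without the key
recovered = xor_bytes(ciphertext, key)   # decrypt: same key 'unlocks' it

print(recovered.decode("utf-8"))
```

Only someone who holds the same key can turn the ciphertext back into plaintext, which is why the key must be shared safely with the intended recipient.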
Data Masking
Definition: replacing (or masking) real data with fictional or unreadable data during storage to protect the data in the database
Example: a data broker masks all of its data during storage. Once the data needs to be used, the data broker uses its own key to unmask (decrypt) the data
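A simple sketch of one kind of masking: hiding all but the last few characters of a value. The function name mask_value and the sample number are hypothetical. Unlike the key-based example above, this version is one-way: the hidden characters cannot be recovered from the masked value.

```python
def mask_value(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters of a string."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]

print(mask_value("4111111111111111"))
print(mask_value("4111111111111111", visible=0))
```

Values shorter than `visible` are returned unchanged in this sketch, so a real system would need a stricter rule for short values.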
Data Erasure
Definition: the permanent destruction of data so that it cannot be recovered
Example:
Physical method - putting a phone through a shredder
Software method - computer programs that overwrite data with 0s and 1s
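A rough sketch of the software method: overwrite a file's contents with random bytes, then delete it. The function name overwrite_and_delete is made up. Big assumption: on SSDs and some file systems, overwriting in place does not guarantee the old data is gone, so treat this as an illustration of the idea only.

```python
import os
import secrets

def overwrite_and_delete(path, passes=1):
    """Overwrite a file with random bytes `passes` times, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            # Reads the whole replacement into memory; fine for small files.
            f.write(secrets.token_bytes(size))
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the write to disk
    os.remove(path)
```

Usage would be something like overwrite_and_delete("old_report.txt"), where the file name is only an example.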
Data Deletion
Definition: sending files to the recycle bin to get rid of data that is no longer needed (it’s possible to recover them again)
Example: after year 11, you delete all of your IGCSE files that you no longer need
Blockchain
Definition: a digital, decentralized record of transactions that protects data transfers (digital transformation) from hacking, fraud or interference
Example: change in ownership of something
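A minimal sketch of the chaining idea, assuming each block stores the hash of the block before it. It runs on one machine, so it leaves out the decentralized part (many computers each holding a copy). The function names and the ownership records are invented for illustration.

```python
import hashlib
import json

def block_hash(block):
    """Hash the block's contents (everything except its own stored hash)."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "data", "prev_hash")}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def add_block(chain, data):
    """Append a block that points back to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain):
    """Check every block's hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, "Alice transfers ownership of the bike to Bob")
add_block(chain, "Bob transfers ownership of the bike to Chi")
print(is_valid(chain))

# Tampering with an old record breaks the chain's hashes.
chain[0]["data"] = "Alice transfers ownership of the bike to Mallory"
print(is_valid(chain))
```

Because each block depends on the one before it, changing an old transaction is detected unless every later block is also rewritten.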
GDPR