collect
data engineer
prepare
data engineer
data scientist
use
explore
data scientist
analyse
data scientist
data analyst
communicate
data analyst
archive
data engineer
data analyst
destroy
data engineer
policy
trustworthy
communication
open
clear
up to date
control
consistency
standards
change management
standards
definition
process
operation model
structure
centralised
top down
gatekeeper on high, specialised hierarchy level
difficult to scale
data quantity risk
decentralised
bottom up
democratic on low, general hierarchy level
easily scalable
data quality risk
responsibility
data roles
data producer
data consumer
proper use
transparency
clear user roles
legal compliance
confidentiality
confidentiality level classification
access
limitation
purpose
scope
minimization
identity and access management (IAM)
access points
access path
access window
access code
password
firewall
integrity
modification
write rights
encryption
hashing
backup
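The hashing item above can be illustrated with a checksum comparison. This is a minimal sketch, assuming SHA-256 as the hash function and a digest that was recorded when the data was written; the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def is_unmodified(payload: bytes, expected_digest: str) -> bool:
    """Check the payload against a digest recorded earlier (assumed to be trusted)."""
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sha256_hex(payload), expected_digest)
```

A digest only detects modification. It does not prevent it and does not restore the data, which is why write rights and backups appear alongside hashing.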
data privacy
anonymisation
use
user training
transparency
clear user roles
meeting standards
protection
risk level
classes
public data
internal data
confidential data
personally identifiable information (PII)
phone numbers
bank account number
top secret data
research
intelligence
social security number
encryption
audit
retention
destruction
monitoring
testing
contingency plan
response policy
internal vs external actors
communication
information channels to the customers
contingency plan
response policy
information channels to the company
threats
current
general
training
reporting channels from the company
whistleblower
access
timely
foresight
data quality ⇒ data trust ⇒ data value ⇒ business value
accurate – Does the data reflect the real world truly?
representing the truth
master data
"golden record" across systems
identify
match
merge
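The identify, match and merge steps could be sketched as follows. This assumes a normalised e-mail address as the match key and a "latest non-empty value wins" merge policy; both are illustrative choices, and the field names (`email`, `updated`) are hypothetical.

```python
from collections import defaultdict


def match_key(record: dict) -> str:
    """Identify: derive a normalised key (assumption: e-mail identifies a person)."""
    return record["email"].strip().lower()


def build_golden_records(records: list[dict]) -> list[dict]:
    """Match records by key, then merge each group into one golden record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)  # match

    golden = []
    for key, group in groups.items():
        merged = {}
        # merge: newer records overwrite older ones, empty values never overwrite
        for record in sorted(group, key=lambda r: r["updated"]):
            for field, value in record.items():
                if value not in (None, ""):
                    merged[field] = value
        merged["email"] = key  # store the normalised key
        golden.append(merged)
    return golden
```

Real master data management usually needs fuzzy matching and explicit survivorship rules per field; the exact-key match here is only the simplest case.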
reference data
standardisation across systems
coded data (e.g. country codes)
currency
etc.
metadata
definitions
business glossary
describe business terms
data dictionary
field description
update timestamp
data owner
technical information
location
relationship
formats
data catalogue
organised inventory
find metadata
data marketplace
links business glossary and data dictionary
data lineage
data flow
How was data collected?
When?
Where?
By whom?
Which means?
consistent – Is the data the same across data instances?
the same across instances
over datasets
over time
valid – Does the data meet formats and standards?
business context
complete – Are all required fields populated with data?
dataset
record
data element
value population
topic
time
unique – Are there unwanted duplications in the data?
distorting erroneous duplicates
timely – Is the data punctually available?
on time
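Three of the quality dimensions above (complete, unique, valid) lend themselves to simple automated checks. The sketch below assumes rows are dictionaries with hypothetical `id` and `country` fields and that a valid country is a two-letter uppercase code.

```python
import re

REQUIRED_FIELDS = ("id", "country")    # hypothetical schema
COUNTRY_CODE = re.compile(r"[A-Z]{2}")  # assumed format: two uppercase letters


def completeness(rows):
    """Complete: share of rows in which every required field is populated."""
    if not rows:
        return 1.0
    complete = sum(
        all(row.get(field) not in (None, "") for field in REQUIRED_FIELDS)
        for row in rows
    )
    return complete / len(rows)


def duplicate_ids(rows):
    """Unique: ids that occur more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        key = row.get("id")
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


def invalid_rows(rows):
    """Valid: rows whose country does not match the assumed format."""
    return [
        row for row in rows
        if not COUNTRY_CODE.fullmatch(str(row.get("country") or ""))
    ]
```

Accuracy, consistency and timeliness need a reference to compare against (the real world, another dataset, a delivery deadline), so they are harder to check from a single table.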
data profile
statistics on the data
expected results
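A data profile for a single column might look like the sketch below. The chosen statistics (count, missing, distinct, min, max) are an assumption about what a minimal profile contains; they can then be compared with the expected results.

```python
def profile(values):
    """Return basic statistics for one column; None marks a missing value."""
    present = [value for value in values if value is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present, default=None),
        "max": max(present, default=None),
    }
```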
data quality rules
non-crucial errors with little impact
detective rules
find the source issue
does not block load of faulty data downstream
business process runs with slight errors
crucial errors with big impact
preventive rules
blocks load of faulty data downstream
business process is blocked
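The difference between detective and preventive rules can be sketched in a few lines. The rule names, the `RULES` list and `load_batch` are hypothetical; the point is only that a failed detective rule is recorded while a failed preventive rule stops the load.

```python
class BlockedLoad(Exception):
    """Raised when a preventive rule fails and the batch must not be loaded."""


# Hypothetical rules: (name, check, is_preventive)
RULES = [
    ("amount_present", lambda row: row.get("amount") is not None, True),
    ("note_filled", lambda row: bool(row.get("note")), False),
]


def load_batch(rows, rules=RULES):
    """Validate the whole batch first, then return it with detective findings."""
    findings = []
    for row in rows:
        for name, check, is_preventive in rules:
            if check(row):
                continue
            if is_preventive:
                # crucial error: block the load, nothing goes downstream
                raise BlockedLoad(f"rule {name!r} failed for {row!r}")
            # non-crucial error: record it so the source issue can be found
            findings.append((name, row))
    return list(rows), findings
```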
data quality KPI
data quality alert thresholds
urgency
critical
impact
priority
consequence
inform
alert
stop
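The three consequences (inform, alert, stop) can be tied to a data quality KPI through thresholds. The threshold values below are placeholders chosen for illustration, not recommendations, and `escalation` is a hypothetical name.

```python
def escalation(kpi: float,
               inform_below: float = 0.99,
               alert_below: float = 0.95,
               stop_below: float = 0.90) -> str:
    """Map a KPI between 0 and 1 (higher is better) to a consequence."""
    if kpi < stop_below:
        return "stop"    # critical: halt the process
    if kpi < alert_below:
        return "alert"   # needs attention soon
    if kpi < inform_below:
        return "inform"  # notify the data owner
    return "ok"
```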
machine learning
data anomaly detection
data drift
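As a rough illustration of drift detection, the sketch below flags a batch whose mean has moved far from a reference sample. This is a crude mean-shift heuristic, not a full statistical test or a machine learning model, and the threshold of 3 reference standard deviations is an assumption.

```python
from statistics import mean, stdev


def mean_shift_score(reference, current):
    """Shift of the current mean, in units of the reference standard deviation."""
    spread = stdev(reference)  # needs at least two reference values
    shift = abs(mean(current) - mean(reference))
    if spread == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def has_drifted(reference, current, threshold=3.0):
    """Flag drift when the mean shift exceeds the assumed threshold."""
    return mean_shift_score(reference, current) > threshold
```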
collect data and store data
data engineer
data pipelines
data lake
data catalogue
owner, use, update
database
data warehouse
big data
volume, variety, velocity, veracity, value
parallel computing
prepare data
data pipelines
data lake
data warehouse
database
data engineer
data scientist
explore data
data scientist
machine learning scientist
AI scientist
data analyst
analyse data and experiment on data
data scientist
data analyst
communicate results
data analyst
data engineer
ETL extract, transform, load
gather data and store data
data pipelines
data lake
database
big data
volume, variety, velocity, veracity, value
prepare data
build databases
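The ETL steps (extract, transform, load) can be shown end to end with the standard library. This is a minimal sketch: the CSV content, the `payments` table and the cleaning rules are made up for illustration, and an in-memory SQLite database stands in for a real target system.

```python
import csv
import io
import sqlite3

# Made-up raw input; in practice this would come from a file or an API
RAW_CSV = "name,amount\n alice ,10.5\nBOB,3\n"


def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, convert amounts to float."""
    return [(row["name"].strip().title(), float(row["amount"])) for row in rows]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()
```

Usage: `conn = sqlite3.connect(":memory:")` followed by `load(conn, transform(extract(RAW_CSV)))`.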
data scientist
prepare data
use databases
explore data
machine learning scientist
AI scientist
analyse data and experiment on data
data analyst
explore data
analyse data and experiment on data
communicate results
data-driven decision making
critical analysis
identify opportunities
measure success
data literacy
experimentation on data
test
learn
adapt
improve operations
achieve goals
reading data
relevance
limits
bias
sources
analysing data
insights
critical thinking
communicating data
visualisation
storytelling