Welcome to A-level ICT
Big Data refers to data sets whose volume is too large for traditional analytical tools to store and process.
In relation to data processing 'Big Data' can be described in terms of the following characteristics:
Volume;
Variety;
Validity;
Variability;
Velocity;
Complexity.
You also need to be aware of the impact of Big Data on the following:
Data warehousing and data mining.
Detecting and preventing fraud.
Marketing campaigns.
Combining big data with predictive analysis.
VOLUME
Volume refers to how much data there is: Big Data implies enormous volumes of data.
Organisations collect data from a variety of sources, including business transactions, social media and information from sensors or machine-to-machine data.
• Data is generated by machines, devices and networks
• Data is generated by human interaction on many systems like social media
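As a minimal illustration of why volume matters, the Python sketch below summarises a very large file of sensor readings one row at a time instead of loading it all into memory (the file name sensor_readings.csv and its 'value' column are invented for this example):

# Minimal sketch: summarising a very large CSV of sensor readings
# without loading the whole file into memory at once.
# (sensor_readings.csv is a hypothetical file used for illustration.)

import csv

total = 0.0
count = 0

with open("sensor_readings.csv", newline="") as f:
    reader = csv.DictReader(f)          # reads one row at a time
    for row in reader:                  # the file is streamed, not loaded whole
        total += float(row["value"])
        count += 1

if count:
    print(f"Processed {count} readings, average value {total / count:.2f}")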
VARIETY
There are many sources and types of data, both structured and unstructured.
Variety refers to the number of different types of data: it comes in all kinds of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, video, audio, stock ticker data and financial transactions.
Other examples include word-processed documents, PDFs, spreadsheets, photos and output from monitoring devices.
This variety of unstructured data creates problems for storing, mining and analysing data.
In the context of financial transactions, a stock ticker is a report of the price of certain securities, updated continuously throughout the trading session by the various stock exchanges. A "tick" is any change in price, whether up or down. Many of today's fully electronic stock tickers display market data in real time or with a small delay.
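To make the structured/unstructured distinction concrete, here is a small, hypothetical Python sketch: the same payment fact is easy to read from a structured record but has to be searched for in free text (both sample records are invented):

# Minimal sketch contrasting structured and unstructured data.
# The sample records below are invented for illustration.

import json
import re

# Structured: fields are named and typed, so values can be read directly.
structured = json.loads('{"customer_id": 42, "amount": 19.99, "currency": "GBP"}')
print(structured["amount"])

# Unstructured: the same fact buried in free text has to be searched for.
unstructured = "Customer 42 paid £19.99 by card this morning."
match = re.search(r"£(\d+\.\d{2})", unstructured)
if match:
    print(float(match.group(1)))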
VALIDITY
Validity refers to the correctness of the data for its intended use.
The initial data is likely to be very 'dirty'; at first it is more important to see whether links/relationships exist within it.
The data then needs to be validated before it is applied in an operational setting.
Big data sources need to be valid if they’re going to be used for future research.
The quality of the data captured can vary greatly, and the accuracy of any analysis depends on the accuracy/veracity of the source data.
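A minimal Python sketch of validating 'dirty' data before operational use, assuming some invented records and simple rules (name present, age plausible):

# Minimal sketch of validating 'dirty' data before it is used operationally.
# The rules and sample records are invented for illustration.

raw_records = [
    {"name": "Ann",  "age": "34"},
    {"name": "",     "age": "27"},     # missing name
    {"name": "Bob",  "age": "-5"},     # impossible age
    {"name": "Cara", "age": "41"},
]

def is_valid(record):
    """A record is kept only if the name is present and the age is plausible."""
    if not record["name"].strip():
        return False
    try:
        age = int(record["age"])
    except ValueError:
        return False
    return 0 <= age <= 120

clean = [r for r in raw_records if is_valid(r)]
print(f"{len(clean)} of {len(raw_records)} records passed validation")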
VARIABILITY
In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks.
Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data.
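As a rough illustration, the Python sketch below flags an event-triggered spike by comparing each hour's message count with the overall average (the counts and the 3x threshold are invented for this example):

# Minimal sketch of spotting an event-triggered peak in data volume.
# The hourly message counts are invented for illustration.

hourly_counts = [120, 115, 130, 125, 980, 140, 118]   # one hour spikes

average = sum(hourly_counts) / len(hourly_counts)

for hour, count in enumerate(hourly_counts):
    if count > 3 * average:                 # simple threshold: 3x the mean
        print(f"Hour {hour}: {count} messages - possible trending event")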
VELOCITY
Velocity refers to the speed at which data is generated and must be processed.
Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
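A minimal Python sketch of dealing with data as it streams in, using a simulated sensor feed; a real system would read from a message queue, socket or smart meter rather than a random-number generator:

# Minimal sketch of handling a stream of readings in near-real time.
# The sensor feed is simulated; the threshold and timings are invented.

import random
import time

def sensor_stream(n):
    """Simulate n temperature readings arriving one after another."""
    for _ in range(n):
        yield random.uniform(15.0, 35.0)
        time.sleep(0.1)                    # readings arrive continuously

for reading in sensor_stream(20):
    if reading > 30.0:                     # act on each value as it arrives,
        print(f"Alert: high temperature {reading:.1f} C")   # not in a later batch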
COMPLEXITY
Today's data comes from multiple sources, which makes it difficult and complex to link, match, cleanse and transform data across systems.
However, it’s necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control.
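To illustrate linking and cleansing across systems, here is a small hypothetical Python sketch that matches web orders to CRM records after tidying the email addresses (all names, emails and order numbers are invented):

# Minimal sketch of linking records from two systems that describe
# the same customers in different formats. All data is invented.

crm = {
    "ann.smith@example.com": {"name": "Ann Smith"},
    "bob.jones@example.com": {"name": "Bob Jones"},
}

web_orders = [
    {"email": "ANN.SMITH@EXAMPLE.COM ", "order": "A-1001"},   # messy casing/spacing
    {"email": "carol.w@example.com",    "order": "A-1002"},   # no CRM match
]

for order in web_orders:
    key = order["email"].strip().lower()        # cleanse before matching
    customer = crm.get(key)
    if customer:
        print(f"Order {order['order']} linked to {customer['name']}")
    else:
        print(f"Order {order['order']} has no matching customer record")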
Listen to the following podcast with Brian Cox: it brings data to life in the real world, is quite funny, and explains how big data is used and turned into information/knowledge.
After listening, answer the question above.