Analysing (A)
Questions may cover: characteristics of big data (volume, variety, velocity, etc.), generation, analysis, representation (bias and display).
Big data comes from many sources. Examples include transaction processing systems, customer databases, documents, emails, medical records, internet clickstream logs, mobile apps and social networks.
3 Types of Big Data:
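Structured Data: fits a fixed schema of rows and columns, e.g. relational database tables.
Unstructured Data: has no predefined schema, e.g. free text, images, audio and video.
Semi-Structured Data: carries its own tags or keys but no rigid schema, e.g. JSON, XML and emails.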
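A minimal sketch of how the three types differ in practice, using only the Python standard library. The sample records, field names and values below are made up for illustration; they do not come from any real data set.

```python
import csv
import io
import json
import re

# Structured: fixed columns, so every row can be read the same way.
structured = "customer_id,amount\n101,25.50\n102,7.20\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: keys describe the data, but fields can vary per record.
semi_structured = '[{"user": "ana", "tags": ["sale"]}, {"user": "ben"}]'
records = json.loads(semi_structured)
tag_counts = [len(r.get("tags", [])) for r in records]

# Unstructured: free text with no schema; any structure has to be inferred.
unstructured = "Great service, but delivery was late. Will order again!"
words = re.findall(r"[a-z']+", unstructured.lower())

print(total, tag_counts, len(words))
```

Note how the structured rows need no special handling, the semi-structured records need a default for the missing "tags" field, and the unstructured text needs a tokenising step before anything can be counted.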
Key aspects of BIG DATA
characteristics of big data (volume, variety, velocity, etc.)
generation
Relevant algorithms or other mechanisms behind BIG DATA
Algorithms
Association rule learning → Apriori
Network analysis → PageRank (the link-ranking algorithm used by Google)
Sentiment analysis ("opinion mining") → e.g. gauging opinion in newspaper articles
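Two of the algorithms above can be sketched in a few lines. This is a simplified illustration, not a production implementation: the function names, the shopping baskets, the link graph, the damping factor of 0.85 and the iteration count are all assumptions chosen for the example. The Apriori part finds frequent itemsets only (it does not go on to derive rules), and the PageRank part uses plain power iteration on a tiny graph.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its smaller subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # A page with no outgoing links spreads its rank over every page.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
itemsets = apriori(baskets, min_support=2)
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In the basket example, "milk" and "butter" appear together only once, so that pair is dropped and the three-item set is never counted. This pruning is what keeps Apriori workable on large transaction logs.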
Techniques
Parallel Computing → multi-core computer processing, distributed computing, and clustered computing
Visualisation → used to help us find patterns, trends or correlations
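The idea behind parallel computing can be sketched as split, process in parallel, then combine. The example below is an assumption-laden illustration: the function names and sample text are made up, and it uses a thread pool from the standard library so that it runs anywhere. For CPU-heavy work on multiple cores, `ProcessPoolExecutor` from the same module could be passed in instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def parallel_word_count(chunks, executor_cls=ThreadPoolExecutor, workers=4):
    """Count words across chunks, processing the chunks concurrently."""
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        # Each chunk is handled by a separate worker.
        for partial in pool.map(count_words, chunks):
            # Reduce step: merge the partial counts into one result.
            total.update(partial)
    return total


chunks = ["big data big", "data is big", "small data"]
word_counts = parallel_word_count(chunks)
```

Distributed and clustered computing follow the same pattern, except that the chunks live on different machines and the partial results travel over a network.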
How BIG DATA is used, is implemented, or occurs, giving examples
Key problems or issues related to BIG DATA and how these have been or may be addressed
Big data sets range from terabytes (TB) to petabytes (PB), exabytes (EB), and beyond. For scale:
Terabyte (TB): Roughly 1 trillion bytes.
Petabyte (PB): Roughly 1,000 terabytes.
Exabyte (EB): Roughly 1,000 petabytes.
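The arithmetic behind these units can be checked with a short sketch. It assumes decimal (SI) units, where each step is a factor of 1,000; binary units such as the tebibyte (2**40 bytes) are slightly larger. The helper name `human_readable` is made up for this example.

```python
TB = 10**12        # 1 terabyte: 1 trillion bytes
PB = 1000 * TB     # 1 petabyte: 1,000 terabytes
EB = 1000 * PB     # 1 exabyte: 1,000 petabytes

UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(n_bytes):
    """Express a byte count in the largest unit that keeps the value below 1,000."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(human_readable(TB))          # 1 TB
print(human_readable(2.5 * PB))    # 2.5 PB
print(EB // TB)                    # 1000000 terabytes in one exabyte
```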
The size of Big Data keeps increasing because of the proliferation of digital devices, the growth of the internet, and the rising amount of data generated by businesses, scientific research, and more. The challenge is not just size but also complexity: the data must be processed and analyzed to extract meaningful insights, which requires specialized tools, technologies, and approaches.
GENERATION
Process of creating or collecting large volumes of data over time.
Data can come from various sources like sensors, social media, IoT devices, etc.
Data creation is continuous and ongoing.
Driven by advancements in technology, which increase data generation rates.
Requires advanced storage, processing, and analysis techniques.
Gives rise to the concept of big data because of the scale and complexity of the resulting datasets.
Value comes from extracting insights from the generated data.
Processing relies on distributed computing, cloud technology, and advanced analytics.
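Because data is generated continuously, it is often processed as a stream rather than loaded all at once. The sketch below illustrates this with a hypothetical sensor that yields synthetic readings and a running average that is updated one reading at a time. The sensor, its values and the function names are invented for the example.

```python
def sensor_readings(n):
    """Hypothetical sensor: yields n synthetic temperature readings."""
    for i in range(n):
        # Deterministic stand-in for real measurements.
        yield 20.0 + (i % 5)


def running_mean(stream):
    """Yield (count, mean) after each reading without storing past readings."""
    count, mean = 0, 0.0
    for value in stream:
        count += 1
        # Incremental update: shift the mean towards the new value.
        mean += (value - mean) / count
        yield count, mean


for count, mean in running_mean(sensor_readings(5)):
    print(count, mean)
```

Only the current count and mean are kept in memory, so the same approach works whether the stream holds five readings or five billion.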