Need of Big Data Analytics
- Data storage keeps getting cheaper, so organisations are increasingly interested in storing every kind of customer data, including transaction history, browsing behavior, and user logs.
- Processing data at this scale requires a modern framework built for large volumes.
What is Big Data?
- Big data is commonly described by three properties, known as the three Vs:
- Volume - The size of the data. Ex: 1 terabyte, 10 terabytes
- Variety - The different formats of the data. Ex: .txt, .xml, .jpg, .mp4
- Velocity - The rate at which data is generated. Ex: one terabyte per day
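To make the velocity example concrete, the short sketch below converts "one terabyte per day" into megabytes per second. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name is only illustrative.

```python
# Convert a data velocity given in terabytes per day into megabytes per second.
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mb_per_sec(tb_per_day: float) -> float:
    """Return the equivalent sustained rate in MB/s."""
    bytes_per_second = tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY
    return bytes_per_second / BYTES_PER_MB


rate = tb_per_day_to_mb_per_sec(1)
print(f"{rate:.2f} MB/s")  # prints "11.57 MB/s"
```

So a source producing one terabyte per day has to be ingested at a sustained rate of roughly 11.6 MB/s.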
Structured Data
- Data that resides in a relational database in the form of tables, with fixed columns and data types.
- Structured data is commonly estimated to make up about 10% of all data.
- Example: SQL Tables
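As a small illustration of structured data, the sketch below creates a SQL table in an in-memory SQLite database using Python's built-in sqlite3 module. The table name, columns, and rows are made up for the example.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Structured data: every row follows the same fixed schema.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Chennai")],  # hypothetical sample rows
)

# Because the schema is known, the data can be queried directly with SQL.
rows = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
print(rows)
conn.close()
```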
Semi Structured Data
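- Data that doesn't reside in a relational database but still carries some organising structure, such as tags, delimiters, or key-value pairs.
- Semi-structured data is commonly estimated to make up about 10% of all data.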
- Example: CSV, XML, JSON files
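The sketch below shows what "semi-structured" can look like in practice: two JSON records that share some keys but not others, parsed with Python's built-in json module. The records themselves are invented for illustration.

```python
import json

# Hypothetical JSON records: both have "id" and "name",
# but only one has "email" and only the other has "tags".
raw = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "tags": ["premium", "mobile"]}
]"""

records = json.loads(raw)

# Collect every key that appears in any record; there is no fixed schema.
fields = sorted({key for record in records for key in record})
print(fields)

# Missing keys have to be handled explicitly, e.g. with dict.get().
for record in records:
    print(record["name"], record.get("email", "no email"))
```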
Unstructured Data
- Data that has no predefined format or schema.
- Unstructured data is commonly estimated to make up about 80% of all data.
- Example: Text messages, images, videos, social media data, website content
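Unstructured data usually needs some processing before it can be analysed. As a minimal sketch, the code below turns a free-text message into word counts using only the standard library. The message is a made-up example and the tokenisation rule (lowercase letters and apostrophes) is a simplifying assumption.

```python
import re
from collections import Counter

# A hypothetical free-text message with no fields or schema.
message = "Great phone, great battery. Delivery was slow."

# Simplified tokenisation: lowercase the text and keep runs of letters/apostrophes.
words = re.findall(r"[a-z']+", message.lower())

# Counting words gives the text a structure that can be analysed.
counts = Counter(words)
print(counts.most_common(1))  # prints [('great', 2)]
```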
Big Data Analytics
- The core of Big Data Analytics is the process of converting large amounts of unstructured raw data, retrieved from different sources, into a data product that is useful to organizations.
- Its goal is to make sense of large volumes of data.
Data Scientist
- The role of a data scientist is normally associated with tasks such as predictive modeling, developing segmentation algorithms, and building recommender systems, often while working with raw unstructured data.
- The nature of their work demands a deep understanding of mathematics, applied statistics and programming.
Here is a set of skills a data scientist normally needs to have:
- Programming in a statistical language or package such as R, Python, SAS, or SPSS
- Cleaning, extracting, and exploring data from different sources
- Research, design, and implementation of statistical models
- Deep statistical, mathematical, and computer science knowledge
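As a small, hedged illustration of the "clean, extract, and explore" skill, the sketch below reads a messy CSV snippet, drops rows with a missing amount, strips stray whitespace, and computes a simple summary statistic. The data, the column names, and the clean_rows helper are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical raw data: one amount has a leading space, one is missing.
raw_csv = """customer,amount
Asha, 120.50
Ravi,
Meena,80
"""


def clean_rows(text):
    """Return rows with a usable numeric amount, with whitespace stripped."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip rows where the amount is missing
        cleaned.append({"customer": row["customer"].strip(), "amount": float(amount)})
    return cleaned


rows = clean_rows(raw_csv)
mean_amount = statistics.mean(r["amount"] for r in rows)
print(len(rows), mean_amount)  # prints "2 100.25"
```

In practice a library such as pandas is often used for this kind of work; the standard library is used here only to keep the example self-contained.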