Big data is commonly defined as a collection of datasets so large that it becomes difficult to process using traditional techniques. Handling data at this scale is part of what differentiates data science problems from classical statistical problems, which are typically based on small samples.
This section focuses on reducing the dimensions of data that repeats the same information many times over. You can view this reduction as a kind of information compression, similar to compressing files on a hard disk to save space.
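To make the compression idea concrete, here is a minimal sketch that assumes NumPy is available. The dataset is made up for illustration: six columns that are all linear mixes of just two underlying signals, so the table repeats the same information several times. The names (base, mixing, reduced, restored) are hypothetical, and singular value decomposition is only one of several techniques you could use for this kind of reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 observations of 2 independent signals.
base = rng.normal(size=(100, 2))

# Build 6 columns, each a linear mix of the 2 signals, so the
# table stores the same information several times over.
mixing = rng.normal(size=(2, 6))
X = base @ mixing

# Center each column, then factor the centered table with SVD.
col_means = X.mean(axis=0)
Xc = X - col_means
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Count how many directions actually carry information.
effective_rank = int(np.linalg.matrix_rank(Xc))

# Keep only those directions: this is the "compressed" dataset.
reduced = U[:, :effective_rank] * s[:effective_rank]

# "Decompress" by mapping back to the original 6 columns.
restored = reduced @ Vt[:effective_rank] + col_means

print("original shape:", X.shape)
print("reduced shape: ", reduced.shape)
print("lossless:      ", np.allclose(restored, X))
```

Under these assumptions, two columns are enough to rebuild all six, because the other four added no new information. Real datasets are rarely this clean, so in practice you usually keep the directions that explain most of the variation and accept a small loss.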