Bit - shorthand for binary digit; a bit is either 0 or 1. Bits are grouped to represent abstractions such as numbers, characters, and colors. The same sequence of bits may represent different types of data in different contexts.
Byte - 8 bits
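
To illustrate how one sequence of bits can stand for different data depending on context, here is a minimal Python sketch. The bit pattern and the three interpretations (number, character, gray level) are chosen only for illustration.

```python
# One byte (8 bits), written as a string of binary digits.
bits = "01000001"

# Context 1: read the bits as an unsigned whole number.
as_number = int(bits, 2)

# Context 2: read the same value as a character code.
as_character = chr(as_number)

# Context 3: read the same value as a gray level between 0.0 (black) and 1.0 (white).
as_gray_level = as_number / 255

print(as_number)      # 65
print(as_character)   # A
```

The bits never change; only the interpretation applied to them does.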
Analog Data - uses a continuous range of values to represent information; the values change smoothly rather than in steps.
Digital Data - uses discrete values to represent information. At the lowest level, all digital data are represented as sequences of bits.
Abstraction - the process of reducing complexity by focusing on the main idea. Hiding details irrelevant to the question at hand and bringing together related, useful details lets one focus on that idea.
Sampling technique - measuring values of an analog signal at regular intervals; the measured values are called samples.
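
The sampling idea can be sketched in Python as follows. The function names, the 1 Hz sine wave standing in for an analog signal, and the rate of 8 samples per second are all assumptions made for the example.

```python
import math

def sample_signal(signal, duration_s, sample_rate_hz):
    """Measure signal(t) at regular intervals and return the list of samples."""
    count = int(duration_s * sample_rate_hz)
    return [signal(n / sample_rate_hz) for n in range(count)]

def sine_1hz(t):
    """Hypothetical analog signal: a smooth 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)

# One second of signal, measured 8 times, gives 8 discrete samples.
samples = sample_signal(sine_1hz, duration_s=1.0, sample_rate_hz=8)
print(len(samples))  # 8
```

A higher sample rate captures the smooth signal more closely, at the cost of more values to store.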
Binary (base 2) - uses only combinations of the digits 0 and 1
Decimal (base 10) - uses only combinations of the digits 0 - 9
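
As a worked example, binary 1101 equals 1x8 + 1x4 + 0x2 + 1x1 = 13 in decimal. A minimal Python sketch of converting in both directions is shown below; the function names are made up for illustration, and only non-negative whole numbers are assumed.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to a whole number."""
    value = 0
    for digit in bits:
        # Shift the running value one binary place left, then add the new digit.
        value = value * 2 + int(digit)
    return value

def decimal_to_binary(n):
    """Convert a non-negative whole number to a string of 0s and 1s."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(digits))

print(binary_to_decimal("1101"))  # 13
print(decimal_to_binary(13))      # 1101
```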
Data compression - can reduce the size (number of bits) of transmitted or stored data
Lossless compression - algorithms that can usually reduce the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data
Lossy compression - algorithms that can significantly reduce the number of bits stored or transmitted but only allow reconstruction of an approximation of the original data. They can usually reduce the number of bits more than lossless compression algorithms.
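
The difference between the two kinds of compression can be sketched in Python. The lossless part uses the standard zlib module; the lossy part uses simple rounding to a few allowed levels, which is only an illustrative stand-in and not a real lossy codec. The sample data and the quantize helper are assumptions for the example.

```python
import zlib

# Lossless: the restored data match the original exactly.
original = b"AAAAAAAAAABBBBBBBBBB" * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy (illustrative only): keep each reading as the nearest of a few levels.
def quantize(values, levels):
    """Round each value in the range 0..1 to the nearest of `levels` evenly spaced levels."""
    step = 1 / (levels - 1)
    return [round(v / step) * step for v in values]

readings = [0.12, 0.49, 0.51, 0.88]
approximation = quantize(readings, levels=3)
print(approximation)  # [0.0, 0.5, 0.5, 1.0]
```

With only 3 levels, each reading needs fewer bits to store, but the original readings cannot be recovered from the approximation.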
Information - the collection of facts and patterns extracted from data
Metadata - data about data, used for finding, organizing, and managing information. For example, the piece of data may be an image, while the metadata may include the image's date of creation or file size.
Cleaning data - a process that makes the data uniform without changing their meaning (e.g., replacing all equivalent abbreviations, spellings, and capitalizations with the same word)
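
A minimal Python sketch of cleaning data is shown below. The mapping table, the clean_value function, and the sample records are hypothetical; a real data set would need its own table of equivalent spellings.

```python
# Hypothetical table of equivalent spellings and the single form to use instead.
CANONICAL_NAMES = {
    "ca": "California",
    "calif.": "California",
    "california": "California",
}

def clean_value(raw):
    """Return the uniform form of a value, or the trimmed value if it is not in the table."""
    key = raw.strip().lower()
    return CANONICAL_NAMES.get(key, raw.strip())

records = ["CA", " calif. ", "California", "Oregon"]
cleaned = [clean_value(r) for r in records]
print(cleaned)  # ['California', 'California', 'California', 'Oregon']
```

The meaning of each record is unchanged; only the spelling, capitalization, and spacing are made uniform.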
Parallel systems - systems that run multiple computations at the same time; they may be required to process large data sets