In this lesson, we will start handling data using some basic Python functions.
The easiest way to work with data is to load all of it into memory. We will use the animal.txt file as input.
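
A minimal sketch of loading a whole file into memory might look like the following. The contents of animal.txt are not shown in this lesson, so the code first writes a made-up stand-in file (a header plus six rows) to a temporary directory. SAMPLE_TEXT and load_whole_file are illustrative names, not part of the lesson's files.

```python
import tempfile
from pathlib import Path

# Assumption: made-up stand-in for animal.txt (one header line plus six rows).
SAMPLE_TEXT = "Animal\tLegs\nCat\t4\nDog\t4\nBird\t2\nSpider\t8\nFish\t0\nAnt\t6\n"


def load_whole_file(path):
    """Read the entire file into memory and return it as a single string."""
    with open(path, "r") as open_file:
        return open_file.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the stand-in file so the example runs without the real animal.txt.
    animal_path = Path(tmp_dir) / "animal.txt"
    animal_path.write_text(SAMPLE_TEXT)

    content = load_whole_file(animal_path)
    print("animal.txt content:\n" + content)
```

With your own copy of the file in the working directory, calling load_whole_file("animal.txt") directly should be enough.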
Not all data fits in memory at one time. In that case, you can retrieve the data in individual pieces, which lets you work with just part of the data without waiting for the whole dataset to load. One way to do this in Python is to read the file one line at a time.
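
Reading a file piece by piece can be sketched as below, assuming that one "piece" is one line of text. The stand-in file contents and the name stream_lines are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path

# Assumption: made-up stand-in for animal.txt (one header line plus six rows).
SAMPLE_TEXT = "Animal\tLegs\nCat\t4\nDog\t4\nBird\t2\nSpider\t8\nFish\t0\nAnt\t6\n"


def stream_lines(path):
    """Yield one line at a time so the whole file is never held in memory at once."""
    with open(path, "r") as open_file:
        for observation in open_file:
            yield observation.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    animal_path = Path(tmp_dir) / "animal.txt"
    animal_path.write_text(SAMPLE_TEXT)

    streamed = []
    for observation in stream_lines(animal_path):
        # Each line can be processed as soon as it has been read.
        print("Reading data: " + observation)
        streamed.append(observation)  # kept only so the result can be inspected
```

Iterating over the file object, rather than calling read(), is what keeps memory use small here: Python fetches the next line only when the loop asks for it.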
When you take data from a data source, you sometimes don't need all the records. You can save time by sampling the data instead. For example, you can retrieve only the even-numbered records in the animal.txt file by keeping each line whose number is divisible by n = 2.
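
One possible sketch of this kind of sampling keeps a line only when its number is divisible by a step n. It assumes that line numbers start at 0 (so the header is line 0) and uses a made-up stand-in for animal.txt; sample_every_nth is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Assumption: made-up stand-in for animal.txt (one header line plus six rows).
SAMPLE_TEXT = "Animal\tLegs\nCat\t4\nDog\t4\nBird\t2\nSpider\t8\nFish\t0\nAnt\t6\n"


def sample_every_nth(path, n):
    """Return (line_number, text) pairs for lines whose number is divisible by n."""
    selected = []
    with open(path, "r") as open_file:
        for j, observation in enumerate(open_file):
            if j % n == 0:  # line 0 (the header) always passes this test
                selected.append((j, observation.rstrip("\n")))
    return selected


with tempfile.TemporaryDirectory() as tmp_dir:
    animal_path = Path(tmp_dir) / "animal.txt"
    animal_path.write_text(SAMPLE_TEXT)

    every_second = sample_every_nth(animal_path, 2)  # even-numbered lines
    every_third = sample_every_nth(animal_path, 3)   # header, then every third row

for j, observation in every_second:
    print("Reading Line: " + str(j) + " Content: " + observation)
```

Because enumerate() starts counting at 0, the header is always included, whatever value n has.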
The value of n determines which records appear as part of the dataset. Try changing n to 3. The output will change to sample just the header and rows 3 and 6.
You can also sample randomly, keeping each record with a fixed probability. The random() function from Python's random module returns a floating-point value of at least 0 and less than 1. The value is pseudorandom, so you cannot know in advance which value you will receive.
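
Random sampling can be sketched as follows: each line is kept when a fresh random() value falls below the fraction to keep, called sample_size here. The stand-in file contents and the helper name sample_randomly are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from random import random

# Assumption: made-up stand-in for animal.txt (one header line plus six rows).
SAMPLE_TEXT = "Animal\tLegs\nCat\t4\nDog\t4\nBird\t2\nSpider\t8\nFish\t0\nAnt\t6\n"


def sample_randomly(path, sample_size):
    """Keep each line independently with probability sample_size (0 to 1)."""
    selected = []
    with open(path, "r") as open_file:
        for j, observation in enumerate(open_file):
            if random() < sample_size:  # random() is always in [0.0, 1.0)
                selected.append((j, observation.rstrip("\n")))
    return selected


with tempfile.TemporaryDirectory() as tmp_dir:
    animal_path = Path(tmp_dir) / "animal.txt"
    animal_path.write_text(SAMPLE_TEXT)

    sample = sample_randomly(animal_path, 0.25)     # roughly a quarter of the lines
    everything = sample_randomly(animal_path, 1.0)  # keeps every line
    nothing = sample_randomly(animal_path, 0.0)     # keeps no lines

for j, observation in sample:
    print("Reading Line: " + str(j) + " Content: " + observation)
```

Because each line is kept independently, the number of lines selected varies from run to run. Calling random.seed() with a fixed value beforehand makes a run repeatable.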
The sample_size variable contains a number between 0 and 1 that determines the sample size. For example, 0.25 selects about 25 percent of the items in the file.
Do each numbered exercise below in its given file (for example, do no. 1 in ex_1_1.py).
1. Open cities.csv and print the whole file.
2. Open cities.csv again, but this time, before printing the whole file, print "Read Cities:" followed by the contents of the file.
3. Using the modulo operator (%), print every 10th line of cities.csv. At the beginning of each line, print Cities Number {line_number} : followed by the contents of that line.
4. Randomly sample the cities in cities.csv with a 10% probability. Make the printed output similar to no. 3. After that, find a way to save the sample to a new CSV file called sample_cities.csv.