File Types and Finding Data
Welcome to Part Six: File Types and Finding Data!
In this sub-module you will learn about:
The difference between open and subscription data
Data types you may encounter when you work with data
How to work with ZIP files
To complete this section:
Read "Finding Data in the Wild" about subscription data and open data
Watch the video about .csv and .xls file types
Read about .zip files
Finding Data in the Wild
There are many data sources available on a variety of topics! Finding data can be overwhelming, so the Library is here to help. You can get started with our “Finding Data” web page.
Subscription data
Subscription data is curated data that you must pay to access. Subscription databases are often focused on a particular subject area.
Libraries often provide their patrons with access to these databases for educational and research purposes. NC State librarians can help with this, so check with us before you think about paying for access to data!
Some popular examples:
Open data
Open data portals and repositories give the public free access to data. They are often run by government agencies, nonprofit organizations, and academic institutions.
Some popular examples:
Many cities run their own open data portal, as do many local and federal agencies:
File Types
Data can be stored in many different file types. The file type often depends on how the data was captured (e.g., what kind of software did people use? What data collection process did they employ?)
One data format you may be familiar with is tabular data. Tabular data consists of rows and columns. A clean, formatted tabular dataset has:
Rows with unique observations
A header in the first row that tells you what is in each column (e.g., column E in the image below contains state abbreviations)
.csv (comma-separated values) and .xls (Microsoft Excel files)
.xls files can use formatting (fonts, colors, etc.) and multiple worksheets.
.csv files cannot contain multiple worksheets or formatting.
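To see what "a header row plus rows of observations" looks like in practice, here is a minimal Python sketch that reads .csv text with the standard csv module. The sample data (the names, states, and scores) is made up for illustration and stands in for a file you might download.

```python
import csv
import io

# Made-up sample data standing in for the contents of a downloaded .csv file.
SAMPLE_CSV = """name,state,score
Ada,NC,91
Grace,VA,88
Alan,NC,79
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = read_rows(SAMPLE_CSV)
header = list(rows[0].keys())

print(header)             # the column names from the first row
print(len(rows))          # the number of observations (header not counted)
print(rows[0]["state"])   # look up a value by column name
```

Note that a .csv file stores plain text only, so every value comes back as a string (the score "91" is text until you convert it to a number).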
.zip files
Zipped (compressed) files take up less storage space and can be transferred to other computers more quickly than uncompressed files. Because data files can be very large, you will often encounter .zip files when searching for data.
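If you are curious how much space zipping can save, the following Python sketch compresses some made-up, highly repetitive tabular text in memory and compares the sizes. The file name and contents are invented for illustration, and real datasets will usually compress less dramatically than this example.

```python
import io
import zipfile

# Made-up, very repetitive data; real files will usually compress less well.
raw = ("state,value\n" + "NC,100\n" * 5000).encode("utf-8")

# Write the data into a .zip archive held in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("repetitive.csv", raw)

raw_size = len(raw)
zipped_size = len(buffer.getvalue())

print("Uncompressed bytes:", raw_size)
print("Zipped bytes:", zipped_size)
```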
Unzip a file on a PC
Click on the zipped folder
Drag the file or folder from the zipped folder to a new location
OR
Right-click the folder
Select Extract All
Unzip a file on a Mac
Double-click the zipped file.
If you are opening the data in a tool or uploading it to one, make sure you select the unzipped file or folder.
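You can also unzip a file with code instead of clicking. This is a minimal sketch using Python's standard zipfile module. So that it runs on its own, it first creates a small example archive in a temporary folder; the file names and the unzip helper are hypothetical, and with real data you would point it at the .zip file you downloaded.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return their names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Build a tiny example archive (stands in for a downloaded .zip file).
    zip_path = tmp / "example_data.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("example_data.csv", "name,state\nAda,NC\n")

    # Unzip it into a new folder and read the extracted file back.
    extracted = unzip(zip_path, tmp / "unzipped")
    unzipped_text = (tmp / "unzipped" / "example_data.csv").read_text()

print(extracted)
print(unzipped_text)
```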
.json, .xml, oh my...
Not all data is tabular. Data can also look like this .json file that lists all the characters in Star Wars, this .xml file about 76 influential American cookbooks, or these files of song lyrics.
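To give a feel for how these formats are structured, here is a minimal Python sketch that reads a tiny piece of JSON and a tiny piece of XML with the standard library. The sample records are invented placeholders, not the contents of the files mentioned above, and real files may be organized quite differently.

```python
import json
import xml.etree.ElementTree as ET

# Invented placeholder records; real files will have their own structure.
SAMPLE_JSON = """
{"characters": [
    {"name": "Example A", "films": 3},
    {"name": "Example B", "films": 1}
]}
"""

SAMPLE_XML = """<cookbooks>
  <book year="1900"><title>Example Cookbook</title></book>
  <book year="1950"><title>Another Example</title></book>
</cookbooks>"""

# JSON becomes nested Python dictionaries and lists.
data = json.loads(SAMPLE_JSON)
names = [character["name"] for character in data["characters"]]

# XML becomes a tree of elements with tags, attributes, and text.
root = ET.fromstring(SAMPLE_XML)
titles = [book.findtext("title") for book in root.findall("book")]
years = [book.get("year") for book in root.findall("book")]

print(names)
print(titles)
print(years)
```

Unlike a .csv file, both formats can nest records inside other records, which is why they do not always fit neatly into rows and columns.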
There are many other data formats beyond tabular files, such as .json and .xml. Unfamiliar formats can be intimidating, but have no fear -- DVS is here!
The Data and Visualization Services department at the NCSU Libraries can help you find data, figure out how to access it if it's in an unfamiliar format, and get started analyzing it with a variety of tools. Contact us if you have any questions!
More Resources
Other fun data sources include websites like:
CORGIS: The Collection of Really Great, Interesting, Situated Datasets
Data is Plural Archive: A weekly newsletter of useful/curious datasets
LINC: Log into North Carolina, a data portal operated by the North Carolina Office of State Budget and Management (OSBM)
Kaggle: User-submitted datasets on a variety of topics