Computer files are resources that allow data to be recorded on a storage device in a computing system. Files are therefore critical to almost every software solution that needs to save information so that it can be retrieved later.
There are two types of files: text files and binary files.
Text files store data as easily readable plain text, while binary files store data in binary form, as is the case with image and sound files.
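Binary files are not easily readable by humans, but this does not by itself make them more secure than text files, because any program that understands the format can still read them. In Software Development, you will focus only on text files.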
Text files can be opened using different modes, such as read, write and append.
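The three modes mentioned above can be illustrated with a short Python sketch. The file name scores.txt and the names written to it are hypothetical, and the file is created in a temporary folder so that the example does not touch any real data.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary folder for this illustration.
path = os.path.join(tempfile.mkdtemp(), "scores.txt")

# Write mode ("w") creates the file, or overwrites it if it already exists.
with open(path, "w", encoding="utf-8") as f:
    f.write("Alice\n")

# Append mode ("a") adds to the end of the file and keeps the existing data.
with open(path, "a", encoding="utf-8") as f:
    f.write("Bob\n")

# Read mode ("r") opens the file for reading only.
with open(path, "r", encoding="utf-8") as f:
    names = f.read().splitlines()

print(names)  # ['Alice', 'Bob']
```

Opening the same file in write mode a second time would have replaced "Alice" rather than adding to it, which is why append mode is used for the second name.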
A plain text (TXT) file is a file that contains characters of readable data. When a program reads this data, it arrives as character and string data types; values such as numbers must be converted to the required data type after they are read. Plain text files are commonly used for configuration settings or for storing small amounts of data in simple software programs. While plain text files that are stored in a computer system can be opened and read by a human, they are not typically designed for human readability. Instead, they are designed for fast processing and reading by computer programs. This means that a plain text file often lacks the comments, headings and sub-headings that would make it more coherent for a human.
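As a minimal sketch of a plain text file used for configuration settings, the Python code below writes and then reads a small settings file. The file name settings.txt, the key=value layout and the setting names are assumptions made for this example only; plain text files have no required layout.

```python
import os
import tempfile

# Hypothetical settings file, created in a temporary folder for this example.
path = os.path.join(tempfile.mkdtemp(), "settings.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("volume=7\nfullscreen=true\nplayer=Sam\n")

settings = {}
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        # Each line is read as a string, for example "volume=7".
        key, value = line.strip().split("=", 1)
        settings[key] = value  # the value is still a string at this point

# Convert the strings to the data types the program actually needs.
volume = int(settings["volume"])
fullscreen = settings["fullscreen"] == "true"

print(volume + 1)   # 8
print(fullscreen)   # True
```

Note that settings["volume"] holds the string "7", not the number 7, until it is explicitly converted.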
A particular type of text file is a delimiter-separated values text file, which is a text file where data values are separated by a programmer-selected character. This character is referred to as the delimiter. The most common delimiters used in delimited files are commas, tabs and colons. Delimited files allow for the storage of two-dimensional arrays in a structured, readable format: each line of the file holds one row, and the delimiter separates the values in that row. When a comma is used as the delimiter, the file is referred to as a comma-separated values file, or CSV file.
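The link between a delimited file and a two-dimensional array can be sketched with Python's built-in csv module. The student names and marks below are made up for illustration, and io.StringIO stands in for a file opened in read mode so that the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content; io.StringIO behaves like an open text file.
csv_text = "name,maths,english\nAlice,82,75\nBob,67,90\n"

# csv.reader splits each line on the delimiter (a comma by default),
# producing one list per row: a two-dimensional array.
rows = list(csv.reader(io.StringIO(csv_text)))

# Separate the heading row from the data rows.
header, *records = rows

# Values are read as strings, so the marks are converted to integers.
marks = [[int(value) for value in record[1:]] for record in records]

print(rows[1][0])  # Alice
print(marks)       # [[82, 75], [67, 90]]
```

For a tab-delimited file, the same code could be used by passing delimiter="\t" to csv.reader.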
An Extensible Markup Language (XML) file is one that has been created using a set of rules for encoding data in a format that can be read by both a human and a computer program. XML makes it easier to store and transport data within a system and between systems, as it is based on a set of standards and conforms to published conventions. XML was designed to be as self-descriptive as possible, which increases human readability. The following is a brief outline of the features of XML. More comprehensive documentation can be accessed via the weblink in the margin.
XML files usually begin with a prolog, which is information that appears before the start of any data in the XML file. It includes information that applies to the XML file as a whole, such as the version of XML it uses and the character encoding of the data within it. A typical prolog is the single line <?xml version="1.0" encoding="UTF-8"?>.
An XML file contains an XML tree, which is the set of elements contained within the file. The tree begins with a root element that is the parent element of its child elements. These child elements are sub-elements of the root, and any element can in turn contain sub-elements of its own. This makes the structure of an XML file hierarchical, much like a family tree.
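The prolog and tree structure described above can be explored with Python's built-in xml.etree.ElementTree module. This is only a sketch: the element names library, book, title and author are hypothetical, and the XML is held in a byte string rather than a file to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: a prolog followed by a root element (library)
# that is the parent of two child elements (book).
document = b"""<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>Dune</title>
    <author>Frank Herbert</author>
  </book>
  <book>
    <title>Emma</title>
    <author>Jane Austen</author>
  </book>
</library>
"""

# Parsing returns the root element of the XML tree.
root = ET.fromstring(document)

# Iterating over an element visits its child elements in order.
titles = [book.find("title").text for book in root]

print(root.tag)   # library
print(len(root))  # 2
print(titles)     # ['Dune', 'Emma']
```

Each book element is a child of library and a parent of its own title and author elements, which is the hierarchy the family tree analogy refers to.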
While the names of elements and attributes are user-defined, some naming rules still apply. Element names are case-sensitive, must start with a letter or an underscore, cannot start with the letters ‘xml’ (in any combination of upper and lower case) and cannot contain spaces. They can contain letters, numbers, hyphens, underscores and full stops.
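The naming rules listed above can be expressed as a small Python check. This is a simplified sketch of those rules only: the function name is_valid_element_name is hypothetical, and the pattern accepts only unaccented English letters, whereas the full XML standard permits a wider range of characters.

```python
import re

# Simplified pattern: a letter or underscore first, followed by any mix of
# letters, digits, underscores, full stops and hyphens (no spaces).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_element_name(name):
    """Return True if name follows the simplified naming rules above."""
    # Names starting with 'xml' in any mix of upper and lower case are not allowed.
    if name.lower().startswith("xml"):
        return False
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_element_name("student_name"))  # True
print(is_valid_element_name("first name"))    # False (contains a space)
print(is_valid_element_name("2ndPlace"))      # False (starts with a number)
print(is_valid_element_name("XmlData"))       # False (starts with 'xml')
```

Because element names are case-sensitive, Student and student would both pass this check but would be treated as two different elements.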
The advantages of using an XML file over a plain text file are that XML is an industry standard, widely used and cross-platform. It allows rules about the structure and content of data to be defined and checked in a way that plain text files cannot. XML also stores data independently of any user interface, so the same data can be displayed in different formats and interfaces.