UF1: Persistència en fitxers

Resultats d'aprenentatge:

  1. Desenvolupa aplicacions que gestionen informació emmagatzemada en fitxers identificant el camp d’aplicació dels fitxers i utilitzant classes específiques.

I/O Overview

Input and Output - Source and Destination

The terms "input" and "output" can sometimes be a bit confusing. The input of one part of an application is often the output of another. Is an OutputStream a stream where output is written to, or output comes out from (for you to read)? After all, an InputStream outputs its data to the reading program, doesn't it? Personally, I found this a bit confusing back in the day when I first started out learning about Java IO.

In an attempt to clear out this possible confusion, I have tried to put some different names on input and output to try to link them conceptually to where the input comes from, and where the output goes.

Java's IO package mostly concerns itself with the reading of raw data from a source and writing of raw data to a destination. The most typical sources and destinations of data are these:

  • Files
  • Pipes
  • Network Connections
  • In-memory Buffers (e.g. arrays)
  • System.in, System.out, System.error

The diagram below illustrates the principle of a program reading data from a source and writing it to some destination:

Streams

IO Streams are a core concept in Java IO. A stream is a conceptually endless flow of data. You can either read from a stream or write to a stream. A stream is connected to a data source or a data destination. Streams in Java IO can be either byte based (reading and writing bytes) or character based (reading and writing characters).

The InputStream, OutputStream, Reader and Writer

A program that needs to read data from some source needs an InputStream or a Reader. A program that needs to write data to some destination needs an OutputStream or a Writer. This is also illustrated in the diagram below:

An InputStream or Reader is linked to a source of data. An OutputStream or Writer is linked to a destination of data.

Java IO Purposes and Features

Java IO contains many subclasses of the InputStream, OutputStream, Reader and Writer classes. The reason is, that all of these subclasses are addressing various different purposes. That is why there are so many different classes. The purposes addressed are summarized below:

  • File Access
  • Network Access
  • Internal Memory Buffer Access
  • Inter-Thread Communication (Pipes)
  • Buffering
  • Filtering
  • Parsing
  • Reading and Writing Text (Readers / Writers)
  • Reading and Writing Primitive Data (long, int etc.)
  • Reading and Writing Objects

These purposes are nice to know about when reading through the Java IO classes. They make it somewhat easier to understand what the classes are targeting.

Java IO Class Overview Table

Having discussed sources, destinations, input, output and the various IO purposes targeted by the Java IO classes, here is a table listing most (if not all) Java IO classes divided by input, output, being byte based or character based, and any more specific purpose they may be addressing, like buffering, parsing etc.

XML Parsing

Push vs Pull Parser


SAX Parser


Java SAX Parser

SAX is an abbreviation and means "Simple API for XML". A Java SAX XML parser is a stream oriented XML parser. It works by iterating over the XML and call certain methods on a "listener" object when it meets certain structural elements of the XML. For instance, it will call the listener object for the following "events":

- startDocument
- startElement
- characters
- comments
- processing instructions
- endElement
- endDocument

This list is probably not complete, but it is long enough to give you an idea of how it works. Let's move on to see how you create and use a Java SAX Parser.

SAXParserFactory factory = SAXParserFactory.newInstance();
try {

    InputStream    xmlInput  = new FileInputStream("theFile.xml");
    SAXParser      saxParser = factory.newSAXParser();

    DefaultHandler handler   = new SaxHandler();
    saxParser.parse(xmlInput, handler);

} catch (Throwable err) {
    err.printStackTrace ();
}

When you call the SAXParser.parse() method the SAX parser starts the XML processing. The xmlInput InputStream passed as parameter to the parse() method is where the XML is read from.

Notice the SaxHandler instance being created, and passed as parameter to the parse() method. The SaxHandler class is a subclass of the class org.xml.sax.helpers.DefaultHandler. The DefaultHandler class comes with the JDK.

While processing the XML the SAXParser calls methods in the DefaultHandler subclass (here, the SaxHandler) instance corresponding to what the parser finds in the XML file. To react to those method calls you override the corresponding methods in the DefaultHandler subclass. Here is an example:

public class SaxHandler extends DefaultHandler {

    public void startDocument() throws SAXException {
    }

    public void endDocument() throws SAXException {
    }

    public void startElement(String uri, String localName,
            String qName, Attributes attributes)
    throws SAXException {

    }

    public void endElement(String uri, String localName, String qName)
    throws SAXException {
    }

    public void characters(char ch[], int start, int length)
    throws SAXException {
    }

    public void ignorableWhitespace(char ch[], int start, int length)
    throws SAXException {
    }

}    

It is the responsibility of the DefaultHandler subclass to extract any necessary information from the XML via these methods. If you need to build an object graph based on an XML file, you will have to build that object graph inside the DefaultHandler subclass.