documentation

Web console

MIXED can be used by interested users through a web console. This interface supports the conversion of individual files. It is meant for those who want to explore the capabilities of MIXED without setting up a local MIXED installation and without talking to MIXED through a web service. We do not yet guarantee an uninterrupted service.

Architecture

MIXED consists of a framework plus plugins. Plugins take care of the conversions between application file formats and application-independent XML formats.
We intend to publish the framework code as Open Source, but have not yet done so.
We intend to publish all converters as Open Source, but have done so for only a few of them.
The libraries that understand the dBase formats and DataPerfect formats are on SourceForge.
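
The framework-plus-plugins split can be illustrated with a minimal sketch. This is not the MIXED code, which is not reproduced here; the `Framework` class, the `Converter` signature and `fake_dbase_plugin` are hypothetical names chosen only to show how a framework might dispatch to a converter plugin by format name.

```python
from typing import Callable, Dict

# Hypothetical plugin signature: raw application file bytes in, XML text out.
Converter = Callable[[bytes], str]


class Framework:
    """Toy framework that dispatches to converter plugins by format name."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Converter] = {}

    def register(self, format_name: str, converter: Converter) -> None:
        # Format names are compared case-insensitively.
        self._plugins[format_name.lower()] = converter

    def convert(self, format_name: str, data: bytes) -> str:
        try:
            plugin = self._plugins[format_name.lower()]
        except KeyError:
            raise ValueError(f"no plugin registered for {format_name!r}") from None
        return plugin(data)


def fake_dbase_plugin(data: bytes) -> str:
    # Stand-in only: a real plugin would parse the dBase file structure.
    return f"<table source='dbase' bytes='{len(data)}'/>"


framework = Framework()
framework.register("dBase", fake_dbase_plugin)
result = framework.convert("dbase", b"\x03abc")
print(result)
```

In such a design the framework knows nothing about individual file formats, so a new format only requires registering one more plugin.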

Standard Data Formats for Preservation

The XML format for tabular data consists of two formats, one for spreadsheets and one for databases.
Both formats are described in SDFP04.zip and documented in Standard Data Formats for Preservation 0.3.doc (see attachments).
However, this specification does not yet reflect the change we have made by adopting the SIARD format for databases.
All SDFP formats are now set up as zip files with folders for metadata and folders for content. All files that contain data and metadata are in appropriate XML formats. This structure fits with the SIARD approach as well as with the ODF approach. Consequently, the SDFP format for spreadsheets is ODF and the one for databases is SIARD.
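
As a rough illustration of the zip layout described above, the sketch below builds a tiny package in memory and separates metadata entries from content entries. The folder names `header/` and `content/` follow the SIARD convention and `META-INF/` follows ODF; treating exactly these prefixes as metadata folders is an assumption made for the example, and `split_package` is a hypothetical helper, not part of MIXED.

```python
import io
import zipfile


def split_package(zip_bytes: bytes, metadata_prefixes=("header/", "META-INF/")):
    """Return (metadata_entries, content_entries) for a zip package."""
    metadata, content = [], []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        for name in package.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            target = metadata if name.startswith(metadata_prefixes) else content
            target.append(name)
    return sorted(metadata), sorted(content)


# Build a tiny SIARD-like package in memory (entry names are illustrative).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("header/metadata.xml", "<siardArchive/>")
    package.writestr("content/schema0/table0/table0.xml", "<table/>")

metadata, content = split_package(buffer.getvalue())
print(metadata)
print(content)
```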

While SDFP expresses the structure of fields and records, rows and columns by means of XML schemas, it is also important to specify the format of the data that goes into the cells or fields, especially when that data consists of numbers, dates and times, or character strings. These are the elementary data types.

We use ISO 8601 for date and time representations, and ISO 6093 for number representations. The MIXED conversions that read SDFP accept representations for dates, times and numbers according to these standards. The MIXED conversions that write SDFP also write representations according to these standards, but they do not produce every representation that the standards allow. MIXED is stricter, because it does not use representations that rely in some way on the context. The details are in Standard Data Formats for Preservation 0_3 - date-time.doc and Standard Data Formats for Preservation 0_3 - numbers.doc (see attachments). These documents are to be integrated with the SDFP documentation.
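
The "liberal when reading, strict when writing" policy can be sketched as follows. The exact MIXED rules are in the attached documents and are not reproduced here; the choices below (plain decimal notation with a full stop for numbers, UTC with an explicit `Z` for date-times, rejection of values without a time zone) are assumptions made for the sake of the example, and all function names are hypothetical.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

# ISO 6093 style numbers: NR1 (integer), NR2 (decimal), NR3 (with exponent).
# Both full stop and comma are accepted as the decimal mark when reading.
_NUMBER = re.compile(r"^\s*[+-]?(\d+([.,]\d*)?|[.,]\d+)([Ee][+-]?\d+)?\s*$")


def read_number(text: str) -> Decimal:
    if not _NUMBER.match(text):
        raise ValueError(f"not a recognised number: {text!r}")
    return Decimal(text.strip().replace(",", "."))


def write_number(value: Decimal) -> str:
    # Assumed strict output: plain decimal notation with a full stop.
    return format(value, "f")


def read_datetime(text: str) -> datetime:
    # fromisoformat handles common extended-format ISO 8601 strings.
    return datetime.fromisoformat(text)


def write_datetime(value: datetime) -> str:
    # Assumed strict output: refuse naive values, always write UTC explicitly.
    if value.tzinfo is None:
        raise ValueError("naive datetime depends on context; attach a time zone")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(write_number(read_number("1,5E+2")))
print(write_datetime(read_datetime("2010-10-22T05:49:00+02:00")))
```

A date-time without a time zone is an example of a representation that relies on context, which is why the writer in this sketch refuses it rather than guessing.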

Statistical data is a form of tabular data that is outside the scope of MIXED, but MIXED is already prepared to deal with it. In SDFP we use DDI2 as a sub-schema for statistical data. In the document ‘How much DDI does one need for preservation purposes.pdf’ we explain why we use version 2 of DDI and not (yet) version 3.

Limitations

There are some limitations to the MIXED conversions. They are explained in the document ‘Limitations in Mixed conversions.doc’. This document also outlines the data types used in the various transformations.

Deployment

MIXED is currently being deployed at DANS. In the process we encounter issues with respect to the conversions and with respect to the general usability of MIXED as an archival tool. Only through this process of putting MIXED into practice can we hope to turn MIXED into a useful and dependable tool for other repositories.

Attachments

Dirk Roorda, 22 Oct 2010, 05:49
Unknown user, 16 Nov 2011, 04:38
SDFP04.zip (5k), Dirk Roorda, 22 Oct 2010, 05:34
Dirk Roorda, 22 Oct 2010, 05:36
Dirk Roorda, 22 Oct 2010, 05:47
Dirk Roorda, 22 Oct 2010, 05:47