Files

Basic file handling

For most file formats, pDynamo defines a specific FileReader class if it can read the format, and a specific FileWriter class if it can write it. The exceptions include formats that allow simultaneous reading and writing, and those that rely on Python's pickle and PyYAML modules.

To avoid having to use separate readers and writers directly, pDynamo provides a series of Import and Export functions that handle all the formats transparently, provided that each format is identified by a unique file extension. If it is not, the format can be specified explicitly via the functions' format keyword arguments.

Each of the Import and Export functions handles a specific type of object. Thus, for example, ExportSystem exports a System instance, whereas ImportCoordinates3 imports a set of coordinates and returns them as a Coordinates3 instance.
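The extension-based selection described above can be pictured with a minimal, self-contained sketch. It is not pDynamo's implementation: the registry, the two reader functions and the ImportData name are hypothetical, and serve only to show how a file extension, or an explicit format argument, might select a reader.

```python
import os

# . Hypothetical readers, each standing in for a format-specific FileReader.
def _ReadXYZ ( path ):
    return ( "xyz", path )

def _ReadPDB ( path ):
    return ( "pdb", path )

# . Registry mapping a format label (the lower-case extension) to its reader.
_READERS = { "xyz" : _ReadXYZ, "pdb" : _ReadPDB }

def ImportData ( path, format = None ):
    """Select a reader from the file extension unless a format is given."""
    if format is None:
        format = os.path.splitext ( path )[1].lstrip ( "." )
    reader = _READERS.get ( format.lower ( ) )
    if reader is None:
        raise ValueError ( "Unrecognized format: " + repr ( format ) )
    return reader ( path )

# . The extension selects the reader ...
fromExtension = ImportData ( "water.xyz" )

# . ... and an explicit format handles a non-standard extension.
fromKeyword = ImportData ( "water.dat", format = "pdb" )
```

In this sketch an unknown extension raises an error rather than guessing, which is one reasonable design choice when formats cannot be told apart by name.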

Many file formats can be handled and more are being added all the time. Full lists of those that are currently implemented, together with their recognized file extensions and the types of objects that they import or export, can be obtained by invoking the functions ImportOptions and ExportOptions from the pBabel package. An illustration of their use may be found in one of the pBabel examples, ImportExportSystems, in the pDynamo distribution.

Most file handling in pDynamo is straightforward, but certain types of file involve extra subtleties, which are detailed below. These include the files that are responsible for pDynamo output and serialization, and the various trajectory types.

Log files

All output in pDynamo is performed using instances of subclasses of the LogFileWriter class. No explicit Python print statements are ever used (although this does not mean that users cannot use them in their own scripts). In general, any methods or functions that do some "printing" take a keyword argument, log, that defines the log file writer that is to be used for output. If log is not given, it normally defaults to a globally defined writer called logFile which prints to standard output (for example, the terminal or Python shell). If log is set to None then all printing is suppressed.
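The log convention can be illustrated with a small stand-alone sketch. The SketchLogWriter class, the defaultLog object and the ReportEnergy function are all hypothetical and are not part of pDynamo; they only mimic the behaviour described above, namely a default writer to standard output, an optional custom writer, and suppression of printing with None.

```python
import io
import sys

class SketchLogWriter:
    """Hypothetical stand-in for a log file writer; not a pDynamo class."""

    def __init__ ( self, stream = None ):
        # . Write to standard output unless another stream is supplied.
        self.stream = stream if stream is not None else sys.stdout

    def Paragraph ( self, text ):
        self.stream.write ( text + "\n" )

# . A module-level default writer, playing the role of logFile.
defaultLog = SketchLogWriter ( )

def ReportEnergy ( energy, log = defaultLog ):
    """Print the energy unless log is None, and always return it."""
    if log is not None:
        log.Paragraph ( "Energy = {:.3f}".format ( energy ) )
    return energy

# . Output directed to a custom writer.
buffer = io.StringIO ( )
ReportEnergy ( -76.4, log = SketchLogWriter ( buffer ) )

# . Printing suppressed altogether.
silent = ReportEnergy ( -76.4, log = None )
```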

logFile is an instance of the TextLogFileWriter class. However, writer classes that produce output in other formats also exist. Currently the only other one in pDynamo3 is XHTMLLogFileWriter, although others, including LaTeXLogFileWriter (very useful when writing papers with tables!), were available in previous versions of pDynamo and may be reincorporated in the future.

Log file writers have a large number of methods that permit output in various styles, in addition to plain text. These include headings, paragraphs, separators, summaries and tables, all of which are widely used in pDynamo's example scripts and code.

Using custom log file writers is straightforward. They are created as follows:

# . A text writer to a specific file.

textWriterWithPath = TextLogFileWriter.WithOptions ( path = "textPath" )

# . An XHTML writer to standard output with a page title.

xhtmlWriter = XHTMLLogFileWriter.WithOptions ( title = "XHTML Page" )

# . An XHTML writer to a specific file without a page title.

xhtmlWriterWithPath = XHTMLLogFileWriter.WithOptions ( path = "xhtmlPath" )

Once they exist, writer instances must be passed, via the log argument, to every method or function whose output they are to receive. For example:

# . Print energy output to a text file.

system.Energy ( log = textWriterWithPath )

The ability to tune printing using different log file writers is very useful in a wide range of circumstances. For example, two writers (or one writer and None!) can be used to separate "essential" from "non-essential" (verbose) output, and in multiprocessing it is often necessary to assign a unique writer to each process.

As a final remark, it is a good habit to start and stop log file output by invoking a writer's Header and Footer methods, respectively, as these print out various timing information about execution of the script. Most of pDynamo's examples conform to this convention by using:

from pCore import logFile

logFile.Header ( title = "My pDynamo3 Example" )

# ... various statements ...

logFile.Footer ( )

Serialization

Serialization is the process of converting an object into a form that can be stored and reconstructed later as needed. It is a very handy technique especially for complicated objects, such as simulation systems, which can be difficult to recreate from scratch.

In pDynamo, the principal way of serializing and deserializing an object is to use Python's pickle and PyYAML modules. Import and export operations with these formats are available via pDynamo's Import and Export functions (with the file extensions pkl and yaml, respectively), but it is also possible to use pDynamo's serialization functions, Pickle, Unpickle, YAMLPickle and YAMLUnpickle, directly. In general, pkl format is to be preferred for serialization, although yaml is sometimes useful as it has a more human-readable form.

Although useful, serialization depends intimately on an object's internal structure. This means that in a rapidly evolving program, such as pDynamo, objects serialized with an early version of the program may no longer be deserializable by later ones. This is unlikely to change, so users are always advised to preserve the scripts and protocols that they employed to generate particular objects and, where appropriate, to back up data to files whose formats are unlikely to change.

A useful tip when serializing and then deserializing objects is to include extra verification data that can be used to check if the deserialization process has succeeded. As a simplified example, consider the serialization of a simulation system:

from pCore import Pickle

# ... set up system ...

energy = system.Energy ( )

# . Serialize the system together with its energy.

Pickle ( "system.pkl", ( system, energy ) )

And now its deserialization:

import math

from pCore import Unpickle

( system, referenceEnergy ) = Unpickle ( "system.pkl" )

energy = system.Energy ( )

# . Check the recomputed energy against the stored reference value.

if math.fabs ( energy - referenceEnergy ) > 1.0e-5:
    raise Exception ( "Invalid deserialization energy." )
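The same verification pattern can be tried without pDynamo. The sketch below uses Python's standard pickle module directly, with a plain dictionary and an arbitrary ToyEnergy function (both hypothetical) standing in for a real simulation system and its energy. Only the idea of storing a reference value and rechecking it after deserialization is meant to carry over.

```python
import math
import os
import pickle
import tempfile

def ToyEnergy ( system ):
    """An arbitrary harmonic expression, used only as verification data."""
    return 0.5 * sum ( x * x for x in system["coordinates"] )

# . A hypothetical stand-in for a simulation system.
system = { "label" : "toy", "coordinates" : [ 0.1, -0.2, 0.3 ] }
energy = ToyEnergy ( system )

with tempfile.TemporaryDirectory ( ) as directory:
    path = os.path.join ( directory, "system.pkl" )
    # . Serialize the system together with its energy.
    with open ( path, "wb" ) as pklFile:
        pickle.dump ( ( system, energy ), pklFile )
    # . Deserialize the pair again.
    with open ( path, "rb" ) as pklFile:
        ( restored, referenceEnergy ) = pickle.load ( pklFile )

# . Recompute the energy and compare it with the stored reference.
if math.fabs ( ToyEnergy ( restored ) - referenceEnergy ) > 1.0e-5:
    raise ValueError ( "Invalid deserialization energy." )
```

The tolerance plays the same role as in the pDynamo example: it should be loose enough to absorb harmless numerical noise but tight enough to catch a system that was restored incorrectly.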

To conclude this section, it is worth noting that most of pDynamo's parameter files are in yaml format. However, these are not strictly serialized files as they have special import and export functions, and human-readable formats that have been designed to be independent of pDynamo's object structure (or, indeed, any object structure apart from the basic objects defined by the YAML standard). They should, therefore, change minimally between pDynamo versions.

Trajectories

Trajectory files hold specific types of data in sequences that are distinguished by some sort of parameter, such as distance (reaction paths), step number (geometry optimization and Monte Carlo) or time (molecular dynamics). pDynamo has trajectories for a variety of data types, including coordinates and crystal parameters, internal coordinates, and restraints.

The specific functions for reading and writing all types of trajectory are ImportTrajectory and ExportTrajectory, respectively. However, in contrast to the other Import and Export functions, which return data objects, these functions return trajectory instances that are activated for reading and writing. These instances can then either be passed to the various pDynamo functions and methods that accept trajectory arguments or used directly. Illustrations of both these cases can be found in Example16 and Example17 in pDynamo's examples/book subdirectory.
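The two ways of using a trajectory instance can be sketched generically. The SketchTrajectory class and the RunToyDynamics function below are hypothetical, hold their frames in memory, and do not reproduce pDynamo's trajectory API. They only contrast passing a trajectory to a function that accepts one with using the trajectory directly.

```python
class SketchTrajectory:
    """Hypothetical in-memory trajectory; not pDynamo's trajectory API."""

    def __init__ ( self ):
        self.frames = [ ]

    def WriteFrame ( self, coordinates ):
        # . Store a copy so that later changes do not alter saved frames.
        self.frames.append ( list ( coordinates ) )

    def __iter__ ( self ):
        return iter ( self.frames )

def RunToyDynamics ( coordinates, steps, trajectory = None ):
    """Shift every coordinate by 0.5 per step, saving frames if requested."""
    for _ in range ( steps ):
        coordinates = [ x + 0.5 for x in coordinates ]
        if trajectory is not None:
            trajectory.WriteFrame ( coordinates )
    return coordinates

# . Case 1: pass the trajectory to a function that accepts one.
trajectory = SketchTrajectory ( )
final = RunToyDynamics ( [ 0.0, 1.0 ], 3, trajectory = trajectory )

# . Case 2: use the trajectory directly, frame by frame.
firstCoordinates = [ frame[0] for frame in trajectory ]
```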

To facilitate common manipulations and analyses, pDynamo also has a collection of trajectory utilities in its pSimulation package. There are general functions, such as Duplicate, which duplicates a trajectory, and FromCoordinateFiles and ToCoordinateFiles, which interconvert between a coordinate trajectory and a collection of separate coordinate files, and functions for more specific analyses, such as CovarianceMatrix, RadialDistributionFunction and SelfDiffusionFunction. For convenience, all these functions take the trajectory path names (and optional format arguments) rather than already activated trajectory instances.
