Config Loader

Proposed functionality - Things we need to handle / consider

Initially, I'll consider only files as data sources. The DBI/DBM/ENV etc need to be considered separately.

Single / Multiple data files

  • load one file
  • load a tree of files given one directory
  • load a list with any combination of the above

Level at which to load

The contents of a file should be inserted into the %config hash at:
  • top level :
    @config{ keys %loaded } = values %loaded
  • top level with filename as key :
    $config{$filename} = \%loaded
  • with the directory path as the insert point :
    $config{$dir1}{$dir2}{$dir3} = \%loaded
  • with the directory and file path as the insert point :
    $config{$dir1}{$dir2}{$dir3}{$filename} = \%loaded

In these situations, $filename would be the base filename, without the extension - the extension is used to indicate the file format, and should be discarded once the data has been loaded.
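As a rough sketch of the insert points listed above: the helper names insert_at and path_for_file are hypothetical, and the example assumes relative, slash-separated paths. An empty path means "merge at the top level".

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: insert \%loaded into \%config at the level given
# by @$path, creating intermediate hashes as needed.
sub insert_at {
    my ( $config, $path, $loaded ) = @_;
    my @keys = @$path;
    if ( !@keys ) {
        # Top level: copy every loaded key straight into the config hash.
        @{$config}{ keys %$loaded } = values %$loaded;
        return $config;
    }
    my $last = pop @keys;
    my $node = $config;
    $node = $node->{$_} ||= {} for @keys;
    $node->{$last} = $loaded;
    return $config;
}

# Hypothetical helper: derive the insert point from a relative file path,
# discarding the extension (which only indicates the file format).
sub path_for_file {
    my ($file) = @_;
    my ( $name, $dir ) = fileparse( $file, qr/\.[^.]*/ );
    my @dirs = grep { length && $_ ne '.' } split m{/}, $dir;
    return [ @dirs, $name ];
}

my %config;
insert_at( \%config, path_for_file('db/main/hosts.yaml'), { host => 'localhost' } );
# The data is now reachable as $config{db}{main}{hosts}{host}
```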

Whether to load

For each file that we encounter, should we load it? This allows us to load the files relevant to this environment and the role of this application. For instance:
  • dev vs production
  • cron daemon vs web server
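One way this decision could be exposed is as a user-supplied callback that is called once per candidate file. The sketch below is only an illustration: make_filter is a hypothetical name, and the env/<name>/ and role/<name>/ directory layout is an assumption made for the example, not part of the proposal.

```perl
use strict;
use warnings;

# Hypothetical: build a predicate that decides whether a file applies to
# the current environment and role.
sub make_filter {
    my (%args) = @_;    # eg env => 'dev', role => 'cron'
    return sub {
        my ($file) = @_;
        # Assumed layout: files under env/<name>/ or role/<name>/ apply
        # only to that environment or role; everything else always loads.
        if ( $file =~ m{(?:^|/)env/([^/]+)/} )  { return 0 if $1 ne $args{env} }
        if ( $file =~ m{(?:^|/)role/([^/]+)/} ) { return 0 if $1 ne $args{role} }
        return 1;
    };
}

my $should_load = make_filter( env => 'dev', role => 'cron' );
my @candidates  = qw(
    common/db.yaml
    env/dev/db.yaml
    env/production/db.yaml
    role/cron/jobs.yaml
    role/web/server.yaml
);
my @to_load = grep { $should_load->($_) } @candidates;
```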

How to load

If we encounter existing keys, do we:

This is intended as a mechanism to locally override just single keys, eg the DB host, username and password
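One possible answer is a recursive merge, where hashes are merged key by key and anything else in the later file replaces the earlier value. This is a minimal sketch under that assumption (merge_hashes is a hypothetical name), not the Config::Merge implementation:

```perl
use strict;
use warnings;

# Merge $override into $base in place. Nested hashes are merged
# recursively; any other value (scalar, array, ...) is replaced outright.
sub merge_hashes {
    my ( $base, $override ) = @_;
    for my $key ( keys %$override ) {
        if (   ref( $base->{$key} ) eq 'HASH'
            && ref( $override->{$key} ) eq 'HASH' )
        {
            merge_hashes( $base->{$key}, $override->{$key} );
        }
        else {
            $base->{$key} = $override->{$key};
        }
    }
    return $base;
}

my %config = ( db => { host => 'db.example.com', user => 'app', port => 5432 } );
my %local  = ( db => { host => 'localhost', user => 'dev' } );
merge_hashes( \%config, \%local );
# Only host and user are overridden; port keeps its original value.
```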

Merge style

While overriding single keys in a hash is easy to represent,  overriding the contents of an array is considerably more complicated.  There is no single good way to represent this. See Advanced Usage in the Config::Merge docs for my attempt to provide this functionality. It is poor, confusing, and fragile, but I wanted to provide a mechanism for doing this.

Why is this required? For example, I have a cron daemon with a list of jobs to run - in my dev environment, I only want to run two of the twenty jobs - how do I remove the others from the list? Also, I want to change the frequency with which these jobs run.  And I want to do all of this without editing the config files which are used for production.

Format handlers

Which config module is used for each file? I like Config::Any's approach of using the file extension to represent the different file formats, so:
  • .yaml -> YAML::Syck / YAML
  • .xml -> XML::Simple

Of course, we will provide the ability to override these mappings.

What about passing specific options to a loader? For instance, $YAML::Syck::LoadCode is, unfortunately, a package variable shared by all code in the process. The user should be able to specify this as a default option for this module, and also on a per-file basis, and not have these options alter other code (eg in a mod_perl environment, where the user is running several applications). So the initial state should be stored and reset after the module is used.
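Perl's local gives the "store and reset" behaviour for package variables without extra bookkeeping. The sketch below uses a stand-in package, My::FakeLoader (hypothetical), so that it runs without YAML::Syck installed; a real handler would localise $YAML::Syck::LoadCode in the same way.

```perl
use strict;
use warnings;

# Stand-in for a format handler that is configured through a package
# variable, in the way YAML::Syck uses $YAML::Syck::LoadCode.
{
    package My::FakeLoader;
    our $LoadCode = 0;
    sub load { return $LoadCode ? 'code enabled' : 'code disabled' }
}

# Hypothetical wrapper: apply the option only for the duration of one load.
sub load_with_options {
    my (%opts) = @_;
    # local saves the current value and restores it when this sub returns,
    # even if the loader dies.
    local $My::FakeLoader::LoadCode = $opts{load_code} ? 1 : 0;
    return My::FakeLoader::load();
}

my $during = load_with_options( load_code => 1 );
my $after  = My::FakeLoader::load();    # other code sees the original setting
```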

Load order

What order do we use for loading files/directories (see Config Tree Layout in Config::Merge):
  • depth first
  • breadth first
  • a basic alphanum sort of files in a dir (was a request for C::P::ConfigLoader)
  • if a directory and a file of the same name exists, which gets loaded first
  • local override files should be loaded last (at their specified depth)
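Within a single directory, the last two points could be handled by the sort itself. This sketch assumes that override files are the ones whose base name is "local" (an assumption for the example) and uses a plain string comparison for everything else; load_order and is_local are hypothetical names.

```perl
use strict;
use warnings;

# Assumption: a file or directory named "local" (with any extension)
# is a local override.
sub is_local {
    my ($name) = @_;
    return $name =~ /^local(?:\.|$)/ ? 1 : 0;
}

# Order the entries of one directory: plain string sort, local overrides last.
sub load_order {
    my (@names) = @_;
    return sort {
        is_local($a) <=> is_local($b)
            or $a cmp $b
    } @names;
}

my @ordered = load_order(qw(db.yaml local.yaml app.yaml cron));
```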

Representation of level in config tree

In various situations, we need to represent the current level in the config tree, eg
  • for making decisions about whether to load
  • for specifying keys to override
  • for displaying the level at which an error occurs

This level is not just a list of directories and files, but could descend into structures within files. These structures could be hashes or arrays, so using slashes could be counter-intuitive.

What I've used in Config::Merge is a Template Toolkit style 'dot', eg 'main.db.host'. The same approach works for arrays : 'main.db.host.1.password' (distinguishing between element number 1 of an array and a hash key named '1' should be possible based on the context). Key names which include dots could be escaped 'main.db\.default.host'. How do we represent the config root? An empty string, or a single '.' ?

I realise that this is controversial, so would welcome discussion.
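To make the dot notation concrete, here is a minimal sketch of parsing and resolving such a path. The names split_path and lookup are hypothetical, the empty string is assumed to mean the root, and an escaped backslash directly before a dot is not handled.

```perl
use strict;
use warnings;

# Split 'main.db\.default.host' into ('main', 'db.default', 'host').
sub split_path {
    my ($path) = @_;
    return () if !defined $path || $path eq '';    # assumption: '' is the root
    my @keys = split /(?<!\\)\./, $path;           # split on unescaped dots
    s/\\\././g for @keys;                          # unescape the rest
    return @keys;
}

# Walk the data structure, using context to decide whether a key is a
# hash key or an array index.
sub lookup {
    my ( $data, $path ) = @_;
    for my $key ( split_path($path) ) {
        if ( ref($data) eq 'HASH' ) {
            return unless exists $data->{$key};
            $data = $data->{$key};
        }
        elsif ( ref($data) eq 'ARRAY' && $key =~ /^\d+$/ ) {
            $data = $data->[$key];
        }
        else {
            return;
        }
    }
    return $data;
}

my %config = (
    main => {
        db => { host => [ { password => 'first' }, { password => 'second' } ] },
        'db.default' => { host => 'localhost' },
    },
);
my $password = lookup( \%config, 'main.db.host.1.password' );
my $default  = lookup( \%config, 'main.db\.default.host' );
```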

Post processing

We need to allow post-processing of data, for instance:
  • converting 'paths.images'  => 'BASEDIR/images' to an absolute path
  • "compiling" human readable data to a more optimal form for use in the application
  • eval'ing text into anonymous subs

Should be able to specify:
  • a list of keys to post process
  • recursive post-processing with:
    • a starting point in the hierarchy
    • regex matching
    • descend into hashes only, or arrays as well
    • callbacks to match keys to post process
  • multiple post processors
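A recursive walker with two callbacks covers several of these cases. In this sketch (all names hypothetical), $match receives the dotted path and the value of each leaf and decides whether to process it, and $process returns the replacement value. It always descends into both hashes and arrays.

```perl
use strict;
use warnings;

sub post_process {
    my ( $node, $match, $process, $path ) = @_;
    $path = '' unless defined $path;
    my $is_hash = ref($node) eq 'HASH';
    return $node unless $is_hash || ref($node) eq 'ARRAY';

    my @keys = $is_hash ? ( sort keys %$node ) : ( 0 .. $#{$node} );
    for my $key (@keys) {
        my $child = length($path) ? "$path.$key" : $key;
        my $slot  = $is_hash ? \$node->{$key} : \$node->[$key];
        if ( ref $$slot ) {
            # Nested structure: keep descending.
            post_process( $$slot, $match, $process, $child );
        }
        elsif ( $match->( $child, $$slot ) ) {
            # Matching leaf: replace it with the processed value.
            $$slot = $process->($$slot);
        }
    }
    return $node;
}

my $basedir = '/srv/app';    # hypothetical base directory
my %config  = (
    paths => { images => 'BASEDIR/images', tmp => 'BASEDIR/tmp' },
    name  => 'BASEDIR is left alone outside paths',
);
post_process(
    \%config,
    sub { my ($path)  = @_; return $path =~ /^paths\./ },
    sub { my ($value) = @_; $value =~ s/^BASEDIR/$basedir/; return $value },
);
```

Running several post processors would then just be several calls, each with its own pair of callbacks.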

Creating customised Config::Loader objects

We want to:
  • allow Config::Loader to function in multiple applications simultaneously
  • allow multiple Config::Loader objects within a single application
  • each Config::Loader object should allow different behaviour

Which means that plugins should be incorporated per object, not per class.

Matt Trout suggested that the best way to handle this would be with pipelines. So we have a number of stages for processing, and users can push handlers onto a stack in order. Each handler for a stage will call the next handler. This makes it easy to extend via plugins and custom handlers.
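The pipeline idea might look something like the following. This is my own minimal sketch of the suggestion, not an agreed design: My::Pipeline, push_handler, run and the post_load stage name are all hypothetical. Handlers are stored per object, and each handler receives a $next callback that invokes the rest of the stack.

```perl
use strict;
use warnings;

{
    package My::Pipeline;    # hypothetical class name

    sub new {
        my ($class) = @_;
        return bless { stages => {} }, $class;
    }

    # Handlers are kept on the object, so two loaders never share plugins.
    sub push_handler {
        my ( $self, $stage, $handler ) = @_;
        push @{ $self->{stages}{$stage} }, $handler;
        return $self;
    }

    sub run {
        my ( $self, $stage, $data ) = @_;
        return _call( $self->{stages}{$stage} || [], 0, $data );
    }

    # Call handler number $i, giving it a callback that runs the next one.
    sub _call {
        my ( $handlers, $i, $data ) = @_;
        my $handler = $handlers->[$i];
        return $data unless $handler;    # end of the stack
        return $handler->( sub { _call( $handlers, $i + 1, @_ ) }, $data );
    }
}

my $loader_a = My::Pipeline->new;
my $loader_b = My::Pipeline->new;
$loader_a->push_handler(
    post_load => sub { my ( $next, $data ) = @_; push @$data, 'first'; return $next->($data) } );
$loader_a->push_handler(
    post_load => sub { my ( $next, $data ) = @_; push @$data, 'second'; return $next->($data) } );

my $result_a = $loader_a->run( post_load => [] );
my $result_b = $loader_b->run( post_load => [] );    # no handlers on this object
```

Because each handler decides when (or whether) to call $next, a handler can act before the rest of the stack, after it, or short-circuit it entirely.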

Load on demand

There could be large chunks of the config tree that are used rarely, in which case we only want to load them on demand. This could be implemented with a tied hash (see perltie).
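A minimal sketch of the idea, built on Tie::ExtraHash from the core Tie::Hash module. My::LazyHash is a hypothetical name, and the loader callbacks stand in for whatever would actually read the files. Note that keys which have not been loaded yet will not show up in keys %config with this simple version.

```perl
use strict;
use warnings;

{
    package My::LazyHash;    # hypothetical class name
    require Tie::Hash;
    our @ISA = ('Tie::ExtraHash');

    # Tie::ExtraHash keeps the real hash in $self->[0] and the extra
    # arguments to tie() after it; here $self->[1] is a hash of loaders.
    sub FETCH {
        my ( $self, $key ) = @_;
        if ( !exists $self->[0]{$key} && $self->[1]{$key} ) {
            my $loader = delete $self->[1]{$key};
            $self->[0]{$key} = $loader->();    # load on first access only
        }
        return $self->[0]{$key};
    }

    sub EXISTS {
        my ( $self, $key ) = @_;
        return exists( $self->[0]{$key} ) || exists( $self->[1]{$key} );
    }
}

my $loads = 0;
tie my %config, 'My::LazyHash', {
    reports => sub { $loads++; return { format => 'pdf' } },
};

my $known  = exists $config{reports};     # does not trigger a load
my $format = $config{reports}{format};    # triggers the load
my $again  = $config{reports}{format};    # served from the loaded data
```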

DBM::Deep / DBI

DBM::Deep and DBI could be useful adjuncts to loading config data.

My initial concept was to have (at the relevant point in the directory tree) a file with (eg) the DB/username/password/SELECT statement. However, I think it would probably make more sense to specify these details in the code which creates the new Config::Loader instance. The DB/username/password could be inheritable, so after specifying it the first time, we would just need the relevant SELECT statement, the level of the hierarchy where we'd like to insert it, plus any conversion that we need to go from rows to (eg) a hash structure.

The username / password would need to be retrievable from the config data that we have already loaded.

ENV and user preference files

At some stage during the loading of config data, we should allow the merging in of user preference files and environment variables, with default values taken from the preloaded config.

Accessing the configuration hash

Once your config data is loaded, how do you get access to it? Two styles I can think of:
  • we return a ref to the config hash and the user passes it around their application as they see fit - for apps which require multiple different config hashes, this is really the only option
  • we create a config class and store a singleton in it, so the user just needs to use My::Config::Class; and a method would be imported into their namespace which would give them access to their singleton. Or they can access it with $config=My::Config::Class->config();
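The second style could be sketched as below. The hard-coded hash is a placeholder for the real loading step, and because the class lives in the same file here, import is called by hand; with the class in its own file, use My::Config::Class; would do the same thing at compile time.

```perl
use strict;
use warnings;

{
    package My::Config::Class;

    my $singleton;

    sub config {
        # Placeholder data: a real class would load its config tree here.
        $singleton ||= { db => { host => 'localhost' } };
        return $singleton;
    }

    # Install a config() function in the namespace of whoever imports us.
    sub import {
        my ($class) = @_;
        my $caller = caller;
        no strict 'refs';
        *{"${caller}::config"} = sub { return $class->config };
        return;
    }
}

My::Config::Class->import;    # stands in for "use My::Config::Class;"

my $via_function = config();
my $via_method   = My::Config::Class->config();
```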

Reloading config and invalidating existing data

Given the scenario where a user loads their config, then stores some of the data in their own variables (eg my $db_config = $config->{db}{host}[0] ), it'd be nice to invalidate these references when the config data is reloaded.  This is probably not possible in an automated way (ETOOMUCHMAGIC). We "could" provide an interface for invalidating such references, which would allow the user to write
    if ( !$db_config ) {
        $db_config = $config->{db}{host}[0];
        $config->register_for_invalidation(\$db_config);    # or declare this method with a prototype?
    }
...but perhaps this is overkill.
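For what it is worth, such an interface would not need much code. In this sketch My::ReloadableConfig, data and reload are hypothetical names; registered references are simply set to undef on reload, and the object holds them strongly, so a registered variable stays alive until the next reload.

```perl
use strict;
use warnings;

{
    package My::ReloadableConfig;    # hypothetical class name

    sub new {
        my ( $class, $data ) = @_;
        return bless { data => $data, registered => [] }, $class;
    }

    sub data { return $_[0]{data} }

    sub register_for_invalidation {
        my ( $self, $ref ) = @_;
        push @{ $self->{registered} }, $ref;
        return;
    }

    # Swap in the new data and undef every variable that cached old data.
    sub reload {
        my ( $self, $data ) = @_;
        $self->{data} = $data;
        ${$_} = undef for @{ $self->{registered} };
        $self->{registered} = [];
        return;
    }
}

my $config = My::ReloadableConfig->new( { db => { host => ['db1.example'] } } );

my $db_config;
my $db_host = sub {
    if ( !$db_config ) {
        $db_config = $config->data->{db}{host}[0];
        $config->register_for_invalidation( \$db_config );
    }
    return $db_config;
};

my $before = $db_host->();
$config->reload( { db => { host => ['db2.example'] } } );
my $after = $db_host->();    # the cached value was invalidated and refetched
```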