NAME
DBIx::TableHash - Tie a hash to a mysql table + SQL utils
SYNOPSIS
use DBIx::TableHash;
my $DBHash = DBIx::TableHash->create_or_die
(my $Params =
{
DBIDriver => 'mysql',
Database => 'mydatabase',
HostName => 'localhost',
Port => undef,
Login => '',
Password => '',
TableName => 'SalesPeople',
KeyField => 'FullName',
## For multi-key lookup:
FixedKeys => {AreaCode => 415,
StatusCode => 'Active',
RecordType => 'Primary'},
## To retrieve a single value:
ValueField => 'PhoneNumber',
## ... or for "multi-value" retrieval...
ValueField => undef,
## ... optionally specifying...
RetrieveFields => [qw(Title Territory Quota)],
## For caching:
CacheMode => 'CacheBeforeIterate'
## or...
CacheMode => 'CacheOneTime'
## or...
CacheMode => 'CacheNone'
}
);
my %DBHash; tie(%DBHash, 'DBIx::TableHash', $Params);
my $DBHash = DBIx::TableHash->create($Params) or die "Help!";
my $DBHash = DBIx::TableHash->create_or_die($Params);
my $DBHash = DBIx::TableHash->create_copy($Params) or die "Help!";
my $DBHash = DBIx::TableHash->create_copy_or_die($Params);
OVERVIEW
All parameters are passed via a single anonymous hash.
All parameters are optional, but you'll almost always need to specify
Database, TableName, and KeyField.
Omitting ValueField puts the hash in "multi-value" mode, where you
store/retrieve a hash of fields/values instead of a single value. In
"multi-value" mode all fields in each record are retrieved on every
fetch; RetrieveFields limits fields retrieved to a specified list.
Specifying FixedKeys puts the hash in "multi-key" mode, in which only a
subset of the database table, corresopnding to records that match the
spec in FixedKeys, is operated on.
Cache modes reduce querying, but lose synchronization and hog memory.
The object is designed to be easy subclass. Try making a subclass that
sets defaults for all or most of the parameters, so the caller doesn't
have to supply any at instantiation time.
"create_copy" methods efficiently create and return potentially huge
untied hash "snapshot" of the same data that would have been retrieved
by the corresponding tied hash.
DETAILS
The DBHash object is designed to tie a hash to a table or a subset of
records in a table in a DBI database (only tested with mysql in the
current version, but expected to work with any vendor).
If the table only has a single KeyField, which this modules assumes to
be a unique key field in the table, then the hash keys are stored and
retrieved in that field, and the values are saved in and returned from
the ValueField. Records are automatically created and deleted as the
hash is used like any other hash. (If the table is read-only, be sure
not to try to store into the tied hash!)
To access only a subset of the records in a table, you may specify hash
of "FixedKeys", which is a hash mapping OTHER field names to fixed
values that those fields must have for all lookups, updates, and inserts
done via the hash interface. This sets up a virtual hash corresponding
to a subset of the table where N key fields are fixed at given values
and a different key field may vary.
There are several ways to use this module.
Quick and dirty mode (single-key, single-value) using tie:
use DBIx::TableHash;
my %PhoneNumbers;
tie (%PhoneNumbers, 'DBIx::TableHash',
{Database => 'mydatabase',
TableName => 'SalesPeople',
KeyField => 'FullName',
ValueField => 'PhoneNumber'})
or die "Failed to connect to database";
Then you can use %PhoneNumbers like any hash mapping FullName to
PhoneNumber. Any retrieval of data results in corresponding SQL queries
being made to the database. Modifying the hash modifies the database.
Even quicker mode using create():
For convenience, you can use the create() class method to do the tying
for you. It creates an anonymous hash, ties it and, returns it. It takes
the same parameters as new() and tie().
use DBIx::TableHash;
my $PhoneNumbers = DBIx::TableHash->create(......)
or die "Failed to connect to database";
Quicker still using create_or_die():
use DBIx::TableHash;
my $PhoneNumbers = DBIx::TableHash->create_or_die(......);
create() carps and returns undef if it can't connect to the database.
create_or_die() croaks (dies) your program, with the same error message
as create() would have given.
Normally, create() will carp() (warn) you with an error message upon
failure, and return undef.
If you would have handled your error by saying "or die", and ar
comfortable with create's error message rather than your own, then
create_or_die() is for you.
Using one of the create() methods instead of new() and tie() works with
all of the different modes discussed below, and all parameters are the
same either way.
Cooler subclassing mode:
You can create a simple subclass that provides default parmas in an
initialize method so they don't have to be provided by the caller ...
### MyCompany/SalesPhoneHash.pm:
#!/usr/bin/perl
use strict;
package MyCompany::SalesPhoneHash;
use vars qw(@ISA);
use DBIx::TableHash;
@ISA = qw(DBIx::TableHash);
sub initialize
{
my $this = shift;
$this->{Database} ||= 'mydatabase'; ## Name of database to connect to.
$this->{TableName} ||= 'SalesPeople'; ## Table in which to store the data
$this->{KeyField} ||= 'FullName'; ## Name of the key field
$this->{ValueField} ||= 'PhoneNumber'; ## Name of the value field.
done:
return($this->SUPER::initialize());
}
1;
Then to use the object, your script merely does:
use MyCompany::SalesPhoneHash;
my %PhoneNumbers = MyCompany::SalesPhoneHash->create_or_die();
Of course, when instantiating a subclass, if you wish you can still
override any parameters you wish, as long as the initialize() method in
the subclass uses ||= rather than = to set defaults for any unspecified
parameters.
Multi-key mode:
You may also use the "FixedKeys" parameter to specify a hash of some
additional key fields and their fixed values that must match exactly for
any records that are retrieved, deleted, or created by the tied object,
effectively allowing the hash to operate on only a subset of the data in
the database. This is typically helpful in a multi-keyed table where,
for the purposes of your script, all key values should be fixed except
one (and that one is the hash key).
use DBIx::TableHash;
my $PhoneNumbers =
DBIx::TableHash->
create_or_die(
{Database => 'mydatabase',
TableName => 'SalesPeople',
KeyField => 'FullName',
ValueField => 'PhoneNumbers',
FixedKeys =>
{AreaCode => 415,
StatusCode => 'Active',
RecordType => 'Primary'}});
Multi-value mode:
If instead of getting and setting a single value, you'd like to get or
set a hash of all fields in the record, simply don't specify ValueField,
and the object will use "multi-value" mode, where an entire record, as a
hash, is gotten or set on each fetch or store. Feel free to combine this
mode with multi-key mode.
When storing a record in multi-value mode, if the record already exists,
only the specified fields are overwritten. If it did not already exist,
then only the specified fields will be written and the others will be
NULL or defaulted according to the table schema.
When storing a record in multi-value mode, you can't change the values
of the primary key field or any other key field specified in FixedKeys
(if any), since that would mess up the whole point of this module which
is to leave the main key and fixed keys fixed while mucking with the
other values in the record. Any changed values in key fields are simply
ignored.
use DBIx::TableHash;
my $SalesPeopleTable =
DBIx::TableHash->
create_or_die(
{Database => 'mydatabase',
TableName => 'SalesPeople',
KeyField => 'FullName'});
my $SalesPersonFullName = "Joe Jones";
my $EntireRecord = $SalesPeopleTable->{$SalesPersonFullName};
When fetching records in multi-value mode, you can limit the list of
returned fields to a subset of all available fields in case there might
be some very big ones that you don't want to waste bandwidth getting.
Just set the RetrieveFields parameter to an anonymous list of the fields
you care to retrieve. (This setting does not limit the fields you can
SET, just the ones that get retrieved.)
use DBIx::TableHash;
my $SalesPeopleTable =
DBIx::TableHash->
create_or_die(
{Database => 'mydatabase',
TableName => 'SalesPeople',
KeyField => 'FullName',
RetrieveFields=> [qw(Territory Quota)]});
Warning:
In multi-value mode, you might expect that this:
$Hash->{$MyKey}->{FieldX} = 'foo';
would set the value of the FieldX field in the appropriate record in the
database. IT DOES NOT.
This is because the anonymous hash returned by $Hash->{$MyKey} is not in
any way tied back to the database. You'd have to retrieve the record
hash, change any value in it, and then set $Hash->{$MyKey} back to it.
Making the above syntax work with a multi-valued tied hash to set a
value in the database is a possible future enhancement under
consideration by the author. Let me know if you would like to have that
work.
In the meanwhile, here's how you do could do it:
(my $Record = $Hash->{$MyKey})->{FieldX} = 'foo';
$Hash->{$MyKey}->{FieldX} = $Record;
WARNING: If you use the above approach to update a record in multi-value
mode, beware that there's potentially a race condition in the above code
if someone else updates the same record after you've copied it but
before you've modified and set it. So use this technique with caution
and understanding. If in doubt, don't use this module and instead use an
SQL query to update the record in a single transaction. Only you know
the usage patterns of your database, the concurrency issues, and the
criticality of errors.
Caching modes:
The object has several ways it can cache data to help minimize the
number of SQL queries to the database, at the expense of potentially
dramatically increased memory usage. The following cache parameters can
be specified to enable caching:
CacheMode => 'CacheBeforeIterate'
CacheMode => 'CacheOneTime'
To disable caching, specify:
CacheMode => 'CacheNone'
(You can also assign undef to CacheMode, but you'll get warnings.)
Normally, every time you fetch a value from the hash, it makes an SQL
query to the database. This, of course, is the intended and normal mode
of operation.
Unfortunately, in Perl, just calling values(%Hash) or each(%Hash) or
even copying the hash with {%Hash} results in a separate FETCH, and
consequently, a separate SQL query made by this module, for each item.
This could result in thousands of queries just to fetch all the values
from a thousand-item table in the database.
However, often you want to iterate over all the elements of a hash
without it having to go back to the database and issue another query for
each item that you retrieve.
Using the 'CacheBeforeIterate' mode, all keys and values are cached upon
each call to FIRSTKEYS (i.e. at the start of any iteration or
enumeration). Then, any subsequent calls to FETCH data from the hash
retrieve it from the cache instead of doing an SQL query. STORING or
DELETING any items from the hash results in them being stored and
deleted from both the database and the cache.
Using the CacheOneTime mode, the full cache is built at object
instantiation and time never fully rebuilt. In fact, its contents never
change unless you make alterations by using it to store into and/or
delete from the database.
CACHE WARNING: With both caching modes, of course, you must be
comfortable with the fact that the data being retrieved is a "snapshot"
of the database and consequently will not reflect updates done by other
parties during the lifetime of the object; it will only reflect updates
that you make by storing or deleting values from it. If other people are
using the database simultaneously, your cache and the actual data could
"drift" out of agreement. This is mainly dangerous to you, not others,
unless you then go make updates to the data based on potentially
outdated values.
All modes may be combined...
All modes and parameters are orthogonal, so any combination of
parameters may be specified, with the exception that the RetrieveFields
parameter is only meaningful when ValueField is not unspecified.
With subclassing, you may create objects that pre-specify any
parameters, even those that affect the major modes of operation. For
example, you may combine the subclassing technique and the multi-key
mode to make an object that accesses only the appropriate subset of a
multi-keyed table without requiring any parameters to be supplied by the
caller.
Getting a COPY of the data instead of a tied hash:
What if you just want a big copy -- a snapshot -- of the data in the
table, in a regular old hash that's no longer tied to the database at
all? (Memory constraints be damned!)
Just use the create_copy() or create_copy_or_die() methods. They work
just like create() and create_or_die(), but instead of returning a tied
object, they just return a potentially huge hash containing a copy of
all the data.
In other words:
create_copy() is equivalent to: {%{create() || {} }}
create_copy_or_die() is equivalent to: {%{create_or_die()}}
... but the _copy methods are more efficient because internally, a
caching mode is used to minimize the queries to the database and
generate the hash as efficiently as possible.
In all other respects, create_copy() and create_copy_or_die() perform
exactly like their non-copying namesakes, taking all the same
parameters, except CacheMode which is not relevant when making a static
copy.
Please remember that the object returned by the _copy methods is no
longer tied to the database.
PARAMETER SUMMARY
The full list of recognized parameters is:
DBI Parameters
Param Default Description
------------------------------------------------------------------------
DBIDriver 'mysql' Name of DBI driver to try to use (only
mysql has currently been tested by the
author).
HostName 'localhost' Host name containing the database and table;
Port undef Port number if different from the standard.
Login '' Login to use when connecting, if any.
Password '' Password to use when connecting, if any.
SQL Parameters
Param Default Description
------------------------------------------------------------------------
Database '' Name of database to connect to.
TableName '' Table to connect to.
KeyField '' Name of field in which lookup key is found.
ValueField '' Name of field to pull value from.
If empty or undef, then a
multi-value hash is used both for
saving and retrieving. This is
called "multi-value mode".
Module Parameters
Param Default Description
------------------------------------------------------------------------
FixedKeys {} If supplied, gives names and
fixed, hardcoded values that other
keys in the table must have; this
effectively limits the scope of
the tied hash from operating over
the entire table to operating over
just the subset of records that
match the values in FixedKeys.
This is called "multi-key mode".
RetrieveFields [] In multi-value mode, limits the
fields that are retrieved; default
is all fields in the record.
SUPPORT
I am unable to provide any technical support for this module. The whole
reason I had to make it was that I was way too busy (lazy?) to write all
that SQL code...
But you are encouraged to send patches, bug warnings, updates, thanks,
or suggestions for improvements to the author as listed below.
Just be aware that I may not have time to respond. Please be sure to put
the name of this module somewhere in the Subject line.
The code is a pretty simple tied hash implementation, so you're on your
own to debug it. If you're having trouble debugging via the "tie"
interface, try instantiating an object directly (or retrieving it when
you tie (see perltie)) and calling its methods individually. Use the
debugger or Data::Dumper to dump intermediate values at key points, or
whatever it takes. Use your database server logs if you want to see what
SQL code is getting generated. Or contribute a debugging mode to this
module which prints out or logs the SQL statements before executing
them.
BUGS/GOTCHAS
Problem: If you iterate or enumerate the hash, all keys get pulled in
from the database and stay stored in memory for the lifetime of the
object. FIRSTKEY, which is called every time you do a keys(), each() or
any full iteration or enumeration over the tied hash (such as copying
it) retrieves and hangs on to a full list of all keys in KeyField. If
the keys are long or there are lots of them, this could be a memory
problem. (Don't confuse this with CacheMode in which BOTH keys AND
values are stored in memory.)
Solutions:
1) Don't iterate or enumerate. Just fetch and store.
2) Only iterate or enumerate on short tables.
3) LValue or RValue hash slices should be safe to do.
INSTALLATION
Using CPAN module:
perl -MCPAN -e 'install DBIx::TableHash'
Or manually:
tar xzvf DBIx-TableHash*gz
cd DBIx-TableHash-?.??
perl Makefile.PL
make
make test
make install
SEE ALSO
The DBIx::TableHash home page:
http://christhorman.com/projects/perl/DBIx-TableHash/
The implementation in TableHash.pm.
The perlref and perltie manual pages.
The mysql home page:
http://mysql.com/
THANKS
Thanks to Mark Leighton Fisher <
[email protected]> for providing a patch
to fix -w support (change in standard "none" setting of CacheMode from
undef to CacheNone).
AUTHOR
Chris Thorman <
[email protected]>
Copyright (c) 1995-2002 Chris Thorman. All rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.