NAME
Set::Files - routines to work with files, each defining a single set

SYNOPSIS
use Set::Files;
$Version = $Set::Files::VERSION;

$obj = new Set::Files(OPT => VAL, OPT => VAL, ...);

@set = $obj->list_sets( [TYPE] );

@uid = $obj->owner;
$uid = $obj->owner(SET);

@set = $obj->owned_by(UID [,TYPE]);

@ele = $obj->members(SET);

$flag = $obj->is_member(SET, ELE);

@type = $obj->list_types( [SET] );

@dir = $obj->dir;
$dir = $obj->dir(SET);

%opts = $obj->opts(SET);
$val = $obj->opts(SET,VAR);

$obj->cache;

$num = $obj->add (SET, FORCE, COMMIT, ELE1,ELE2,...);
$num = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);

$obj->commit(SET1,SET2,...);

$obj->delete(SET);
$obj->delete(SET,1);

DESCRIPTION
This is a module for working with simple sets of elements where each set
is defined in a separate file (one file for each set to be defined).

The advantages of putting each set in a separate file are:

Set management can be delegated
If all sets are defined in a single file, management of all sets
must be done by a single user, or by using a suid program. By
putting each set in a separate file, different files can be owned by
different users so management of different sets can be delegated.

Set files are a simple format
Because a file consists of a single set only, there is no need to
have a complex file format which has to be parsed to get information
about the set. As a result, set files can easily be autogenerated or
edited with any simple text editor, and errors are less likely to be
introduced into the file.

The disadvantages are:

Permissions problems
Some applications may need to read all of the data, but since the
different set files may be owned by different people, permissions
may get set such that not all set files are readable.

Applications which actually gather all of the data will need to be
run as root in order to be reliable. Alternatively, some means of
enforcing the appropriate permissions needs to be in place.

No central data location
Usually, when you want to define sets, the data ultimately needs to
be stored in one central location (which might be a single file or
database).

To get around this, a wrapper must be written using this module to
copy the data to the central location.

Simple elements only
Many types of sets have elements which have attributes (for example,
a ranking within the set or some other attribute). When you start
adding attributes, you need a more complex file structure in order
to store this information, so that type of set is not addressed with
this module. The only attribute that an element has is membership in
the set.

Slow data access
Because the data is spread out over several files, each of which
must be parsed, and any error checking done, accessing the data can
be significantly slower than if the data were stored in a central
location.

Features of this module include:

Data caching
This module provides routines for caching the information from all
the set files. This can be used to avoid the permissions problems
(allowing user-run applications access to all cached data) and
decrease access time (no parsing is needed, and error checking can
be done prior to caching the information).

This still requires that a privileged user or suid script be used to
update the cache.

Multiple types of sets
Often, it is convenient to define different types of sets using a
single set of files as there may be considerable overlap between the
sets of different types.

For example, it might be useful to create files containing sets of
users who belong to different committees in a department. Also,
there might be sets of users who belong to various departmental
mailing lists. One solution is to have two different directories,
one with set files with lists of users on the various committees;
one with set files with lists of users on each mailing list. Since
there might be overlap between these groups, it might be nice to
have the two sets of files overlap. For example, some committees may
want to have a mailing list associated with the group, others don't
want a mailing list, and there may be mailing lists not associated
with a committee.

This allows you to have a single file for each set of users, but
some sets will have mailing lists, some will be committees, and some
will be both.

Set ownership
Since the different files may be owned by different people,
operations based on set ownership can be done.

METHODS
The following methods are available:

VERSION
use Set::Files;
$Version=$Set::Files::VERSION;

Check the module version.

new
$obj = new Set::Files(OPT => VAL, OPT => VAL, ...);

This creates a new Set::Files object which reads the appropriate set
files (or a cache of the information in set files). The
initialization options available are described below.

list_sets
@set = $obj->list_sets( [TYPE] );

Returns a list of all defined sets or the sets of the specified
type.

owner
@uid = $obj->owner;
$uid = $obj->owner(SET);

Lists all UIDs that own a set, or the owner of the specified set.

owned_by
@set = $obj->owned_by(UID [,TYPE]);

Lists all sets owned by the specified UID, optionally restricted to
sets of the specified type.

members
@ele = $obj->members(SET);

Lists all elements in the specified set.

is_member
$flag = $obj->is_member(SET, ELE);

Returns 1 if ELE is a member of SET.

list_types
@type = $obj->list_types( [SET] );

Returns a list of all types defined, or the types that the specified
set belongs to.

dir
@dir = $obj->dir;
$dir = $obj->dir(SET);

All directories containing set files, or the directory containing
the file of the specified set.

opts
%opts = $obj->opts(SET);
$val = $obj->opts(SET,VAR);

Returns a hash of all options set for a set, or the value of a
specific option. If the specific option is not set, 0 is returned.

delete
$obj->delete($set);
$obj->delete($set,1);

This removes the specified set file. By default, it renames the set
file to .set_files.$set (files with such names are ignored when
reading in set data). If the optional second argument is passed in,
no backup is made (i.e. the set file is deleted completely).

This method is only available to those who have write access to the
directory containing the set file.

cache
$obj->cache;

This dumps the current set information to a cache file. This method
is only valid if the data was read in from files. If it was read in
from the cache, this method will fail.

add, remove
$num = $obj->add (SET, FORCE, COMMIT, ELE1,ELE2,...);
$num = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);

These functions add/remove the specified elements to/from the set.

When adding an element to a set, the module first checks whether the
element is already in the set and, if so, whether it is listed
explicitly in the set file or comes from some other set file via an
INCLUDE tag.

If the element is not in the set, it is added. If the FORCE flag is
true, an element that is already in the set, but only via an INCLUDE
tag, is also added to the set file explicitly. In either case, any
OMIT tag which removes this element is removed.

When removing elements from a set, a similar set of tests is done.
If the element is in the set, it is removed from the file (if it
appears in the file) and an OMIT tag is added. If the element is
not in the set, the file is unmodified unless the FORCE flag is
true, in which case an OMIT tag is added.

The COMMIT flag is used to determine whether the file should be
written out over the existing file. The file can only be written out
if data was read from the files. If it was read in from the cache,
this will fail.

The return value is the number of changes made to the set.

commit
$obj->commit(SET1,SET2,...);

Any changes that have been made with the add and remove methods can
be written out to the set file(s) with this method. This method is
only valid if the data was read in from files. If it was read in
from the cache, this method will fail.

INIT OPTIONS
The following options can be passed in to the new method:

path
path => DIR1:DIR2:...
path => [ DIR1, DIR2, ... ]

The set files may be stored in one or more directories. By default,
set files are assumed to be in the current directory, but using this
option, the directory (or directories) can be explicitly set.

Note that if multiple directories are used and a file of the same
name exists in more than one of them, the first one found (in the
order that the directories are included in the list) is used. A
warning is issued for files of the same name in other directories,
but they are ignored.

Warnings will be issued for unreadable directories, or unreadable
files within a directory.
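
The lookup order described above can be pictured with a short sketch.
This is not the module's own code: the helper name locate_set_files
is hypothetical, and the sketch assumes that the string form of path
is a colon-separated list and that files named .set_files.* are
skipped (as described for backup files).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Set::Files): map each set name to the
# first directory, in list order, that contains a file of that name.
sub locate_set_files {
    my ($path) = @_;
    my @dirs = ref($path) eq 'ARRAY' ? @$path : split(/:/, $path);
    my %found;

    for my $dir (@dirs) {
        my $dh;
        unless (opendir($dh, $dir)) {
            warn "Cannot read directory $dir: $!\n";
            next;
        }
        # Keep plain files only; skip the module's own .set_files.* files.
        my @files = grep { $_ !~ /^\.set_files\./ && -f "$dir/$_" } readdir($dh);
        closedir($dh);

        for my $file (sort @files) {
            if (exists $found{$file}) {
                # A later directory never overrides an earlier one.
                warn "Ignoring $dir/$file (already found in $found{$file})\n";
                next;
            }
            $found{$file} = $dir;
        }
    }
    return %found;
}

# Example: list the files the sketch would pick up from the current directory.
my %where = locate_set_files(['.']);
print "$_ -> $where{$_}\n" for sort keys %where;
```

Because the first match wins, the order of the directories in path
decides which copy of a duplicated set file is used.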

valid_file
valid_file => REGEXP
valid_file => !REGEXP
valid_file => \&FUNCTION

By default, all files in the directories are used. With this option,
filenames are tested and only those that pass will be used. Others
will be silently ignored.

REGEXP is a regular expression. Only filenames which match the
REGEXP will pass (or if !REGEXP is used, only filenames which do NOT
match REGEXP will pass).

If a reference to a function is passed in, the function
&FUNCTION(dir,file) will be evaluated for each file. If it returns
0, the file will be silently ignored. Otherwise it will be used.
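
The three forms of valid_file can be thought of as a single
predicate. The following is a minimal sketch, not the module's
implementation: make_file_filter is a hypothetical helper, and it
assumes that the !REGEXP form is passed as a string starting with
"!".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Set::Files): turn a valid_file-style
# value into a predicate that is called as $filter->(DIR, FILE).
sub make_file_filter {
    my ($spec) = @_;

    # No test given: every file passes.
    return sub { 1 } if !defined $spec;

    # Function reference: called with (dir, file); a false value rejects.
    return sub { $spec->(@_) ? 1 : 0 } if ref($spec) eq 'CODE';

    # "!REGEXP": only filenames which do NOT match pass.
    if ($spec =~ /^!(.*)$/s) {
        my $re = qr/$1/;
        return sub { $_[1] !~ $re ? 1 : 0 };
    }

    # Plain REGEXP: only filenames which match pass.
    my $re = qr/$spec/;
    return sub { $_[1] =~ $re ? 1 : 0 };
}

my $only_lists = make_file_filter('\.list$');
my $no_backups = make_file_filter('!~$');
my $by_code    = make_file_filter(sub { my ($dir, $file) = @_; $file ne 'README' });

for my $file (qw(staff.list README notes~)) {
    printf "%-10s lists=%d no_backups=%d code=%d\n", $file,
        $only_lists->('.', $file), $no_backups->('.', $file), $by_code->('.', $file);
}
```

A function reference is the most flexible form, since it also
receives the directory and can therefore apply different rules to
different directories.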

invalid_quiet
invalid_quiet => 1

By default, when a file is ignored due to failing a valid_file test,
or when an element is ignored due to failing a valid_ele test, a
warning is issued. With this option, no warning is issued.

cache
cache => DIR

Data from the set files may be cached in order to speed up data
access. If this option is used, you must specify the directory where
the data will be cached. The directory may be the same as one of the
directories containing the set files.

The cache directory defaults to the first directory given in the
path option (or the current directory if no path option is given).

read
read => "cache"
read => "files"
read => "file"

When an application wants to use data from the set files, it can
either read the data from set files or the cache.

If the cache option was used, the default is to read from the cache
if it exists, read from the files otherwise. If no cache option was
used, the default is to read from the files. When data is read in
from the cache, the commit and cache methods are disabled.

If the file option is used, it reads a single set from a single file
along with all dependency sets (i.e. sets that are included or
excluded via the appropriate tags). This allows someone to make
changes to a single set file that they own even if permissions are
set so that they cannot read other set files. The commit method is
available, but the cache method is disabled. The file option
requires that the set option also be present.

With the files option, all set files are read. Both the commit and
cache methods are enabled.

set
set => SET

This defines which set to read when the read => "file" option is
used. This option is required when read => "file" and ignored for
any other value for read.

types
types => TYPE
types => [ TYPE1, TYPE2, ... ]

Sets can be of one or more types (or they can belong to no type and
be used solely in building other sets using the INCLUDE or EXCLUDE
tags described in the FILE FORMAT section below).

This option can be used to specify the names of the different types
of sets defined by these files.

If this option is not given, then there is only one type and by
default, all sets belong to it.

default_types
default_types => [ TYPEa, TYPEb, ... ]
default_types => "all"
default_types => "none"
default_types => TYPE

Some types of sets may be more common than others, and you may or
may not want to have to explicitly define which types a set belongs
to.

If a list of types is passed in, every type must be defined in the
types option (a warning is issued for any that is not). If a value
of "all" is passed in, sets belong to all types by default. If a
value of "none" is passed in, sets don't belong to any type by
default.

By default, sets belong to all types available.

comment
comment => REGEXP

This defines a regular expression used to recognize (and strip out)
comments from a set file. The default expression is "#.*" which
means that all characters from a pound sign to the end of the line
are removed.

If REGEXP is passed in as an empty string, there are no comments.
All lines are either empty or contain an element.

tagchars
tagchars => STRING

This defines a character (or a string) which marks a line of the set
file as containing a tag. The default value is "@".

valid_ele
valid_ele => REGEXP
valid_ele => !REGEXP
valid_ele => \&FUNCTION

By default, every non-blank line (after comments have been stripped
out) is treated as an element. If this option is used, elements are
tested, and only those that pass the test are treated as valid.
Others are invalid and produce a warning.

If a reference to a function is passed in, the function
&FUNCTION(set,ele) will be evaluated for each element. If it returns
0, the element will be silently ignored. Otherwise it will be
included in the set.

scratch
scratch => DIR

When automatically updating a set file, the directory where the
files live may or may not be writable by a user who owns a set file.

If the directory is writable by the user, there is no problem. In
this case, when a new set file is written, the old one is backed up
and the new one written in its place.

If the directory is NOT writable by the user, the old copy is backed
up to the scratch directory. This directory must be writable by the
user. It defaults to /tmp.

FILE FORMAT
A set file has a very simple format. It consists of blank lines, tags,
and elements. Comments may be included as whole lines or part of one of
the above lines.

Each line is checked for comments and they are removed before any other
processing is done. A comment is anything that matches a regular
expression which can be set using the comment Init option. The default
regular expression is "#.*" which means that comments start with a pound
sign anywhere on the line and go to the end of the line.

Tags are lines which begin with a special string (which can be set
with the tagchars Init option). The default string is "@". Tag lines
are of one of the formats:

@TAG
@TAG VAL1,VAL2,...

All other lines are elements. Elements are any string (one per line).

Leading/trailing spaces are ignored in all cases.

The set name is the name of the set file.
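
The rules above (strip comments, trim spaces, then decide between
blank line, tag, and element) can be sketched as follows. This is an
illustration only, not the module's parser: parse_set_line and its
return convention are hypothetical, and the comment and tagchars
defaults are the ones stated in this document.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Set::Files): classify one line of a
# set file. Returns () for a blank line, ('tag', NAME, VALUE) for a tag
# line, and ('ele', ELEMENT) for anything else.
sub parse_set_line {
    my ($line, %opt) = @_;
    my $comment  = defined $opt{comment}  ? $opt{comment}  : '#.*';
    my $tagchars = defined $opt{tagchars} ? $opt{tagchars} : '@';

    # Comments are removed before any other processing. An empty
    # expression means the file contains no comments.
    $line =~ s/$comment// if length $comment;

    # Leading/trailing spaces are ignored in all cases.
    $line =~ s/^\s+|\s+$//g;
    return () if $line eq '';

    if (index($line, $tagchars) == 0) {
        my ($tag, $val) = split ' ', substr($line, length $tagchars), 2;
        # Tags are case insensitive, so normalize the name to upper case.
        return ('tag', uc(defined $tag ? $tag : ''), defined $val ? $val : '');
    }
    return ('ele', $line);
}

for my $line ('@include staff,guests   # other sets', '  alice  ', '   # comment only') {
    my @parsed = parse_set_line($line);
    print join('|', @parsed ? @parsed : '(ignored)'), "\n";
}
```

The tag value is returned unsplit because its meaning depends on the
tag: some tags take a comma-separated list, while others take a
single value.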

The following TAGs are known:

INCLUDE SET1,SET2,...
This includes all members of one or more other sets in the current
set.

EXCLUDE SET1,SET2,...
This excludes all members of one or more other sets from the current
set. This overrides any members included from other sets, but does
NOT exclude members explicitly included in the set file.

OMIT ELE
This excludes a specific element from the current set. This
overrides any elements included via an INCLUDE tag, or any elements
explicitly included in the set file.

Each element must be specified separately since elements may
themselves contain commas.

TYPE TYPE1,TYPE2,...
The default types that this set belongs to are determined by the
types and default_types Init options.

This tag explicitly puts this set in the specified types, even if
it is not in those types by default.

NOTYPE TYPE1,TYPE2,...
Similar to the TYPE tag, but this tag explicitly removes the set
from the specified types, even if it is in them by default.
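
How the types and default_types options combine with the TYPE and
NOTYPE tags can be sketched as below. This is an illustration, not
the module's code: effective_types is a hypothetical helper, and it
assumes that NOTYPE is applied after TYPE and that types not listed
in the types option are dropped. This document does not specify
either point.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Set::Files): work out which types a
# set belongs to.
#   $types       - array ref of all defined types (the types option)
#   $default     - "all", "none", a single type, an array ref, or undef
#   $type_tags   - array ref of types named in TYPE tags
#   $notype_tags - array ref of types named in NOTYPE tags
sub effective_types {
    my ($types, $default, $type_tags, $notype_tags) = @_;

    my @start;
    if (!defined $default) {
        @start = @$types;             # by default, sets belong to all types
    } elsif (ref($default) eq 'ARRAY') {
        @start = @$default;
    } elsif ($default eq 'all') {
        @start = @$types;
    } elsif ($default eq 'none') {
        @start = ();
    } else {
        @start = ($default);          # a single default type
    }

    my %in = map { $_ => 1 } @start;
    $in{$_} = 1 for @$type_tags;       # TYPE adds the set to a type
    delete $in{$_} for @$notype_tags;  # NOTYPE removes it (assumed to win)

    # Report types in the order they were defined.
    return grep { $in{$_} } @$types;
}

my @types = qw(committee maillist);
print join(',', effective_types(\@types, 'all',  [], ['maillist'])), "\n";
print join(',', effective_types(\@types, 'none', ['maillist'], [])), "\n";
```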

OPTION VARIABLE [= VALUE]
Although there is no support for element specific attributes, there
IS support for attributes which apply to the entire set (and which
can be made available to applications using these sets).

Each set may have a hash of key/value pairs associated with it (if
no value is included, it defaults to 1). These attributes are
available using the opts method.
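
As an illustration of the VARIABLE [= VALUE] form, the sketch below
parses the value of an OPTION tag and looks options up with the
documented fallback of 0 for options that are not set. The helper
names parse_option and option_value are hypothetical and this is not
the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Set::Files): split the value of an
# OPTION tag into (VARIABLE, VALUE). A missing value defaults to 1.
sub parse_option {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    my ($var, $val) = split /\s*=\s*/, $text, 2;
    return ($var, defined $val ? $val : 1);
}

# Hypothetical helper: look up one option, returning 0 when it is not
# set (the behavior documented for the opts method).
sub option_value {
    my ($opts, $var) = @_;
    return exists $opts->{$var} ? $opts->{$var} : 0;
}

my %opts = map { parse_option($_) } ('moderated', 'archive = weekly');
print "$_ = $opts{$_}\n" for sort keys %opts;
print "missing = ", option_value(\%opts, 'missing'), "\n";
```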

All tag lines can be repeated any number of times, so:

@INCLUDE foo,bar

is equivalent to

@INCLUDE foo
@INCLUDE bar

All tags are case insensitive.

When determining the members of a set which includes and excludes other
sets, or omits specific elements from the set, all inclusions are
evaluated first, followed by all exclusions (i.e. all exclusions override
all inclusions). If there is a cyclic dependency (i.e. A depends on B
depends on A, where a dependency can be either an INCLUDE or an EXCLUDE),
an error is reported and the cyclic dependency is ignored.

A few examples illustrate the use of INCLUDE, EXCLUDE, and OMIT tags. In
the examples, the set file A contains the elements: E1, E2, E3. The set
file B contains the elements: E3, E4, E5. The set file containing the
following lines:

@INCLUDE A
@EXCLUDE B
E5
E6

defines a set containing the elements: E1, E2, E5, E6. The first line
includes E1, E2, E3. The second line excludes E3. It does NOT exclude E5
since the EXCLUDE tag does not override elements explicitly included in
the set file. Finally, the E5 and E6 elements are added.

The set file containing the following lines:

@INCLUDE A
@EXCLUDE B
@OMIT E2
@OMIT E6
E5
E6

defines a set containing the elements: E1, E5. This is similar to the
above example, except that the OMIT tags override elements included via
the INCLUDE tag AND elements explicitly included in the set file.
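
The evaluation order used in these examples can be reproduced with a
short sketch. It is not the module's implementation: the %sets data
structure and the resolve_members helper are hypothetical, and the
names C1 and C2 for the two example set files are chosen here for
illustration. A cyclic dependency is reported with a warning and
then ignored, as described above.

```perl
use strict;
use warnings;

# Hypothetical in-memory form of parsed set files: explicit elements
# plus the values of the INCLUDE, EXCLUDE and OMIT tags.
my %sets = (
    A  => { ele => [qw(E1 E2 E3)] },
    B  => { ele => [qw(E3 E4 E5)] },
    C1 => { include => ['A'], exclude => ['B'], ele => [qw(E5 E6)] },
    C2 => { include => ['A'], exclude => ['B'], omit => [qw(E2 E6)],
            ele => [qw(E5 E6)] },
);

# Hypothetical helper (not part of Set::Files): compute the members of
# one set. $seen holds the sets currently being resolved, so a set that
# depends on itself (directly or indirectly) is detected.
sub resolve_members {
    my ($sets, $name, $seen) = @_;
    $seen ||= {};
    if ($seen->{$name}) {
        warn "Cyclic dependency on set $name ignored\n";
        return ();
    }
    my $def = $sets->{$name} or return ();
    my %active = (%$seen, $name => 1);

    my %in;
    # 1. All inclusions are evaluated first.
    for my $other (@{ $def->{include} || [] }) {
        $in{$_} = 1 for resolve_members($sets, $other, \%active);
    }
    # 2. Exclusions override inclusions.
    for my $other (@{ $def->{exclude} || [] }) {
        delete $in{$_} for resolve_members($sets, $other, \%active);
    }
    # 3. Elements listed in the file itself are not affected by EXCLUDE.
    $in{$_} = 1 for @{ $def->{ele} || [] };
    # 4. OMIT overrides both included and explicitly listed elements.
    delete $in{$_} for @{ $def->{omit} || [] };

    return sort keys %in;
}

for my $name (qw(C1 C2)) {
    print "$name: ", join(', ', resolve_members(\%sets, $name)), "\n";
}
```

Run as written, this prints "C1: E1, E2, E5, E6" and "C2: E1, E5",
matching the two examples.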

FILES
Several files are used by the Set::Files module. They all live in the
directory set by the cache Init Option except for set specific files
which live in the same directory as the set file. Files are:

.set_files.SET
A backup of the given set. When a set file is updated, the original
file is stored in this file. The file is stored either in the same
directory as the set file (if it is writable) or in the directory
specified by the scratch Init Option.

.set_files.SET.new
A temporary file where a new set file (or the update to an old one)
is written. Once completed, this file is moved into place as the new
set file. This file lives in the same directory as the set file or
in the scratch directory.

.set_files.cache
The file containing the cache. This is created using the cache
method.

.set_files.template
When creating a new set file (or updating an existing one), this
file is used (if it exists) as a starting point and then all the
data is appended to it. This is a good place to store comments
describing how to edit the set files, etc., that set file maintainers
can read for help.

KNOWN PROBLEMS
None at this point.

LICENSE
This script is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

AUTHOR
Sullivan Beck ([email protected])