NAME
InSilicoSpectro - Open source Perl library for proteomics
INSILICOSPECTRO PROJECT DESCRIPTION
This section describes the InSilicoSpectro project as a whole; the
InSilicoSpectro.pm module itself is described below, under MODULE
DESCRIPTION.
InSilicoSpectro is a proteomics open-source project intended to cover
common operations in mass list file format conversions, protein sequence
digestion, theoretical mass spectra computations, theoretical and
experimental MS data matching, text/graphic display, peptide retention
time predictions, etc.
The problems of raw data processing, storage and database searching are
not addressed by the InSilicoSpectro project. InSilicoSpectro is
released under the LGPL license and it is available from a dedicated web
site at
http://insilicospectro.vital-it.ch.
The general design of the modules follows the object-oriented
programming (OOP) model, and most of the modules are class definitions.
The module that implements most of the theoretical mass computation
routines supports both an OOP and a procedural programming model.
InSilicoSpectro modules use some Perl modules that are not part of the
standard Perl distribution, such as Statistics::Regression, XML::Twig,
GD, and AI::NNFlex.
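Because these dependencies are not shipped with Perl, it can help to
check for them before running the scripts. The following is a minimal
sketch using core Perl only; the helper name module_available is
hypothetical and is not part of InSilicoSpectro, and the module names
are written as they appear on CPAN.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of InSilicoSpectro): return 1 if the
# named module can be loaded in this Perl installation, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Turn Some::Module into Some/Module.pm, the form require expects.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Non-core modules named in the project description.
my @needed = qw(Statistics::Regression XML::Twig GD AI::NNFlex);
for my $module (@needed) {
    printf "%-24s %s\n", $module,
        module_available($module) ? 'found' : 'missing (install from CPAN)';
}
```

Missing modules can then be installed with the usual CPAN tools.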
We have developed a simple and minimal hierarchy to represent protein
sequences and peptides (as digestion products) in a way that, on the one
hand, fits the needs of the computations we perform and, on the other
hand, stays relatively neutral in its design. It should thus be possible
to combine these classes with existing projects at users' sites,
e.g. via multiple inheritance, or to use them as the basis of more
sophisticated objects.
InSilicoSpectro Perl code is documented mainly via pod and a wide
collection of simple and focused examples. The introduction provided
here guides new users through the library so that pod and the examples
are the only further documentation they need.
Installation
Library organization
InSilicoSpectro modules (lib/InSilicoSpectro) are organized according to
their function. At the most general level there is a module named
InSilicoSpectro.pm (this one) that provides general functions for
initializing all other modules. More specialized modules are grouped
in three folders:
Spectra, for mass list-related modules;
InSilico, for computational modules;
Utils, for a few utility modules.
In addition, illustrative examples can be found in three folders:
scripts, which contains a set of tools implemented with InSilicoSpectro
modules;
cgi, which contains scripts implementing a simple web-based set of
tools;
t, which contains test programs that are examples as well.
We now go through the main topics covered by InSilicoSpectro one by
one, introducing the main modules and the examples to try and read in
order to become autonomous with the whole library.
Mass list file format conversion
A general purpose conversion program, convertSpectra.pl in folder
scripts, allows you to convert one mass list format to another. A CGI
version exists in the cgi folder: cgiConvertSpectra.pl.
convertSpectra.pl is a good starting point to see a high-level usage of
the basic methods implemented in the underlying modules.
InSilicoSpectro::Spectra::ExpSpectrum is the basic class for
representing spectra, i.e. a list of peaks (more precisely, a list of
references to peaks). Peaks are represented as lists of attributes such
as mass, intensity, SN, etc. The order of the attributes in these lists
is given by an object of class
InSilicoSpectro::Spectra::PeakDescriptor. See
t/Spectra/testExpSpectrum.pl and t/Spectra/testPeakDescriptor.pl.
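The peak layout described above can be pictured with plain Perl data
structures. This is only a sketch of the idea, not the ExpSpectrum or
PeakDescriptor API: the field names, the helper peak_value and the
numeric values are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical descriptor: the order of the attributes stored in each peak.
my @fields = qw(mass intensity sn);

# Map each attribute name to its position in a peak array.
my %index_of;
@index_of{@fields} = 0 .. $#fields;

# A spectrum as a list of references to peaks; each peak is a plain
# array whose layout follows @fields. Values are made up.
my @spectrum = (
    [ 842.5094,  1520, 12.3 ],
    [ 1045.5637, 980,  8.1 ],
    [ 2211.1040, 310,  3.4 ],
);

# Read one attribute of a peak by name, going through the descriptor.
sub peak_value {
    my ($peak, $name) = @_;
    die "unknown peak attribute '$name'\n" unless exists $index_of{$name};
    return $peak->[ $index_of{$name} ];
}

for my $peak (@spectrum) {
    printf "mass %.4f  intensity %d  S/N %.1f\n",
        map { peak_value($peak, $_) } @fields;
}
```

Keeping the attribute order in one place means that code reading the
peaks does not depend on the column order of a particular file format.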
By means of classes InSilicoSpectro::Spectra::MSSpectra,
InSilicoSpectro::Spectra::MSMSSpectra,
InSilicoSpectro::Spectra::MSMSCmpd, and InSilicoSpectro::Spectra::MSRun
we represent PMF (MS) and MS/MS spectra, and HPLC runs. See
t/Spectra/testSpectra.pl.
Utils
The module InSilicoSpectro::Utils::IO.pm contains miscellaneous
utilities for accessing compressed files, defining a common verbose
variable, etc.
pI estimations
scripts/computePI.pl is a tool that exemplifies the usage of the class
InSilicoSpectro::InSilico::IsoelPoint. Examples of how to use it can be
found in t/InSilico/examples_rt_pi. See also the example in
t/InSilico/testIsoelPoint.pl. A CGI version of computePI.pl can be found
in the cgi folder.
Retention time prediction
scripts/computeRT.pl is a tool that exemplifies the usage of the class
InSilicoSpectro::InSilico::RetentionTimer. Examples of how to use it can
be found in t/InSilico/examples_rt_pi. See also the examples in
t/InSilico/testPetritis.pl and t/InSilico/testHodges.pl. A CGI version
of computeRT.pl can be found in the cgi folder.
Enzymes
Enzymes are modeled by class InSilicoSpectro::InSilico::CleavEnzyme. See
t/InSilico/testCleavEnzyme.pl.
PTMs and other modifications
Modifications of residues are modeled by class
InSilicoSpectro::InSilico::ModRes. See t/InSilico/testModRes.pl.
Protein and peptide sequences
The basic class for biological sequences is
InSilicoSpectro::InSilico::Sequence. We then define
InSilicoSpectro::InSilico::AASequence to represent protein sequences
with their modifications. A class InSilicoSpectro::InSilico::Peptide is
used for enzymatic digestion products, because these need data that are
not part of a standard protein model.
Examples can be found in t/InSilico: testSequence.pl, testAASequence.pl,
testPeptide.pl.
Protein digestion and mass computations
The main module for digestion and mass computations is
InSilicoSpectro::InSilico::MassCalculator. Examples of digestions and
protein/peptide mass computations, including in the presence of
fixed/variable modifications, are found in t/InSilico:
testCalcDigest.pl, testCalcDigestOOP.pl, and testCalcVarpept.pl. The OOP
suffix marks the example that uses the OOP model, since MassCalculator
supports both an OOP and a procedural interface.
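To make the computation concrete, here is a minimal, self-contained
sketch of a tryptic digestion followed by peptide mass computation. It
does not use MassCalculator; the function names digest_trypsin and
peptide_mass are hypothetical, the cleavage rule is the common "after K
or R, not before P" simplification without missed cleavages, and
modifications are ignored.

```perl
use strict;
use warnings;

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
my %residue_mass = (
    G => 57.02146,  A => 71.03711,  S => 87.03203,  P => 97.05276,
    V => 99.06841,  T => 101.04768, C => 103.00919, L => 113.08406,
    I => 113.08406, N => 114.04293, D => 115.02694, Q => 128.05858,
    K => 128.09496, E => 129.04259, M => 131.04049, H => 137.05891,
    F => 147.06841, R => 156.10111, Y => 163.06333, W => 186.07931,
);
my $WATER = 18.01056;    # H2O accounting for the free N- and C-termini

# Cleave after K or R unless the next residue is P, no missed cleavages.
sub digest_trypsin {
    my ($sequence) = @_;
    my @peptides = split /(?<=[KR])(?!P)/, $sequence;
    return @peptides;
}

# Neutral monoisotopic mass of an unmodified peptide.
sub peptide_mass {
    my ($peptide) = @_;
    my $mass = $WATER;
    for my $aa (split //, $peptide) {
        die "unknown residue '$aa'\n" unless exists $residue_mass{$aa};
        $mass += $residue_mass{$aa};
    }
    return $mass;
}

for my $peptide (digest_trypsin('MKRPAAKLLR')) {
    printf "%-6s %.4f\n", $peptide, peptide_mass($peptide);
}
```

The test programs listed above show how the library itself handles
enzymes, missed cleavages and fixed or variable modifications.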
PMF
The match between theoretical peptide masses and PMF experimental data
is made by functions found in InSilicoSpectro::InSilico::MassCalculator.
In the OOP model it is possible to represent PMF matches in objects of
class InSilicoSpectro::InSilico::PMFMatch. See
t/InSilico/testCalcPMFMatch.pl and t/InSilico/testCalcPMFMatchOOP.pl.
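The idea of a PMF match can be sketched as pairing each theoretical
peptide mass with the experimental masses that fall within a tolerance.
The code below is an illustration under stated assumptions, not the
MassCalculator or PMFMatch interface: match_masses is a hypothetical
name, the tolerance is relative (ppm), both lists are assumed to be on
the same mass scale, and the numbers are made up.

```perl
use strict;
use warnings;

# Pair each theoretical mass with the experimental masses found within
# a relative tolerance in ppm. Returns a list of [theo, exp, error_ppm].
sub match_masses {
    my ($theo, $exp, $tol_ppm) = @_;
    my @matches;
    for my $t (@$theo) {
        for my $e (@$exp) {
            my $error_ppm = ($e - $t) / $t * 1e6;
            push @matches, [ $t, $e, $error_ppm ]
                if abs($error_ppm) <= $tol_ppm;
        }
    }
    return @matches;
}

# Made-up values for illustration only.
my @theoretical  = (842.5100, 1045.5642, 1475.7854);
my @experimental = (842.5094, 1045.6010, 1475.7850, 2211.1040);

for my $m (match_masses(\@theoretical, \@experimental, 10)) {
    printf "theo %.4f  exp %.4f  error %+.2f ppm\n", @$m;
}
```

The nested loop is quadratic; for long mass lists, sorting both lists
and scanning them in parallel would be the usual refinement.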
Peptide fragmentation
Theoretical fragment masses are computed by functions found in
InSilicoSpectro::InSilico::MassCalculator. In the OOP model, theoretical
MS/MS spectra can be represented as an object of class
InSilicoSpectro::InSilico::MSMSTheoSpectrum, which represents in turn
the various ions as InSilicoSpectro::InSilico::InternIonSeries and
InSilicoSpectro::InSilico::TermIonSeries.
The match between experimental and theoretical masses is also computed
by InSilicoSpectro::InSilico::MassCalculator and in the OOP model the
class InSilicoSpectro::InSilico::MSMSTheoSpectrum can store the match in
addition to the theoretical spectrum.
See in t/InSilico: testCalcFrag.pl, testCalcFragOOP.pl,
testCalcMatch.pl, testCalcMatchOOP.pl, getIonIntensities.pl, ionStat.R.
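As an illustration of what a theoretical fragment mass computation
involves, the sketch below derives singly charged b and y ion masses
for an unmodified peptide. It is independent of MassCalculator and
MSMSTheoSpectrum; the function name by_ions is hypothetical, and other
ion series, charge states, neutral losses and modifications are left
out.

```perl
use strict;
use warnings;

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
my %residue_mass = (
    G => 57.02146,  A => 71.03711,  S => 87.03203,  P => 97.05276,
    V => 99.06841,  T => 101.04768, C => 103.00919, L => 113.08406,
    I => 113.08406, N => 114.04293, D => 115.02694, Q => 128.05858,
    K => 128.09496, E => 129.04259, M => 131.04049, H => 137.05891,
    F => 147.06841, R => 156.10111, Y => 163.06333, W => 186.07931,
);
my $WATER  = 18.01056;    # H2O
my $PROTON = 1.00728;     # charge carrier for singly charged ions

# Return two array references: b1..b(n-1) and y1..y(n-1) ion m/z values.
sub by_ions {
    my ($peptide) = @_;
    my @masses;
    for my $aa (split //, $peptide) {
        die "unknown residue '$aa'\n" unless exists $residue_mass{$aa};
        push @masses, $residue_mass{$aa};
    }
    my (@b_series, @y_series);
    my ($prefix, $suffix) = (0, 0);
    for my $i (0 .. $#masses - 1) {
        $prefix += $masses[$i];           # first $i+1 residues
        $suffix += $masses[ -1 - $i ];    # last $i+1 residues
        push @b_series, $prefix + $PROTON;
        push @y_series, $suffix + $WATER + $PROTON;
    }
    return (\@b_series, \@y_series);
}

my ($b_ions, $y_ions) = by_ions('PEPTIDE');
printf "b%d %.4f   y%d %.4f\n", $_ + 1, $b_ions->[$_], $_ + 1, $y_ions->[$_]
    for 0 .. $#$b_ions;
```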
Graphical display of MS/MS spectra/matches
The class InSilicoSpectro::InSilico::MSMSOutput instantiates objects
that render MS/MS spectra and matches in different output formats. See
in t/InSilico: testMSMSOutText.pl, testMSMSOutLatex.pl,
testMSMSOutHtml.pl, testMSMSOutPlot.pl, testMSMSOutLegend.pl.
Mini web site
In folder miniweb we provide a Perl script build-miniweb.pl that builds,
from CGI scripts in folder cgi, a simple web site for protein digestion,
mass computations, and pI and retention time estimations.
MODULE DESCRIPTION
The module InSilicoSpectro.pm comprises generic functions that are
useful for the whole project.
FUNCTIONS
saveInSilicoDef([$out])
Saves all registered definitions into the configuration file named $out,
e.g. insilicodef.xml.
getInSilicoDefFiles()
Returns the list of configuration files given by the operating system
environment variable whose name is stored in
$InSilicoSpectro::DEF_FILENAME_ENV (default "INSILICOSPECTRO_DEFFILE").
The environment variable can point to more than one file (separated by
':'), or be a glob expression ('...*...').
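The way such a variable can be turned into a file list is sketched
below with core Perl only. This is an assumption about the general
technique, not the actual implementation of getInSilicoDefFiles; the
function name expand_def_files is hypothetical.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: split a ':'-separated value and expand each part
# as a glob pattern. Returns an empty list for an unset or empty value.
sub expand_def_files {
    my ($value) = @_;
    return () unless defined $value && length $value;
    my @files;
    for my $part (split /:/, $value) {
        next unless length $part;
        # With bsd_glob's default flags, a part without wildcards
        # (*, ?, [) comes back unchanged; a part with wildcards is
        # replaced by the existing files that match it.
        push @files, bsd_glob($part);
    }
    return @files;
}

# Default variable name as documented for this module.
my @def_files = expand_def_files($ENV{INSILICOSPECTRO_DEFFILE});
print "$_\n" for @def_files;
```

For example, setting the variable to a value such as
/etc/insilico/*.xml:$HOME/insilicodef.xml (paths chosen for
illustration) would yield every matching file in the first directory
followed by the single file in the home directory.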
init([@files])
Loads a list of configuration files given as parameter or the default
configuration files as returned by getInSilicoDefFiles.
SEE ALSO
InSilicoSpectro::InSilico, InSilicoSpectro::Spectra,
InSilicoSpectro::Utils
COPYRIGHT
Copyright (C) 2004-2005 Geneva Bioinformatics (www.genebio.com) &
Jacques Colinge (Upper Austria University of Applied Science at
Hagenberg)
This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.
This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this library; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
AUTHORS
Jacques Colinge, www.fhs-hagenberg.ac.at
Alexandre Masselot, www.genebio.com