BORGES
ARCIMBOLDO
Setup and Installation
System requirements
BORGES was designed and developed to run on a local machine accessing a local or remote
Condor/SGE grid environment. The core part of the program runs on the local machine, independently
of the grid environment available. There are two things to configure:
Local machine (Database server)
Remote machines (Condor/SGE grid)
Local Machine
The local machine is where the database is located and custom libraries are generated. Some heavy
computing is performed locally, so we advise dedicating an up-to-date machine to this task.
The specs of the machine used in our local setup are:
Processor: Intel i7 950 (4 cores x 3.07 GHz)
Memory: 6GB
Hard Drive: >150 GB
OS: GNU/Linux
Note: As with any other database-intensive software, performance improves considerably if a solid state
drive is used to store and generate the database and libraries.
Remote Machines
The majority of the calculations are distributed over a grid. The grid can be a local one, such as the one
used in our lab for testing and development, which consists of any available machine from our
crystallography cluster (110 cores with approximately 130 GFlops peak performance and a minimum of 2GB
of memory per core, coming from Intel i7 or Xeon processors), or a remote supercomputer such as
Calendula where a Condor or SGE installation is available.
The documentation to deploy a Condor grid is available on the main Condor project (now HTCondor) site.
The documentation to deploy an SGE grid is available, for example, on the Son of Grid Engine website,
which is the implementation we use in our setup.
Software requirements
BORGES is written in python 2.6, so take care to choose the library packages built for this python version.
The software listed in the Dependencies section is freely available and can be obtained either from
each project website or through the corresponding distribution's package manager.
Dependencies
BORGES uses some python libraries not included in the standard python distribution:
BioPython
NumPy
SciPy
Paramiko
MySQLdb
BORGES also needs the SQL server and client from MySQL.
For example, to install these packages under debian or a debian-based distribution we use the apt package
manager:
apt-get install python-biopython python-numpy python-mysqldb mysql-server python-scipy python-paramiko
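After installing the packages it can be useful to confirm that the python interpreter BORGES will use actually sees them. The following is a minimal sketch, not part of BORGES: the function name missing_modules is hypothetical, and the list contains the usual import names of the libraries above (BioPython is imported as Bio).

```python
# Import names of the libraries listed under Dependencies.
REQUIRED_MODULES = ["Bio", "numpy", "scipy", "paramiko", "MySQLdb"]


def missing_modules(names):
    """Return the names from the given list that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) with the interpreter configured for BORGES returns an empty list when every dependency is importable.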
Third party software (scientific)
Each scientific program has its own license, so it is necessary to get each program directly from its
site and install it manually.
phaser: http://www.phaser.cimr.cam.ac.uk/index.php/Phaser_Crystallographic_Software
shelxe: http://shelx.uni-ac.gwdg.de/SHELX/
The required scientific software is most likely already installed in any macromolecular crystallography
laboratory.
Scientific software on remote machines
The required scientific software should be available on all machines where jobs will run.
Phaser is distributed as part of larger software suites, CCP4 and Phenix. These suites are often
updated and might make changes to the programs BORGES uses, causing unexpected results
(breaking the program). In our setup we keep a separate version of phaser so we can safely perform
updates to both suites without the risk of breaking BORGES. The required files are isolated, so there is
no need to keep an extra set of the full CCP4 and Phenix suites.
Instructions for our setup are found below:
Phaser
Required libraries (located inside ccp4_folder/src/phaser/phaser-2.5.0/build/intel-linux/lib):
libcctbx.so
libiotbx_pdb.so
libomptbx.so
Copy those libraries and the phaser binary to a folder of your choice and create a file called
condor_phaser with the following content (for bash):
#! /bin/bash
# add the required libraries
export LD_LIBRARY_PATH=/your_path_of_choice:$LD_LIBRARY_PATH
# then launch phaser
/your_path_of_choice/phaser
Give condor_phaser execution permission.
chmod +x condor_phaser
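Before submitting jobs, the isolated folder can be checked for completeness. This is a minimal sketch under the assumption that the phaser binary and the three libraries were copied into a single folder, as described above; the function names are hypothetical and not part of BORGES.

```python
import os

# Files the condor_phaser wrapper relies on, as listed above.
REQUIRED_FILES = ["phaser", "libcctbx.so", "libiotbx_pdb.so", "libomptbx.so"]


def missing_phaser_files(folder, required=REQUIRED_FILES):
    """Return the required file names that are absent from the folder."""
    return [name for name in required
            if not os.path.isfile(os.path.join(folder, name))]


def is_executable(path):
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

An empty list from missing_phaser_files("/your_path_of_choice") together with is_executable returning True for both phaser and condor_phaser suggests the folder is ready to use.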
BORGES configuration
mySQL
Some parameters have to be adjusted in the mysql server configuration file (/etc/mysql/my.cnf in debian).
If the lines do not exist, create them after the [mysqld] tag.
max_allowed_packet = 800M
wait_timeout = 31536000
interactive_timeout = 31536000
net_write_timeout = 31536000
query_cache_size = 16M
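To verify that the configuration file contains these settings, a small script can list which ones are still missing from the [mysqld] section. This is only a sketch: it scans the text of a single file, does not follow include directives, and the function names are hypothetical.

```python
# Settings recommended above for the [mysqld] section.
REQUIRED_SETTINGS = [
    "max_allowed_packet", "wait_timeout", "interactive_timeout",
    "net_write_timeout", "query_cache_size",
]


def mysqld_settings(text):
    """Collect the key/value pairs found in the [mysqld] section."""
    settings = {}
    in_mysqld = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        if line.startswith("["):
            in_mysqld = (line.lower() == "[mysqld]")
            continue
        if in_mysqld and "=" in line:
            key, value = line.split("=", 1)
            # In option files '-' and '_' are interchangeable in option names.
            settings[key.strip().replace("-", "_")] = value.strip()
    return settings


def missing_settings(text, required=REQUIRED_SETTINGS):
    """Return the required setting names not present under [mysqld]."""
    found = mysqld_settings(text)
    return [name for name in required if name not in found]
```

For example, missing_settings(open("/etc/mysql/my.cnf").read()) returns an empty list once all five lines are in place.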
BORGES uses two databases: one that is shared across runs and must be generated before the first
run, and another one where user generated libraries are stored. The first, or main, database is called
borges. User generated ones have the username prepended to this name; for example, foo_borges will
store the libraries for user foo.
To generate the borges database you first need to get a snapshot of the PDB. You can get it from the
PDB rsync mirrors with rsync, using any of the three equivalent commands:
rsync -rlpt -v -z --delete --port=33444 rsync.wwpdb.org::ftp_data/structures/divided/pdb/ ./pdb
rsync -rlpt -v -z --delete ftp.pdbj.org::ftp_data/structures/divided/pdb/ ./pdb
rsync -rlpt -v -z --delete rsync.ebi.ac.uk::pub/databases/rcsb/pdb-remediated/data/structures/divided/pdb/ ./pdb
Start a mysql session as the mysql root user and create the borges user, which is the owner of the borges
database, and at least one regular user that will be the actual user of the program:
CREATE USER 'borges'@'localhost' IDENTIFIED BY 'mypass';
CREATE USER 'username'@'localhost' IDENTIFIED BY 'mypass';
Create the borges database and grant full permissions to the borges user:
CREATE DATABASE borges;
GRANT ALL ON borges.* TO 'borges'@'localhost';
Create a database for the user named username_borges and grant full permissions to the user:
CREATE DATABASE username_borges;
GRANT ALL ON username_borges.* TO 'username'@'localhost';
Grant read access to the borges database to every user you create other than the borges
user:
GRANT SELECT ON borges.* TO 'username'@'localhost';
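When several regular users have to be created, the statements above can be generated rather than typed by hand. The following sketch only builds the SQL text for one user, following the username_borges naming shown above; the function name sql_for_new_user is hypothetical, and the restriction on usernames is a precaution chosen for this example, not a BORGES rule.

```python
import re


def sql_for_new_user(username, password):
    """Build the statements shown above for one additional regular user."""
    # Accept only simple names so they are safe to embed in a database name.
    if not re.match(r"[A-Za-z0-9_]+\Z", username):
        raise ValueError("unsupported username: %r" % username)
    # Escape backslashes and single quotes inside the password literal.
    escaped = password.replace("\\", "\\\\").replace("'", "\\'")
    database = username + "_borges"
    return [
        "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';" % (username, escaped),
        "CREATE DATABASE %s;" % database,
        "GRANT ALL ON %s.* TO '%s'@'localhost';" % (database, username),
        "GRANT SELECT ON borges.* TO '%s'@'localhost';" % username,
    ]
```

Printing the returned list, one statement per line, gives text that can be pasted into a mysql session started as the mysql root user.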
BORGES Installation
BORGES consists of several python scripts and internal modules. To install the program, just
uncompress the downloaded .tar.gz file, fill in the setup.bor file and add the BORGES folder to the
PATH.
The parameters required for the setup.bor are:
[LOCAL]
# Third party software paths
# New phaser stands for phaser 2.5.2 or higher
path_local_phaser: /path/to/phaser
path_local_mtzdmp: /path/to/mtzdmp
path_local_shelxe: /path/to/shelxe
path_local_borgesclient: /path/to/borgesclient
# If the python interpreter is not the default one the following variable indicates
# the path to it
python_local_interpreter:
[MYSQL]
# MySQL parameters
# borges database name, hostname where the database is
# located and port to access the database from another machine
borges_coredb_name: borges
borges_coredb_user: borges
borges_coredb_host: localhost
borges_coredb_port: 3306
# OPTIONAL PARAMETERS
[CONDOR]
# Parameters for each executable under Condor (memory constraints,
# CPU speed ...)
requirements_shelxe:
requirements_phaser:
requirements_borges:
[SGE]
# Default queue for Borges
qname:
# If there are no special rules to use a queue,
# there is no need to edit this value
fraction: 1
[GRID]
# BORGES parameters for "borges" database generation
number_of_pdbs_per_tar: 5
number_of_parallel_grid_jobs: 100
# Supercomputing environment parameters
path_remote_phaser:
path_remote_shelxe:
path_remote_borgesclient:
# If the python interpreter is not the default one the following
# variable indicates the path to it
python_remote_interpreter:
remote_frontend_username:
remote_frontend_host:
path_remote_sgepy:
home_frontend_directory:
remote_frontend_port:
# The scheduler system on the remote grid, either Condor or SGE
type: Condor | SGE
# Boolean variable, set to True for NFS filesystem, otherwise set to False
remote_fylesystem_isnfs: True | False
remote_frontend_prompt: $
remote_submitter_username:
remote_submitter_host:
remote_submitter_port:
remote_submitter_prompt: $
Once the config file is set up and the external software requirements are met and configured, the
program is ready to be used.
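A quick sanity check of setup.bor before the first run can catch parameters that were left empty. The sketch below assumes that the file can be read as a standard INI-style file with the python standard library; how BORGES itself parses the file is not described here. The function name and the choice of which keys to treat as mandatory (the ones shown with a value in the template above) are assumptions of this example.

```python
try:
    from configparser import RawConfigParser  # python 3
except ImportError:
    from ConfigParser import RawConfigParser  # python 2

# Keys shown with a value in the template; treating them as mandatory
# is an assumption of this sketch.
REQUIRED = {
    "LOCAL": ["path_local_phaser", "path_local_mtzdmp",
              "path_local_shelxe", "path_local_borgesclient"],
    "MYSQL": ["borges_coredb_name", "borges_coredb_user",
              "borges_coredb_host", "borges_coredb_port"],
}


def unset_required_keys(path):
    """Return 'SECTION.key' names that are absent or empty in the file."""
    parser = RawConfigParser()
    parser.read(path)
    unset = []
    for section in sorted(REQUIRED):
        for key in REQUIRED[section]:
            if not parser.has_option(section, key):
                unset.append(section + "." + key)
            elif not parser.get(section, key).strip():
                unset.append(section + "." + key)
    return unset
```

An empty list from unset_required_keys("setup.bor") means every key checked here has a value; it does not verify that the paths exist.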
BORGES database generation
We advise using a non-redundant set of PDB structures to reduce the time spent on database generation and
on the creation of any specific library. To obtain this non-redundant set, use the script deployed with BORGES:
./nrpdb_filter.py input_dir_pdbs nrpdb.txt output_dir_pdbs
The borges database needs around 50GB of disk space and around 18 hours to calculate on an Intel i7 950.
The full generation is only needed the first time and must be run as the mysql user borges; later on, the
database just needs to be updated with the newly released structures from the PDB. To generate it, issue the following command:
./BORGES-LIBRARY.py -DBC input_reduced_dir_pdbs -h localhost -u borges
input_dir_pdbs is the path to the folder where the pdb snapshot was stored
input_reduced_dir_pdbs is the path to the folder with the reduced set of pdbs
nrpdb.txt is a text file where each line contains a pdb identifier. This file is deployed with BORGES but
you can set up your own one.
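To illustrate how an identifier list relates to the files in the snapshot, the following sketch selects the files whose identifier appears in a list such as nrpdb.txt. It is not the implementation of nrpdb_filter.py. It assumes the wwPDB "divided" layout fetched by the rsync commands, where each entry is stored as pdbXXXX.ent.gz (for example ab/pdb1abc.ent.gz), and the function names are hypothetical.

```python
import os


def read_pdb_ids(id_file):
    """Read one PDB identifier per line, skipping blanks; IDs are lower-cased."""
    with open(id_file) as handle:
        return set(line.strip().lower() for line in handle if line.strip())


def select_listed_pdbs(input_dir, ids):
    """Return the sorted paths of pdbXXXX.ent.gz files whose ID is in ids."""
    selected = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            # File naming of the divided layout, e.g. ab/pdb1abc.ent.gz
            if name.startswith("pdb") and name.endswith(".ent.gz"):
                pdb_id = name[3:-len(".ent.gz")]
                if pdb_id in ids:
                    selected.append(os.path.join(root, name))
    return sorted(selected)
```

The returned paths could then be copied or linked into the folder passed to BORGES as input_reduced_dir_pdbs.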
Remote Grid Setup
The main limiting factors when using an external grid, for example a supercomputer, are disk space and
software installation. For BORGES the two are related because, as mentioned above, some of the programs
used are bundled in complex software suites, so being able to simplify the installation of such software
is important. The guidelines used to isolate the binaries used by BORGES also apply to the setup of a
remote environment.