BORGES ARCIMBOLDO

Setup and Installation

System requirements

BORGES was designed and developed to run on a local machine that accesses a local or remote Condor/SGE grid environment. The core of the program runs on the local machine, independently of the grid environment available. There are two things to configure:

- Local machine (database server)
- Remote machines (Condor/SGE grid)

Local Machine

The local machine is where the database is located and where custom libraries are generated. Some heavy computing is performed locally, so it is advisable to dedicate an up-to-date machine to this task. The specs of the machine used in our local setup are:

- Processor: Intel i7 950 (4 cores x 3.07 GHz)
- Memory: 6 GB
- Hard drive: >150 GB
- OS: GNU/Linux

Note: As with any other database-intensive software, performance improves considerably if a solid state drive is used to store and generate the database and libraries.

Remote Machines

Most of the calculations are distributed over a grid. The grid can be a local one, such as the one used in our lab for testing and development, which consists of any available machine from our crystallography cluster (110 cores with approximately 130 GFlops peak performance and a minimum of 2 GB of memory per core, on Intel i7 or Xeon processors), or a remote supercomputer such as Calendula where a Condor or SGE installation is available.

The documentation to deploy a Condor grid is available on the main Condor project (now HTCondor) site. The documentation to deploy an SGE grid is available, for example, on the Son of Grid Engine website, which is the implementation we use in our setup.

Software requirements

BORGES is written in Python 2.6, so take care to choose the library packages for this Python version. The software listed in the Dependencies section is freely available and can be obtained either from each project website or through the package manager of your distribution.

Dependencies

BORGES uses some Python libraries that are not included in the standard Python distribution:

- BioPython
- NumPy
- SciPy
- Paramiko
- MySQLdb

And the SQL server and client from:

- MySQL

For example, to install these packages on Debian or a Debian-based distribution, use the apt package manager:

    apt-get install python-biopython python-numpy python-mysqldb mysql-server python-scipy python-paramiko

Third party software (scientific)

Scientific software is distributed under separate licenses, so each program must be obtained directly from its site and installed manually.

- phaser: http://www.phaser.cimr.cam.ac.uk/index.php/Phaser_Crystallographic_Software
- shelxe: http://shelx.uni-ac.gwdg.de/SHELX/

The required scientific software is most likely already installed in any macromolecular crystallography laboratory.

Scientific software on remote machines

The required scientific software must be available on all machines where jobs will run. Phaser is distributed as part of larger software suites, CCP4 and Phenix. These suites are updated often, and an update may change the programs BORGES uses, causing unexpected results or breaking the program. In our setup we keep a separate copy of phaser so that both suites can be updated without the risk of breaking BORGES. The required files are isolated, so there is no need to keep an extra set of the full CCP4 and Phenix suites. The instructions for our setup follow.

Phaser

Required libraries (located inside ccp4_folder/src/phaser/phaser-2.5.0/build/intel-linux/lib):

- libcctbx.so
- libiotbx_pdb.so
- libomptbx.so

Copy those libraries and the phaser binary to a folder of your choice and create a file called condor_phaser with the following content (for bash):

    #! /bin/bash
    # add the required libraries
    export LD_LIBRARY_PATH=/your_path_of_choice:$LD_LIBRARY_PATH
    # then launch phaser
    /your_path_of_choice/phaser

Give condor_phaser execution permission:

    chmod +x condor_phaser

BORGES configuration

MySQL

Some parameters have to be adjusted in the MySQL server configuration file (/etc/mysql/my.cnf on Debian). If the lines do not exist, create them after the [mysqld] tag:

    max_allowed_packet = 800M
    wait_timeout = 31536000
    interactive_timeout = 31536000
    net_write_timeout = 31536000
    query_cache_size = 16M

BORGES uses two databases: one that is shared across runs and must be generated before the first run, and another where user-generated libraries are stored. The first, or main, database is called borges. User databases have the username prepended to that name; for example, foo_borges stores the libraries for user foo.

To generate the borges database you first need a snapshot of the PDB. You can get it from the PDB rsync mirrors with any of the three equivalent commands:

    rsync -rlpt -v -z --delete --port=33444 rsync.wwpdb.org::ftp_data/structures/divided/pdb/ ./pdb
    rsync -rlpt -v -z --delete ftp.pdbj.org::ftp_data/structures/divided/pdb/ ./pdb
    rsync -rlpt -v -z --delete rsync.ebi.ac.uk::pub/databases/rcsb/pdb-remediated/data/structures/divided/pdb/ ./pdb

Start a mysql session as the MySQL root user and create the borges user, which owns the borges database, and at least one regular user who will actually run the program:

    CREATE USER 'borges'@'localhost' IDENTIFIED BY 'mypass';
    CREATE USER 'username'@'localhost' IDENTIFIED BY 'mypass';

Create the borges database and grant full permissions to the borges user:

    CREATE DATABASE borges;
    GRANT ALL ON borges.* TO 'borges'@'localhost';

Create a database for the user, named username_borges, and grant full permissions to that user:

    CREATE DATABASE username_borges;
    GRANT ALL ON username_borges.* TO 'username'@'localhost';

Grant read access to the borges database to every user you create other than the borges user:

    GRANT SELECT ON borges.* TO 'username'@'localhost';

BORGES Installation

BORGES consists of several Python scripts and internal modules. To install the program, uncompress the downloaded .tar.gz file, fill in the setup.bor file and add the BORGES folder to the PATH. The parameters required in setup.bor are:

    [LOCAL]
    # Third party software paths
    # New phaser stands for phaser 2.5.2 or higher
    path_local_phaser: /path/to/phaser
    path_local_mtzdmp: /path/to/mtzdmp
    path_local_shelxe: /path/to/shelxe
    path_local_borgesclient: /path/to/borgesclient
    # If the python interpreter is not the default one the following variable indicates the path to it
    python_local_interpreter:

    [MYSQL]
    # MySQL parameters
    # borges database name, hostname where the database is
    # located and port to access the database from another machine
    borges_coredb_name: borges
    borges_coredb_user: borges
    borges_coredb_host: localhost
    borges_coredb_port: 3306

    # OPTIONAL PARAMETERS
    [CONDOR]
    # Parameters for each executable under Condor (memory constraints,
    # CPU speed ...)
    requirements_shelxe:
    requirements_phaser:
    requirements_borges:

    [SGE]
    # Default queue for Borges
    qname:
    # If there are no special rules to use a queue,
    # there is no need to edit this value
    fraction: 1

    [GRID]
    # BORGES parameters for "borges" database generation
    number_of_pdbs_per_tar: 5
    number_of_parallel_grid_jobs: 100
    # Supercomputing environment parameters
    path_remote_phaser:
    path_remote_shelxe:
    path_remote_borgesclient:
    # If the python interpreter is not the default one the following
    # variable indicates the path to it
    python_remote_interpreter:
    remote_frontend_username:
    remote_frontend_host:
    path_remote_sgepy:
    home_frontend_directory:
    remote_frontend_port:
    # The scheduler system on the remote grid, either Condor or SGE
    type: Condor | SGE
    # Boolean variable, set to True for NFS filesystem, otherwise set to False
    remote_fylesystem_isnfs: True | False
    remote_frontend_prompt: $
    remote_submitter_username:
    remote_submitter_host:
    remote_submitter_port:
    remote_submitter_prompt: $

Once the config file is set up and the external software requirements are met and configured, the program is ready to be used.

BORGES database generation

It is advisable to use a non-redundant set of PDB files, which reduces the time spent both on database generation and on the creation of any specific library. To obtain this non-redundant set, use the script deployed with BORGES:

    ./nrpdb_filter.py input_dir_pdbs nrpdb.txt output_dir_pdbs

The borges database needs around 50 GB of disk space and around 18 hours to calculate on an Intel i7 950. It has to be generated once, using the MySQL user borges; later on, it only needs to be updated with the newly released structures from the PDB. The following command must be issued:

    ./BORGES-LIBRARY.py -DBC input_reduced_dir_pdbs -h localhost -u borges

- input_dir_pdbs is the path to the folder where the PDB snapshot was stored.
- input_reduced_dir_pdbs is the path to the folder with the reduced set of PDB files.
- nrpdb.txt is a text file in which each line contains a PDB identifier.
  This file is deployed with BORGES, but you can supply your own.

Remote Grid Setup

The main limiting factors when using an external grid, for example a supercomputer, are disk space and software installation. For BORGES the two are related because, as mentioned above, some of the programs used are bundled in complex software suites, so simplifying the installation of such software is important. The guidelines used to isolate the binaries used by BORGES also apply to the setup of a remote environment.
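
Before a first run, whether local or on a remote grid, it can help to confirm that the third party paths in the [LOCAL] section of setup.bor point to files that exist. The following is a minimal sketch, not part of BORGES: it assumes that setup.bor can be read by the configparser module of Python 3 (BORGES itself targets Python 2.6, and how it parses the file is not documented here), and the function name missing_local_paths is hypothetical.

```python
import configparser
import os

# Keys of the [LOCAL] section of setup.bor that hold third party software paths.
LOCAL_PATH_KEYS = (
    "path_local_phaser",
    "path_local_mtzdmp",
    "path_local_shelxe",
    "path_local_borgesclient",
)


def missing_local_paths(config_text):
    """Return the [LOCAL] keys that are absent, empty, or not an existing path.

    Hypothetical helper: assumes setup.bor follows INI syntax as understood
    by configparser ("key: value" pairs, "#" comments, "[SECTION]" headers).
    """
    # interpolation=None keeps characters such as "%" or "$" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    missing = []
    for key in LOCAL_PATH_KEYS:
        value = (parser.get("LOCAL", key, fallback="") or "").strip()
        if not value or not os.path.exists(value):
            missing.append(key)
    return missing
```

To use it, read the file and pass its text, for example missing_local_paths(open("setup.bor").read()); an empty list means that every path in [LOCAL] exists on the local machine. The same idea could be applied to the path_remote_* keys of [GRID], but those paths would have to be checked on the remote frontend rather than locally.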