# Copyright (c) 2013-2020, SIB - Swiss Institute of Bioinformatics and
#                          Biozentrum - University of Basel
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#   http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


We provide several scripts to build a StructureDB.
More information on the individual steps can be found in the comments
of the corresponding scripts.

Steps to obtain a structural database:

1. Get the structure/profile data: preferably you want a non-redundant set of
   structures to generate a StructureDB. We suggest culling a list of PDB
   entries from PISCES: http://dunbrack.fccc.edu/PISCES.php
   The get_data_from_smtl.py script directly reads such a file and produces
   the input for the second step of the StructureDB generation.
   BUT: THE CHAIN NAMING WILL BE THE ONE USED IN THE SMTL, IT NO LONGER
   RELATES TO THE PDB NAMING!

2. Build an initial database: use build_structure_db.py with the input
   generated in step 1.

IF YOU NEVER REQUIRE A STRUCTURE PROFILE, YOU'RE ALREADY DONE!
BUT BE AWARE, THE CORRESPONDING FREQUENCIES IN THE DATABASE WILL BE SET TO 0.0!

3. Calculate structure profiles: use create_structure_profiles.py.
   The script calculates the structure profiles for a subset of the
   sequences in the desired StructureDB and is intended for parallelization.
   
4. Parallelize step 3: use prepare_slurm_submission_structure_profiles.py.
   This script generates a submission script and a batch command file.
   You can submit the job with sbatch structure_profile_generation_array.sh.
   It is intended to be used with the SLURM submission system.
   IMPORTANT: there are many paths to adapt at the top of
              prepare_slurm_submission_structure_profiles.py

5. Assign the structure profiles generated in steps 3 and 4 to the
   initial database generated in step 2:
   use assign_structure_profiles.py to perform this task.
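
Steps 3 and 4 distribute the profile calculation over several jobs, each
handling a subset of the sequences. The following is only a minimal sketch of
such a split and not the code used by the scripts. It assumes that the
sequences can be addressed by an index from 0 to n_entries - 1. The function
name chunk_ranges and the half-open (start, end) convention are hypothetical.
How create_structure_profiles.py actually receives its subset is described in
the comments of that script.

```python
def chunk_ranges(n_entries, n_jobs):
    """Split indices 0..n_entries-1 into at most n_jobs contiguous ranges.

    Returns a list of half-open (start, end) tuples whose sizes differ
    by at most one. Hypothetical helper, for illustration only.
    """
    if n_entries < 0 or n_jobs < 1:
        raise ValueError("n_entries must be >= 0 and n_jobs must be >= 1")
    base, extra = divmod(n_entries, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first 'extra' jobs get one additional entry.
        size = base + (1 if job < extra else 0)
        if size == 0:
            break  # more jobs than entries, nothing left to distribute
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Example: 10 sequences distributed over 3 jobs.
    for start, end in chunk_ranges(10, 3):
        print(start, end)
```

Contiguous ranges keep the bookkeeping simple: each job is fully described by
two numbers, and the results can be collected in index order in step 5.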


To qualitatively reproduce the default StructureDB in ProMod3, first
perform steps 1 and 2 with a non-redundant set of protein structures as
defined by PISCES with around 20000 chains (e.g. seq id threshold: 60,
resolution threshold: 2.5).
Then repeat steps 1 and 2 with a smaller PISCES list (5000-6000 entries,
e.g. seq id threshold: 20, resolution threshold: 2.0).
The first database serves as the default StructureDB and the second as
the source database for the structure profiles generated in steps 3-5.
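
Before running steps 1 and 2 it can be useful to check how many chains a
downloaded PISCES list contains. The following is a minimal sketch and not
the parser used by get_data_from_smtl.py. It ASSUMES that the list has
whitespace-separated columns, that the first column is the four-character
PDB id directly followed by the chain name, and that header lines do not
start with a digit. Please verify this against your file. The function name
read_pisces_chains and the example entries are made up for illustration.

```python
def read_pisces_chains(lines):
    """Return (pdb_id, chain_name) tuples from the lines of a PISCES list.

    ASSUMPTION: the first whitespace-separated column holds the
    four-character PDB id directly followed by the chain name.
    """
    chains = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        entry = tokens[0]
        # PDB ids start with a digit; this check also skips a header line.
        if len(entry) < 5 or not entry[0].isdigit():
            continue
        chains.append((entry[:4], entry[4:]))
    return chains


if __name__ == "__main__":
    # Made-up example content, not real PISCES output.
    example = [
        "IDs length Exptl. resolution R-factor FreeRvalue",
        "1ABCA 120 XRAY 1.80 0.19 0.23",
        "2XYZB 85 XRAY 2.10 0.21 0.26",
    ]
    chains = read_pisces_chains(example)
    print(len(chains), "chains")
```

To use it on a real file, pass the open file object, e.g.
read_pisces_chains(open(path)), where path points to your PISCES list.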
