Description of functions¶
Module contents¶
-
pairef.
run_pairef
(input_args=None)¶ The launching function: processes the input arguments using
launcher.process_arguments()
and then launches the launcher.main()
function.
Parameters: input_args (list) – List of input arguments (parameters)
pairef.launcher module¶
-
class
pairef.launcher.
MyArgumentParser
(prog=None, usage=None, description=None, epilog=None, version=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True)¶ Bases:
argparse.ArgumentParser
Helper class for argparse
It adds a method add_argument_with_check that checks whether the file given in an argument exists, without opening it. Inspired by https://codereview.stackexchange.com/questions/28608/checking-if-cli-arguments-are-valid-files-directories-in-python
-
add_argument_with_check
(*args, **kwargs)¶ New method for argparse that checks whether the file given in an argument exists but does not open it.
-
error
(message: string)¶ Prints a usage message incorporating the message to stderr and exits.
If you override this in a subclass, it should not return – it should either exit or raise an exception.
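The file-existence check described for add_argument_with_check can be sketched as follows. This is only an illustration of the technique, not the PAIREF source: the class name FileCheckingParser and the inner function existing_file are hypothetical, and wrapping the check into the argparse type callable is an assumption about how such a check may be done.

```python
import argparse
import os


class FileCheckingParser(argparse.ArgumentParser):
    """Hypothetical stand-in for MyArgumentParser (not the PAIREF code)."""

    def add_argument_with_check(self, *args, **kwargs):
        # Keep any user-supplied type and apply it after the existence check.
        inner_type = kwargs.pop("type", str)

        def existing_file(value):
            # os.path.isfile only inspects the path; the file is never opened.
            if not os.path.isfile(value):
                raise argparse.ArgumentTypeError(
                    "file %s does not exist" % value)
            return inner_type(value)

        kwargs["type"] = existing_file
        return self.add_argument(*args, **kwargs)
```

With this sketch, parsing a path that does not exist ends in the parser's error() method, which prints the usage message and exits, as described above.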
-
-
pairef.launcher.
check_non_negative_int
(value)¶
-
pairef.launcher.
check_positive_float
(value)¶
-
pairef.launcher.
check_positive_int
(value)¶
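The three check_* functions above are listed without descriptions. Judging by their names only, they are presumably argparse type validators; a minimal sketch of that pattern, under the assumption that invalid values are rejected with argparse.ArgumentTypeError, could look like this (the bodies and error messages are illustrative, not taken from PAIREF):

```python
import argparse


def check_positive_int(value):
    # Assumed behaviour: accept integers greater than zero.
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError("%s is not an integer" % value)
    if number <= 0:
        raise argparse.ArgumentTypeError("%s is not positive" % value)
    return number


def check_non_negative_int(value):
    # Assumed behaviour: accept zero and positive integers.
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError("%s is not an integer" % value)
    if number < 0:
        raise argparse.ArgumentTypeError("%s is negative" % value)
    return number


def check_positive_float(value):
    # Assumed behaviour: accept floats greater than zero.
    try:
        number = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError("%s is not a number" % value)
    if number <= 0:
        raise argparse.ArgumentTypeError("%s is not positive" % value)
    return number
```

Such functions would be passed as the type argument of add_argument, so that argparse reports a bad value through its usual error message.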
-
pairef.launcher.
main
(args)¶ The main function of the pairef module.
Parameters: args – Input arguments processed by argparse
pairef.preparation module¶
-
pairef.preparation.
calculate_merging_stats
(hklin_unmerged, shells, project, bins_low, res_low_from_hklin_unmerged=inf, res_high_from_hklin_unmerged=0)¶ For the given file hklin_unmerged, calculate the merging statistics using CCTBX.
Parameters: Returns: Name of a CSV file where the calculated statistics have been saved
Return type:
-
pairef.preparation.
check_refinement_software
(args, versions_dict, refinement='refmac')¶ Check whether the input structure model args.xyzin was refined in REFMAC5 or phenix.refine and in which version (PDB and mmCIF formats are accepted). If it was not refined, or if it was refined in a different version of REFMAC5 or phenix.refine than the one currently installed, write a warning.
Parameters: Returns: version of REFMAC5 or phenix.refine that was used for the refinement of args.xyzin; if it was not found, “N/A” is returned
Return type:
-
pairef.preparation.
create_workdir
(project)¶ Create the directory pairef_`project`; if it already exists, create the directory pairef_`project`_new instead (recursively).
Parameters: project (str) – Name of the project Returns: Name of the created working directory Return type: str
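The naming rule can be sketched as below. It assumes that "recursively" means appending _new repeatedly until an unused name is found; the function name create_workdir_sketch and the parent parameter are hypothetical and exist only to keep the example self-contained.

```python
import os


def create_workdir_sketch(project, parent="."):
    """Create pairef_<project>, appending _new while the name is taken."""
    workdir = os.path.join(parent, "pairef_" + project)
    while os.path.exists(workdir):
        # pairef_x -> pairef_x_new -> pairef_x_new_new -> ...
        workdir += "_new"
    os.mkdir(workdir)
    return workdir
```

Calling it three times with the same project name in the same parent directory yields pairef_x, pairef_x_new, and pairef_x_new_new.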
-
pairef.preparation.
def_res_shells
(args, refinement, res_high_mtz, res_low=999)¶ Determine high resolution shells and number of low resolution bins.
The first value of the list should be the resolution of the data that were used for the refinement of the input structure model. If an explicit definition (args.res_shells) is set, test its correctness. If it is valid, use it; if not, define the shells automatically (shell step 0.05 Å).
Parameters: Returns: - shells (list)
- n_bins_low (int)
- n_flag_sets (int)
- default_shells_definition (bool)
Return type: (tuple)
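A rough sketch of the automatic definition with a 0.05 Å step is given below. How many shells PAIREF actually adds is not stated here, so this example simply assumes that shells are generated from the initial resolution down to the high resolution limit of the data; the function name automatic_shells and its parameters are hypothetical.

```python
def automatic_shells(res_init, res_high_mtz, step=0.05):
    """List resolution shells from res_init towards res_high_mtz."""
    # The first value is the resolution used to refine the input model.
    shells = [round(res_init, 2)]
    n = 1
    # Multiply instead of subtracting repeatedly to avoid accumulating
    # floating-point error, and round each shell to two decimals.
    while round(res_init - n * step, 2) >= round(res_high_mtz, 2):
        shells.append(round(res_init - n * step, 2))
        n += 1
    return shells
```

For an initial resolution of 2.0 Å and data extending to 1.8 Å, the sketch gives the shells 2.0, 1.95, 1.9, 1.85, and 1.8.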
-
class
pairef.preparation.
output_log
(stdout, filename)¶ Set to write sys.stdout to screen and also in file PAIREF_out.log.
-
close
()¶
-
flush
()¶
-
write
(text)¶
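A class with this interface (write, flush, close) typically works as a "tee" that duplicates everything written to it. The sketch below shows the general pattern; the class name TeeLog and its attribute names are hypothetical, and the real output_log may differ in detail.

```python
class TeeLog(object):
    """Duplicate everything written to a stream into a log file."""

    def __init__(self, stdout, filename):
        self.terminal = stdout           # the original stream (the screen)
        self.log = open(filename, "w")   # the log file

    def write(self, text):
        self.terminal.write(text)
        self.log.write(text)

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        # Only the log file is closed; the original stream stays usable.
        self.log.close()
```

Such an object would be installed with sys.stdout = TeeLog(sys.stdout, "PAIREF_out.log"), after which every print call reaches both the screen and the file.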
-
-
pairef.preparation.
res_from_hklin_unmerged
(hklin_unmerged)¶ Finds the resolution range of the given unmerged diffraction data file hklin_unmerged. For an ASCII file from XDS, the range is found by searching for the option INCLUDE_RESOLUTION_RANGE in the file. For an MTZ file, it is found using xia2. For a SCA file, no check is done.
Parameters: hklin_unmerged (str) – Name of the unmerged diffraction data file Returns: - res_low (float): Low resolution diffraction limit
- res_high (float): High resolution diffraction limit
Return type: (tuple)
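For the XDS case, the search for INCLUDE_RESOLUTION_RANGE can be sketched as follows. The header layout used in the example (a line of the form !INCLUDE_RESOLUTION_RANGE= low high) is an assumption about the XDS ASCII format, and the function name resolution_range_from_xds_lines is hypothetical.

```python
def resolution_range_from_xds_lines(lines):
    """Return (res_low, res_high) from XDS header lines, or None."""
    for line in lines:
        if "INCLUDE_RESOLUTION_RANGE" in line:
            # Everything after "=" is expected to hold the two limits.
            values = line.split("=", 1)[1].split()
            return float(values[0]), float(values[1])
    return None


# Assumed example of a header line from an unmerged XDS ASCII file.
header = [
    "!FORMAT=XDS_ASCII    MERGE=FALSE    FRIEDEL'S_LAW=TRUE",
    "!INCLUDE_RESOLUTION_RANGE=    50.000     1.500",
]
res_low, res_high = resolution_range_from_xds_lines(header)
```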
-
pairef.preparation.
res_from_mtz
(hklin)¶ Finds the low and high resolution diffraction limits of the data hklin using CCTBX.
Parameters: hklin (str) – Name of diffraction data MTZ file Returns: - res_low (float): Low resolution diffraction limit
- res_high (float): High resolution diffraction limit
Return type: (tuple)
-
pairef.preparation.
res_high_from_xyzin
(xyzin, format='.pdb')¶ Finds a line containing RESOLUTION RANGE HIGH in the file xyzin, picks the last word of the line (that should be the high resolution) and rounds it to two decimals. If it is not successful, returns -1.
Parameters: xyzin (str) – Input structure model (PDB or mmCIF format) Returns: initial high resolution limit rounded to two decimals or -1 if it was not found Return type: float
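The lookup described above (find the line, take its last word, round to two decimals, fall back to -1) can be sketched for the PDB case as follows. The function works on a list of lines rather than on a file name to stay self-contained; its name is hypothetical, and mmCIF handling is not covered.

```python
def res_high_from_pdb_lines(lines):
    """Return the high resolution limit rounded to 2 decimals, or -1."""
    for line in lines:
        if "RESOLUTION RANGE HIGH" in line:
            try:
                # The last word of the line should be the resolution.
                return round(float(line.split()[-1]), 2)
            except ValueError:
                return -1  # e.g. the value is given as NULL
    return -1
```

For a line such as "REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.804" the sketch returns 1.8.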
-
pairef.preparation.
res_opt
(shell, args, refinement='refmac')¶ Finds the optical resolution by running the command
sfcheck -f project_Rflag_shellA.mtz -m project_Rflag_shellA.pdb
and writes the obtained value into project_Optical_resolution.csv. PDB format is required; this does not work with mmCIF.
Parameters: - shell (float) – Current resolution shell
- args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
Returns: optical resolution
Return type: float
-
pairef.preparation.
run_baverage
(project, xyzin, res_init)¶ Finds the average B-factor of all the atoms in the structure model xyzin using baverage from the CCP4 package. This creates files with the prefix project_twodecname(res_init)A_baverage. Returns the mean B-factor over all the atoms of the structure model.
Parameters: Returns: Mean B-factor for all the atoms
Return type:
-
pairef.preparation.
run_bmean_iotbx
(project, xyzin)¶ Finds the average B-factor of all the atoms in the structure model xyzin using pdb from iotbx. Returns the mean B-factor over all the atoms of the structure model.
Parameters: Returns: Mean B-factor for all the atoms
Return type:
-
pairef.preparation.
run_pdbtools
(args, baverage=0)¶ Modify the input structure model args.xyzin with mmtbx.pdbtools. The process is controlled by args.reset_bfactor, args.add_to_bfactor, args.set_bfactor, and args.shake_sites. Resetting the B-factors requires baverage to be set. The modified structure model is named project_twodecname(res_init)A_modified.
Parameters: - args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
- baverage (float) – Mean B-factor for all the atoms
Returns: Filename of the modified structure model (PDB or mmCIF format)
Return type:
-
pairef.preparation.
welcome
(args, pairef_version)¶ Print introduction information about the module and input parameters.
Parameters: - args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
- pairef_version (str) –
Returns: True
Return type:
pairef.refinement module¶
-
pairef.refinement.
calculate_correlation
(hkl_calc, hklin, flag=0, res_low=None, res_high=None)¶ Calculates CCwork and CCfree using sftools.
Parameters: Returns: tuple containing CCwork and CCfree (both are float or str: “N/A”)
Return type: (tuple)
-
pairef.refinement.
collect_stat_BINNED
(shells, project, hklin, n_bins_low, flag, res_low, refinement='refmac')¶ Collects statistics of a particular structure model depending on resolution (i. e. the values are binned) and saves them in a CSV file. The statistics are picked from a REFMAC5 logfile relating to the model project_RXX_twodecname(shells[-1])A.pdb, where XX is the number of a flag.
This function is called by the function main() in file launcher.py.
Parameters: - shells (list) – Resolution shells (float), from the initial one to the last one used for that model.
- project (str) – Name of the project
- hklin (str) – Name of diffraction data MTZ file
- n_bins_low (int) –
- flag (int) –
- res_low (float) – Low resolution of diffraction data hklin (required for calling the function collect_stat_refmac_log_low())
- refinement (str) – “refmac” or “phenix”
Returns: Name of the created CSV file
Return type:
-
pairef.refinement.
collect_stat_OVERALL
(shells, args, flag, refinement='refmac')¶ Collects overall statistics of a particular structure model from a REFMAC5 logfile or from a PDB file (phenix.refine). The model in question is project_RXX_twodecname(shells[-1])A.pdb (REFMAC5) or project_RXX_twodecname(shells[-1])A_001.pdb (phenix.refine), where XX is the number of a flag.
This function is called by the function main() in file launcher.py.
Parameters: Returns: tuple containing names of created CSV files
Return type: (tuple)
-
pairef.refinement.
collect_stat_OVERALL_AVG
(shells, project, flag_sets)¶ Calculates and saves average overall values from CSV files prepared by the function collect_stat_OVERALL().
This function is called by the function main() in file launcher.py.
Parameters: Returns: tuple containing names of created CSV files
Return type: (tuple)
-
pairef.refinement.
collect_stat_binned_phenix_high
(pdbfilename, n_bins_low)¶ Picks and returns statistics values in the given REFMAC5 logfile. The logfile is supposed to contain information from 1 shell.
This function is called by the function collect_stat_BINNED().
Parameters: Returns: tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)
Return type: (tuple)
-
pairef.refinement.
collect_stat_binned_phenix_low
(pdbfilename, n_bins_low, res_low=999)¶ Picks and returns statistics values in the given REFMAC5 logfile. The logfile is supposed to contain information from n_bins_low shells.
This function is called by the function collect_stat_BINNED() from this file and main() from file launcher.py.
Parameters: Returns: tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)
Return type: (tuple)
-
pairef.refinement.
collect_stat_binned_refmac_high
(logfilename, mtzfilename, hklin, n_bins_low, res_low, res_high, flag=0)¶ Picks and returns statistics values in the given REFMAC5 logfile. The logfile is supposed to contain information from 1 shell.
This function is called by the function collect_stat_BINNED().
Parameters: Returns: tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)
Return type: (tuple)
-
pairef.refinement.
collect_stat_binned_refmac_low
(logfilename, mtzfilename, hklin, n_bins_low, res_low=999, flag=0)¶ Picks and returns statistics values in the given REFMAC5 logfile. The logfile is supposed to contain information from n_bins_low shells.
This function is called by the function collect_stat_BINNED() from this file and main() from file launcher.py.
Parameters: Returns: tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)
Return type: (tuple)
-
pairef.refinement.
collect_stat_overall_phenix
(pdbfilename)¶ Picks and returns overall Rwork, Rfree values from a given PDB file refined in phenix.refine.
This function is called by the functions collect_stat_OVERALL() and collect_stat_BINNED().
Parameters: pdbfilename (str) – Filename of a PDB structure model refined in phenix.refine Returns: tuple containing statistics values Rwork, Rfree (all are str) Return type: (tuple)
-
pairef.refinement.
collect_stat_overall_refmac
(logfilename, flag=0)¶ Picks and returns overall Rwork, Rfree from a given REFMAC5 logfile.
This function is called by the functions collect_stat_OVERALL() and collect_stat_BINNED().
Parameters: logfilename (str) – Filename of a REFMAC5 logfile Returns: tuple containing statistics values Rwork and Rfree (both are str) Return type: (tuple)
-
pairef.refinement.
collect_stat_write
(csvfilename, bin_res_low, bin_res_high, bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, bin_CCfree, shell_number=1)¶ Saves the given statistics values to a CSV file.
This function is called by the function collect_stat_BINNED().
Parameters: Returns: Filename of the created CSV file.
Return type:
-
pairef.refinement.
refinement_phenix
(res_cur, res_prev, res_high, args, n_bins=1, mode='refine', res_low=0, res_highest=0, flag=0, xyzin_start='', bfac_set=0, label=None)¶ Refine using phenix.refine.
Resolution limit is controlled by parameters res_high and res_low. Names of files are controlled by parameters res_cur and res_high.
There are 4 different modes:
mode="refine"
refine structure args.project_twodecname(res_prev)A_001.pdb; results with prefix args.project_twodecname(res_cur)A_001; number_of_macro_cycles given by deffile (default: 3)
mode="comp"
use structure args.project_twodecname(res_cur)A_001.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_001; number_of_macro_cycles=1, refine.strategy=None, ordered_solvent=False
mode="prev_pair"
use structure args.project_twodecname(res_cur)A_001.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_prev_pair_001; number_of_macro_cycles=1, refine.strategy=None, ordered_solvent=False
mode="first"
use structure xyzin_start; results in structure args.project_twodecname(res_cur)A_001.pdb plus an extra copy of the log args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_001.log; number_of_macro_cycles controlled by the option --prerefinement-ncyc (1 cycle with no strategy and no ordered solvent by default, 6 cycles by default for the complete cross-validation); usually res_cur = res_init, res_high = res_init
If the input structure model was in mmCIF format, files with suffix .cif (not .pdb) are used.
Parameters: - res_cur (float) –
- res_prev (float or str) –
- res_high (float) –
- args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
- n_bins (int) –
- mode (str) – [“refine”, “comp”, “prev_pair”, “first”]
- res_low (float) –
- res_highest (float) –
- flag (int) –
- xyzin_start (str) – Filename of the model to be refined (valid only for mode=”first”)
- bfac_set (float) – Value of B-factor that will be set to all atoms before refinement (not used now)
Returns: Dictionary containing the names of the files that have been created by phenix.refine and the version of phenix.refine, i. e. HKLOUT, XYZOUT, LOGOUT, and version (all str)
Return type: (dict)
-
pairef.refinement.
refinement_phenix_get_label
(outout)¶ Get possible choices of refinement.input.xray_data.labels from standard output from phenix.refine (saved in file outout) if multiple equally suitable arrays of observed xray data were found.
Parameters: out (str) – Filename Returns: tuple containing the chosen label and all the found labels - both are (str) Return type: (tuple)
-
pairef.refinement.
refinement_refmac
(res_cur, res_prev, res_high, args, n_bins_low, mode='refine', res_low=0, res_highest=0, flag=0, xyzin_start='', bfac_set=0)¶ Refine using REFMAC5.
Resolution limit is controlled by parameters res_high and res_low. Names of files are controlled by parameters res_cur and res_high.
There are 4 different modes:
mode="refine"
refine structure args.project_twodecname(res_prev)A.pdb; results with prefix args.project_twodecname(res_cur)A; ncyc given by comfile (default: ncyc 10)
mode="comp"
use structure args.project_twodecname(res_cur)A.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A; ncyc 0
mode="prev_pair"
use structure args.project_twodecname(res_cur)A.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_prev_pair; ncyc 0
mode="first"
use structure xyzin_start; results in structure args.project_twodecname(res_cur)A.pdb plus an extra copy of the log args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A.log; ncyc controlled by the option --prerefinement-ncyc (0 cycles by default, 20 cycles by default for the complete cross-validation); usually res_cur = res_init, res_high = res_init
If the input structure model was in mmCIF format, files with suffix .mmcif (not .pdb) are used.
Parameters: - res_cur (float) –
- res_prev (float or str) –
- res_high (float) –
- args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
- n_bins_low (int) –
- mode (str) – [“refine”, “comp”, “prev_pair”, “first”]
- res_low (float) –
- res_highest (float) –
- flag (int) –
- xyzin_start (str) – Filename of the model to be refined (valid only for mode=”first”)
- bfac_set (float) – Value of B-factor that will be set to all atoms before refinement (not used now)
Returns: Dictionary containing the names of the files that have been created by REFMAC5 and the version of REFMAC5, i. e. HKLOUT, XYZOUT, LOGOUT, and version (all str)
Return type: (dict)
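The file naming for the four REFMAC5 modes can be summarised in a short sketch. It only rebuilds the prefixes as they are written in the mode descriptions of refinement_refmac; the helper refmac_prefix is hypothetical, and twodecname is re-implemented here under the assumption that it formats a float with two decimals and replaces the decimal point with "-".

```python
def twodecname(var):
    # Assumed behaviour of pairef.commons.twodecname: 1.9 -> "1-90"
    return ("%.2f" % var).replace(".", "-")


def refmac_prefix(project, mode, res_cur, res_high):
    """Return the output prefix documented for the given mode."""
    base = "%s_%sA" % (project, twodecname(res_cur))
    if mode in ("refine", "first"):
        return base
    comparison = "%s_comparison_at_%sA" % (base, twodecname(res_high))
    if mode == "comp":
        return comparison
    if mode == "prev_pair":
        return comparison + "_prev_pair"
    raise ValueError("unknown mode: %s" % mode)
```

For a project named demo, mode="comp" with res_cur=1.9 and res_high=2.0 gives the prefix demo_1-90A_comparison_at_2-00A.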
pairef.graphs module¶
-
pairef.graphs.
matplotlib_bar
(args, values='R-values', flag_sets=[], ready_shells=[])¶ Plots and saves a bar chart using matplotlib.
If flag_sets is an empty list, it is assumed that the values are saved in the file args.project_values.csv.
Otherwise, a chart showing the results of the complete cross-validation is plotted; the needed values are picked from the files project_RXX_values.csv, where XX is the number of a flag.
Parameters: Returns: Name of the PNG file containing the chart.
Return type:
-
pairef.graphs.
matplotlib_line
(shells, project, statistics, n_bins_low, title, flag=0, multiscale=False, filename_suffix='', refinement='refmac')¶ Plots statistics values (chosen by statistics) from the files project+"_"+twodecname(shells[*])+"A.csv". Generates and saves the plot project+"_"+statistic+".png" using matplotlib.
Parameters: - shells (list) – containing float
- project (str) – Name of the project
- statistics (list) – List of names of statistics to be plotted (str)
- n_bins_low (int) –
- title (str) –
- flag (int) –
- multiscale (bool) – Use 2 different y-axis for data lines
- filename_suffix (str) –
- refinement (str) – “refmac” or “phenix”
Returns: Name of the generated file with the plot (project+"_"+statistic+".png")
Return type:
-
pairef.graphs.
write_log_html
(shells, ready_shells, args, versions_dict, flag_sets, res_cur=0, ready_merging_statistics=False, done=False)¶ Creates the HTML output log.
Parameters: Returns: Name of the created HTML file
Return type:
-
pairef.graphs.
xticklabels_compress
(list, n_max=13, depth=1)¶ If there are more than n_max bins, do not show all the labels in list.
Parameters: Returns: Labels for a graph, compressed
Return type:
pairef.commons module¶
-
pairef.commons.
extract_from_file
(filename, searched, skip_lines, n_lines, nth_word=False, not_found='stop', get_first=False)¶ Returns the line(s) or the word found by searching for the string searched in the file filename.
If the file filename is not found, the program always aborts. Default behavior: the last match of the searched string is used, and if the searched string is not found in the file, the program is stopped with an error message.
Parameters: - filename (str) – Name of the file
- searched (str) – Searched string
- skip_lines (int) – Number of lines to be skipped
- n_lines (int) – Number of lines that should be returned
- nth_word (bool or int) – False if lines should be returned, or the position in the line of the word that should be picked
- not_found (str) – If the searched string is not found, write an error message and exit (if not_found=”stop”, default option) or (if not_found=”N/A”) return “N/A” or [“N/A”].
- get_first (bool) – Use the first match of the searched string.
Returns: List containing n_lines lines starting at the skip_lines offset (if nth_word=False), or the picked word (the nth_word-th word) in the skip_lines-th following line (if nth_word is set)
Return type:
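The search logic can be illustrated on an in-memory list of lines. This is a simplified sketch, not the PAIREF implementation: it skips the file handling, the function name extract_from_lines is hypothetical, and it assumes that nth_word is a zero-based index, which the description above does not specify.

```python
def extract_from_lines(lines, searched, skip_lines, n_lines,
                       nth_word=False, not_found="stop", get_first=False):
    """Return lines or one word located relative to a matching line."""
    matches = [i for i, line in enumerate(lines) if searched in line]
    if not matches:
        if not_found == "N/A":
            return ["N/A"] if nth_word is False else "N/A"
        raise SystemExit("ERROR: '%s' was not found" % searched)
    # By default the last match is used; get_first selects the first one.
    start = (matches[0] if get_first else matches[-1]) + skip_lines
    if nth_word is False:
        return lines[start:start + n_lines]
    # Assumption: nth_word counts words from zero.
    return lines[start].split()[nth_word]
```

The comparison nth_word is False (rather than a plain truth test) keeps nth_word=0 usable as a word index.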
-
pairef.commons.
fourdec
(var)¶ Returns number with 4 decimals as a string. If a float is not given, it returns a string.
Parameters: var (float) – Returns: str
-
pairef.commons.
try_symlink
(src, dst)¶ Makes a new symlink to src if the dst file does not exist yet. If it is not possible to make symlinks (difficulties on Windows), a copy of the file is made instead, even if dst already exists.
Parameters: Returns: True
Return type:
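The symlink-with-fallback pattern can be sketched as follows. The function name try_symlink_sketch is hypothetical, and the exact conditions under which the real function overwrites an existing dst are an open question; this example only copies when creating the symlink fails.

```python
import os
import shutil


def try_symlink_sketch(src, dst):
    """Link dst to src if dst is missing; copy the file if linking fails."""
    try:
        if not os.path.lexists(dst):
            os.symlink(src, dst)
    except (OSError, NotImplementedError):
        # Symlinks may be unavailable (e.g. on Windows without privileges).
        shutil.copy(src, dst)
    return True
```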
-
pairef.commons.
twodec
(var)¶ Returns number with 2 decimals as a string. If a float is not given, it returns a string.
Parameters: var (float) – Returns: str
-
pairef.commons.
twodecname
(var)¶ Returns the number with 2 decimals, but with “-” used instead of the decimal point.
Parameters: var (float) – Returns: str
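The three formatting helpers of this module (fourdec, twodec, and twodecname) can be approximated as below. The handling of non-float input is an assumption: this sketch simply converts such values with str(), which may differ from the real functions.

```python
def twodec(var):
    # Floats get two decimals; anything else is returned as a string.
    return "%.2f" % var if isinstance(var, float) else str(var)


def fourdec(var):
    # Floats get four decimals; anything else is returned as a string.
    return "%.4f" % var if isinstance(var, float) else str(var)


def twodecname(var):
    # Same as twodec, but safe for file names: "1.90" becomes "1-90".
    return twodec(var).replace(".", "-")
```

With these definitions, twodecname(1.9) gives "1-90", which is the form used in file names such as project_1-90A.pdb.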
-
pairef.commons.
warning_my
(key, message)¶