Description of functions

Module contents

pairef.run_pairef(input_args=None)

The launching function: processes the input arguments using launcher.process_arguments() and then calls launcher.main().

Parameters:input_args (list) – List of input arguments

pairef.launcher module

class pairef.launcher.MyArgumentParser(prog=None, usage=None, description=None, epilog=None, version=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True)

Bases: argparse.ArgumentParser

Helper class for argparse

It adds a method add_argument_with_check that checks whether the file given in an argument exists but does not open it. Inspired by https://codereview.stackexchange.com/questions/28608/checking-if-cli-arguments-are-valid-files-directories-in-python

add_argument_with_check(*args, **kwargs)

New method for argparse that checks whether the file given in an argument exists but does not open it.

error(message: string)

Prints a usage message incorporating the message to stderr and exits.

If you override this in a subclass, it should not return – it should either exit or raise an exception.
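The idea of validating a path without opening the file can be sketched with a plain argparse type callable. This is a minimal illustration under stated assumptions, not the PAIREF implementation: the helper name existing_file and the option --input are hypothetical.

```python
import argparse
import os
import tempfile


def existing_file(path):
    """Return path unchanged if it names an existing file (hypothetical helper).

    The path is only tested with os.path.isfile(); the file is never opened.
    """
    if not os.path.isfile(path):
        raise argparse.ArgumentTypeError("file not found: %s" % path)
    return path


parser = argparse.ArgumentParser(prog="demo")
# "--input" is an illustrative option name, not a PAIREF option.
parser.add_argument("--input", type=existing_file)

with tempfile.NamedTemporaryFile(suffix=".pdb") as handle:
    parsed = parser.parse_args(["--input", handle.name])
    print(parsed.input == handle.name)
```

When the callable raises ArgumentTypeError, argparse reports the problem through the parser's error() method, which is why overriding error() changes how such failures are shown.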

pairef.launcher.check_non_negative_int(value)
pairef.launcher.check_positive_float(value)
pairef.launcher.check_positive_int(value)
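The three check_* functions above are documented by name only. Validators of this kind are commonly written as argparse type callables; the sketch below shows one plausible shape for a positive-integer check. It is an assumption based on the function name, not taken from the PAIREF source, and positive_int and --count are hypothetical names.

```python
import argparse


def positive_int(value):
    """Convert value to int and reject anything below 1 (hypothetical helper)."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError("not an integer: %r" % (value,))
    if number <= 0:
        raise argparse.ArgumentTypeError("must be positive: %r" % (value,))
    return number


parser = argparse.ArgumentParser(prog="demo")
# "--count" is an illustrative option name only.
parser.add_argument("--count", type=positive_int, default=1)
print(parser.parse_args(["--count", "5"]).count)
```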
pairef.launcher.main(args)

The main function of the pairef module.

Parameters:args – Input arguments processed by argparse
pairef.launcher.process_arguments(input_args)

Processes input arguments using argparse.

Parameters:input_args (list) – Input arguments
Returns:Processed arguments
Return type:list
pairef.launcher.run_pairef(input_args=None)

The launching function: processes the input arguments using launcher.process_arguments() and then calls launcher.main().

Parameters:input_args (list) – List of input arguments

pairef.preparation module

pairef.preparation.calculate_merging_stats(hklin_unmerged, shells, project, bins_low, res_low_from_hklin_unmerged=inf, res_high_from_hklin_unmerged=0)

For given file hklin_unmerged, calculate the merging statistics using CCTBX.

Parameters:
  • hklin_unmerged (str) – Name of the unmerged diffraction data file
  • shells (list) –
  • project (str) – Name of the project
  • bins_low (list) –
  • costant_reflections_in_bins (bool) –
Returns:

Name of a CSV file where the calculated statistics have been saved

Return type:

str

pairef.preparation.check_refinement_software(args, versions_dict, refinement='refmac')

Check whether the input structure model args.xyzin was refined in REFMAC5 or phenix.refine and in which version (PDB and mmCIF formats are accepted). If it was not refined, or if it was refined in a version of REFMAC5 or phenix.refine other than the one now installed, a warning is written.

Parameters:
  • args.xyzin (str) – Input structure model (PDB or mmCIF format)
  • versions_dict (dict) – Dictionary containing a key refmac_version or phenix_version.
  • refmac_version_installed (str) – Version of REFMAC5 or phenix.refine that is installed
Returns:

Version of REFMAC5 or phenix.refine that was used for the refinement of args.xyzin; if it was not found, "N/A" is returned

Return type:

str

pairef.preparation.create_workdir(project)

Create the directory pairef_`project`. If it already exists, create the directory pairef_`project`_new instead (applied recursively).

Parameters:project (str) – Name of the project
Returns:Name of the created working directory
Return type:str
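The naming rule can be sketched as follows, reading "recursively" as appending _new repeatedly until an unused name is found. This is a hedged illustration rather than the PAIREF code; make_workdir and its parent argument are hypothetical.

```python
import os
import tempfile


def make_workdir(project, parent="."):
    """Create pairef_<project>; append "_new" while the name is taken."""
    name = "pairef_" + project
    while os.path.exists(os.path.join(parent, name)):
        name += "_new"
    os.mkdir(os.path.join(parent, name))
    return name


with tempfile.TemporaryDirectory() as parent:
    print(make_workdir("demo", parent))  # pairef_demo
    print(make_workdir("demo", parent))  # pairef_demo_new
    print(make_workdir("demo", parent))  # pairef_demo_new_new
```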
pairef.preparation.def_res_shells(args, refinement, res_high_mtz, res_low=999)

Determine high resolution shells and number of low resolution bins.

The first value of the list should be the resolution of the data that were used for the refinement of the input structure model. If an explicit definition (args.res_shells) is set, its correctness is tested. If it is valid, it is used; if not, the shells are defined automatically (shell step 0.05 Å).

Parameters:
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
  • refinement (str) – “refmac” or “phenix”
  • res_high_mtz (float) – High resolution diffraction limit of args.hklin
  • res_low (float) – Low resolution diffraction limit of args.hklin
Returns:

  • shells (list)
  • n_bins_low (int)
  • n_flag_sets (int)
  • default_shells_definition (bool)

Return type:

(tuple)
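A minimal sketch of an automatic definition with a 0.05 Å step, assuming the shells simply run from the initial resolution to the high resolution limit of the data. The real function may choose the end point and the number of shells differently; auto_shells is a hypothetical name.

```python
def auto_shells(res_init, res_high_mtz, step=0.05):
    """List high-resolution limits from res_init down to res_high_mtz."""
    shells = [round(res_init, 2)]
    # Work in integer hundredths of an angstrom to avoid floating-point drift.
    current = int(round(res_init * 100))
    limit = int(round(res_high_mtz * 100))
    step_i = int(round(step * 100))
    while current - step_i >= limit:
        current -= step_i
        shells.append(current / 100.0)
    return shells


print(auto_shells(2.0, 1.8))  # [2.0, 1.95, 1.9, 1.85, 1.8]
```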

class pairef.preparation.output_log(stdout, filename)

Set to write sys.stdout to screen and also in file PAIREF_out.log.

close()
flush()
write(text)
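A class with write(), flush() and close() can stand in for sys.stdout and duplicate everything into a file. The sketch below illustrates that pattern only; TeeLog is a hypothetical name and the PAIREF class may differ in detail.

```python
import io
import os
import tempfile


class TeeLog(object):
    """Forward every write to a stream and to a log file (hypothetical class)."""

    def __init__(self, stream, filename):
        self.stream = stream
        self.logfile = open(filename, "w")

    def write(self, text):
        self.stream.write(text)
        self.logfile.write(text)

    def flush(self):
        self.stream.flush()
        self.logfile.flush()

    def close(self):
        self.logfile.close()


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "PAIREF_out.log")
    screen = io.StringIO()  # stands in for sys.stdout in this demo
    log = TeeLog(screen, path)
    log.write("refinement started\n")
    log.flush()
    log.close()
    with open(path) as handle:
        print(handle.read() == screen.getvalue())
```

In real use, an instance would be assigned to sys.stdout so that print() output reaches both destinations.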
pairef.preparation.res_from_hklin_unmerged(hklin_unmerged)

Finds the resolution range of the given unmerged diffraction data file hklin_unmerged. For an ASCII file from XDS, the range is found by searching for the option INCLUDE_RESOLUTION_RANGE in the file. For an MTZ file, it is found using xia2. For a SCA file, no check is done.

Parameters:hklin_unmerged (str) – Name of the unmerged diffraction data file
Returns:
  • res_low (float): Low resolution diffraction limit
  • res_high (float): High resolution diffraction limit
Return type:(tuple)
pairef.preparation.res_from_mtz(hklin)

Finds the low and high resolution diffraction limits of the data hklin using CCTBX.

Parameters:hklin (str) – Name of diffraction data MTZ file
Returns:
  • res_low (float): Low resolution diffraction limit
  • res_high (float): High resolution diffraction limit
Return type:(tuple)
pairef.preparation.res_high_from_xyzin(xyzin, format='.pdb')

Finds a line containing RESOLUTION RANGE HIGH in the file xyzin, picks the last word of the line (that should be the high resolution) and rounds it to two decimals. If it is not successful, returns -1.

Parameters:xyzin (str) – Input structure model (PDB or mmCIF format)
Returns:initial high resolution limit rounded to two decimals or -1 if it was not found
Return type:float
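The parsing rule can be sketched on a list of header lines. This is an illustration only: it works on lines rather than a file name, covers the PDB case, and uses the first matching line, which is an assumption; res_high_from_lines is a hypothetical name.

```python
def res_high_from_lines(lines):
    """Return the high-resolution limit from PDB header lines, or -1."""
    for line in lines:
        if "RESOLUTION RANGE HIGH" in line:
            try:
                # The last word of the line should be the resolution.
                return round(float(line.split()[-1]), 2)
            except ValueError:
                return -1
    return -1


header = [
    "REMARK   3  DATA USED IN REFINEMENT.",
    "REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.798",
    "REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 45.12",
]
print(res_high_from_lines(header))  # 1.8
```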
pairef.preparation.res_opt(shell, args, refinement='refmac')

Finds the optical resolution by running the command sfcheck -f project_Rflag_shellA.mtz -m project_Rflag_shellA.pdb and writes the obtained value into project_Optical_resolution.csv. PDB format is required; this does not work with mmCIF.

Parameters:
  • shell (float) – Current resolution shell
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
Returns:Optical resolution
Return type:float
pairef.preparation.run_baverage(project, xyzin, res_init)

Finds the average B-factor of all the atoms in the structure model xyzin using baverage from the CCP4 package. This creates files with a prefix project_twodecname(res_init)A_baverage. The mean B-factor of all the atoms of the structure model is then returned.

Parameters:
  • project (str) – Name of the project
  • xyzin (str) – Filename of structure model in PDB or mmCIF format
  • res_init (float) – Resolution of the input structure model
Returns:

Mean B-factor for all the atoms

Return type:

float

pairef.preparation.run_pdbtools(args, baverage=0)

Modify the input structure model args.xyzin with mmtbx.pdbtools. The process is controlled by args.reset_bfactor, args.add_to_bfactor, args.set_bfactor, and args.shake_sites. Resetting the B-factors requires baverage to be set. The modified structure model is named project_twodecname(res_init)A_modified.

Parameters:
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
  • baverage (float) – Mean B-factor for all the atoms
Returns:

Filename of the modified structure model (PDB or mmCIF format)

Return type:

str

pairef.preparation.welcome(args, pairef_version)

Print introduction information about the module and input parameters.

Parameters:
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
  • pairef_version (str) –
Returns:

True

Return type:

bool

pairef.preparation.which(program)

Checks whether program exists and finds its location. Analogous to the GNU/Linux which command.

Parameters:program (str) – Name of an executable
Returns:Path of an executable location
Return type:str
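For comparison, the Python standard library (3.3 and later) offers shutil.which with the same purpose. The wrapper below is only a sketch of the lookup; the name find_program is hypothetical, and what pairef.preparation.which returns for a missing program is not stated above.

```python
import shutil


def find_program(program):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(program)


# A name that is very unlikely to exist on any system.
print(find_program("surely-not-an-installed-program-xyz"))
```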

pairef.refinement module

pairef.refinement.calculate_correlation(hkl_calc, hklin, flag=0, res_low=None, res_high=None)

Calculates CCwork and CCfree using sftools.

Parameters:
  • hkl_calc (str) – Name of the MTZ file from a REFMAC5 run
  • hklin (str) – Name of the MTZ file with diffraction data
  • flag (int) – free reflection flag set
  • res_low (float) – low-resolution cutoff
  • res_high (float) – high-resolution cutoff
Returns:

tuple containing CCwork and CCfree (both are float or str: “N/A”)

Return type:

(tuple)

pairef.refinement.collect_stat_BINNED(shells, project, hklin, n_bins_low, flag, res_low, refinement='refmac')

Collects statistics of a particular structure model as a function of resolution (i.e. the values are binned) and saves them in a CSV file. Statistics are picked from a REFMAC5 logfile relating to the model project_RXX_twodecname(shells[-1])A.pdb, where XX is the number of a flag.

This function is called by the function main() in file launcher.py.

Parameters:
  • shells (list) – Contains high resolution diffraction limits from the initial to the last which were used for that model (float).
  • project (str) – Name of the project
  • hklin (str) – Name of diffraction data MTZ file
  • n_bins_low (int) –
  • flag (int) –
  • res_low (float) – Low resolution of diffraction data hklin (required for calling the function collect_stat_refmac_log_low())
  • refinement (str) – “refmac” or “phenix”
Returns:

Name of the created CSV file

Return type:

str

pairef.refinement.collect_stat_OVERALL(shells, args, flag, refinement='refmac')

Collects overall statistics of a particular structure model from a REFMAC5 logfile or from a PDB file (phenix.refine). The model concerned is project_RXX_twodecname(shells[-1])A.pdb (REFMAC5) or project_RXX_twodecname(shells[-1])A_001.pdb (phenix.refine), where XX is the number of a flag.

This function is called by the function main() in file launcher.py.

Parameters:
  • shells (list) – Contains high resolution diffraction limits from the initial to the last which were used for that model (float).
  • project (str) – Name of the project
  • flag (int) –
  • refinement (str) – “refmac” or “phenix”
Returns:

tuple containing names of created CSV files

Return type:

(tuple)

pairef.refinement.collect_stat_OVERALL_AVG(shells, project, flag_sets)

Calculates and saves average overall values from CSV files prepared by the function collect_stat_OVERALL().

This function is called by the function main() in file launcher.py.

Parameters:
  • shells (list) – Contains high resolution diffraction limits from the initial to the last which were used for that model (float).
  • project (str) – Name of the project
  • flag_sets (list) – List of free reflection flag sets (int)
Returns:

tuple containing names of created CSV files

Return type:

(tuple)

pairef.refinement.collect_stat_binned_phenix_high(pdbfilename, n_bins_low)

Picks and returns statistics values from the given REFMAC5 logfile. The logfile is supposed to contain information from 1 shell.

This function is called by the function collect_stat_BINNED().

Parameters:
  • pdbfilename (str) – Name of a REFMAC5 logfile
  • n_bins_low (int) – Number of low resolution bins
Returns:

tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)

Return type:

(tuple)

pairef.refinement.collect_stat_binned_phenix_low(pdbfilename, n_bins_low, res_low=999)

Picks and returns statistics values from the given REFMAC5 logfile. The logfile is supposed to contain information from n_bins_low shells.

This function is called by the function collect_stat_BINNED() from this file and main() from file launcher.py.

Parameters:
  • pdbfilename (str) – Name of a REFMAC5 logfile
  • n_bins_low (int) – Number of low resolution bins
Returns:

tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)

Return type:

(tuple)

pairef.refinement.collect_stat_binned_refmac_high(logfilename, mtzfilename, hklin, n_bins_low, res_low, res_high, flag=0)

Picks and returns statistics values from the given REFMAC5 logfile. The logfile is supposed to contain information from 1 shell.

This function is called by the function collect_stat_BINNED().

Parameters:
  • logfilename (str) – Name of a REFMAC5 logfile
  • n_bins_low (int) – Number of low resolution bins
Returns:

tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)

Return type:

(tuple)

pairef.refinement.collect_stat_binned_refmac_low(logfilename, mtzfilename, hklin, n_bins_low, res_low=999, flag=0)

Picks and returns statistics values from the given REFMAC5 logfile. The logfile is supposed to contain information from n_bins_low shells.

This function is called by the function collect_stat_BINNED() from this file and main() from file launcher.py.

Parameters:
  • logfilename (str) – Name of a REFMAC5 logfile
  • n_bins_low (int) – Number of low resolution bins
Returns:

tuple containing statistics values bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, and bin_CCfree (all are str)

Return type:

(tuple)

pairef.refinement.collect_stat_overall_phenix(pdbfilename)

Picks and returns overall Rwork, Rfree values from a given PDB file refined in phenix.refine.

This function is called by the functions collect_stat_OVERALL() and collect_stat_BINNED().

Parameters:pdbfilename (str) – Filename of a PDB structure model refined in phenix.refine
Returns:tuple containing statistics values Rwork, Rfree (all are str)
Return type:(tuple)
pairef.refinement.collect_stat_overall_refmac(logfilename, flag=0)

Picks and returns overall Rwork, Rfree from a given REFMAC5 logfile.

This function is called by the functions collect_stat_OVERALL() and collect_stat_BINNED().

Parameters:logfilename (str) – Filename of a REFMAC5 logfile
Returns:tuple containing statistics values Rwork and Rfree (both are str)
Return type:(tuple)
pairef.refinement.collect_stat_write(csvfilename, bin_res_low, bin_res_high, bin_Nwork, bin_Nfree, bin_Rwork, bin_Rfree, bin_CCwork, bin_CCfree, shell_number=1)

Saves the given statistics values to a CSV file.

This function is called by the function collect_stat_BINNED().

Parameters:
  • csvfilename (str) – Filename of the CSV file to be created.
  • bin_* (list) – Lists with statistics values
  • shell_number (int) – Number of the shell corresponding to the statistics values
Returns:

Filename of the created CSV file.

Return type:

str

pairef.refinement.refinement_phenix(res_cur, res_prev, res_high, args, n_bins=1, mode='refine', res_low=0, res_highest=0, flag=0, xyzin_start='', bfac_set=0, label=None)

Refine using phenix.refine.

Resolution limit is controlled by parameters res_high and res_low. Names of files are controlled by parameters res_cur and res_high.

There are 4 different modes:

  • mode="refine"

    refine structure args.project_twodecname(res_prev)A_001.pdb; results with prefix args.project_twodecname(res_cur)A_001; number_of_macro_cycles given by deffile (default: 3)

  • mode="comp"

    use structure args.project_twodecname(res_cur)A_001.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_001; number_of_macro_cycles=1, refine.strategy=None, ordered_solvent=False

  • mode="prev_pair"

    use structure args.project_twodecname(res_cur)A_001.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_prev_pair_001; number_of_macro_cycles=1, refine.strategy=None, ordered_solvent=False

  • mode="first"

    use structure xyzin_start; results stored as structure args.project_twodecname(res_cur)A_001.pdb plus an extra copy of the log, args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_001.log; number_of_macro_cycles controlled by the option --prerefinement-ncyc (1 cycle with no strategy and no ordered solvent by default, 6 cycles by default for the complete cross-validation); usually res_cur = res_init, res_high = res_init

If the input structure model was in mmCIF format, files with suffix .cif (not .pdb) are used.

Parameters:
  • res_cur (float) –
  • res_prev (float or str) –
  • res_high (float) –
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
  • n_bins (int) –
  • mode (str) – [“refine”, “comp”, “prev_pair”, “first”]
  • res_low (float) –
  • res_highest (float) –
  • flag (int) –
  • xyzin_start (str) – Filename of the model to be refined (valid only for mode="first")
  • bfac_set (float) – Value of B-factor that will be set to all atoms before refinement (not used now)
Returns:

Dictionary containing names of files that have been created by phenix.refine and a version of phenix.refine, i.e. HKLOUT, XYZOUT, LOGOUT, and version (all str)

Return type:

(dict)

pairef.refinement.refinement_phenix_get_label(outout)

Get possible choices of refinement.input.xray_data.labels from standard output from phenix.refine (saved in file outout) if multiple equally suitable arrays of observed xray data were found.

Parameters:outout (str) – Filename
Returns:tuple containing the chosen label and all the found labels - both are (str)
Return type:(tuple)
pairef.refinement.refinement_refmac(res_cur, res_prev, res_high, args, n_bins_low, mode='refine', res_low=0, res_highest=0, flag=0, xyzin_start='', bfac_set=0)

Refine using REFMAC5.

Resolution limit is controlled by parameters res_high and res_low. Names of files are controlled by parameters res_cur and res_high.

There are 4 different modes:

  • mode="refine"

    refine structure args.project_twodecname(res_prev)A.pdb; results with prefix args.project_twodecname(res_cur)A; ncyc given by comfile (default: ncyc 10)

  • mode="comp"

    use structure args.project_twodecname(res_cur)A.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A; ncyc 0

  • mode="prev_pair"

    use structure args.project_twodecname(res_cur)A.pdb; results with prefix args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A_prev_pair; ncyc 0

  • mode="first"

    use structure xyzin_start; results stored as structure args.project_twodecname(res_cur)A.pdb plus an extra copy of the log, args.project_twodecname(res_cur)A_comparison_at_twodecname(res_high)A.log; ncyc controlled by the option --prerefinement-ncyc (0 cycles by default, 20 cycles by default for the complete cross-validation); usually res_cur = res_init, res_high = res_init

If the input structure model was in mmCIF format, files with suffix .mmcif (not .pdb) are used.

Parameters:
  • res_cur (float) –
  • res_prev (float or str) –
  • res_high (float) –
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
  • n_bins_low (int) –
  • mode (str) – [“refine”, “comp”, “prev_pair”, “first”]
  • res_low (float) –
  • res_highest (float) –
  • flag (int) –
  • xyzin_start (str) – Filename of the model to be refined (valid only for mode="first")
  • bfac_set (float) – Value of B-factor that will be set to all atoms before refinement (not used now)
Returns:

Dictionary containing names of files that have been created by REFMAC5 and a version of REFMAC5, i.e. HKLOUT, XYZOUT, LOGOUT, and version (all str)

Return type:

(dict)

pairef.graphs module

pairef.graphs.matplotlib_bar(args, values='R-values', flag_sets=[], ready_shells=[])

Plots and saves a bar chart using matplotlib.

If flag_sets is an empty list, it is assumed that the values are saved in the file args.project_values.csv.

Otherwise, a chart showing the results of the complete cross-validation is plotted; the needed values are picked from the files project_RXX_values.csv, where XX is the number of a flag.

Parameters:
  • args (parser) – Input arguments (including e. g. name of the project) parsed by argparse via function process_arguments()
  • values (str) – expected value: “R-values” (not ready yet: “CC-values”)
  • flag_sets (list) – List of free reflection flag sets (int)
  • ready_shells (list) –
Returns:

Name of the PNG file containing the chart.

Return type:

str

pairef.graphs.matplotlib_line(shells, project, statistics, n_bins_low, title, flag=0, multiscale=False, filename_suffix='', refinement='refmac')

Plots the statistics selected by statistics from the files project+"_"+twodecname(shells[*])+"A.csv". Generates and saves the plot project+"_"+statistic+".png" using matplotlib.

Parameters:
  • shells (list) – containing float
  • project (str) – Name of the project
  • statistics (list) – List of names of statistics to be plotted (str)
  • n_bins_low (int) –
  • title (str) –
  • flag (int) –
  • multiscale (bool) – Use 2 different y-axis for data lines
  • filename_suffix (str) –
  • refinement (str) – “refmac” or “phenix”
Returns:

Name of the generated file with the plot (project+"_"+statistic+".png")

Return type:

str

pairef.graphs.write_log_html(shells, ready_shells, args, versions_dict, flag_sets, res_cur=0, ready_merging_statistics=False, done=False)

Creates the HTML output log.

Parameters:
  • shells (list) –
  • ready_shells (list) –
  • args
  • versions_dict (dict) – Dictionary containing keys “refmac_version” and “pairef_version”
  • flag_sets (list) –
  • res_cur (float) –
  • ready_merging_statistics (bool) –
  • done (bool) –
Returns:

Name of the created HTML file

Return type:

str

pairef.graphs.xticklabels_compress(list, n_max=13, depth=1)

If there are more than n_max bins, do not show all the labels in list.

Parameters:
  • list (list) – containing labels for a graph
  • n_max (int) – maximal allowed number of values in list that are not “”
  • depth (int) –
Returns:

Compressed list of labels for a graph

Return type:

list
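One simple way to thin out labels is to keep every k-th label and blank the rest, increasing k until no more than n_max remain. The sketch below shows that approach as an assumption; it ignores the depth argument, and the actual selection rule in PAIREF may differ. The name compress_labels is hypothetical.

```python
def compress_labels(labels, n_max=13):
    """Blank out labels so that at most n_max non-empty ones remain."""
    if n_max < 1:
        raise ValueError("n_max must be at least 1")
    step = 1
    result = list(labels)
    while sum(1 for label in result if label != "") > n_max:
        step += 1
        # Keep every step-th label, replace the others with "".
        result = [label if i % step == 0 else ""
                  for i, label in enumerate(labels)]
    return result


labels = ["%.2f" % (2.0 - 0.05 * i) for i in range(20)]
compressed = compress_labels(labels)
print(compressed[:4])  # ['2.00', '', '1.90', '']
```

The returned list keeps its original length, so it can be passed to an axis that expects one label per tick.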

pairef.commons module

pairef.commons.extract_from_file(filename, searched, skip_lines, n_lines, nth_word=False, not_found='stop', get_first=False)

Returns line(s) or word relating to the search based on searched string in the file filename.

If the file filename is not found, the program always aborts. Default behavior: the last match of the searched string is used, and if the searched string is not found in the file filename, the program is stopped with an error message.

Parameters:
  • filename (str) – Name of the file
  • searched (str) – Searched string
  • skip_lines (int) – Number of lines to be skipped
  • n_lines (int) – Number of lines that should be returned
  • nth_word (bool or int) – False if lines should be returned, or the position in the line of the word that should be picked
  • not_found (str) – If the searched string is not found, write an error message and exit (if not_found="stop", default option) or (if not_found="N/A") return "N/A" or ["N/A"].
  • get_first (bool) – Use the first case of searched string match.
Returns:

List containing n_lines lines starting at the skip_lines offset (if nth_word=False), or the picked word (the nth_word-th word) in the skip_lines-th following line (if nth_word is an integer)

Return type:

list or str
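The line-selection logic described above can be sketched on an in-memory list of lines. This simplified illustration covers only the skip_lines, n_lines and get_first behavior with a "N/A" fallback; file handling and nth_word are left out, and extract_lines is a hypothetical name.

```python
def extract_lines(lines, searched, skip_lines, n_lines, get_first=False):
    """Return n_lines lines starting skip_lines after a match of searched."""
    matches = [i for i, line in enumerate(lines) if searched in line]
    if not matches:
        return ["N/A"]
    # By default the last match is used; get_first selects the first one.
    start = (matches[0] if get_first else matches[-1]) + skip_lines
    return lines[start:start + n_lines]


log = [
    "Cycle 1",
    "R factor 0.25",
    "Cycle 2",
    "R factor 0.21",
]
print(extract_lines(log, "Cycle", 1, 1))                  # ['R factor 0.21']
print(extract_lines(log, "Cycle", 1, 1, get_first=True))  # ['R factor 0.25']
```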

pairef.commons.fourdec(var)

Returns number with 4 decimals as a string. If a float is not given, it returns a string.

Parameters:var (float) –
Returns:str

Make a new symlink to src if the dst file does not exist yet. If it is not possible to make symlinks (difficulties on Windows), just make a copy of the file, even if it already exists.

Parameters:
  • src (str) – File name of the source
  • dst (str) – File name of the destination
Returns:

True

Return type:

bool

pairef.commons.twodec(var)

Returns number with 2 decimals as a string. If a float is not given, it returns a string.

Parameters:var (float) –
Returns:str
pairef.commons.twodecname(var)

Returns number with 2 decimals, but "-" is used instead of the decimal point.

Parameters:var (float) –
Returns:str
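The two formatting helpers can be sketched as below, which also shows how the file names quoted throughout this reference (for example project_twodecname(res_cur)A.pdb) are put together. The function names carry a _sketch suffix to mark them as illustrations, not the PAIREF code.

```python
def twodec_sketch(var):
    """Format a number with two decimals; fall back to str() otherwise."""
    try:
        return "%.2f" % var
    except TypeError:
        return str(var)


def twodecname_sketch(var):
    """Like twodec_sketch, but with "-" in place of the decimal point."""
    return twodec_sketch(var).replace(".", "-")


print(twodec_sketch(1.8))                             # 1.80
print(twodecname_sketch(1.8))                         # 1-80
print("project_" + twodecname_sketch(1.8) + "A.pdb")  # project_1-80A.pdb
```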
pairef.commons.warning_my(key, message)

pairef.gui module
