- build_baseline(args)
- Build a baseline for protospacer-spacer comparison using the provided configuration
and input files.
Parameters:
args (argparse.Namespace): Command line arguments containing paths to necessary input files and output directories.
Returns:
None
The function reads in the configuration file, creates necessary output
directories, copies the protospacers file to the output directory, builds a BLAST
database, and saves metadata about the baseline build.
- build_command(args)
- Dispatch the build to the method specified in the arguments.
Parameters:
args (argparse.Namespace): Command line arguments containing information
about the method to be used and input/output paths.
Returns:
None
The function checks the specified method and calls either `build_baseline`
or `build_vdphage` accordingly.
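A minimal sketch of this dispatch, assuming the method name is exposed as `args.method` and takes the values `"baseline"` or `"vdphage"` (both are assumptions; the source does not name the attribute or its values). The two builders here are stubs that only record which one was called.

```python
import argparse

calls = []  # records which stub ran, for illustration only


def build_baseline(args):
    calls.append("baseline")  # stub standing in for the real builder


def build_vdphage(args):
    calls.append("vdphage")  # stub standing in for the real builder


# Hypothetical mapping from method name to builder function.
BUILDERS = {"baseline": build_baseline, "vdphage": build_vdphage}


def build_command(args):
    """Call the builder registered for args.method (assumed attribute name)."""
    builder = BUILDERS.get(args.method)
    if builder is None:
        raise ValueError(f"Unknown method: {args.method!r}")
    builder(args)


build_command(argparse.Namespace(method="baseline"))
print(calls)
```

A lookup table keeps the dispatch in one place, so supporting another method would only need one more entry.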
- build_vdphage(args)
- Build a VDPhage model using the provided arguments.
Parameters:
args (dict): Arguments containing the information needed to build the model.
Returns:
None
The function saves the model and writes metadata to a file.
- count_kmers(config, inputfile, k)
- Count the occurrences of k-mers in a sequence file using the command and parameters given in the configuration.
Parameters:
config (dict): A configuration dictionary containing necessary information for counting k-mers.
inputfile (str): The path to the input sequence file.
k (int): The length of the k-mer to be counted.
Returns:
None
The function constructs a command string based on the configuration and then executes that command.
- get_max(x)
- Find the row with the maximum alignment span in a DataFrame and return that span with its query and hit.
Parameters:
x (DataFrame): The input DataFrame containing 'aln_span', 'query', and 'hit' columns.
Returns:
tuple: A tuple containing three elements - the maximum alignment span value,
the corresponding query sequence, and the corresponding hit sequence.
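One way to express this selection with pandas is sketched below. It assumes the "corresponding" query and hit are the values on the row where `aln_span` is largest, and that ties resolve to the first such row; the sample data is invented for illustration.

```python
import pandas as pd


def get_max(x):
    """Return (aln_span, query, hit) from the row with the largest aln_span."""
    best = x.loc[x["aln_span"].idxmax()]  # idxmax picks the first maximum on ties
    return best["aln_span"], best["query"], best["hit"]


# Invented example rows: one spacer aligned against three protospacers.
df = pd.DataFrame(
    {
        "aln_span": [18, 25, 21],
        "query": ["spacer_1", "spacer_1", "spacer_1"],
        "hit": ["proto_a", "proto_b", "proto_c"],
    }
)
span, query, hit = get_max(df)
print(span, query, hit)
```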
- load_blast(fn)
- Load and parse BLAST XML results from a file and extract the hits for each query.
Parameters:
fn (str): The path to the BLAST XML result file.
Returns:
list of lists: A list containing sublists, where each sublist corresponds to a query
in the input BLAST XML file and contains a list of Hit objects
with their respective HSPs (high-scoring pairs).
- load_items(fn)
- Load a list of items from a file and remove any newline characters.
Parameters:
fn (str): The path to the file containing a list of items, with each item
on a separate line.
Returns:
list: A list containing the loaded items after removing any newline characters.
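A minimal sketch of such a loader, using only the standard library. It strips the trailing newline from each line and nothing else; whether the original also strips other whitespace is not stated.

```python
import os
import tempfile


def load_items(fn):
    """Return the lines of fn as a list, without trailing newlines."""
    with open(fn) as handle:
        return [line.rstrip("\n") for line in handle]


# Usage with a throwaway file (file name and contents are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "items.txt")
    with open(path, "w") as handle:
        handle.write("phage_A\nphage_B\n")
    items = load_items(path)
print(items)
```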
- load_protospacers(protospacers)
- Load protospacer information from a file and parse the data into a dictionary and list of phage names.
Parameters:
protospacers (str): The path to the file containing protospacer sequences in FASTA format.
Returns:
tuple: A tuple containing two elements - a dictionary mapping protospacer IDs to their sequences,
and a list of phage names extracted from the input file.
- load_spacers(spacers)
- Load spacer sequences from a file and parse the data into a dictionary.
Parameters:
spacers (str): The path to the file containing spacer sequences in FASTA format.
Returns:
dict: A dictionary mapping spacer names to their respective sequences.
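The FASTA-to-dictionary step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's parser (which may rely on a library such as Biopython): the spacer name is taken to be the first whitespace-delimited token of the header line, and multi-line sequences are concatenated.

```python
import os
import tempfile


def load_spacers(spacers):
    """Parse a FASTA file into a {name: sequence} dictionary."""
    records = {}
    name = None
    with open(spacers) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""  # first token of the header
                records[name] = ""
            elif name is not None:
                records[name] += line  # join wrapped sequence lines
    return records


# Usage with a throwaway FASTA file (contents are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spacers.fasta")
    with open(path, "w") as handle:
        handle.write(">spacer_1 example\nACGT\nTTGA\n>spacer_2\nGGCC\n")
    parsed = load_spacers(path)
print(parsed)
```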
- load_vectors(fn)
- Load vectorized data from a file and perform L1 normalization on each feature.
Parameters:
fn (str): The path to the file containing the vectorized data in .npy format.
Returns:
numpy.ndarray: A normalized 2D array where each row represents a vector and each
column represents a feature, after applying L1 normalization.
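A sketch of loading and L1-normalizing with NumPy. The description mentions normalizing "each feature" while the return value describes rows as vectors; this example assumes row-wise normalization (each row's absolute values sum to 1), so change `axis` if the original normalizes columns instead. Guarding all-zero rows is an addition of this sketch.

```python
import os
import tempfile

import numpy as np


def load_vectors(fn):
    """Load a 2D .npy array and scale each row to unit L1 norm."""
    vectors = np.load(fn).astype(float)
    norms = np.abs(vectors).sum(axis=1, keepdims=True)  # L1 norm per row
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return vectors / norms


# Usage with a throwaway file (values are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.npy")
    np.save(path, np.array([[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]]))
    normalized = load_vectors(path)
print(normalized)
```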
- make_vicinity(vectors, items)
- Create a Vicinity index using the given vectors and items with cosine similarity as the metric.
Parameters:
vectors (numpy.ndarray): A 2D array of vectorized data where each row is a vector.
items (list): A list of item identifiers corresponding to each vector.
Returns:
vicinity.Vicinity: A Vicinity index object containing the vectors and items with cosine similarity
as the metric for querying similarities between items.
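To illustrate what a cosine-similarity lookup over such vectors and items computes, here is a brute-force NumPy sketch. It deliberately does not use the Vicinity API; `cosine_top_k` is a hypothetical helper, and it assumes no vector has zero length.

```python
import numpy as np


def cosine_top_k(vectors, items, query, k=3):
    """Return the k (item, similarity) pairs most similar to query."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = unit @ q  # cosine similarity of every row with the query
    order = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [(items[i], float(sims[i])) for i in order]


# Invented example: three 2D vectors with matching item labels.
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
items = ["a", "b", "c"]
top = cosine_top_k(vectors, items, np.array([1.0, 0.1]), k=2)
print(top)
```

A dedicated index library is generally the better choice for large collections, since this sketch compares the query against every vector.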
- parse_baseline(flat_result, spacers, protospacers)
- Parse the results of a BLASTN search against a baseline database to generate a summary.
Parameters:
flat_result (list): A flattened list of result objects from the BLASTN search.
spacers (dict): A dictionary mapping spacer IDs to their sequences.
protospacers (dict): A dictionary mapping protospacer IDs to their sequences and phage names.
Returns:
summary_df (DataFrame): A DataFrame containing the summary of the BLASTN
results, sorted by alignment span.
- print_and_run(cmd)
- Print the given command and execute it using the operating system's shell.
Parameters:
cmd (str): The command to be executed.
Returns:
None
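A minimal sketch of a print-then-execute helper using the standard library. Using `subprocess.run` with `check=True` (which raises on a non-zero exit status) is a choice made for this example; the original may call the shell differently and may ignore failures.

```python
import subprocess


def print_and_run(cmd):
    """Echo cmd, then run it through the system shell."""
    print(cmd)
    # shell=True hands the whole string to the shell, so only pass trusted input.
    subprocess.run(cmd, shell=True, check=True)


print_and_run("exit 0")
```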
- query_baseline(args, meta)
- Query a previously built baseline database with new spacer sequences and
generate a summary of the results.
Parameters:
args (argparse.Namespace): Command line arguments containing paths to
necessary input files and output directory.
meta (dict): Metadata about the baseline build, including the path to the
protospacers file and the method used.
Returns:
None
The function reads the configuration from a JSON file, creates a temporary
directory if it doesn't exist, performs a BLASTN search with the new spacers
against the baseline database, parses the results, and saves a summary as a TSV file.
- query_command(args)
- Query a model using the method specified in the metadata.
Parameters:
args (dict): Arguments containing the information needed to query the model.
Returns:
None
The function calls the query function that matches the method type.
- query_vdphage(args, meta)
- Query the VDPhage model with a set of spacers and save the top results.
Parameters:
args (dict): Arguments containing the information needed to query the model.
meta (dict): Metadata dictionary, including the protospacers and the k value used in the model build.
Returns:
None
The function saves the query results to a CSV file.