Python Usage
Overview
For a basic example of importing AIRRSHIP as a package, see here.
Detailed Function and Class Documentation
create_repertoire.load_data
Loads and processes required data files from data folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_folder | path | Path to data folder with required data. If not specified then uses inbuilt package data. Defaults to None. | None |
| mutate | bool | Whether to read in data for mutated sequences or not. Defaults to False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| data_dict | dict | Dictionary containing all required data for generating sequences. Includes family_use_dict, gene_use_dict, trim_dicts, NP_transitions, NP_first_bases, NP_lengths, mut_rate_per_seq and kmer_dicts. |
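
The listed contents of data_dict can be checked before sequences are generated. The sketch below is a hypothetical helper, not part of AIRRSHIP. It assumes that the names in the description above are literal top-level keys, and that the mutation data (mut_rate_per_seq and kmer_dicts) is only needed when mutate is True.

```python
# Names taken from the description of the returned dictionary.
# Treating them as literal top-level keys is an assumption.
CORE_KEYS = (
    "family_use_dict",
    "gene_use_dict",
    "trim_dicts",
    "NP_transitions",
    "NP_first_bases",
    "NP_lengths",
)
# Assumed to be required only when mutated sequences are requested.
MUTATION_KEYS = ("mut_rate_per_seq", "kmer_dicts")


def missing_keys(data_dict, mutate=False):
    """Return the expected keys that are absent from data_dict, sorted by name."""
    expected = set(CORE_KEYS)
    if mutate:
        expected |= set(MUTATION_KEYS)
    return sorted(expected - set(data_dict))
```

An empty list from missing_keys(data_dict) would indicate that every expected entry is present.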
create_repertoire.get_genotype
Wrapper that generates a locus for use in sequence generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_folder | path | Path to data folder with required data. When not specified will use package data. Defaults to None. | None |
| het_list | list | Proportion of genes [V, D, J] to be heterozygous. Defaults to [1, 1, 1]. | [1, 1, 1] |
| haplotype | bool | True when only two alleles per gene are to be used. Defaults to True. | True |
| locus | path | Path to file with predefined locus. Defaults to None. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| locus | list | List of two dictionaries. Each is a dictionary containing the gene segment as keys and the chosen alleles as values. Format is {Segment : [Allele, Allele ...], ...} |
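
To make the locus format concrete, here is a toy sketch of the structure and of how one allele per segment might be drawn from it. It makes several assumptions: the segment keys are read as "V", "D" and "J", plain allele-name strings stand in for the Allele objects a real locus would hold, and all alleles for one sequence are drawn from a single chromosome. The pick_alleles helper is hypothetical and not part of the package.

```python
import random

# Hypothetical locus: one dictionary per chromosome, {segment: [alleles]}.
# Strings are placeholders for Allele objects.
toy_locus = [
    {"V": ["IGHV1-2*02", "IGHV3-23*01"], "D": ["IGHD3-10*01"], "J": ["IGHJ4*02"]},
    {"V": ["IGHV1-2*04", "IGHV3-23*04"], "D": ["IGHD3-10*02"], "J": ["IGHJ6*02"]},
]


def pick_alleles(locus, rng):
    """Pick one chromosome, then one allele per segment from that chromosome."""
    chromosome = rng.choice(locus)
    return {segment: rng.choice(alleles) for segment, alleles in chromosome.items()}


rng = random.Random(0)  # seeded so repeated runs give the same picks
chosen = pick_alleles(toy_locus, rng)
```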
create_repertoire.generate_sequence
Wrapper to bring together entire sequence generation process.
Recombines, trims and mutates. Optionally produces functional sequences (sequences with an in-frame V and J gene, no stop codons and the expected junction anchor residues) or non-functional sequences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| locus | list | List of two dictionaries. Each is a dictionary containing the gene segment as keys and the chosen alleles as values. Format is {Segment : [Allele, Allele ...], ...} | required |
| data_dict | dict | Output of load_data(). Includes family_use_dict, gene_use_dict, trim_dicts, NP_transitions, NP_first_bases, NP_lengths, mut_rate_per_seq and kmer_dicts. | required |
| mutate | bool | True if SHM to be introduced. Defaults to False. | False |
| flat_usage | optional | gene, family or False. Gene or family specify that sequences should use all genes or gene families evenly. If False, usage follows experimental distributions. Defaults to False. | False |
| no_trim_list | tuple | List of 5 Booleans, specifying whether to not trim [all_ends, v_3_end, d_5_end, d_3_end, j_5_end]. Defaults to (False, False, False, False, False). | (False, False, False, False, False) |
| no_np_list | tuple | List of 3 Booleans, specifying whether to not add [both_np, np1, np2]. Defaults to (False, False, False). | (False, False, False) |
| shm_flat | bool | True if SHM is to be even across all sequences. Defaults to False. | False |
| shm_random | bool | True if per base mutation is to be random. Defaults to False. | False |
| mutation_rate | float | Mutation rate to be used rather than choosing from distribution. Defaults to None. | None |
| mutation_number | int | Number of mutations to be added rather than choosing from distribution. Defaults to None. | None |
| mut_multiplier | float | Multiplier to be used on mutation rates pulled from distribution. | 1 |
| non_functional | bool | Return non-functional sequences. Defaults to False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| sequence | Sequence | Final recombined sequence, with trimming, NP region addition and SHM if requested. |
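
The three functions documented so far can be chained as in the sketch below. It uses only the parameters listed in the tables above. The import path `from airrship import create_repertoire` is an assumption, and the simulate wrapper is a hypothetical helper rather than part of the package. The import sits inside the function so the snippet can be loaded without AIRRSHIP installed.

```python
def simulate(n_sequences=5, mutate=True, **options):
    """Generate a few sequences from the inbuilt package data.

    Extra keyword arguments (e.g. no_np_list, non_functional) are passed
    straight through to generate_sequence.
    """
    # Assumed import path; adjust if the package is laid out differently.
    from airrship import create_repertoire

    # Read in mutation data only if SHM is going to be introduced.
    data_dict = create_repertoire.load_data(mutate=mutate)
    # All arguments left at their documented defaults.
    locus = create_repertoire.get_genotype()

    return [
        create_repertoire.generate_sequence(locus, data_dict, mutate=mutate, **options)
        for _ in range(n_sequences)
    ]
```

For instance, simulate(10, no_np_list=(False, True, False)) would request ten sequences without an NP1 region, following the [both_np, np1, np2] ordering described above.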
create_repertoire.Sequence
Represents a recombined Ig sequence consisting of V, D and J segments.
Attributes:
| Name | Type | Description |
|---|---|---|
| v_allele | Allele | IMGT V gene allele. |
| d_allele | Allele | IMGT D gene allele. |
| j_allele | Allele | IMGT J gene allele. |
| alleles | list | List of IMGT alleles. |
| NP1_region | str | NP1 region - between V and D gene. |
| NP1_length | int | Length of NP1 region. |
| NP2_region | str | NP2 region - between D and J gene. |
| NP2_length | int | Length of NP2 region. |
| ungapped_seq | str | Ungapped nucleotide sequence. |
| gapped_seq | str | Gapped nucleotide sequence. |
| mutated_seq | str | Ungapped mutated nucleotide sequence. |
| gapped_mutated_seq | str | Gapped mutated nucleotide sequence. |
| junction | str | Nucleotide sequence of junction region. |
| v_seq | str | Nucleotide sequence of V region. |
| d_seq | str | Nucleotide sequence of D region. |
| j_seq | str | Nucleotide sequence of J region. |
| v_seq_start | int | Start position of V region. |
| d_seq_start | int | Start position of D region. |
| j_seq_start | int | Start position of J region. |
| v_seq_end | int | End position of V region. |
| d_seq_end | int | End position of D region. |
| j_seq_end | int | End position of J region. |
| mutations | str | Mutation events. |
| mut_count | int | Mutation count. |
| mut_freq | int | Mutation frequency. |
| functional | bool | Sequence is functional. |
| stop | bool | Presence/absence of stop codon. |
| anchors | bool | Presence/absence of correct junction anchors. |
| inframe | bool | VJ is in-frame. |
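
The functional, stop, anchors and inframe attributes follow the definition of a functional sequence given for generate_sequence: in-frame, no stop codons, expected anchor residues. The sketch below illustrates that relationship on a junction string only. It is not the library's implementation, which presumably checks the full sequence. It also assumes heavy-chain anchors: a Cys codon (TGT/TGC) at the start of the junction and a Trp codon (TGG) at the end. The check_junction helper is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
CYS_CODONS = {"TGT", "TGC"}   # assumed 5' anchor (conserved Cys)
TRP_CODONS = {"TGG"}          # assumed 3' anchor (conserved Trp, heavy chain)


def check_junction(junction):
    """Return inframe/stop/anchors/functional flags for a junction string."""
    junction = junction.upper()
    inframe = len(junction) % 3 == 0
    # Complete codons only; a trailing partial codon is ignored.
    codons = [junction[i:i + 3] for i in range(0, len(junction) - 2, 3)]
    stop = any(codon in STOP_CODONS for codon in codons)
    anchors = (
        inframe
        and len(codons) >= 2
        and codons[0] in CYS_CODONS
        and codons[-1] in TRP_CODONS
    )
    return {
        "inframe": inframe,
        "stop": stop,
        "anchors": anchors,
        "functional": inframe and not stop and anchors,
    }
```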
__init__(v_allele, d_allele, j_allele)
Initialises a Sequence class instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v_allele | Allele | IMGT V gene allele, required. | required |
| d_allele | Allele | IMGT D gene allele, required. | required |
| j_allele | Allele | IMGT J gene allele, required. | required |
get_junction_length()
Calculates the junction length of the sequence (CDR3 region plus both anchor residues).
Returns:
| Name | Type | Description |
|---|---|---|
| junction_length | int | Number of nucleotides in junction (CDR3 + anchors) |
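
Because the junction is the CDR3 plus both anchor residues, and each residue is encoded by one codon, the junction is six nucleotides longer than the CDR3. The snippet below spells out that arithmetic. The helper name is hypothetical and is not how the method itself is implemented.

```python
ANCHOR_NT = 3  # one codon (three nucleotides) per anchor residue


def junction_length_from_cdr3(cdr3_nt):
    """Junction length in nucleotides: CDR3 plus the 5' and 3' anchor codons."""
    return len(cdr3_nt) + 2 * ANCHOR_NT
```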
get_nuc_seq(no_trim_list, trim_dicts, no_np_list, NP_lengths, NP_transitions, NP_first_bases, gapped=False)
Creates the recombined nucleotide sequence with trimming and np addition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| no_trim_list | list | List of 5 Booleans, specifying whether to not trim [all_ends, v_3_end, d_5_end, d_3_end, j_5_end]. | required |
| trim_dicts | dict | A dictionary of dictionaries of trimming length proportions by gene family for each segment (V, D or J). | required |
| no_np_list | list | List of 3 Booleans, specifying whether to not add [both_np, np1, np2]. | required |
| NP_lengths | dict | Dictionary of possible NP region lengths and the proportion of sequences to use them. In the format {NP region length: proportion}. | required |
| NP_transitions | dict | Nested dictionary containing transition matrix of probabilities of moving from one nucleotide (A, C, G, T) to any other for each position in the NP region. | required |
| NP_first_bases | dict | Nested dictionary of the proportion of NP sequences starting with each base for NP1 and NP2. | required |
| gapped | bool | Specify whether to return sequence with IMGT gaps or not. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| nuc_seq | str | The recombined nucleotide sequence. |
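
The NP_lengths, NP_first_bases and NP_transitions parameters describe a position-dependent Markov chain over nucleotides. The sketch below shows one way an NP region could be sampled from such data. The exact nesting is an assumption ({position: {previous_base: {next_base: probability}}} for the transitions, and a flat {base: proportion} for the first bases of a single NP region). All values are toy numbers and make_np_region is a hypothetical helper, not the package's code.

```python
import random

BASES = "ACGT"


def weighted_choice(proportions, rng):
    """Draw one key from a {option: proportion} dictionary."""
    options = list(proportions)
    weights = [proportions[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


def make_np_region(lengths, first_bases, transitions, rng):
    """Sample a length, then a first base, then walk the transition matrices."""
    length = weighted_choice(lengths, rng)
    if length == 0:
        return ""
    region = [weighted_choice(first_bases, rng)]
    for position in range(1, length):
        previous = region[-1]
        region.append(weighted_choice(transitions[position][previous], rng))
    return "".join(region)


# Toy inputs (assumed shapes, made-up proportions).
toy_lengths = {0: 0.2, 4: 0.5, 7: 0.3}
toy_first_bases = {"A": 0.2, "C": 0.3, "G": 0.4, "T": 0.1}
uniform = {base: 0.25 for base in BASES}
toy_transitions = {
    position: {previous: dict(uniform) for previous in BASES}
    for position in range(1, 7)
}

rng = random.Random(1)  # seeded for reproducibility
np_region = make_np_region(toy_lengths, toy_first_bases, toy_transitions, rng)
```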
create_repertoire.Allele
Class that represents a V, D or J allele.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The IMGT name of the allele |
| gapped_seq | str | The IMGT gapped germline nucleotide sequence |
| length | str | IMGT defined length of the allele |
| ungapped_seq | str | Ungapped germline nucleotide sequence |
| trim_5 | int | Number of nucleotides to be trimmed from 5' end |
| trim_3 | int | Number of nucleotides to be trimmed from 3' end |
__init__(name, gapped_seq, length)
Initialises an Allele class instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The IMGT name of the allele | required |
| gapped_seq | str | The IMGT gapped nucleotide sequence | required |
| length | str | IMGT defined length of the allele | required |
get_trim_length(no_trim_list, trim_dicts)
Chooses trimming lengths for allele.
Adds two attributes to the allele: trim_3, the 3' trimming value, and trim_5, the 5' trimming value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| no_trim_list | list | List of 5 Booleans, specifying whether to not trim [all_ends, v_3_end, d_5_end, d_3_end, j_5_end]. | required |
| trim_dicts | dict | A dictionary of dictionaries of trimming length proportions by gene family for each segment (V, D or J). | required |