Command-line Usage
airrship [-h] [-v] -o OUT_NAME [--outdir OUTDIR] [--datadir DATADIR]
[-n NUMBER_SEQS] [--het PROP PROP PROP] [--shm] [--shm_multiplier SHM_MULTIPLIER]
[--shm_flat] [--mut_rate MUT_RATE | --mut_num MUT_NUM] [--shm_random]
[--all_alleles] [--locus LOCUS FILE]
[--flat_vdj {gene,family,False}] [--no_trim]
[--no_trim_v3] [--no_trim_d3] [--no_trim_d5] [--no_trim_j5]
[--no_np] [--no_np1] [--no_np2] [--non_productive]
[--prop_non_productive PROP] [--seed SEED] [--species SPECIES]
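
As a minimal sketch of the synopsis above, the following call sets only the required `-o` prefix plus a sequence count and a seed. The prefix `example_repertoire` and the values 500 and 42 are hypothetical, chosen for illustration. The guard is an addition for this example, not part of airrship: it runs the command when airrship is on the `PATH` and otherwise prints it.

```shell
# Hypothetical values for illustration; only -o is required.
OUT_NAME="example_repertoire"
N_SEQS=500
SEED=42

# Built as a plain string, so the values must not contain spaces.
CMD="airrship -o $OUT_NAME -n $N_SEQS --seed $SEED"

# Execute only when airrship is installed; otherwise print the command.
if command -v airrship >/dev/null 2>&1; then
    $CMD
else
    echo "$CMD"
fi
```

Because `--outdir` is omitted, output goes to the current working directory.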
Parameters
| Option | Details |
|---|---|
| -o, --outname <out_prefix> | Name for repertoire files. Only required parameter. |
| --outdir <out_dir> | Output directory. The current working directory is used if not specified. |
| --datadir <data_dir> | Alternative input data directory. Defaults to airrship/data. Data must be formatted as in the airrship data directory. |
| -n, --number_seqs <n_seqs> | Number of sequences to simulate. Defaults to 1000. |
| --het <prop prop prop> | Proportion of genes to be heterozygous, specified as V D J. Values must be between 0 and 1. Not compatible with --all_alleles. Not all genes have more than one allele, so the proportion achieved may be lower than requested. Defaults to 1 1 1 (the maximum possible proportion of heterozygous genes using the included IMGT alleles). |
| --shm | Hypermutate sequences according to experimental parameters. Each base will be mutated according to its 5mer context and the mutation frequency for each sequence will match observed distributions. If not specified, sequences will not be mutated. Mutation rates can be controlled by replacing the mut_freq_per_seq_per_family.csv reference file or specifying a --shm_multiplier. |
| --shm_multiplier | Multiplication factor to use on per sequence mutation rate distribution. Defaults to 1, i.e. replicates frequencies from the mut_freq_per_seq_per_family.csv reference file. |
| --shm_flat | Mutate each sequence to the same degree (i.e. return a flat per sequence mutation distribution). Specify the degree of mutation using --mut_num or --mut_rate. If neither is given, a mutation rate of 0.05 is used. |
| --shm_random | Do not mutate individual bases according to k-mer context. Each base will have an equal chance of being mutated. |
| --mut_rate <mut_rate> | Mutation frequency for flat SHM only. Value between 0 and 0.6. Not compatible with --mut_num. Defaults to 0.05. |
| --mut_num <number_muts> | Number of mutations for flat SHM only. Not compatible with --mut_rate. |
| --all_alleles | Use all available alleles from all available genes, i.e., do not generate a synthetic 'haplotype'. Not compatible with --het. |
| --locus <locus_file> | Do not generate a new locus, instead specify path to an existing csv file to use as locus for repertoire generation. |
| --flat_vdj {gene, family} | Do not use experimental data to bias VDJ usage, instead use all genes or families evenly. |
| --no_trim | Don't trim either end of any V, D, or J gene during recombination. |
| --no_trim_v3 | Don't trim 3' end of V genes during recombination. |
| --no_trim_d5 | Don't trim 5' end of D genes during recombination. |
| --no_trim_d3 | Don't trim 3' end of D genes during recombination. |
| --no_trim_j5 | Don't trim 5' end of J genes during recombination. |
| --no_np | Don't insert nucleotides at either gene junction, i.e., do not create NP regions. |
| --no_np1 | Don't insert nucleotides at the VD junction, i.e., do not create NP1 regions. |
| --no_np2 | Don't insert nucleotides at the DJ junction, i.e., do not create NP2 regions. |
| --non_productive | Include non-productive sequences in the output. This includes sequences with out-of-frame V and J segments, stop codons, and/or missing junction anchor residues (C-104 and W/F-118). The majority of sequences produced will be non-productive (~75% using defaults without SHM, ~85% with SHM). Specify --prop_non_productive to control this proportion. |
| --prop_non_productive <prop> | Proportion of sequences to be non-productive. Value between 0 and 1. Use with --non_productive. |
| --seed <seed> | Set random seed. |
| --species <species> | Specify if simulating non-human sequences. Will be used to find the imgt_{species}_IGH[V/D/J].fasta files in the specified --datadir. Default is human. |
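
The options in the table can be combined. The sketch below shows a few combinations using only flags listed above. The output prefixes and numeric values are hypothetical, and `run` is a helper added for this example (not part of airrship) that prints each command when airrship is not installed. The table does not say whether `--shm_flat` also needs `--shm`, so the second call assumes `--shm_flat` works on its own.

```shell
# Hypothetical dry-run helper: runs the command if airrship is on PATH,
# otherwise prints the command that would have been run.
LAST_CMD=""
run() {
    LAST_CMD="$*"
    if command -v airrship >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Context-dependent SHM with the per-sequence mutation rate scaled by 2.
run airrship -o shm_doubled --shm --shm_multiplier 2 --seed 1

# Flat SHM: every sequence is mutated to the same degree.
# --mut_rate and --mut_num are incompatible, so only one is given.
run airrship -o shm_flat --shm_flat --mut_rate 0.1 --seed 1

# No trimming of gene ends and no NP insertion during recombination.
run airrship -o untrimmed --no_trim --no_np --seed 1

# Include non-productive sequences, requested as 20% of the output.
run airrship -o mixed --non_productive --prop_non_productive 0.2 --seed 1
```

Using the same `--seed` for each call is a choice made here to keep runs repeatable; any integer could be used.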