Functions¶
General Information¶
Optional Arguments for all Functions¶
--out ResultFile
: File that will contain the function’s results.vcf(.gz)
--log LogFile
: File that will contain the function’s log.log
--gz
: Force all outputs to be bgzipped
Produced ASCII files are automatically bgzipped if the filename given by the user ends with .gz
If the user provides the --gz argument, all output files will be bgzipped and .gz will be appended automatically to the filenames (if missing). Output streamed to STD_OUT will also be bgzipped.
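The naming rule above can be sketched in a few lines. This is only an illustration of the documented behaviour, not the tool's code; the function name `resolve_output_name` is hypothetical.

```python
def resolve_output_name(filename: str, force_gz: bool = False) -> tuple[str, bool]:
    """Return (final filename, whether the output is bgzipped).

    Hypothetical helper mirroring the documented rule:
    - a name ending in .gz is always bgzipped;
    - with --gz, .gz is appended if missing and the output is bgzipped.
    """
    if filename.endswith(".gz"):
        return filename, True
    if force_gz:
        return filename + ".gz", True
    return filename, False


print(resolve_output_name("out.vcf"))
print(resolve_output_name("out.vcf", force_gz=True))
```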
VCF Filter Functions¶
CompoundHeterozygous¶
Keeps only variants that respect the Compound Heterozygous pattern of inheritance.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--nohomo TRUE|FALSE
: Reject if a case is homozygous to alternate allele or if a control has none of the allele ?
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--missing TRUE|FALSE
: Missing genotypes allowed ?
Description
All cases have V1 and V2
No control has V1 and V2
one case doesn’t have V1 and V2
one control has V1 and V2
With --missing true, missing genotypes are considered compatible with the transmission pattern.
The --nohomo option rejects alternate alleles if a case is homozygous to an alternate allele or if at least one control is not heterozygous to an alternate allele of V1/V2. (If all the controls are supposed to be parents of cases)
COMPOUND=A1>P1(gA|gB|gC)&P2(gD|gE|gF),A2>P3(gG|gH|gI)&P4(gJ|gK|gL),...
Ax is the number of the allele involved
Px is the partner allele in the form chr:pos:ref:alt
gX is the symbol of the gene common to this allele and its partner
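As an illustration of the COMPOUND layout described above, the value can be parsed with a small helper. This is a sketch under assumptions: `parse_compound` is a hypothetical name, the value is passed without the `COMPOUND=` prefix, and allele numbers are taken as plain integers written as text.

```python
import re

# One partner looks like chr:pos:ref:alt(gene1|gene2|...)
PARTNER = re.compile(r"^(?P<variant>[^()]+)\((?P<genes>[^()]*)\)$")


def parse_compound(value: str) -> dict:
    """Map each allele number to a list of (partner variant, [gene symbols])."""
    result = {}
    for entry in value.split(","):          # one entry per allele
        allele, _, partners = entry.partition(">")
        parsed = []
        for partner in partners.split("&"):  # partners are joined by '&'
            match = PARTNER.match(partner)
            if match is None:
                raise ValueError(f"Unexpected partner format: {partner!r}")
            parsed.append((match.group("variant"), match.group("genes").split("|")))
        result[allele] = parsed
    return result


example = "1>chr1:1000:A:G(GENE1|GENE2)&chr1:2000:C:T(GENE1),2>chr1:3000:G:A(GENE3)"
print(parse_compound(example))
```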
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
DeNovo¶
Keeps only variants that are compatible with a De Novo pattern of inheritance.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--missing TRUE|FALSE
: Missing genotypes allowed ?
Description
present in every Case
absent in every Control
Warning
Father/Mother/Child(ren) Trios are expected
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
DeNovoRecessive¶
Keeps only variants that strictly respect these genotypes: parent1 0/1 + parent2 0/0 + child 1/1
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
Warning
Will only run if the input file has a trio with 1 case and 2 controls
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
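The DeNovoRecessive genotype pattern (one parent 0/1, the other 0/0, child 1/1) can be checked as follows. This is a minimal sketch for a single alternate allele; the helper names are hypothetical, and treating phased genotypes such as `1|0` like `0/1` is an assumption.

```python
def normalize(gt: str) -> str:
    """Make a diploid genotype order- and phase-insensitive, e.g. '1|0' -> '0/1'."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def is_denovo_recessive(parent_a: str, parent_b: str, child: str) -> bool:
    """True when one parent is 0/1, the other 0/0, and the child is 1/1."""
    parents = sorted([normalize(parent_a), normalize(parent_b)])
    return normalize(child) == "1/1" and parents == ["0/0", "0/1"]


print(is_denovo_recessive("0/1", "0/0", "1/1"))
```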
Dominant¶
Keeps only variants that respect the Dominant pattern of inheritance
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--missing TRUE|FALSE
: Missing genotypes allowed ?
--nohomo TRUE|FALSE
: Reject if a case is homozygous to alternate allele ?
--mode Mode
: strict : true for all cases | loose : true for at least one case
Description
Cases should have the causal variant
Controls cannot have the causal variant
one case doesn’t possess the alternate allele (strict mode)
one control possesses the alternate allele
With --missing true, missing genotypes are considered compatible with the transmission pattern.
The --nohomo option rejects alternate alleles if at least one case is homozygous. (If you expect the resulting phenotype would not be consistent for example.)
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
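The Dominant rules can be sketched for a single alternate allele (allele `1`). This is an illustration, not the tool's implementation: the function names are hypothetical, and two behaviours are assumptions, namely that `--missing false` rejects any variant with a missing genotype and that a missing case genotype counts as compatible.

```python
def split_alleles(gt: str) -> list[str]:
    """Split '0/1' or '0|1' into its allele codes."""
    return gt.replace("|", "/").split("/")


def dominant_keep(cases, controls, mode="strict", missing=True, nohomo=False):
    """Return True if the variant is compatible with a dominant pattern."""
    genotypes = [split_alleles(gt) for gt in list(cases) + list(controls)]
    # Assumption: with missing=False any missing genotype rejects the variant.
    if not missing and any("." in alleles for alleles in genotypes):
        return False
    compatible_cases = 0
    for gt in cases:
        alleles = split_alleles(gt)
        if "." in alleles:
            compatible_cases += 1  # assumption: missing counts as compatible
            continue
        if nohomo and all(a == "1" for a in alleles):
            return False  # homozygous case rejected when --nohomo is set
        if "1" in alleles:
            compatible_cases += 1
    for gt in controls:
        alleles = split_alleles(gt)
        if "." not in alleles and "1" in alleles:
            return False  # a control carries the alternate allele
    if mode == "strict":
        return compatible_cases == len(cases)
    return compatible_cases >= 1  # loose: at least one case


print(dominant_keep(["0/1", "0/0"], ["0/0"], mode="loose"))
```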
FilterConsequenceLevel¶
Filters the variants according to their consequences
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--csq vep.consequence
: Least severe consequence [empty | intergenic_variant | feature_truncation | regulatory_region_variant | feature_elongation | regulatory_region_amplification | regulatory_region_ablation | TF_binding_site_variant | TFBS_amplification | TFBS_ablation | downstream_gene_variant | upstream_gene_variant | non_coding_transcript_variant | NMD_transcript_variant | intron_variant | non_coding_transcript_exon_variant | 3_prime_UTR_variant | 5_prime_UTR_variant | mature_miRNA_variant | coding_sequence_variant | synonymous_variant | stop_retained_variant | start_retained_variant | incomplete_terminal_codon_variant | splice_region_variant | protein_altering_variant | missense_variant | inframe_deletion | inframe_insertion | transcript_amplification | start_lost | stop_lost | frameshift_variant | stop_gained | splice_donor_variant | splice_acceptor_variant | transcript_ablation]
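A consequence-level filter of this kind can be sketched as a rank comparison. The sketch assumes the bracketed list above is ordered from least to most severe, and that a variant is kept when at least one of its consequences is at least as severe as the one given to `--csq`; `passes_level` is a hypothetical name.

```python
# Consequences copied from the --csq list above, assumed ordered by severity.
SEVERITY = """
empty intergenic_variant feature_truncation regulatory_region_variant
feature_elongation regulatory_region_amplification regulatory_region_ablation
TF_binding_site_variant TFBS_amplification TFBS_ablation downstream_gene_variant
upstream_gene_variant non_coding_transcript_variant NMD_transcript_variant
intron_variant non_coding_transcript_exon_variant 3_prime_UTR_variant
5_prime_UTR_variant mature_miRNA_variant coding_sequence_variant
synonymous_variant stop_retained_variant start_retained_variant
incomplete_terminal_codon_variant splice_region_variant protein_altering_variant
missense_variant inframe_deletion inframe_insertion transcript_amplification
start_lost stop_lost frameshift_variant stop_gained splice_donor_variant
splice_acceptor_variant transcript_ablation
""".split()

RANK = {name: index for index, name in enumerate(SEVERITY)}


def passes_level(consequences, least_severe: str) -> bool:
    """True if any consequence is at least as severe as `least_severe`."""
    threshold = RANK[least_severe]
    # Unknown terms get rank -1 and therefore never pass.
    return any(RANK.get(csq, -1) >= threshold for csq in consequences)


print(passes_level(["intron_variant", "stop_gained"], "missense_variant"))
```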
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FilterF2¶
Filters variants to keep only those contributing to F2 data.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--prefix prefix
: Output filename prefix
--outdir ResultsDirectory
: The directory that will contain results files
Description
All variants
All SNVs
variants without rs
SNVs without rs
variants with rs
SNVs with rs
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FilterFrequencies¶
Keeps only variants with frequencies below the threshold in all of the selected populations.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--threshold 0.0-1.0
: maximum frequency in any population
--pop pop1,pop2,...,popN
: List of Populations to test (from AF, AFR_AF, AMR_AF, EAS_AF, EUR_AF, SAS_AF, AA_AF, EA_AF, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_ASJ_AF, gnomAD_EAS_AF, gnomAD_FIN_AF, gnomAD_NFE_AF, gnomAD_OTH_AF, gnomAD_SAS_AF, MAX_AF)
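The population test can be pictured as follows. This is a sketch only: `passes_frequency_filter` is a hypothetical name, "below the threshold" is read as a strict comparison, and a population with no annotated frequency is assumed to pass.

```python
def passes_frequency_filter(freqs: dict, populations: list, threshold: float) -> bool:
    """Keep the allele only if its frequency is below `threshold` in every
    selected population (populations without a value are assumed to pass)."""
    for pop in populations:
        value = freqs.get(pop)
        if value is not None and value >= threshold:
            return False
    return True


allele_freqs = {"gnomAD_AF": 0.002, "EUR_AF": 0.03}
print(passes_frequency_filter(allele_freqs, ["gnomAD_AF", "EUR_AF"], 0.01))
```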
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FilterGeneCsqLevel¶
Filters the variants according to their consequences on a list of genes.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--genes genes.txt
: File listing genes to keep
--csq vep.consequence
: Least severe consequence [empty | intergenic_variant | feature_truncation | regulatory_region_variant | feature_elongation | regulatory_region_amplification | regulatory_region_ablation | TF_binding_site_variant | TFBS_amplification | TFBS_ablation | downstream_gene_variant | upstream_gene_variant | non_coding_transcript_variant | NMD_transcript_variant | intron_variant | non_coding_transcript_exon_variant | 3_prime_UTR_variant | 5_prime_UTR_variant | mature_miRNA_variant | coding_sequence_variant | synonymous_variant | stop_retained_variant | start_retained_variant | incomplete_terminal_codon_variant | splice_region_variant | protein_altering_variant | missense_variant | inframe_deletion | inframe_insertion | transcript_amplification | start_lost | stop_lost | frameshift_variant | stop_gained | splice_donor_variant | splice_acceptor_variant | transcript_ablation]
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FilterGeneCsqList¶
Filters the variants to keep only those that affect one of the given genes with one of the given consequences.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--genes genes.txt
: List of the genes to keep
--csq csq1,csq2,...,csqN
: List (comma separated) of VEP consequences to keep
Description
--genes
on one of the genes in the file from --csq
, then the variant is kept.
--csq null
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FilterGenotype¶
Filters the variants to match the given genotype filter.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--filter "SAMPLE1:geno1:keep1,SAMPLE2:geno2:keep2,...,SAMPLEN:genoN:keepN"
: List (comma separated) of samples, their associated genotypes and whether they are to be kept
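The --filter string can be read as a list of per-sample rules, where each keep flag says whether a matching genotype is required (true) or excluded (false). The sketch below illustrates that reading; `parse_filter` and `variant_passes` are hypothetical names, and genotypes are compared as exact strings, which is an assumption.

```python
def parse_filter(spec: str) -> list:
    """Turn 'SA:0/0:false,SB:0/1:true' into [(sample, genotype, keep), ...]."""
    rules = []
    for item in spec.split(","):
        sample, genotype, keep = item.split(":")
        rules.append((sample, genotype, keep.lower() == "true"))
    return rules


def variant_passes(genotypes: dict, rules: list) -> bool:
    """A variant passes when every sample matches (keep=true) or
    does not match (keep=false) its listed genotype."""
    for sample, genotype, keep in rules:
        matches = genotypes[sample] == genotype
        if matches != keep:
            return False
    return True


rules = parse_filter("SA:0/0:false,SB:0/1:true,SC:0/1:true,SD:1/1:false")
print(variant_passes({"SA": "0/1", "SB": "0/1", "SC": "0/1", "SD": "0/1"}, rules))
```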
Description
SAMPLE1:geno1:keep1,SAMPLE2:geno2:keep2,...,SAMPLEN:genoN:keepN
Keep=true|false tells if we want to keep (true) or exclude (false) matching genotypes for this sample
SA:0/0:false,SB:0/1:true,SC:0/1:true,SD:1/1:false will keep variants that are 0/1 for SB and SC, and that aren’t 0/0 for SA or 1/1 for SD
FilterGnomADFrequency¶
Filters out variants with frequencies above threshold in GnomAD
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--threshold 0.0-1.0
: Maximum GnomAD Frequency
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FilterKnownID¶
Keeps only variants with an empty 3rd (ID) field
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
FilterNew¶
Keeps only the variants not found in either dbSNP, 1KG or GnomAD
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
FoundInAllCases¶
Keeps variants found in every “Case” sample
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
--ped
file.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
HQ¶
Extract HQ variants.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
VQSR PASS
At least 80% of the genotypes have DP above 10 and GQ above 20
at least one variant genotype has DP above 10 and GQ above 20
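The three HQ criteria can be combined as in the sketch below. It is not the tool's code: `is_hq` is a hypothetical name, "above" is read as a strict comparison, and counting missing genotypes in the 80% denominator is an assumption.

```python
def is_variant_genotype(gt: str) -> bool:
    """True if the genotype carries at least one non-reference allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def is_hq(filter_field, genotypes, min_dp=10, min_gq=20, min_ratio=0.8):
    """genotypes: list of dicts with 'GT', 'DP' and 'GQ' keys."""
    if filter_field != "PASS":          # VQSR PASS
        return False
    good = [g for g in genotypes
            if g.get("DP", 0) > min_dp and g.get("GQ", 0) > min_gq]
    if len(good) < min_ratio * len(genotypes):  # at least 80% well covered
        return False
    # at least one well-covered genotype must carry the variant
    return any(is_variant_genotype(g["GT"]) for g in good)


samples = [
    {"GT": "0/1", "DP": 30, "GQ": 99},
    {"GT": "0/0", "DP": 25, "GQ": 60},
    {"GT": "0/0", "DP": 40, "GQ": 80},
    {"GT": "0/0", "DP": 12, "GQ": 35},
    {"GT": "./.", "DP": 0, "GQ": 0},
]
print(is_hq("PASS", samples))
```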
KeepHomoAlt¶
Returns a VCF containing only the positions homozygous to alt for the given samples
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--sample s1,s2,...,sN
: list (comma separated) of samples to test
Description
Note
In case of multiallelic variants : The variant is kept if different samples are homozygous to different alternate alleles
MergeVQSR¶
Merges SNP and INDEL results files from VQSR
Mandatory Arguments
--snp snp.vcf
: File containing SNP output from VQSR
--indel indel.vcf
: File containing INDEL output from VQSR
Description
MonoAllelicSNV¶
Keep only the lines containing monoallelic SNVs
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
NotFoundInAnyControl¶
Removes Variants that are found in controls.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
--ped
file.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
QC¶
Run a Quality Control on VCF Variants
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--opt custom.parameters.tsv
: file containing the various thresholds for the QC (see Documentation)
--report fiteredVariant.tsv
: output file listing all the variants that were filtered, and why
Description
G_AN: AlleleNumber for this group
G_AC: AlleleCounts for this group
G_AF: AlleleFrequencies for this group
Warning
The VCF File must contain the following INFO : QD,FS,SOR,MQ,ReadPosRankSum,InbreedingCoeff,MQRankSum
RandomVariants¶
Keeps only a portion of the variants from a VCF file.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ratio 0.0-1.0
: Probability of keeping each variant
--file positions.txt
: File listing Positions to keep regardless of given probability in format chr:position
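The interaction between --ratio and --file can be sketched as a per-variant decision. The helper name `keep_variant` is hypothetical, and passing an explicit random generator is a design choice made here only so that runs are reproducible.

```python
import random


def keep_variant(chrom: str, pos: int, ratio: float, always_keep: set, rng) -> bool:
    """Positions listed as 'chr:position' are always kept; every other
    variant is kept with probability `ratio`."""
    if f"{chrom}:{pos}" in always_keep:
        return True
    return rng.random() < ratio


rng = random.Random(42)          # fixed seed for a reproducible illustration
forced = {"1:12345"}
kept = [p for p in range(12340, 12350) if keep_variant("1", p, 0.0, forced, rng)]
print(kept)
```

With a ratio of 0.0 only the forced position survives, which makes the "always kept" rule easy to verify.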
Description
Each variant has a --ratio chance of being kept.
Positions listed in --file are always kept
Recessive¶
Keeps only variants that respect the Recessive pattern of inheritance.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--missing TRUE|FALSE
: Missing genotypes allowed ?
--nohomo TRUE|FALSE
: Reject if a control is homozygous to reference allele ?
--mode Mode
: strict : true for all cases | loose : true for at least one case
Description
Cases should be homozygous to the causal allele
Controls should not be homozygous to the causal allele
one case isn’t homozygous to the alternate allele (strict mode)
one control is homozygous to the alternate allele
With --missing true, missing genotypes are considered compatible with the transmission pattern.
The --nohomo option rejects alternate alleles if at least one control is not heterozygous to the alternate allele. (If all the controls are supposed to be parents of cases)
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
Recode¶
Reads all lines in a VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
RemoveNonSNV¶
Removes variant lines that contain no SNVs
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
RemoveNonVariant¶
Remove variants where only 0/0 and ./. genotypes are present
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
SplitByChromosome¶
Splits a given vcf file by chromosome
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--outdir ResultsDirectory
: The directory that will contain results files
Description
SplitByGene¶
Creates an output VCF file for each gene.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--outdir ResultsDirectory
: The directory that will contain results files
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
SplitFromDB¶
Generates two new VCF files with variants present/absent in 1kG/GnomAD.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--outdir ResultsDirectory
: The directory that will contain results files
Description
inDB.MYVCF.vcf (with variants present in 1kG/GnomAD)
notInDB.MYVCF.vcf (with variants absent from 1kG/GnomAD)
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept in “inDB.MYVCF.vcf”.
StrictCompoundHeterozygous¶
Keeps only variants that strictly respect the Compound Heterozygous pattern of inheritance.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--nohomo TRUE|FALSE
: Reject if a case is homozygous to alternate allele or if a control has none of the allele ?
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
All cases have V1 and V2 from their parents
All controls (parents) have one of V1/V2 while the other parents have V2/V1
All cases have V1 and V2
All cases have a parent (control) with V1 and not V2, and the other parent with V2 and not V1
The --nohomo option rejects alternate alleles if a sample is homozygous to them.
COMPOUND=A1>P1(gA|gB|gC)&P2(gD|gE|gF),A2>P3(gG|gH|gI)&P4(gJ|gK|gL),...
Ax is the number of the allele involved
Px is the partner allele in the form chr:pos:ref:alt
gX is the symbol of the gene common to this allele and its partner
Warning
The input VCF File must have been previously annotated with vep.
Warning
This function expects a complete definition of the sample, where all cases are affected children and both their parents are identified controls.
Note
In case of multiallelic variants : If at least one alternate allele satisfies all the conditions, the whole variant line is kept.
VCF Transformation Functions¶
ClearInfoField¶
Replaces the Info column by “.”
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
FilterCsqExtractGene¶
Filters Variants according to consequences. Replaces ID by gene_chr_pos.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--csq vep.consequence
: Least severe consequence [empty | intergenic_variant | feature_truncation | regulatory_region_variant | feature_elongation | regulatory_region_amplification | regulatory_region_ablation | TF_binding_site_variant | TFBS_amplification | TFBS_ablation | downstream_gene_variant | upstream_gene_variant | non_coding_transcript_variant | NMD_transcript_variant | intron_variant | non_coding_transcript_exon_variant | 3_prime_UTR_variant | 5_prime_UTR_variant | mature_miRNA_variant | coding_sequence_variant | synonymous_variant | stop_retained_variant | start_retained_variant | incomplete_terminal_codon_variant | splice_region_variant | protein_altering_variant | missense_variant | inframe_deletion | inframe_insertion | transcript_amplification | start_lost | stop_lost | frameshift_variant | stop_gained | splice_donor_variant | splice_acceptor_variant | transcript_ablation]
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Only Ref and one alternate allele are kept. The kept alternate is (in this order) : 1. the most severe; 2. the most frequent in the file; 3. the first one.
GenerateHomoRefForPosition¶
Generates a VCF with Homozygous-to-Reference Genotypes for every given position and each sample (Alternate is a transition A↔G, C↔T)
Mandatory Arguments
--ref Reference.fasta
: Fasta File containing the reference genome
--pos positions.tsv
: List of positions in the results VCF file
--sample samples.txt
: List of samples in the results VCF File
Description
GT:DP:GQ:AD:PL
0/0:30:99:30,0:0,50,500
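A line of the generated VCF could be built as below. This is a sketch: the transition table and the genotype string come from the description above, while the function name and the `.` placeholders used for ID, QUAL, FILTER and INFO are assumptions.

```python
# Transition partner of each reference base (A<->G, C<->T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

FORMAT = "GT:DP:GQ:AD:PL"
HOMO_REF = "0/0:30:99:30,0:0,50,500"


def homo_ref_line(chrom: str, pos: int, ref: str, samples: list) -> str:
    """Build one tab-separated VCF data line where every sample is 0/0."""
    ref = ref.upper()
    alt = TRANSITION[ref]
    # ID, QUAL, FILTER and INFO are left as '.' (assumption)
    fixed = [chrom, str(pos), ".", ref, alt, ".", ".", ".", FORMAT]
    return "\t".join(fixed + [HOMO_REF] * len(samples))


print(homo_ref_line("1", 100, "a", ["S1", "S2"]))
```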
MissingToMajor¶
Replaces every missing genotype by the most frequent allele present
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
A/A:0:0:0,0,0...
Note
In case of multiallelic variants : The major allele is the most frequent allele from ref and each alternate.
MissingToRef¶
Replaces every missing genotype by 0/0:0:0:0....
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
0/0:0:0:0....
Scramble¶
Outputs the same VCF but randomly reassigns the genotypes among the samples
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
SetGenotypeFromProbability¶
Assigns a genotype to each sample at each position, based on the GenotypeProbability annotation. If a genotype is already present, it can be kept or replaced
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--overwrite TRUE|FALSE
: overwrite existing genotypes ?
Description
highest=p1 → 0/0
highest=p2 → 0/1
highest=p3 → 1/1
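The mapping from GP=p1,p2,p3 to a genotype can be sketched as follows. `genotype_from_gp` is a hypothetical name; resolving ties in favour of the first highest probability and treating any genotype containing `.` as missing are assumptions.

```python
GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_from_gp(gp: str, current=None, overwrite=False) -> str:
    """Pick the genotype with the highest probability in 'p1,p2,p3'.

    An existing, non-missing genotype is kept unless overwrite is True.
    """
    has_genotype = current is not None and "." not in current
    if has_genotype and not overwrite:
        return current
    probs = [float(p) for p in gp.split(",")]
    if len(probs) != 3:
        raise ValueError("expected three probabilities (monoallelic variant)")
    return GENOTYPES[probs.index(max(probs))]  # first maximum wins on ties


print(genotype_from_gp("0.01,0.98,0.01"))
```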
Warning
The input VCF file must contain Genotype Probability (GP=p1,p2,p3) for each genotype
Note
In case of multiallelic variants : An error will be thrown, as this function expects only monoallelic variants. The affected variant line will be dropped.
SplitMultiAllelic¶
Splits multiallelic variants into several lines of monoallelic variants
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
VCFToReference¶
Outputs the given VCF File and reverts genotypes when ref/alt alleles are inverted according to given reference (as a fasta file)
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ref Reference.fasta
: Fasta File containing the reference genome
Description
Warning
Annotations (INFO/AD/PL/…) are not updated
Note
In case of multiallelic variants : An error will be thrown, as this function expects only monoallelic variants. The affected variant line will be dropped.
VCF Annotation Functions¶
AddAlleleBalance¶
Adds the annotations AB, ABhet, ABhom and OND to a VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
AB : Allele balance for each het genotype (alleleDepth(gt1) / (alleleDepth(gt1) + alleleDepth(gt2)))
ABhet : Allele Balance for heterozygous calls (ref/(ref+alt)), for each variant
ABhom : Allele Balance for homozygous calls (A/(A+O)) where A is the allele (ref or alt) and O is anything other, for each variant
OND : Overall non-diploid ratio (alleles/(alleles+non-alleles)), for each variant
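For the per-genotype AB value, the ratio is the depth of the first allele over the summed depths of both alleles of a heterozygous genotype. A minimal sketch, assuming the AD field is given as a list indexed by allele number and using the hypothetical name `allele_balance`:

```python
def allele_balance(gt: str, ad: list):
    """AB for a heterozygous genotype: AD[gt1] / (AD[gt1] + AD[gt2]).

    Returns None for missing or homozygous genotypes, or when both
    depths are zero.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or len(alleles) != 2:
        return None
    first, second = (int(a) for a in alleles)
    if first == second:
        return None
    total = ad[first] + ad[second]
    return ad[first] / total if total else None


print(allele_balance("0/1", [30, 10]))
```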
AddDbSNP¶
Adds/updates dbSNP information to the VCF from a dbSNP release file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ref dbsnp.vcf
: dbSNP reference VCF File (can be gzipped)
Description
RS= and dbSNPBuildID= in INFO field from the input file --ref.
Note
In case of multiallelic variants : Annotation is added/updated for each alternate allele (comma-separated).
AddGroupACANAF¶
Add AN,AC,AF annotation for each group described in the ped file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
G_AN: AlleleNumber for this group
G_AC: AlleleCounts for this group
G_AF: AlleleFrequencies for this group
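The three values can be computed from the genotypes of one group as sketched below, for a single alternate allele. The function name is hypothetical, and excluding missing alleles from AN follows the usual VCF convention rather than anything stated here.

```python
def group_an_ac_af(genotypes: list, alt_index: int = 1):
    """Return (AN, AC, AF) for one group of genotypes.

    AN counts called alleles, AC counts copies of the alternate allele,
    AF is AC / AN (0.0 when nothing is called).
    """
    an = ac = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue            # missing alleles are not counted
            an += 1
            if allele == str(alt_index):
                ac += 1
    af = ac / an if an else 0.0
    return an, ac, af


print(group_an_ac_af(["0/1", "1/1", "./.", "0/0"]))
```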
Note
In case of multiallelic variants : Annotation is added/updated for each alternate allele (comma-separated).
AddWorstAndCanonicalConsequence¶
For each variant, add the most severe consequence from vep and add the consequence from vep for the annotation marked as Canonical.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
WORSTCSQ
WORSTGENE
CANONICALCSQ
CANONICALGENE
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Annotation is added/updated for each alternate allele (comma-separated).
ReaffectdbSNP¶
Puts all observed RS numbers in the ID column
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : All RS IDs from every alternate allele are listed in the ID column.
UpdateACANAF¶
Resets the AC, AN and AF values for the given VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
AC, AN and AF values for the VCF.
Note
In case of multiallelic variants : Annotation is added/updated for each alternate allele (comma-separated).
Analysis Functions¶
CheckReference¶
For every position in the vcf file, compares the reference from the VCF to the one in the fasta
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ref Reference.fasta
: Fasta File containing the reference genome
Description
CHROM | POS | VCF_REF | FASTA_REF
Warning
Lines containing indels are ignored
Chi2¶
Performs a chi² association test on an input VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
Note
In case of multiallelic variants : Each alternate allele is processed independently.
CommonVariants¶
Displays the list of variants that are common to two VCF files
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--file smallest.file.vcf
: the smallest of the two input VCF files (can be gzipped)
Description
Warning
For faster execution, use --vcf with the largest file and --file with the smallest one
Note
In case of multiallelic variants : Each alternate allele is processed independently.
CompareGenotype¶
Compares the genotypes of the samples in the first and second VCF file.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--vcf2 File2.vcf(.gz)
: the second input VCF file (can be bgzipped)
Description
Sample | Group | Total | Concord | Discord | LeftMissing | RightMissing | %Concord
Note
In case of multiallelic variants : Alternate alleles are expected to be the same and in the same order in both files
CompareToGnomAD¶
Compares the variants present in a VCF file to those present in a GnomAD VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--file GnomAD.site.vcf.gz
: GnomAD VCF File (can be gzipped)
Description
#CHR | POS | ID | REF | ALT | QUAL | CSQ | GENE | AC | AF | AN | GnomAD_AC | GnomAD_AF | GnomAD_AN
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
CountFromPublicDB¶
Returns the number of Variants, SNVs, INDEL, in dbSNP, 1kG, GnomAD.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Total | dbSNP | 1kG | GnomAD | Not dbSNP | Not 1kG | Not GnomAD
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
CountGenotypes¶
Counts the genotypes 0/1 and 1/1 for each variant
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
0/1 and 1/1 for each variant
CHROM | POS | REF | ALT | CONSEQUENCE | TOTAL_HETEROZYGOUS | TOTAL_HOMOZYGOUS_ALT
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
CountMissing¶
For each sample in the PED file, prints a summary of missingness
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--threshold 0.0-1.0
: Maximum ratio of Missing Individuals per position
Description
#SAMPLE | TOTAL | GENOTYPED | NB_MISSING | %_MISSING | REF | ALT | Total_Variants | Kept_Variants
SAMPLE: the sample name
TOTAL: total variants kept
GENOTYPED: variants with non missing genotypes for this sample
NB_MISSING: variants with missing genotypes for this sample
%_MISSING: percent of genotypes missing for this sample
REF: number of variants homozygous to the ref for this sample
ALT: number of variants not homozygous to the ref for this sample
Warning
Kept variants are those with less than --threshold genotypes missing
CountVariants¶
Counts the number of variants for each sample and prints a summary for each group
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
FamilyID | ID | MotherID | FatherID | Sex | Phenotype | Group | NbVariants
Note
In case of multiallelic variants : Each alternate allele is processed independently.
DbSNPMismatch¶
Checks if there is a discrepancy between the ID column and the VEP annotation for RS ID.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
CHR | POS | ID | REF | ALT | VEP_Annotation
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : RS IDs of every alternate allele are put in the ID field.
ExtractAlleleCounts¶
For every variant, exports the variant allele count for each sample
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
#CHROM | POS | ID | REF | ALT
Note
In case of multiallelic variants : Each alternate allele is processed independently.
ExtractNeighbours¶
Creates a bed file of the positions where at least one sample has 2 SNVs that could be in the same triplet (regardless of the reading frame)
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
chr | V1_pos | V2_pos
Note
In case of multiallelic variants : chr V1_pos V2_pos is printed if at least one alternate allele of V1_pos and of V2_pos is a SNP, and if one sample has a variant on both sides (not necessarily the SNP one).
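Two SNVs can fall in the same triplet, whatever the reading frame, only when their positions differ by at most 2. The sketch below applies that reading to the variant positions of each sample on one chromosome; the function name and the input layout are hypothetical.

```python
def neighbour_pairs(positions_by_sample: dict) -> list:
    """Return sorted (V1_pos, V2_pos) pairs where one sample carries two
    SNVs at most 2 bases apart (same chromosome assumed)."""
    pairs = set()
    for positions in positions_by_sample.values():
        ordered = sorted(set(positions))
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                if second - first > 2:
                    break               # later positions are even further away
                pairs.add((first, second))
    return sorted(pairs)


print(neighbour_pairs({"S1": [100, 101, 105], "S2": [105, 107, 200]}))
```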
ExtractPrivateToGroup¶
Extracts All Variants that are private to a Group.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
#CHR | POS | GROUP | SAMPLES
Note
In case of multiallelic variants : Each alternate allele is processed independently.
F2¶
Computes F2 data.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--prefix prefix
: prefix of the output files
--outdir ResultsDirectory
: The directory that will contain results files
Description
All variants
All SNVs
variants without rs (new)
SNVs without rs (new)
variants with rs (known)
SNVs with rs (known)
Warning
The distinction between known and new is made by looking at the vep annotation, not the ID column.
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
F2Individuals¶
Computes F2 data by samples and not by groups (Each sample is its own group).
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--prefix prefix
: prefix of the output files
--outdir ResultsDirectory
: The directory that will contain results files
Description
All variants
All SNVs
variants without rs (new)
SNVs without rs (new)
variants with rs (known)
SNVs with rs (known)
Warning
The distinction between known and new is made by looking at the vep annotation, not the ID column.
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
FrequencyCorrelation¶
Prints the frequency correlation of variants between local samples and GnomAD
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--outdir ResultsDirectory
: The directory that will contain results files
Description
CHR | POS | REF | ALT | Local | GnomAD
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
FrequencyForPrivate¶
Prints the Allele frequency in the file and each group, for variants not found in dbSNP, 1kG or GnomAD.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
GeneList¶
Prints the list of all genes covered by the VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
GetWorstConsequence¶
Print the worst consequence/gene for each variant allele.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
#CHR | POS | ID | REF | ALT | WORST_CSQ | AFFECTED_GENE
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
InbreedingCoeffDistribution¶
Outputs a sorted list of all Inbreeding Coeff from a VCF File.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Warning
The input file must contain the Inbreeding Coeff. annotation
IQSBySample¶
Computes the IQS score for each sample between sequencing data and data imputed from genotyping.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--cpu Integer
: number of cores
--file imputed.vcf(.gz)
: VCF File Containing imputed data (can be gzipped)
Description
#SAMPLE | GROUP | IQS | NB_VARIANTS | TOTAL_VARIANTS
Note
In case of multiallelic variants : Each alternate allele is processed independently.
IQSByVariant¶
Computes the IQS score for each variant between sequencing data and data imputed from genotyping.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--cpu Integer
: number of cores
--file imputed.vcf(.gz)
: VCF File Containing imputed data (can be gzipped)
Description
chr | pos | rs | ref | alt | gene | consequence | Freq_VCF | Freq_GnomAD_NFE | Freq_MaxPop | Max_Pop | IQS | Info
Warning
Extra information is available if the input file was annotated with VEP
Note
In case of multiallelic variants : Each alternate allele is processed independently.
JFSSummary¶
Outputs the Joint Site Frequency Spectrum Summary statistics
Mandatory Arguments
--file GROUP1.GROUP2.XXX.YYY.ZZZ.tsv
: input tsv file
Description
N = Number of haplotypes in each population (2 × n, where n is the number of samples per population)
V = Total number of variants
threshold : pooled sample allele frequency (i + j)/2N <= 0.05
FST = overall measure of genetic diversity
AS = allele sharing statistic (probability that two individuals carrying an allele count of n come from different populations, normalized by the expected probability in panmictic population)
WS = weighted symmetry (measures how evenly rare alleles are distributed between the two populations)
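The threshold definition above can be sketched as follows. This is a minimal illustration, assuming i and j are the alternate allele counts in each of the two populations and N is the per-population haplotype count; the function names are hypothetical and this is not the tool's own code.

```python
def pooled_frequency(i, j, n_haplotypes):
    """Pooled sample allele frequency (i + j) / 2N.

    i, j: allele counts in population 1 and population 2 (assumed meaning).
    n_haplotypes: N, the number of haplotypes in each population.
    """
    return (i + j) / (2 * n_haplotypes)


def passes_threshold(i, j, n_haplotypes, threshold=0.05):
    """True when the pooled frequency is at or below the threshold."""
    return pooled_frequency(i, j, n_haplotypes) <= threshold
```

For example, with N = 100 haplotypes per population, an allele seen 3 times in one population and once in the other has a pooled frequency of 4/200 = 0.02 and passes the 0.05 threshold.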
JointFrequencySpectrum¶
Creates a JointFrequencySpectrum result file for each group defined in the ped file.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
--outdir ResultsDirectory
: The directory that will contain results files
Description
Note
In case of multiallelic variants : Each alternate allele is processed independently.
Kappa¶
Kappa comparison between two VCF files.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--vcf2 File2.vcf
: the second input VCF File (can be gzipped)
--tsv output.tsv
: the result TSV File
--outdir ResultsDirectory
: The directory that will contain results files
Description
CHROM | POS | ID | MAF_FILE1 | MAF_FILE2 | KAPPA_With_Missing | KAPPA_Ignore_Missing
Note
In case of multiallelic variants : Results are given for the first alternate allele, which is expected to be the same in both files.
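As an illustration of the statistic, here is a minimal sketch of Cohen's kappa computed on the genotype calls of the same samples in two files. It assumes that the "with missing" variant treats a missing genotype (`./.`) as its own category and that the "ignore missing" variant drops any sample missing in either file; both are assumptions about the two output columns, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa between two equally long lists of genotype labels."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both files must describe the same samples")
    n = len(calls_a)
    if n == 0:
        return None
    # Observed agreement: fraction of samples with identical calls.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance from the marginal label frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        return None  # undefined when a single label is used everywhere
    return (observed - expected) / (1 - expected)


def drop_missing(calls_a, calls_b, missing="./."):
    """Keep only the samples genotyped in both files."""
    kept = [(a, b) for a, b in zip(calls_a, calls_b) if missing not in (a, b)]
    return [a for a, _ in kept], [b for _, b in kept]
```

Calling `cohen_kappa(*drop_missing(a, b))` gives the value under the "ignore missing" assumption, while `cohen_kappa(a, b)` keeps missing calls as a category.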
MaleFemale¶
Show Male/Female Allele Frequencies
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
CHROM | POS | ID | REF | ALT | FILTER | GENE/CSQ | AF | MALE_AF | FEMALE_AF
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
MeanQuality¶
Prints information and quality statistics for each variant.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
#CHROM | POS | IN_dbSBP | IN_GnomAD | meanDP_with_missing | meanGQ_with_missing | meanDP_without_missing | meanGQ_without_missing
Warning
The input VCF File must have been previously annotated with vep.
MultiAllelicProportion¶
Slides a 1kb window over the genome and outputs a list of regions ordered by the proportion of multi-allelic variations (desc.)
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Chr | pos_n | pos_n+Window_size | nb_multialleleic variants
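The windowing idea can be sketched as below. This is a simplified illustration that bins variants into fixed, non-overlapping 1 kb windows rather than sliding a window with a smaller step (the actual step size is not documented here); the input tuples and function name are hypothetical.

```python
from collections import defaultdict


def multiallelic_by_window(variants, window=1000):
    """Rank windows by their proportion of multi-allelic variants.

    variants: iterable of (chrom, pos, n_alt) tuples, where n_alt is the
    number of alternate alleles on the VCF line.
    Returns (chrom, start, end, proportion) rows, highest proportion first.
    """
    totals = defaultdict(int)
    multi = defaultdict(int)
    for chrom, pos, n_alt in variants:
        start = (pos // window) * window  # left edge of the fixed window
        totals[(chrom, start)] += 1
        if n_alt > 1:
            multi[(chrom, start)] += 1
    rows = [
        (chrom, start, start + window, multi[(chrom, start)] / total)
        for (chrom, start), total in totals.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```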
NumberOfCsqPerGene¶
Given a VCF file and a list of genes, prints the number of variants per gene for each consequence
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--genes genes.txt
: File listing genes
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
NumberOfLinesFromTabix¶
Gets the number of lines indexed by a tabix file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Warning
The bgzipped VCF file FILENAME.vcf.gz must have an associated tabix file FILENAME.vcf.gz.tbi
PrivateVSPanel¶
Check how many of the variants from the input file are filtered as Already_existing when adding samples from the reference file.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ref reference.vcf
: the panel VCF File (can be gzipped)
Description
Warning
The input VCF File must have been previously annotated with vep.
Note
In case of multiallelic variants : Each alternate allele is processed independently.
QCParametersDistribution¶
Reports the distributions of each parameter used by QC
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
Warning
The VCF File must contain the following INFO : QD,FS,SOR,MQ,ReadPosRankSum,InbreedingCoeff,MQRankSum
SampleStats¶
Prints stats about each sample (Mean Depths, TS/TV, Het/HomAlt).
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--ped samples.ped
: File describing the VCF’s samples (See File Formats in the documentation)
Description
Sample | Group | Sites | Genotyped | Missing | %Missing | MeanDepths | Variants | Singletons | TS | TV | TS/TV | Het | HetRatio | HomAlt
Note
In case of multiallelic variants : Each alternate allele is processed independently.
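For readers unfamiliar with the TS, TV and TS/TV columns, the sketch below shows the usual definition: a transition swaps two purines (A, G) or two pyrimidines (C, T), and any other single-nucleotide change is a transversion. It is a generic illustration with hypothetical function names, not the tool's own counting code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Return "TS", "TV", or None when the change is not a simple SNV."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        return None
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "TS" if same_class else "TV"


def ts_tv_ratio(changes):
    """TS/TV ratio over (ref, alt) pairs; None when there is no transversion."""
    kinds = [classify_snv(ref, alt) for ref, alt in changes]
    ts, tv = kinds.count("TS"), kinds.count("TV")
    return ts / tv if tv else None
```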
VQSLod¶
Print VQSLod statistics for each tranche.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Tranche | Mean | Min | D1 | D2 | D3 | D4 | Median | D6 | D7 | D8 | D9 | Max
Warning
File must contain VQSLOD annotations
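The per-tranche columns (mean, minimum, deciles D1 to D9 with the median in fifth position, maximum) can be reproduced approximately with the Python standard library. This is a minimal sketch for the values of a single tranche; the interpolation method used by the tool is not documented here, so individual deciles may differ slightly, and the function name is hypothetical.

```python
import statistics


def vqslod_summary(values):
    """Summarise the VQSLOD values of one tranche (needs at least two values)."""
    ordered = sorted(values)
    # statistics.quantiles with n=10 returns the 9 cut points D1..D9.
    deciles = statistics.quantiles(ordered, n=10)
    summary = {"Mean": statistics.mean(ordered), "Min": ordered[0]}
    for i, cut in enumerate(deciles, start=1):
        summary["Median" if i == 5 else f"D{i}"] = cut
    summary["Max"] = ordered[-1]
    return summary
```

The keys of the returned dictionary follow the column order of the output described above.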
Formatting Functions¶
ShowFields¶
Shows selected fields of a VCF File
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--query "field1,field2,...,info:key1;key2;...,geno:key1;key2;..."
: Output columns
Description
Field_1,Field_2,...,Field_n
info:key1;key2;...;keyN
ex: info:AbHet;AC;AN;AF
geno:key1;key2;...;keyN
ex : geno:GT;AD;GQ
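To make the query syntax concrete, the sketch below splits a query string into plain fields, INFO keys and genotype keys. It only illustrates the grammar shown above (comma-separated items, with `info:` and `geno:` items carrying semicolon-separated keys); the function name and the field names `chrom` and `pos` used in the usage note are hypothetical.

```python
def parse_query(query):
    """Split a ShowFields-style query into fields, INFO keys and genotype keys."""
    parsed = {"fields": [], "info": [], "geno": []}
    for token in query.split(","):
        token = token.strip()
        if not token:
            continue
        prefix, sep, rest = token.partition(":")
        if sep and prefix.lower() in ("info", "geno"):
            # Keys inside an info:/geno: item are separated by semicolons.
            parsed[prefix.lower()].extend(k for k in rest.split(";") if k)
        else:
            parsed["fields"].append(token)
    return parsed
```

For instance, `parse_query("chrom,pos,info:AC;AN;AF,geno:GT;AD;GQ")` separates two plain fields, three INFO keys and three genotype keys.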
TSV2HTML¶
Converts a TSV file to an HTML page
Mandatory Arguments
--file table.tsv
: the input TSV File
--link PositiveInteger
: put link in header, starting at column INDEX (counting from 0)
--title MyTitle
: title of the result HTML page
Description
VCF2HTML¶
Generates a human-readable HTML file for the given VCF file
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
Warning
The input VCF File must have been previously annotated with vep.
VCF2TSV¶
Creates a TSV file, readable in Excel.
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
Description
VCF2TSVGeneCsq¶
Creates a TSV file, readable in Excel, keeps only annotations for given genes and consequences
Mandatory Arguments
--vcf input.vcf(.gz)
: VCF file to use as an input. Can be bgzipped
--genes genes.txt
: Filename of gene list
--csq vep.consequence
: Least severe consequence [empty | intergenic_variant | feature_truncation | regulatory_region_variant | feature_elongation | regulatory_region_amplification | regulatory_region_ablation | TF_binding_site_variant | TFBS_amplification | TFBS_ablation | downstream_gene_variant | upstream_gene_variant | non_coding_transcript_variant | NMD_transcript_variant | intron_variant | non_coding_transcript_exon_variant | 3_prime_UTR_variant | 5_prime_UTR_variant | mature_miRNA_variant | coding_sequence_variant | synonymous_variant | stop_retained_variant | start_retained_variant | incomplete_terminal_codon_variant | splice_region_variant | protein_altering_variant | missense_variant | inframe_deletion | inframe_insertion | transcript_amplification | start_lost | stop_lost | frameshift_variant | stop_gained | splice_donor_variant | splice_acceptor_variant | transcript_ablation]
Description
Warning
The input VCF File must have been previously annotated with vep.
Other Functions¶
CoverageStats¶
Gets the coverage statistics for an input file
Mandatory Arguments
--tsv cov.tsv.gz
: File containing depth-of-coverage
--chrom chr1
: Chromosome name
Description
chr | pos | mean | median | 1 | 5 | 10 | 15 | 20 | 25 | 30 | 40 | 50 | 100
ExtendBed¶
Adds padding to the left and right of each region in the bed, and merges overlapping regions
Mandatory Arguments
--bed regions.bed
: the Bed file to pad
--pad PositiveInteger
: number of bases to add left and right of each region
Description
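The pad-then-merge behaviour can be sketched as follows. This is a minimal illustration, assuming regions are (chrom, start, end) tuples, that padded starts are clipped at 0, and that regions which merely touch after padding are merged too; it does not clip ends to chromosome lengths, and the function name is hypothetical.

```python
def extend_and_merge(regions, pad):
    """Pad each region by `pad` bases on both sides, then merge overlaps."""
    padded = sorted(
        (chrom, max(0, start - pad), end + pad) for chrom, start, end in regions
    )
    merged = []
    for chrom, start, end in padded:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlapping or touching the previous region: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]
```

Called with `pad=0`, the same function only merges overlapping regions.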
GeneCards¶
Generates a script to retrieve GeneCards HTML pages for each gene in the given list.
Mandatory Arguments
--file genes.txt
: file listing genes
Description
GeneCardsParser¶
Exports summary data from a GeneCards HTML file as an unformatted table
Mandatory Arguments
--file input.html
: input GeneCards HTML file
Description
#Gene | GeneCards | Entrez Gene | UniProtKB/Swiss-Prot
GzPaste¶
Unix paste command for gzipped files
Mandatory Arguments
--files file1.gz,file2.gz,...,fileN.gz
: list (comma separated) of gzipped files to paste
Description
--gz
to gzip the output
IsInBed¶
Check if a given chromosome:position is contained in a bedfile
Mandatory Arguments
--chrom chromosome
: chromosome name : (chr)[1-25]/X/Y/M/MT
--pos PositiveInteger
: Position
--bed region.bed
: the Bed File to process
Description
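A containment check of this kind can be sketched as below. It assumes the position is 1-based (as in a VCF) while BED regions are 0-based and half-open, so a position falls inside a region when start < pos <= end; whether the tool applies exactly this convention is an assumption, and the function name is hypothetical.

```python
def is_in_bed(chrom, pos, regions):
    """True when the 1-based position lies in any (chrom, start, end) region.

    Regions follow the BED convention: 0-based start, exclusive end.
    """
    return any(
        region_chrom == chrom and start < pos <= end
        for region_chrom, start, end in regions
    )
```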
NormalizePed¶
Extract x subgroups of y samples for each group present in the Ped file
Mandatory Arguments
--ped samples.ped
: The input PED file to process
--number PositiveInteger
: Number Of subgroups for each group
--size PositiveInteger
: Group Size
Description
--number 3 --size 10
will create 9 groups:
RandomPed¶
Keeps N random samples from a Ped File
Mandatory Arguments
--ped samples.ped
: The input PED file to process
--threshold PositiveInteger
: Number Of Samples
Description
SimplifyBED¶
Returns a simplified bed (with the smallest number of regions covering all the positions in the input bed file).
Mandatory Arguments
--bed region.bed
: the Bed File to process
Description
Graphics¶
GraphCompareFrequencies¶
Compares the frequencies of common variants in 2 populations (output of FrequencyCorrelation / CompareToGnomAD)
Mandatory Arguments
--width PositiveInteger
: Graph’s Width in Pixels
--height PositiveInteger
: Graph’s Height in Pixels
--tsv input.tsv
: input data
--name dataset
: Graph Title
--outdir ResultsDirectory
: The directory that will contain results files
--x PositiveInteger
: index of the column containing X values 0-based
--y PositiveInteger
: index of the column containing Y values 0-based
Description
Example

GraphCountGenotypes¶
Create a graph for the results of CountGenotypes
Mandatory Arguments
--width PositiveInteger
: Graph’s Width in Pixels
--height PositiveInteger
: Graph’s Height in Pixels
--tsv input.tsv
: input data
--csq vep.consequence
: Least severe consequence [empty | intergenic_variant | feature_truncation | regulatory_region_variant | feature_elongation | regulatory_region_amplification | regulatory_region_ablation | TF_binding_site_variant | TFBS_amplification | TFBS_ablation | downstream_gene_variant | upstream_gene_variant | non_coding_transcript_variant | NMD_transcript_variant | intron_variant | non_coding_transcript_exon_variant | 3_prime_UTR_variant | 5_prime_UTR_variant | mature_miRNA_variant | coding_sequence_variant | synonymous_variant | stop_retained_variant | start_retained_variant | incomplete_terminal_codon_variant | splice_region_variant | protein_altering_variant | missense_variant | inframe_deletion | inframe_insertion | transcript_amplification | start_lost | stop_lost | frameshift_variant | stop_gained | splice_donor_variant | splice_acceptor_variant | transcript_ablation]
--outdir ResultsDirectory
: The directory that will contain results files
Description
Example

GraphF2¶
Create a graph for the results of F2 or F2Individuals
Mandatory Arguments
--width PositiveInteger
: Graph’s Width in Pixels
--height PositiveInteger
: Graph’s Height in Pixels
--tsv input.tsv
: input data
--name title
: Title (will be printed on the graph)
--outdir ResultsDirectory
: The directory that will contain results files
Description
Example

GraphJFS¶
Create a graph for the results of JointFrequencySpectrum
Mandatory Arguments
--width PositiveInteger
: Graph’s Width in Pixels
--height PositiveInteger
: Graph’s Height in Pixels
--tsv input.tsv
: input data
--name title
: Title (will be printed on the graph)
--x Set1
: Name of the first Set
--y Set2
: Name of the second Set
--max Scale Max
: Top Number of variant on legend. Enter “null” to use the maximal value from data
--outdir ResultsDirectory
: The directory that will contain results files
Description
Warning
Expects a NxN matrix, where matrix[a][b] is the number of variants seen a times in the first set and b times in the second set.
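The expected matrix layout can be illustrated with a short sketch that builds such a matrix from per-variant allele counts. The input format (one (a, b) pair per variant) and the function name are hypothetical; only the meaning of matrix[a][b] comes from the description above.

```python
def joint_frequency_matrix(count_pairs, size):
    """Build a size x size matrix where matrix[a][b] counts the variants
    seen a times in the first set and b times in the second set."""
    matrix = [[0] * size for _ in range(size)]
    for a, b in count_pairs:
        matrix[a][b] += 1
    return matrix
```

With this layout, row 0 holds the variants absent from the first set and column 0 those absent from the second set.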
Example

GraphSampleStats¶
Create a graph for the results of SampleStats
Mandatory Arguments
--width PositiveInteger
: Graph’s Width in Pixels
--height PositiveInteger
: Graph’s Height in Pixels
--tsv input.tsv
: input data
--name title
: Title (will be printed on the graph)
--outdir ResultsDirectory
: The directory that will contain results files
Description
Number of Variants
Mean Depth
TS/TV
Het/HomAlt
Missing
Example
