Tools developed in INSERM U1078


Annotation and Ranking of Human Structural Variations


AnnotSV (Geoffroy et al., 2021, 2018) is a program designed for annotating and ranking Structural Variations (SV). This tool compiles functionally, regulatorily and clinically relevant information and aims at providing annotations useful to i) interpret the potential pathogenicity of SV and ii) filter out potential false-positive SV.

Different types of SV exist, including deletions, duplications, insertions, inversions, translocations and more complex rearrangements. They can be either balanced or unbalanced. When they are unbalanced, resulting in a gain or loss of genetic material, they are called Copy Number Variations (CNV). CNV can be described by coordinates on a single chromosome, i.e. the start and end positions of the SV (deletions, insertions, duplications). Complex rearrangements with several breakends can arbitrarily be summarized as a set of novel adjacencies, as described in the Variant Call Format specification VCF v4.3 (Jun 2020).
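To make the breakend notation concrete, here is a minimal Python sketch that parses the four bracket forms of a breakend ALT allele defined in the VCF specification (t[p[, t]p], ]p]t and [p[t) into a novel adjacency. The class and function names are illustrative assumptions, not part of AnnotSV, and symbolic ALT alleles such as <DEL> are rejected rather than handled.

```python
import re
from dataclasses import dataclass

# The four breakend ALT forms of the VCF specification, where t is the local
# (REF-side) sequence and p is the mate position "chrom:pos":
#   t[p[   t]p]   ]p]t   [p[t
_BND = re.compile(
    r"^(?P<pre>[^\[\]]*)(?P<open>[\[\]])(?P<chrom>[^\[\]:]+):(?P<pos>\d+)"
    r"(?P<close>[\[\]])(?P<post>[^\[\]]*)$"
)


@dataclass(frozen=True)
class Adjacency:
    """A novel adjacency joining the local breakend to a mate position (illustrative)."""
    mate_chrom: str
    mate_pos: int
    mate_after_local: bool    # True for t[p[ and t]p] (mate piece joined after t)
    mate_extends_right: bool  # True for the '[' forms (piece extends to the right of p)


def parse_breakend(alt: str) -> Adjacency:
    m = _BND.match(alt)
    if m is None or m["open"] != m["close"]:
        raise ValueError(f"not a breakend ALT allele: {alt!r}")
    pre, post = m["pre"], m["post"]
    # Exactly one side of the bracketed mate position must carry the local sequence t.
    if bool(pre) == bool(post):
        raise ValueError(f"ambiguous breakend ALT allele: {alt!r}")
    return Adjacency(
        mate_chrom=m["chrom"],
        mate_pos=int(m["pos"]),
        mate_after_local=bool(pre),
        mate_extends_right=m["open"] == "[",
    )
```

For example, parse_breakend("G]17:198982]") describes a join after the local base G to the piece of chromosome 17 that extends to the left of position 198982. A complex rearrangement is then simply a collection of such adjacencies.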


An integrated workflow to prioritize pathogenic variants in sequence data from a single individual


Easy-PSAP is a Snakemake workflow [Köster et al., 2012] for evaluating genetic variants at the scale of a whole exome or genome.

It is composed of two pipelines based on the Population Sampling Probability (PSAP) method [Wilfert et al., 2016].

PSAP uses allele frequencies from large population databases to construct gene-based null distributions of CADD pathogenicity scores [Kircher et al., 2014] and ultimately gives each individual a p-value per gene, which summarizes how unlikely it is to observe a variant with such a CADD score in this gene in the general population.
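The sketch below illustrates the idea of a gene-based null distribution under strong simplifying assumptions that are ours, not necessarily PSAP's: sites are independent, genotypes follow Hardy-Weinberg proportions, and a dominant (heterozygous) model is used. The p-value is then the probability that a random individual from the population carries at least one allele, in the same gene, whose CADD score is at least as high as the observed one. The function name and input format are hypothetical.

```python
from math import prod


def dominant_gene_pvalue(population_variants, observed_cadd):
    """Probability that a random individual carries >= 1 allele with CADD >= observed_cadd.

    population_variants: iterable of (allele_frequency, cadd_score) for one gene,
    e.g. taken from a large population database (hypothetical input format).
    Assumes independent sites and Hardy-Weinberg genotype proportions.
    """
    # Under Hardy-Weinberg, P(no copy of the allele at a site) = (1 - f)^2
    p_carries_none = prod(
        (1.0 - af) ** 2 for af, cadd in population_variants if cadd >= observed_cadd
    )
    return 1.0 - p_carries_none
```

With gene = [(0.01, 30.0), (0.001, 25.0), (0.1, 5.0)], an observed score of 25 gives a p-value of about 0.022, and a score higher than every population variant gives 0.0. A small p-value means that variants this damaging are rarely carried in this gene in the general population.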


A simple likelihood-based method to estimate the age of the most recent common ancestor of individuals from the information provided by their shared haplotypes


When estimating the age of a genetic variant, EstiAge assumes that all affected samples descended from a common ancestor who introduced the variant n generations ago.

An estimate of n is obtained from the size of the haplotype shared by individuals on both sides of the variant locus, by finding the most likely positions of recombination events on the ancestral haplotype in the different samples.

The value of n is then converted to an age in years by multiplying it by an assumed generation time of 25 years.
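As an illustration of the likelihood idea, the sketch below uses a deliberately simplified model (our assumption, not necessarily EstiAge's): each of k affected samples descends independently from the ancestor through n meioses, so the genetic length in Morgans of the shared segment on each side of the variant is exponentially distributed with rate n. Maximizing the likelihood n^(2k) exp(-n L), where L is the total shared length, gives n_hat = 2k / L, which is then converted to years with 25 years per generation. EstiAge itself locates recombinations on the ancestral haplotype from marker data; that step is not modeled here, and the function names are hypothetical.

```python
def estimate_generations(shared_cM):
    """Maximum-likelihood number of generations n (simplified star-genealogy model).

    shared_cM: list of (left_cM, right_cM) shared haplotype lengths, one pair per sample.
    Each side is modeled as Exponential(rate=n) in Morgans, so n_hat = 2k / total_length.
    """
    k = len(shared_cM)
    total_morgans = sum(left + right for left, right in shared_cM) / 100.0
    if k == 0 or total_morgans <= 0:
        raise ValueError("need at least one sample with a positive shared length")
    return 2 * k / total_morgans


def generations_to_years(n, years_per_generation=25):
    """Convert generations to years, assuming 25 years per generation by default."""
    return n * years_per_generation


n_hat = estimate_generations([(10.0, 15.0), (5.0, 20.0)])  # 50 cM shared in total
print(n_hat, generations_to_years(n_hat))  # 8.0 200.0
```

With few samples this maximum-likelihood estimate is biased upward; under the same model, (2k - 1) / L is unbiased.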


GEMPROT (Genetic Mutation to Protein Translation) is a tool to visualize the changes induced in the protein by the different variants found within a gene in an individual


Starting from a phased VCF, it translates the two haplotypes of each individual into the two corresponding protein sequences, making it possible to visualize whether variants are in cis or in trans.
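The core idea can be sketched in Python for the simplest case: single-nucleotide variants at known positions of a reference coding sequence, with phased genotypes such as 0|1. Each haplotype is rebuilt by applying the ALT alleles it carries and is then translated with the standard genetic code. The input format and function names are hypothetical, and the sketch ignores genomic coordinates, strand and indels.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def haplotype_proteins(ref_cds, variants):
    """Return the two protein sequences encoded by the two phased haplotypes.

    variants: list of (cds_index, ref_base, alt_base, phased_gt), e.g. (3, "G", "A", "1|0"),
    with 0-based positions in the coding sequence (hypothetical input format).
    """
    haplotypes = [list(ref_cds), list(ref_cds)]
    for pos, ref, alt, gt in variants:
        if ref_cds[pos] != ref:
            raise ValueError(f"REF mismatch at CDS position {pos}")
        # The allele left of '|' goes to the first haplotype, the right one to the second
        for hap, allele in zip(haplotypes, gt.split("|")):
            if allele == "1":
                hap[pos] = alt
    return tuple(translate("".join(hap)) for hap in haplotypes)


def cis_or_trans(gt_a, gt_b):
    """Phase relation of two heterozygous phased genotypes such as '1|0' and '0|1'."""
    return "cis" if gt_a.split("|") == gt_b.split("|") else "trans"
```

For the reference "ATGGCTTGGTAA" (MAW*), a missense change G>A at index 3 on the first haplotype and a nonsense change G>A at index 8 on the second haplotype yield ("MTW*", "MA**"): the variants are in trans, so each haplotype carries a single altered protein.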

GEMPROT offers two modes: the first outputs both sequences for each individual and provides a frequency summary of all haplotypes.

The second mode is intended for larger populations and shows differences in the distribution of gene haplotypes.
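A frequency summary of the kind described for the first mode can be sketched with collections.Counter: each individual contributes two protein haplotypes, and the summary reports how often each distinct sequence occurs among all haplotypes. The input layout (a mapping from individual ID to a pair of protein sequences) and the function name are hypothetical choices for illustration.

```python
from collections import Counter


def haplotype_frequencies(proteins_by_individual):
    """Relative frequency of each distinct protein haplotype among all 2N haplotypes.

    proteins_by_individual: dict mapping an individual ID to a pair of protein sequences.
    """
    counts = Counter(
        protein for pair in proteins_by_individual.values() for protein in pair
    )
    total = sum(counts.values())
    # Most frequent haplotypes first (ties keep the order in which they were first seen)
    return [(protein, n / total) for protein, n in counts.most_common()]
```

For {"ind1": ("MTW*", "MA**"), "ind2": ("MAW*", "MAW*")}, the reference protein MAW* comes first with a frequency of 0.5, followed by the two altered haplotypes at 0.25 each.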


LowKi is an R package for computing Kinship and Fraternity matrices from shallow sequencing data


It takes as input VCF files with 'PL' or 'GP' fields.
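Both fields carry genotype uncertainty, which matters for shallow sequencing. As a reminder of how PL relates to probabilities, the sketch below converts the phred-scaled genotype likelihoods of a diploid biallelic site (ordered 0/0, 0/1, 1/1 as in the VCF specification) into genotype probabilities under a flat prior, then into an expected ALT-allele dosage. The flat prior and function names are our simplifying assumptions; this is not LowKi's internal code.

```python
def pl_to_genotype_probs(pl):
    """Convert VCF PL values (-10*log10 likelihoods of 0/0, 0/1, 1/1) to normalized
    genotype probabilities, assuming a flat prior over genotypes."""
    best = min(pl)  # subtracting the minimum avoids underflow for large PL values
    likelihoods = [10.0 ** (-(p - best) / 10.0) for p in pl]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


def expected_dosage(probs):
    """Expected number of ALT alleles: 0*P(0/0) + 1*P(0/1) + 2*P(1/1)."""
    return sum(g * p for g, p in enumerate(probs))
```

A site without reads (PL = 0,0,0) gets probabilities of 1/3 each and an expected dosage of 1.0, whereas PL = 0,30,300 gives a dosage of about 0.001, i.e. an almost certain homozygous reference genotype.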

To install the package, first install gaston, milorGWAS, Rcpp and RcppEigen from CRAN, then use devtools:



Ravages is an R package to perform rare variant association tests (RVAT) and genetic simulations across the whole genome.


Ravages can perform rare variant association tests (burden and SKAT) on multi-category, binary and continuous phenotypes.
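To illustrate what a burden test computes, here is a minimal sketch for a binary phenotype without covariates: rare-variant genotypes are collapsed into one weighted burden score per individual, and a logistic-regression score test compares the burden between cases and controls. The weights, function names and the restriction to a binary phenotype are our illustrative choices; Ravages' own burden and SKAT implementations, including the multi-category case, are richer than this.

```python
from math import erfc, sqrt


def burden_scores(genotypes, weights):
    """Weighted count of rare alleles per individual.

    genotypes: one list of ALT-allele counts (0, 1, 2) per individual, one entry per variant.
    weights: one weight per variant (e.g. 1.0 for a plain count, or a functional score).
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def burden_score_test(burden, phenotype):
    """Score test of the burden effect in a logistic model (intercept only under H0).

    phenotype: 1 for cases, 0 for controls. Returns (chi2 statistic, p-value with 1 df).
    """
    n = len(burden)
    y_bar = sum(phenotype) / n
    b_bar = sum(burden) / n
    u = sum(b * (y - y_bar) for b, y in zip(burden, phenotype))
    var_u = y_bar * (1 - y_bar) * sum((b - b_bar) ** 2 for b in burden)
    if var_u == 0:
        raise ValueError("no variability in burden or phenotype")
    chi2 = u * u / var_u
    return chi2, erfc(sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
```

With burdens [1, 1, 0, 0] and phenotypes [1, 1, 0, 0], the statistic is 4.0 and the p-value about 0.046. Collapsing variants first is what gives burden tests their power when most rare variants in a gene act in the same direction; SKAT relaxes that assumption.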

These association tests can be applied genome-wide using the RAVA-FIRST approach, which is based on CADD regions. Ravages can also perform genetic simulations based on real data, mimicking the allele frequency spectrum and linkage disequilibrium patterns observed in these data.
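One simple way to simulate genotypes that keep the allele frequency spectrum and linkage disequilibrium of real data is to build each simulated individual from two haplotypes drawn at random from a set of real haplotypes. The sketch below shows only that idea; it is our illustrative assumption, not a description of how Ravages' simulation functions are implemented, and it does not simulate phenotypes.

```python
import random


def simulate_genotypes(reference_haplotypes, n_individuals, seed=None):
    """Simulate genotypes by pairing haplotypes sampled with replacement.

    reference_haplotypes: list of real haplotypes, each a list of 0/1 alleles over the same
    variants. Whole haplotypes are copied, so allele frequencies (in expectation) and LD
    between variants are inherited from the reference data.
    """
    rng = random.Random(seed)
    genotypes = []
    for _ in range(n_individuals):
        h1 = rng.choice(reference_haplotypes)
        h2 = rng.choice(reference_haplotypes)
        genotypes.append([a + b for a, b in zip(h1, h2)])
    return genotypes
```

Because alleles travel together on whole haplotypes, two variants that are in complete LD in the reference set, such as the last two positions of [[0, 1, 1], [1, 0, 0]], remain in complete LD in every simulated individual.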


An integrative pipeline for region-based rare variant association studies including quality control, qualifying variant selection and association tests


RAVAQ is an R package implementing an integrative pipeline that (1) performs a quality control that takes the case-control status into account to ensure data comparability, (2) selects qualifying variants as defined by the user, and (3) performs burden tests per gene.
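The idea of a quality control that accounts for case-control status can be illustrated by computing call rates separately in cases and controls, so that a variant passes only if it is well genotyped in both groups; a strong imbalance in missingness between groups can otherwise create spurious associations. The sketch then keeps qualifying variants according to user-defined criteria and groups them by gene. The thresholds, record fields and criteria are hypothetical examples, not RAVAQ's defaults.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str  # e.g. a VEP consequence term such as "missense_variant"
    maf: float        # minor allele frequency in a reference population
    genotypes: list   # ALT-allele counts per sample; None means a missing call


def call_rate(genotypes, indices):
    calls = [genotypes[i] for i in indices]
    return sum(g is not None for g in calls) / len(calls)


def passes_case_control_qc(variant, cases, controls, min_call_rate=0.9):
    """Require the call-rate threshold in each group separately (hypothetical rule)."""
    return (call_rate(variant.genotypes, cases) >= min_call_rate
            and call_rate(variant.genotypes, controls) >= min_call_rate)


def qualifying_variants(variants, cases, controls, max_maf=0.01,
                        consequences=frozenset({"stop_gained", "missense_variant"})):
    """Variants passing QC and user-defined qualifying criteria, grouped by gene."""
    by_gene = {}
    for v in variants:
        if (passes_case_control_qc(v, cases, controls)
                and v.maf <= max_maf and v.consequence in consequences):
            by_gene.setdefault(v.gene, []).append(v)
    return by_gene
```

A variant called in all cases but missing in all controls has an overall call rate of 50%, yet it fails here because its control call rate is 0%. The per-gene groups are what a burden test would then collapse.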

The pipeline integrates VCFProcessor, a Java tool for handling VCF files. It calls the VCFProcessor functions QC, SplitMultiAllelic, SampleStats, recode and showfields. For more details about VCFProcessor, see:

The pipeline calls the external tool PLINK (version 1.9) to perform the principal component and relatedness analyses. The main PLINK options it uses are --pca, --genome, --hardy, --indep-pairwise, --bmerge, --maf, --extract and --keep. For more details about PLINK v1.9, see

Optionally, the pipeline can annotate the VCF file by calling the external tool VEP. For more details about VEP, see

The RAVAQ package calls Java, Python and PLINK through the R function system(). Python version 2.7 is required. The Python scripts were adapted from TRAPD; see

RAVAQ has been tested on Linux and Mac OS. Future developments include adaptation to Windows.
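The principal component and relatedness analyses that RAVAQ delegates to PLINK 1.9 can be pictured with the following Python analogue of those system() calls. It assembles a typical sequence: LD pruning with --indep-pairwise, then --pca and --genome restricted to the pruned variants with --extract. The file prefixes and parameter values (MAF 0.01, window 50, step 5, r2 0.2, 10 components) are common illustrative choices, not RAVAQ's settings, and plink is assumed to be on the PATH.

```python
import subprocess


def plink_commands(bfile, out_prefix, n_pcs=10):
    """Build PLINK 1.9 command lines for LD pruning, PCA and relatedness (IBD) analyses."""
    pruned = f"{out_prefix}_pruned"
    return [
        # LD pruning on common variants: writes <pruned>.prune.in and <pruned>.prune.out
        ["plink", "--bfile", bfile, "--maf", "0.01",
         "--indep-pairwise", "50", "5", "0.2", "--out", pruned],
        # Principal components on the pruned variant set (.eigenvec / .eigenval)
        ["plink", "--bfile", bfile, "--extract", f"{pruned}.prune.in",
         "--pca", str(n_pcs), "--out", f"{out_prefix}_pca"],
        # Pairwise IBD estimates used to detect related samples (.genome)
        ["plink", "--bfile", bfile, "--extract", f"{pruned}.prune.in",
         "--genome", "--out", f"{out_prefix}_ibd"],
    ]


def run_all(commands):
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if PLINK fails
```

Pruning first keeps long stretches of correlated variants from dominating the principal components and the IBD estimates.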


SURFBAT (SURrogate Family Based Association Test) is a tool to perform an approximation of the transmission-disequilibrium test


Our method hinges on the application of genotype imputation algorithms to match similar haplotypes between the case and control groups.

This allows us to approximate local-ancestry-informed posterior probabilities of the untransmitted parental alleles of each case individual.
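For reference, the classical allelic transmission disequilibrium test (TDT) compares, over heterozygous parents, the number b of transmissions of the tested allele with the number c of transmissions of the other allele: TDT = (b - c)^2 / (b + c), approximately chi-square with 1 degree of freedom. The sketch below computes this statistic when the untransmitted allele is only known as a probability, by plugging expected counts into b and c. This plug-in is our illustrative simplification of the idea described above, not the SURFBAT statistic, and its null distribution is not calibrated here; inputs and names are hypothetical.

```python
def soft_allelic_tdt(transmitted, p_untransmitted_alt):
    """Allelic TDT statistic (b - c)^2 / (b + c) computed from expected counts.

    transmitted: for each transmitted parental allele of the cases, 1 if it is the
        tested allele and 0 otherwise.
    p_untransmitted_alt: for the same parents, probability that the untransmitted allele
        is the tested allele (0.0 or 1.0 reproduces the classical TDT with genotyped parents).
    """
    pairs = list(zip(transmitted, p_untransmitted_alt))
    # b: tested allele transmitted while the untransmitted allele is the other one
    b = sum(1.0 - q for t, q in pairs if t == 1)
    # c: other allele transmitted while the untransmitted allele is the tested one
    c = sum(q for t, q in pairs if t == 0)
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)
```

Homozygous parents contribute nothing to b or c, exactly as in the classical test; with known parental alleles [1, 1, 1, 0] transmitted against [0, 0, 1, 1] untransmitted, b = 2, c = 1 and the statistic is 1/3.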

SURFBAT provides an association test that is inherently robust to fine-scale population stratification and opens up the possibility of efficiently using large imputation reference panels as control groups for association testing.

The method is suitable when the control panel spans the local ancestry spectrum of the case-group population and each control has similar paternal and maternal ancestries.

This is the case for our reference panels, in which all four grandparents of each individual were born in the same geographic area.

In contrast to other association methods that incorporate local-ancestry inference, SURFBAT does not require a set of ancestry groups to be defined, nor does it require local ancestry to be explicitly estimated.


VCFProcessor is a tool to handle VCF files


VCFProcessor is built around two main concepts: functions and filters.

VCFProcessor applies one function at a time. A function is an operation executed on the input VCF file: an analysis, an annotation, a transformation, a formatting operation, etc.

Filters are rules indicating which variants, genotypes or samples to remove or keep before a function is executed.
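The functions-and-filters design can be sketched in Python: filters are predicates applied to each variant, and a single function then runs on the variants that pass all of them. This only illustrates the concept; VCFProcessor is a Java tool with its own command-line interface, the names and record layout below are hypothetical, and genotype- and sample-level filters are left out.

```python
from typing import Callable, Iterable, List

Variant = dict                      # hypothetical record, e.g. {"chrom": "1", "pos": 10, "qual": 50.0}
Filter = Callable[[Variant], bool]  # returns True to keep the variant
Function = Callable[[Iterable[Variant]], object]


def run(variants: Iterable[Variant], function: Function, filters: List[Filter]):
    """Apply every filter first, then execute exactly one function on the kept variants."""
    kept = (v for v in variants if all(f(v) for f in filters))
    return function(kept)


# Example filters and function (all hypothetical)
def min_qual(threshold: float) -> Filter:
    return lambda v: v["qual"] >= threshold


def on_chromosome(chrom: str) -> Filter:
    return lambda v: v["chrom"] == chrom


def count_variants(variants: Iterable[Variant]) -> int:
    return sum(1 for _ in variants)
```

Keeping filters separate from functions means any filter can be combined with any function without either having to know about the other.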