Reference Panel Provider

Choose a data set

  • a VCF file (must be annotated with VEP, including the following annotations: SYMBOL, SOURCE and ALLELE_NUM)

  • a BED file of well-covered positions (the capture kit regions minus the positions with low coverage)
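The first requirement can be checked before configuring the server by inspecting the VCF header: VEP declares its annotation fields in the "Format:" part of the ##INFO=<ID=CSQ,...> line. The class below is a minimal sketch, not part of PrivAS (VepHeaderCheck is a hypothetical name); it assumes VEP wrote its annotations to the default CSQ INFO field, and it reads plain or gzip/bgzip-compressed files.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class VepHeaderCheck {
    // Annotations this page lists as mandatory.
    static final List<String> REQUIRED = Arrays.asList("SYMBOL", "SOURCE", "ALLELE_NUM");

    /** Returns the required fields missing from the CSQ "Format:" list of the header lines. */
    public static List<String> missingFields(List<String> headerLines) {
        for (String line : headerLines) {
            int start = line.indexOf("Format: ");
            if (!line.startsWith("##INFO=<ID=CSQ,") || start < 0) continue;
            String format = line.substring(start + "Format: ".length());
            int end = format.indexOf('"'); // the field list ends with the closing quote
            if (end >= 0) format = format.substring(0, end);
            List<String> present = Arrays.asList(format.split("\\|"));
            List<String> missing = new ArrayList<>();
            for (String field : REQUIRED) {
                if (!present.contains(field)) missing.add(field);
            }
            return missing;
        }
        // No CSQ header with a Format list: the file does not look VEP-annotated.
        return new ArrayList<>(REQUIRED);
    }

    /** Reads the "##" meta-information lines of a plain or gzip/bgzip-compressed VCF. */
    public static List<String> readHeader(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        if (path.endsWith(".gz")) in = new GZIPInputStream(in);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null && line.startsWith("##")) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> missing = missingFields(readHeader(args[0]));
        System.out.println(missing.isEmpty() ? "OK" : "Missing VEP annotations: " + missing);
    }
}
```

Usage: java VepHeaderCheck input.vcf.gz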

The Server configuration file

Each RPP server needs a tab-separated (TSV) configuration file, for instance config.rpp (any name will do), with the following format:

port_number          6666
data_file            Data_Name1:/path/to/file1.vcf.gz:65147:/path/to/coverage1.bed.gz,Data_Name2:/path/to/file2.vcf.gz:81791:/path/to/coverage2.bed.gz
gnomad               Version1:gnomad.v1.bin,Version2:gnomad.v2.bin,...,VersionN:gnomad.vN.bin
rpp_session_dir      /path/to/rpp_working_directory/sessions
rpp_expired_session  /path/to/rpp_working_directory/expired.session
tps_name             The name of the TPS supercomputer
tps_address          supercomputer.domain.com
tps_user             privasuser
tps_launch_command   /path/to/PrivAS.TPS.sh
tps_get_key_command  /path/to/PrivAS.getPublicKey.sh
tps_session_dir      /path/to/tps_working_directory/sessions
whitelist            172.0.0.1,192.168.*.*,172.0.0.0,193.54.252.10-99
blacklist            142.250.178.131,152.199.19.61
connection_log       /path/to/connection.log
max_per_day          5
max_per_week         10
max_per_month        30
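As an illustration of this format, the sketch below reads each line as a key followed by its value. RppConfig is a hypothetical class, not the parser shipped in PrivAS.RPP.jar; it treats the first run of tabs or spaces as the separator, so that values containing spaces (such as tps_name) are kept whole. Whether the real server also accepts spaces in place of tabs is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RppConfig {
    /** Parses "key value" lines; the first run of tabs/spaces separates the key from the value. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue; // skip blank lines
            String[] kv = line.split("[\\t ]+", 2);
            if (kv.length < 2) throw new IllegalArgumentException("No value for key: " + kv[0]);
            config.put(kv[0], kv[1]);
        }
        return config;
    }

    /** Returns a numeric setting such as port_number or max_per_day. */
    public static int getInt(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null) throw new IllegalArgumentException("Missing key: " + key);
        return Integer.parseInt(value);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> config = parse(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        System.out.println("port_number = " + getInt(config, "port_number"));
        System.out.println("tps_address = " + config.get("tps_address"));
    }
}
```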

The keys of this file are:

key                  description
port_number          The port on which the RPP server listens
data_file            Comma-separated list of dataset descriptions (see below)
gnomad               Comma-separated list of GnomAD Version:path pairs
rpp_session_dir      The directory where the RPP stores the files of each session
rpp_expired_session  The file where the RPP lists expired sessions
tps_name             Name and description of the Third Party Server (TPS)
tps_address          Fully qualified domain name (hostname.domain) or IP address of the TPS
tps_user             The SSH username used on the TPS
tps_launch_command   The Unix command executed by tps_user on the TPS to launch an association test
tps_get_key_command  The Unix command executed by tps_user on the TPS to generate a unique RSA key pair for each new session
tps_session_dir      The directory where the TPS stores the files of each session
whitelist            Comma-separated list of IP addresses/ranges that are always allowed to connect to the RPP server
blacklist            Comma-separated list of IP addresses/ranges that are never allowed to connect to the RPP server
connection_log       The file logging all connections to the RPP
max_per_day          Maximum number of connections per day from the same address
max_per_week         Maximum number of connections per week from the same address
max_per_month        Maximum number of connections per month from the same address
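To make the access rules concrete, here is one possible reading of them as a hypothetical AccessPolicy class; it is not the RPP's actual logic. It assumes that * matches any octet and a-b an inclusive range of octet values (as in 192.168.*.* and 193.54.252.10-99), that the blacklist takes precedence over the whitelist, that whitelisted addresses are exempt from the quotas, and that max_per_day/week/month are rolling windows of 1, 7 and 30 days. For brevity it keeps the connection history in memory instead of in connection_log.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AccessPolicy {
    /** True if the IPv4 address matches a pattern such as 192.168.*.* or 193.54.252.10-99 (assumed syntax). */
    public static boolean matches(String pattern, String ip) {
        String[] p = pattern.trim().split("\\.");
        String[] a = ip.split("\\.");
        if (p.length != 4 || a.length != 4) return false;
        for (int i = 0; i < 4; i++) {
            if (p[i].equals("*")) continue; // wildcard: any octet
            int octet = Integer.parseInt(a[i]);
            int dash = p[i].indexOf('-');
            int lo = Integer.parseInt(dash < 0 ? p[i] : p[i].substring(0, dash));
            int hi = dash < 0 ? lo : Integer.parseInt(p[i].substring(dash + 1));
            if (octet < lo || octet > hi) return false; // outside the inclusive range
        }
        return true;
    }

    public static boolean matchesAny(String commaSeparated, String ip) {
        for (String pattern : commaSeparated.split(",")) {
            if (matches(pattern, ip)) return true;
        }
        return false;
    }

    private final String whitelist;
    private final String blacklist;
    private final int maxPerDay, maxPerWeek, maxPerMonth;
    // In-memory history of accepted connections per address (never pruned in this sketch).
    private final Map<String, List<Instant>> history = new HashMap<>();

    public AccessPolicy(String whitelist, String blacklist, int maxPerDay, int maxPerWeek, int maxPerMonth) {
        this.whitelist = whitelist;
        this.blacklist = blacklist;
        this.maxPerDay = maxPerDay;
        this.maxPerWeek = maxPerWeek;
        this.maxPerMonth = maxPerMonth;
    }

    /** Decides whether a connection from ip at time now is accepted, and records it if so. */
    public boolean accept(String ip, Instant now) {
        if (matchesAny(blacklist, ip)) return false; // never allowed
        if (matchesAny(whitelist, ip)) return true;  // always allowed, no quota (assumption)
        List<Instant> past = history.computeIfAbsent(ip, k -> new ArrayList<>());
        if (count(past, now, Duration.ofDays(1)) >= maxPerDay
                || count(past, now, Duration.ofDays(7)) >= maxPerWeek
                || count(past, now, Duration.ofDays(30)) >= maxPerMonth) {
            return false;
        }
        past.add(now);
        return true;
    }

    /** Counts the recorded connections within the rolling window ending at now. */
    private static long count(List<Instant> past, Instant now, Duration window) {
        Instant since = now.minus(window);
        return past.stream().filter(t -> t.isAfter(since)).count();
    }

    public static void main(String[] args) {
        AccessPolicy policy = new AccessPolicy("172.0.0.1,192.168.*.*,172.0.0.0,193.54.252.10-99",
                "142.250.178.131,152.199.19.61", 5, 10, 30);
        System.out.println(policy.accept("193.54.252.42", Instant.now())); // whitelisted range
        System.out.println(policy.accept("152.199.19.61", Instant.now())); // blacklisted address
    }
}
```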

Each dataset description in data_file consists of the following fields, separated by colons (:)

  • Dataset name (as informative as possible, especially with regard to the reference genome)

  • /path/to/file.vcf.gz: the VCF file, annotated with VEP

  • the number of variant sites in the VCF file

  • /path/to/coverage.bed.gz: the BED file defining the well-covered positions (any variant found outside these regions is ignored)
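A minimal parsing sketch for the data_file value, assuming that neither dataset names nor paths contain commas or colons (a path with a colon would break this naive split). DataFileParser and its Dataset class are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class DataFileParser {
    /** One dataset description: name:/path/to/file.vcf.gz:nbVariantSites:/path/to/coverage.bed.gz */
    public static final class Dataset {
        public final String name;
        public final String vcf;
        public final long variantSites;
        public final String bed;

        Dataset(String name, String vcf, long variantSites, String bed) {
            this.name = name;
            this.vcf = vcf;
            this.variantSites = variantSites;
            this.bed = bed;
        }
    }

    /** Splits the value on commas, then each entry on colons (naive: no ':' or ',' inside fields). */
    public static List<Dataset> parse(String value) {
        List<Dataset> datasets = new ArrayList<>();
        for (String entry : value.split(",")) {
            String[] f = entry.trim().split(":");
            if (f.length != 4) {
                throw new IllegalArgumentException("Expected 4 colon-separated fields in: " + entry);
            }
            datasets.add(new Dataset(f[0], f[1], Long.parseLong(f[2]), f[3]));
        }
        return datasets;
    }

    public static void main(String[] args) {
        String value = "Data_Name1:/path/to/file1.vcf.gz:65147:/path/to/coverage1.bed.gz,"
                + "Data_Name2:/path/to/file2.vcf.gz:81791:/path/to/coverage2.bed.gz";
        for (Dataset d : parse(value)) {
            System.out.println(d.name + " -> " + d.vcf + " (" + d.variantSites + " sites), " + d.bed);
        }
    }
}
```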

Run the default QC

To speed up sessions for your Clients, you can run the default QC before launching your server. This has to be done for each of your GnomAD versions. For the Quality Control to work optimally, it is recommended to split multiallelic variants beforehand.

java -jar PrivAS.RPP.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin

input.vcf(.gz)

The input VCF file (must have been annotated with VEP)

GnomADFile.bin

The GnomAD file to use for the frequency annotation
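Since the default QC must be run once per GnomAD version, this step can be scripted from the gnomad entry of the configuration file. RunDefaultQc is a hypothetical helper: it builds one vcf2genotypes command per Version:path pair and runs them in sequence with ProcessBuilder, assuming PrivAS.RPP.jar is in the working directory and java is on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunDefaultQc {
    /** Builds one vcf2genotypes command per "Version:path" entry of the gnomad configuration value. */
    public static List<List<String>> commands(String jar, String vcf, String gnomadValue) {
        List<List<String>> cmds = new ArrayList<>();
        for (String entry : gnomadValue.split(",")) {
            String[] f = entry.trim().split(":", 2);
            if (f.length != 2) throw new IllegalArgumentException("Expected Version:path, got " + entry);
            cmds.add(Arrays.asList("java", "-jar", jar, "vcf2genotypes", vcf, f[1]));
        }
        return cmds;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // args[0]: input VCF, args[1]: value of the gnomad key from the configuration file
        for (List<String> cmd : commands("PrivAS.RPP.jar", args[0], args[1])) {
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            if (exit != 0) throw new IllegalStateException("Failed: " + String.join(" ", cmd));
        }
    }
}
```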

This produces the following files (here for an input file named my_precious_data.vcf(.gz)):

  • QC122186898.my_precious_data.vcf.gz: the VCF file of the variants that PASS the QC

  • QC122186898.my_precious_data.excluded: the list of the variants that FAIL the QC

  • GnomAD.Version.QC122186898.my_precious_data.genotypes.gz: the variants in the genotypes file format, annotated with the frequencies from GnomAD

Note

QC parameter files are named after the hash of their content, so the file containing the default parameters is named QC122186898.

If a Client runs a session with a new combination of QC parameters and GnomAD version, this operation is performed at the beginning of the session, and the resulting files are stored for future use.
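The naming and caching scheme described above can be sketched as follows. The hash function PrivAS uses is not documented here, so String.hashCode is only a stand-in and will not reproduce QC122186898; reading the GnomAD.Version prefix as the GnomAD version name is also an assumption. QcCache is a hypothetical class that derives the file names and decides whether a QC parameters / GnomAD version combination must be computed or can be reused.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class QcCache {
    /** Hypothetical QC id: "QC" + a non-negative hash of the parameter file content (stand-in hash). */
    public static String qcId(String qcParamContent) {
        return "QC" + (qcParamContent.hashCode() & 0x7fffffff);
    }

    public static String passedVcf(String qcId, String dataset) {
        return qcId + "." + dataset + ".vcf.gz";
    }

    public static String excluded(String qcId, String dataset) {
        return qcId + "." + dataset + ".excluded";
    }

    public static String genotypes(String gnomadVersion, String qcId, String dataset) {
        return gnomadVersion + "." + qcId + "." + dataset + ".genotypes.gz";
    }

    /** True if the genotypes file for this combination is not stored yet and must be computed. */
    public static boolean needsComputation(Path dir, String gnomadVersion, String qcId, String dataset) {
        return !Files.exists(dir.resolve(genotypes(gnomadVersion, qcId, dataset)));
    }

    public static void main(String[] args) throws IOException {
        // args: qc.param file, storage directory, GnomAD version name, dataset name
        String params = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String id = qcId(params);
        boolean compute = needsComputation(Paths.get(args[1]), args[2], id, args[3]);
        System.out.println(id + (compute ? ": compute and store" : ": reuse stored files"));
    }
}
```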

PrivAS RPP’s Command Lines

Main Command

Launch the Reference Panel Provider Server

java -jar PrivAS.RPP.jar ConfigFile.rpp

ConfigFile.rpp

RPP Configuration File

Tools

Perform the default Quality Control on a VEP-annotated VCF file and convert the result to a genotypes file

java -jar PrivAS.RPP.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin

input.vcf(.gz)

The input VCF file (must have been annotated with VEP)

GnomADFile.bin

The GnomAD file to use for the frequency annotation

Perform a Quality Control on a VCF file

java -jar PrivAS.RPP.jar vcf2qc input.vcf(.gz) qc.param

input.vcf(.gz)

The input VCF file (must have been annotated with VEP)

qc.param

The file containing the QC parameters to apply

Convert a QCed, VEP-annotated VCF file to a genotypes file

java -jar PrivAS.RPP.jar qc2genotypes vep_annotated_QCed_file.vcf(.gz) GnomADFile.bin

vep_annotated_QCed_file.vcf(.gz)

The input VCF file (must result from a PrivAS QC, and is therefore annotated with VEP)

GnomADFile.bin

The GnomAD file to use for the frequency annotation

Create a binary annotation file from lists of GnomAD (exome/genome) VCF files

java -jar PrivAS.RPP.jar gnomad gnomADVersion listExomeVCFFiles.list listGenomeVCFFiles.list output.bin

gnomADVersion

The name of the GnomAD version

listExomeVCFFiles.list

File listing input GnomAD Exome files (one path per line)

listGenomeVCFFiles.list

File listing input GnomAD Genome files (one path per line)

output.bin

The name of the resulting binary file
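The two .list inputs can be generated from the directories holding the downloaded GnomAD files. GnomadListWriter is a hypothetical helper: it writes the absolute paths of all files whose name ends with a given suffix (for instance .vcf.bgz), one per line, in lexicographic order. Whether the gnomad command expects a particular order is not documented here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GnomadListWriter {
    /** Writes the absolute paths of the files in dir ending with suffix, one per line, sorted. */
    public static List<String> writeList(Path dir, String suffix, Path output) throws IOException {
        List<String> paths;
        try (Stream<Path> files = Files.list(dir)) {
            paths = files.filter(p -> p.getFileName().toString().endsWith(suffix))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted() // lexicographic order
                    .collect(Collectors.toList());
        }
        Files.write(output, paths, StandardCharsets.UTF_8);
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // args: exome directory, genome directory, file suffix (e.g. ".vcf.bgz")
        writeList(Paths.get(args[0]), args[2], Paths.get("listExomeVCFFiles.list"));
        writeList(Paths.get(args[1]), args[2], Paths.get("listGenomeVCFFiles.list"));
    }
}
```

The resulting files can then be passed to the gnomad command shown above.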