Reference Panel Provider¶
Choose a data set¶
a VCF File (must be VEP annotated, with the following annotations: SYMBOL, SOURCE and ALLELE_NUM)
a Bed of well covered positions (capture kit, minus positions with low coverage)
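Before registering a data set, it can help to confirm that the VCF header declares the required VEP annotations. The following is a minimal sketch, not part of PrivAS: it assumes VEP wrote its annotations under the default `CSQ` INFO key, and the function names are hypothetical.

```python
import gzip
import re

# Annotations that the RPP documentation lists as required.
REQUIRED_VEP_FIELDS = {"SYMBOL", "SOURCE", "ALLELE_NUM"}


def read_header(path):
    """Return the header lines (starting with '#') of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    header = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break
            header.append(line.rstrip("\n"))
    return header


def vep_fields_from_header(header_lines):
    """Return the CSQ sub-fields declared in the header, or [] if none is found.

    Assumption: VEP used its default INFO key, CSQ.
    """
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    return []


def missing_vep_fields(header_lines):
    """Return the required annotations that the header does not declare."""
    return REQUIRED_VEP_FIELDS - set(vep_fields_from_header(header_lines))


example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|SOURCE">',
]
print(missing_vep_fields(example_header))  # {'ALLELE_NUM'}
```

For a real file, `missing_vep_fields(read_header("/path/to/file.vcf.gz"))` should return an empty set.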
The Server configuration file¶
Each RPP server needs a TSV configuration file, config.rpp (or any other name), that has the following format:
port_number 6666
data_file Data_Name1:/path/to/file1.vcf.gz:65147:/path/to/coverage1.bed.gz,Data_Name2:/path/to/file2.vcf.gz:81791:/path/to/coverage2.bed.gz
gnomad Version1:gnomad.v1.bin,Version2:gnomad.v2.bin,...,VersionN:gnomad.vN.bin
rpp_session_dir /path/to/rpp_working_directory/sessions
rpp_expired_session /path/to/rpp_working_directory/expired.session."+FileFormat.FILE_EXCLUSION_EXTENSION
tps_name The name of the TPS Super computer
tps_address supercomputer.domain.com
tps_user privasuser
tps_launch_command /path/to/PrivAS.TPS.sh
tps_get_key_command /path/to/PrivAS.getPublicKey.sh
tps_session_dir /path/to/tps_working_directory/sessions
whitelist 172.0.0.1,192.168.*.*,172.0.0.0,193.54.252.10-99
blacklist 142.250.178.131,152.199.19.61
connection_log /path/to/connection.log
max_per_day 5
max_per_week 10
max_per_month 30
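As an illustration of the key/value layout above, a file in this format could be read as follows. This is a sketch for checking a configuration by hand, not the parser used by the RPP server: the function name is hypothetical, and it assumes one key per line, separated from its value by a tab (or other whitespace).

```python
# Keys whose values are expected to be integers, per the example above.
INTEGER_KEYS = {"port_number", "max_per_day", "max_per_week", "max_per_month"}


def parse_rpp_config(text):
    """Parse 'key<TAB>value' lines into a dict.

    The value keeps its internal spaces (e.g. for tps_name); blank lines
    are skipped. Integer keys are converted to int.
    """
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        config[key] = int(value) if key in INTEGER_KEYS else value
    return config


sample = (
    "port_number\t6666\n"
    "tps_name\tThe name of the TPS Super computer\n"
    "tps_address\tsupercomputer.domain.com\n"
    "max_per_day\t5\n"
)
config = parse_rpp_config(sample)
print(config["port_number"], config["tps_name"])
```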
The content of this file is:
key | description |
---|---|
port_number | the port on which the RPP server will listen |
data_file | Comma-separated dataset descriptions |
gnomad | Comma-separated GnomAD version:path pairs |
rpp_session_dir | the directory where RPP will store the files for each session |
rpp_expired_session | the file where RPP will list expired sessions |
tps_name | Name and description of the Third Party Server (TPS) |
tps_address | fully qualified hostname.domain name or IP address for the TPS |
tps_user | the SSH username that will be used on the TPS |
tps_launch_command | the unix command executed by |
tps_get_key_command | the unix command executed by |
tps_session_dir | the directory where TPS will store the files for each session |
whitelist | a list of IP addresses/ranges that are always allowed to connect to the RPP server |
blacklist | a list of IP addresses/ranges that are never allowed to connect to the RPP server |
connection_log | the file logging all the connections to the RPP |
max_per_day | Maximum number of connections per day from the same address |
max_per_week | Maximum number of connections per week from the same address |
max_per_month | Maximum number of connections per month from the same address |
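The whitelist and blacklist examples mix exact addresses, wildcards (`192.168.*.*`) and ranges (`193.54.252.10-99`). The sketch below shows one plausible reading of those patterns, useful for checking which addresses a list would cover. It is an assumption about the syntax, not the RPP server's own matching code: each octet is taken to be a number, `*`, or a `low-high` range, and only IPv4 is handled.

```python
def octet_matches(pattern, octet):
    """Match one octet against a number, '*' or an inclusive 'low-high' range."""
    if pattern == "*":
        return True
    if "-" in pattern:
        low, high = pattern.split("-", 1)
        return int(low) <= octet <= int(high)
    return int(pattern) == octet


def ip_matches(pattern, address):
    """Return True if a dotted IPv4 address matches a dotted pattern."""
    pattern_parts = pattern.strip().split(".")
    address_parts = address.strip().split(".")
    if len(pattern_parts) != 4 or len(address_parts) != 4:
        return False
    return all(
        octet_matches(p, int(a)) for p, a in zip(pattern_parts, address_parts)
    )


def in_list(list_value, address):
    """Return True if the address matches any comma-separated pattern."""
    return any(
        ip_matches(pattern, address)
        for pattern in list_value.split(",")
        if pattern.strip()
    )


whitelist = "172.0.0.1,192.168.*.*,172.0.0.0,193.54.252.10-99"
print(in_list(whitelist, "192.168.4.20"))    # True
print(in_list(whitelist, "193.54.252.100"))  # False
```

How the server resolves an address that appears in both lists is not specified here, so the sketch deliberately stops at matching.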
Each dataset description consists of the following fields, separated by colons (:)
Dataset name (as informative as possible, especially in regard to the reference genome)
/path/to/file.vcf.gz (the VCF file annotated with vep)
number of variant sites in the VCF file
/path/to/coverage.bed.gz (the Bed file defining the well covered positions; any variants found in regions outside this scope will be ignored)
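To make the colon-separated layout concrete, here is a small sketch that splits a `data_file` value into its datasets and counts records in a VCF for the third field. The names are hypothetical, and it assumes that "number of variant sites" means the number of non-header lines in the VCF and that names and paths contain neither colons nor commas.

```python
import gzip
from typing import NamedTuple


class Dataset(NamedTuple):
    name: str
    vcf_path: str
    variant_count: int
    bed_path: str


def parse_data_file(value):
    """Split 'Name:vcf:count:bed,Name:vcf:count:bed' into Dataset tuples."""
    datasets = []
    for entry in value.split(","):
        name, vcf_path, count, bed_path = entry.strip().split(":")
        datasets.append(Dataset(name, vcf_path, int(count), bed_path))
    return datasets


def count_variant_records(lines):
    """Count non-empty lines that are not header lines (starting with '#')."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_variant_sites(vcf_path):
    """Count variant records in a plain or gzipped VCF file."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return count_variant_records(handle)


value = (
    "Data_Name1:/path/to/file1.vcf.gz:65147:/path/to/coverage1.bed.gz,"
    "Data_Name2:/path/to/file2.vcf.gz:81791:/path/to/coverage2.bed.gz"
)
for dataset in parse_data_file(value):
    print(dataset.name, dataset.variant_count)
```

Comparing `count_variant_sites(dataset.vcf_path)` with `dataset.variant_count` is a quick consistency check before starting the server.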
Run the default QC¶
To speed up session times for your Client, you can run the default QC prior to launching your server. This has to be done for each of your GnomAD versions. For the Quality Control to work optimally, it is recommended to split multiallelic variants beforehand.
java -jar PrivAS.RPP.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin
input.vcf(.gz) | The Input VCF file (must have been annotated with vep) |
GnomADFile.bin | The GnomAD file to use for the frequency annotation |
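Since the default QC is run once per GnomAD version, the commands can be generated from the `gnomad` entry of the configuration file. The sketch below is one way to do that; the helper names are hypothetical, and it assumes `java` is on the PATH and that `PrivAS.RPP.jar` is in the current directory.

```python
import subprocess


def parse_gnomad(value):
    """Turn 'Version1:file1.bin,Version2:file2.bin' into {version: path}."""
    versions = {}
    for item in value.split(","):
        name, separator, path = item.strip().partition(":")
        if separator:
            versions[name] = path
    return versions


def default_qc_commands(input_vcf, gnomad_value, jar="PrivAS.RPP.jar"):
    """Build one vcf2genotypes command per GnomAD version (nothing is run)."""
    return {
        version: ["java", "-jar", jar, "vcf2genotypes", input_vcf, bin_path]
        for version, bin_path in parse_gnomad(gnomad_value).items()
    }


def run_default_qc(input_vcf, gnomad_value):
    """Run the default QC for every GnomAD version, stopping on the first failure."""
    for version, command in default_qc_commands(input_vcf, gnomad_value).items():
        print("Running default QC for", version)
        subprocess.run(command, check=True)


commands = default_qc_commands(
    "input.vcf.gz", "Version1:gnomad.v1.bin,Version2:gnomad.v2.bin"
)
for command in commands.values():
    print(" ".join(command))
```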
This will produce the following files:
QC122186898.my_precious_data.vcf.gz | The VCF file of variants that PASS the QC |
QC122186898.my_precious_data.excluded | The list of variants that FAIL the QC |
GnomAD.Version.QC122186898.my_precious_data.genotypes.gz | The variants in the genotypes file format, annotated with the frequencies from GnomAD |
Note
QC Parameters files are named according to the hash of their content, so the file containing the default parameters is named QC122186898.
If a Client runs a session with a new combination of QC Parameters / GnomAD Versions, this operation will be performed at the beginning of the session and the resulting files will be stored for future use.
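The naming pattern above makes it possible to tell whether a given QC / GnomAD combination has already been computed. The following sketch rebuilds the expected file names. It rests on assumptions drawn from the example names only: the base name is the input file name without `.vcf` or `.vcf.gz`, and the genotypes file is prefixed with the GnomAD version name. The hash that produces the QC name is not reproduced here.

```python
from pathlib import Path


def strip_vcf_suffix(filename):
    """Remove a trailing '.vcf.gz' or '.vcf' from a file name."""
    for suffix in (".vcf.gz", ".vcf"):
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return filename


def expected_qc_outputs(input_vcf, qc_name, gnomad_version):
    """Return the file names expected for one QC / GnomAD combination."""
    base = strip_vcf_suffix(Path(input_vcf).name)
    return {
        "passed": f"{qc_name}.{base}.vcf.gz",
        "excluded": f"{qc_name}.{base}.excluded",
        "genotypes": f"{gnomad_version}.{qc_name}.{base}.genotypes.gz",
    }


def qc_already_done(directory, input_vcf, qc_name, gnomad_version):
    """Return True if all three expected files exist in the given directory."""
    names = expected_qc_outputs(input_vcf, qc_name, gnomad_version)
    return all((Path(directory) / name).exists() for name in names.values())


outputs = expected_qc_outputs(
    "/data/my_precious_data.vcf.gz", "QC122186898", "GnomAD.Version"
)
print(outputs["genotypes"])
# GnomAD.Version.QC122186898.my_precious_data.genotypes.gz
```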
PrivAS RPP’s Command Lines¶
Main Command¶
Launch the Reference Panel Provider Server¶
java -jar PrivAS.RPP.jar ConfigFile.rpp
ConfigFile.rpp | RPP Configuration File |
Tools¶
Perform the DEFAULT Quality Control on a VEP Annotated VCF file and convert the result to a genotypes file¶
java -jar PrivAS.RPP.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin
input.vcf(.gz) | The Input VCF file (must have been annotated with vep) |
GnomADFile.bin | The GnomAD file to use for the frequency annotation |
Perform a Quality Control on a VCF file¶
java -jar PrivAS.RPP.jar vcf2qc input.vcf(.gz) qc.param
input.vcf(.gz) | The Input VCF file (must have been annotated with vep) |
qc.param | The file containing the QC parameters to apply |
Convert a QCed VEP Annotated VCF file to a genotypes file¶
java -jar PrivAS.RPP.jar qc2genotypes vep_annotated_QCed_file.vcf(.gz) GnomADFile.bin
vep_annotated_QCed_file.vcf(.gz) | The input VCF File (must result from a PrivAS QC and thus is annotated with vep) |
GnomADFile.bin | The GnomAD file to use for the frequency annotation |
Create an annotation binary file from lists of GnomAD (exome/genome) VCF files¶
java -jar PrivAS.RPP.jar gnomad gnomADVersion listExomeVCFFiles.list listGenomeVCFFiles.list output.bin
gnomADVersion | The name of the GnomAD Version |
listExomeVCFFiles.list | File listing input GnomAD Exome files (one path per line) |
listGenomeVCFFiles.list | File listing input GnomAD Genome files (one path per line) |
output.bin | The name of the resulting binary file |
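The two list files are plain text with one path per line, so they are easy to generate. The sketch below writes such a file and assembles the `gnomad` command; the helper names and the example paths are hypothetical, and the command is only built, not executed.

```python
from pathlib import Path


def list_file_text(vcf_paths):
    """Return the content of a list file: one path per line."""
    return "".join(f"{path}\n" for path in vcf_paths)


def write_list_file(list_path, vcf_paths):
    """Write the given VCF paths to list_path, one per line."""
    Path(list_path).write_text(list_file_text(vcf_paths))


def gnomad_command(version, exome_list, genome_list, output_bin,
                   jar="PrivAS.RPP.jar"):
    """Build the command that creates the GnomAD annotation binary file."""
    return ["java", "-jar", jar, "gnomad",
            version, exome_list, genome_list, output_bin]


# Hypothetical input files, for illustration only.
exome_vcfs = ["/gnomad/exomes.chr1.vcf.gz", "/gnomad/exomes.chr2.vcf.gz"]
print(list_file_text(exome_vcfs), end="")
print(" ".join(gnomad_command(
    "Version1", "listExomeVCFFiles.list", "listGenomeVCFFiles.list", "output.bin"
)))
```

The resulting command can be passed to `subprocess.run(command, check=True)` once both list files have been written.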