Reference Panel Provider
************************

Choose a data set
=================

Each data set consists of:

- a VCF file (must be VEP annotated, with the following annotations: :code:`SYMBOL`, :code:`SOURCE` and :code:`ALLELE_NUM`)
- a BED file of well-covered positions (the capture kit, minus positions with low coverage)

The Server configuration file
=============================

Each RPP server needs a TSV (tab-separated) configuration file :code:`config.rpp` (or any other name), with one key per line followed by its value:

.. code-block:: text

   port_number	6666
   data_file	Data_Name1:/path/to/file1.vcf.gz:65147:/path/to/coverage1.bed.gz,Data_Name2:/path/to/file2.vcf.gz:81791:/path/to/coverage2.bed.gz
   gnomad	Version1:gnomad.v1.bin,Version2:gnomad.v2.bin,...,VersionN:gnomad.vN.bin
   rpp_session_dir	/path/to/rpp_working_directory/sessions
   rpp_expired_session	/path/to/rpp_working_directory/expired.session."+FileFormat.FILE_EXCLUSION_EXTENSION
   tps_name	The name of the TPS Super computer
   tps_address	supercomputer.domain.com
   tps_user	privasuser
   tps_launch_command	/path/to/PrivAS.TPS.sh
   tps_get_key_command	/path/to/PrivAS.getPublicKey.sh
   tps_session_dir	/path/to/tps_working_directory/sessions
   whitelist	172.0.0.1,192.168.*.*,172.0.0.0,193.54.252.10-99
   blacklist	142.250.178.131,152.199.19.61
   connection_log	/path/to/connection.log
   max_per_day	5
   max_per_week	10
   max_per_month	30

The keys of this file are:

=========================== ==============================================================================================================
key                         description
=========================== ==============================================================================================================
:code:`port_number`         Port on which the RPP server listens
:code:`data_file`           Comma-separated list of dataset descriptions
:code:`gnomad`              Comma-separated list of GnomAD version:path pairs
:code:`rpp_session_dir`     Directory where the RPP stores the files for each session
:code:`rpp_expired_session` File where the RPP lists expired sessions
:code:`tps_name`            Name and description of the Third Party Server (TPS)
:code:`tps_address`         Fully qualified domain name or IP address of the TPS
:code:`tps_user`            SSH username used on the TPS
:code:`tps_launch_command`  Unix command executed by :code:`tps_user` on the TPS to launch an Association Test
:code:`tps_get_key_command` Unix command executed by :code:`tps_user` on the TPS to generate a unique RSA key pair for each new session
:code:`tps_session_dir`     Directory where the TPS stores the files for each session
:code:`whitelist`           List of IP addresses/ranges that are always allowed to connect to the RPP server
:code:`blacklist`           List of IP addresses/ranges that are never allowed to connect to the RPP server
:code:`connection_log`      File logging all connections to the RPP
:code:`max_per_day`         Maximum number of connections per day from the same address
:code:`max_per_week`        Maximum number of connections per week from the same address
:code:`max_per_month`       Maximum number of connections per month from the same address
=========================== ==============================================================================================================

Each dataset description consists of the following fields, separated by colons (:code:`:`):

- Dataset name (as informative as possible, especially regarding the reference genome)
- :code:`/path/to/file.vcf.gz`: the VCF file annotated with VEP
- Number of variant sites in the VCF file
- :code:`/path/to/coverage.bed.gz`: the BED file defining the well-covered positions (any variant found outside these regions will be ignored)

Run the default QC
==================

To speed up session times for your Clients, you can run the default QC prior to launching your server. This has to be done for each of your GnomAD versions.

For the Quality Control to work optimally, it is recommended to split multiallelic variants.

.. code-block:: bash

   java -jar PrivAS.RPP.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin

============== ======================================================
input.vcf(.gz) The input VCF file (must have been annotated with VEP)
GnomADFile.bin The GnomAD file to use for the frequency annotation
============== ======================================================

This will produce the following files:

- :code:`QC122186898.my_precious_data.vcf.gz` *The VCF file of variants that PASS the QC*
- :code:`QC122186898.my_precious_data.excluded` *The list of variants that FAIL the QC*
- :code:`GnomAD.Version.QC122186898.my_precious_data.genotypes.gz` *The variants in the genotypes file format, annotated with the frequencies from GnomAD*

.. note::

   QC parameter files are named according to the hash of their content, so the file containing the default parameters is named :code:`QC122186898`.

If a Client runs a session with a new combination of QC parameters and GnomAD version, this operation is performed at the beginning of the session and the resulting files are stored for future use.

PrivAS RPP's Command Lines
==========================

Main Command
------------

Launch the Reference Panel Provider Server
..........................................

.. code-block:: bash

   java -jar PrivAS.RPP.jar ConfigFile.rpp

+----------------+------------------------+
| ConfigFile.rpp | RPP Configuration File |
+----------------+------------------------+

Tools
-----

Perform DEFAULT Quality Control on a VEP Annotated VCF file and convert the result to a genotypes file
......................................................................................................

.. code-block:: bash

   java -jar PrivAS.RPP.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin

+----------------+--------------------------------------------------------+
| input.vcf(.gz) | The input VCF file (must have been annotated with VEP) |
+----------------+--------------------------------------------------------+
| GnomADFile.bin | The GnomAD file to use for the frequency annotation    |
+----------------+--------------------------------------------------------+

Perform a Quality Control on a VCF file
.......................................

.. code-block:: bash

   java -jar PrivAS.RPP.jar vcf2qc input.vcf(.gz) qc.param

+----------------+--------------------------------------------------------+
| input.vcf(.gz) | The input VCF file (must have been annotated with VEP) |
+----------------+--------------------------------------------------------+
| qc.param       | The file containing the QC parameters to apply         |
+----------------+--------------------------------------------------------+

Convert a QCed VEP Annotated VCF file to a genotypes file
.........................................................

.. code-block:: bash

   java -jar PrivAS.RPP.jar qc2genotypes vep_annotated_QCed_file.vcf(.gz) GnomADFile.bin

+----------------------------------+----------------------------------------------------------------------------------+
| vep_annotated_QCed_file.vcf(.gz) | The input VCF file (must result from a PrivAS QC and thus is annotated with VEP) |
+----------------------------------+----------------------------------------------------------------------------------+
| GnomADFile.bin                   | The GnomAD file to use for the frequency annotation                              |
+----------------------------------+----------------------------------------------------------------------------------+

Create an annotation binary file from lists of GnomAD (exome/genome) VCF files
..............................................................................

.. code-block:: bash

   java -jar PrivAS.RPP.jar gnomad gnomADVersion listExomeVCFFiles.list listGenomeVCFFiles.list output.bin

+-------------------------+------------------------------------------------------------+
| gnomADVersion           | The name of the GnomAD version                             |
+-------------------------+------------------------------------------------------------+
| listExomeVCFFiles.list  | File listing input GnomAD Exome files (one path per line)  |
+-------------------------+------------------------------------------------------------+
| listGenomeVCFFiles.list | File listing input GnomAD Genome files (one path per line) |
+-------------------------+------------------------------------------------------------+
| output.bin              | The name of the resulting binary file                      |
+-------------------------+------------------------------------------------------------+
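
The two :code:`.list` arguments of the :code:`gnomad` command are plain text files containing one path per line. As a minimal sketch, they could be generated with a small shell helper. The function name :code:`make_vcf_list`, the directory layout, and the :code:`*.vcf.gz` / :code:`*.vcf.bgz` file patterns are assumptions made for illustration; they are not part of PrivAS.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of PrivAS): write the paths of all
# compressed VCF files found directly inside a directory to a list
# file, one path per line, sorted by name.
#   $1: directory containing the GnomAD VCF files
#   $2: list file to create
make_vcf_list() {
    local dir="$1" out="$2"
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.vcf.gz' -o -name '*.vcf.bgz' \) | sort > "$out"
}

# Example usage (paths are placeholders):
#   make_vcf_list /path/to/gnomad/exomes  listExomeVCFFiles.list
#   make_vcf_list /path/to/gnomad/genomes listGenomeVCFFiles.list
#   java -jar PrivAS.RPP.jar gnomad gnomADVersion \
#       listExomeVCFFiles.list listGenomeVCFFiles.list output.bin
```

Passing absolute directories keeps the listed paths valid regardless of the directory from which the :code:`gnomad` command is later run.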