Client

Preamble

For the quality control to work optimally, it is recommended to split multiallelic variants. To compare similar variants during the Association Tests, input VCF files need to be annotated with VEP (to add the consequences on genes and the GnomAD frequencies). Once VEP is installed, the command line to do this is:

/path/to/vep --cache --merged --offline --dir [/path/to/cache] --fork 24 --buffer_size 25000 --species homo_sapiens --assembly GRCh37 --use_given_ref --check_existing --allele_number --symbol --af_gnomad --vcf -i input.vcf.gz -o annotated.vcf
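The recommendation above to split multiallelic variants can be illustrated with a minimal Python sketch. It is not part of PrivAS: the function names are hypothetical, and it makes simplifying assumptions (GT is the first FORMAT key, alleles other than the one kept are recoded as missing, and per-allele INFO or FORMAT values such as AD are left untouched). In practice a dedicated tool such as `bcftools norm -m -any` is likely the safer choice, since it also handles those per-allele fields.

```python
import re


def recode_genotype(genotype, allele_index):
    """Recode a GT string for the record that keeps only ALT number allele_index."""
    tokens = re.split(r"([/|])", genotype)  # the capture group keeps the separators
    recoded = []
    for token in tokens:
        if token in ("/", "|", "."):
            recoded.append(token)
        elif token == "0":
            recoded.append("0")
        elif int(token) == allele_index:
            recoded.append("1")
        else:
            # Assumption of this sketch: other ALT alleles become missing.
            recoded.append(".")
    return "".join(recoded)


def split_multiallelic(line):
    """Return one tab-separated VCF record per ALT allele of the input record."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]
    records = []
    for allele_index, alt in enumerate(alts, start=1):
        new_fields = fields[:]
        new_fields[4] = alt
        # Sample columns start at index 9; GT is assumed to be the first key.
        for column in range(9, len(fields)):
            parts = fields[column].split(":")
            parts[0] = recode_genotype(parts[0], allele_index)
            new_fields[column] = ":".join(parts)
        records.append("\t".join(new_fields))
    return records
```

For a record with `ALT` equal to `C,G`, this returns two records, one for `C` and one for `G`, each with biallelic genotypes.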

Launch the Client GUI

java -jar PrivAS.Client.jar [/path/to/data/directory]

Performing Association Tests with PrivAS

PrivAS Main window

1. Load Variants

Load your annotated VCF file, perform a Quality Control on the variants, and annotate them with GnomAD frequencies [File/Load VCF, Apply QC and Annotate...]. Alternatively, if you have already performed this operation in a previous session and want the same QC parameters, you can simply load the resulting genotypes file, which is much quicker [File/Load Annotated QCed Genotypes...].

Load menu
  • Choose your input VCF file

  • Choose a previous QC Parameters file…

  • Choose a GnomAD File

Choose VCF / QC parameters

…or create a new QC Parameters file

Set QC parameters

The variants are Loaded (as seen in the [Genotype Filename] field)

Variants Loaded

2. Connect to an RPP Server

Connect to a server [Server/Connect to RPP Server...]

Connect menu
  • Fill in the address

  • Fill in the port number (default values point to a Server providing the FrEx datasets)

Connection dialog

You are connected (as seen in the [RPP Server] field)

Connected
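If the connection does not succeed, it may help to check that the address and port entered in the dialog are reachable from your machine. The following Python sketch is not part of PrivAS and its function name is hypothetical. It only tests that a TCP connection can be opened, and says nothing about whether a PrivAS RPP Server is actually answering on that port.

```python
import socket


def is_reachable(address, port, timeout=3.0):
    """Return True if a TCP connection to (address, port) can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Call it with the same address and port as in the connection dialog, for example `is_reachable("rpp.example.org", 6666)`, where both values are placeholders.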

3. Perform Association Tests

Start a Session : [Server/Start new Session...]

Start menu

Choose:

  • one of the provided datasets (your data and RPP’s data must share the same Reference Genome)

  • the same GnomAD Version as the one you annotated your data with

  • the least severe VEP Consequence

  • the maximal frequency in GnomAD

  • a GnomAD subpopulation (or None to disable)

  • the maximal frequency in the GnomAD subpopulation

  • whether to limit the tests to SNVs (as INDEL calling is often less reliable)

  • the Bed file defining the well covered positions (any variants found outside those regions will be ignored)

  • the QC Parameters file used to extract the Client’s variants, so that the RPP’s variants will be filtered using the same criteria (should be automatically filled in)

  • the file listing the variants excluded by the QC (should be automatically filled in)

  • the Association Tests Algorithm (only WSS is present at this time) and its parameters

Set Variants selection criteria and Association Tests parameters
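To make the effect of the selection criteria listed above more concrete, here is a minimal Python sketch of how they might combine. It is an illustration only, not the PrivAS implementation: the dictionary keys, the function names, and the short severity list (a subset of VEP consequence terms, ordered from most to least severe) are assumptions of this example, and the frequencies are assumed to already be minor allele frequencies.

```python
# Illustrative subset of VEP consequence terms, from most to least severe.
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]


def in_bed_regions(chrom, pos, regions):
    """BED intervals are 0-based and half-open; VCF positions are 1-based."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


def keep_variant(variant, criteria):
    """Return True if the variant passes every selection criterion."""
    rank = SEVERITY.index
    # A lower rank means a more severe consequence.
    if rank(variant["consequence"]) > rank(criteria["least_severe_consequence"]):
        return False
    if variant["gnomad_af"] > criteria["max_af"]:
        return False
    subpop = criteria.get("subpopulation")  # None disables the subpopulation filter
    if subpop is not None and variant["subpop_af"][subpop] > criteria["max_subpop_af"]:
        return False
    is_snv = len(variant["ref"]) == 1 and len(variant["alt"]) == 1
    if criteria["snvs_only"] and not is_snv:
        return False
    return in_bed_regions(variant["chrom"], variant["pos"], criteria["regions"])
```

In this sketch, a missense SNV with a GnomAD frequency of 0.001 passes when the thresholds are `missense_variant` and 0.01, provided it lies inside a listed region. A synonymous variant at the same position is rejected.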

When prompted, save your session (this will allow you to keep track of the parameters you have selected, and to reconnect to the RPP if you have been disconnected).

Session is created

4. Follow your Association Tests progress

Follow the progress of the session in the [Last Known Status] Bar for the RPP server and in the [Third Party Server Log] window

Third Party Log Window

Save your results (when prompted)

5. Visualize your results

Results Visualization

Save the visualization (Table / Manhattan Plot) through the [Export] menu

The Main Window

Description of the information shown in the Main Window

Field

Description

RPP Server

Reference Panel Provider (RPP) Server

The address and port of the RPP Server. The Color indicates the state of the server :

  • Grey : Unknown

  • Green : Up

  • Red : Down

Third Party Server

Third Party Server (TPS)

Name of the Server that will perform the actual calculations

Dataset

Name of the reference Dataset

GnomAD Version

Version of GnomAD used by the Reference Panel Provider

Max. MAF (GnomAD)

Maximum Minor Allele Frequency Threshold

When selecting variants, the Client and the Reference Panel Provider will only keep variants with MAF below or equal to this threshold.

Max. MAF (GnomAD Subpopulation)

Same as above, but for frequencies in Selected Subpopulation

GnomAD Subpopulation

Selected subpopulation on GnomAD

Session ID

Session ID

Uniquely identifies your work session for

  • the Client

  • the Reference Panel Provider

  • the Third-Party Server

Least Severe Consequence

Least Severe Consequence

When selecting variants, the Client and the Reference Panel Provider will only keep variants whose Consequence is at least as severe as this threshold.

AES Key

AES Key

Shared between

  • the Client

  • the Third-Party Server

Data exchanged are encrypted/decrypted using this key.

Thus the Reference Panel Provider (that serves as a bridge) cannot read these data.

Limit variants to SNVs ?

Limit To SNVs ?

When selecting variants, the Client and the Reference Panel Provider will only keep variants that are SNVs.

Algorithm Parameters

Algorithm Parameters

The algorithm and parameters that will be used by the Third-Party Server.

Genotype Filename

The Genotype File that was/will be used to extract/hash the data matching selection criteria.

GnomAD filename

GnomAD binary file

Bed of well covered position

Bed file of well covered positions

Hash Key

Hash Key

Shared between

  • the Client

  • the Reference Panel Provider

This key will be used to hash gene names and variant information, so that the Third-Party Server can compare and compute on the data without being able to read data from either the Client or the Reference Panel Provider.

Public RSA Key

Public RSA Key

The Reference Panel Provider uses this key to encrypt the Hash Key, so that it is not legible on the network.

Private RSA Key

Private RSA Key

The Client uses this key to decrypt the Hash Key.

Only the Client knows this key.

Third Party Public Key

Third Party Public Key

This key is used to encrypt your AES key and share it with the Third-Party Server.

This encryption prevents the Reference Panel Provider from reading the AES key.

Last Known Status

Last Known Status

The last known message sent by the Reference Panel Provider.

Application Log

Session Log.

  • Messages are in black.

  • Successful operations are in green

  • Errors are in red
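The role of the Hash Key described above can be illustrated with a keyed hash. The Python sketch below uses HMAC-SHA256 from the standard library as a stand-in. The hashing scheme actually used by PrivAS is not specified here, so the algorithm, the function name, and the `chrom:pos:ref:alt` string format are all assumptions of this example.

```python
import hashlib
import hmac


def keyed_hash(hash_key: bytes, value: str) -> str:
    """Return the hex HMAC-SHA256 digest of value under hash_key."""
    return hmac.new(hash_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# The Client and the Reference Panel Provider share the same key
# (placeholder value), so identical variants yield identical digests.
shared_key = b"example-shared-hash-key"
client_digest = keyed_hash(shared_key, "1:12345:A:T")
rpp_digest = keyed_hash(shared_key, "1:12345:A:T")

# A party comparing digests can tell that the two variants match
# without knowing the key or the underlying variant.
variants_match = hmac.compare_digest(client_digest, rpp_digest)
```

Because the digest depends on the key, a server that only sees digests cannot recompute them from candidate gene names or variants unless it also holds the key.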

PrivAS Client’s Command Lines

Main Command

Launch the Client GUI

java -jar PrivAS.Client.jar [Directory]

[Directory]

initial working directory (Optional, default is current directory)

Tools

Perform DEFAULT Quality Control on a VEP Annotated VCF file and convert the result to genotype file

java -jar PrivAS.Client.jar vcf2genotypes input.vcf(.gz) GnomADFile.bin

input.vcf(.gz)

The Input VCF file (must have been annotated with vep)

GnomADFile.bin

The GnomAD file to use for the frequency annotation

Perform a Quality Control on a VCF file

java -jar PrivAS.Client.jar vcf2qc input.vcf(.gz) qc.param

input.vcf(.gz)

The Input VCF file (must have been annotated with vep)

qc.param

The file containing the QC parameters to apply

Convert a QCed VEP Annotated VCF file to a genotypes file

java -jar PrivAS.Client.jar qc2genotypes vep_annotated_QCed_file.vcf(.gz) GnomADFile.bin

vep_annotated_QCed_file.vcf(.gz)

The input VCF File (must result from a PrivAS QC and thus is annotated with vep)

GnomADFile.bin

The GnomAD file to use for the frequency annotation

Create an annotation binary file from lists of GnomAD (exome/genome) VCF files

java -jar PrivAS.Client.jar gnomad gnomADVersion listExomeVCFFiles.list listGenomeVCFFiles.list output.bin

gnomADVersion

The name of the GnomAD Version

listExomeVCFFiles.list

File listing input GnomAD Exome files (one path per line)

listGenomeVCFFiles.list

File listing input GnomAD Genome files (one path per line)

output.bin

The name of the resulting binary file
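As a convenience, the two list files (one path per line) and the matching `gnomad` command can be prepared with a short script. The Python sketch below is only an illustration: the helper names are hypothetical, and the jar location and all file names are placeholders to adapt to your installation. The argument order follows the command line documented above.

```python
from pathlib import Path


def write_list_file(vcf_paths, list_path):
    """Write one VCF path per line, as expected for the .list files."""
    Path(list_path).write_text("".join(f"{path}\n" for path in vcf_paths))


def gnomad_command(version, exome_list, genome_list, output_bin,
                   jar="PrivAS.Client.jar"):
    """Build the argument list for the documented 'gnomad' tool."""
    return [
        "java", "-jar", jar, "gnomad",
        version, str(exome_list), str(genome_list), str(output_bin),
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)` once the list files have been written.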