tags:
- ngs
- fastq
type:
- course
class: m2ggb
subject:
- bioinformatics
Year: 2023
1 FastQ Files
contact: ludwig@univ-brest.fr
Copy the archive containing the data.
cd
cp /DATA/bioinfo/data.tar ~
tar xvf data.tar
This archive contains (among other things) the FastQ files for 3 individuals.
Paired-end sequencing implies that each individual has 2 FastQ files.
The .gz extension indicates that the file has been compressed.
The files are:
child.R1.fastq.gz
child.R2.fastq.gz
father.R1.fastq.gz
father.R2.fastq.gz
mother.R1.fastq.gz
mother.R2.fastq.gz
FastQ files are human-readable text files. They contain the reads coming from the sequencer. Each read is composed of 4 lines:
Line 1: the @ symbol, followed by a sequence identifier.
Line 2: the sequence.
Line 3: the + symbol.
Line 4: the quality scores, one character per base.
Count the lines in each file:
zcat DATA/child.R1.fastq.gz | wc -l
zcat DATA/child.R2.fastq.gz | wc -l
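Since each read takes exactly 4 lines, the number of reads is the line count divided by 4. The sketch below illustrates this on a tiny made-up FastQ file (the file name `toy.R1.fastq.gz` and its 3 reads are hypothetical, not part of the course data):

```shell
# Build a tiny gzip-compressed FastQ file with 3 reads (hypothetical toy data).
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGACCAA' '+' 'IIIIHHHH' \
  '@read3' 'GGGCCCAT' '+' 'JJJJIIII' \
  | gzip -c > "$tmpdir/toy.R1.fastq.gz"

# Count the lines, then divide by 4 to get the number of reads.
n_lines=$(zcat "$tmpdir/toy.R1.fastq.gz" | wc -l)
n_reads=$((n_lines / 4))
echo "reads: $n_reads"

# Remove the temporary directory.
rm -r "$tmpdir"
```

The same arithmetic applies to the real files: divide the number printed by `wc -l` by 4.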
FastQ files are a succession of 4-line groups:
id1
seq1
+
qual1
id2
seq2
+
qual2
...
idn
seqn
+
qualn
By using paste - - - - we can group the data into 4 tab-separated columns, one read per line:
id1 seq1 + qual1
id2 seq2 + qual2
...
idn seqn + qualn
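To see the effect of `paste - - - -` without touching the real data, here is a minimal sketch on two made-up reads (the identifiers and sequences are hypothetical):

```shell
# Two toy reads, 4 lines each (hypothetical data).
fastq=$(printf '%s\n' '@id1' 'ACGT' '+' 'IIII' '@id2' 'TTGA' '+' 'HHHH')

# paste - - - - reads 4 lines at a time from standard input
# and joins them with tabs: one read per output line.
table=$(printf '%s\n' "$fastq" | paste - - - -)
printf '%s\n' "$table"

# Check the number of tab-separated columns on every line.
n_cols=$(printf '%s\n' "$table" | awk -F '\t' '{ print NF }' | sort -u)
echo "columns per line: $n_cols"
```

Each `-` stands for "take one line from standard input", so four of them consume one complete read per output line.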
The command cut -f can then extract the selected column:
zcat DATA/child.R1.fastq.gz | paste - - - - | cut -f 1 | head
zcat DATA/child.R2.fastq.gz | paste - - - - | cut -f 1 | head
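Looking at the identifiers of R1 and R2 side by side suggests a simple consistency check: both files of a pair should list the same reads in the same order. The sketch below assumes Illumina-style identifiers where the read name comes before the first space; the toy identifiers are hypothetical and the real files may use a different layout.

```shell
# Toy R1 and R2 content (hypothetical identifiers; the part after the space differs).
r1=$(printf '%s\n' '@read1 1:N:0' 'ACGT' '+' 'IIII' '@read2 1:N:0' 'TTGA' '+' 'HHHH')
r2=$(printf '%s\n' '@read1 2:N:0' 'CCGT' '+' 'IIII' '@read2 2:N:0' 'AAGA' '+' 'HHHH')

# Keep column 1 (the identifier line), then only the name before the first space.
ids1=$(printf '%s\n' "$r1" | paste - - - - | cut -f 1 | cut -d ' ' -f 1)
ids2=$(printf '%s\n' "$r2" | paste - - - - | cut -f 1 | cut -d ' ' -f 1)

# The two lists must be identical for the files to be properly paired.
if [ "$ids1" = "$ids2" ]; then paired=yes; else paired=no; fi
echo "R1 and R2 identifiers in the same order: $paired"
```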
zcat DATA/child.R1.fastq.gz | paste - - - - | cut -f 2 | head
zcat DATA/child.R1.fastq.gz | paste - - - - | cut -f 2 | head -1 | tr -d '\n'| wc -c
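The command above measures the length of the first read only. As a possible extension, the sketch below counts how many reads have each length, using `awk` to print the length of every sequence (the toy reads are hypothetical):

```shell
# Three toy reads: two of length 8, one of length 6 (hypothetical data).
fastq=$(printf '%s\n' \
  '@r1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@r2' 'TTGACCAA' '+' 'IIIIHHHH' \
  '@r3' 'GGGCCC' '+' 'JJJJII')

# Column 2 is the sequence; awk prints its length; uniq -c counts each length.
# The last awk only normalises the spacing of the uniq -c output.
length_counts=$(printf '%s\n' "$fastq" | paste - - - - | cut -f 2 \
  | awk '{ print length($0) }' | sort -n | uniq -c | awk '{ print $1, $2 }')
printf '%s\n' "$length_counts"
```

Each output line reads "number of reads, then read length".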
zcat DATA/child.R1.fastq.gz | paste - - - - | cut -f 2 | head -100000 | grep -o . | sort | uniq -c
Here, grep -o . means "print each character on its own line", and head -100000 limits the analysis to the first 100,000 reads for faster processing.
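The base-counting pipeline can be tried on a toy input where the expected counts are easy to check by hand. This is a minimal sketch; the two reads are hypothetical, and `LC_ALL=C` is only there to make the sort order predictable.

```shell
# Two toy reads: ACGTN and AACG (hypothetical data).
fastq=$(printf '%s\n' '@r1' 'ACGTN' '+' 'IIIII' '@r2' 'AACG' '+' 'IIII')

# grep -o . prints one character per line; sort groups identical characters;
# uniq -c counts them. The last awk rewrites "count base" as "base=count".
base_counts=$(printf '%s\n' "$fastq" | paste - - - - | cut -f 2 \
  | grep -o . | LC_ALL=C sort | uniq -c \
  | awk '{ print $2 "=" $1 }' | paste -s -d ' ' -)
echo "$base_counts"
```

An N in the output stands for a base the sequencer could not call.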
zcat DATA/child.R1.fastq.gz | paste - - - - | cut -f 4 | head -100000 | grep -o . | sort | uniq | paste - - - - - -
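Quality scores are encoded as single ASCII characters, from ! (ASCII code 33) to ~ (ASCII code 126), which covers scores from 0 to 93. Example: J ➡ ASCII code 74.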
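Assuming the common Phred+33 convention (the score is the ASCII code minus 33, which is consistent with ! being the lowest character), a quality string can be decoded as sketched below. The quality string `!5IJ~` is a made-up example.

```shell
# Hypothetical quality string; assumption: Phred+33 encoding.
qual='!5IJ~'

# printf '%d' "'X" prints the numeric code of character X;
# subtracting 33 gives the quality score under Phred+33.
scores=$(printf '%s\n' "$qual" | grep -o . | while IFS= read -r c; do
  code=$(printf '%d' "'$c")
  echo $((code - 33))
done | paste -s -d ' ' -)
echo "$scores"
```

With this convention, J (ASCII code 74) corresponds to a score of 41.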
Quality control of the FastQ files produced by the sequencer
fastqc
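File, Open: DATA/child.R1.fastq.gz
Usually, we don't want to open each FastQ file one by one in the graphical interface. fastqc can process all the files from the command line, and multiqc gathers the results into a single report.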
mkdir report
fastqc DATA/*fastq.gz --out report
multiqc report/*.zip
Download the multiqc_report.html file:
On Windows: transfer the multiqc_report.html file to Windows.
On Mac: open another terminal and type scp -P 4444 XXXXX@methionine:multiqc_report.html . (where XXXXX is your login).
Open the file.
Next Step: 2 Alignments