Skip to main content

Variant Calling (Part 5): Benchmarking Germline Variant Calling with nf-core/sarek

· 13 min read
Thanh-Giang Tan Nguyen
Founder at RIVER

We have a series for variant calling, however, one of the most important thing is not about how your workflow is advanced with modern framework, it is about the scoring system that show your workflow achieve high quality score with gold standard criteria. Therefore, in this series, I used the Genome In A Bottle (GIAB) with the truth set variants are curated. With nf-core/sarek-it shows the consistency with the results has been published in its paper with more than 99% in SNP/INDEL variants.

tip

Clone the repository

git clone git@github.com:nttg8100/nf-germline-short-read-variant-calling.git -b 0.3.0
cd nf-germline-short-read-variant-calling
cd benchmark/pipelines/sarek

1. Prepare Dataset

Prepare the input.csv as below, nextflow can automatically download the dataset

patient,sex,status,sample,lane,fastq_1,fastq_2
HG002,XY,0,HG002,1,s3://human-pangenomics/NHGRI_UCSC_panel/HG002/hpp_HG002_NA24385_son_v1/ILMN/downsampled/HG002_HiSeq30x_subsampled_R1.fastq.gz,s3://human-pangenomics/NHGRI_UCSC_panel/HG002/hpp_HG002_NA24385_son_v1/ILMN/downsampled/HG002_HiSeq30x_subsampled_R2.fastq.gz

The benchmark uses the HG002 sample from the Genome in a Bottle (GIAB) consortium. HG002 is widely used as a gold standard dataset for benchmarking germline variant calling pipelines because it provides:

  • High-confidence truth variants
  • High-confidence confident regions
  • Well-curated reference datasets for SNPs and indels

Dataset characteristics:

  • Sample: HG002 (NA24385)
  • Sequencing: Illumina short reads
  • Coverage: ~30X
  • Reference genome: GRCh38
  • Truth set: GIAB high-confidence variant calls

The truth VCF and confident region BED files are used to compare predicted variants against the gold standard.

2. Run The nf-core/sarek

I allocated more computing resources so it can be ran faster with the below config customized_labels.config

// HPC internet is quite slow, so we increase the pull timeout for singularity containers
singularity.pullTimeout="100.h"
process {
cpus = { 2 * task.attempt }
memory = { 4.GB * task.attempt }
time = { 4.h * task.attempt }

// memory errors which should be retried. otherwise error out
errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'

// Process-specific resource requirements
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:error_ignore {
errorStrategy = 'ignore'
}
withLabel:error_retry {
errorStrategy = 'retry'
maxRetries = 2
}
withLabel:process_single {
cpus = { 3 }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
}
withLabel:process_low {
cpus = { 6 * task.attempt }
memory = { 12.GB * task.attempt }
time = { 4.h * task.attempt }
}
withLabel:process_medium {
cpus = { 64 * task.attempt }
memory = { 120.GB * task.attempt }
time = { 8.h * task.attempt }
}
withLabel:process_high {
cpus = { 64 * task.attempt }
memory = { 120.GB * task.attempt }
time = { 16.h * task.attempt }
}
withLabel:process_long {
time = { 20.h * task.attempt }
}
withLabel:process_high_memory {
memory = { 200.GB * task.attempt }
}
}
process.executor="slurm"
cd benchmark/pipelines/sarek
pixi run nextflow run nf-core/sarek -r 3.7.1 -profile docker,test_full_germline \
-c customized_labels.config \
--input ./input.csv \
--outdir bwamem \
--tools "deepvariant,freebayes,strelka,manta,haplotypecaller" \
--aligner "bwa-mem" \
-resume

The results then are copied to the corrected folder on benchmark

├── pipelines
│ └── sarek
│ ├── customized_labels.config
│ ├── input.csv
│ ├── pixi.lock
│ ├── pixi.toml
│ ├── run_pipeline.sh
│ └── variant_calling
│ ├── deepvariant
│ │ └── HG002
│ │ ├── HG002.deepvariant.g.vcf.gz
│ │ ├── HG002.deepvariant.g.vcf.gz.tbi
│ │ ├── HG002.deepvariant.vcf.gz
│ │ └── HG002.deepvariant.vcf.gz.tbi
│ ├── filtered
│ │ └── HG002
│ │ ├── HG002.deepvariant.bcftools_filtered.vcf.gz
│ │ ├── HG002.deepvariant.bcftools_filtered.vcf.gz.tbi
│ │ ├── HG002.freebayes.filtered.bcftools_filtered.vcf.gz
│ │ ├── HG002.freebayes.filtered.bcftools_filtered.vcf.gz.tbi
│ │ ├── HG002.haplotypecaller.filtered.bcftools_filtered.vcf.gz
│ │ ├── HG002.haplotypecaller.filtered.bcftools_filtered.vcf.gz.tbi
│ │ ├── HG002.strelka.variants.bcftools_filtered.vcf.gz
│ │ └── HG002.strelka.variants.bcftools_filtered.vcf.gz.tbi
│ ├── freebayes
│ │ └── HG002
│ │ ├── HG002.freebayes.filtered.vcf.gz
│ │ ├── HG002.freebayes.filtered.vcf.gz.tbi
│ │ ├── HG002.freebayes.vcf.gz
│ │ └── HG002.freebayes.vcf.gz.tbi
│ ├── haplotypecaller
│ │ └── HG002
│ │ ├── HG002.haplotypecaller.filtered.vcf.gz
│ │ ├── HG002.haplotypecaller.filtered.vcf.gz.tbi
│ │ ├── HG002.haplotypecaller.vcf.gz
│ │ └── HG002.haplotypecaller.vcf.gz.tbi
│ ├── manta
│ │ └── HG002
│ │ ├── HG002.manta.diploid_sv.vcf.gz
│ │ └── HG002.manta.diploid_sv.vcf.gz.tbi
│ └── strelka
│ └── HG002
│ ├── HG002.strelka.genome.vcf.gz
│ ├── HG002.strelka.genome.vcf.gz.tbi
│ ├── HG002.strelka.variants.vcf.gz
│ └── HG002.strelka.variants.vcf.gz.tbi

3. Benchmark With HG002 Dataset

3.1 Download Truth Set And Reference Data

It is for proof of concept on how to benchmark the data, we ran with only with a simple, there are more many samples from GIAB or another public truth datasets can be used by the downloading script download.sh in reference folder

# download GATK reference genome
aws s3 cp --recursive --no-sign-request s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/ .

# download reference files for benchmarking
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
# download bed files for confident regions
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed

Download the data by running this command

cd benchmark/references
pixi run download.sh

The downloaded files are showed as below

benchmark/
├── Readme.md
├── benchmark.sh
├── benchmark_rewrite.sh
├── pipelines
│ └── sarek
│ ├── customized_labels.config
│ ├── input.csv
│ ├── pixi.lock
│ ├── pixi.toml
│ ├── run_pipeline.sh
│ └── variant_calling
│ ├── deepvariant
│ │ └── HG002
│ │ ├── HG002.deepvariant.g.vcf.gz
│ │ ├── HG002.deepvariant.g.vcf.gz.tbi
│ │ ├── HG002.deepvariant.vcf.gz
│ │ └── HG002.deepvariant.vcf.gz.tbi
│ ├── filtered
│ │ └── HG002
│ │ ├── HG002.deepvariant.bcftools_filtered.vcf.gz
│ │ ├── HG002.deepvariant.bcftools_filtered.vcf.gz.tbi
│ │ ├── HG002.freebayes.filtered.bcftools_filtered.vcf.gz
│ │ ├── HG002.freebayes.filtered.bcftools_filtered.vcf.gz.tbi
│ │ ├── HG002.haplotypecaller.filtered.bcftools_filtered.vcf.gz
│ │ ├── HG002.haplotypecaller.filtered.bcftools_filtered.vcf.gz.tbi
│ │ ├── HG002.strelka.variants.bcftools_filtered.vcf.gz
│ │ └── HG002.strelka.variants.bcftools_filtered.vcf.gz.tbi
│ ├── freebayes
│ │ └── HG002
│ │ ├── HG002.freebayes.filtered.vcf.gz
│ │ ├── HG002.freebayes.filtered.vcf.gz.tbi
│ │ ├── HG002.freebayes.vcf.gz
│ │ └── HG002.freebayes.vcf.gz.tbi
│ ├── haplotypecaller
│ │ └── HG002
│ │ ├── HG002.haplotypecaller.filtered.vcf.gz
│ │ ├── HG002.haplotypecaller.filtered.vcf.gz.tbi
│ │ ├── HG002.haplotypecaller.vcf.gz
│ │ └── HG002.haplotypecaller.vcf.gz.tbi
│ ├── manta
│ │ └── HG002
│ │ ├── HG002.manta.diploid_sv.vcf.gz
│ │ └── HG002.manta.diploid_sv.vcf.gz.tbi
│ └── strelka
│ └── HG002
│ ├── HG002.strelka.genome.vcf.gz
│ ├── HG002.strelka.genome.vcf.gz.tbi
│ ├── HG002.strelka.variants.vcf.gz
│ └── HG002.strelka.variants.vcf.gz.tbi
├── pixi.lock
├── pixi.toml
├── references
│ ├── HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
│ ├── HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
│ ├── HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed
│ ├── Homo_sapiens_assembly38.dict
│ ├── Homo_sapiens_assembly38.fasta
│ ├── Homo_sapiens_assembly38.fasta.fai
│ ├── download.sh
│ ├── pixi.lock
│ └── pixi.toml

3.2 Tool For Benchmarking

Here, we use hap.py is the tool that help to benchmark for germline variant calling with SNP and INDEL variants. For more detail, take a look at the original paper here https://www.nature.com/articles/s41587-019-0054-x.

What it does:

  • Reformat to find the match between the query and the truth set variant record on the vcf file
  • You can see the below example where these 2 variants are exactly matched with each others, however, it was represented by different position, alternative nucleotides
  • Reformat the variant records, then, hap.py calculates the metrics that can be represented to how good each tools for variant calling are with hap.py

Example output table:

TypeFilterTRUTH.TOTALTRUTH.TPTRUTH.FNQUERY.TOTALQUERY.FPQUERY.UNKFP.gtFP.alMETRIC.RecallMETRIC.PrecisionMETRIC.Frac_NAMETRIC.F1_ScoreTRUTH.TOTAL.TiTv_ratioQUERY.TOTAL.TiTv_ratioTRUTH.TOTAL.het_hom_ratioQUERY.TOTAL.het_hom_ratio
INDELALL5254695200485421908475294936379820104550.9896840.99458600000000010.4004490.9921291.5282757341385371.9313187724266925
INDELPASS5254695200485421908475294936379820104550.9896840.99458600000000010.4004490.9921291.5282757341385371.9313187724266925
SNPALL33651273344549205783841759529249017614015620.9938850.99842099999999990.1275920.99614799999999992.10012848675757451.98398438500830371.58119585325411821.4858796715961773
SNPPASS33651273344549205783841759529249017614015620.9938850.99842099999999990.1275920.99614799999999992.10012848675757451.98398438500830371.58119585325411821.4858796715961773
info
  • Happy outputs columns are explained by its documentation as below

Happy outputs a set of stratification columns, followed by metrics columns. Stratification columns may contain the placeholder "*" value, which indicates that counts in this row are aggregated over all possible values of the column. In case of a subtype, this means that the counts have been summed across all subtypes. In case of QQ, it means that the counts correspond to all variants rather than only ones below a QQ threshold.

Stratification ColumnDescription
TypeVariant type (SNP / INDEL)
SubtypeVariant Subtype (ti/tv/indel length, see above
SubsetSubset of the genome/stratification region
FilterVariant filters: PASS, SEL, ALL, or a particular filter from the query VCF
GenotypeGenotype of benchmarked variants (het / homalt / hetalt)
QQ.FieldWhich field from the original VCF was used to produce QQ values in truth and query
QQQQ threshold for ROC values
Metric ColumnDescription
METRIC.RecallRecall for truth variant representation = TRUTH.TP / (TRUTH.TP + TRUTH.FN)
METRIC.PrecisionPrecision of query variants = QUERY.TP / (QUERY.TP + QUERY.FP)
METRIC.Frac_NAFraction of non-assessed query calls = QUERY.UNK / QUERY.TOTAL
METRIC.F1_ScoreHarmonic mean of precision and recall = 2METRIC.RecallMetric.Precision/(METRIC.Recall + METRIC.Precision)
TRUTH.TOTALTotal number of truth variants
TRUTH.TPNumber of true-positive calls in truth representation (counted via the truth sample column)
TRUTH.FNNumber of false-negative calls = calls in truth without matching query call
QUERY.TOTALTotal number of query calls
QUERY.TPNumber of true positive calls in query representation (counted via the query sample column)
QUERY.FPNumber of false-positive calls in the query file (mismatched query calls within the confident regions)
QUERY.UNKNumber of query calls outside the confident regions
FP.gtNumber of genotype mismatches (alleles match, but different zygosity)
FP.alNumber of allele mismatches (variants matched by position and not by haplotype)
TRUTH.TOTAL.TiTv_ratioTransition / Transversion ratio for all truth variants
TRUTH.TOTAL.het_hom_ratioHet/Hom ratio for all truth variants
TRUTH.FN.TiTv_ratioTransition / Transversion ratio for false-negative variants
TRUTH.FN.het_hom_ratioHet/Hom ratio for false-negative variants
TRUTH.TP.TiTv_ratioTransition / Transversion ratio for true positive variants
TRUTH.TP.het_hom_ratioHet/Hom ratio for true positive variants
QUERY.FP.TiTv_ratioTransition / Transversion ratio for false positive variants
QUERY.FP.het_hom_ratioHet/Hom ratio for false-positive variants
QUERY.TOTAL.TiTv_ratioTransition / Transversion ratio for all query variants
QUERY.TOTAL.het_hom_ratioHet/Hom ratio for all query variants
QUERY.TP.TiTv_ratioTransition / Transversion ratio for true positive variants (query representation)
QUERY.TP.het_hom_ratioHet/Hom ratio for true positive variants (query representation)
QUERY.UNK.TiTv_ratioTransition / Transversion ratio for unknown variants
QUERY.UNK.het_hom_ratioHet/Hom ratio for unknown variants
Subset.SizeWhen using stratification regions, this gives the number of nucleotides contained in the current subset
Subset.IS_CONF.SizeThis gives the number of confident bases (-f regions) in the current subset

3.3 Script And Environment Setup For Benchmarking

3.3.1 Install

#!/bin/bash
# Benchmark script for HG002 with DeepVariant, FreeBayes, HaplotypeCaller, Manta, Strelka (relative paths)
set -euo pipefail

HGREF="references/Homo_sapiens_assembly38.fasta"
REF="HG002"
BED="references/${REF}_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed"
TRUTH="references/${REF}_GRCh38_1_22_v4.2.1_benchmark.vcf.gz"
SDF="references/grch38.sdf"
THREADS=16

RESULTSDIR="results"
mkdir -p "$RESULTSDIR"

TOOLS=(deepvariant freebayes haplotypecaller manta strelka)

VCF_DEEPVARIANT="pipelines/sarek/variant_calling/deepvariant/HG002/HG002.deepvariant.vcf.gz"
VCF_FREEBAYES="pipelines/sarek/variant_calling/freebayes/HG002/HG002.freebayes.vcf.gz"
VCF_HAPLOTYPECALLER="pipelines/sarek/variant_calling/haplotypecaller/HG002/HG002.haplotypecaller.vcf.gz"
VCF_MANTA="pipelines/sarek/variant_calling/manta/HG002/HG002.manta.diploid_sv.vcf.gz"
VCF_STRELKA="pipelines/sarek/variant_calling/strelka/HG002/HG002.strelka.variants.vcf.gz"

# Path map
declare -A VCF_PATHS=(
[deepvariant]="$VCF_DEEPVARIANT"
[freebayes]="$VCF_FREEBAYES"
[haplotypecaller]="$VCF_HAPLOTYPECALLER"
[manta]="$VCF_MANTA"
[strelka]="$VCF_STRELKA"
)

for TOOL in "${TOOLS[@]}"; do
TOOLDIR="$RESULTSDIR/$TOOL"
mkdir -p "$TOOLDIR"
VCF="${VCF_PATHS[$TOOL]}"
LOG="$TOOLDIR/${REF}_${TOOL}.log"
SUMMARY="$TOOLDIR/${REF}_${TOOL}.summary.csv"
FORMATTED="$TOOLDIR/${REF}_${TOOL}.formatted.summary.csv"
OUTPUT="$TOOLDIR/${REF}_${TOOL}"

# If the tool output folder already exists with summary file, continue from AWK formatting (line 55)
if [[ -f "${OUTPUT}.summary.csv" ]]; then
echo "Output for $TOOL already exists, formatting summary."
awk -v d="${TOOL}_${REF}" -F',' 'FNR==1{a="run"} FNR>1{a=d} {print $0",\t"a}' "${OUTPUT}.summary.csv" > "$FORMATTED"
echo "Completed $TOOL (using existing output)"
continue
fi

if [[ ! -f "$VCF" ]]; then
echo "VCF file not found for $TOOL: $VCF" >&2
continue
fi

echo "Benchmarking $TOOL for $REF ..."
hap.py "$TRUTH" "$VCF" \
-o "$OUTPUT" \
-V --engine=vcfeval --threads "$THREADS" --engine-vcfeval-template "$SDF" \
-f "$BED" \
--logfile "$LOG" \
--scratch-prefix .

if [[ -f "${OUTPUT}.summary.csv" ]]; then
awk -v d="${TOOL}_${REF}" -F',' 'FNR==1{a="run"} FNR>1{a=d} {print $0",\t"a}' "${OUTPUT}.summary.csv" > "$FORMATTED"
else
echo "Warning: ${OUTPUT}.summary.csv not found for $TOOL" >&2
fi

echo "Completed $TOOL"
done

MERGED="$RESULTSDIR/merged_${REF}_benchmark.csv"
echo "Merging summaries to $MERGED ..."
awk '(NR == 1) || (FNR > 1)' "$RESULTSDIR"/*/*.formatted.summary.csv > "$MERGED"
echo "All done. Results for $REF in $MERGED"

3.3.2 Running Benchmarking Script

Then you can obtain the result as below

├── results
│ └── sarek
│ ├── deepvariant
│ │ ├── HG002_deepvariant.extended.csv
│ │ ├── HG002_deepvariant.formatted.summary.csv
│ │ ├── HG002_deepvariant.log
│ │ ├── HG002_deepvariant.metrics.json.gz
│ │ ├── HG002_deepvariant.roc.Locations.INDEL.PASS.csv.gz
│ │ ├── HG002_deepvariant.roc.Locations.INDEL.csv.gz
│ │ ├── HG002_deepvariant.roc.Locations.SNP.PASS.csv.gz
│ │ ├── HG002_deepvariant.roc.Locations.SNP.csv.gz
│ │ ├── HG002_deepvariant.roc.all.csv.gz
│ │ ├── HG002_deepvariant.runinfo.json
│ │ ├── HG002_deepvariant.summary.csv
│ │ ├── HG002_deepvariant.vcf.gz
│ │ └── HG002_deepvariant.vcf.gz.tbi
│ ├── freebayes
│ │ ├── HG002_freebayes.extended.csv
│ │ ├── HG002_freebayes.formatted.summary.csv
│ │ ├── HG002_freebayes.log
│ │ ├── HG002_freebayes.metrics.json.gz
│ │ ├── HG002_freebayes.roc.Locations.INDEL.PASS.csv.gz
│ │ ├── HG002_freebayes.roc.Locations.INDEL.csv.gz
│ │ ├── HG002_freebayes.roc.Locations.SNP.PASS.csv.gz
│ │ ├── HG002_freebayes.roc.Locations.SNP.csv.gz
│ │ ├── HG002_freebayes.roc.all.csv.gz
│ │ ├── HG002_freebayes.runinfo.json
│ │ ├── HG002_freebayes.summary.csv
│ │ ├── HG002_freebayes.vcf.gz
│ │ └── HG002_freebayes.vcf.gz.tbi
│ ├── haplotypecaller
│ │ ├── HG002_haplotypecaller.extended.csv
│ │ ├── HG002_haplotypecaller.formatted.summary.csv
│ │ ├── HG002_haplotypecaller.log
│ │ ├── HG002_haplotypecaller.metrics.json.gz
│ │ ├── HG002_haplotypecaller.roc.Locations.INDEL.PASS.csv.gz
│ │ ├── HG002_haplotypecaller.roc.Locations.INDEL.csv.gz
│ │ ├── HG002_haplotypecaller.roc.Locations.SNP.PASS.csv.gz
│ │ ├── HG002_haplotypecaller.roc.Locations.SNP.csv.gz
│ │ ├── HG002_haplotypecaller.roc.all.csv.gz
│ │ ├── HG002_haplotypecaller.runinfo.json
│ │ ├── HG002_haplotypecaller.summary.csv
│ │ ├── HG002_haplotypecaller.vcf.gz
│ │ └── HG002_haplotypecaller.vcf.gz.tbi
│ ├── manta
│ │ ├── HG002_manta.extended.csv
│ │ ├── HG002_manta.formatted.summary.csv
│ │ ├── HG002_manta.log
│ │ ├── HG002_manta.metrics.json.gz
│ │ ├── HG002_manta.roc.Locations.INDEL.PASS.csv.gz
│ │ ├── HG002_manta.roc.Locations.INDEL.csv.gz
│ │ ├── HG002_manta.roc.Locations.SNP.PASS.csv.gz
│ │ ├── HG002_manta.roc.Locations.SNP.csv.gz
│ │ ├── HG002_manta.roc.all.csv.gz
│ │ ├── HG002_manta.runinfo.json
│ │ ├── HG002_manta.summary.csv
│ │ ├── HG002_manta.vcf.gz
│ │ └── HG002_manta.vcf.gz.tbi
│ ├── merged_HG002_benchmark.csv
│ └── strelka
│ ├── HG002_strelka.extended.csv
│ ├── HG002_strelka.formatted.summary.csv
│ ├── HG002_strelka.log
│ ├── HG002_strelka.metrics.json.gz
│ ├── HG002_strelka.roc.Locations.INDEL.PASS.csv.gz
│ ├── HG002_strelka.roc.Locations.INDEL.csv.gz
│ ├── HG002_strelka.roc.Locations.SNP.PASS.csv.gz
│ ├── HG002_strelka.roc.Locations.SNP.csv.gz
│ ├── HG002_strelka.roc.all.csv.gz
│ ├── HG002_strelka.runinfo.json
│ ├── HG002_strelka.summary.csv
│ ├── HG002_strelka.vcf.gz
│ └── HG002_strelka.vcf.gz.tbi

3.3.3 Overview From Output

I benchmarked several popular variant callers—DeepVariant, FreeBayes, HaplotypeCaller, Strelka, and Manta—on the HG002 dataset, evaluating performance separately for SNPs and INDELs using Recall, Precision, and F1-score.

Overall, DeepVariant achieved the best performance, showing the highest balance between recall and precision for both SNPs (F1 ≈ 0.996) and INDELs (F1 ≈ 0.992). HaplotypeCaller and Strelka also performed strongly, with slightly lower recall or precision depending on the variant type. FreeBayes showed comparable recall but noticeably lower precision, indicating more false positives.

SNP detection was consistently strong across most tools (recall ≈ 0.99), while INDEL calling showed more variability, highlighting the greater difficulty of detecting small insertions and deletions accurately.

As expected, Manta performed poorly for SNPs and INDELs, since it is designed primarily for structural variant detection rather than small variants.

Overall, these results confirm that modern machine-learning–based callers like DeepVariant provide the most balanced performance, while traditional callers remain competitive but may require stricter filtering to control false positives.

tip

See the merged_HG002_benchmark.csv is the aggregated results are showed as below

TypeFilterTRUTH.TOTALTRUTH.TPTRUTH.FNQUERY.TOTALQUERY.FPQUERY.UNKFP.gtFP.alMETRIC.RecallMETRIC.PrecisionMETRIC.Frac_NAMETRIC.F1_ScoreTRUTH.TOTAL.TiTv_ratioQUERY.TOTAL.TiTv_ratioTRUTH.TOTAL.het_hom_ratioQUERY.TOTAL.het_hom_ratiorun
INDELALL5254695200485421908475294936379820104550.9896840.9945860.4004490.9921291.5282757341385371.9313187724266925deepvariant_HG002
INDELPASS5254695200485421908475294936379820104550.9896840.9945860.4004490.9921291.5282757341385371.9313187724266925deepvariant_HG002
SNPALL33651273344549205783841759529249017614015620.9938850.9984210.1275920.9961482.10012848675757451.98398438500830371.58119585325411821.4858796715961773deepvariant_HG002
SNPPASS33651273344549205783841759529249017614015620.9938850.9984210.1275920.9961482.10012848675757451.98398438500830371.58119585325411821.4858796715961773deepvariant_HG002
INDELALL52546950443221037881738194413410051387315280.9599650.9640470.3867420.9620021.5282757341385371.8211051374734122freebayes_HG002
INDELPASS52546950443221037881738194413410051387315280.9599650.9640470.3867420.9620021.5282757341385371.8211051374734122freebayes_HG002
SNPALL336512733444732065444232421290289460303762164310.9938620.9628930.2138770.9781332.10012848675757451.81323039339427751.58119585325411821.8836484571228804freebayes_HG002
SNPPASS336512733444732065444232421290289460303762164310.9938620.9628930.2138770.9781332.10012848675757451.81323039339427751.58119585325411821.8836484571228804freebayes_HG002
INDELALL5254695169268543895219634335103737974360.9837420.9883440.3921240.9860381.5282757341385371.8797119780755498haplotypecaller_HG002
INDELPASS5254695169268543895219634335103737974360.9837420.9883440.3921240.9860381.5282757341385371.8797119780755498haplotypecaller_HG002
SNPALL3365127334011025017395719028452587714275822880.9925660.9915560.1485180.9920612.10012848675757451.95601583403091971.58119585325411821.7246457081290798haplotypecaller_HG002
SNPPASS3365127334011025017395719028452587714275822880.9925660.9915560.1485180.9920612.10012848675757451.95601583403091971.58119585325411821.7246457081290798haplotypecaller_HG002
INDELALL5254698525461782111476990421.5e-050.0655740.9844013e-051.5282757341385371.0282749675745784manta_HG002
INDELPASS5254698525461687710667630361.5e-050.0701750.9834233e-051.5282757341385371.060851318944844manta_HG002
SNPALL336512733365124141641408001e-060.50.994352e-062.10012848675757450.90055248618784531.58119585325411821.002828854314003manta_HG002
SNPPASS336512733365124123041222001e-060.50.9934962e-062.10012848675757450.83076923076923081.58119585325411820.9967532467532467manta_HG002
INDELALL5254695155609909875380656933497441495720.9811430.9878440.3826610.9844821.5282757341385371.8953993716115898strelka_HG002
INDELPASS52546951367511794834212340329872224013240.9775550.9936450.3580890.9855351.5282757341385371.8205751890885689strelka_HG002
SNPALL3365127334154623581423360567357821358430277490.9929930.980260.1940090.9865852.10012848675757451.85834356987621651.58119585325411821.8781977072413574strelka_HG002
SNPPASS3365127332344941678367690743893458649097580.9876150.9986820.0940640.9931182.10012848675757452.0387534195530271.58119585325411821.57305061157284strelka_HG002

Recap

info
  • Using modern framwork is good but the more important factor is about how much your pipeline show its accuracy, reliability
  • For developing the pipeline, it should be carefully reviewed and benchmarked with its own gold standard
  • However, using additional steps may not improve the results while consuming more computing resources (higher running cost), next blog will coming soon

References

  1. hap.py tool for benchmarking germline short-read variants
    https://github.com/qbic-projects/QSARK/tree/main

  2. Genome in a Bottle (GIAB) program for gold standard datasets

    oaicite:4

    https://www.nist.gov/programs-projects/genome-bottle

  3. nf-germline-short-read-variant-calling https://github.com/nttg8100/nf-germline-short-read-variant-calling