3 posts tagged with "genomics"

Variant Calling (Part 11): Population-Scale Genotyping Using gVCF and Joint Variant Calling

· 21 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

Population-scale variant calling is a critical step in building population genomics projects. While single-sample variant calling is well established, scaling joint genotyping to thousands of WGS samples introduces challenges in performance, storage, and incremental updates. In this post, I explore gVCF-based joint variant calling approaches and evaluate scalable solutions built on modern open-source tools. I also discuss practical architecture considerations for building population-scale genomics projects efficiently.

Variant Calling (Part 1): Building a Reproducible GATK Variant Calling Bash Workflow with Pixi

· 19 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

This post is a practical starting point for building bioinformatics workflows for germline variant calling. You'll begin with a straightforward, standard approach using bash and reproducible environments. Future posts will explore how to transition to best-practice workflow management with Nextflow, which allows further optimization, customization, and integration of additional tools to improve workflow quality.

Working with Remote Files using bcftools and samtools (HTSlib)

· 18 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

HTSlib-based tools like bcftools and samtools provide powerful capabilities for working with genomic data stored on remote servers. Whether your data is in AWS S3, accessible via FTP, or hosted on HTTPS endpoints, these tools allow you to efficiently query and subset remote files without downloading entire datasets. This guide covers authentication, remote file access patterns, and practical workflows.