Skills

A curated collection of technical skills and tutorials for bioinformatics, organized by topic. All posts are hands-on guides with practical examples.

Version Control & Git

Containers & Docker

Package Management & Environment Setup

Nextflow & Workflow Management

How to Migrate from In-House Pipelines to Enterprise-Level Workflows: A Proven 3-Step Validation Framework
Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)
Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)

Variant Calling Pipeline

Variant Calling (Part 1): Building a Reproducible GATK Variant Calling Bash Workflow with Pixi - Academic proof-of-concept implementation
Variant Calling (Part 2): From Bash to Nextflow: GATK Best Practice With Nextflow - MD5 validation and scientific equivalence testing
Variant Calling (Part 3): Production Scale HPC Deployment and Performance Optimization - SLURM, resource optimization, and scaling to 100+ samples
Variant Calling (Part 4): Test/Lint Your Nextflow Workflow - nf-test, nf-lint, and workflow quality practices
Variant Calling (Part 5): Benchmarking Germline Variant Calling with nf-core/sarek - GIAB HG002 truth set, hap.py benchmarking, >99% SNP/INDEL accuracy
Variant Calling (Part 6): Do we really need complex pipelines to achieve high-quality variant calling? - DeepVariant and FreeBayes simplified workflows vs nf-core/sarek
Variant Calling (Part 7): Variant Annotation with VEP and SnpSift - Functional prediction and variant database integration
Variant Calling (Part 8): Structural Variant Calling Short Read Benchmark - Manta SV calling, Truvari benchmarking, HG002 GIAB SV truth set

Slurm & HPC Clusters

CI/CD & Testing

Machine Learning & Data Analysis

Data Management & Cloud Storage

Working with Remote Files using bcftools and samtools (HTSlib) - S3, HTTP, and cloud file access

Performance Optimization
