Skills
A curated collection of technical skills and tutorials for bioinformatics, organized by topic. All posts are hands-on guides with practical examples.
Version Control & Git
- The Evolution of Version Control - Git's Role in Reproducible Bioinformatics (Part 1)
- The Evolution of Version Control - CI/CD in Bioinformatics (Part 2)
- Running GitHub Actions Locally with act: 5x Faster Development
Containers & Docker
- Containers in Bioinformatics: Community Tooling and Efficient Docker Building
- Containers on HPC: From Docker to Singularity and Apptainer
- Docker Out of Docker: Running Interactive Web Applications for Data Analysis
Package Management & Environment Setup
- Pixi - A New Conda Era
- Upgrade Your Shell: From Bash to Zsh for a Better Terminal Experience
- Setting Up a Local Nextflow Training Environment with Code-Server and HPC
Nextflow & Workflow Management
- How to Migrate from In-House Pipelines to Enterprise-Level Workflows: A Proven 3-Step Validation Framework
- Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)
- Bioinformatics Cost Optimization for Input Using Nextflow (Part 2)
Variant Calling Pipeline
- Variant Calling (Part 1): Building a Reproducible GATK Variant Calling Bash Workflow with Pixi - Academic proof-of-concept implementation
- Variant Calling (Part 2): From Bash to Nextflow: GATK Best Practice With Nextflow - MD5 validation and scientific equivalence testing
- Variant Calling (Part 3): Production Scale HPC Deployment and Performance Optimization - SLURM, resource optimization, and scaling to 100+ samples
- Variant Calling (Part 4): Test/Lint Your Nextflow Workflow - nf-test, nf-lint, and workflow quality practices
- Variant Calling (Part 5): Benchmarking Germline Variant Calling with nf-core/sarek - GIAB HG002 truth set, hap.py benchmarking, >99% SNP/INDEL accuracy
- Variant Calling (Part 6): Do we really need complex pipelines to achieve high-quality variant calling? - DeepVariant and FreeBayes simplified workflows vs nf-core/sarek
- Variant Calling (Part 7): Variant Annotation with VEP and SnpSift - Functional prediction and variant database integration
- Variant Calling (Part 8): Structural Variant Calling Short Read Benchmark - Manta SV calling, Truvari benchmarking, HG002 GIAB SV truth set
Slurm & HPC Clusters
- Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals
- Building a Slurm HPC Cluster (Part 2) - Scaling to Production with Ansible
- Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices
CI/CD & Testing
- The Evolution of Version Control - CI/CD in Bioinformatics (Part 2)
- Running GitHub Actions Locally with act: 5x Faster Development
- Bioinformatics Workflow Template: Standardizing Python Pipelines with Modular Design
Machine Learning & Data Analysis
- Introduction to AI/ML in Bioinformatics: Classification Models & Evaluation
- Machine Learning in Bioinformatics Part 1: Building KNN from Scratch
Data Management & Cloud Storage
- Working with Remote Files using bcftools and samtools (HTSlib) - S3, HTTP, and cloud file access