Skills

A curated collection of technical skills and tutorials for bioinformatics, organized by topic. All posts are hands-on guides with practical examples.

Version Control & Git

  1. The Evolution of Version Control - Git's Role in Reproducible Bioinformatics (Part 1)
  2. The Evolution of Version Control - CI/CD in bioinformatics (Part 2)
  3. Running GitHub Actions Locally with act: 5x Faster Development

Containers & Docker

  1. Containers in Bioinformatics: Community Tooling and Efficient Docker Building
  2. Containers on HPC: From Docker to Singularity and Apptainer
  3. Docker Out of Docker: Running Interactive Web Applications for Data Analysis

Package Management & Environment Setup

  1. Pixi - New conda era
  2. Upgrade Your Shell: From Bash to Zsh for a Better Terminal Experience
  3. Setting Up a Local Nextflow Training Environment with Code-Server and HPC

Nextflow & Workflow Management

  1. How to Migrate from In-House Pipelines to Enterprise-Level Workflows: A Proven 3-Step Validation Framework
  2. Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)
  3. Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)

Variant Calling Pipeline

  1. Variant Calling (Part 1): Building a Reproducible GATK Variant Calling Bash Workflow with Pixi - Academic proof-of-concept implementation
  2. Variant Calling (Part 2): From Bash to Nextflow: GATK Best Practice With Nextflow - MD5 validation and scientific equivalence testing
  3. Variant Calling (Part 3): Production Scale HPC Deployment and Performance Optimization - SLURM, resource optimization, and scaling to 100+ samples
  4. Variant Calling (Part 4): Test/Lint Your Nextflow Workflow - nf-test, nf-lint, and workflow quality practices
  5. Variant Calling (Part 5): Benchmarking Germline Variant Calling with nf-core/sarek - GIAB HG002 truth set, hap.py benchmarking, >99% SNP/INDEL accuracy
  6. Variant Calling (Part 6): Do we really need complex pipelines to achieve high-quality variant calling? - DeepVariant and FreeBayes simplified workflows vs nf-core/sarek
  7. Variant Calling (Part 7): Variant Annotation with VEP and SnpSift - Functional prediction and variant database integration
  8. Variant Calling (Part 8): Structural Variant Calling Short Read Benchmark - Manta SV calling, Truvari benchmarking, HG002 GIAB SV truth set

Slurm & HPC Clusters

  1. Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals
  2. Building a Slurm HPC Cluster (Part 2) - Scaling to Production with Ansible
  3. Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices

CI/CD & Testing

  1. The Evolution of Version Control - CI/CD in bioinformatics (Part 2)
  2. Running GitHub Actions Locally with act: 5x Faster Development
  3. Bioinformatics Workflow Template: Standardizing Python Pipelines with Modular Design

Machine Learning & Data Analysis

  1. Introduction to AI/ML in Bioinformatics: Classification Models & Evaluation
  2. Machine Learning in Bioinformatics Part 1: Building KNN from Scratch

Data Management & Cloud Storage

  1. Working with Remote Files using bcftools and samtools (HTSlib) - S3, HTTP, and cloud file access

Performance Optimization

  1. Unix Pipes in Bioinformatics: How Streaming Data Reduces Memory and Storage
  2. Bioinformatics Cost Optimization for Computing Resources Using Nextflow (Part 1)
  3. Bioinformatics Cost Optimization For Input Using Nextflow (Part 2)