7 posts tagged with "slurm"

HPC: Run Spark Clusters on SLURM – Reproducible Setup with Pixi and sparkhpc

· 7 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

Running distributed Spark workloads on HPC clusters is a common task in bioinformatics and data science. However, integrating Spark with SLURM—the dominant HPC job scheduler—requires careful orchestration: you need to allocate compute resources via SLURM, start a Spark master, coordinate worker processes, and ensure all dependencies (Java, PySpark, Python) are available. This post shows how to set up reproducible Spark clusters on SLURM using Pixi for environment management and sparkhpc for cluster orchestration, based on the gkit Spark-on-SLURM implementation.

HPC: Test Ansible Playbook With Molecule – From Manual Vagrant to Automated CI/CD

· 10 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

Testing Ansible playbooks for HPC clusters is challenging. You can manually spin up VMs with Vagrant, but debugging issues across controller and worker nodes is slow. Molecule offers a repeatable, automated testing framework that validates your playbook configuration before deployment. This post shows how to move from manual Vagrant testing to Molecule, and then integrate it into GitHub Actions CI/CD.

Variant Calling (Part 3): Production Scale HPC Deployment and Performance Optimization

· 20 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

In Part 1, we built a solid bash baseline. In Part 2, we migrated to Nextflow with MD5 validation. Now it's time to deploy on HPC clusters with SLURM and optimize for production scale: configure executors for small clusters, tune resources per tool, replace bottleneck steps with faster alternatives (fastp + Spark-GATK), and demonstrate scaling from 1 to 100 samples. This practical guide will help you run your variant calling pipeline efficiently on real HPC infrastructure.

Building a Slurm HPC Cluster (Part 3) - Administration and Best Practices

· 13 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

In Part 1 and Part 2, we built a complete Slurm HPC cluster from a single node to a production-ready multi-node system. Now let's learn how to manage, maintain, and secure it effectively.

This final post covers daily administration tasks, troubleshooting, security hardening, and integration with data processing frameworks.

Building a Slurm HPC Cluster (Part 1) - Single Node Setup and Fundamentals

· 8 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

Building a High-Performance Computing (HPC) cluster can seem daunting, but with the right approach, you can create a robust system for managing computational workloads. This is Part 1 of a 3-part series where we'll build a complete Slurm cluster from scratch.

In this first post, we'll cover the fundamentals by setting up a single-node Slurm cluster and understanding the core concepts.