HPC: Run Spark Clusters on SLURM – Reproducible Setup with Pixi and sparkhpc

7 min read
Thanh-Giang Tan Nguyen
Founder at G Labs

Running distributed Spark workloads on HPC clusters is a common task in bioinformatics and data science. However, integrating Spark with SLURM—the dominant HPC job scheduler—requires careful orchestration: you need to allocate compute resources via SLURM, start a Spark master, coordinate worker processes, and ensure all dependencies (Java, PySpark, Python) are available. This post shows how to set up reproducible Spark clusters on SLURM using Pixi for environment management and sparkhpc for cluster orchestration, based on the gkit Spark-on-SLURM implementation.

1. The Challenge: Spark + SLURM Integration

1.1. Why Spark on SLURM?

On most HPC clusters, SLURM controls resource allocation and job execution. If you want to run Spark:

  • You can't just launch Spark master and worker processes directly—SLURM must allocate the resources first
  • Dependencies (Java, PySpark, Python) must be available and consistent across all compute nodes
  • The Spark driver needs to discover and connect to the master and worker nodes
  • If jobs fail or time out, you need deterministic cleanup and logging

1.2. The Traditional Approach (Manual and Fragile)

Without a proper orchestration layer, developers often do this:

# Manually request resources
salloc -N 2 -c 4 --time=00:30:00

# SSH to the first node, start the Spark master manually
ssh node1
export SPARK_HOME=/opt/spark
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.master.Master

# SSH to the second node, start a worker manually
# (Worker takes options first, then the master URL)
ssh node2
export SPARK_HOME=/opt/spark
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker \
    --cores 4 spark://node1:7077

# Run your driver script
python my_analysis.py

# Manual cleanup
# Kill processes if they're still running...

Problems:

  • Error-prone (easy to forget ports, hostnames, cleanup)
  • Not reproducible (different developers may set it up differently)
  • Dependencies scattered across system paths or manual installations
  • Difficult to integrate into automated pipelines

1.3. The Better Approach: Pixi + sparkhpc

# Single command sets everything up with reproducible dependencies
make setup

# Single command runs the entire workflow
make example

Behind the scenes:

  • Pixi creates an isolated environment with Python, Java, and PySpark pinned to specific versions
  • sparkhpc submits a SLURM batch job that orchestrates Spark startup
  • A Python driver connects, runs computations, and cleans up automatically

2. Understanding the Architecture

2.1. The Component Stack

┌─────────────────────────────────────────────┐
│ User Machine (submit dir)                   │
│ ┌─────────────────────────────────────────┐ │
│ │ Python Driver (run_example.py)          │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
                 ↓ (sparkhpc.submit())
┌─────────────────────────────────────────────┐
│ SLURM Controller                            │
│ (sbatch sparkjob.slurm.template)            │
└─────────────────────────────────────────────┘
                 ↓ (srun)
┌─────────────────────────────────────────────┐
│ Spark Master (Node 1)                       │
│ Spark Workers (Nodes 1-N via srun)          │
│                                             │
│ Environment: Python, Java, PySpark          │
│ (from Pixi)                                 │
└─────────────────────────────────────────────┘

2.2. Key Files in the gkit Spark-on-SLURM Setup

spark-on-slurm/
├── pixi.toml                    # Environment definition
├── Makefile                     # User entry points
├── sparkhpc/
│   ├── run_example.py           # End-to-end workflow
│   ├── sparkhpc/
│   │   ├── sparkjob.py          # Base Spark job class
│   │   ├── slurmsparkjob.py     # SLURM-specific implementation
│   │   └── templates/
│   │       └── sparkjob.slurm.template   # SLURM batch script template
│   └── scripts/
└── sparkhpc.log                 # Cluster logs

3. Reproducibility with Pixi

3.1. What is Pixi?

Pixi is a cross-platform package manager that creates reproducible, project-local environments. Unlike a plain venv or an unlocked conda environment, Pixi records every transitive dependency in a lockfile (pixi.lock), so the exact same environment can be recreated on any machine.

3.2. The pixi.toml Configuration

[workspace]
authors = ["nttg8100 <nttg8100@gmail.com>"]
channels = ["conda-forge", "bioconda"]
name = "spark-on-slurm"
platforms = ["linux-64"]
version = "0.1.0"

[tasks]
sparkhpc-example = "python sparkhpc/run_example.py"

[dependencies]
python = "3.11.*"
openjdk = "==17.0.18"
pyspark = ">=4.1.1,<5"

Key aspects:

  • Channels: conda-forge and bioconda provide pre-built packages (including Java and Spark)
  • Pinned versions: openjdk = "==17.0.18" ensures exact Java version across all runs
  • Task definition: the sparkhpc-example task runs the Python driver inside the Pixi environment
  • Platform: linux-64 ensures reproducibility on HPC clusters (typically Linux)

3.3. Environment Setup

# Install Pixi (first time)
curl -sSL https://pixi.sh/install.sh | sh

# Resolve and install dependencies (writes pixi.lock)
pixi install

# Run a command in the Pixi environment
pixi run sparkhpc-example
# or via Makefile
make example

When pixi run sparkhpc-example executes:

  1. Pixi activates the locked environment (Python 3.11, OpenJDK 17, PySpark 4.1)
  2. JAVA_HOME is automatically set
  3. SPARK_HOME is resolved from the PySpark installation
  4. The Python driver runs in this isolated context
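
To confirm the wiring, you can run a short sanity check inside the environment. This is a minimal sketch (check_env.py is a hypothetical file name, and the printed versions depend on what pixi.lock resolved):

# check_env.py (run with: pixi run python check_env.py)
import os
import subprocess

import pyspark

print("PySpark version:", pyspark.__version__)
print("PySpark install dir:", os.path.dirname(pyspark.__file__))  # what SPARK_HOME resolves from
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))

# The pinned OpenJDK should be on PATH inside the Pixi environment
subprocess.run(["java", "-version"], check=True)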

4. Spark Cluster Orchestration with sparkhpc

4.1. The sparkjob Class

The sparkjob class wraps Spark on SLURM by:

  1. Generating a SLURM batch script from a template
  2. Submitting the batch job via sbatch
  3. Polling for the Spark master URL (read from a metadata file)
  4. Starting a PySpark context that connects to the master
  5. Stopping the cluster and cleaning up when done

Example usage (from run_example.py):

import time

from sparkhpc import sparkjob

# Create a Spark job: 2 cores total, 2 cores per executor, 10 min walltime
sj = sparkjob.sparkjob(ncores=2, cores_per_executor=2, walltime="00:10")

# Submit to SLURM
cluster_id = sj.submit()
print(f"submitted cluster_id={cluster_id} jobid={sj.jobid}")

# Poll until the master starts (max 3 minutes)
master = None
deadline = time.time() + 180
while time.time() < deadline:
    master = sj.master_url()
    if master:
        print(f"master={master}")
        break
    time.sleep(1)

if not master:
    sj.stop()
    raise RuntimeError("Spark master did not start in time")

# Start a PySpark context connected to the master, and always clean up
sc = sj.start_spark(graphframes_package=None)
try:
    # Run Spark actions
    count_result = sc.parallelize(range(100)).count()
    sum_result = sc.parallelize(range(1, 11)).sum()
    print(f"count={count_result}")
    print(f"sum={sum_result}")
finally:
    sc.stop()
    sj.stop()
    print("cluster stopped")

4.2. The SLURM Batch Script Template

Behind the scenes, sparkhpc generates a SLURM script that looks like:

#!/bin/bash
#SBATCH --job-name=sparkjob
#SBATCH --nodes=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:10:00
#SBATCH --output=sparkcluster-%j.log

# Start Spark master on this node
export SPARK_HOME=/path/to/pyspark
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.master.Master \
    --host $(hostname) \
    --port 7077 \
    --webui-port 8080 \
    > master.log 2>&1 &
MASTER_PID=$!

# Extract master URL and write to metadata file
sleep 2
MASTER_URL="spark://$(hostname):7077"
echo $MASTER_URL > $HOME/.sparkhpc_${SLURM_JOBID}_master

# Start Spark workers via srun (parallel on allocated nodes)
srun $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker \
    --cores 2 \
    $MASTER_URL

# Block until the master exits; SLURM reaps everything else when the job ends
wait $MASTER_PID

Key points:

  • Master binds to the node's hostname and port 7077
  • Master URL is written to a metadata file for the driver to discover
  • srun parallelizes worker startup across allocated nodes
  • All processes run within the SLURM allocation and clean up when the job ends
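
The submission step itself reduces to calling sbatch and parsing the job id from its output. Here is a minimal sketch of that step (not sparkhpc's actual code; on success, sbatch prints a line like "Submitted batch job 12345"):

import re
import subprocess

# Submit the generated batch script and capture the SLURM job id
result = subprocess.run(
    ["sbatch", "sparkjob.slurm"],
    capture_output=True, text=True, check=True,
)
jobid = re.search(r"\d+", result.stdout).group(0)
print(f"submitted jobid={jobid}")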

4.3. How the Driver Discovers the Master

The master_url() method polls for the metadata file:

def master_url(self):
    """Check if the master URL is available in the metadata file."""
    metadata_path = f"{os.path.expanduser('~')}/.sparkhpc_{self.jobid}_master"
    if os.path.exists(metadata_path):
        with open(metadata_path, 'r') as f:
            return f.read().strip()
    return None

This avoids hardcoding hostnames (which vary across clusters) and allows the driver and master to run asynchronously.
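
If you want the polling pattern from run_example.py as a reusable helper, a small wrapper along these lines works (wait_for_master is not part of sparkhpc's API; it only relies on the master_url() and stop() methods shown in this post):

import time

def wait_for_master(sj, timeout=180, poll_interval=1.0):
    """Poll sj.master_url() until the metadata file appears, or give up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        master = sj.master_url()
        if master:
            return master
        time.sleep(poll_interval)
    sj.stop()  # tear down the SLURM job so it does not linger in the queue
    raise TimeoutError(f"Spark master did not start within {timeout}s")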


5. Running the Example Locally (or on a SLURM Cluster)

5.1. Local Setup

If you have a local SLURM cluster (e.g., via Docker):

cd spark-on-slurm

# Install dependencies
make setup

# Run the example
make example

Expected output:

submitted cluster_id=abc123 jobid=12345
master=spark://node1:7077
count=100
sum=55
cluster stopped

5.2. On a Real HPC Cluster

The same commands work on any SLURM-managed HPC cluster:

# Log in to the cluster
ssh user@hpc.example.com
cd spark-on-slurm

# First time: install Pixi and dependencies
make setup

# Run Spark via SLURM
make example

# View Spark master logs if needed
tail -f sparkhpc/sparkcluster-*.log

5.3. Cleaning Up

# Remove generated artifacts
make clean

# Manually check for lingering jobs
squeue -u $USER
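
If a job does linger, you can cancel it by name. A small best-effort sketch (the job name sparkjob comes from the #SBATCH --job-name line in the template above):

import getpass
import subprocess

# Cancel any of this user's SLURM jobs named "sparkjob"
subprocess.run(
    ["scancel", "--user", getpass.getuser(), "--name", "sparkjob"],
    check=False,  # best-effort cleanup; nothing to cancel is fine
)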

6. Key Takeaways

  1. Pixi provides reproducible environments — all dependencies (Python, Java, PySpark) locked in pixi.lock
  2. sparkhpc orchestrates Spark on SLURM — handles SLURM batch submission, master startup, worker coordination
  3. Metadata files enable driver-master discovery — no hardcoding of hostnames or ports
  4. One command each to set up and run — make setup, then make example (or pixi run sparkhpc-example)
  5. Suitable for HPC pipelines — extends beyond local testing to real clusters with thousands of cores

By combining Pixi's reproducibility with sparkhpc's SLURM orchestration, you can build reliable, auditable Spark workflows that scale from laptops to production HPC clusters.

