If you've worked in high-performance computing (HPC), machine learning operations (MLOps), data science, data management or any scenario that involves vending shared compute resources to a user community, you've probably used various workload management and/or orchestration tools. This series explores some of these at a high level to help you understand each tool's characteristics beyond the marketing hype.

Apolo platform: MLOps and AI orchestration

Apolo (from apolo.us) is a comprehensive MLOps and AI‑orchestration platform designed to optimize the utilization of both on-prem (bare metal) and cloud AI infrastructure. It unifies job scheduling, resource orchestration, workflow automation, and scalable model deployment under a cohesive architecture. Apolo's features are relevant to deploying GPU-as-a-service offerings.

Apolo uses the notion of jobs (execution units), environments (Docker images) and presets (pre-configured resource bundles like CPU, GPU, memory), plus attached block/object storage. It offers a CLI for running jobs, managing storage and presets, and also a web console to access apps like Jupyter, MLflow and Spark.
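
For example, a minimal smoke test from the CLI might look like this (the preset name is a cluster-specific assumption, and exact flags may differ; see apolo job run --help):

apolo job run --preset cpu-small ubuntu:22.04 -- echo "hello from Apolo"
apolo job logs <job-id>      # stream the job's output once it is running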

An end-to-end MLOps ecosystem: Apolo integrates multiple capabilities across the ML lifecycle:

  • Data preparation and management
  • Code and model versioning
  • Training orchestration and tracking via MLflow
  • Monitoring, explainability, testing and governance
  • Multi-tenancy, quotas and credit-based usage
  • Clusters, organizations, node‑pools, user quotas and credentials

Core concepts

Infrastructure requirements

Apolo installs its services onto an existing Kubernetes cluster, requiring standard tools like kubectl, docker and jq on the install VM.

Disks must be attached to nodes but not pre-formatted or mounted; persistent volumes are then provisioned through OpenEBS cStor (on-prem) or cloud block/object storage. Remote storage endpoints are abstracted inside jobs and flows via a volume mechanism (storage:<name>) for network-attached storage, which persists code, datasets and model artifacts across jobs and between compute nodes; these endpoints are also accessible via the CLI and API. A separate block storage capability (disk:<name>) can be attached to a single workload.
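
As a sketch, with hypothetical names and sizes (exact flags may vary; check the CLI help for your installation):

# Copy a local dataset into shared network storage (storage:<name>)
apolo storage cp -r ./dataset storage:train-eval/shared_data/input
apolo storage ls storage:train-eval/shared_data

# Create a block device (disk:<name>) that a single workload can attach
apolo disk create 50G --name scratch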

Kubernetes node pools must be configured and labeled to identify available GPU hardware (e.g., GPU model, count) for resource presets. Apolo's control services include an API server, scheduler extensions, authentication (Keycloak and Auth0), database backend (Postgres), Redis metrics/queues, and federated observability (Grafana+Prometheus).

A standalone Linux VM is required to host critical platform services, including a Docker registry, ChartMuseum (Helm chart repo) and DevPI (private Python package index). This VM becomes the local repository source for Apolo operator images, CLI wheels and Helm charts in air-gapped environments.

The auxiliary VM must reside on the same network as the Kubernetes nodes, or on a segment routable to them. Apolo also requires wildcard DNS entries (e.g., *.jobs.default-domain, *.default-domain) that resolve internal and job endpoints to the Kubernetes cluster node IPs.
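
For example, you can verify the wildcard records from the auxiliary VM before installing (default-domain is the placeholder used above; substitute your actual domain):

# Both patterns should resolve to the Kubernetes cluster node IPs
dig +short test.default-domain
dig +short test.jobs.default-domain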

Container runtime and images

Jobs are executed inside containers pulled from either public or private registries. A provided Apolo Base Docker Image (e.g., ghcr.io/neuro-inc/base:latest) comes bundled with popular ML runtimes, including pre-configured Conda environments (base, tf, torch).
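
For instance, you can confirm the bundled environments by running the image directly (the preset name is an assumption):

apolo job run --preset cpu-small ghcr.io/neuro-inc/base:latest -- conda env list
# Expect the pre-configured environments (base, tf, torch) in the output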

Runtime: How user workloads execute

Jobs (via CLI, Console or Flow) are translated into Kubernetes Pods, and resource presets define CPU, RAM and GPU requirements tied to labeled node pools. Apolo orchestrates workload placement (aka "scheduling" in Kubernetes parlance) across nodes according to those labels.

Flow pipelines (see next section) orchestrate multi-step jobs with volume mounts, environment variables, and network/dns referencing via live contexts. Apps (e.g., MLflow, vLLM, Jupyter) are deployed via add-on controllers as Helm charts within Kubernetes, using resource presets and PVCs for storage.

Apolo Flow

Apolo Flow turns repeatable ML tasks into structured, versioned, automatable workflows. It is a high-level, YAML-based abstraction built on top of the core Apolo platform that lets users define and parameterize multi-step pipelines (e.g., data prep → training → evaluation → deployment); declare dependencies, shared volumes and images; and execute jobs across CPU/GPU resources with preset-based job placement.

Table: Apolo Flow features and descriptions.

Here is an example Apolo Flow:

kind: batch # specifies kind of workflow to run
title: train-eval

params:
  model_name:
    default: "model.pt"
    descr: "How to name trained model file"
  eval_split:
    default: "test"
    descr: "Which data split to evaluate on"

volumes:
  code:
    remote: storage:train-eval/code
    mount: /code
  data:
    remote: storage:train-eval/shared_data # path in Apolo Storage
    mount: /data # where to mount the volume by default
  output:
    remote: storage:train-eval/output
    mount: /output

images:
  base:
    ref: ghcr.io/neuro-inc/base:latest  # Includes Python + ML libraries

tasks:
  - id: preprocess
    image: ${{ images.base.ref }}
    volumes:
      - ${{ volumes.code.ref_ro }}
      - ${{ volumes.data.ref_rw }}
    env:
      PYTHONPATH: /code
    bash: |
      python /code/preprocess.py --input /data/input --output /data/processed

  - id: train
    needs: 
      - preprocess
    image: ${{ images.base.ref }}
    pass_config: true
    volumes:
      - ${{ volumes.code.ref_ro }}
      - ${{ volumes.data.ref_rw }}
      - ${{ volumes.output.ref_rw }}
    env:
      PYTHONPATH: /code
    bash: |
      python /code/train.py --data /data/processed --output /output/${{ params.model_name }}

  - id: evaluate
    needs:
      - train
    image: ${{ images.base.ref }}
    volumes:
      - ${{ volumes.code.ref_ro }}
      - ${{ volumes.data.ref_ro }}
      - ${{ volumes.output.ref_ro }}
    env:
      PYTHONPATH: /code
    bash: |
      python /code/evaluate.py \
        --data /data/processed \
        --model /output/${{ params.model_name }} \
        --split ${{ params.eval_split }}

Since this is a batch workflow (kind: batch), run it with apolo-flow bake, overriding parameters as needed (assuming the flow file is saved under the name train-eval):

apolo-flow bake train-eval \
  --param model_name model-v1.pt \
  --param eval_split validation

Under a live workflow's jobs: block (or a batch workflow's tasks: block, as in the example above), you define each job or task with:

image: reference (Docker image, either public or image:‑prefixed internal images)

One of cmd:, bash: or python: (mutually exclusive fields specifying how the workload is invoked), plus options such as the following (see the sketch after this list):

  • env: environment variable mappings
  • http_port: and http_auth: options to expose and secure web endpoints
  • browse: flag to automatically open endpoint after job launch
  • life_span: to auto-terminate jobs after a given time
  • multi: flag to allow multiple simultaneous runs
  • detach: option to run without attaching terminal input
  • pass_config: optionally injects Apolo config into a job container
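
For example, a minimal live workflow using several of these fields might look like the following (the preset name and the Jupyter invocation are illustrative assumptions):

kind: live
title: dev-tools

jobs:
  jupyter:
    image: ghcr.io/neuro-inc/base:latest
    preset: cpu-small    # preset names are cluster-specific
    http_port: 8888      # expose the notebook server
    http_auth: true      # require platform authentication on the endpoint
    browse: true         # open the endpoint in a browser after launch
    life_span: 8h        # auto-terminate after eight hours
    detach: true         # do not attach terminal input to the job
    bash: |
      jupyter lab --ip=0.0.0.0 --port=8888 --no-browser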

Jobs can reference ${{ params.xxx }} in commands, environment variables, image names and so on, and expression (expr) contexts let you compute properties dynamically during a workflow run. There is support for structured batch pipelines via apolo-flow bake and batches: blocks. Output commands such as ::set-output:: and ::save-state:: in job logs pass context and state between tasks.
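
As a sketch of that output mechanism (the ::set-output:: name= syntax and the needs.<task>.outputs context path are assumptions based on the description above):

kind: batch
title: outputs-demo

tasks:
  - id: prepare
    image: ghcr.io/neuro-inc/base:latest
    bash: |
      # emit a value that downstream tasks can consume
      echo "::set-output name=row_count::1000"

  - id: report
    needs:
      - prepare
    image: ghcr.io/neuro-inc/base:latest
    bash: |
      echo "prepare reported ${{ needs.prepare.outputs.row_count }} rows"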

Apolo CLI vs. Apolo‑Flow CLI: Role comparison

The Apolo CLI is the core interface for interacting with the Apolo platform. It is used to manage clusters, jobs, storage, images, services, secrets, disks, etc. Typical commands include:

apolo job run, job logs, job status, job top, ...
apolo storage cp, storage ls, ...
apolo image build, ...
apolo disk create, ...

The Apolo-Flow CLI, on the other hand, drives the higher-level workflow engine built on top of the core Apolo CLI and SDK. It controls the YAML-defined workflows and reusable jobs, handling their dependency management, parameterization and batch execution. Typical commands:

apolo-flow init, bake, build, run, ps, status, logs, upload, download, mkvolumes, kill, ...
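
A typical session pairs the two: stage code and data with the core CLI, then execute the pipeline with the Flow CLI (paths and parameter values below are hypothetical, and exact flags may differ):

# Stage code and data with the core Apolo CLI
apolo storage cp -r ./code storage:train-eval/code
apolo storage cp -r ./raw_data storage:train-eval/shared_data/input

# Execute the batch pipeline defined earlier with the Apolo-Flow CLI
apolo-flow bake train-eval --param eval_split test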

GPU support, scheduling and resource optimization in Apolo

Apolo works with both GPU-enabled and non-GPU Kubernetes node pools, which must be labeled with the appropriate GPU hardware profile (e.g., GPU model, number of devices per node). These labels are used to bind resource presets for jobs, ensuring GPU workloads land only on nodes with matching capabilities. Scheduling behaves like standard Kubernetes scheduling against those node labels, but is controlled through Apolo's CLI and Flow presets.
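
A quick way to confirm what the scheduler will see is to inspect the node labels and allocatable GPU resources directly (label keys depend on your cluster and GPU operator):

kubectl get nodes --show-labels | grep -i gpu
kubectl describe node <gpu-node> | grep -A5 Allocatable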

NVIDIA's KAI scheduler is not currently supported.

Fractional GPU is not currently supported

Unlike platforms such as NVIDIA Run:ai or others offering GPU fractions or time-slicing optimization, Apolo only supports full GPU allocation. Each job must request one or more whole GPU devices matching the node pool labels. Thus, time-sharing or fractional GPU memory/compute slices (like 0.5 GPU) are not supported in Apolo.

Note that MIG (Multi-Instance GPU) slicing, if configured outside of Apolo, can be used to deliver fractional GPUs to workloads.
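
For example, MIG partitioning is configured on the GPU node itself, outside of Apolo (the profile IDs below are hardware-specific and purely illustrative):

# Enable MIG mode on GPU 0 and create two GPU/compute instance pairs
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -i 0 -cgi 9,9 -C
# The Kubernetes device plugin must then be configured to expose the MIG devices to pods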

Non‑CUDA GPU support

Apolo's built-in Docker images and runtime presets are optimized around CUDA-based NVIDIA accelerated solution architectures, including support for PyTorch, TensorFlow and other CUDA-accelerated workloads.

However, Apolo is GPU-type agnostic. AMD- or Intel-based node pools and container images will work with the platform.

Gang scheduling and multi‑GPU jobs

Apolo supports multi‑GPU workloads via presets that request multiple GPUs per job; these are scheduled atomically (i.e., gang scheduling). Apolo can schedule multi-GPU jobs as long as sufficient GPUs are available in a node pool, and your Kubernetes cluster has adequate hardware alignment.

To get the most out of GPU workloads in Apolo:

  • Use GPU nodes with homogeneous topology (e.g., NVIDIA HGX™ systems with NVIDIA NVLink™) so multi-GPU communication is high-speed.
  • Match GPU presets carefully: your resource definitions in Apolo must align to labeled nodes (e.g., nvidia.com/gpu: 4), ensuring proper scheduling and isolation.
  • Confirm CUDA driver and runtime compatibility inside your Docker images.
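
Inside a running multi-GPU job, a quick sanity check can confirm device visibility and interconnect topology (assuming the NVIDIA container runtime exposes nvidia-smi in the container):

nvidia-smi -L        # list the GPUs visible to the job
nvidia-smi topo -m   # show the NVLink/PCIe topology matrix between them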

Summary

In short, Apolo's resource management for GPUs is straightforward and tailored to whole-GPU scheduling in NVIDIA accelerated environments. It lacks fractionalization or MIG-slicing techniques of its own, and non-CUDA hardware, while not blocked, is not supported out of the box. For workloads reliant on fine-grained GPU sharing or GPU types outside the NVIDIA ecosystem, Apolo may not meet those needs without custom extensions or engineering.

Model training, inference and deployment

Training

Users launch GPU-accelerated training jobs inside configured environments (e.g., using Jupyter notebooks or MLflow templates). Jobs can attach storage volumes for persistent datasets and model artifacts.
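
A minimal sketch of such a training job from the CLI (the preset name, paths and volume syntax are assumptions; adjust them to your cluster):

apolo job run --preset gpu-large \
  --volume storage:train-eval/code:/code:ro \
  --volume storage:train-eval/shared_data:/data:rw \
  --volume storage:train-eval/output:/output:rw \
  ghcr.io/neuro-inc/base:latest -- \
  python /code/train.py --data /data/processed --output /output/model.pt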

Inference and deployment

Apolo's "Service Deployment" app allows serving containerized workloads — including models — via Kubernetes. It supports autoscaling via Kubernetes' Horizontal Pod Autoscaler (HPA), public domain exposure with authentication, logging and monitoring integrations. Through CLI or Console, you install services (e.g., Stable Diffusion, MLflow, LLM inference services) with governance and metrics built in.

Use case highlights

Scott Data Center deployed Apolo alongside NVIDIA DGX™ H100 systems to offer GPU-as-a-Service with strong governance, multi-tenancy and MLOps tooling.

Cato Digital uses Apolo for sustainable bare-metal AI processing in regional data centers.

Other enterprise users deploy Apolo for RAG applications, multimodal PDF analysis and full ML-life‑cycle workflows inside secure on-premises environments.

GPUaaS considerations

In this section, we'll explore some of the considerations that are relatively more important to GPUaaS scenarios.

Security, identity and tenant administration

Apolo is ISO 27001 and SOC 2 Type 2 compliant.

Apolo supports standard role-based access control (RBAC) for user-level authorization and tenant isolation, including support for organizational boundaries and project-level scoping, along with quota management and credit-based governance.

For authentication, Apolo uses Keycloak, which enables integration with identity providers through SAML or OpenID. However, direct integration with enterprise systems such as Active Directory or Azure AD is not yet automated — organizations would need to manually configure these connections within Keycloak.

Notably, Apolo does not currently distinguish between user administrators and security administrators; this limits the granularity of its access control model for highly regulated environments and may limit its applicability in low-trust service provisioning scenarios.

The documentation does not mention certified data destruction methods, such as those aligning with NIST 800-88, nor does it clarify whether encryption at rest and in flight are configurable or assumed through the underlying Kubernetes and storage layer defaults.

Apolo uses a single shared Kubernetes control plane, which would require separate Apolo instances to support full physical or virtual isolation per tenant if dedicated Kubernetes clusters are required. As a result, while Apolo is well-suited for logically isolated tenants, highly regulated use cases that demand tenant-specific clusters or hard multi-tenancy may need additional customization.

Cluster types and virtualization

Infrastructure as a Service (IaaS) is frequently a separate layer in service architectures. Apolo follows that model and is designed to operate on existing Kubernetes clusters. There are typically multiple clusters involved: one hosts the Apolo Control Plane (managing users, roles/quotas, RBAC, job metadata and lifecycle, the registered compute clusters, and the web/CLI UI), while the others run workloads.

Apolo supports multiple Compute Clusters under that control plane: Kubernetes clusters dedicated to running ML workloads. Persistent storage, object storage, registry services, etc., are integrated, and Apolo manages their configuration and usage.

Apolo expects that the underlying Kubernetes clusters exist or are provisioned externally; the documentation does not clearly state that Apolo itself will provision cloud/hybrid infrastructure or replace Kubernetes' control-plane components at the infrastructure level.

Apolo does not provide its own Kubernetes control plane, nor does it manage infrastructure provisioning across clouds or hybrid environments. The install process assumes that users have administrative access to the target cluster and that they can configure persistent storage external to Apolo.

Importantly, Apolo does not support integrations with other workload schedulers like Slurm. There is no evidence that Apolo is optimized for high-performance computing (HPC) environments, nor does it offer native support for data processing units (DPUs) or SmartNIC-controlled infrastructure (e.g., leveraging DPUs for efficient encrypted flows or DPU-aware placement of workloads and their components). Its design centers on Kubernetes-native orchestration for AI workloads rather than heterogeneous HPC environments or disaggregated compute fabrics.

PaaS capabilities and ML stack integration

At the platform-as-a-service (PaaS) layer, Apolo includes a "Service Deployment" interface that allows users to launch containerized inference endpoints with configurable presets and optional autoscaling. It exposes common tools such as JupyterLab, MLflow, and vLLM for developers to interact with models and artifacts.

Apolo does not currently offer deeper integrations with OEM infrastructure layers such as NVIDIA NIM™, or with other orchestration tools like NVIDIA Run:ai. It does not have serverless ML execution features such as ephemeral function-style jobs. As discussed above, Apolo Flow provides a declarative YAML-based abstraction for defining ML pipelines, but it does not yet expose a visual DAG editor or an integrated model catalog or registry beyond what users might configure using MLflow manually.

Observability and governance

Apolo provides standard job status, logging and artifact tracking through its CLI and UI. Observability is also supported, including metrics (e.g., via Prometheus) and dashboards (e.g., via Grafana), as well as centralized logs (e.g., via Loki or Elasticsearch). Loki, however, is not exposed directly; the web console and CLI use it to deliver logs to the user. The platform also does not yet provide support for audit trails or administrative logs, which may be important in regulated industries.

Apolo supports credit-based quotas, configured in the billing section of each "organization" and viewable in the Grafana dashboards. There is little documentation on how Apolo enforces segmentation at the infrastructure level, such as through firewalls, leaf gateways or strict namespace isolation at the kernel or container runtime layer, and it does not support GPU fractionalization directly. As stated previously, Apolo relies heavily on the underlying Kubernetes cluster for these elements.

Workload manager comparison matrix

Table: Feature comparison across Apolo, NVIDIA Run:ai, ClearML, Ray by Anyscale and Slurm.

Conclusion

Apolo serves as a modern MLOps system that transcends orchestration and scheduling: it is a vertically integrated platform for data centers, enterprises and startups to run AI workloads consistently and securely. From job execution and workflow automation to inference deployment and business integration, it offers a compelling alternative to traditional tools by providing end-to-end platform control with shared governance and resource management.