Skip to main content
Skip to main content
Version: 2.0 (Current)

Runtime Integration

Runtime integration is where production environments diverge from lab benchmarks. Capture operational reality.

Integration points

  • Job launchers and resource managers
  • Container or bare-metal provisioning
  • Performance counters and telemetry

Risk mitigation

  • Validate endpoint discovery across multi-rail networks
  • Record failure modes under large-scale job starts
  • Track fallback paths when preferred transports are unavailable

Resource managers and bootstrapping

Common schedulers (Slurm, PBS, LSF) provide environment variables you can map to GASNet bootstrap parameters. Capture:

  • Node list and allocation size
  • Per-node core counts and NUMA layout
  • PMI/PMIx versions if the launcher relies on them

If your launcher uses PMI, ensure the GASNet bootstrap path is compatible with the PMI API version exposed by the system.

MPI and PGAS coexistence

When GASNet is embedded inside an MPI application:

  • Decide whether MPI or GASNet owns process spawning.
  • Avoid double-initializing network transports (UCX/OFI).
  • Coordinate progress engines to prevent oversubscription.

For OpenMP or hybrid runtimes, pin GASNet progress threads away from compute threads to avoid contention.

Observability

Integrate performance counters early:

  • Export latency histograms and queue depth metrics.
  • Capture NIC counters (drops, retransmits, CQ overruns).
  • Emit structured logs for transport negotiation failures.

Tie telemetry to the same dataset metadata used in benchmarks so results stay comparable.

Loading comments...