HPC Observability

Name: HPC Observability
Brand: Independently published
SKU: 52747456
Price: 21.46 EUR
Availability: InStock
Author: M. Edwards
ISBN: 9798198765443

Production Monitoring, Profiling, and Site Reliability for Linux Clusters, GPUs, and Parallel Storage at Scale

M. Edwards

Language: English

Format: Book, paperback

Libristo code: 52747456

Publisher: Independently published, May 2026

Condition: New

Expected in stock on 02.06.2026

Returns accepted within 30 days

HPC Observability is a hands-on guide for the engineers and administrators who keep high-performance computing systems running reliably at scale. It brings together the operational knowledge scattered across vendor documentation, conference papers, and forum threads into a practical framework for turning HPC telemetry into actionable insight.

Modern HPC environments - Slurm clusters, GPU-dense AI systems, Lustre and GPFS storage, InfiniBand and Slingshot fabrics - generate more data than any team can manually interpret. The result is wasted node-hours, failed simulations, hidden storage bottlenecks, fabric congestion, and GPU failures that surface only after days of runtime.

This book provides a complete operational approach to HPC observability through a five-layer model covering hardware, operating systems, schedulers, applications, storage, and networks. Readers learn how to build metrics pipelines for clusters from hundreds to tens of thousands of nodes; monitor GPUs with DCGM; profile MPI and OpenMP applications with PAPI and Score-P; diagnose storage and network slowdowns; create useful dashboards and alerts; and run effective incident response and post-mortems.

Drawing on peer-reviewed research and real production experience, the book includes original diagrams, practical workflows, reference material, Prometheus alert examples, and a step-by-step lab environment for learning on a laptop.

Written in the voice of a senior HPC engineer rather than an academic text, HPC Observability assumes readers already understand the fundamentals and focuses instead on the operational realities of running large-scale Linux, AI, and research-computing infrastructure.

Book information

Full title: HPC Observability
Author: M. Edwards
Language: English
Binding: Book, paperback
Publication date: 2026
Number of pages: 164
EAN: 9798198765443
Libristo code: 52747456
Publisher: Independently published
Weight: 397
Dimensions: 216 x 280 x 9

Categories

Computer science and information technology > Computer science > Computer architecture and logic design > Parallel processing

Computer science and information technology > Computer science > Systems analysis and design
