Towards I/O monitoring at scale

Designing a self-tuning I/O environment in HPC

I/O Challenges in HPC

In High-Performance Computing (HPC), data movement is one of the biggest challenges. Indeed, large computations necessarily lead to large datasets. Current HPC workflows favor a feed-forward way of launching programs, loading their dataset, and then storing the result in persistent …

Closing the loop: from Observation to Action

Performance monitoring and observation are a requirement in the complex IT systems we build nowadays. Exascale systems are digital factories operating with millions of cores and discrete components. Like any factory, these systems are instrumented and monitored. Performance observation faces three main challenges:

Operating at scale

The ADMIRE monitoring infrastructure uses Prometheus as …