Chapter 1
Resilient HPC for 24×7×365 Weather Forecast Operations at the Australian Government Bureau of Meteorology
Dr Lesley Seebeck
Former Group Executive of Data & Digital, CITO, Australian Bureau of Meteorology
Tim F Pugh
Director, Supercomputer Programme, Australian Bureau of Meteorology
Damian Aigus
Support Services, Data & Digital, Australian Bureau of Meteorology
Dr Joerg Henrichs
Computational Science Manager, Data & Digital, Australian Bureau of Meteorology
Andrew Khaw
Scientific Computing Service Manager, Data & Digital, Australian Bureau of Meteorology
Tennessee Leeuwenburg
Model Build Team Manager, Data & Digital, Australian Bureau of Meteorology
James Mandilas
Operations and Change Manager, Data & Digital, Australian Bureau of Meteorology
Richard Oxbrow
HPD Systems Manager, Data & Digital, Australian Bureau of Meteorology
Naren Rajasingam
HPD Analyst, Data & Digital, Australian Bureau of Meteorology
Wojtek Uliasz
Enterprise Architect, Data & Digital, Australian Bureau of Meteorology
John Vincent
Delivery Manager, Data & Digital, Australian Bureau of Meteorology
Craig West
HPC Systems Manager, Data & Digital, Australian Bureau of Meteorology
Dr Rob Bell
IMT Scientific Computing Services, National Partnerships, CSIRO
1.1 Foreword
1.2 Overview
1.2.1 Program Background
1.2.2 Sponsor Background
1.2.3 Timeline
1.3 Applications and Workloads
1.3.1 Highlights of Main Applications
1.3.2 2017 Case Study: From Nodes to News, TC Debbie
1.3.3 Benchmark Usage
1.3.4 SSP - Monitoring System Performance
1.4 System Overview
1.4.1 System Design Decisions
1.5 Hardware Architecture
1.5.1 Australis Processors
1.5.2 Australis Node Design
1.5.2.1 Australis Service Node
1.5.2.2 Australis Compute Node
1.5.3 External Nodes
1.5.4 Australis Memory
1.5.5 Australis Interconnect
1.5.6 Australis Storage and Filesystem
1.6 System Software
1.6.1 Operating System
1.6.2 Operating System Upgrade Procedure
1.6.3 Schedulers
1.6.3.1 SMS
1.6.3.2 Cylc
1.6.3.3 PBS Professional
1.7 Programming System
1.7.1 Programming Models
1.7.2 Compiler Selection
1.7.3 Optimisations
1.8 Archiving
1.8.1 Oracle Hierarchical Storage Manager (SAM-QFS)
1.8.2 MARS/TSM
1.9 Data Center/Facility
1.10 System Statistics
1.10.1 Systems Usage Patterns
1.11 Reliability
1.11.1 Failover Scenarios
1.11.2 Compute Failover
1.11.3 Data Mover Failover
1.11.4 Storage Failover
1.11.4.1 Normal Mode
1.11.4.2 Failover Mode
1.11.4.3 Recovery Mode
1.11.4.4 Isolated Mode
1.11.5 SSH File Transfer Failover
1.12 Implementing a Product Generation Platform
Bibliography
Supercomputing lies at the heart of modern weather forecasting. It co-evolves with the science, the technology, the means of collecting observations, the needs of meteorologists, and the expectations of the users of our forecasts and warnings. It nestles in a web of other platforms and networks, applications and capabilities. It is driven by, consumes, and generates vast and increasing amounts of data. And it is part of the global effort by the world's meteorological agencies to collect data, understand the weather, and look ahead to generate the forecasts and warnings on which human activity is based. Given the complexity of the overall task and the web of supporting capability, to talk about the supercomputing component alone may seem reductionist. And yet it is a feat of human engineering and effort that we do well to recognise. These capabilities drive the data and information business that is the Bureau, delivering the growing benefits of more data, more granular and more frequent forecasts, and better information to the Bureau's customers just as much as to its scientists and meteorologists.
The Bureau's current supercomputer, Australis, was delivered on time and within budget; the supercomputer itself, a Cray XC40, was bought at a capital cost of $A80 million [8]. The programme extends from 2014-15 through 2020-21. Within that period, the Bureau continues to keep pace with the relentless demands of data, models and user needs, and to explore new and improved ways to extract value from both data and capability. It also has to contend with an increasingly challenging operating environment, in which the effective use of such systems places growing demands on organisations in terms of skills, operating costs, and security.
On a personal note, arriving at the start of the programme to replace the existing supercomputer, I was fortunate to have a highly capable team led by Tim Pugh. To continue to be an effective contributor to the field, both the Bureau and Australia need to nurture and grow the technical skills, deep computational understanding, and insights that build and shape the field of high performance computing, along with the ability to exploit that capability. This chapter sets out the Australian Bureau of Meteorology's supercomputing capability, and in doing so contributes to that effort.
Dr Lesley Seebeck
Former Group Executive of Data & Digital, CITO,
Australian Bureau of Meteorology
The Australian Government's Bureau of Meteorology has had the responsibility of providing trusted, reliable, and responsive meteorological services for Australia, all day, every day, since 1908. Bringing together ever-expanding worldwide observation networks with improving computational analysis and numerical modelling delivers the Bureau's exceptional predictive and analytical capability, and enables us to undertake the grand challenge of weather and climate prediction.
Australia has a landmass marginally smaller than that of the continental United States, but a population about one-thirteenth the size. Australia is not only vast, it is also harsh. With just 9% of the landmass suitable for farming, and most of the population living along the cooler coastal regions, the climate of the continent plays a significant role in defining the life of the country.
Around the country there are climate pockets similar to those found on every other continent: Sydney has a climate similar to South Africa's, Canberra is most like Rome, Melbourne like the San Francisco Bay Area, Perth like Los Angeles, Darwin like Mumbai, and Hobart like southern Chile and the UK. Across the centre are deserts which, though sparsely populated, still contain major population centres such as Alice Springs and the mining town of Kalgoorlie.
Against this backdrop, the Bureau and its forecasting team strive to provide timely weather products covering the entire continent and its climate variations, as well as managing its weather responsibilities for the Australian Antarctic Territory (an area of 5.9 million square kilometres, 42% of the Antarctic continent), on a 24×7×365 basis. As if this were not a significant enough daily endeavour, the Bureau also manages a suite of on-demand emergency forecasts covering the region's extreme events: tsunami, cyclone, and bushfire (wildfire). These run routinely during the extreme weather season (December to April) and are also ready to run as and when required. Because Australia is an island continent, the Bureau also provides a full suite of oceanographic forecasts.
The Bureau of Meteorology has the unique numerical prediction capabilities required to routinely forecast the weather and climate conditions across the Australian continent, its territories, and the surrounding marine environment. When this capability is utilised with moder...