BSC About to Deploy MareNostrum 5 for Critical Research

In “Origin,” Dan Brown’s 2017 novel featuring professor Robert Langdon, the Barcelona Supercomputing Center (BSC) and its powerful MareNostrum supercomputer are part of the plot. In the book, Langdon, visiting the BSC, discovers a device called E-Wave, a MareNostrum supercomputer featuring a “Quantum cube.” The device is supposed to simulate the Miller-Urey experiment, using E-Wave’s ability to fast-forward time digitally.

Simulations, fiction and drama aside, the BSC and the MareNostrum supercomputer are real, but the architecture and applications differ notably from Dan Brown’s imagination.

Since its establishment in 2005, BSC has actively promoted high-performance computing (HPC) in Europe as an essential tool for international competitiveness in science and engineering. The center is a founding and hosting member of PRACE (Partnership for Advanced Computing in Europe), the former European HPC infrastructure. It is now a hosting entity for the EuroHPC Joint Undertaking (EuroHPC JU), which leads large-scale HPC investment and provision in Europe.

Barcelona Supercomputing Center Operations Director Sergi Girona

EE Times recently visited the Barcelona Supercomputing Center, where Director Mateo Valero and Operations Director Sergi Girona gave us a comprehensive view of the new MareNostrum 5, the main areas of research and a glimpse into the European supercomputing strategy and future.

BSC is finishing the assembly of MareNostrum 5

The new MareNostrum 5 (MN5), set to replace MareNostrum 4 soon, comprises two distinct machines and massive high-performance storage. Girona gave us an overview of the new MN5 capabilities and the challenges of preparing it for operation.

Girona, who has spent nearly two decades at the BSC, has overseen the assembly and operation of the previous four supercomputers and is now finishing the assembly of MareNostrum 5.

MN5 is split into two main computing partitions: a general-purpose machine and an accelerated machine.

“The general-purpose machine will deliver sustained Linpack performance exceeding 35 PetaFLOPS. That is the performance we put in the specifications,” he said. “As we are working with a joint undertaking to install these machines at a European level, the aim is to maximize the total peak performance of the aggregate machine. This is achieved with an accelerated machine with graphics accelerators. It is a machine with a peak performance of 260 PetaFLOPS, which must be guaranteed to sustain 163 PetaFLOPS on HPLinpack.”
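The gap between the accelerated machine’s 260-PetaFLOPS peak and its 163-PetaFLOPS Linpack guarantee is commonly expressed as HPL efficiency, the ratio of sustained (Rmax) to theoretical peak (Rpeak) performance. The minimal Python sketch below computes that ratio from the two figures Girona quotes; the function and constant names are illustrative, not taken from any BSC tooling. The general-purpose machine’s peak is not given here, so its efficiency cannot be derived the same way.

```python
# Figures quoted by Girona for the accelerated machine, in PetaFLOPS.
ACC_PEAK_PFLOPS = 260.0
ACC_HPL_GUARANTEED_PFLOPS = 163.0


def hpl_efficiency(sustained_pflops: float, peak_pflops: float) -> float:
    """Return the fraction of theoretical peak achieved on HPL (Rmax / Rpeak)."""
    if peak_pflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_pflops / peak_pflops


eff = hpl_efficiency(ACC_HPL_GUARANTEED_PFLOPS, ACC_PEAK_PFLOPS)
print(f"Guaranteed HPL efficiency: {eff:.1%}")
```

Roughly 63% of peak is the floor the contract sets; the measured result may come in higher.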

The accelerated machine comprises 35 racks of the new Atos BullSequana, with 1,120 compute nodes. Each node has two 40-core Intel Sapphire Rapids CPUs and four Nvidia Hopper GPUs. Nvidia Grace CPUs also feature in the MN5 design, in a separate next-generation partition.
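The per-node description above (1,120 nodes, each with two 40-core Sapphire Rapids CPUs and four Nvidia Hopper GPUs) implies machine-wide totals. The minimal Python sketch below does that multiplication; `NodeConfig` is a hypothetical helper for illustration, and the totals count only the components listed in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical description of a homogeneous set of compute nodes."""

    nodes: int
    cpus_per_node: int
    cores_per_cpu: int
    gpus_per_node: int

    @property
    def total_cpu_cores(self) -> int:
        # Every node contributes cpus_per_node sockets of cores_per_cpu cores each.
        return self.nodes * self.cpus_per_node * self.cores_per_cpu

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node


# The 1,120-node configuration as described in the article.
partition = NodeConfig(nodes=1120, cpus_per_node=2, cores_per_cpu=40, gpus_per_node=4)
print(partition.total_cpu_cores)  # 89600
print(partition.total_gpus)       # 4480
```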

Fastest connectivity and high-performance storage

To achieve the maximum level of performance, MN5 will be interconnected by an InfiniBand NDR200 network in different configurations.

“The network configuration for the accelerated machine is more powerful than the general-purpose machine’s,” Girona said. “In the general-purpose machine, each node is connected at 100 Gbps, and in the accelerated machine, each node is connected at 800 Gbps.”
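To put the two per-node link speeds in perspective, the minimal Python sketch below converts them to gigabytes per second and estimates the ideal time to move a payload off one node. The 1 TB payload is a hypothetical figure chosen for illustration, and the calculation ignores protocol overhead, congestion, and storage limits, so real transfers would be slower.

```python
def transfer_seconds(payload_bytes: float, link_gbps: float) -> float:
    """Ideal time to move payload_bytes over a link of link_gbps (10**9 bits/s).

    Ignores protocol overhead, congestion, and I/O limits.
    """
    return payload_bytes * 8 / (link_gbps * 1e9)


GP_NODE_GBPS = 100    # general-purpose node link, as quoted
ACC_NODE_GBPS = 800   # accelerated node link, as quoted
PAYLOAD_BYTES = 1e12  # hypothetical 1 TB (decimal) payload, for illustration only

for name, gbps in [("general-purpose", GP_NODE_GBPS), ("accelerated", ACC_NODE_GBPS)]:
    print(f"{name}: {gbps / 8:.1f} GB/s, {transfer_seconds(PAYLOAD_BYTES, gbps):.0f} s per TB")
```

The eightfold difference in link speed translates directly into an eightfold difference in ideal transfer time.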

“All this comes together with high-performance storage, which is a critical part for us,” he added. “We can have a lot of computation and memory, but the system is useless if we don’t have short- and long-term storage capacity. We have also installed a system of 248 petabytes of spinning disk, with 2.8 petabytes of flash to act as cache memory.”

The storage system will give MN5 write and read speeds of 1.2 and 1.6 terabits per second, respectively.
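A quick way to read these throughput figures is to ask how long the 2.8-petabyte flash tier would take to fill or drain at full speed. The minimal Python sketch below takes the rates at face value in terabits per second, as stated, and assumes decimal petabytes and perfectly sustained throughput; if the rates were instead in terabytes per second, the times would be eight times shorter.

```python
def hours_to_stream(capacity_bytes: float, rate_tbps: float) -> float:
    """Ideal hours to stream capacity_bytes at rate_tbps (10**12 bits/s)."""
    return capacity_bytes * 8 / (rate_tbps * 1e12) / 3600


FLASH_TIER_BYTES = 2.8e15  # 2.8 PB flash tier, assuming decimal petabytes
WRITE_TBPS = 1.2           # quoted write speed
READ_TBPS = 1.6            # quoted read speed

print(f"fill flash tier:  {hours_to_stream(FLASH_TIER_BYTES, WRITE_TBPS):.1f} h")
print(f"drain flash tier: {hours_to_stream(FLASH_TIER_BYTES, READ_TBPS):.1f} h")
```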

When asked why the system relies on spinning disks, Girona said cost and power consumption were the main factors.

“The units occupy 25 racks; these 25 racks each consume about 22-23 kilowatts,” he said. “This significantly impacts the TCO, the floor plan, power distribution, and the maintenance of the UPSs to stabilize their operation.”
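Multiplying out Girona’s figures gives the power envelope of the disk tier alone. The minimal Python sketch below assumes the racks draw their quoted power continuously all year, which is a simplification; `disk_tier_power` is an illustrative helper, not part of any BSC tool.

```python
RACKS = 25
KW_PER_RACK_RANGE = (22.0, 23.0)  # per-rack draw quoted by Girona
HOURS_PER_YEAR = 24 * 365


def disk_tier_power(racks: int, kw_per_rack: float) -> tuple[float, float]:
    """Return (total kW, MWh per year), assuming continuous draw at kw_per_rack."""
    total_kw = racks * kw_per_rack
    annual_mwh = total_kw * HOURS_PER_YEAR / 1000
    return total_kw, annual_mwh


for kw_per_rack in KW_PER_RACK_RANGE:
    total_kw, annual_mwh = disk_tier_power(RACKS, kw_per_rack)
    print(f"{kw_per_rack:.0f} kW/rack -> {total_kw:.0f} kW total, ~{annual_mwh:,.0f} MWh/year")
```

Even under this simplified model, the spinning-disk tier alone draws more than half a megawatt, which is why it weighs on TCO, power distribution, and UPS sizing.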

Beyond that, MN5 will also have 400 petabytes of tape for long-term storage.

Focusing on sustainable performance 

The website Top500.org publishes rankings of supercomputers by performance (the TOP500 list) and by energy efficiency (the Green500 list). Its June 2023 lists ranked LUMI No. 3 in speed and No. 7 in efficiency worldwide. LUMI, ranked as the fastest European supercomputer, was built in Kajaani, Finland, to take advantage of the cool sub-Arctic air. This way, it can achieve a Power Usage Effectiveness (PUE) of 1.03, meaning that about 97% of the electricity entering the facility reaches the computing equipment rather than cooling and other overhead. PUE is not a one-time measurement: tracking it over time shows how a facility performs against its initial baseline.
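PUE is the ratio of total facility energy to the energy consumed by the IT equipment, so the share of power that reaches the computers is 1/PUE. The minimal Python sketch below computes both and compares a few periods against a baseline. The monthly meter readings are hypothetical values invented for illustration, not LUMI or BSC data.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def it_fraction(pue_value: float) -> float:
    """Share of incoming energy that reaches the IT equipment."""
    return 1.0 / pue_value


print(f"PUE 1.03 -> {it_fraction(1.03):.1%} of facility energy reaches IT equipment")

# Tracking against a baseline, using HYPOTHETICAL monthly meter readings in kWh.
baseline = pue(1030.0, 1000.0)
monthly = [(1045.0, 1000.0), (1028.0, 1000.0), (1060.0, 1000.0)]
for month, (total, it) in enumerate(monthly, start=1):
    value = pue(total, it)
    print(f"month {month}: PUE {value:.3f} ({value - baseline:+.3f} vs baseline)")
```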

One of the key design goals for MN5 was achieving the highest performance with the smallest possible environmental impact. To that end, using the best cooling systems available, MN5 has a target PUE of 1.08. By comparison, MareNostrum 4, in production since July 2017, has a PUE of 1.4 to 1.5.
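Because PUE minus 1 is the non-IT overhead per unit of IT load, the gap between 1.08 and 1.4 to 1.5 is larger than it looks. The minimal Python sketch below applies the quoted PUE values to a hypothetical 1 MW IT load, chosen only to make the comparison concrete; it is not the actual load of either machine.

```python
def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power conversion, etc.) implied by a PUE at a given IT load."""
    return it_load_kw * (pue_value - 1.0)


IT_LOAD_KW = 1000.0  # hypothetical 1 MW IT load, for illustration only

for label, value in [("MN5 target", 1.08), ("MN4 low", 1.4), ("MN4 high", 1.5)]:
    print(f"{label}: PUE {value} -> {overhead_kw(IT_LOAD_KW, value):.0f} kW overhead")
```

At the same IT load, MN5’s target implies roughly one-fifth to one-sixth of the overhead of MN4.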

“To achieve [a low PUE], you have to do liquid cooling in the systems,” said Girona. “We have a refrigeration capacity with redundancy of 13 or 14 megawatts. We have circuits that provide cold water at different levels. The accelerated machine has in itself a superior circuit that uses glycol. We give it water at 32 degrees, and with that water, it cools the system. On the other hand, the general-purpose system directly uses our infrastructure water, water at 32 degrees, and with that water, it already cools the systems.” After that, the hot water from the system is reused for heating in the BSC main building. 

Ready to go soon

Because of supply-chain issues, stemming mainly from the semiconductor shortage during the COVID-19 pandemic, MareNostrum 5 is not yet operational. According to Girona, the team has already installed the storage, all the main networks, and the general-purpose machine.

What remains is the accelerated machine and the logical part, the software, which the team expects to have ready this month (September). Users will then be able to migrate immediately.

“The previous system, MareNostrum 4, went into production on July 1, 2017, and that same day the schedules arrived and filled the machine,” Girona said. “When we launch a machine, we make sure the users’ software stack is already prepared. That is the goal. When the machine is accepted, it is already at full capacity.”

The MareNostrum 4 (MN4), set to be decommissioned with the release of the new MareNostrum 5.

Once MN5 is operational, the previous supercomputer, MN4, will be disconnected and divided into different systems to be distributed around the country. The BSC coordinates the Spanish Supercomputing Network and will distribute half of MN4’s 48 racks to different institutions. The BSC will keep the other half for its own cloud and internal services.

Focusing on research for public benefit

The main objective of the BSC since its foundation has been research and innovation. Access to the machines is free. Almost all the funding comes from governments and public institutions. Only a tiny part of the money comes from enterprises, and they can’t use the machines to develop new products.

An independent committee evaluates the proposals for access to the center’s resources. Demand exceeds the available resources, with an oversubscription of about 3 to 1.

Mateo Valero, the director of the BSC, joined us to talk more about the current research being done with MN4 and what to expect from MN5.

An engineer with more than four decades of experience, Valero is one of the founders of the BSC, and he has been leading the institution ever since. 

The MareNostrum 5

“The BSC started with 60 workers in 2005, and last July, we passed the barrier of 900 employees,” Valero said. “A third of these workers come from 53 different countries, which shows the international projection of the BSC, which has become the largest supercomputing center in Europe, providing not only technical infrastructure for the scientific community but also conducting world-class research.”

According to Valero, 80% of the researchers working at the BSC receive funds from European projects. In 2022, the BSC participated in 12 of the 15 European centers of excellence for HPC applications, leading four of them.

“The MareNostrum 5 computing capacity and enormous memory will allow us to make progress in the massive analysis of data to tackle major challenges facing society, such as the fight against climate change (Destination Earth), the search for new sources of energy, new materials and new drugs against cancer (digital twin of the human body), or the recent explosion of artificial intelligence,” he said.

Working on RISC-V-based chip design

According to Valero, the BSC started the RISC-V revolution in Europe with the premise that RISC-V can enable two significant shifts: high-fidelity research incorporating chips designed in Europe and digital autonomy for Europe. 

On the research side, BSC has led the way with vector processor designs in the European Processor Initiative (EPI), MEEP, EUPilot, and eProcessor. BSC began designing its own chips with the Lagarto family of CPUs, with the first tapeout in May 2019, in 65 nm. “Today, we are on the fourth generation of the Lagarto core, targeting 7 nm and beyond in the near future,” Valero said.

The Chapel to host two quantum computers

In 2024, after MN4 is decommissioned and half of its racks leave the facility, two quantum computers will be installed there: one based on superconducting circuits, funded through the Quantum Spain project, and another, funded by EuroHPC, whose architecture is still to be defined.

The quantum machines will also connect to MN5, allowing scientists to explore the new field of quantum computing with access to one of the most powerful supercomputers worldwide.

MareNostrum 6 aims to use European tech

Valero said Europe needs to develop its own open-source processor architecture, adding that today, the smallest missile that could cause the most damage to humankind would be one aimed at TSMC’s factories in Taiwan, which could delay innovation by more than five years.

Valero and Girona are already looking at the next BSC supercomputer, the MareNostrum 6. They want it to be based primarily on European technology, especially on RISC-V architecture processors.

That’s why the BSC has created an entity called OpenChip, which aims to develop RISC-V cores and software and share them with the market. The ultimate goal is to build MareNostrum 6 on this technology and reduce dependence on other architectures.

“The objective of the project was to show that the BSC, which is firmly committed to the development of a processor ‘made in Europe’ can create these technologies, and we have succeeded indeed,” Valero said.