DDN seeks AI leadership as it bags $300m investment

DDN, which made its name in high-performance computing (HPC), has secured a $300m investment from US fund Blackstone, which it says will be used to translate its leadership in supercomputing into storage for artificial intelligence (AI).

Arrays aimed at the two workloads are alike in having to keep pace with extremely high-performance processing, but their I/O patterns differ. HPC workloads read a relatively small number of mathematical formulations to produce enormous amounts of simulation data.

In AI, it’s the opposite. A massive amount of data is read to produce a relatively small model during training or to generate a response to an application or human prompt during inference.

DDN EXAScaler adapts to AI

DDN sells its EXAScaler arrays into the HPC market. They use the Lustre parallel file system, which is open source and was first released around two decades ago. An EXAScaler array comprises a number of storage nodes, one of which acts as a metadata server indexing the contents of the others. Compute nodes query that metadata node to find out which storage node to read blocks of data from or write them to, then communicate directly with that node.
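The paragraph above describes a two-step access pattern: one metadata lookup, then direct I/O against the right storage node. The toy Python sketch below illustrates that pattern only; the class names and in-memory "nodes" are illustrative stand-ins, not Lustre or DDN APIs.

    # Toy model of the Lustre access pattern described above: a client asks a
    # metadata node which storage node holds a file, then reads from that
    # node directly. Purely illustrative, not a real Lustre client.

    class MetadataNode:
        """Indexes which storage node holds each file (Lustre's MDS role)."""
        def __init__(self):
            self.layout = {}  # filename -> storage node id

        def place(self, filename, node_id):
            self.layout[filename] = node_id

        def lookup(self, filename):
            return self.layout[filename]

    class StorageNode:
        """Holds file contents (Lustre's storage server role)."""
        def __init__(self, node_id):
            self.node_id = node_id
            self.blocks = {}

        def write(self, filename, data):
            self.blocks[filename] = data

        def read(self, filename):
            return self.blocks[filename]

    # One metadata node fronting two storage nodes.
    mds = MetadataNode()
    storage = {1: StorageNode(1), 2: StorageNode(2)}

    # Write path: ask the metadata node where the file lives,
    # then talk to that storage node only.
    mds.place("weights.bin", 2)
    storage[mds.lookup("weights.bin")].write("weights.bin", b"\x00" * 1024)

    # Read path: one metadata round trip, then direct I/O.
    data = storage[mds.lookup("weights.bin")].read("weights.bin")
    print(len(data))  # 1024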

To function, the compute nodes must run a Lustre client and have a direct network connection to every storage node. That usually means InfiniBand, which offers lossless transport and lets the controller copy data directly into random access memory (RAM) or non-volatile memory express (NVMe) storage on the host machine.

DDN has carried this functionality over to its AI400X2 arrays, which are aimed at AI workloads. They use the same 2U nodes as EXAScaler, but are fitted with Nvidia Spectrum-X Ethernet controller cards. These are built around an Nvidia BlueField DPU and bring to Ethernet networks the same benefits found in InfiniBand. Their use of RDMA over Converged Ethernet (RoCE) likewise avoids packet loss, and data can be written directly into Nvidia graphics processing unit (GPU) memory using GPUDirect.
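GPUDirect is a generic Nvidia capability rather than something DDN-specific, and it can be exercised from Python through Nvidia's kvikio bindings to the cuFile API. The sketch below is a minimal illustration of GPU-direct reads and writes under those assumptions: it needs an Nvidia GPU, CUDA and a GPUDirect Storage-capable filesystem, and the mount path is a placeholder.

    # Minimal GPUDirect Storage sketch using Nvidia's kvikio bindings (cuFile).
    # Generic illustration of GPU-direct I/O, not DDN's own stack; the path
    # below is illustrative.
    import cupy
    import kvikio

    buf = cupy.empty(1_000_000, dtype=cupy.float32)  # buffer in GPU memory

    # Write the GPU buffer straight to storage, bypassing a host-memory bounce.
    with kvikio.CuFile("/mnt/ai400x2/sample.bin", "w") as f:
        f.write(buf)

    # Read it back directly into GPU memory.
    with kvikio.CuFile("/mnt/ai400x2/sample.bin", "r") as f:
        f.read(buf)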

DDN storage for training data

The AI400X2 is primarily intended to communicate as quickly as possible with GPUs during training workloads. But it is potentially a very expensive option for holding the enormous quantities of data an enterprise might want to keep once its models have been trained.

For this, DDN has offered its Infinia arrays since 2023. These provide S3 object storage with the ability to add drives non-disruptively.

DDN has offloaded S3 storage functions, such as the metadata server and the storage server, into containers. This means DDN can reproduce in Infinia functionality similar to Lustre's when specific S3 containers are deployed on the compute nodes. Infinia arrays can also be equipped with Spectrum-X cards to maximise transfer speeds.
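Since Infinia presents a standard S3 interface, any stock S3 client should be able to talk to it. The boto3 sketch below shows what that looks like in practice; the endpoint URL, credentials and bucket name are placeholders, not real DDN values.

    # Because Infinia speaks standard S3, a stock client such as boto3 can
    # target it. Endpoint and credentials below are hypothetical placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://infinia.example.internal:9000",  # placeholder
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Create a bucket and store a training-data shard.
    s3.create_bucket(Bucket="training-data")
    s3.put_object(Bucket="training-data", Key="shard-0001.tar", Body=b"...")

    # List what is stored so far.
    for obj in s3.list_objects_v2(Bucket="training-data").get("Contents", []):
        print(obj["Key"], obj["Size"])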

DDN claims to know better than anyone how storage behaves under intensive workloads. When GPUs write data in parallel and then read it back rapidly, problems of incoherence can arise. Checkpointing guards against this, but it is a resource-hungry operation that interrupts processing without producing useful output. DDN says it can avoid such delays by carefully managing data flows and its use of caching.
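To make the cost concrete, the generic PyTorch sketch below contrasts a synchronous checkpoint, which stalls training for the full duration of the write, with staging the state in host memory and writing it from a background thread. The asynchronous variant is one common mitigation, shown here as an assumption about the kind of flow management involved, not a description of DDN's internal method.

    # Generic illustration of the checkpoint stall (PyTorch, not DDN-specific).
    import threading
    import torch

    model = torch.nn.Linear(4096, 4096)  # stand-in for a real model

    def checkpoint_sync(step):
        # Training is stalled for the full duration of this storage write.
        torch.save(model.state_dict(), f"ckpt-{step}.pt")

    def checkpoint_async(step):
        # Snapshot parameters to host memory quickly...
        snapshot = {k: v.detach().cpu().clone()
                    for k, v in model.state_dict().items()}
        # ...then let a background thread do the slow storage write while
        # the GPUs get back to useful work.
        t = threading.Thread(target=torch.save,
                             args=(snapshot, f"ckpt-{step}.pt"))
        t.start()
        return t

    writer = checkpoint_async(step=100)
    writer.join()  # in real code, join before taking the next checkpoint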

Big announcement coming, says DDN

DDN already has skin in the AI game, and among its customers is Elon Musk’s xAI, which has deployed a supercomputer called Colossus with 100,000 H100 GPUs. So, the purpose of the new $300m is not altogether clear.

Blackstone is likely positioning itself across a number of AI-focused enterprises, and now has a seat on the DDN board. Last year, the fund offered financial support to CoreWeave, a supplier of AI-focused infrastructure as a service.

DDN promises a significant announcement on 20 February, which it has prefaced with the slogan: “We’re making AI real.”


