Data centers need extremely fast storage access, and no DPU is faster than Nvidia’s BlueField-2.
Recent testing by Nvidia shows that two BlueField-2 data processing units reached 41.5 million IO/s, more than 4x the IO/s of any other DPU.
The BlueField-2 DPU delivered record-breaking performance using standard networking protocols and open-source software. It reached more than 5 million 4KB IO/s and from 7 million to over 20 million 512B IO/s for NVMe-oF, a method of accessing storage media, over TCP, one of the primary Internet protocols.
To accelerate AI, big data and HPC applications, it provides even higher storage performance using the RoCE network transport option.
In testing, it supercharged performance as both an initiator and target, using different types of storage software libraries and different workloads to simulate real-world storage configurations. It also supports fast storage connectivity over InfiniBand (IB), the preferred networking architecture for many HPC and AI applications.
Testing methodology
The 41.5 million IO/s reached by BlueField is more than 4x the previous world record of 10 million IO/s, set using proprietary storage offerings. This performance was achieved by connecting two HPE ProLiant DL380 Gen10 Plus servers, one as the application server (storage initiator) and one as the storage system (storage target).
Each server had two ‘Ice Lake’ Xeon Platinum 8380 CPUs clocked at 2.3GHz, giving 160 hyperthreaded cores per server, along with 512GB of DRAM, 120MB of L3 cache (60MB per socket) and a PCIe Gen4 bus.
To accelerate networking and NVMe-oF, each server was configured with 2 BlueField-2 P-series DPU cards, each with two 100GbE network ports, resulting in 4 network ports and 400Gb/s wire bandwidth between initiator and target, connected back-to-back using Nvidia LinkX 100GbE Direct-Attach Copper (DAC) passive cables. Both servers had Red Hat Enterprise Linux (RHEL) version 8.3.
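The aggregate wire bandwidth of this back-to-back link follows directly from the port count. A minimal sketch of that arithmetic (purely illustrative, not part of the test harness):

```python
# Wire-bandwidth arithmetic for the back-to-back test link described above:
# two BlueField-2 DPUs per server, two 100GbE ports per DPU.
PORTS = 4                      # total network ports between initiator and target
PORT_SPEED_GBPS = 100          # gigabits per second per 100GbE port

wire_gbps = PORTS * PORT_SPEED_GBPS   # aggregate wire rate in Gb/s
wire_gBps = wire_gbps / 8             # same figure in gigabytes per second

print(f"{wire_gbps} Gb/s on the wire (~{wire_gBps:.0f} GB/s per direction)")
# → 400 Gb/s on the wire (~50 GB/s per direction)
```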
For the storage system software, both SPDK and the standard upstream Linux kernel target were tested using both the default kernel 4.18 and one of the newest kernels, 5.15. Three different storage initiators were benchmarked: SPDK, the standard kernel storage initiator, and the FIO plugin for SPDK. Workload generation and measurements were run with FIO and SPDK. I/O sizes were tested using 4KB and 512B, which are common medium and small storage I/O sizes, respectively.
The NVMe-oF storage protocol was tested with both TCP and RoCE at the network transport layer. Each configuration was tested with 100% read, 100% write and 50/50 read/write workloads with full bidirectional network utilization.
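For readers who want to reproduce a similar workload matrix, one FIO invocation per read/write mix might be assembled as below. This is a hedged sketch, not Nvidia's published test parameters: the device path, queue depth and job count are assumptions, and the SPDK runs would use FIO's SPDK plugin rather than libaio.

```python
# Hypothetical helper that builds an fio command line for one workload mix.
# Device path, iodepth and numjobs are illustrative assumptions.
def fio_cmd(rw, bs, rwmixread=None, target="/dev/nvme0n1"):
    cmd = [
        "fio", "--name=nvmeof-bench",
        f"--filename={target}",
        f"--rw={rw}",            # randread / randwrite / randrw
        f"--bs={bs}",            # I/O size: 4k or 512
        "--ioengine=libaio",     # kernel async I/O (SPDK uses its own plugin)
        "--iodepth=32", "--numjobs=8",
        "--direct=1", "--time_based", "--runtime=60",
    ]
    if rwmixread is not None:
        cmd.append(f"--rwmixread={rwmixread}")  # e.g. 50 for a 50/50 mix
    return cmd

# The 50/50 mixed workload at 4KB I/O size:
print(" ".join(fio_cmd("randrw", "4k", rwmixread=50)))
```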
Testing also revealed the following performance characteristics of the BlueField DPU:
- Testing with smaller 512B I/O sizes resulted in higher IO/s but lower-than-line-rate throughput, while 4KB I/O sizes resulted in higher throughput but lower IO/s.
- 100% read and 100% write workloads provided similar IO/s and throughput, while 50/50 mixed read/write workloads produced higher performance by using both directions of the network connection simultaneously.
- Using SPDK resulted in higher performance than kernel-space software, but at the cost of higher server CPU utilization, which is expected behavior, since SPDK runs in user space with constant polling.
- The newer Linux 5.15 kernel performed better than the 4.18 kernel due to storage improvements added regularly by the Linux community.
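The first trade-off above is just the identity throughput = IO/s × I/O size. A small sketch using the NVMe-oF over TCP figures quoted earlier (~20 million 512B IO/s versus ~5 million 4KB IO/s) makes it concrete:

```python
def throughput_gbps(io_per_sec, io_size_bytes):
    """Network throughput implied by an IO/s figure at a given I/O size."""
    return io_per_sec * io_size_bytes * 8 / 1e9  # bytes/s -> gigabits/s

# Rounded figures from the TCP results quoted above:
small_io = throughput_gbps(20e6, 512)    # many small I/Os, modest throughput
large_io = throughput_gbps(5e6, 4096)    # fewer, larger I/Os, more data moved

print(f"512B: {small_io:.1f} Gb/s, 4KB: {large_io:.1f} Gb/s")
# → 512B: 81.9 Gb/s, 4KB: 163.8 Gb/s
```

So even at 4x fewer IO/s, the 4KB workload moves roughly twice the data, which is why small I/O sizes max out on IO/s well before they reach line rate.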
Record-setting DPU performance enables fast storage with security
In today’s storage landscape, the vast majority of cloud and enterprise deployments require fast, distributed, networked flash storage, accessed over Ethernet or IB. Faster servers, GPUs, networks and storage media all tax server CPUs, and the best way to keep up is to deploy storage-capable DPUs.
The storage performance demonstrated by the BlueField-2 DPU enables higher performance and better efficiency across the data center for both application servers and storage appliances.
On top of fast storage access, BlueField also supports hardware-accelerated encryption and decryption of both Ethernet storage traffic and the storage media itself, helping protect against data theft or exfiltration.
It offloads IPsec at up to 100Gb/s (data on the wire) and 256-bit AES-XTS at up to 200Gb/s (data at rest), reducing the risk of data theft if an adversary has tapped the storage network or if the physical storage drives are stolen, sold or disposed of improperly.
Customers and security software vendors are using BlueField’s recently updated Nvidia DOCA framework to run cybersecurity applications – such as a distributed firewall or security groups with micro-segmentation – on the DPU. This further improves application and network security for compute servers, reducing the risk of inappropriate access or data modifications on the storage attached to those servers.
Detailed results from BlueField-2 DPU tests:
BlueField-2 DPU tests using NVMe-oF over TCP.
Each test result shows combined performance of 2 BlueField-2 DPUs.
BlueField-2 DPU tests using NVMe-oF RoCE.
Each test result shows combined performance of 2 BlueField-2 DPUs.