The NVMe 2.0 specification, published in June 2021, introduced changes to improve performance and to support new uses of NVMe.
NVMe originally supported only block storage, the traditional model for storage devices. NVMe 2.0 provides two Command Set upgrades that add Zoned Namespaces (ZNS) and Key Value support. The specification also now includes three Transport specification upgrades, for PCIe, Remote Direct Memory Access (RDMA) and TCP.
One important goal of the new release was to change the NVMe specification’s structure to simplify development and support faster innovation. NVM Express calls this “refactoring.” The specification has been restructured to isolate four different areas, two of which, Commands and Transport, are where new features have been added. The revised specification is intended to meet the future demands of storage while maintaining backward compatibility with prior NVMe specifications.
The ZNS Command Set specification enables an SSD — the most common storage device for NVMe use today — and host to collaborate on data placement within the SSD through a zoned storage device interface. The ZNS Command Set is a superset of the NVM Command Set. ZNS permits the host to align data to the characteristics of the SSD media, which eliminates the “indirection tax” within an SSD.
Removing a level of address translation can improve the speed of the SSD. The closer alignment between the host and the SSD media reduces the amount of overprovisioning the SSD needs, which makes more capacity available to the host at a lower cost. Fortunately, there is already a mature open source ecosystem, including native support in the Linux kernel and the Storage Performance Development Kit. Furthermore, multiple file systems, such as F2FS and Btrfs, and applications, such as MySQL, RocksDB and TerarkDB, have added support as well.
The “indirection tax” within an SSD refers to the overhead of the internal shell game that SSDs usually perform to expose a general storage interface. When the host places data into well-defined zones, the SSD no longer has to perform the excessive garbage collection caused by data placed inefficiently onto the media. The added host complexity is kept to a minimum because the SSD continues to manage media reliability.
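To make the zone idea concrete, here is a minimal Python sketch of a single zone. It is a toy model, not the NVMe command interface: the `Zone` class and its method names are hypothetical, and it assumes only the basic zoned-storage rules that writes within a zone are sequential and that a zone is reclaimed as a whole.

```python
class Zone:
    """Toy model of one zone: writes must land at the write pointer."""

    def __init__(self, start_lba: int, capacity: int) -> None:
        self.start_lba = start_lba      # first logical block of the zone
        self.capacity = capacity        # number of writable blocks in the zone
        self.write_pointer = start_lba  # next block that may be written

    def write(self, lba: int, num_blocks: int) -> None:
        # A zoned device rejects writes that do not start at the write pointer.
        if lba != self.write_pointer:
            raise ValueError("write must start at the zone's write pointer")
        if lba + num_blocks > self.start_lba + self.capacity:
            raise ValueError("write would exceed the zone's capacity")
        self.write_pointer += num_blocks

    def append(self, num_blocks: int) -> int:
        # The device picks the location and reports it back to the host.
        lba = self.write_pointer
        self.write(lba, num_blocks)
        return lba

    def reset(self) -> None:
        # The whole zone is invalidated at once, so the device has no
        # partially valid region to garbage-collect.
        self.write_pointer = self.start_lba


zone = Zone(start_lba=0, capacity=1024)
zone.write(0, 8)               # sequential write at the write pointer
location = zone.append(8)      # device-chosen location for the next 8 blocks
zone.reset()                   # host frees the entire zone in one step
```

Because the host only ever writes sequentially and frees whole zones, the device in this model never has to relocate still-valid data, which is the garbage-collection work the text describes.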
One distinct advantage of ZNS is that admins can maintain high throughput without using any overprovisioning. Conventional SSDs rely on overprovisioning to cope with write traffic from the host that they cannot predict. ZNS also reduces write amplification, which increases an SSD’s endurance and opens the door to wider use of QLC flash to reduce costs. Because less management occurs within the SSD, latency and throughput improve.
The NVMe Key Value (KV) Command Set changes the way a piece of data is addressed on the SSD from fixed-length logical block addresses to variable-sized pieces of data addressed through a key. This eliminates the overhead required by existing systems in which the host must maintain a translation table that defines an object as a sequence of logical block addresses.
The key is the address of the data, while the value is the data itself, which can be any number of bytes up to 4 GB. Block storage, by contrast, stores the data in a set of fixed-size logical blocks on the media.
The key-value approach reduces the number of remappings between the application program and the storage device. Like ZNS, this is also a way for the host to communicate its data structures more clearly to the SSD, which allows the SSD to do a better job of managing data placement, write amplification and garbage collection.
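The difference between the two addressing models can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a device implementation: `BlockBackedStore`, `KeyValueStore` and the 4,096-byte `BLOCK_SIZE` are hypothetical, and plain dictionaries stand in for the storage media.

```python
BLOCK_SIZE = 4096  # assumed logical block size in bytes


class BlockBackedStore:
    """Objects on a block device: the host keeps its own translation table."""

    def __init__(self) -> None:
        self.blocks = {}   # LBA -> one fixed-size block of bytes
        self.table = {}    # object name -> (list of LBAs, object length)
        self.next_lba = 0

    def put(self, name: str, data: bytes) -> None:
        lbas = []
        for offset in range(0, len(data), BLOCK_SIZE):
            # Pad the final chunk so every block has the same size.
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self.blocks[self.next_lba] = chunk
            lbas.append(self.next_lba)
            self.next_lba += 1
        self.table[name] = (lbas, len(data))  # host-side mapping overhead

    def get(self, name: str) -> bytes:
        lbas, length = self.table[name]
        return b"".join(self.blocks[lba] for lba in lbas)[:length]


class KeyValueStore:
    """Key-value device model: the key addresses a variable-sized value."""

    def __init__(self) -> None:
        self.values = {}

    def store(self, key: bytes, value: bytes) -> None:
        self.values[key] = value

    def retrieve(self, key: bytes) -> bytes:
        return self.values[key]


payload = b"x" * 5000                   # does not fit in one block
block_store = BlockBackedStore()
block_store.put("report", payload)      # host records which LBAs hold it
kv_store = KeyValueStore()
kv_store.store(b"report", payload)      # no host-side LBA table needed
```

In the block-backed version the host has to remember which logical blocks hold each object; in the key-value version that bookkeeping disappears from the host and the device sees whole objects.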
As with ZNS, software must be modified to take advantage of a key-value storage device.
Choosing ZNS or KVs
Admins can configure a single SSD to have multiple namespaces, each accessed by its own command set. Admins might manage some namespaces through a key-value system and others as ZNS.
These two command sets cannot be overlapped — a namespace must be accessed by either one or the other. This is because ZNS is block-based, while KV is not.
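The one-command-set-per-namespace rule can be pictured with a short Python sketch. The `CommandSet`, `Namespace` and `check_access` names are hypothetical and exist only to illustrate the rule; they are not part of any NVMe driver or library.

```python
from dataclasses import dataclass
from enum import Enum


class CommandSet(Enum):
    ZONED = "Zoned Namespace"
    KEY_VALUE = "Key Value"


@dataclass(frozen=True)
class Namespace:
    nsid: int
    command_set: CommandSet  # exactly one command set per namespace


def check_access(namespace: Namespace, command_set: CommandSet) -> None:
    # Reject a command whose command set differs from the namespace's own.
    if namespace.command_set is not command_set:
        raise ValueError(
            f"namespace {namespace.nsid} uses {namespace.command_set.value}, "
            f"not {command_set.value}"
        )


# One SSD, two namespaces, each accessed through its own command set.
ssd = [Namespace(1, CommandSet.ZONED), Namespace(2, CommandSet.KEY_VALUE)]
check_access(ssd[0], CommandSet.ZONED)        # accepted
# check_access(ssd[0], CommandSet.KEY_VALUE)  # would raise ValueError
```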
Endurance Group Management
NVMe Endurance Group Management enables the host to manage an SSD or other storage as Endurance Groups and NVM Sets. This gives the host a better understanding of the SSD’s access granularity and gives the host more control of the SSD to improve their combined performance. Endurance Groups were first introduced in NVMe 1.4.
These Endurance Groups are built up from one or more NVM Sets. The endurance of an Endurance Group is managed as a collection. Admins can add a Media Unit, typically an SSD, to either an Endurance Group or to an NVM Set. In prior versions of NVMe, Endurance Groups and NVM Sets were preconfigured in the system before it shipped. NVMe 2.0 enables Endurance Groups and NVM Sets to be managed; admins can dynamically allocate all three levels to the level above.
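One way to picture the hierarchy described above is as nested containers. The Python sketch below is a simplified model, not the specification's data structures: the class names, the `capacity_gb` and `bytes_written` fields, and the use of total bytes written as a stand-in for endurance are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaUnit:
    unit_id: int
    capacity_gb: int
    bytes_written: int = 0  # assumed wear indicator for this sketch


@dataclass
class NVMSet:
    set_id: int
    media_units: List[MediaUnit] = field(default_factory=list)


@dataclass
class EnduranceGroup:
    group_id: int
    nvm_sets: List[NVMSet] = field(default_factory=list)
    media_units: List[MediaUnit] = field(default_factory=list)  # added directly

    def all_media_units(self) -> List[MediaUnit]:
        # Collect units attached directly and those inside each NVM Set.
        units = list(self.media_units)
        for nvm_set in self.nvm_sets:
            units.extend(nvm_set.media_units)
        return units

    def capacity_gb(self) -> int:
        return sum(unit.capacity_gb for unit in self.all_media_units())

    def total_bytes_written(self) -> int:
        # Endurance is tracked for the group as a whole, not per unit.
        return sum(unit.bytes_written for unit in self.all_media_units())


group = EnduranceGroup(group_id=1)
nvm_set = NVMSet(set_id=1)
nvm_set.media_units.append(MediaUnit(unit_id=1, capacity_gb=512))  # unit -> NVM Set
group.nvm_sets.append(nvm_set)                                     # NVM Set -> group
group.media_units.append(MediaUnit(unit_id=2, capacity_gb=256))    # unit -> group
```

The point of the model is that membership is assigned at run time rather than fixed before the system ships, and that wear is summed across everything the group contains.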
Domains and partitions
The NVMe 2.0 specification added support for warehouse-scale storage systems but still stays within the NVMe Command Set. It does this by adding domain support. A domain consists of storage capacity, controllers and ports. The new domain support has been configured to withstand the failure of a domain, as well as the addition, removal, reconfiguration or partitioning of a domain.
Rotational media support is a separate update to the NVMe 2.0 specification, which allows admins to use NVMe as an HDD or CD/DVD interface. The rotational media support tools include updates to features, management capabilities and other enhancements that are necessary to enable management of an HDD through the NVMe protocol. Seagate has recently been demonstrating an HDD that sports an NVMe interface.
Rotational media support is a nod to the fact that certain storage systems are now designed around NVMe, and don’t support standard HDD interfaces — SATA, SAS and Fibre Channel. In such systems there may arise a need to use rotational storage, so NVM Express worked to support that need by adding rotational media features to NVMe 2.0.
Aiming for the future
The NVMe standard has been developed with an eye to the future. The first specification was created to be simple, with the main goal of creating something that could work. Features were later added as the specification moved through successive upgrades.
The NVMe 2.0 specification gave the document a clearer structure and added features that were devised after earlier versions of the specification were developed. Developers still have an eye toward the future and continue to extend the specification in a way that is expected to give NVMe a long life ahead.