Performance has not always been a priority for object storage. Instead, object storage found its role in secondary applications, such as archiving and backup, and, although object storage is widely used by cloud service providers, including Amazon Web Services (AWS) and Microsoft Azure, early cloud storage applications were not performance driven.
Improvements in object storage technology are changing that, however, along with a better understanding among IT architects and suppliers about how to tap into object storage’s advantages.
“Object storage has historically been associated with large volumes of unstructured data,” says Paul Speciale, chief product officer at Scality, an object storage supplier.
“In truth, the use cases are broader than that, with widespread adoption for online content delivery use cases, big data and analytics with ‘warm’ data access characteristics, and high-throughput demands in backup and restore use cases.”
Some changes are driven by technology. These include the replacement of low-cost spinning disk with higher-capacity, lower-cost solid-state drives (SSDs), including NVMe flash. There is also erasure coding offloaded to hardware, along with software system improvements.
But demand also plays a part. Businesses increasingly want to process data close to where it is stored. Analytics, artificial intelligence (AI) and the internet of things (IoT) are all use cases for object storage. Better technology allows more of the data to stay in the object store, rather than copying it to local file-based or block storage.
The object storage difference
Object storage replaces hierarchical file systems with a “flat” structure.
Each object contains the data, metadata and a unique ID for that object. Object storage allows users to aggregate their storage into nodes and pools, which is why it is so popular among cloud service providers.
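The flat structure described above can be sketched in a few lines. This is an illustrative model only, not any supplier's API: each object bundles its data, metadata and a unique ID, and "folders" are nothing more than shared key prefixes, S3-style.

```python
import uuid

# Minimal sketch of a flat object namespace. Class and method names are
# illustrative assumptions, not a real object storage interface.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> object record, one flat dictionary

    def put(self, key, data, **metadata):
        # Each object carries its data, metadata and a unique ID
        record = {"id": str(uuid.uuid4()), "data": data, "metadata": metadata}
        self._objects[key] = record
        return record["id"]

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # No directory hierarchy: listing just filters keys by prefix
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ObjectStore()
store.put("videos/stream-001.mp4", b"...", codec="h264")
store.put("videos/stream-002.mp4", b"...", codec="h265")
store.put("backups/db.dump", b"...")
print(store.list("videos/"))
# → ['videos/stream-001.mp4', 'videos/stream-002.mp4']
```

Because there is no hierarchy to maintain, the namespace can be spread across nodes by hashing keys, which is what makes scale-out so straightforward.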
Object storage is massively scalable. You can add more nodes with no need to scale up metadata databases, and nodes can be located anywhere to create physical redundancy. Further protection comes from erasure coding across a redundant array of independent nodes (RAIN), so a properly configured object storage system can survive the loss of drives and even entire nodes.
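The survival property can be demonstrated with a simplified stand-in for erasure coding: single XOR parity across data shards. Production systems use Reed-Solomon codes that tolerate multiple simultaneous failures; this sketch, under that simplifying assumption, survives the loss of any one node.

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Simplified erasure coding sketch: one XOR parity shard across equal-size
# data shards. Real object stores use Reed-Solomon, which tolerates the
# loss of several shards, not just one.
def encode(shards):
    return shards + [reduce(xor, shards)]

def recover(surviving_shards):
    # XOR of all surviving shards reconstructs the single missing one
    return reduce(xor, surviving_shards)

data_shards = [b"node0", b"node1", b"node2"]  # object split across nodes
stored = encode(data_shards)                  # four nodes incl. parity
lost = stored.pop(1)                          # node 1 fails
assert recover(stored) == lost                # rebuilt from survivors
```

The same idea, generalised with Reed-Solomon coefficients, is what lets a system ride out the drive and node failures mentioned above while storing far less than full replicas.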
The need for object storage performance
Object storage users now also want improved performance. That’s because object storage has become more commonplace and because it solves other practical problems, such as the need to scale to cope with very large data sets. As more applications can work with object storage, through S3 and Azure APIs, CIOs want to bring performance closer to what is available from file-based storage.
“There are several applications and workloads now that can take advantage of object stores, including IoT, HPC [high-performance computing], big data analytics and AI,” says GigaOM analyst Enrico Signoretti. “Associating scalability and performance in the same system is a requirement for large multi-petabyte projects.”
Businesses want to use the data in their archives more actively, such as for analytics and machine learning. With object storage now a de facto standard for enterprise archiving, it is being used with more demanding applications.
A further use case is in the telecoms and media sector, where companies now store movies and even live TV content online. These files are in the multi-gigabyte range, but systems also need to support multiple users.
“During the Rio Olympic Games, there was the need to store 6,000 4k video streams in object storage,” adds Jonathan Morgan, CEO at media-focused supplier Object Matrix.
Object storage performance metrics
To boost performance, IT teams first need to measure it. Conventional measures of storage performance, such as input/output operations per second (IOPS) and latency, do not give a full picture for object storage. And “cold” object storage is often bought on a cost-per-GB rather than performance basis.
Latency for object storage will depend as much on the network as on storage hardware specs – and there are applications, such as AI, where latency is important. In others, such as analytics and video streaming, throughput and the number of simultaneous connections might matter more.
“An on-premise object storage system does have millisecond response times,” says Sherman Schorzman, technical marketing engineer at supplier Quantum.
“Where object storage performance really shines is as an analytics application that can open up a couple of thousand connections.” Single disk or array-level IOPS is less critical, he adds.
Actual throughput speed in MBps or even GBps is more important for some applications, and so is consistency. “For a video-streaming application, it is less important if access to the first part of the video takes 1ms or 20ms. What matters is that the delivery in terms of throughput is consistent,” says Scality’s Paul Speciale.
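Some back-of-the-envelope arithmetic shows why first-byte latency matters so little for streaming. The figures below are illustrative assumptions, not benchmark results: for a multi-gigabyte file, the difference between 1ms and 20ms to first byte is noise against the delivery time, so consistency of throughput dominates.

```python
# Illustrative figures: a 4GB video delivered over one connection at a
# sustained 500 MB/s. Neither number comes from a real benchmark.
file_size_gb = 4.0
throughput_mb_s = 500

delivery_s = file_size_gb * 1024 / throughput_mb_s  # ~8.2s of transfer
for first_byte_ms in (1, 20):
    total_s = first_byte_ms / 1000 + delivery_s
    print(f"{first_byte_ms}ms to first byte -> {total_s:.3f}s total")
# The two totals differ by 19ms on a transfer of roughly 8 seconds -
# a drop in sustained throughput would cost far more.
```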
Real-world throughput can be higher for object storage than even for block-access storage.
Boosting performance for object storage involves hardware, software, networking and architectural changes.
A “small” object store – in the low petabytes – running locally, can already support high-performance applications, says GigaOM’s Signoretti.
Elsewhere, a move to solid state storage and NVMe brings performance, but at higher cost. Solid state storage is already widely used for caching, while lower cost, higher density media is pushing it down into storage arrays too.
The network can then become a bottleneck, however. HDD arrays are generally within the capacity of most networking technology, but NVMe can saturate a 100Gigabit Ethernet connection.
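The saturation point is easy to estimate. Assuming a typical sequential read rate of around 3.5 GB/s per NVMe drive (an illustrative figure, not a quoted spec), only a handful of drives fills a 100Gbit Ethernet link:

```python
# Rough arithmetic for the network bottleneck. The per-drive read rate is
# an illustrative assumption for a datacentre NVMe SSD.
link_gbit_s = 100
link_gbyte_s = link_gbit_s / 8        # ~12.5 GB/s ceiling on the wire

nvme_read_gbyte_s = 3.5               # assumed sequential read per drive
drives_to_saturate = link_gbyte_s / nvme_read_gbyte_s
print(f"{drives_to_saturate:.1f} drives saturate the link")  # → 3.6 drives
```

A single storage node often holds far more than four NVMe drives, which is why the network, not the media, becomes the limiting factor.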
Improved caching and optimising API calls will help, says Object Matrix’s Jonathan Morgan, as can adding disks, as this spreads out the I/O workload, and moving some processing into hardware.
Intel now builds Reed-Solomon erasure coding acceleration into some CPUs, and some suppliers are also moving to more lightweight hashes for ingest.
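The hashing point is worth illustrating. An object store fingerprints every object on ingest, for integrity checks and content IDs, so a cheaper hash directly cuts per-object CPU cost. The sketch below uses BLAKE2 from Python's standard library as one widely used lighter-weight alternative to SHA-256; the choice of algorithm here is illustrative, not any supplier's implementation.

```python
import hashlib

# Content fingerprinting at ingest. BLAKE2b is generally faster in
# software than SHA-256 at the same security level; which hash a given
# object store uses is an implementation detail, assumed here.
def fingerprint(data: bytes, algo: str = "blake2b") -> str:
    h = hashlib.new(algo)
    h.update(data)
    return h.hexdigest()

payload = b"object payload bytes"
print(fingerprint(payload)[:16])            # short content ID for the object
print(fingerprint(payload, "sha256")[:16])  # heavier alternative
```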
But IT architects also need to look at overall system performance. Low latency or high transfer rates are of little use if they swamp the target application – so investing in high-performance object storage makes most sense when storage is the current bottleneck.
Performance-focused object storage suppliers
Mainstream enterprise storage suppliers are working to improve performance in their object storage lines, and specialist suppliers include:
OpenIO – This company markets itself as a high-performance, hardware-agnostic, object storage supplier. It operates on-premise, in the cloud or in edge applications. It claims capacities up to exabyte scale and 1Tbps throughput.
Minio – This is an on-premise and private cloud object storage system that is S3 compatible and tuned for Kubernetes. It claims read/write speeds of 183 GBps and 171 GBps, and publishes a range of reference hardware stacks based on kit from suppliers including Dell and Supermicro.
Scality – Scality RING is an entirely software-based object storage system. The supplier claims peak IOPS of 1.6m in one deployment at a US-based cloud service provider and peak throughput of 60GBps at a travel service company.
Quantum – The company’s ActiveScale storage system provides exabyte-scale storage based on its own hardware. Scale-out configurations support up to 75GBps throughput, and up to 74PB on each ActiveScale X100 unit.
SwiftStack – The supplier’s 1space technology is a software-driven storage system. It works at the edge, in the cloud and in the datacentre. It claims write speeds over 20GBps and reads at 50GBps, with the potential to scale to over 100GBps. SwiftStack was recently acquired by NVIDIA.