Decoding Storage Solutions: Object vs. Block Storage for Digital Pathology

Introduction

Digital pathology builds on a broad spectrum of technologies. An often underrated but crucial example is large-scale data storage. As pathology labs transition from glass slides to digital imaging, the volume of data they need to manage grows by multiple orders of magnitude.

A single whole slide image (WSI) typically ranges from 1-4 GB when using brightfield, while immunofluorescence (IF) images can easily reach 10-100 GB depending on magnification, specimen size, and number of markers.

Today, I want to unravel two storage approaches – object storage and block storage – and explain why understanding the difference between them is critical for anyone involved in digital pathology infrastructure.

Block Storage: The Notebook

Picture block storage as a huge notebook with different sections. You can easily skim the complete content by flipping pages, and if you were a fast enough reader you could even do a “full-text search” in a straightforward way. Different datasets are marked by separators. There may be multiple notebooks on your shelf, but the way you use them largely stays the same.

This approach is characterized by:

  • Fast random access: Need a specific piece of data? You can jump directly to it
  • Low latency: Perfect for applications that require frequent reads and writes
  • Fine-grained modifications: You can edit small portions without rewriting entire files (think: edit one page in a sub-section)

Object Storage: The Library

Now, imagine object storage as a library. In order to read a dataset (think: book), you first need to know which one you’d like, and then you go through some sort of checkout process. Once checked out, you have the complete dataset in your hands and can interact with it however you like.

This approach is characterized by:

  • Massive scalability: Add billions of objects without architectural changes
  • Built-in redundancy: Data is typically replicated across multiple locations automatically (think: a library may keep multiple copies of the same book without a “user” noticing)
  • Immutability focus: Objects are typically written once and read many times

Library vs. Notebook

Intuitively, everyone will agree that using a library to manage your personal notes is not a great idea, but let’s explore why for a moment. If you need to check out the notes of a given day before you can read them, finding something that happened roughly a month or two ago becomes very painful. Instead, you want to skim through the pages and orient yourself as you go, because you don’t know the exact date on which you wrote down the information you’re looking for.

What’s happening here is random access: jumping to some point in the past helps you figure out whether you need to go back further or not. Many image formats in digital pathology behave the same way. When the computer opens the file, it does not immediately know where the data for a tile at a given coordinate is located. This will be important later.

Now let’s look at the opposite: taking the content of all books in a library and cramming it into one huge notebook – one massive body of text. Again, everyone would intuitively argue that this is a bad idea – but why exactly?

If you are looking for information on pathology, you will now have to search the entire body of text for relevant snippets. Ideally there will be some bookmarks to indicate sections. However – and this is the crucial point – you will need the complete body of text available to you, because the logical separations (books) are no longer represented in the way the information is stored.

This is why object storage scales significantly better. Furthermore, not every dataset needs to be constantly “on display” – older books can be archived efficiently while remaining accessible when needed.

The Cost Perspective: Why It Matters for Pathology

Now that we’ve clarified what the two storage approaches are, let’s look at the biggest pain point of them all: cost. The cost difference between these storage solutions is dramatic, especially at scale.

Consider a mid-sized pathology laboratory that:

  • Digitizes 500 slides per day
  • Averages 1 GB per slide
  • Total: 500 GB of new data daily, or ~180 TB annually

We assume that the active data – that is, what is actually accessed – consists only of data from the last 30 days: 14.6 TB, or roughly 8% of the annual total.

Here’s how the monthly storage bill compares once a full year of data (~180 TB) has accumulated:

Data tier                                        Block Storage    Object Storage
14.6 TB active / hot data (up to 1 month old)    $1,196           $339
29.2 TB infrequent (1-3 months old)              $2,392           $374
136.2 TB archive (older than 3 months)           $11,157          $558
--------------------------------------------------------------------------------
Total                                            $14,745          $1,271

This is based on the following current market pricing (2025):

  • Block storage (SSD): $0.08-0.20 per GB/month (AWS gp3 at $0.08, Azure Standard SSD, GCP Balanced)
  • Object storage (hot tier): $0.023 per GB/month (AWS S3 Intelligent Tiering Frequent Access)
  • Object storage (infrequent access): $0.0125 per GB/month (AWS S3 Intelligent Tiering Infrequent Access)
  • Object storage (archival): $0.004 per GB/month (AWS S3 Intelligent Tiering Archive Instant Access)

Note: Object storage implementations can be even cheaper when using non-instant access archival tiers.
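
To make the arithmetic transparent, here is a minimal Python sketch (using the per-GB rates listed above, with 1 TB counted as 1,024 GB) that reproduces the monthly figures in the table to within a few dollars of rounding:

```python
# Back-of-the-envelope monthly storage cost after one year of accumulation.
# Volumes in TB; rates are the per-GB/month prices listed above.
TIERS = [
    # (tier name, volume_tb, block_rate, object_rate)
    ("hot (last 30 days)",      14.6, 0.08, 0.023),
    ("infrequent (1-3 months)", 29.2, 0.08, 0.0125),
    ("archive (>3 months)",    136.2, 0.08, 0.004),
]

block_total = object_total = 0.0
for name, tb, block_rate, object_rate in TIERS:
    gb = tb * 1024
    block_total += gb * block_rate
    object_total += gb * object_rate
    print(f"{name:28s} block ${gb * block_rate:>9,.0f}   object ${gb * object_rate:>6,.0f}")

print(f"{'Total':28s} block ${block_total:>9,.0f}   object ${object_total:>6,.0f}")
```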

These roughly 10x cost reductions are not theoretical. We have actually seen exactly these types of cost drops when migrating from block storage to object storage.

Beyond raw storage costs, there are further total-cost-of-ownership aspects where object storage excels:

  1. No storage provisioning: Pay only for what you use, no need to over-provision
  2. Built-in redundancy: Robust data replication and disaster recovery
  3. Automatic tiering: Many providers offer intelligent tiering (like AWS S3 Intelligent-Tiering) that automatically moves data between access tiers based on usage patterns
  4. Reduced infrastructure complexity: No need to manage RAID arrays, disk failures, or capacity planning
  5. Geographic replication: Disaster recovery built in
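
To make the automatic tiering from point 3 concrete: on AWS S3, an age-based lifecycle rule can move whole slide images to cheaper storage classes as they leave the active review window. The following is a minimal sketch (using boto3, with a hypothetical bucket name and key prefix), not a production configuration:

```python
# Minimal sketch: age-based tiering of slide images on AWS S3.
# Bucket name and key prefix are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-pathology-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-wsi-by-age",
                "Status": "Enabled",
                "Filter": {"Prefix": "slides/"},
                "Transitions": [
                    # After the clinical review period, move to infrequent access...
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # ...and after 90 days to an instant-retrieval archive tier.
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                ],
            }
        ]
    },
)
```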

The Access Pattern Reality in Digital Pathology

The reason object storage fits digital pathology so well lies in its access patterns.

Implementation studies at Memorial Sloan Kettering Cancer Center showed a 93% reduction in archival glass slide requests after digital pathology adoption. Off-site centers experienced even more dramatic 97% decreases in archive access.

The Digital Pathology Association documents three distinct access tiers:

  1. Clinical review period (0-2 months): Requires rapid, real-time access for active diagnosis
  2. Prior cases period (2-3 years): Tolerates 4-5 minute retrieval times for reference
  3. Regulatory archives (7-10+ years): Accessed rarely, if ever, but must be retained for compliance

Industry implementations reflect this pattern with aggressive tiering strategies:

  • Hot storage: 3 days to a few weeks for active cases
  • Warm storage: Up to 90 days for recent reference
  • Cold/archival storage: Post-90 days, where cloud egress fees become “typically minimal” due to rare access

This access pattern – write once, read intensively for weeks, then rarely thereafter – is perfectly matched to object storage’s strengths.

Performance

Different use cases have different performance requirements, so block storage isn’t obsolete in digital pathology – it still has specific roles:

  • Active scanning operations: Scanners writing data benefit from block storage’s low latency
  • High-performance computing: When running intensive image analysis pipelines

This list used to be longer, though. Luckily, a few solutions have since rendered the rest of that list obsolete.

Client-side rendering

Today, we can read pathology images directly from object storage and render them client-side, in the browser.

How does object storage enable streaming access to multi-gigabyte whole slide images without downloading entire files? The answer lies in HTTP range requests, defined in RFC 7233.

Range requests allow clients to request specific byte ranges from files using standard HTTP headers:

Range: bytes=1000-2000

The server responds with just that portion of the file (HTTP 206 Partial Content), enabling:

  • Selective tile retrieval: Fetch only the visible tiles at current magnification
  • Parallel downloads: Multiple concurrent range requests for different tiles
  • Bandwidth optimization: Reduce data transfer by 10-100x compared to downloading entire files
  • Real-time viewing: Enable immediate pan/zoom without waiting for full downloads
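
As a minimal illustration of the idea – assuming a hypothetical pre-signed or public URL for the image object – a viewer could fetch two tile byte ranges in parallel like this:

```python
# Minimal sketch: parallel HTTP range requests against object storage.
# The URL and byte offsets are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "https://example-bucket.s3.amazonaws.com/slides/case-001.ome.tiff"

def fetch_range(start: int, end: int) -> bytes:
    # The server answers with "206 Partial Content" and only the requested bytes.
    resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"}, timeout=30)
    resp.raise_for_status()
    return resp.content

# Fetch two tile locations concurrently instead of downloading the whole file.
tile_offsets = [(1_000_000, 1_524_287), (8_000_000, 8_524_287)]
with ThreadPoolExecutor(max_workers=4) as pool:
    tiles = list(pool.map(lambda r: fetch_range(*r), tile_offsets))

print([len(t) for t in tiles])  # -> [524288, 524288]
```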

All major cloud providers support this:

  • AWS S3: Recommends 8-16 MB byte ranges with multiple concurrent connections for optimal performance
  • Azure Blob Storage: Supports both standard Range headers and custom x-ms-range headers
  • Google Cloud Storage: Provides batch frame retrieval optimizing performance over serial requests
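
With the AWS SDK, for example, the same idea looks like this (a sketch with a hypothetical bucket and key; boto3’s get_object accepts a Range parameter):

```python
# Minimal sketch: byte-range read via the AWS SDK (hypothetical bucket and key).
import boto3

s3 = boto3.client("s3")
resp = s3.get_object(
    Bucket="example-pathology-archive",
    Key="slides/case-001.ome.tiff",
    Range="bytes=0-8388607",   # first 8 MB, in line with the 8-16 MB guidance above
)
chunk = resp["Body"].read()
print(len(chunk))  # -> 8388608
```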

There’s a major catch, though. The viewer needs to know where in the file the requested data is.

OME-TIFF: Today’s Standard

The most common format in digital pathology today is OME-TIFF. It’s ubiquitous, well-supported, and works with object storage using HTTP range requests. However, there’s a performance caveat.

TIFF files require “seeking” – the viewer needs to jump around in the file to find the chunks it needs. Think of it like a book where the table of contents tells you “Chapter 3 starts on page 47, Chapter 4 on page 89.” Every time you want a specific tile at a specific magnification level, you need to seek to that location in the file.

When reading from object storage, this seeking translates to multiple HTTP requests to figure out where things are before you can actually fetch the data. For datasets with many channels, timepoints, or z-stacks, this can introduce noticeable latency.

Indexed OME-TIFF solves this problem. Similar to approaches used in genomics, an index file acts as a lookup table that tells the viewer exactly where every chunk lives. This eliminates the seeking overhead and makes random access much faster.

The good news? Even without indexing, OME-TIFF works from object storage. For many use cases, especially 2D brightfield images with standard pyramids, the performance is perfectly adequate. But if you’re working with highly multiplexed data or need optimal performance, indexing helps significantly.
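
For illustration, here is a minimal sketch of reading an OME-TIFF straight from object storage – assuming a hypothetical S3 path and the tifffile, fsspec, and s3fs packages. Every seek tifffile performs becomes an HTTP range request behind the scenes:

```python
# Minimal sketch: stream an OME-TIFF directly from object storage.
# The s3:// path is a hypothetical placeholder; requires tifffile, fsspec and s3fs.
import fsspec
import tifffile

URL = "s3://example-bucket/slides/case-001.ome.tiff"

with fsspec.open(URL, "rb") as f:          # seeks/reads turn into HTTP range requests
    with tifffile.TiffFile(f) as tif:
        series = tif.series[0]             # highest-resolution image series
        print(series.shape, series.dtype)  # reading metadata only touches the IFDs
        # Materializing the smallest pyramid level fetches just the bytes it needs,
        # not the entire multi-gigabyte file.
        thumbnail = series.levels[-1].asarray()
        print(thumbnail.shape)
```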

OME-NGFF: Built for the Cloud

OME-NGFF (Next Generation File Format), also known as OME-Zarr, was designed from the ground up to eliminate the seeking problem entirely.

Published in Nature Methods in 2021, OME-NGFF achieved 10x faster access on cloud storage than traditional formats such as HDF5, thanks to a chunked architecture that eliminates the seeking overhead of monolithic files.

Key characteristics:

  • Cloud-native design: Built specifically for S3-compatible object storage from the ground up
  • Chunked architecture: Direct chunk access enables parallel reads and efficient streaming
  • Multi-resolution pyramids: Native support for pyramidal structures that digital pathology viewers require
  • Universal format: Converts from 150+ proprietary formats via bioformats2raw
  • Open standard: Maintained by the Open Microscopy Environment consortium with broad academic and industry support
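
Because every chunk is its own addressable object, no index file is needed. A quick sketch – assuming a hypothetical OME-Zarr store on S3-compatible storage, and the zarr (v2 API) and fsspec packages – of reading a region:

```python
# Minimal sketch: open an OME-Zarr (OME-NGFF) image directly from object storage.
# The URL is a hypothetical placeholder; requires zarr (v2 API), fsspec and aiohttp.
import fsspec
import zarr

URL = "https://storage.example.org/pathology/case-001.ome.zarr"

group = zarr.open_group(fsspec.get_mapper(URL), mode="r")
print(dict(group.attrs))            # multiscale/pyramid metadata is plain JSON

level0 = group["0"]                 # full-resolution level, a chunked array
# Slicing fetches only the chunks covering this region (axes typically t, c, z, y, x).
region = level0[..., 0:512, 0:512]
print(region.shape)
```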

OME-NGFF isn’t just theoretical – it’s being adopted in production:

  • European Bioinformatics Institute: Hosts public OME-Zarr datasets on S3-compatible storage
  • DANDI Archive: Stores 370+ TB of BRAIN Initiative data in OME-Zarr format on AWS S3 through the AWS Open Data Program
  • Glencoe Software: Maintains public archives on AWS S3 with documented optimization strategies for whole slide imaging
  • PathFlowAI: Demonstrated preprocessing 295 whole slide images (1.2 TB compressed to 641 GB) using Zarr’s parallel processing capabilities

A comprehensive 2023 review in Histochemistry and Cell Biology with 50+ international authors documents adoption across multiple imaging modalities including whole slide imaging, high content screening, electron microscopy, and light sheet imaging.

Visualization Tools: The ecosystem includes mature tools for working with OME-Zarr:

  • Viv (published in Nature Methods 2022): Web-based visualization streaming directly from object storage with no server infrastructure required
  • napari: Desktop viewer with native OME-Zarr support
  • webKnossos: Browser-based collaborative annotation
  • Neuroglancer: High-performance viewer for large datasets

The specification continues to evolve, currently at version 0.5.2, with active community development and growing industry support.

Native Support for Object Storage: A Game-Changer

For digital pathology software, native support for object storage is becoming crucial – that is, the ability to read whole slide images directly from object storage without intermediate copies or storage adapters.

Without native support, organizations often resort to costly workarounds:

  • Copying files to block storage for viewing (doubling storage costs and complexity)
  • Building complex caching layers (adding infrastructure overhead)
  • Sacrificing scalability by keeping everything on expensive block storage
  • Creating data silos between archive and active storage
  • Missing out on cost savings from automated tiering

The cytario® Approach

At cytario, we’ve built native object storage support into our platform from day one. This means:

  • Direct streaming: Whole slide images are streamed directly from S3-compatible object storage
  • Seamless archiving: No distinction between “active” and “archived” cases from the user perspective
  • Cost optimization: Automatic tiering of rarely-accessed data to cheaper storage classes

Conclusion

The evidence is clear: object storage is not just viable for digital pathology – it’s becoming essential. The combination of massive cost savings (10x TCO reduction documented), unlimited scalability (to billions of objects), built-in redundancy, and emerging cloud-native formats like OME-NGFF (10x performance improvement) makes object storage the foundation for sustainable digital pathology infrastructure.

The choice between object and block storage isn’t binary – most digital pathology infrastructures will use both strategically:

  • Block storage: For active scanning, hot caching layers, and high-performance computing workloads
  • Object storage: For the vast majority of cases requiring long-term, cost-effective retention with on-demand access

As digital pathology continues to grow, with AI analysis requiring access to ever-larger training datasets and regulatory requirements demanding long-term retention (7-10+ years), object storage will become increasingly critical. Software solutions that offer native object storage support and embrace modern formats like OME-NGFF will be essential for organizations looking to manage costs while maintaining performance and accessibility.

The future is already here: Memorial Sloan Kettering, Samsung Medical Center, and numerous other institutions are demonstrating that petabyte-scale digital pathology workflows on object storage aren’t just possible – they’re more cost-effective and scalable than traditional approaches.


Have questions about implementing object storage for your digital pathology infrastructure? Get in touch with us – we’d love to discuss your specific needs and how cytario can help you reduce costs while scaling your operations.

Want to get a head start?

We are officially launching in December, but let us know if you'd like a sneak peek before!