city with high rise buildings under white sky during daytime

Evaluating Storage Read Cache for Different Database Designs

This blog post is focused on the different datasets I have designed to evaluate Pure Storage® DirectMemory Cache with various databases like SAP HANA, MySQL and MariaDB. More information on the results can be found in the blog posts Reducing costs for SAP HANA with Pure’s DirectMemory Cache and Accelerate MySQL Read Scale with DirectMemory Cache. 

Databases are a core part of how we store and retrieve complex data structures. These Data structures are designed to convey information in a human readable format and eliminate redundancy. Data structures inside relational databases attempt to illustrate the interaction between entities. Entities (tables) are data structures made up of one or more attributes (columns) of which records (rows) can have values in one or more attributes.

Some of the concepts in this blog post are covered in others such as An  Introduction To Storage Caching and A Brief Introduction to Database Structures and Design

There is no single way in which to design a database. As requirements vary between organizations so too will the data structures they use in business systems

Pure Storage DirectMemory Cache

Evaluating if a read cache like Pure Storage DirectMemory Cache will increase the performance of a business system is entirely dependent on how the data is accessed for read operations and the associated access frequency.

DirectMemory Cache makes use of Intel Optane Technology in the form of DirectMemory Modules added to FlashArray//X to accelerate performance for read intensive workloads. DirectMemory Cache makes use of a Cache-Aside caching policy with an emphasis on how recently and frequently a block of data has been accessed for read IO operations.

Read Cache Evaluation 

There are a number of logical layers to consider which will impact how data is accessed by the business system :

  • Number of concurrent users across the entire database instance (an instance could contain many databases)
  • Number of databases inside a single instance
  • Number of schemas inside a single database
  • Number of tables/entities inside a schema
  • The indexes created for a table

Using the different logical layers available in a database there are two possible testing scenarios which will yield different results, these are low density access patterns and high density access patterns.

The core metric to look at with these datasets is responsiveness. If storage responds more quickly to requests (latency) then it is possible that the database will be able to complete more operations in the same time frame as a less responsive solution. For the purpose of the examples given here data will either have a very low amount of or no change applied to it while evaluating read cache benefits.

To disclose how evaluations are performed : I have developed a tool which interacts with and is capable of doing performance testing scenarios with a range of databases. The tool creates database objects dynamically , interacts with them in a range of ways (OLTP/OLAP, etc) and is capable of recording and graphing the results of any evaluation. 

The tool which creates and tests the dataset is entirely based around the amount of data to be operated on. That is, if 100 gigabytes are requested to be operated on – operations will continue until that amount of data has been processed. 

In the below evaluation ¼ of a 20TB database was evaluated with and without a read cache using a range of read only SELECT queries.

Low Density Access Patterns

Evaluating a read cache like DirectMemory Cache with a low density access patterns requires a dateset to be designed that is highly diverse in the following ways :

  • Multiple databases are created
  • Each database has a user (or multiple users) that only work within that database.
  • Each database contains multiple schemas , where each user only accesses data from their schema.
  • Each schema contains many tables.
  • Records in tables are accessed randomly on a per row basis but with a high concurrency rate.

The results of testing with and without DirectMemory for a low density access pattern displayed an improvement of ~5%.

High Density Access Patterns

Evaluating a read cache like DirectMemory Cache with a high density access patterns requires a dataset to be designed that is highly diverse in the following ways :

  • One (1) or very few databases are created.
  • Each database contains as few schemas , or if possible only a single schema , which all users are attempting to perform read operations on. This also assumes that the tables populated with data have a significant number of records.
  • The schema can contain many tables , but the read operations are targeted at a specific subset of them.
  • Records in the table are accessed in a very focused manner. An example of this is only accessing records added or changed within a specific date range.

The results of testing with and without DirectMemory Cache for a high density access pattern displayed an improvement of ~17%.

The interesting comparison to look into is the difference in dateset density when compared with all of the configurations. FlashArray is designed with a native read cache (DRAM based) but it is much smaller than the capacities offered by the DirectMemory modules.

Even without DirectMemory cache I observed that the high density dateset was more responsive to the SELECT commands executed against it.

In conclusion I found that a high density access pattern will always yield better results with a read cache when compared with a low density access pattern. This is because when data is accessed more frequently , it is more likely to be a good candidate for a caching solution like DirectMemory cache.