Using hdbpersdiag To Check For Volume Corruption

The release of SAP HANA 2.0 SPS05 brings some great new capabilities to the platform, among them Native Storage Extension support for scale-out landscapes, significant improvements to the startup initialization of hybrid LOBs, and support for hosting an NFS server for the hana-shared directory on the master host of a scale-out system.

One particular new capability that this release brings is the ability to do page consistency checks independently from SAP HANA data backups, using the SAP HANA Persistence Diagnosis Tool (hdbpersdiag).

Two broad kinds of corruption can occur in an SAP HANA database:

  • Page corruption at the disk level: issues in the storage subsystem, filesystem or operating system. Typically identified as checksum errors or I/O errors.
  • Logical corruption within SAP HANA: issues with objects in SAP HANA itself, which show up as inconsistencies such as main storage/delta storage mismatches, duplicate keys, records located in the wrong partition and NULL values in NOT NULL columns, among other possibilities.

There are a number of ways in which corruptions can be resolved, but each depends on the type of corruption that has occurred. Briefly, these are the recommended consistency checks and frequencies set out by SAP in the FAQ: SAP HANA Consistency Checks and Corruptions:

Tool | Minimum Frequency | Additional Information
CHECK_TABLE_CONSISTENCY | Monthly | This is the main check tool; it should be run regularly, either manually or on a schedule.
uniqueChecker | Monthly | On SAP HANA revisions lower than SAP HANA 1.0 122.02, runs some additional checks compared with CHECK_TABLE_CONSISTENCY.
CHECK_CATALOG | On demand | The catalog check should be run in scenarios where metadata issues are suspected.

In addition to the corruptions which can occur at the logical layers, there are also corruptions which can occur at the lower layers underneath the database, for example at the data page level. Data pages are the units written to and read from disk to persist logical data structures and objects. A data page can fail its checks during a backup or a column load operation, which essentially means the page read from disk does not match what the database expected. Some examples of the errors which can be thrown are listed in the section “What are typical errors and solutions for corruptions on lower layers?” of the Consistency Checks and Corruptions FAQ.

Now, why is the ability to do page consistency checks independently from SAP HANA backups so important?

  • Data page level consistency checks are only run during a data streaming style backup operation, i.e. via an SAP HANA backint certified backup provider or a direct backup to disk.
  • When creating application-consistent or crash-consistent storage snapshots, no consistency checks are run on data pages.
  • It is not possible to know, without a full recovery, if a recovery from a storage snapshot will yield uncorrupted data.

Organizations that want to verify that the lower levels of the database persistence are in a good state, or that a storage snapshot is a good candidate before recovering from it, can now do so using the SAP HANA Persistence Diagnosis Tool.

Using hdbpersdiag with Storage Snapshots on Pure Storage® FlashArray™

Important things to note before using the tool:

  • This is an expert tool where only the “check all” function is available for public use.
  • The tool is delivered as part of the SAP HANA 2.0 SPS05 installation in the /usr/sap/<sid>/HDB<instance number>/exe directory.
  • The tool is not standalone; it depends on binaries and environment variables in an SAP HANA deployment.
  • It should be run by the <sid>adm user from a command line terminal or ssh connection.

The SAP Note “How to Check the Consistency of the Persistence” describes the process of using hdbpersdiag. My example uses crash-consistent storage snapshots of both the log and data volumes on FlashArray. I will be using a system with multiple tenants, which makes for a more illustrative example because each tenant has its own database volume to check.

1. Take a storage snapshot of the log and data volume for the SAP HANA system.

2. Copy the snapshots to new volumes and connect those volumes to the host with SAP HANA 2.0 SPS05 installed.

3. Mount the volumes to a location in the filesystem. In my example I have used /hana/testData as the location for the SAP HANA data persistence. Because the system I was mounting the new volumes on was the same one the volume snapshots were taken from, I needed to use “nouuid” as a mount option to avoid conflicting with the existing volumes.

Filesystem                                           Size  Used Avail Use% Mounted on
devtmpfs                                             2.3T     0  2.3T   0% /dev
tmpfs                                                3.4T   32K  3.4T   1% /dev/shm
tmpfs                                                2.3T   21M  2.3T   1% /run
tmpfs                                                2.3T     0  2.3T   0% /sys/fs/cgroup
/dev/mapper/3624a9370c49a4cb0e2944f440002d735-part2   60G   18G   43G  30% /
/dev/mapper/3624a9370c49a4cb0e2944f440002dc76        512G   44G  469G   9% /hana/shared
fileserver.puredoes.local:/mnt/nfs/HANA_Backup       1.0T  136G  889G  14% /hana/backup
tmpfs                                                454G   20K  454G   1% /run/user/469
tmpfs                                                454G     0  454G   0% /run/user/468
tmpfs                                                454G     0  454G   0% /run/user/0
/dev/mapper/3624a9370884890ea83bd488200012c64        7.0T  1.1T  6.0T  15% /hana/data
/dev/mapper/3624a9370884890ea83bd488200012c65        3.0T  673G  2.4T  22% /hana/log
tmpfs                                                454G     0  454G   0% /run/user/1001
/dev/mapper/3624a9370884890ea83bd488200012c6d        3.0T  673G  2.4T  22% /hana/testLog
/dev/mapper/3624a9370884890ea83bd488200012c6e        7.0T  1.1T  6.0T  15% /hana/testData
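
The mount in step 3 could be sketched roughly as follows. This is a minimal sketch, assuming the volumes carry an XFS filesystem (“nouuid” is an XFS mount option) and using the device names from the listing above. The function name mount_snapshot_copy and the DRY_RUN variable are hypothetical helpers for illustration, not part of any SAP or Pure Storage tooling. By default the function only prints the command it would run.

```shell
#!/bin/sh
# Sketch: mount a volume copied from a snapshot on the same host that still
# has the original volume mounted. Assumes XFS, where "nouuid" tells mount
# to ignore the duplicate filesystem UUID carried by the copy.
# mount_snapshot_copy and DRY_RUN are illustrative names only.

mount_snapshot_copy() {
    msc_device="$1"
    msc_mountpoint="$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: print the command instead of running it.
        echo "mount -t xfs -o nouuid ${msc_device} ${msc_mountpoint}"
    else
        # Needs root privileges.
        mkdir -p "${msc_mountpoint}"
        mount -t xfs -o nouuid "${msc_device}" "${msc_mountpoint}"
    fi
}
```

Calling mount_snapshot_copy /dev/mapper/3624a9370884890ea83bd488200012c6e /hana/testData prints the mount command for the data volume copy; setting DRY_RUN=0 and running as root would perform the mount. The log volume copy can be handled the same way with /hana/testLog.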

4. Identify the HANA database volumes within the mounted data volume. With multiple tenants there will be multiple database volumes: each folder whose name starts with “hdb” followed by a volume number (for example hdb00001) is a HANA database volume. The tool is used to check the integrity of each of these SAP HANA data volumes.

sh1adm@Hannah:/hana/testData/SH1/mnt00001> ll
total 4
drwxr-x--- 2 sh1adm sapsys 117 Jul 13 06:36 hdb00001
drwxr-xr-- 2 sh1adm sapsys  93 Jul 14 06:43 hdb00002.00003
drwxr-xr-- 2 sh1adm sapsys  93 Jul 14 07:00 hdb00002.00004
-rw-r--r-- 1 sh1adm sapsys  17 Jul 14 07:06 nameserver.lck
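
To pick out the database volumes in a listing like the one above without reading it by eye, something along these lines could be used. It is a sketch that assumes the directory layout shown here (folders named hdb* directly under the mnt00001 directory); list_hana_volumes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print every hdb* directory under a HANA mount directory.
# list_hana_volumes is an illustrative name, not an SAP-provided command.

list_hana_volumes() {
    lhv_mnt_dir="$1"
    for lhv_entry in "${lhv_mnt_dir}"/hdb*; do
        # Keep directories only; this skips plain files such as
        # nameserver.lck and the literal pattern when nothing matches.
        if [ -d "${lhv_entry}" ]; then
            echo "${lhv_entry}"
        fi
    done
}
```

For the example system, list_hana_volumes /hana/testData/SH1/mnt00001 would print the three paths for hdb00001, hdb00002.00003 and hdb00002.00004, one per line.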

5. Run the hdbpersdiag tool to check for page data corruption on each HANA database volume.

Test the SystemDB volume:

/usr/sap/SH1/HDB00/exe/hdbpersdiag  -c 'check all'  /hana/testData/SH1/mnt00001/hdb00001

Output:

Loaded library 'libhdbunifiedtable'
Loaded library 'libhdblivecache'
Trace is written to: /usr/sap/SH1/HDB00/hannah/trace
Mounted DataVolume(s)
  #0 /hana/testData/SH1/mnt00001/hdb00001/ (2.7 GB, 2904342528 bytes)
Tips:
  Type 'help' for help on the available commands
  Use 'TAB' for command auto-completion
  Use '|' to redirect the output to a specific command. Available command(s) are:
    count        Count the number of lines
    dump         Save the output to a file
    grep         Print lines that contain a match for a pattern
    head         Print the first n lines
    more         Print text, one screen at a time
    tail         Print the last n lines
                     Default Anchor Page OK
                            Restart Page OK
                 Default Converter Pages OK
                RowStore Converter Pages OK
             Logical Pages (64750 pages) OK
                   Logical Pages Linkage OK
                      ContainerDirectory OK
                  ContainerNameDirectory OK
                  FileIDMappingContainer OK
                       UndoFileDirectory OK
                            LobDirectory OK
                     MidSizeLobDirectory OK
                            LobFileIDMap OK

Test the Tenant 1 volume:

/usr/sap/SH1/HDB00/exe/hdbpersdiag  -c 'check all'  /hana/testData/SH1/mnt00001/hdb00002.00003

Test the Tenant 2 volume:

/usr/sap/SH1/HDB00/exe/hdbpersdiag  -c 'check all'  /hana/testData/SH1/mnt00001/hdb00002.00004
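
With more than a couple of tenants, the invocations above could be wrapped in a small loop that keeps the full output of each run. The following is only a sketch: it uses the same binary path and the same -c 'check all' invocation as the commands above, while check_volumes, the HDBPERSDIAG override variable and the log directory are hypothetical names. It counts lines ending in “OK” purely as a convenience. The format of a failed check and the meaning of the tool's exit status are not shown in this post, so the saved logs should still be reviewed by hand.

```shell
#!/bin/sh
# Sketch: run hdbpersdiag 'check all' against several database volumes and
# keep one log file per volume. Run as the <sid>adm user.
# check_volumes and HDBPERSDIAG are illustrative names only.

# Binary path as used in the commands above; override to point elsewhere.
HDBPERSDIAG="${HDBPERSDIAG:-/usr/sap/SH1/HDB00/exe/hdbpersdiag}"

# Usage: check_volumes LOG_DIR VOLUME [VOLUME...]
check_volumes() {
    cv_log_dir="$1"
    shift
    mkdir -p "${cv_log_dir}"
    for cv_vol in "$@"; do
        cv_log="${cv_log_dir}/$(basename "${cv_vol}").log"
        # Capture everything the tool prints for later review.
        "${HDBPERSDIAG}" -c 'check all' "${cv_vol}" > "${cv_log}" 2>&1 ||
            echo "${cv_vol}: hdbpersdiag exited with a non-zero status" >&2
        # Count result lines that end in OK (assumption: healthy checks
        # look like the sample output shown above).
        cv_ok=$(grep -c ' OK[[:space:]]*$' "${cv_log}" || true)
        echo "${cv_vol}: ${cv_ok} checks reported OK (log: ${cv_log})"
    done
}
```

For the example system this could be called as check_volumes /tmp/persdiag_logs followed by the three hdb* paths under /hana/testData/SH1/mnt00001, producing one summary line and one log file per volume.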

After verifying each SAP HANA volume, if every check was listed as OK, I considered the storage snapshot usable as a recovery point, either at that time or in the future.

hdbpersdiag is a great tool to help ensure that an SAP HANA system’s persistence is healthy, and it also makes storage snapshots (both application-consistent data snapshots and crash-consistent snapshots) more appropriate to use as recovery points, because it makes up for the lack of consistency checks during the snapshot creation process.