LOBS is the DB2 way of dealing with big data. In db2 you can have a table that contains a column defined as LOB where one can dump large binary data (images, compressed text files, video or audio).
The problem is the way this LOBS are stored by DB2. By default they are stored “offline” in a separate LOB table space and the actual table has only a link to the LOB.
From point of view of transaction speed this is a problem if your system has to react fast, because the LOBs are read from spinning disks not memory. If your system only needs to access this LOBS seldom and with no time constraints the default model is the best.
In case you need speed the default model which does not use the memory caching and all the other indexing and speed related features of DB2 has to be changed. Here comes the INLINE LOB to the rescue. An inline LOB is a LOB stored in the table inline so that all the normal rows caching and speed improvements of DB2 apply.
In my case I have a very big table (Document) containing LOB data in one of the columns (bytecontent). The following steps have to be followed to optimize the speed of SELECT on this table.
The main source of inspiration is the following page from IBM knowledge base:
ADMIN_EST_INLINE_LENGTH function – Estimate length required to inline data
There is a nice description of DB2 LOBs and what is the difference between inline and classic LOBs in the IBM knowledge base: Storing LOBs inline in table rows
STEP 1: Get from the database the page size of your tablespaces.
db2 "list tablespaces show detail " | grep "Page size"
The result I got is:
Page size (bytes) = 32768
According to IBM the maximum size I can put as an inline LOB is 32673.
See bellow the IBM table for all the page sizes:
[supsystic-tables id=’2′]
STEP 2: Get from the database the statistics about the table with LOBs
In this step we will estimate three things. The maximum size of my LOBs, the number of records that can be inlined and the number of records not affected.
The maximum size of my LOB column entries and if they are inlined or not can be obtained with the following query.
Note the usage of DB2 functions ADMIN_IS_INLINED and ADMIN_EST_INLINE_LENGTH.
SELECT id, ADMIN_IS_INLINED(bytecontent) as IS_INLINED,
ADMIN_EST_INLINE_LENGTH(bytecontent) as EST_INLINE_LENGTH
from Document order by EST_INLINE_LENGTH desc
As a result my top rows are:
[supsystic-tables id=’1′]
As you can see all my data that can be inlined is bellow the 32673 byte limit my 32k page size imposes to the INLINE LOB. Taking into account the overhead and the size of the other fields I estimate I will get a lot of the LOBs inlined. From the result of the above query we can see that the biggest inlinable LOB has 9670.
You can use the following queries to estimate how much of your data can be converted:
Count the entries that cannot be inlined with a given threshold:
SELECT count(id) from Document where ADMIN_EST_INLINE_LENGTH(bytecontent)>'32000'
Count the entries that can be inlined with a given threshold :
SELECT count(id) from Document where ADMIN_EST_INLINE_LENGTH(bytecontent)<'32000'
Note that my results were 0 and 408805 so I will expect a big increase in performance.
STEP 3: Alter the table to inline the LOBs
Execute the following and make sure the specified parameter will not make your data to exceed the row size limit. It seems that there is some extra overhead than the 4 bytes specified by IBM so I went from 9670 to a safer 32000.
ALTER TABLE Document ALTER COLUMN bytecontent SET INLINE LENGTH 32000
If the size you specified is to big you will get the error:
[Error Code: -670, SQL State: 54010] The row length of the table exceeded a limit of "32677" bytes. (Table space "USERSPACE1".). SQLCODE=-670, SQLSTATE=54010, DRIVER=4.18.60
If the size you specified is too small you will get the error:
[Error Code: -20016, SQL State: 429B2] The value of the inline length associated with type or column "BYTECONTENT" is too small.. SQLCODE=-20016, SQLSTATE=429B2, DRIVER=4.18.60
STEP 4: Test the speed of a select before we apply the inlining so we have a base comparison
SELECT * FROM Document FETCH FIRST 1000 ROWS ONLY
The query execution result is returned in :
1 statement(s) executed, 1000 row(s) affected, exec/fetch time: 0.008/0.308 sec [1 successful, 0 warnings, 0 errors]
STEP 5: Reorg the table so the inlining will be performed.
First run again the query:
SELECT id, ADMIN_IS_INLINED(bytecontent) as IS_INLINED,
ADMIN_EST_INLINE_LENGTH(bytecontent) as EST_INLINE_LENGTH
from Document order by EST_INLINE_LENGTH desc
you will notice that there are no inlined entries.
Apply a reorg of the table Document specifying the LONGLOBDATA parameter. This will have to effects: inline the inlinable LOBs and free space from the LOB repository.
db2 "reorg table Document LONGLOBDATA"
Note that this may take a long time.
The last step is to run collect again statistics on the altered table.
db2 "runstats on table Document with distribution and detailed indexes all"
STEP 6: Check the result and do a performance test.
Check how many rows have now the LOBs inlined:
SELECT count(id) FROM Document WHERE ADMIN_IS_INLINED(bytecontent)='1'
I got the number of converted entries as: 318262 , so approximately 77.85% of the rows
Check how many rows were not affected:
SELECT count(id) FROM Document WHERE ADMIN_IS_INLINED(bytecontent)='0'
I got the number of unaffected entries as: 90543 , so approximately 22.15% of the rows
Test again the select:
SELECT * FROM Document FETCH FIRST 1000 ROWS ONLY
The query execution result is returned in :
... 1 statement(s) executed, 1000 row(s) affected, exec/fetch time: 0.009/0.162 sec [1 successful, 0 warnings, 0 errors]
Consistent with the number of entries that were converted and also taking into account that my select sample of 1000 is small compared with the total number of rows a 2 times increase in performance is a good estimate.
Note that even if my biggest inline-able LOB was 9.6k there are still 22% of the rows for which the LOB could not be inlined. This is because the total size of the row is limited to 32k so you have to account also for the values from the other columns of the table.
There is also an IBM knowledge base article about the increase in performance Inline LOBs improve performance.
The main benefits of inline LOBs according to IBM:
- no additional I/O is required to fetch, insert, or update this data.
- inline LOB data is eligible for row compression
There are also some drawbacks according to IBM:
- when a table has columns with inline LOBs, fewer rows fit on a page
- performance of queries that return only non-LOB data can be adversely affected
- LOB inlining is helpful for workloads in which most of the statements include one or more LOB columns
- inline LOBs are always logged and can therefore increase logging overhead
From my own performance tests on the Document table I noticed:
- when inserting a large number of documents (297,000) from which 70% are small enough that can have their LOB inlined, the insert operation is more than 3 times faster.
- the stages of the application that repackage and deliver to the output Documents is again much faster. It means that caching and compression is applied now on most of the rows.
- I did not notice a big overhead on the database logging.