Redo log sync time vs redo size

It’s been tough to find time to do actual performance research of late, but I have managed to get a test system prepared that will allow me to determine whether solid state disks offer a performance advantage over spinning disks when the redo entries are very large. This is to test the theory that the results I’ve published in the past (here and here for instance) actually apply only when the redo entries are relatively small. The theory is that for small sequential writes to SSD, each successive write will invoke an erase of a complete NAND page, whereas in a larger sequential write this will not occur, since each write will hit different pages.

I’m still setting up the test environment to look at this, but first I thought it would be worth showing this pretty picture:

[Chart: redo log sync time vs redo written since the last COMMIT]

This chart shows how redo log sync time (that is, the time taken to COMMIT) varies with the amount of redo information written since the last COMMIT. There is a slight overall upward trend, but the really noticeable feature is the “sawtooth” effect, which I’ve highlighted below:

[Chart: the same data with the sawtooth pattern highlighted]

Can you guess what causes this?  

I think it’s pretty clear that we are seeing the effect of redo buffer flushing. Remember, when you write redo entries, they are written to the redo buffer (or sometimes a strand). Oracle flushes the buffer when you commit, but also flushes it when it is one-third full, after 3 seconds (I think, from memory), or after 1MB of redo entries has accumulated. Given that, we can see what happens when we commit:

  • If there has been no redo log flush since we started writing, we have to wait while LGWR writes all the entries to disk
  • If a redo log flush occurs after we have written our entries but before we COMMIT, then we have to write virtually nothing (just a COMMIT marker, I suppose)
  • Between the two scenarios, we may have to write some of our redo log entries. However, we should never have to write more than about 1MB

In the chart above we can clearly see the redo log flushes occurring at 1MB intervals. If we write less than 1MB we generally have to write it all; above 1MB we only have to write a portion of the redo. Note that on this system I was pretty much the only session doing significant activity, so the pattern is very clear. On a busy system the effect would be randomized by other sessions causing flushes to occur.
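
To make the sawtooth concrete, here is a toy model in Python of how much redo is still unwritten at COMMIT time. It is only a sketch under simplifying assumptions: a single session, an empty buffer to start with, and a flush every time exactly 1MB accumulates. It ignores the one-third-full and 3 second triggers, the COMMIT marker and any fixed commit overhead, and it models bytes left to write rather than elapsed time. The function name and constant are my own for illustration; nothing here is Oracle code.

```python
# Toy model only: assumes the 1MB flush trigger is the one that matters
# and that no other session is generating redo.
FLUSH_THRESHOLD = 1024 * 1024  # assumed 1MB flush trigger


def redo_pending_at_commit(redo_bytes, threshold=FLUSH_THRESHOLD):
    """Return the redo (in bytes) still unwritten when COMMIT is issued.

    Every time `threshold` bytes accumulate, the buffer is assumed to be
    flushed in the background, so only the remainder is left for COMMIT.
    """
    return redo_bytes % threshold


if __name__ == "__main__":
    # Walk through a few redo sizes either side of the 1MB boundaries.
    for kb in (256, 768, 1023, 1024, 1280, 2047, 2048, 2560):
        pending = redo_pending_at_commit(kb * 1024)
        print(f"{kb:>5} KB generated -> {pending // 1024:>5} KB left at COMMIT")
```

In this model the amount left to write climbs steadily up to 1MB, drops back to nothing, then climbs again, which is the same shape as the sawtooth in the chart. On a busy system you could think of other sessions adding a random offset to `redo_bytes`, which would smear the pattern out.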

Hopefully I’ll soon be able to compare HDD and SSD performance to see if there are any significant differences in these trends. The above data was generated by redo on SSD.