17 April 2009

ext3 & 4 and data=guarded

Valerie Aurora over at Red Hat has just posted on her blog about the ext3 and ext4 fsync() issues we've all heard so much about. As she says there, rename in ext4 now implies fsync(), so that issue should calm down.

However, ext3 in 2.6.30 defaults to data=writeback, which means it writes only the metadata to the journal, not the actual data. This is how XFS, ReiserFS, and a few others work, and it's much faster than ext3's previous default, data=ordered. It's also somewhat less awesome at ensuring your data doesn't get lost. She's asking people to test patches (linked from her blog) for a new journal mode called "guarded" (created by Chris Mason), which she says will be faster than "ordered" while keeping its data consistency guarantees.


5 comments:

antistress said...

"data=writeback, which means it only writes the metadata to the journal"

Are you sure?

http://www.mjmwired.net/kernel/Documentation/filesystems/ext4.txt#313 says:

Data Mode
=========
There are 3 different data modes:

* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides a similar level of journaling as that of XFS, JFS, and ReiserFS in its default mode - metadata journaling. A crash+recovery can cause incorrect data to appear in files which were written shortly before the crash. This mode will typically provide the best ext4 performance.

* ordered mode
In data=ordered mode, ext4 only officially journals metadata, but it logically groups metadata information related to data changes with the data blocks into a single unit called a transaction. When it's time to write the new metadata out to disk, the associated data blocks are written first. In general, this mode performs slightly slower than writeback but significantly faster than journal mode.

* journal mode
data=journal mode provides full data and metadata journaling. All new data is written to the journal first, and then to its final location. In the event of a crash, the journal can be replayed, bringing both data and metadata into a consistent state. This mode is the slowest except when data needs to be read from and written to disk at the same time, where it outperforms all other modes. Currently ext4 does not have delayed allocation support if this data journalling mode is selected.

Mackenzie said...

How does that conflict? The metadata is journalled. The data itself is not. That's what I said...

antistress said...

You're right, sorry!

tomas said...

Just wanted to note that the Ubuntu team has backported the changes introduced in the 2.6.30 kernel that make ext4 more reliable.

Ajzimm3rman said...

Thanks for the information.