Commit Graph

19 Commits (master)

Author SHA1 Message Date
Qu Wenruo 2a8cec4b12 btrfs-progs: Exhaust delayed refs and dirty block groups to prevent delayed refs lost
[BUG]
Btrfs-progs sometimes fails to find certain extent backref when
committing transaction.
The most reliable way to reproduce it is fsck-test/013 on 64K page sized
system:

  [...]
  adding new data backref on 315859712 root 287 owner 292 offset 0 found 1
  btrfs unable to find ref byte nr 31850496 parent 0 root 2  owner 0 offset 0
  Failed to find [30867456, 168, 65536]

Also there are some github bug reports related to this problem.

[CAUSE]
Commit 909357e867 ("btrfs-progs: Wire up delayed refs") introduced
delayed refs in btrfs-progs.

However in that commit, delayed refs are not run at correct timing.
That commit calls btrfs_run_delayed_refs() before
btrfs_write_dirty_block_groups(), which needs to update
BLOCK_GROUP_ITEMs in extent tree, thus could cause new delayed refs.

This means each time we commit a transaction, we may screw up the extent
tree by dropping some pending delayed refs, like:

Transaction 711:
btrfs_commit_transaction()
|- btrfs_run_delayed_refs()
|  Now all delayed refs are written to extent tree
|
|- btrfs_write_dirty_block_groups()
|  Needs to update extent tree root
|  ADD_DELAYED_REF to 315859712.
|  Delayed refs are attached to current trans handle.
|
|- __commit_transaction()
|- write_ctree_super()
|- btrfs_finish_extent_commit()
|- kfree(trans)
   Now delayed ref for 315859712 are lost

Transaction 712:
Tree block 315859712 get dropped
btrfs_commit_transaction()
|- btrfs_run_delayed_refs()
   |- run_one_delayed_ref()
      |- __free_extent()
         As previous ADD_DELAYED_REF to 315859712 is lost, extent tree
         doesn't have any backref for 315859712, causing the bug

In fact, commit c31edf610c ("btrfs-progs: Fix false ENOSPC alert by
tracking used space correctly") detects the tree block leakage, but in
the reproducer we have too much noise, thus nobody notices the leakage
warning.

[FIX]
We can't just move btrfs_run_delayed_refs() after
btrfs_write_dirty_block_groups(), as during btrfs_run_delayed_refs(), we
can re-dirty block groups.
Thus we need to exhaust both delayed refs and dirty blocks.

This patch will call btrfs_write_dirty_block_groups() and
btrfs_run_delayed_refs() in a loop until both delayed refs and dirty
blocks are exhausted. Much like what we do in commit_cowonly_roots() in
kernel.

Also, to prevent such problem from happening again (and not to debug
such problem again), add extra check on delayed refs before freeing the
transaction handle.

Reported-by: Klemens Schölhorn <klemens@schoelhorn.eu>
Issue: #187
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-26 17:46:41 +02:00
David Sterba d1efe50d0a btrfs-progs: move messages.[ch] to common/
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-03 20:49:03 +02:00
Qu Wenruo c31edf610c btrfs-progs: Fix false ENOSPC alert by tracking used space correctly
[BUG]
There is a bug report of unexpected ENOSPC from btrfs-convert, issue #123.

After some debugging, even when we have enough unallocated space, we
still hit ENOSPC at btrfs_reserve_extent().

[CAUSE]
Btrfs-progs relies on chunk preallocator to make enough space for
data/metadata.

However after the introduction of delayed-ref, it's no longer reliable
to rely on btrfs_space_info::bytes_used and
btrfs_space_info::bytes_pinned to calculate used metadata space.

For a running transaction with a lot of allocated tree blocks,
btrfs_space_info::bytes_used stays its original value, and will only be
updated when running delayed ref.

This makes btrfs-progs chunk preallocator completely useless. And for
btrfs-convert/mkfs.btrfs --rootdir, if we're going to have enough
metadata to fill a metadata block group in one transaction, we will hit
ENOSPC no matter whether we have enough unallocated space.

[FIX]
This patch will introduce btrfs_space_info::bytes_reserved to track how
many space we have reserved but not yet committed to extent tree.

To support this change, this commit also introduces the following
modification:

- More comment on btrfs_space_info::bytes_*
  To make code a little easier to read

- Export update_space_info() to preallocate empty data/metadata space
  info for mkfs.
  For mkfs, we only have a temporary fs image with SYSTEM chunk only.
  Export update_space_info() so that we can preallocate empty
  data/metadata space info before we start a transaction.

- Proper btrfs_space_info::bytes_reserved update
  The timing is the as kernel (except we don't need to update
  bytes_reserved for data extents)
  * Increase bytes_reserved when call alloc_reserved_tree_block()
  * Decrease bytes_reserved when running delayed refs
    With the help of head->must_insert_reserved to determine whether we
    need to decrease.

Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-03 13:31:14 +02:00
Qu Wenruo 922a631d50 btrfs-progs: Avoid nested chunk allocation call
There is a indirect recursion which can reach the extent reservation:

btrfs_reserve_extent()             <--|
|- do_chunk_alloc()                   |
   |- btrfs_alloc_chunk()             |
      |- btrfs_insert_item()          |
	 |- btrfs_reserve_extent() <--|

Currently, we're using root->ref_cows to determine whether we should do
chunk prealloc to avoid such loop.

But that's still a hidden trap. Instead of solving it using some hidden
tricks, this patch will make chunk/block group allocation exclusive.

Now if do_chunk_alloc() determines to alloc chunk, it will set a flag in
transaction handle so new call of do_chunk_alloc() will refuse to
allocate new chunk until current chunk allocation finishes.

The chunks get over-allocated by 2M so there's enough space in case the
recursive call asks for a different type of blockgroup.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-06-05 20:27:31 +02:00
Qu Wenruo 5672a69639 btrfs-progs: Handle error properly in btrfs_commit_transaction()
[BUG]
When running fuzz-tests/003 and fuzz-tests/009, btrfs-progs will crash
due to BUG_ON().

[CAUSE]
We abused BUG_ON() in btrfs_commit_transaction(), which is one of the
most error prone function for fuzzed images.

Currently to cleanup the aborted transaction, we only need to clean up
the only per-transaction data: delayed refs.

This patch will introduce a new function, btrfs_destroy_delayed_refs()
to cleanup delayed refs when we failed to commit transaction.

With that function, we will gently destroy per-trans delayed ref, and
remove the BUG_ON()s in btrfs_commit_transaction().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-13 15:54:47 +02:00
Qu Wenruo 45e58a1acf btrfs-progs: Refactor btrfs_finish_extent_commit()
This patch will refactor btrfs_finish_extent_commit():

- Make it return void
  There is no failure pattern for btrfs_finish_extent_commit(), thus it
  always return 0. And the caller doesn't care about the return value.
  So no need to return int.

- Remove @root and @unpin parameters

  @root is only used to extract fs_info, which can be extracted from
  transaction handler already.
  @unpin is always fs_info->pinned_extents.
  All these parameters can be extracted from @trans, no need to pass
  them.

The function signature now matches the kernel counterpart.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-13 15:52:46 +02:00
Qu Wenruo 6ab19825b0 btrfs-progs: Don't BUG_ON() when write_dev_supers() fails
[BUG]
Since commit "btrfs-progs: disk-io: Flush to ensure super block write is
FUA" mkfs-tests/017 will fail like:

  ====== RUN MUSTFAIL /home/adam/btrfs-progs/mkfs.btrfs -K -f /dev/mapper/btrfs-progs-thin-vol
  ERROR: failed to write super block for devid 1: flush error: Input/output error
  disk-io.c:1810: write_all_supers: BUG_ON `ret` triggered, value -5
  /home/adam/btrfs-progs/mkfs.btrfs(+0x1e5c1)[0x557a2c83e5c1]
  /home/adam/btrfs-progs/mkfs.btrfs(+0x1e65f)[0x557a2c83e65f]
  /home/adam/btrfs-progs/mkfs.btrfs(write_all_supers+0x1ce)[0x557a2c843a8a]
  /home/adam/btrfs-progs/mkfs.btrfs(write_ctree_super+0x12d)[0x557a2c843be2]
  /home/adam/btrfs-progs/mkfs.btrfs(btrfs_commit_transaction+0x250)[0x557a2c887c56]
  /home/adam/btrfs-progs/mkfs.btrfs(+0xc0b1)[0x557a2c82c0b1]
  /home/adam/btrfs-progs/mkfs.btrfs(main+0x1049)[0x557a2c82e929]
  /usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f6689e99223]
  /home/adam/btrfs-progs/mkfs.btrfs(_start+0x2e)[0x557a2c82b86e]
  failed (expected): /home/adam/btrfs-progs/mkfs.btrfs -K -f /dev/mapper/btrfs-progs-thin-vol

[CAUSE]
Just one BUG_ON() in write_all_supers().

[FIX]
Just remove the BUG_ON(). Callers of write_all_supers() are already
checking the return value.

Also since write_all_supers() can return error, make write_ctree_super()
callers, btrfs_commit_transaction() and close_ctree_fs_info() to
handle the error correctly.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Josh Soref 2cd4a76ea9 btrfs-progs: fix typos in user-visible strings
* error messages
* help strings

Generated by https://github.com/jsoref/spelling

Issue: #154
Author: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-26 18:24:49 +01:00
Josh Soref b1d39a42a4 btrfs-progs: fix typos in comments
Generated by https://github.com/jsoref/spelling

Issue: #154
Author: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-26 18:24:48 +01:00
Nikolay Borisov 909357e867 btrfs-progs: Wire up delayed refs
This commit enables the delayed refs infrastructures. This entails doing
the following:

1. Replacing existing calls of btrfs_extent_post_op (which is the
   equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
   As well as eliminating open-coded calls to finish_current_insert and
   del_pending_extents which execute the delayed ops.

2. Wiring up the addition of delayed refs when freeing extents
   (btrfs_free_extent) and when adding new extents (alloc_tree_block).

3. Adding calls to btrfs_run_delayed refs in the transaction commit
   path alongside comments why every call is needed, since it's not
   always obvious (those call sites were derived empirically by running
   and debugging existing tests)

4. Correctly flagging the transaction in which we are reinitialising
   the extent tree.

5. Moving btrfs_write_dirty_block_groups to
   btrfs_write_dirty_block_groups since blockgroups should be written to
   disk after the last delayed refs have been run.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-23 14:48:41 +02:00
Nikolay Borisov d8a5e756be btrfs-progs: Make btrfs_write_dirty_block_groups take only trans argument
The root argument is used only to get a reference to the fs_info, this
can be achieved with the transaction handle being passed so use that.
This is in preparation for moving this function in the main transaction
commit routine. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-23 14:48:41 +02:00
Qu Wenruo 9f45658fd2 btrfs-progs: transaction: do proper error handling in transaction commit
There are cases that btrfs_commit_transaction() itself can fail, mostly
due to ENOSPC when allocating space.

Don't panic out in this case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-09-13 20:11:12 +02:00
Nikolay Borisov 677acdf534 btrfs-progs: Add boolean to signal whether we are re-initing extent tree
Add a boolean to record whether the extent tree is being re-initialised
in the current transaction. This is going to be needed by the
delayed refs code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 15:02:06 +02:00
Nikolay Borisov 723cab8a72 btrfs-progs: Remove fs_info argument from write_ctree_super
This function already takes a transaction handle which has a reference
to the fs_info, so use that to obtain it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-07 16:37:37 +02:00
David Sterba 72fbe845d4 btrfs-progs: don't start or commit after transaction abort
Signed-off-by: David Sterba <dsterba@suse.com>
2017-09-08 16:15:05 +02:00
David Sterba 355a052647 btrfs-progs: start framework for transaction abort
Signed-off-by: David Sterba <dsterba@suse.com>
2017-09-08 16:15:05 +02:00
David Sterba 0ee0b57f0b btrfs-progs: store pointer to fs_info in transaction handle
Signed-off-by: David Sterba <dsterba@suse.com>
2017-09-08 16:15:05 +02:00
David Sterba f2b0cbe8e8 btrfs-progs: move transaction code out of disk-io
Temporarily export the low-level helpers.

Signed-off-by: David Sterba <dsterba@suse.com>
2017-09-08 16:15:05 +02:00
David Sterba 37c271b216 btrfs-progs: move transaction implementation out of header
Signed-off-by: David Sterba <dsterba@suse.com>
2017-09-08 16:15:05 +02:00