Commit Graph

4563 Commits (5672a696392150fe161bb49860f4db392efeb516)
 

Author SHA1 Message Date
Qu Wenruo 5672a69639 btrfs-progs: Handle error properly in btrfs_commit_transaction()
[BUG]
When running fuzz-tests/003 and fuzz-tests/009, btrfs-progs will crash
due to BUG_ON().

[CAUSE]
We abused BUG_ON() in btrfs_commit_transaction(), which is one of the
most error prone function for fuzzed images.

Currently to cleanup the aborted transaction, we only need to clean up
the only per-transaction data: delayed refs.

This patch will introduce a new function, btrfs_destroy_delayed_refs()
to cleanup delayed refs when we failed to commit transaction.

With that function, we will gently destroy per-trans delayed ref, and
remove the BUG_ON()s in btrfs_commit_transaction().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-13 15:54:47 +02:00
Qu Wenruo 45e58a1acf btrfs-progs: Refactor btrfs_finish_extent_commit()
This patch will refactor btrfs_finish_extent_commit():

- Make it return void
  There is no failure pattern for btrfs_finish_extent_commit(), thus it
  always return 0. And the caller doesn't care about the return value.
  So no need to return int.

- Remove @root and @unpin parameters

  @root is only used to extract fs_info, which can be extracted from
  transaction handler already.
  @unpin is always fs_info->pinned_extents.
  All these parameters can be extracted from @trans, no need to pass
  them.

The function signature now matches the kernel counterpart.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-13 15:52:46 +02:00
Qu Wenruo 27a5b9ddc3 btrfs-progs: Remove the dead branch in btrfs_run_delayed_refs()
cleanup_ref_head() will only return 0 or 1, no way to return a negative
value.  So remove the dead branch.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-13 15:52:24 +02:00
David Sterba 3fe16a10da btrfs-progs: tests: fix misc/026 to run on NFS
The temporary files are not accessible if the testsuite is hosted on
NFS, pre-create them and allow writes.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 19:14:13 +02:00
Omar Sandoval cba6bae15d libbtrfsutil: don't close fd on error in btrfs_util_subvolume_id_fd()
The caller owns the fd passed to btrfs_util_subvolume_id_fd(), so we
shouldn't close it on error. Fix it, add a regression test, and bump the
library patch version.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-26 18:23:27 +02:00
David Sterba 5ca28fc25f Merge branch 'pull/qu/v2' into devel from Qu Wenruo (https://github.com/adam900710/btrfs-progs/tree/for_devel)
The branch passes all selftests, except:
- misc/035
  Known bug as the fix is reverted.
- fuzz/003
- fuzz/009
  Not a regression, as stable tags also triggers them.
  BUG_ON() in commit_transaction get triggered due to ENOSPC.
  These two bugs will be addressed soon. but not in this pull.

This pull request include the following features:

Core change:
- check --repair
  * Flush/FUA support to avoid breaking metadata CoW
      Now btrfs-progs crashing or transaction aborted won't cause
      new transid error.

Fixes and Enhancement:
- generic
  * Try best copy when reading tree blocks.
  * Skip unnecessary retry when one tree block copy fails.
  * Avoid back tree block to populate tree block cache.
  * Don't BUG_ON() when failed to flush/write super blocks
- check
  * File extents repair no longer relies data in extent tree.
  * New ability to check and repair free space cache invalid inode mode.
  * Update backup roots when commit transaction.

- Misc
  * fs_info <-> root parameters cleanup for btrfs_check_leaf/node()
2019-04-26 18:10:32 +02:00
Qu Wenruo 4ab95eb8b0 Revert "btrfs-progs: Do metadata preallocation as long as we're not modifying extent tree"
Commit 7a12d8470e ("btrfs-progs: Do metadata preallocation as long as
we're not modifying extent tree") tries to fix #123, however due to the
fact that chunk tree also has root->ref_cows set, we will call
do_chunk_alloc() until call stack explodes.

So revert that offending patch until we have a much better comment on
root->ref_cows and find a better solution to this problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:43 +08:00
Qu Wenruo 6ab19825b0 btrfs-progs: Don't BUG_ON() when write_dev_supers() fails
[BUG]
Since commit "btrfs-progs: disk-io: Flush to ensure super block write is
FUA" mkfs-tests/017 will fail like:

  ====== RUN MUSTFAIL /home/adam/btrfs-progs/mkfs.btrfs -K -f /dev/mapper/btrfs-progs-thin-vol
  ERROR: failed to write super block for devid 1: flush error: Input/output error
  disk-io.c:1810: write_all_supers: BUG_ON `ret` triggered, value -5
  /home/adam/btrfs-progs/mkfs.btrfs(+0x1e5c1)[0x557a2c83e5c1]
  /home/adam/btrfs-progs/mkfs.btrfs(+0x1e65f)[0x557a2c83e65f]
  /home/adam/btrfs-progs/mkfs.btrfs(write_all_supers+0x1ce)[0x557a2c843a8a]
  /home/adam/btrfs-progs/mkfs.btrfs(write_ctree_super+0x12d)[0x557a2c843be2]
  /home/adam/btrfs-progs/mkfs.btrfs(btrfs_commit_transaction+0x250)[0x557a2c887c56]
  /home/adam/btrfs-progs/mkfs.btrfs(+0xc0b1)[0x557a2c82c0b1]
  /home/adam/btrfs-progs/mkfs.btrfs(main+0x1049)[0x557a2c82e929]
  /usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f6689e99223]
  /home/adam/btrfs-progs/mkfs.btrfs(_start+0x2e)[0x557a2c82b86e]
  failed (expected): /home/adam/btrfs-progs/mkfs.btrfs -K -f /dev/mapper/btrfs-progs-thin-vol

[CAUSE]
Just one BUG_ON() in write_all_supers().

[FIX]
Just remove the BUG_ON(). Callers of write_all_supers() are already
checking the return value.

Also since write_all_supers() can return error, make write_ctree_super()
callers, btrfs_commit_transaction() and close_ctree_fs_info() to
handle the error correctly.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo e227e81d99 btrfs-progs: disk-io: Flush to ensure super block write is FUA
[BUG]
There are tons of reports of btrfs-progs screwing up the fs, the most
recent one is "btrfs check --clear-space-cache v1" triggered BUG_ON()
and then leaving the fs with transid mismatch problem.

[CAUSE]
In kernel, we have block layer handing the flush work, even on devices
without FUA support (like most SATA device using default libata
settings), kernel handles FUA write by flushing the device, then normal
write, and finish it with another flush.

The pre-flush, write, post-flush works pretty well to implement FUA
write.

However in btrfs-progs we just use pwrite(), there is nothing keeping
the write order.

So even for basic v1 free space cache clearing, we have different vision
on the write sequence from kernel bio layer (by dm-log-writes) and user
space pwrite() calls.

In btrfs-progs, with extra debug output in write_tree_block() and
write_dev_supers(), we can see btrfs-progs follows the right write
sequence:

  Opening filesystem to check...
  Checking filesystem on /dev/mapper/log
  UUID: 3feb3c8b-4eb3-42f3-8e9c-0af22dd58ecf
  write tree block start=1708130304 gen=39
  write tree block start=1708146688 gen=39
  write tree block start=1708163072 gen=39
  write super devid=1 gen=39
  write tree block start=1708179456 gen=40
  write tree block start=1708195840 gen=40
  write super devid=1 gen=40
  write tree block start=1708130304 gen=41
  write tree block start=1708146688 gen=41
  write tree block start=1708228608 gen=41
  write super devid=1 gen=41
  write tree block start=1708163072 gen=42
  write tree block start=1708179456 gen=42
  write super devid=1 gen=42
  write tree block start=1708130304 gen=43
  write tree block start=1708146688 gen=43
  write super devid=1 gen=43
  Free space cache cleared

But from dm-log-writes, the bio sequence is a different story:

  replaying 1742: sector 131072, size 4096, flags 0(NONE)
  replaying 1743: sector 128, size 4096, flags 0(NONE) <<< Only one sb write
  replaying 1744: sector 2828480, size 4096, flags 0(NONE)
  replaying 1745: sector 2828488, size 4096, flags 0(NONE)
  replaying 1746: sector 2828496, size 4096, flags 0(NONE)
  replaying 1787: sector 2304120, size 4096, flags 0(NONE)
  ......
  replaying 1790: sector 2304144, size 4096, flags 0(NONE)
  replaying 1791: sector 2304152, size 4096, flags 0(NONE)
  replaying 1792: sector 0, size 0, flags 8(MARK)

During the free space cache clearing, we committed 3 transaction but
dm-log-write only caught one super block write.

This means all the 3 writes were merged into the last super block write.
And the super block write was the 2nd write, before all tree block
writes, completely screwing up the metadata CoW protection.

No wonder crashed btrfs-progs can make things worse.

[FIX]
Fix this super serious problem by implementing pre and post flush for
the primary super block in btrfs-progs.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 2644f80611 btrfs-progs: disk-io: Make super block write error easier to read
When we failed to write super blocks, we just output something like:
  WARNING: failed to write sb: I/O error
Or
  WARNING: failed to write all sb data

There is no info about which device failed and there are two different
error message for the same write error.

This patch will change it to something more detailed:
ERROR: failed to write super block for devid 1: write error: I/O error

This provides the basis for later super block flush error handling.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 2437a88079 btrfs-progs: tests/fsck: Add test image for free space cache mode repair
The image has one free space cache inode with invalid mode (0).
        item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160
                generation 30 transid 30 size 65536 nbytes 1507328
                block group 0 mode 0 links 1 uid 0 gid 0 rdev 0
                sequence 23 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
                atime 0.0 (1970-01-01 08:00:00)
                ctime 1553491158.189771625 (2019-03-25 13:19:18)
                mtime 0.0 (1970-01-01 08:00:00)
                otime 0.0 (1970-01-01 08:00:00)

Both lowmem and original mode should be able to detect and fix it.

The extracted test image is pretty big (1G extracted), as kernel won't
cache small chunks.
Even with SSD, such test may still take some seconds just extracting the
image.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 427990ad74 btrfs-progs: check/original: Check and repair free space cache inode item
Just like lowmem mode, also check and repair free space cache inode
item.

And since we don't really have a good timing/function to check free
space chace inodes, we use the same common mode
check_repair_free_space_inode() when iterating root tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 77fe19ba16 btrfs-progs: check/lowmem: Check and repair free space cache inode mode
Unlike inodes in fs roots, we don't really check the inode items in root
tree, in fact we just skip everything other than ROOT_ITEM and ROOT_REF.

This makes invalid inode items sneak into root tree.
For example:
        item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160
                generation 30 transid 30 size 65536 nbytes 1507328
                block group 0 mode 0 links 1 uid 0 gid 0 rdev 0
				   ^ Should be 100600
                sequence 23 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
                atime 0.0 (1970-01-01 08:00:00)
                ctime 1553491158.189771625 (2019-03-25 13:19:18)
                mtime 0.0 (1970-01-01 08:00:00)
                otime 0.0 (1970-01-01 08:00:00)

There is a report of such problem in the mail list.

This patch will check and repair inode items of free space cache inodes in
lowmem mode.

Since free space cache inodes doesn't have INODE_REF but still has 1
link, we can't use check_inode_item() directly.
Instead we only check the inode mode, as that's the important part.

The check and repair function: check_repair_free_space_inode() is also
exported for original mode.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 11fd6cff82 btrfs-progs: check/original: Repair invalid inode mode in root tree
This patch will reuse the mode independent repair_imode() function, to
repair invalid inode mode.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 64ca43a8ad btrfs-progs: check/lowmem: Repair invalid inode mode in root tree
In root tree, we only have 2 types of inodes:
- ROOT_TREE_DIR inode
  Its mode is fixed to 40755
- free space cache inodes
  Its mode is fixed to 100600

This patch will add the ability to repair such inodes to lowmem mode.
For fs/subvolume tree error, at least we haven't see such corruption
yet, so we don't need to rush to fix corruption in fs trees yet.

The repair function, reset_imode() and repair_imode_common() can be
reused by later original mode patch, so it's placed in check/mode-common.c.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 23f1e9a13f btrfs-progs: check/original: Add inode mode check
Just like lowmem mode, check inode mode, specially for S_IFMT bits and
beyond.

Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.

Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo c06c5eef88 btrfs-progs: check/lowmem: Add inode mode check
There is one report about invalid free space cache inode mode.
Normally free space cache inode should have mode 100600 (regular file,
no uid/gid/sticky bit, rw------ bit).

But in that report, we have free space cache inode mode as 0.

So at least btrfs check should report invalid inode mode.

This patch will at least make btrfs check lowmem mode to detect this
problem.

Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.

Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 66d610010b btrfs-progs: disk-io: Try to find a best copy when reading tree blocks
[BUG]
If the first copy of a tree block has a bad key order, but the second
copy is completely good, then "btrfs ins dump-tree -b <bytenr>" fails to
print anything past the bad key:
  leaf 29786112 items 47 free space 983 generation 20 owner EXTENT_TREE
  leaf 29786112 flags 0x1(WRITTEN) backref revision 1
  fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
  chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
  [snip]
  	item 9 key (20975616 METADATA_ITEM 0) itemoff 3543 itemsize 33
  		refs 1 gen 16 flags TREE_BLOCK
  		tree block skinny level 0
  		tree block backref root CHUNK_TREE
  	item 10 key (29360128 BLOCK_GROUP_ITEM 33554432) itemoff 3519 itemsize 24
  		block group used 94208 chunk_objectid 256 flags METADATA|DUP
  ERROR: leaf 29786112 slot 11 pointer invalid, offset 1245184 size 0 leaf data limit 3995
  ERROR: skip remaining slots

While kernel can locate the good copy and acts just like nothing
happened.

[CAUSE]
btrfs-progs uses read_tree_block() to try each copy. But it only uses
less strict check_tree_block(), which has less sanity check than
btrfs_check_node/leaf().

Some error like bad key order is ignored to allow btrfs check to fix it.

This leads to above problem.

[FIX]
Introduce a new member, @candidate_mirror in read_tree_block(), which
records the copy passes check_tree_block() but fails
btrfs_check_leaf/node() as last chance.

Only if no better copy found, then use @candidate_mirror.

So btrfs-progs can act just like kernel to use best copy.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202691
Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
[Inspired by that image, not to fix any bug of that bugzilla]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo f76136a8d0 btrfs-progs: Move btrfs_num_copies() call out of the loop in read_tree_block()
btrfs_num_copies really only needs to be called once, so move it out of
the verification loop in read_tree_block().

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo e8ae577030 btrfs-progs: Use mirror_num start from 1 to avoid unnecessary retry
[BUG]
If the first copy of a tree block is corrupted but the other copy is
good, btrfs-progs will report the error twice:
  checksum verify failed on 30556160 found 42A2DA71 wanted 00000000
  checksum verify failed on 30556160 found 42A2DA71 wanted 00000000

While kernel only report it once, just as expected:
  BTRFS warning (device dm-3): dm-3 checksum verify failed on 30556160 wanted 0 found 42A2DA71 level 0

[CAUSE]
We use mirror_num = 0 in read_tree_block() of btrfs-progs.

At first glance it's pretty OK, but mirror num 0 in btrfs means ANY
good copy. Real mirror num starts from 1.
In the context of read_tree_block(), since it's read_tree_block() to do
all the checks, mirror num 0 just means the first copy.

So if the first copy is corrupted, btrfs-progs will try mirror num 1
next, which is just the same as mirror num 0.
After reporting the same error on the same copy, btrfs-progs will
finally try mirror num 2, and get the good copy.

[FIX]
The fix is way simpler than all the above analyse, just starts from
mirror num 1.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo db51d8d8f6 btrfs-progs: Use @fs_info to replace @root for btrfs_check_leaf/node()
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo 83aeb251f7 btrfs-progs: Free bad extent buffer as soon as possible
[BUG]
For the new multiple -b parameter supporting, we could hit this bug on a
16K node sized btrfs:
  $ ./btrfs inspect dump-tree -b 1024 -b 2048 -b 4096 -b 8192 zimg
  btrfs-progs v4.20.2
  ERROR: tree block bytenr 1024 is not aligned to sectorsize 4096
  ERROR: tree block bytenr 2048 is not aligned to sectorsize 4096
  Couldn't map the block 4096
  Invalid mapping for 4096-20480, got 13631488-22020096
  Couldn't map the block 4096
  bad tree block 4096, bytenr mismatch, want=4096, have=0
  ERROR: failed to read tree block 4096
  extent_io.c:665: free_extent_buffer_internal: BUG_ON `eb->refs < 0`
  triggered, value 1
  ./btrfs[0x426e57]
  ./btrfs(free_extent_buffer+0xe)[0x427701]
  ./btrfs(alloc_extent_buffer+0x3f)[0x427872]
  ./btrfs(btrfs_find_create_tree_block+0xf)[0x415b3c]
  ./btrfs(read_tree_block+0x5c)[0x4171b5]
  ./btrfs(cmd_inspect_dump_tree+0x587)[0x46fb75]
  ./btrfs(handle_command_group+0x44)[0x40df89]
  ./btrfs(cmd_inspect+0x15)[0x44b569]
  ./btrfs(main+0x8b)[0x40e032]
  /lib64/libc.so.6(__libc_start_main+0xeb)[0x7f2001a54b7b]
  ./btrfs(_start+0x2a)[0x40dd1a]
  Aborted (core dumped)

This is not only limited to multiple ins dump-tree -b parameter support,
but also to possible overlapping bad tree blocks.

[CAUSE]
Btrfs delay extent freeing to improve performance.

However for the "-b 4096 -b 8192" case, the first -b 4096 will cause an
extent buffer start=4096 len=16384 refs=0 in the cached extent tree.

Then the incoming -b 8192 will hit the cache and reuse the cached extent
buffer.
And since the cached extent buffer doesn't match the bytenr, its refs
won't get increased, and we're going to free that eb again.

Since the bad cached eb already has a ref number 0, calling
free_extent_buffer() on it again will trigger the assert.

[FIX]
So for bad extent buffer we failed to read, just delete them
immediately.
This will free them from extent buffer cache, so later extent buffer
allocation will not hit the stale one, and prevent the bug from
happening.

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Qu Wenruo fc4d433437 btrfs-progs: Update backup roots when writing super blocks
The code is mostly ported from kernel with minimal change.

Since btrfs-progs doesn't support replaying log, there is some code
unnecessary for btrfs-progs, but to keep the code the same, that
unnecessary code is kept as it.

Now "btrfs check --repair" will update backup roots correctly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Su Yue 84d433d861 btrfs-progs: fsck-test: enable lowmem repair for case 001
Lowmem can repair after commit
       'btrfs-progs: lowmem: move nbytes check before isize check',
so add the beacon file.

Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Lu Fengqi 8a3aefc78d btrfs-progs: tests: add case for inode lose one file extent
The missing extent will lead to the existence of the gap between adjacent
extents. The fsck should can detect the gap correctly and repair by punch
a hole.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Su Yanjun cedbfc2561 btrfs-progs: check: Delete file extent item with unaligned disk bytenr
For test case fsck-tests/001-bad-file-extent-bytenr, we have an
obviously hand crafted image with unaligned file extent:

        item 7 key (257 EXTENT_DATA 0) itemoff 3453 itemsize 53
                generation 6 type 1 (regular)
                extent data disk byte 755944791 nr 1048576
                extent data offset 0 nr 1048576 ram 1048576
                extent compression 0 (none)

disk bytenr 755944791 is obviously unaligned (not even).

For such obviously corrupted file extent, we should just delete the file
extent.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message and comment]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Su Yanjun 3b35deeadd btrfs-progs: check: fix wrong @offset used in find_possible_backrefs()
Function find_possible_backrefs() is used to locate the file extents
referring to an data extent.

For data extent backref, its btrfs_extent_data_ref structure has
the following members:
- root
  Which root refers to this data extent

- objectid
  Which inode refers to this data extent

- offset
  Search *hint*.
  Its value is @file_offset - @extent_offset.

While for @file_offset, it's directly recorded in (INO EXTENT_DATA
FILE_OFFSET) key.

So when searching the file extents refers to this data extent, we can't
use btrfs_extent_data_ref::offset as search key::offset.

We must search from file offset 0, and iterate all file extents until we
hit a file extent matches the data backref.

Thankfully such time consuming behavior is not triggered frequently,
it only gets called for repair, so it shouldn't affect normal check
routine.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:04:25 +08:00
Su Yanjun b6a0d97cba Revert "btrfs-progs: Record orphan data extent ref to corresponding root."
Commit 0ddf63c09f ("btrfs-progs: Record orphan data extent ref to
corresponding root.") introduces the ability to record a file extent
even all other related info is lost (data backref, inode item).

However this patch only records such info without doing any proper
repair, further more, it could even record invalid file extents, and the
report part only happens after all check is done.

Since we will later introduce proper file extent repair functionality,
we could revert that patch.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:03:51 +08:00
Su Yanjun 872c116c75 Revert "btrfs-progs: Add repair and report function for orphan file extent."
Commit ad03f840f0 ("btrfs-progs: Add repair and report function for
orphan file extent.") will record and try to repair orphan file extents
by:
- Removing the orphan file extent item if no extent backref can be found
Or
- Re-insert a file extent using data backref

Especially the later case is far from ideal, as normally extent tree is
more fragile and corruption prone.
Use any data from extent tree to try to repair could easily lead to
further corruption.

So here we revert commit ad03f840f0 ("btrfs-progs: Add repair and report
function for orphan file extent.") to cleanup the space for later proper
repair in original mode.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message, solve conflicts with DIR_ITEM hash mismatch patchset]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:03:02 +08:00
Su Yue 0617bde3bc btrfs-progs: lowmem: delete unaligned bytes extent data under repair
If found a extent data item has unaligned part, lowmem repair
just deletes it.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:02:13 +08:00
Su Yue a170ef86ff btrfs-progs: lowmem: rename delete_extent_tree_item() to delete_item()
The function can delete items in trees besides extent tree.
Rename and move it for further use.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:02:09 +08:00
Su Yue 2a058a2204 btrfs-progs: lowmem: check unaligned disk_bytenr for extent_data
Add support to check unaligned disk_bytenr for extent_data.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:02:08 +08:00
Su Yue cbe66f5804 btrfs-progs: lowmem: fix false alert if extent item has been repaired
Previously, @err are assigned immediately after check but before
repair.
repair_extent_item()'s return value also confuses the caller. If
error has been repaired and returns 0, check_extent_item() will try
to continue check the nonexistent and cause flase alerts.

Here make repair_extent_item()'s return codes only represents status
of the extent item, error bits are handled in caller of the repair
function.
Change of @err after repair.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Solve conflicts with DIR_ITEM hash mismatch patchset]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:02:06 +08:00
Su Yue 0ea38ae08d btrfs-progs: lowmem: move nbytes check before isize check
For files, lowmem repair will try to check nbytes and isize,
but isize check depends nbytes.

Once bytes has been repaired, then isize should be checked and
repaired.
So move nbytes check before isize check. Also set nbytes to
extent_size once repaired successfully.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:02:04 +08:00
Su Yue ee455db07e btrfs-progs: lowmem: add argument path to punch_extent_hole()
Since repair will do CoW, the outer path may be invalid.

This patch will add an argument, @path, to punch_extent_hole().
When punch_extent_hole() returns, path will still point to the same key.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment and commit message]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:02:01 +08:00
Lu Fengqi fa13420cc7 btrfs-progs: lowmem: fix false alert about the existence of gaps in the check_file_extent
The 'end' parameter of check_file_extent tracks the ending offset of the
last checked extent. This is used to detect gaps between adjacent extents.

Currently such gaps are wrongly detected since for regular extents only
the size of the extent is added to the 'end' parameter. This results in
wrongly considering all extents of a file as having gaps between them
when only 2 of them really have a gap as seen in the example below.

Solution:
The extent_end variable should set to the sum of the offset and the
extent_num_bytes of the file extent.

Example:
Suppose that lowmem check the following file extent of inode 257.

        item 6 key (257 EXTENT_DATA 0) itemoff 15813 itemsize 53
                generation 6 type 1 (regular)
                extent data disk byte 13631488 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 7 key (257 EXTENT_DATA 8192) itemoff 15760 itemsize 53
                generation 6 type 1 (regular)
                extent data disk byte 13631488 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53
                generation 6 type 1 (regular)
                extent data disk byte 13631488 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)

For inode 257, check_inode_item set extent_end to 0, then call
check_file_extent to check item {6,7,8}.
item 6)
	offset(0) == extent_end(0)
	extent_end = extent_end(0) + extent_num_bytes(4096)
item 7)
	offset(8192) != extent_end(4096)
	extent_end = extent_end(4096) + extent_num_bytes(4096)
			^^^
	The old extent_end should replace by offset(8192).
item 8)
	offset(12288) != extent_end(8192)
		^^^
	But there is no gap between item {7,8}.

Fixes: d88da10ddd ("btrfs-progs: check: introduce function to check file extent")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
[Move this patch as the 1st patch, since it's an independent fix]
Signed-off-by: Qu Wenruo <wqu@suse.com>
2019-04-16 09:01:53 +08:00
Nikolay Borisov cfa470d5a6 btrfs-progs: Remove get_argv0_buf
get_argv0_buf is used only in two functions in help.c which also has
direct access to the static argv0_buf. All other references to this
buffer are direct. So remove the function. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-11 15:30:33 +01:00
Qu Wenruo f312aff23f btrfs-progs: inspect-dump-tree: Allow --block to be specified multiple times
Reuse extent-cache facility to record multiple bytenr so --block can be
specified multiple times.

Despite that, add a sector size alignment check before we try to print a
tree block.  Please note that, nodesize alignment check is not suitable
here as meta chunk start bytenr could be unaligned to nodesize.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 15:58:06 +01:00
Qu Wenruo 7a12d8470e btrfs-progs: Do metadata preallocation as long as we're not modifying extent tree
In github issues, one user reports unexpected ENOSPC error if enabling
datasum druing convert.  After some investigation, it looks like that
during ext2_saved/image creation, we could create large file extent
whose size can be 128M (max data extent size).

In that case, its csum block will be at least 128K. Under certain case
we need to allocate extra metadata chunks to fulfill such space
requirement.

However we only do metadata prealloc if we're reserving extents for fs
trees.  (we use btrfs_root::ref_cows to determine whether we should do
metadata prealloc, and that member is only set for fs trees).

There is no explaination on why we only do metadata prealloc for file
trees, but at least from my investigation, it could be related to avoid
nested extent tree modication.

At least extent reservation for csum tree shouldn't be a problem with
metadata block group preallocation.

So change the metadata block group preallocation check from
"root->ref_cow" to "root->root_key.objectid !=
BTRFS_EXTENT_TREE_OBJECTID", and add some comment for it.

Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 15:33:54 +01:00
Adam Borowski 5ebf288875 btrfs-progs: defrag: open files RO on new enough kernels
Defragging an executable conflicts both way with it being run, resulting in
ETXTBSY.  This either makes defrag fail or prevents the program from being
executed.

Kernels 4.19-rc1 and later allow defragging files you could have possibly
opened rw, even if the passed descriptor is ro (commit 616d374efa23
"btrfs: allow defrag on a file opened read-only that has rw
permissions").

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 15:33:54 +01:00
Adam Borowski 3da12f3d5f btrfs-progs: fix kernel version parsing on some versions past 3.0
The code fails if the third section is missing (like "4.18") or is followed
by anything but "." or "-".  This happens for example if we're not exactly
at a tag and CONFIG_LOCALVERSION_AUTO=n (which results in "4.18.5+").

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 15:33:54 +01:00
Steven Davies 33b4acc7df btrfs-progs: receive: add option for quiet mode
Provide an option in `btrfs receive` to suppress the informational
messages when writing files.

Signed-off-by: Steven Davies <btrfs@steev.me.uk>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 15:33:54 +01:00
Qu Wenruo dc592178f2 btrfs-progs: gitignore: Ignore hidden files
A lot of editor/IDE related config files are dotfiles, like .vimrc
or .clang_complete.

Instead of adding gitignore entry for each editor/IDE, just ignore all
dotfiles.

Files tracked by git are not ignored and can be updated as usual.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 13:27:27 +01:00
David Sterba c7d546ab6b btrfs-progs: qgoup: propagate errors from btrfs_qgroup_parse_sort_string
Get rid of the exit inside and propagate errors to the callers.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:57 +01:00
David Sterba 9a1550bb2d btrfs-progs: qgroup show: report unrecognized format of sort string
Let the parsing function distinguish between parsing and other errors
and use that in qgroup show.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:57 +01:00
David Sterba 88ab0824a2 btrfs-progs: qgroup show: report errors while parsing sort string
Print a better message when there are unknown problems while parsing the
sort string. This currently is -ENOMEM and -1 on uknown format. This
will be changed.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:57 +01:00
David Sterba 6927db8ad9 btrfs-progs: qgroup: remove unused option handling
The commands create/destroy do not take any arguments so the error
hanling does not make sense here.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:57 +01:00
David Sterba f34d4c5ac0 btrfs-progs: qgroup: report unknown option
Since commit 208ba29007 "btrfs-progs: qgroup assign: can't handle
options", the unknown options will not be reported for qgroup
assign/remove as this was mistakenly removed from the caller.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:56 +01:00
David Sterba 50750ead22 btrfs-progs: find-root: use the common help infrastructure
Set up the structures so the usage helpers can be used instead of local
functions.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:56 +01:00
David Sterba 32ec525fd2 btrfs-progs: find-root: don't dump usage on wrong argument count
The checker prints the error message, no need the full usage string to
be pritned.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-05 12:57:56 +01:00