Commit Graph

4855 Commits (f01d2b85581e9e695897dfc86b7c8d8e3b04a232)
 

Author SHA1 Message Date
Qu Wenruo f01d2b8558 libbtrfsutil: Convert to designated initialization for SubvolumeIterator_type
[BUG]
When compiling btrfs-progs with libbtrfsutil on a python3.8 system, we
got the following warning:

  subvolume.c:636:2: warning: initialization of ‘long int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
    636 |  NULL,     /* tp_print */
        |  ^~~~
  subvolume.c:636:2: note: (near initialization for ‘SubvolumeIterator_type.tp_vectorcall_offset’)

[CAUSE]
The C definition of PyTypeObject changed in Python 3.8.
Where tp_print used to be, there is now tp_vectorcall_offset.

So we got the above warning.

[FIX]
C has designated initialization, which can assign values to each named
member, without hard coding to match the offset.
And all the other uninitialized values will be set to 0, so we can save
a lot of unneeded "= 0" or "= NULL" lines.

Just use that awesome feature to avoid any future breakage.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
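The fix described in f01d2b8558 can be illustrated with a minimal sketch. The struct below is a hypothetical stand-in, not the real PyTypeObject; it only mimics a layout where an integer slot has replaced an older pointer slot. With designated initializers the values follow the member names, so a changed slot cannot silently receive the wrong value.

```c
#include <stddef.h>

/* Hypothetical stand-in for a struct whose layout changed between versions:
 * an integer slot now sits where a pointer slot used to be. */
struct type_sketch {
	const char *name;
	size_t basicsize;
	long vectorcall_offset;   /* older layout had a pointer member here */
	const char *doc;
};

/* Designated initializers bind values to member names, not positions.
 * Members that are left out (vectorcall_offset here) are zero-initialized. */
const struct type_sketch example_type = {
	.name = "example.Type",
	.basicsize = sizeof(int),
	.doc = "illustrative type object",
};
```

A positional initializer with NULL in the third slot would trigger the same kind of -Wint-conversion warning quoted in the commit message.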
Qu Wenruo 425c950cc6 libbtrfsutil: Convert to designated initialization for QgroupInherit_type
[BUG]
When compiling btrfs-progs with libbtrfsutil on a python3.8 system, we
got the following warning:

  qgroup.c:110:2: warning: initialization of ‘long int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
    110 |  NULL,     /* tp_print */
        |  ^~~~
  qgroup.c:110:2: note: (near initialization for ‘QgroupInherit_type.tp_vectorcall_offset’)

[CAUSE]
The C definition of PyTypeObject changed in Python 3.8.
Where tp_print used to be, there is now tp_vectorcall_offset.

So we got the above warning.

[FIX]
C has designated initialization, which can assign values to each named
member, without hard coding to match the offset.
And all the other uninitialized values will be set to 0, so we can save
a lot of unneeded "= 0" or "= NULL" lines.

Just use that awesome feature to avoid any future breakage.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
Qu Wenruo a528cbeead libbtrfsutil: Convert to designated initialization for BtrfsUtilError_type
[BUG]
When compiling btrfs-progs with libbtrfsutil on a python3.8 system, we
got the following warning:

  error.c:169:2: warning: initialization of ‘long int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
    169 |  NULL,      /* tp_print */
        |  ^~~~
  error.c:169:2: note: (near initialization for ‘BtrfsUtilError_type.tp_vectorcall_offset’)

[CAUSE]
The C definition of PyTypeObject changed in Python 3.8.
Where tp_print used to be, there is now tp_vectorcall_offset.

So we got the above warning.

[FIX]
C has designated initialization, which can assign values to each named
member, without hard coding to match the offset.
Also, uninitialized values will be 0, so we can also save a lot of
unneeded "= 0" or "= NULL" lines.

Just use that awesome feature to avoid any future breakage.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
Qu Wenruo 36128dff44 btrfs-progs: check/lowmem: Fix a false alert on uninitialized value
[BUG]
When compiling the devel branch with commit fb8f05e40b458
("btrfs-progs: check: Make repair_imode_common() handle inodes in
subvolume trees"), the following warning will be reported:

  check/mode-common.c: In function ‘detect_imode’:
  check/mode-common.c|1071 col 23| warning: ‘imode’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  1071 |   *imode_ret = (imode | 0700);
       |                ~~~~~~~^~~~~~~

This only occurs for a regular build. If compiled with D=1, the warning
disappears.

[CAUSE]
Looks like a bug in gcc optimization.
The code will only set @imode_ret when @found is true.
And for every "found = true" assignment we have assigned @imode.
So this is just a false alert.

[FIX]
I wish I could fix the problem in GCC, but obviously I can't (at least for
now).

So let's assign an initial value of 0 to @imode to suppress the false
alert.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
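The pattern behind the false alert in 36128dff44 can be reduced to a small sketch. The function and parameter names are hypothetical, and the logic is a simplification of what the commit message describes: the value is only read when a flag is true, and every path that sets the flag also sets the value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction: "imode" is only read when "found" is true, and
 * every branch that sets "found" also assigns "imode". */
int detect_mode_sketch(bool has_dir_item, bool has_extent_data,
		       uint32_t *imode_ret)
{
	uint32_t imode = 0;   /* initial value only silences -Wmaybe-uninitialized */
	bool found = false;

	if (has_dir_item) {
		imode = 0040000;      /* S_IFDIR */
		found = true;
	} else if (has_extent_data) {
		imode = 0100000;      /* S_IFREG */
		found = true;
	}

	if (!found)
		return -1;            /* nothing to base a guess on */

	*imode_ret = imode | 0700;
	return 0;
}
```

The initializer does not change behaviour, because the early return keeps the initial 0 from ever reaching the output.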
Su Yue 97fc76c0ac btrfs-progs: add comments of block group lookup functions
The progs-side function btrfs_lookup_first_block_group() calls
find_first_extent_bit() to find the block group which contains bytenr
or comes after bytenr. This behavior differs from the kernel code, so
add comments.

Add comments for btrfs_lookup_block_group() too; this one works
like the kernel side.

Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
David Sterba 899977cd18 btrfs-progs: docs: update mkfs blockgroup description
- add raid1c34
- add utilization to the overview table
- wording updates

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
David Sterba 95117cbd4a btrfs-progs: tests: add tests for checksums
Add separate tests for basic coverage of new checksum algorithms. It
comes in two parts to do a full mkfs test and also a condition mount
test.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:51 +01:00
David Sterba b475a46f4d btrfs-progs: tests: add raid1c34 to basic mkfs tests
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:50 +01:00
David Sterba 1f5094bb5c btrfs-progs: add support for raid1c3 and raid1c4
Add support for 3- and 4-copy variants of RAID1. This adds resiliency
against 2 or 3 lost or damaged devices, respectively.

$ ./mkfs.btrfs -m raid1c4 -d raid1c3 /dev/sd[abcd]

Label:              (null)
UUID:               f1f988ab-6750-4bc2-957b-98a4ebe98631
Node size:          16384
Sector size:        4096
Filesystem size:    8.00GiB
Block group profiles:
  Data:             RAID1C3         273.06MiB
  Metadata:         RAID1C4         204.75MiB
  System:           RAID1C4           8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata, raid1c34
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/sda
    2     2.00GiB  /dev/sdb
    3     2.00GiB  /dev/sdc
    4     2.00GiB  /dev/sdd

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:09:50 +01:00
Nikolay Borisov beef042d50 btrfs-progs: Remove convert param from btrfs_alloc_data_chunk
Convert is always set to true so there's no point in having it as a
function parameter or using it as a predicate inside
btrfs_alloc_data_chunk.  Remove it and all relevant code which would
have never been executed.  No semantics changes.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
Nikolay Borisov f28a5a1673 btrfs-progs: Remove type argument from btrfs_alloc_data_chunk
It's always set to BTRFS_BLOCK_GROUP_DATA so sink it into the function.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
Nikolay Borisov 6718ab4d33 btrfs-progs: Initialize sub_stripes to 1 in btrfs_alloc_data_chunk
The sub_stripes variable is initialized to 0 by default and is overridden
only in the RAID10 case. This leads to the following (minor)
artifacts on a freshly created filesystem:

item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15863 itemsize 112
		length 1073741824 owner 2 stripe_len 65536 type METADATA|RAID1
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 2 sub_stripes 0
			stripe 0 devid 2 offset 9437184
			dev_uuid a020fc2f-b526-4800-9278-156f2f431fe9
			stripe 1 devid 1 offset 30408704
			dev_uuid 0f78aa72-4626-4057-a8f2-285f46b2c664

After balance resulting chunk item is:

item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 3251634176) itemoff 15863 itemsize 112
		length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 2 sub_stripes 1
			stripe 0 devid 2 offset 3230662656
			dev_uuid a020fc2f-b526-4800-9278-156f2f431fe9
			stripe 1 devid 1 offset 3251634176
			dev_uuid 0f78aa72-4626-4057-a8f2-285f46b2c664

Kernel code usually initializes it to 1, since it takes the value from
the raid description table which has it set to 1 for all but RAID10 types.
In userspace it has to be statically initialized to 1 since we don't
have btrfs_bg_flags_to_raid_index. Eventually the kernel/userspace needs
to be merged but for now it wouldn't bring much value if this function
is copied.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
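A minimal sketch of the initialization described in 6718ab4d33, using hypothetical profile tags instead of the real block group flags. The value 2 for RAID10 is an assumption of this sketch; the commit message only states that every other profile uses 1.

```c
/* Hypothetical profile tags for this sketch only; not the on-disk flag values. */
enum profile_sketch { PROFILE_SINGLE, PROFILE_RAID1, PROFILE_RAID10 };

int sub_stripes_for(enum profile_sketch profile)
{
	int sub_stripes = 1;          /* default for every profile but RAID10 */

	if (profile == PROFILE_RAID10)
		sub_stripes = 2;      /* assumption: RAID10 overrides the default */

	return sub_stripes;
}
```

Starting from 1 rather than 0 means a freshly created RAID1 chunk item would match what balance writes later.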
Qu Wenruo 9b01db7785 btrfs-progs: rescue/zero-log: Manually write all supers to handle extent tree error more gracefully
[BUG]
Even though "btrfs rescue zero-log" only resets btrfs_super_block::log_root
and btrfs_super_block::log_root_level, we still use a transaction to write
all super blocks for all devices.

This means we can't handle things like corrupted extent tree:

  checksum verify failed on 2172747776 found 000000B6 wanted 00000000
  checksum verify failed on 2172747776 found 000000B6 wanted 00000000
  bad tree block 2172747776, bytenr mismatch, want=2172747776, have=0
  WARNING: could not setup extent tree, skipping it
  Clearing log on /dev/nvme/btrfs, previous log_root 0, level 0
  ERROR: Corrupted fs, no valid METADATA block group found
  ERROR: attempt to start transaction over already running one

[CAUSE]
Because we have an extra check in the transaction code to ensure we have
valid METADATA block groups.

In fact we don't really need a transaction at all.

[FIX]
Instead of committing a transaction, we can just call write_all_supers()
manually, so we can still handle a multi-device fs while avoiding the above
error.

Also, add the OPEN_CTREE_NO_BLOCK_GROUPS open ctree flag to make it more
robust.

Link: https://lore.kernel.org/linux-btrfs/CAKbQEqG35D_=8raTFH75-yCYoqH2OvpPEmpj2dxgo+PTc=cfhA@mail.gmail.com/
Reported-by: Christian Pernegger <pernegger@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
Qu Wenruo e731746a18 btrfs-progs: Reduce error level from error to warning for OPEN_CTREE_PARTIAL
Even if we're using OPEN_CTREE_PARTIAL, as "rescue zero-log" does, the
error message still looks too serious even though we skipped that tree:

    bad tree block 2172747776, bytenr mismatch, want=2172747776, have=0
    Couldn't setup extent tree
    ^^^^^^^^^^^^^^^^^^^^^^^^^^

This patch will change the error message to:
- Use error() if we're not using OPEN_CTREE_PARTIAL
- Use warning() and explicitly show we're skipping that tree

So the result would be something like:

  For non-OPEN_CTREE_PARTIAL case:
    bad tree block 2172747776, bytenr mismatch, want=2172747776, have=0
    ERROR: could not setup extent tree

  For OPEN_CTREE_PARTIAL case
    bad tree block 2172747776, bytenr mismatch, want=2172747776, have=0
    WARNING: could not setup extent tree, skipping it

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
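The level selection described in e731746a18 might look roughly like the sketch below. The helper name and the buffer-based interface are assumptions made so the outcome can be inspected; per the commit message, the real code calls error() or warning(). The two message strings are taken from the commit message.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: writes the message into buf and tells the caller
 * whether opening the filesystem may continue (0) or must fail (-1). */
int report_extent_tree_failure(bool partial, char *buf, size_t len)
{
	if (partial) {
		snprintf(buf, len,
			 "WARNING: could not setup extent tree, skipping it");
		return 0;     /* caller continues without the extent tree */
	}
	snprintf(buf, len, "ERROR: could not setup extent tree");
	return -1;            /* caller aborts */
}
```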
David Sterba cbe239bf71 btrfs-progs: fi usage: sort table by device id
The result of 'btrfs fi usage -T' looks strange when the devices are
sorted by path and not by id. It's harder to look up, and the device id
reflects "the order of appearance" in the filesystem.

Original output:
              Data      Metadata  System
Id Path       RAID0     RAID1     RAID1    Unallocated
-- ---------- --------- --------- -------- -----------
 6 /dev/loop0 204.75MiB         -        -     1.80GiB
 3 /dev/loop1 204.75MiB         -        -     1.80GiB
 8 /dev/loop2 204.75MiB 256.00MiB  8.00MiB     1.54GiB
 1 /dev/loop3 204.75MiB         -        -     1.80GiB
 5 /dev/loop4 204.75MiB         -        -     1.80GiB
 4 /dev/loop5 204.75MiB         -        -     1.80GiB
 2 /dev/loop6 204.75MiB         -        -     1.80GiB
 7 /dev/loop7 204.75MiB 256.00MiB  8.00MiB     1.54GiB
-- ---------- --------- --------- -------- -----------
   Total        1.60GiB 256.00MiB  8.00MiB    13.88GiB
   Used           0.00B 112.00KiB 16.00KiB

New output:

              Data      Metadata  System
Id Path       RAID0     RAID1     RAID1    Unallocated
-- ---------- --------- --------- -------- -----------
 1 /dev/loop3 204.75MiB         -        -     1.80GiB
 2 /dev/loop6 204.75MiB         -        -     1.80GiB
 3 /dev/loop1 204.75MiB         -        -     1.80GiB
 4 /dev/loop5 204.75MiB         -        -     1.80GiB
 5 /dev/loop4 204.75MiB         -        -     1.80GiB
 6 /dev/loop0 204.75MiB         -        -     1.80GiB
 7 /dev/loop7 204.75MiB 256.00MiB  8.00MiB     1.54GiB
 8 /dev/loop2 204.75MiB 256.00MiB  8.00MiB     1.54GiB
-- ---------- --------- --------- -------- -----------
   Total        1.60GiB 256.00MiB  8.00MiB    13.88GiB
   Used           0.00B 112.00KiB 16.00KiB

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
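Sorting the table rows by device id, as cbe239bf71 describes, comes down to a qsort() comparator. The row struct and function names below are hypothetical and do not reflect the actual btrfs-progs data structures.

```c
#include <stdlib.h>

/* Hypothetical table row; the real tool tracks more per-device fields. */
struct dev_row_sketch {
	unsigned long long devid;
	const char *path;
};

/* Compare by device id; explicit comparisons avoid overflow from subtraction. */
int cmp_by_devid(const void *a, const void *b)
{
	const struct dev_row_sketch *da = a;
	const struct dev_row_sketch *db = b;

	if (da->devid < db->devid)
		return -1;
	if (da->devid > db->devid)
		return 1;
	return 0;
}

void sort_rows_by_devid(struct dev_row_sketch *rows, size_t n)
{
	qsort(rows, n, sizeof(*rows), cmp_by_devid);
}
```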
David Sterba a85c496c0d btrfs-progs: docs: update check modes
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:21 +01:00
David Sterba a999abb7e9 btrfs-progs: docs: document new --repair --force behaviour
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:20 +01:00
Johannes Thumshirn e388bf386b btrfs-progs: check: warn users about the possible dangers of --repair
The manual page of btrfsck clearly states 'btrfs check --repair' is a
dangerous operation.

Although this warning is in place, users do not read the manual page
and/or are used to the behaviour of fsck utilities which repair the
filesystem, and thus potentially cause harm.

Similar to 'btrfs balance' without any filters, add a warning and a
countdown, so users can bail out before potentially corrupting the
filesystem more than it already is.

To override the timeout, let --force skip it and continue.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:20 +01:00
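The warning-plus-countdown behaviour from e388bf386b could be sketched as follows. The function name, the message wording and the return value are all assumptions of this sketch; only the idea (count down unless --force is given) comes from the commit message.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of seconds actually waited.
 * With force set, the countdown is skipped entirely. */
int repair_countdown_sketch(bool force, int seconds)
{
	int waited = 0;

	if (force)
		return 0;

	/* Placeholder wording, not the tool's real message. */
	fprintf(stderr, "WARNING: repair is dangerous, starting in: ");
	while (seconds > 0) {
		fprintf(stderr, "%d ", seconds--);
		fflush(stderr);
		sleep(1);             /* gives the user time to press Ctrl-C */
		waited++;
	}
	fprintf(stderr, "\n");
	return waited;
}
```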
David Sterba a517225ece btrfs-progs: docs: checksum algorithms
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:20 +01:00
David Sterba c981414ea3 btrfs-progs: docs: switch btrfs(5) to auto-numbered list
Each new chapter needed renumbering the whole list, this can be avoided.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:20 +01:00
David Sterba c1a1aa9e33 btrfs-progs: docs: document checksum options for mkfs and convert
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:20 +01:00
David Sterba b9000ce339 btrfs-progs: tests: remove unused variables in common
All the run_* helpers have unused variable cmd, probably a leftover from
debugging the option injection magic.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-22 19:07:16 +01:00
David Sterba 8041a1c229 btrfs-progs: tests: enhance convert option injection
Add support for TEST_ARGS_CONVERT to allow injection of e.g. a checksum
option for all the tests. Use like

 $ make TEST_ARGS_CONVERT='--csum=xxhash' TEST_ENABLE_OVERRIDE=true test-convert

This affects all btrfs-convert commands that are run by run_check and
other helpers, IOW this affects all tests, not just convert specific ones.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:10 +01:00
David Sterba f24ba8126e btrfs-progs: tests: enhance mkfs option injection
Add support for TEST_ARGS_MKFS to allow injection of e.g. a checksum
option for all the tests. Use like

 $ make TEST_ARGS_MKFS='--csum=xxhash' TEST_ENABLE_OVERRIDE=true test-mkfs

This affects all mkfs.btrfs commands that are run by run_check and other
helpers, IOW this affects all tests, not just mkfs specific ones.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
David Sterba 698e3baad6 btrfs-progs: convert: add option for checksum type
For parity with mkfs add --csum/--checksum option also for convert. This
affects data and metadata.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
David Sterba 65efb419a2 btrfs-progs: move parse_csum_type to utils
This will be used by convert.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
Nikolay Borisov 78a3831d46 btrfs-progs: tests: Test backup root retention logic
This test ensures that the kernel correctly persists backup roots in
case the filesystem has been mounted from a backup root.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ cleanup to use common helpers ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
Nikolay Borisov 8ebc7219ee btrfs-progs: corrupt-block: Refactor tree block corruption code
As progs' transaction/CoW logic evolved over the years the metadata block
corruption code failed to do so. It's currently impossible to corrupt
the generation because the CoW logic will not only set it to the value
of the currently running transaction (__btrfs_cow_block) but the
current code will ASSERT due to the following check in __btrfs_cow_block:

   WARN_ON(!(buf->flags & EXTENT_BAD_TRANSID) &&
                   btrfs_header_generation(buf) > trans->transid);

Fix this by making the generation corruption code directly write
the modified block, outside of the transaction mechanism. At the same
time move the old code into BTRFS_METADATA_BLOCK_SHIFT_ITEMS handling
case, essentially leaving it unchanged.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
Qu Wenruo 989a99b5f8 btrfs-progs: Replace btrfs_block_group_cache::item with dedicated members
We access btrfs_block_group_cache::item mostly for @used and @flags.

@flags is already a dedicated member in btrfs_block_group_cache, only
@used doesn't have a dedicated member.

This patch will remove btrfs_block_group_cache::item and add
btrfs_block_group_cache::used.

It's the btrfs-progs equivalent of the following kernel patches:
btrfs: move block_group_item::used to block group
btrfs: move block_group_item::flags to block group
btrfs: remove embedded block_group_cache::item

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
Anand Jain 5f9a4e6314 btrfs-progs: balance status: fix usage show long verbose
btrfs balance status supports both short and long option -v|--verbose
but usage failed to show it in its --help. This patch fixes the --help.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
Anand Jain ac7ce38475 btrfs-progs: balance start: fix usage add long verbose
btrfs balance start supports both short and long option -v|--verbose
however usage failed to show the long option. This patch fixes the --help.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:09 +01:00
Anand Jain 161402cc5a btrfs-progs: receive: make option quiet work
Even when the -q option is specified, the receive sub-command is not quiet,
as shown below.

 $ btrfs receive -q -f /tmp/t /btrfs1
 At snapshot ss3

It must be quiet at least when it's been asked to be quiet.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
David Sterba 8a8083fded btrfs-progs: README: add gitlab CI/CD status badge
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
Su Yue 011b7e2766 btrfs-progs: mkfs-tests/005: check global prereq for dmsetup
This test uses the dmsetup tool, so add the global prereq.

Issue: #192
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
David Sterba 62ab5ce067 btrfs-progs: ci: use newer image base on travis
Seems that 18.04 has arrived on travis, switch to it. The gcc is 7.4 and
the kernel is unfortunately still 4.15.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
Marcos Paulo de Souza c052b38418 btrfs-progs: Makefile: Add -Wimplicit-fallthrough
Avoid introducing new cases of implicit fallthrough by having this flag
always set, though a conditional check is needed to avoid build breakage
on older compilers or on CI.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
Marcos Paulo de Souza 575b6e0e51 btrfs-progs: utils: Replace __attribute__(fallthrough)
When compiling with clang, this warning is shown:

common/utils.c:404:3: warning: declaration does not declare anything [-Wmissing-declarations]
                __attribute__ ((fallthrough));

This attribute seems to silence the same warning in GCC. Replacing this
attribute with /* fallthrough */ fixes the warning for both gcc and
clang.

Full support for the attribute will be in clang 10, gcc supports that
now. Let's use what works for both and switch to the attribute in the
future.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
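The comment-based annotation chosen in 575b6e0e51 looks like this in practice. The function below is a hypothetical example, not the code at common/utils.c:404; it only shows intentional fallthrough marked with the comment form.

```c
/* Hypothetical suffix parser: each larger unit adds another 10 bits of shift
 * by deliberately falling through into the smaller units. */
int suffix_shift(char suffix)
{
	int shift = 0;

	switch (suffix) {
	case 'g':
		shift += 10;
		/* fallthrough */
	case 'm':
		shift += 10;
		/* fallthrough */
	case 'k':
		shift += 10;
		break;
	default:
		return -1;    /* unknown suffix */
	}
	return shift;
}
```

A case that falls through without the comment is what -Wimplicit-fallthrough is meant to flag.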
Qu Wenruo e33a73b754 btrfs-progs: Refactor btrfs_read_block_groups()
This patch does the following refactor:
- Refactor parameter from @root to @fs_info

- Refactor the large loop body into another function
  Now we have a helper function, read_one_block_group(), to handle
  block group cache and space info related routine.

- Refactor the return value
  Even though we have code handling ret > 0 from find_first_block_group(),
  it never triggers: when there are no more block groups,
  find_first_block_group() just returns -ENOENT rather than 1.

  This is super confusing; it's almost a miracle it even works.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
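The return-value handling discussed in e33a73b754 can be sketched with a toy iterator. Both functions are hypothetical; the point is only that the lookup signals "no more entries" with -ENOENT, so the loop has to treat that value as normal termination instead of waiting for a positive return.

```c
#include <errno.h>

/* Hypothetical lookup: returns 0 and advances *cursor while entries remain,
 * and -ENOENT once they are exhausted. It never returns a positive value. */
int find_next_sketch(int nr_entries, int *cursor)
{
	if (*cursor >= nr_entries)
		return -ENOENT;
	(*cursor)++;
	return 0;
}

/* Returns the number of entries read, or a negative errno on a real error. */
int read_all_sketch(int nr_entries)
{
	int cursor = 0;
	int count = 0;
	int ret;

	while (1) {
		ret = find_next_sketch(nr_entries, &cursor);
		if (ret == -ENOENT)
			break;        /* end of iteration, not a failure */
		if (ret < 0)
			return ret;   /* any other negative value is an error */
		count++;
	}
	return count;
}
```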
Qu Wenruo e46281d6fb btrfs-progs: Refactor excluded extent functions to use fs_info
The following functions are just using @root to reach fs_info:
- exclude_super_stripes
- free_excluded_extents
- add_excluded_extent

Refactor them to use fs_info directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:08 +01:00
Qu Wenruo e56cdab5f1 btrfs-progs: test: tests: Add test image for invalid inode generation repair
The image contains one inode item with invalid generation.  The image
can be crafted by "btrfs-corrupt-block -i 257 -f generation".  It should
emulate the bad inode generation caused by older kernel around 2014.

The image is repairable for both original and lowmem mode.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
Qu Wenruo b9ea7c1b23 btrfs-progs: check/original: Add check and repair for invalid inode generation
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.

All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.

So add such check and repair ability to lowmem mode check first.

This involves:

- Calculate the inode generation upper limit
  Unlike the lowmem mode context, we don't have any way to determine if
  this inode belongs to the log tree.
  So we use super_generation + 1 as upper limit, just like what we did
  in kernel tree checker.

- Check if the inode generation is larger than the upper limit

- Repair by resetting inode generation to current transaction
  generation
  The difference is, in original mode, we have a common trans handle for
  all repair and reset path for each repair.

Reported-by: Charles Wright <charles.v.wright@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
Qu Wenruo 0a6e40bdc1 btrfs-progs: check/lowmem: Add check and repair for invalid inode generation
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.

All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.

So add such check and repair ability to lowmem mode check first.

This involves:

- Calculate the inode generation upper limit
  If it's an inode from log tree, then the upper limit is
  super_generation + 1, otherwise it's super_generation.

- Check if the inode generation is larger than the upper limit

- Repair by resetting inode generation to current transaction
  generation

Reported-by: Charles Wright <charles.v.wright@gmail.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
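The upper-limit rule from 0a6e40bdc1 is simple enough to state as code. The function names are hypothetical; the rule itself (super_generation + 1 for log tree inodes, super_generation otherwise) is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest generation an inode may legitimately carry. */
uint64_t inode_gen_upper_limit(uint64_t super_generation, bool in_log_tree)
{
	return in_log_tree ? super_generation + 1 : super_generation;
}

/* True if the inode generation exceeds the limit and needs repair. */
bool inode_gen_is_invalid(uint64_t inode_gen, uint64_t super_generation,
			  bool in_log_tree)
{
	return inode_gen > inode_gen_upper_limit(super_generation, in_log_tree);
}
```

Original mode, which cannot tell whether an inode belongs to the log tree, would correspond to always passing true for in_log_tree.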
Qu Wenruo 680b5c171f btrfs-progs: tests: Add new images for inode mode repair functionality
Add new test image for imode repair in subvolume trees.

The new test cases include the following:

- Regular file with bad imode
  It still has the valid INODE_REF and parent dir has correct DIR_INDEX
  and DIR_ITEM.
  In this case, no matter if the file is empty or not, it should be
  repaired using the info from DIR_INDEX of parent dir.

- Non-empty regular file with bad imode, and without INODE_REF
  The file should be mostly an orphan, so no INODE_REF for imode lookup.
  But it has EXTENT_DATA which should be enough for imode repair.
  The repair also involves moving the orphan to lost+found dir.

- Non-empty dir with bad imode, and without INODE_REF
  Pretty much the same case, but now a directory.
  The repair also involves moving the orphan to lost+found dir.

Also rename the existing test case 039-bad-free-space-cache-inode-mode
to 039-bad-inode-mode, since now we can fix all bad imode.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
Qu Wenruo eae0c8e32f btrfs-progs: check/original: Fix inode mode in subvolume trees
To make original mode repair imode errors in subvolume trees, this
patch will:

- Remove the show-stopper checks for root->objectid.
  Now repair_imode_original() will accept inodes in subvolume trees.

- Export detect_imode() for original mode
  Due to the call requirement, original mode must use an existing trans
  handler to do the repair, thus we need to re-implement most of the
  work done in repair_imode_common().

- Make repair_imode_original() use detect_imode().

- Free the path after reset_imode()
  reset_imode() keeps the path, as lowmem mode uses path to locate its
  current check position.
  But for original mode, the unreleased path can cause later repair to
  report warning, so we need to manually release the path.

- Update rec->imode after imode reset
  So later repair depending on rec->imode can get correct value.

- Move the repair before repair_inode_nlinks()
  repair_inode_nlinks() needs correct imode to add DIR_INDEX/DIR_ITEM.
  So moving the repair before repair_inode_nlinks() makes the latter
  repair happier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
Qu Wenruo 69eac9de0c btrfs-progs: check/lowmem: Repair bad imode early
For lowmem mode, if we hit a bad inode mode, normally it is reported
when we check the DIR_INDEX/DIR_ITEM of the parent inode.

If we didn't repair at that time, the error will be recorded even if we
fixed it later.

So this patch will check for INODE_ITEM_MISMATCH error type, and if it's
really caused by invalid imode, repair it and clear the error.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
Qu Wenruo 7f8383b7a6 btrfs-progs: check: Make repair_imode_common() handle inodes in subvolume trees
[[PROBLEM]]
Before this patch, repair_imode_common() can only handle two types of
inodes:

- Free space cache inodes
- ROOT DIR inodes

For inodes in subvolume trees, the core complexity is how to determine
the correct imode, thus it was not implemented.

However there are more reports of incorrect imode in subvolume trees, we
need to support such fix.

[[ENHANCEMENT]]
So this patch adds a new function, detect_imode(), to detect imode for
inodes in subvolume trees.  The policy here is to try our best to find a
valid imode to recover.  If no convincing info can be found, fail out.

That function will determine imode by:

1) Search for INODE_REF of the inode
   If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
   As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
   the BTRFS_FT_* to imode, then call it a day.
   This should be the most accurate way.

2) Search for DIR_INDEX/DIR_ITEM belonging to this inode
   If the above search fails, we fall back to locating the DIR_INDEX/DIR_ITEM
   just after the INODE_ITEM.
   Thus this only works for non-empty directory.
   If any can be found, it's definitely a directory.

3) Search for EXTENT_DATA belonging to this inode
   If EXTENT_DATA can be found, it's either REG or LNK.
   Thus this only works for non-empty file or soft link.
   For this case, we default to REG, as user can inspect the file to
   determine if it's a file or just a path.

4) Use rdev to detect BLK/CHR
   If all of the above fail, but INODE_ITEM has non-zero rdev, then it's
   either a BLK or CHR file. Then we default to BLK.

5) Fail out if none of the above methods succeeded
   No educated guess to make things worse.

[[SHORTCOMING]]
The above search is not perfect, there are cases where we can't repair:
E.g. orphan empty regular inode.  Since it's already orphan, it has no
INODE_REF. And it's regular empty file, it has no DIR_INDEX nor
EXTENT_DATA nor rdev. Thus we can't recover.  Although for this case, it
really doesn't matter as it's already orphan and will be deleted anyway.

Furthermore, due to the DIR_ITEM/DIR_INDEX/INODE_REF repair code which
can happen before imode repair, it's possible that DIR_ITEM search code
may not be executed.  If there is only DIR_ITEM remaining, repair code
will remove the DIR_ITEM completely and move the inode to lost+found,
leaving us no info to rebuild imode.  If there is DIR_INDEX missing,
repair code will re-insert the DIR_INDEX, then imode repair code will go
DIR_INDEX directly.

But overall, the repair code should handle the invalid imode caused by
older kernels without problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
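The five-step policy listed in 7f8383b7a6 can be condensed into a fallback chain. This is a sketch under stated assumptions: the evidence struct is hypothetical and stands in for the tree searches, the SK_* constants mirror the usual S_IF* file type bits, and the permission bits the real code adds are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_IFDIR 0040000u
#define SK_IFREG 0100000u
#define SK_IFBLK 0060000u

/* Hypothetical summary of what the tree searches found for one inode. */
struct imode_evidence {
	bool have_dir_entry_type;   /* valid DIR_ITEM/DIR_INDEX found via INODE_REF */
	uint32_t dir_entry_imode;   /* imode converted from that entry's BTRFS_FT_* */
	bool have_child_dir_items;  /* DIR_INDEX/DIR_ITEM keyed on this inode */
	bool have_extent_data;      /* EXTENT_DATA keyed on this inode */
	uint64_t rdev;              /* non-zero for device nodes */
};

/* Try the most reliable source first; return -1 rather than guess. */
int detect_imode_sketch(const struct imode_evidence *ev, uint32_t *imode_ret)
{
	if (ev->have_dir_entry_type)
		*imode_ret = ev->dir_entry_imode;   /* step 1 */
	else if (ev->have_child_dir_items)
		*imode_ret = SK_IFDIR;              /* step 2 */
	else if (ev->have_extent_data)
		*imode_ret = SK_IFREG;              /* step 3: REG preferred over LNK */
	else if (ev->rdev != 0)
		*imode_ret = SK_IFBLK;              /* step 4: BLK preferred over CHR */
	else
		return -1;                          /* step 5: no educated guess */
	return 0;
}
```

An empty orphan regular file yields no evidence at all, which is the unrepairable case the commit message calls out.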
Qu Wenruo ac9e07a780 btrfs-progs: check: find imode using info from INODE_REF item
Introduce a function, find_file_type(), to find filetype using info from
INODE_REF, including dir_id from key index/name from inode_ref_item.

This function will:

- Search DIR_INDEX first
  DIR_INDEX is easier since there is only one item in it.

- Validate the DIR_INDEX item
  If the DIR_INDEX is valid, use the filetype and call it a day.

- Search DIR_ITEM then
  It needs extra iteration since it's possible to have hash collision.

- Validate the DIR_ITEM
  If valid, call it a day. Or return -ENOENT;

This would be used as the primary method to determine the imode in later
imode repair code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
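The lookup order described in ac9e07a780 (DIR_INDEX first, then DIR_ITEM, otherwise -ENOENT) might be sketched like this. The struct is a hypothetical stand-in for the results of the two searches and their validation; the hash-collision iteration for DIR_ITEM is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical results of the two directory searches for one INODE_REF. */
struct dir_lookup_sketch {
	bool index_valid;   /* DIR_INDEX found and validated */
	int index_type;     /* file type stored in that DIR_INDEX */
	bool item_valid;    /* DIR_ITEM found and validated */
	int item_type;      /* file type stored in that DIR_ITEM */
};

int find_file_type_sketch(const struct dir_lookup_sketch *l, int *type_ret)
{
	if (l->index_valid) {           /* cheaper: one item per DIR_INDEX key */
		*type_ret = l->index_type;
		return 0;
	}
	if (l->item_valid) {            /* fallback */
		*type_ret = l->item_type;
		return 0;
	}
	return -ENOENT;
}
```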
Qu Wenruo 87207654f1 btrfs-progs: check: Export btrfs_type_to_imode
This function will be later used by common mode code, so export it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:07 +01:00
Qu Wenruo 1796d5099f btrfs-progs: image: Rework how we search chunk tree blocks
Before this patch, we were using a very inefficient way to search
chunks:

We iterate through all clusters to find the chunk root tree block first,
then re-iterate all clusters again to find every child tree block.

Each time we need to iterate all clusters just to find a chunk tree
block.  This is obviously inefficient, especially when chunk tree gets
larger.  So the original author leaves a comment on it:

  /* If you have to ask you aren't worthy */
  static int search_for_chunk_blocks()

This patch will change the behavior so that we will only iterate all
clusters once.

The idea behind the optimization is, since we have the superblock
restored first, we could use the CHUNK_ITEMs in
super_block::sys_chunk_array to build a SYSTEM chunk mapping.

Then, when we start to iterate through all items, we can easily skip
unrelated items at different levels:

- At cluster level
  If a cluster starts beyond the last system chunk map, it must not contain
  any chunk tree blocks (as chunk tree blocks only live inside system
  chunks)

- At item level
  If one item has no intersection with any system chunk map, then it
  must not contain any tree blocks.

By this, we can iterate through all clusters just once, and find out all
CHUNK_ITEMs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:06 +01:00
Qu Wenruo 7abe4c3385 btrfs-progs: image: determine if a tree block is in the range of system chunks
Introduce a new helper function, is_in_sys_chunks(), to determine if an
item is in the range of system chunks.

Since btrfs-image will merge adjacent same type extents into one item,
this function is designed to return true for any bytes in system chunk
range.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 19:21:06 +01:00
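The range test behind is_in_sys_chunks() in 7abe4c3385 is essentially an interval intersection check. The sketch below uses a hypothetical range struct and function name and a plain array of system chunk ranges; the real helper's signature and data structures may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logical range of one system chunk. */
struct range_sketch {
	uint64_t start;
	uint64_t len;
};

/* True if any byte of [start, start + len) falls inside a system chunk.
 * Items may be merged extents, so partial overlap counts as a hit. */
bool intersects_sys_chunks(const struct range_sketch *chunks, size_t nr,
			   uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t chunk_start = chunks[i].start;
		uint64_t chunk_end = chunk_start + chunks[i].len;

		if (start < chunk_end && start + len > chunk_start)
			return true;
	}
	return false;
}
```

Items for which this returns false cannot hold chunk tree blocks and can be skipped without being examined.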