For code reuse, btrfs_insert_dir_item() now calls
inserts_with_overflow() even if the dir_item existed.
Add a parameter @ignore_existed to btrfs_add_link().
If @ignore_existed is nonzero, btrfs_add_link() goes on to add the link
even if the dir_item already exists.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
check_dir_item() now checks the related dir_item/dir_index.
Introduce print_dir_item_err() to print error messages while
checking dir_item/dir_index.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce print_inode_ref() to print error msg while checking inode ref.
Add args @name_ret and @namelen_ret to check_inode_ref().
Name is essential if the inode item is to be put into lost+found
while doing nlinks repair.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The changes in this patch prepare for further repair:
1. Introduce find_dir_index() to get the index by traversing items.
2. We should distinguish dir_index errors from dir_item errors.
However, there are only DIR_ITEM_MISSING and DIR_ITEM_MISMATCH.
Introduce macros DIR_INDEX_MISSING and DIR_INDEX_MISMATCH
to represent a missing or mismatched index.
3. find_dir_item() currently prints a message as soon as it detects any
error.
Remove that output now; later patches will introduce functions
to print the error messages.
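A minimal sketch of how separate error bits let a caller tell index
problems from item problems. The bit values and the two helper names are
illustrative assumptions, not the definitions used in the patch.

```c
/* Hypothetical bit values; the real definitions may differ. */
#define DIR_ITEM_MISSING	(1 << 0)
#define DIR_ITEM_MISMATCH	(1 << 1)
#define DIR_INDEX_MISSING	(1 << 2)
#define DIR_INDEX_MISMATCH	(1 << 3)

/* Return 1 if the error mask reports a DIR_INDEX problem. */
static int has_dir_index_error(int err)
{
	return (err & (DIR_INDEX_MISSING | DIR_INDEX_MISMATCH)) != 0;
}

/* Return 1 if the error mask reports a DIR_ITEM problem. */
static int has_dir_item_error(int err)
{
	return (err & (DIR_ITEM_MISSING | DIR_ITEM_MISMATCH)) != 0;
}
```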
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Modify check_fs_first_inode() to check the inode ref of the first inode.
The root dir inode differs from other inodes in that its inode_ref points
to "..".
So we just handle this special case and treat it as a normal
inode in the rest of the check.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For further lowmem repair, change the type of @index in
find_inode_ref() from u64 to u64 *,
so the caller can get the index of the ref.
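The out-parameter pattern could look roughly like this. The struct and
the function name lookup_ref_index() are hypothetical stand-ins, not the
real find_inode_ref() signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified inode ref: a name and its dir index. */
struct ref_entry {
	const char *name;
	uint64_t index;
};

/*
 * Look up @name. On success return 0 and, if @index_ret is not NULL,
 * store the index of the ref there. Return -1 if no ref matches;
 * @index_ret is left untouched in that case.
 */
static int lookup_ref_index(const struct ref_entry *refs, size_t nr,
			    const char *name, uint64_t *index_ret)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(refs[i].name, name) == 0) {
			if (index_ret)
				*index_ret = refs[i].index;
			return 0;
		}
	}
	return -1;
}
```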
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce repair_inode_orphan_item_lowmem() to add an orphan
item if the inode refs and nlink are both zero.
repair_inode_orphan_item_lowmem() is just a wrapper function
that calls btrfs_add_orphan_item().
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After traversal of whole directory, we should get the actual isize.
Like original mode, function repair_dir_isize_lowmem() sets isize of the
directory inode item to actual size.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After checking one entire inode item, we should get the actual
nbytes of the inode item.
Like original mode, repair_inode_nbytes_lowmem() sets nbytes in
struct btrfs_inode_item to the actual nbytes.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Turn on the option --repair with --mode=lowmem in btrfs check.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[ use warning() and adjust wording ]
Signed-off-by: David Sterba <dsterba@suse.com>
The check opens the given device in exclusive mode by default. In the forced
mode we want to access a device in use, so we have to drop the
exclusivity bit.
This works for block devices but not for files, that could be mounted
via a loop device. In that respect test check/007 is broken and will be
fixed.
Signed-off-by: David Sterba <dsterba@suse.com>
We now have two data structures that can be used to iterate the same data
set, and there may be quite a few of them in memory. Eliminating the
list_head member will reduce memory consumption while iterating over
the extent backrefs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the pathological case, like xfstests generic/297 that creates a
large file consisting of one, repeating reflinked extent, fsck can
take hours. The root cause is that calling find_data_backref while
iterating the extent records is an O(n^2) algorithm. For my
example test run, n was 2*2^20 and fsck was at 8 hours and counting.
This patch supplements the list with an rbtree and drops the runtime
of that testcase to about 20 seconds.
A previous version of this patch introduced a regression that would
have corrupted file systems during repair. It was traced to the
compare algorithm honoring ->bytes regardless of whether the
reference had been found and a failure to reinsert nodes after
the target reference was found.
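To illustrate the complexity difference only: the sketch below compares
a linear scan with a lookup in a sorted array via bsearch(), which
stands in for the rbtree (both give O(log n) lookups). The key layout
and function names are assumptions and much simpler than the real
backref structures.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified key of a data backref. */
struct backref_key {
	uint64_t root;
	uint64_t owner;
	uint64_t offset;
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Total order over keys, usable for both sorting and searching. */
static int backref_cmp(const void *pa, const void *pb)
{
	const struct backref_key *a = pa, *b = pb;
	int ret;

	ret = cmp_u64(a->root, b->root);
	if (ret)
		return ret;
	ret = cmp_u64(a->owner, b->owner);
	if (ret)
		return ret;
	return cmp_u64(a->offset, b->offset);
}

/* O(n) per lookup, so O(n^2) when called once per record. */
static const struct backref_key *find_linear(const struct backref_key *refs,
					     size_t nr,
					     const struct backref_key *key)
{
	for (size_t i = 0; i < nr; i++)
		if (backref_cmp(&refs[i], key) == 0)
			return &refs[i];
	return NULL;
}

/* O(log n) per lookup; @refs must be sorted by backref_cmp(). */
static const struct backref_key *find_sorted(const struct backref_key *refs,
					     size_t nr,
					     const struct backref_key *key)
{
	return bsearch(key, refs, nr, sizeof(*refs), backref_cmp);
}
```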
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sometimes it's needed to do a check on a mounted filesystem. This should
work fine on a quiescent filesystem or a read-only mount. Changes on the
block device done by kernel might confuse the userspace checker and it
might crash when it reads some stale data.
Repair without mount checks is not supported right now.
Signed-off-by: David Sterba <dsterba@suse.cz>
The pointers to critical roots must be valid before we start using them,
e.g. in the space clearing code.
Signed-off-by: David Sterba <dsterba@suse.com>
Code added in 2009 (95d3f20b51) for a very short-lived change in
the format is of no concern to us nowadays.
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs_update_block_group fails when the block group is not found in
cache, we can exit btrfs_free_block_group, not much to rollback. The
caller will also exit in turn.
Signed-off-by: David Sterba <dsterba@suse.com>
Tree blocks are always nodesize. As readahead is only an optimization,
exact size is not required and is only advisory.
Signed-off-by: David Sterba <dsterba@suse.com>
I found that some btrfs command options are not working because of
an inappropriate getopt_long() setting.
This fixes "btrfs check -Q/-E"
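One way this class of bug can arise, shown as a sketch: a short option
is only recognized by getopt_long() if it is listed in the optstring.
The helper count_known_options() is hypothetical, and whether this was
the exact mistake in the original code is an assumption.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Parse @argv with the given @optstring. Return the number of
 * recognized options, or -1 as soon as an unknown option is seen.
 */
static int count_known_options(int argc, char **argv, const char *optstring)
{
	static const struct option long_options[] = {
		{ "qgroup-report", no_argument, NULL, 'Q' },
		{ NULL, 0, NULL, 0 }
	};
	int c;
	int known = 0;

	optind = 1;	/* restart parsing for each call */
	opterr = 0;	/* no diagnostics on stderr */
	while ((c = getopt_long(argc, argv, optstring, long_options,
				NULL)) != -1) {
		if (c == '?')
			return -1;
		known++;
	}
	return known;
}
```

With "-Q" on the command line, an optstring containing 'Q' accepts the
option, while one that omits it makes getopt_long() return '?'.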
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although lowmem mode can detect name and hash mismatch in dir_item,
it's done by checking inode_ref to expose such problem.
This patch will enhance dir_item check, by also comparing name and
hash when checking dir_items.
Reported-by: Filippe LeMarchand <gasinvein@gmail.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In original mode, we don't check if the name in dir_item matches the
hash in key.offset.
In the following case, original mode will report nothing wrong while
lowmem mode will detect the name and hash mismatch.
------
item 72 key (79177 DIR_ITEM 54846528) itemoff 12380 itemsize 88
location key (4222342 INODE_ITEM 0) type FILE
transid 170929 data_len 0 name_len 14
name: deprecated.sxt
location key (13590433 INODE_ITEM 0) type FILE
transid 796448 data_len 0 name_len 14
name: deprecated.txt
------
In above case, hash of "deprecated.txt" matches with 54846528,
while hash of "deprecated.sxt" should be 2008317993.
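A sketch of the comparison, assuming the name hash is a raw CRC-32C of
the name seeded with (u32)~1, as btrfs_name_hash() is believed to do.
The helper names are made up for this example, and the hash values
quoted above have not been re-verified against this code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (reflected polynomial 0x82F63B78) without implicit
 * pre- or post-inversion; the caller supplies the seed.
 */
static uint32_t crc32c_raw(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assumed equivalent of btrfs_name_hash(): seed is (u32)~1. */
static uint64_t name_hash(const char *name, size_t len)
{
	return crc32c_raw((uint32_t)~1, name, len);
}

/* Return 1 if @name hashes to the DIR_ITEM key offset, 0 otherwise. */
static int name_matches_key(uint64_t key_offset, const char *name,
			    size_t len)
{
	return name_hash(name, len) == key_offset;
}
```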
Reported-by: Filippe LeMarchand <gasinvein@gmail.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the 1st parameter the same as kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When checking chunk or dev extent, lowmem mode uses chunk length as dev
extent length, and if they mismatch, report missing chunk or dev extent
like:
------
ERROR: chunk[256 4324327424) stripe 0 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 1 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 2 did not find the related dev extent
------
However, only for Single/DUP/RAID1 profiles chunk length is the same as
dev extent length.
For other profiles, this will cause tons of false alerts.
Fix it by using correct stripe length when checking chunk and dev extent
items.
This fixes the mkfs test failure when using lowmem mode check.
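The per-profile relation between chunk length and dev extent length can
be sketched as follows. The enum and the function name are hypothetical
(btrfs itself uses BTRFS_BLOCK_GROUP_* flag bits), and RAID10 is assumed
to keep two copies.

```c
#include <stdint.h>

/* Hypothetical profile enum, for illustration only. */
enum profile {
	PROFILE_SINGLE,
	PROFILE_DUP,
	PROFILE_RAID1,
	PROFILE_RAID0,
	PROFILE_RAID10,
	PROFILE_RAID5,
	PROFILE_RAID6,
};

/* Length of each dev extent backing a chunk of @chunk_len bytes. */
static uint64_t dev_extent_length(enum profile p, uint64_t chunk_len,
				  int num_stripes)
{
	switch (p) {
	case PROFILE_RAID0:
		return chunk_len / num_stripes;
	case PROFILE_RAID10:
		/* two copies of the data, striped over the devices */
		return chunk_len * 2 / num_stripes;
	case PROFILE_RAID5:
		/* one stripe holds parity */
		return chunk_len / (num_stripes - 1);
	case PROFILE_RAID6:
		/* two stripes hold parity */
		return chunk_len / (num_stripes - 2);
	default:
		/* Single, DUP, RAID1: same length as the chunk */
		return chunk_len;
	}
}
```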
Reported-by: Kai Krakow <hurikhan77@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, btrfs check lowmem mode manually checks the found chunk
item, even though we already have the generic chunk validation checker,
btrfs_check_chunk_valid().
This patch will use btrfs_check_chunk_valid() to replace open-coded
chunk validation checker in check_chunk_item().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The only reason read_tree_block() needs a btrfs_root parameter is to get
its node/sector size.
And long ago, I already introduced a compatible interface,
read_tree_block_fs_info(), to pass btrfs_fs_info instead of btrfs_root.
Since we have cleaned up all root->sector/node/stripesize users, we
should be OK to refactor read_tree_block() function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
As Qu mentioned in this thread
(https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
can cause a regular extent to co-exist with an inlined extent. This
coexistence makes things confusing. Since it is currently allowed and
can appear in a filesystem, fix btrfsck so that it does not emit a bunch
of error reports that would make the user uneasy.
When checking a file extent, record the extent_end of the regular extent
to check if there is a gap between the regular extents. Normally there
is only one inlined extent, so the extent_end of inlined extent is
useless. However, if a regular extent can co-exist with an inlined
extent, the extent_end of the inlined extent also needs to be recorded.
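A simplified model of the gap check. The struct, the function
count_gaps() and the @track_inline_end switch are assumptions made for
this sketch; they only show why the end of an inlined extent has to be
recorded once a regular extent may follow it.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of a file extent item. */
struct file_extent {
	uint64_t offset;	/* file offset where the extent starts */
	uint64_t len;		/* bytes of file data it covers */
	int is_inline;		/* 1 for an inlined extent */
};

/*
 * Count gaps between consecutive extents sorted by offset.
 * If @track_inline_end is 0, the end of an inlined extent is not
 * recorded, so a regular extent following it looks like a gap.
 */
static int count_gaps(const struct file_extent *ext, size_t nr,
		      int track_inline_end)
{
	uint64_t end = 0;
	int gaps = 0;

	for (size_t i = 0; i < nr; i++) {
		if (ext[i].offset > end)
			gaps++;
		if (!ext[i].is_inline || track_inline_end)
			end = ext[i].offset + ext[i].len;
	}
	return gaps;
}
```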
Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since the incompat feature NO_HOLES still allows us to have an explicit
hole file extent, current check is too strict and will cause false
alerts like:
root 5 EXTENT_DATA[257, 0] shouldn't be hole
Fix it by removing the strict file hole extent check.
Link: https://www.spinics.net/lists/linux-btrfs/msg66374.html
Reported-by: Henk Slager <eye1tm@gmail.com>
Tested-by: Henk Slager <eye1tm@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When reading out the name from dir_item, a corrupted name_len can lead
to a read beyond the boundary of the item or even the extent buffer.
This happens when checking fuzzed image /tmp/bko-161811.raw, for both
lowmem mode and original mode.
Below is the example from lowmem mode.
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_INDEX[256 216172782113783808] namelen 255 filename bar filetype 0
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_ITEM[256 1306590535] namelen 255 filename bar filetype 0
WARNING: root 5 INODE[256] mode 0 shouldn't have DIR_INDEX[256 1167283096]
WARNING: root 5 DIR_ITEM[256 1167283096] name too long
==13013== Invalid read of size 1
==13013== at 0x4C31A38: memmove (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13013== by 0x431518: read_extent_buffer (extent_io.c:863)
==13013== by 0x4752AB: check_dir_item (cmds-check.c:4627)
==13013== by 0x475E5C: check_inode_item (cmds-check.c:4911)
==13013== by 0x476200: check_fs_first_inode (cmds-check.c:5011)
==13013== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13013== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13013== by 0x488B5B: cmd_check (cmds-check.c:13033)
==13013== by 0x40A8C5: main (btrfs.c:246)
==13013== Address 0x5c95b80 is 0 bytes after a block of size 4,224 alloc'd
==13013== at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13013== by 0x4307E0: __alloc_extent_buffer (extent_io.c:538)
==13013== by 0x430C37: alloc_extent_buffer (extent_io.c:642)
==13013== by 0x413DFE: btrfs_find_create_tree_block (disk-io.c:193)
==13013== by 0x414370: read_tree_block_fs_info (disk-io.c:340)
==13013== by 0x40B5D5: read_tree_block (disk-io.h:125)
==13013== by 0x40CFD2: read_node_slot (ctree.c:652)
==13013== by 0x40E5EB: btrfs_search_slot (ctree.c:1172)
==13013== by 0x4761A8: check_fs_first_inode (cmds-check.c:5001)
==13013== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13013== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13013== by 0x488B5B: cmd_check (cmds-check.c:13033)
Fix it by double checking dir_item, name_len against item boundary
before trying to read out name from extent buffer, for both original
mode and lowmem mode.
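The boundary check amounts to something like the sketch below. The
function name and the flat offset parameters are assumptions; the real
code works on extent buffers and item pointers.

```c
#include <stdint.h>

/*
 * Return 1 if a name of @name_len bytes starting at @name_off lies
 * completely inside the item [@item_off, @item_off + @item_size),
 * 0 otherwise. 64-bit sums avoid overflow of the 32-bit inputs.
 */
static int name_fits_in_item(uint32_t item_off, uint32_t item_size,
			     uint32_t name_off, uint32_t name_len)
{
	uint64_t item_end = (uint64_t)item_off + item_size;

	if (name_off < item_off)
		return 0;
	return (uint64_t)name_off + name_len <= item_end;
}
```

A caller would skip reading the name, and report the item as corrupted,
whenever the check returns 0.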
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When reading out the name from inode_ref, a corrupted name_len can lead
to a read beyond the boundary of the item or even the extent buffer.
This happens when checking fuzzed image /tmp/bko-161811.raw, for both
lowmem mode and original mode.
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_INDEX[256 504403158265495680] namelen 0 filename filetype 0
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_ITEM[256 4294967294] namelen 0 filename filetype 0
WARNING: root 5 INODE_REF[256 256] name too long
==13022== Invalid read of size 8
==13022== at 0x4C319BE: memmove (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13022== by 0x431518: read_extent_buffer (extent_io.c:863)
==13022== by 0x474730: check_inode_ref (cmds-check.c:4307)
==13022== by 0x475D65: check_inode_item (cmds-check.c:4890)
==13022== by 0x476200: check_fs_first_inode (cmds-check.c:5011)
==13022== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13022== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13022== by 0x488B5B: cmd_check (cmds-check.c:13033)
==13022== by 0x40A8C5: main (btrfs.c:246)
==13022== Address 0x5c96780 is 0 bytes after a block of size 4,224 alloc'd
==13022== at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13022== by 0x4307E0: __alloc_extent_buffer (extent_io.c:538)
==13022== by 0x430C37: alloc_extent_buffer (extent_io.c:642)
==13022== by 0x413DFE: btrfs_find_create_tree_block (disk-io.c:193)
==13022== by 0x414370: read_tree_block_fs_info (disk-io.c:340)
==13022== by 0x40B5D5: read_tree_block (disk-io.h:125)
==13022== by 0x40CFD2: read_node_slot (ctree.c:652)
==13022== by 0x40E5EB: btrfs_search_slot (ctree.c:1172)
==13022== by 0x4761A8: check_fs_first_inode (cmds-check.c:5001)
==13022== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13022== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13022== by 0x488B5B: cmd_check (cmds-check.c:13033)
Fix it by double checking inode_ref, name_len against item boundary
before trying to read out name from extent buffer, for both original
mode and lowmem mode.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
fsck/004-no-dir-index makes valgrind complain about invalid reads.
==31890== Invalid read of size 1
==31890== at 0x453D09: repair_inode_backrefs (cmds-check.c:2690)
==31890== by 0x453D09: check_inode_recs (cmds-check.c:3330)
==31890== by 0x453D09: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890== at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890== by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890== by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Block was alloc'd at
==31890== at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890== by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890== by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890== by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890== by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890== by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890==
==31890== Invalid read of size 8
==31890== at 0x452D66: repair_inode_backrefs (cmds-check.c:2731)
==31890== by 0x452D66: check_inode_recs (cmds-check.c:3330)
==31890== by 0x452D66: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890== at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890== by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890== by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Block was alloc'd at
==31890== at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890== by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890== by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890== by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890== by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890== by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890==
While iterating over backrefs in repair_inode_backrefs, there are
several cases for repairing a backref, depending on
backref->found_dir_item and backref->found_dir_index. Two of these
branches may free the backref, but the following checks still access the
freed memory.
Because these branches are independent, fix it by making
repair_inode_backrefs skip to the next backref after the free.
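The shape of the fix, reduced to a sketch: once a branch frees the
current entry, continue with the next one instead of falling through to
further checks. The struct, the conditions and process_backrefs() are
hypothetical and far simpler than repair_inode_backrefs().

```c
#include <stdlib.h>

/* Hypothetical, minimal backref record on a singly linked list. */
struct backref {
	struct backref *next;
	int found_dir_item;
	int found_dir_index;
	int found_inode_ref;
};

/*
 * Drop entries that have neither a dir item nor an inode ref and
 * return how many remaining entries still lack a dir index.
 */
static int process_backrefs(struct backref **head)
{
	struct backref **link = head;
	int need_index = 0;

	while (*link) {
		struct backref *b = *link;

		if (!b->found_dir_item && !b->found_inode_ref) {
			*link = b->next;
			free(b);
			/* b is gone: skip the checks below */
			continue;
		}
		if (b->found_dir_item && !b->found_dir_index)
			need_index++;
		link = &b->next;
	}
	return need_index;
}
```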
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we memset tmpl, max_size==0. This does not seem consistent with nr = 1.
In check_extent_refs, we will call:
set_extent_dirty(root->fs_info->excluded_extents,
rec->start,
rec->start + rec->max_size - 1);
This ends up with BUG_ON(end < start) in insert_state.
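The arithmetic behind the failure, as a sketch. range_is_valid() is a
hypothetical helper that mirrors the end >= start condition for a
nonzero start.

```c
#include <stdint.h>

/*
 * Compute the inclusive end of [start, start + max_size) the way the
 * call above does and report whether end >= start holds.
 * For start > 0 and max_size == 0 the end is start - 1, so the
 * condition fails.
 */
static int range_is_valid(uint64_t start, uint64_t max_size)
{
	uint64_t end = start + max_size - 1;

	return end >= start;
}
```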
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When this happens, we will trip a BUG_ON(end < start) in insert_state
because in check_extent_refs, we use this max_size expecting it's not zero:
set_extent_dirty(root->fs_info->excluded_extents,
rec->start,
rec->start + rec->max_size - 1);
See https://bugzilla.redhat.com/show_bug.cgi?id=1435567
for an example where this scenario occurs.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
See https://bugzilla.redhat.com/show_bug.cgi?id=1435567 for an example
where the message occurs.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
[ un-indent strings overflowing 80 cols ]
Signed-off-by: David Sterba <dsterba@suse.com>