[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gluster-devel] important change to syncop infra
From: |
Xavier Hernandez |
Subject: |
Re: [Gluster-devel] important change to syncop infra |
Date: |
Wed, 11 Dec 2013 13:26:53 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1 |
I missed the const attribute. Thanks for the explanation. This should
eliminate the THIS problem, as you proved, however other cases are still
possible that won't work correctly.
Basically, any TLS pointer passed to user code instead of being
dereferenced will have problems. In this case, uuid and lkowner have
this problem. Consider the following code:
@@ -1129,6 +1129,7 @@ gf_defrag_migrate_data (xlator_t *this,
gf_defrag_info_t *defrag, loc_t *loc,
struct timeval start = {0,};
int32_t err = 0;
int loglevel = GF_LOG_TRACE;
+ char * str_uuid = NULL;
gf_log (this->name, GF_LOG_INFO, "migrate data called on %s",
loc->path);
@@ -1275,8 +1276,10 @@ gf_defrag_migrate_data (xlator_t *this,
gf_defrag_info_t *defrag, loc_t *loc,
* and also that guarantees that file has to
be mostly
* migrated */
+ str_uuid = uuid_utoa(entry_loc.gfid);
ret = syncop_getxattr (this, &entry_loc, &dict,
GF_XATTR_LINKINFO_KEY);
+ gf_log(this->name, GF_LOG_DEBUG, "uuid in tls:
%s", str_uuid);
if (ret < 0) {
if (errno != ENODATA) {
loglevel = GF_LOG_ERROR;
It seems obvious, but this is the generated assembly code anyway:
1279
2e8f: 48 8b 7c 24 60 mov 0x60(%rsp),%rdi
2e94: e8 00 00 00 00 callq 2e99
<gf_defrag_migrate_data+0x719> <<<=== get pointer to TLS object
2e95: R_X86_64_PLT32 uuid_utoa-0x4
1280
2e99: 4c 8b 74 24 20 mov 0x20(%rsp),%r14
2e9e: 48 8b 54 24 70 mov 0x70(%rsp),%rdx
2ea3: 48 8d 0d 00 00 00 00 lea 0x0(%rip),%rcx #
2eaa <gf_defrag_migrate_data+0x72a>
2ea6: R_X86_64_PC32 .LC82-0x4
2eaa: 48 8b 74 24 30 mov 0x30(%rsp),%rsi
1279
2eaf: 49 89 c7 mov
%rax,%r15 <<<=== store the TLS pointer
1280
2eb2: 4c 89 f7 mov %r14,%rdi
2eb5: e8 00 00 00 00 callq 2eba
<gf_defrag_migrate_data+0x73a> <<<=== possible thread switch
2eb6: R_X86_64_PLT32 syncop_getxattr-0x4
1282
2eba: 4c 89 3c 24 mov
%r15,(%rsp) <<<=== use the TLS pointer
2ebe: 49 8b 3e mov (%r14),%rdi
2ec1: 4c 8d 0d 00 00 00 00 lea 0x0(%rip),%r9 # 2ec8
<gf_defrag_migrate_data+0x748>
2ec4: R_X86_64_PC32 .LC83-0x4
2ec8: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx #
2ecf <gf_defrag_migrate_data+0x74f>
2ecb: R_X86_64_PC32 .rodata+0xbc
2ecf: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi #
2ed6 <gf_defrag_migrate_data+0x756>
2ed2: R_X86_64_PC32 .LC1-0x4
1280
2ed6: 89 c5 mov %eax,%ebp
1282
2ed8: 41 b8 08 00 00 00 mov $0x8,%r8d
2ede: 31 c0 xor %eax,%eax
2ee0: b9 02 05 00 00 mov $0x502,%ecx
2ee5: e8 00 00 00 00 callq 2eea
<gf_defrag_migrate_data+0x76a>
2ee6: R_X86_64_PLT32 _gf_log-0x4
The str_uuid pointer used can point to the uuid of another thread that
could have been modified. I don't know very well the internals of
syncops and synctask, but I think it's even possible that another task
running on the same thread while the current one was sleeping could
modify the contents of str_uuid.
This case seems very obvious and useless but maybe other combinations
are harder to detect.
I think that currently they are only used by uuid_utoa() and
lkowner_utoa() to write log messages, which seems safe, but we will have
to be careful when using syncops to avoid possible future problems like
this.
Xavi
El 11/12/13 12:02, Pranith Kumar Karampuri ha escrit:
it's really a very interesting and hard to find bug, but I'm wondering
how we can prevent this to happen in the general case. There might be
other operations based on pointers to thread local storage that will
suffer this problem. Probably 'errno' is one of the most dangerous, but
TLS is also used to resolve 'THIS' in a very similar way to errno, and
there are temporary uuid and lkowner values also stored into TLS. More
things could appear in the future. These are also potential cases to
have problems with syncops.
An access to THIS before and after a syncop might trigger this bug, right ?
for errno it happens because it has the attribute __attribute__((const)).
# grep errno_location /usr/include/bits/errno.h
extern int *__errno_location (void) __THROW __attribute__ ((__const__));
# define errno (*__errno_location ())
(The following article explains it best IMO: http://lwn.net/Articles/285332)
But it is not there for lkowner, uuid-buf, THIS. Just to confirm, I added the
following code
------------------------------------------------------
+ xlator_t *tmp_this = NULL;
gf_log (this->name, GF_LOG_INFO, "migrate data called on %s",
loc->path);
@@ -1258,8 +1259,10 @@ gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t
*defrag, loc_t *loc,
* and also that guarantees that file has to be mostly
* migrated */
+ tmp_this = THIS;
ret = syncop_getxattr (this, &entry_loc, &dict,
GF_XATTR_LINKINFO_KEY);
+ tmp_this = THIS;
if (ret < 0) {
if (errno != ENODATA) {
loglevel = GF_LOG_ERROR;
@@ -1267,7 +1270,7 @@ gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t
*defrag, loc_t *loc,
} else {
loglevel = GF_LOG_TRACE;
}
- gf_log (this->name, loglevel, "%s: failed to "
+ gf_log (tmp_this->name, loglevel, "%s: failed to
"
-----------------------------------------------------
assembly code it generated:
=====================================================
1262 tmp_this = THIS;
0x00007faf17120dbc <+1788>: xor %eax,%eax
0x00007faf17120dbe <+1790>: callq 0x7faf171191d0 <address@hidden>
<<----------------first call to THIS
1263 ret = syncop_getxattr (this, &entry_loc, &dict,
0x00007faf17120dc3 <+1795>: mov 0x60(%rsp),%rdx
0x00007faf17120dc8 <+1800>: mov 0x28(%rsp),%rsi
0x00007faf17120dcd <+1805>: lea 0x2b550(%rip),%rcx #
0x7faf1714c324
0x00007faf17120dd4 <+1812>: mov 0x18(%rsp),%rdi
0x00007faf17120dd9 <+1817>: callq 0x7faf17118a30 <address@hidden>
0x00007faf17120dde <+1822>: mov %eax,%ebp
1264 GF_XATTR_LINKINFO_KEY);
---Type <return> to continue, or q <return> to quit---
1265 tmp_this = THIS;
0x00007faf17120de0 <+1824>: xor %eax,%eax
0x00007faf17120de2 <+1826>: callq 0x7faf171191d0 <address@hidden>
<<----------------second call to THIS
0x00007faf17120de9 <+1833>: mov (%rax),%r15
1266 if (ret < 0) {
0x00007faf17120de7 <+1831>: test %ebp,%ebp
0x00007faf17120dec <+1836>: js 0x7faf1712100e
<gf_defrag_migrate_data+2382>
1267 if (errno != ENODATA) {
0x00007faf1712100e <+2382>: mov 0x50(%rsp),%rax
<<<---------------------------------------------check that errno does not callq
__errno_location
0x00007faf17121013 <+2387>: mov $0x9,%ebp
0x00007faf17121018 <+2392>: mov (%rax),%edi
0x00007faf1712101a <+2394>: cmp $0x3d,%edi
================================================
I think it's very difficult to track all these cases and handle them
correctly. Another solution could be to tell the compiler to forget
previous pointer values when a syncop is called, but I don't see any way
to do so. Since it's storing pointers to values, any barrier or volatile
seems useless.
We can do this by putting __attribute__((noinline)) for such functions. But
since errno is not in our control we can't do that.
Xavi
El 11/12/13 09:51, Pranith Kumar Karampuri ha escrit:
hi,
We found a day-1 bug when syncop_xxx() infra is used inside a synctask
with compilation optimization (CFLAGS -O2). This bug has been dormant
for at least 2 years.
There are around ~400(rebalance, replace-brick, bd, self-heal-daemon,
quota, fuse lock/fd migration) places where syncop is used in the code
base all of which are potential candidates which can take the hit.
I sent first round of patch at http://review.gluster.com/6475 to catch
regressions upstream.
These are the files that are affected by the changes I introduced to fix
this:
api/src/glfs-fops.c | 36
++++++++++++++++++++++++++++++++++
api/src/glfs-handleops.c | 15 ++++++++++++++
api/src/glfs-internal.h | 7 +++++++
api/src/glfs-resolve.c | 10 ++++++++++
libglusterfs/src/syncop.c | 117
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------
xlators/cluster/afr/src/afr-self-heald.c | 45
+++++++++++++++++++++++++++++-------------
xlators/cluster/afr/src/pump.c | 12 ++++++++++--
xlators/cluster/dht/src/dht-helper.c | 24
+++++++++++++++--------
xlators/cluster/dht/src/dht-rebalance.c | 168
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
xlators/cluster/dht/src/dht-selfheal.c | 6 ++++--
xlators/features/locks/src/posix.c | 3 ++-
xlators/features/qemu-block/src/bdrv-xlator.c | 15 ++++----------
xlators/features/qemu-block/src/qb-coroutines.c | 14 ++++++++++----
xlators/mount/fuse/src/fuse-bridge.c | 16 ++++++++++-----
Please review your respective component for these changes in gerrit.
Thanks
Pranith.
Detailed explanation of the Root cause:
We found the bug in 'gf_defrag_migrate_data' in rebalance operation:
Lets look at interesting parts of the function:
int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t
*loc,
dict_t *migrate_data)
{
.....
code section - [ Loop ]
while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
&entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno
by a thread)
/* Need to keep track of ENOENT errno, that means, there
is no
need to send more readdirp() */
readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
ret = syncop_getxattr (this, &entry_loc, &dict,
GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2] (checking for failures of syncop_getxattr().
This may not always be executed in same thread which executed [SYNCOP-1])
if (ret < 0) {
if (errno != ENODATA) {
loglevel = GF_LOG_ERROR;
defrag->total_failures += 1;
.....
}
the function above could be executed by thread(t1) till [SYNCOP-1] and code
from [ERRNO-2] can be executed by a different thread(t2) because of the
way syncop-infra schedules the tasks.
when the code is compiled with -O2 optimization this is the assembly code
that is generated:
[ERRNO-1]
1165 readdir_operrno = errno; <<---- errno gets
expanded as *(__errno_location())
0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0
<address@hidden>
0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------
Address returned by __errno_location() is stored in a special location
in stack for later use.
0x00007fd149d48b77 <+519>: mov (%rax),%eax
0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp)
....
[ERRNO-2]
1281 if (errno != ENODATA) {
0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<-----
Because it already stored the address returned by __errno_location(),
it just dereferences the address to get the errno value. BUT THIS CODE
NEED NOT BE EXECUTED BY SAME THREAD!!!
0x00007fd149d492b3 <+2371>: mov $0x9,%ebp
0x00007fd149d492b8 <+2376>: mov (%rax),%edi
0x00007fd149d492ba <+2378>: cmp $0x3d,%edi
The problem is that __errno_location() value of t1 and t2 are different. So
[ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though
t2 is executing [ERRNO-2] code section.
When code is compiled without any optimization for [ERRNO-2]:
1281 if (errno != ENODATA) {
0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300
<address@hidden><<--- As it is calling __errno_location() again
it gets the location from t2 so it works as intended.
0x00007fd58e7a3274 <+2242>: mov (%rax),%eax
0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax
0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1
<gf_defrag_migrate_data+2287>
Fix:
We decided to make syncop_xxx() return (-errno) value as the return value
in case of errors and all the functions which make syncop_xxx() will need
to use (-ret) to figure out the reason for failure in case of syncop_xxx()
failures.
Pranith
_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] important change to syncop infra, Anand Subramanian, 2013/12/12