help-guix

Re: Using GDB to debug Guix-installed software (e.g. virsh)


From: Gábor Boskovits
Subject: Re: Using GDB to debug Guix-installed software (e.g. virsh)
Date: Wed, 18 Jul 2018 08:59:43 +0200

Gábor Boskovits <address@hidden> wrote (on Tue, Jul 17, 2018, 17:18):
Chris Marusich <address@hidden> wrote (on Tue, Jul 17, 2018, 8:36):
Hi,

I sometimes want to debug Guix-installed software using GDB.
Unfortunately, I've only been successful with trivial programs like GNU
Hello.  All of my attempts to debug actual problems have failed because
I can't seem to get GDB to behave.  It's a bit frustrating to bang my
head on stuff like this by myself, so I'm hoping somebody with more
experience can offer some advice.

Let's start with what might be a real bug.  I've noticed that Guix's
virsh command (from the libvirt package) emits a suspicious error when
you try to list devices:

--8<---------------cut here---------------start------------->8---
$ virsh nodedev-list
error: Failed to count node devices
error: this function is not supported by the connection driver: virNodeNumOfDevices
--8<---------------cut here---------------end--------------->8---

Apparently, because this function is "not supported", it is also not
possible to use virt-manager to assign PCI devices to a libvirt domain.
That's what I was trying to do when I stumbled across this issue.

Anyway, this virsh problem occurs even when I invoke the command as
root, so it probably isn't a permissions issue.  I searched the Internet
for errors like this, but I didn't find anything helpful.  Every guide
I've read so far seems to suggest that this invocation should just work.
But it doesn't.  Why?

If you want to try reproducing this issue on GuixSD, make sure you have
a libvirt-service-type service and a virtlog-service-type service in
your operating system configuration declaration:

--8<---------------cut here---------------start------------->8---
(service libvirt-service-type
         (libvirt-configuration
          (unix-sock-group "libvirt")))
(service virtlog-service-type)
--8<---------------cut here---------------end--------------->8---

For good measure, make sure your user is in the "libvirt" group, too:

--8<---------------cut here---------------start------------->8---
(user-account
  (name "marusich")
  (comment "Chris Marusich")
  (group "users")
  (supplementary-groups '("wheel"
                          "netdev"
                          "video"
                          "libvirt"))
  (home-directory "/home/marusich"))
--8<---------------cut here---------------end--------------->8---

Reconfigure and restart if necessary.  Then run virsh:

--8<---------------cut here---------------start------------->8---
$ virsh nodedev-list
error: Failed to count node devices
error: this function is not supported by the connection driver: virNodeNumOfDevices
--8<---------------cut here---------------end--------------->8---

At this point, there are two possibilities: either everything is fine,
and this error is expected, or something is wrong.  If somebody knows
that this is expected, I'd love to hear about it.  However, let's
operate on the assumption that something is wrong.  How might we debug
it?

One way to debug it is to use GDB to investigate precisely why this
failure occurred.  There are probably other ways to debug the issue, but
I want to focus on using GDB because this email is more about the
problems I've had with GDB than the virsh issue.

To begin, I create a directory where I'll do my debugging:

--8<---------------cut here---------------start------------->8---
$ mkdir ~/debug
$ cd ~/debug
--8<---------------cut here---------------end--------------->8---

Let's get the virsh source so we can get GDB to tell us where we are in
the code as we debug it:

--8<---------------cut here---------------start------------->8---
$ tar -xf $(guix build -S libvirt)
--8<---------------cut here---------------end--------------->8---

For me, this unpacks the source to:

    /home/marusich/debug/libvirt-4.3.0

Note that the function virNodeNumOfDevices is defined in

    /home/marusich/debug/libvirt-4.3.0/src/libvirt-nodedev.c

and called on line 254 of

    /home/marusich/debug/libvirt-4.3.0/tools/virsh-nodedev.c

in the virshNodeDeviceListCollect function.

I'd like to debug the code for virNodeNumOfDevices using GDB to see
what's going on.  To do this, I'm going to need the debug symbols, but
the libvirt package doesn't have a debug output.  Let's define a version
of it that does.  I put the following package definition into the file
/home/marusich/debug/my-libvirt.scm:

--8<---------------cut here---------------start------------->8---
(define-module (my-libvirt)
  #:use-module (guix packages)
  #:use-module (gnu packages virtualization))

(define-public my-libvirt
  (package
   (inherit libvirt)
   (name "my-libvirt")
   (outputs '("out" "debug"))))
--8<---------------cut here---------------end--------------->8---

Let's build it and install both outputs into a new profile:

--8<---------------cut here---------------start------------->8---
$ GUIX_PACKAGE_PATH=/home/marusich/debug guix package -p /home/marusich/debug/profile -i my-libvirt my-libvirt:debug
--8<---------------cut here---------------end--------------->8---

Let's make sure the new virsh still reports the same error:

--8<---------------cut here---------------start------------->8---
$ /home/marusich/debug/profile/bin/virsh nodedev-list
error: Failed to count node devices
error: this function is not supported by the connection driver: virNodeNumOfDevices
--8<---------------cut here---------------end--------------->8---

Great!  Let's debug it with GDB.  First, make sure your ~/.gdbinit
doesn't exist, otherwise your results might be different from mine.
Then let's start GDB:

--8<---------------cut here---------------start------------->8---
$ gdb
--8<---------------cut here---------------end--------------->8---

Tell it where the debug files live:

--8<---------------cut here---------------start------------->8---
(gdb) set debug-file-directory /home/marusich/debug/profile/lib/debug
--8<---------------cut here---------------end--------------->8---

Tell it where the source lives:

--8<---------------cut here---------------start------------->8---
(gdb) directory /home/marusich/debug/libvirt-4.3.0/src
Source directories searched: /home/marusich/debug/libvirt-4.3.0/src:$cdir:$cwd
(gdb) directory /home/marusich/debug/libvirt-4.3.0/tools
Source directories searched: /home/marusich/debug/libvirt-4.3.0/tools:/home/marusich/debug/libvirt-4.3.0/src:$cdir:$cwd
--8<---------------cut here---------------end--------------->8---

Tell it to use the file and read the symbols:

--8<---------------cut here---------------start------------->8---
(gdb) file /home/marusich/debug/profile/bin/virsh
Reading symbols from /home/marusich/debug/profile/bin/virsh...Reading symbols from /home/marusich/debug/profile/lib/debug//gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/bin/virsh.debug...done.
done.
--8<---------------cut here---------------end--------------->8---

Set the program's arguments:

--8<---------------cut here---------------start------------->8---
(gdb) set args nodedev-list
--8<---------------cut here---------------end--------------->8---

Set a breakpoint on the function virNodeNumOfDevices:

--8<---------------cut here---------------start------------->8---
(gdb) break virNodeNumOfDevices
Breakpoint 1 at 0x28610
--8<---------------cut here---------------end--------------->8---

Uh oh.  This is our first sign of a problem: The breakpoint is
associated with some sort of memory address, rather than a location in a
file.  Anyway, let's run the program:

--8<---------------cut here---------------start------------->8---
(gdb) run
Starting program: /gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/bin/virsh nodedev-list
warning: the debug information found in "/home/marusich/debug/profile/lib/debug//gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0.4003.0.debug" does not match "/gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0" (CRC mismatch).

warning: the debug information found in "/home/marusich/debug/profile/lib/debug//gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0.4003.0.debug" does not match "/gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0" (CRC mismatch).

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/libthread_db.so.1".
[New Thread 0x7ffff2219700 (LWP 16097)]

Thread 1 "virsh" hit Breakpoint 1, 0x00007ffff768cdc0 in virNodeNumOfDevices ()
   from /gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0
--8<---------------cut here---------------end--------------->8---

We hit the breakpoint - great!  However, it seems GDB did not load the
debug information for libvirt because of a CRC mismatch.  Indeed, the
backtrace seems to suggest that GDB knows about some of the source
files, but not all of them:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007ffff768cdc0 in virNodeNumOfDevices ()
   from /gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0
#1  0x00005555555a816e in virshNodeDeviceListCollect (flags=0,
    ncapnames=<optimized out>, capnames=0x0, ctl=0x7fffffffb460)
    at virsh-nodedev.c:254
#2  cmdNodeListDevices (ctl=0x7fffffffb460, cmd=<optimized out>)
    at virsh-nodedev.c:472
#3  0x00005555555b8911 in vshCommandRun (ctl=0x7fffffffb460,
    cmd=0x55555583d850) at vsh.c:1318
#4  0x000055555557ea65 in main (argc=2, argv=0x7fffffffb7f8) at virsh.c:932
--8<---------------cut here---------------end--------------->8---
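
The interactive session above can also be replayed non-interactively, which makes it easier to repeat and to attach to a bug report. The following is a minimal sketch, not a tested recipe for this exact setup: the function names and the DEBUG_ROOT variable are hypothetical, and the paths are the ones used in this message. It relies only on GDB's -nx (skip any ~/.gdbinit), -batch, and -x options.

```shell
#!/bin/sh
# DEBUG_ROOT is an assumed variable name; it defaults to the ~/debug
# directory created earlier in this message.
DEBUG_ROOT="${DEBUG_ROOT:-$HOME/debug}"

# Print the GDB commands from the session above, one per line, in order.
gdb_commands() {
    printf '%s\n' \
        "set debug-file-directory $DEBUG_ROOT/profile/lib/debug" \
        "directory $DEBUG_ROOT/libvirt-4.3.0/src" \
        "directory $DEBUG_ROOT/libvirt-4.3.0/tools" \
        "file $DEBUG_ROOT/profile/bin/virsh" \
        "set args nodedev-list" \
        "break virNodeNumOfDevices" \
        "run" \
        "bt"
}

# Run GDB in batch mode on those commands.
# -nx ignores ~/.gdbinit, -batch exits after the last command,
# -x reads the commands from the temporary script file.
replay_session() {
    script=$(mktemp) || return 1
    gdb_commands > "$script"
    gdb -nx -batch -x "$script"
    status=$?
    rm -f "$script"
    return "$status"
}
```

Calling replay_session should print the same warnings and backtrace as the interactive session, assuming the profile and source tree are laid out as described above.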

I wanted to see what was happening in the virNodeNumOfDevices function,
which came from libvirt.so.0.  Unfortunately, that's the library with
the CRC mismatch.  This means I'm totally blocked from investigating any
further using GDB.  I could set step-mode to "on" to step through the
machine code without debug symbols, but as they say: "That is an
exercise left to the reader."

I have seen this CRC mismatch problem twice now when trying to debug
issues with Guix-installed software.  The other time was while
attempting to debug a segfault in vinagre:

    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30591

What is wrong?  Am I using GDB wrong?  Is there a bug in the part of the
gnu-build-system that creates the debug files which might be causing the
CRC mismatch?  I'm aware of the fact that the gnu-build-system takes
advantage of the .gnu_debuglink stuff ((gdb) Separate Debug Files), but
to be honest I haven't done a lot of GDB debugging, so part of me
wonders if this is just a case of "user error".  If so, please help me
understand what I'm doing wrong.
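
One way to narrow down a CRC mismatch is to look at what the stripped library records about its debug file and compare it with the file GDB actually found. The sketch below assumes binutils (readelf, objdump) is installed. The function names are hypothetical, and the path layout (debug directory, then the absolute path of the real file, then ".debug") is only inferred from the GDB messages quoted above.

```shell
#!/bin/sh
# Build the path where GDB looked for a separate debug file in the session
# above: <debug dir> + <absolute path of the binary> + ".debug".
# $1 = debug-file-directory, $2 = absolute path of the binary or library.
debug_file_for() {
    printf '%s%s.debug\n' "${1%/}" "$2"
}

# Dump the .gnu_debuglink section of a stripped file. The string dump shows
# the recorded debug file name; the hex dump also shows the trailing CRC bytes.
show_debuglink() {
    readelf --string-dump=.gnu_debuglink "$1"
    objdump -s -j .gnu_debuglink "$1"
}

# Print the build ID note of a file, if it has one.
show_build_id() {
    readelf -n "$1" | grep 'Build ID'
}

# Example (not run here), using the paths from this message:
#   lib=$(realpath /home/marusich/debug/profile/lib/libvirt.so.0)
#   show_debuglink "$lib"
#   show_build_id "$lib"
#   show_build_id "$(debug_file_for /home/marusich/debug/profile/lib/debug "$lib")"
```

If the two build IDs differ, or the debug file has changed since the CRC was recorded, that would point at something modifying the files after the build rather than at a GDB usage error.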


Actually, this is about the same thing I found. I'm also running into CRC mismatches. The workaround I used was to define a package whose debugging symbols are not stripped in the first place. I don't know what causes this either, but it is annoying.

Hello Chris,

I was thinking about this yesterday. Is it possible that this is related to grafting? I
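
A rough way to test the grafting hypothesis is to compare what guix build returns with and without grafts, and then to repeat the GDB session against a profile built with --no-grafts. This is only a sketch under assumptions: the helper names and the profile-no-grafts directory are hypothetical, while --no-grafts is one of Guix's common build options.

```shell
#!/bin/sh
# Succeed if the two output lists (as printed by guix build) differ.
outputs_differ() {
    [ "$1" != "$2" ]
}

# Compare the grafted and ungrafted outputs of a package.
# Returns 2 if either guix build invocation fails.
check_grafts() {
    pkg=$1
    grafted=$(guix build "$pkg") || return 2
    ungrafted=$(guix build --no-grafts "$pkg") || return 2
    if outputs_differ "$grafted" "$ungrafted"; then
        echo "$pkg: grafts were applied"
    else
        echo "$pkg: no grafts applied"
    fi
}

# Example (not run here):
#   GUIX_PACKAGE_PATH=/home/marusich/debug check_grafts my-libvirt
#   GUIX_PACKAGE_PATH=/home/marusich/debug guix package --no-grafts \
#       -p /home/marusich/debug/profile-no-grafts -i my-libvirt my-libvirt:debug
```

If the CRC warnings disappear when GDB is pointed at the ungrafted profile, that would support the idea that grafting is involved.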
 
 
Thank you,

--
Chris
