From: Ian Latter
Subject: [Gluster-devel] Stateless Nodes - HowTo - was Re: glusterfs-3.3.0qa34 released
Date: Fri, 17 May 2013 00:37:25 +1000

Hello,


  Well I can't believe that it's been more than a year since I started looking 
into a stateless cluster implementation for GlusterFS .. time flies eh.

  First - what do I mean by "stateless"?  I mean that:
    - the user configuration of the operating environment is maintained
outside of the OS;
    - the operating system is destroyed on reboot or power off, and all OS
and application configuration is irrecoverably lost;
    - on each boot we want to get back to the user's preferred/configured
operating environment through the most normal methods possible (preferably,
the same commands and JIT-built config files that were used to configure the
system the first time should be used on every boot).

  In this way, you could well argue that the OE state is maintained in a type 
of provisioning or orchestration tool, outside of the OS and application 
instances (or in my case in the Saturn configuration file that is the only 
persistent data maintained between running OE instances).

  Per the thread below, to get a stateless node (no clustering involved) we
would remove the xattr values from each shared brick, on boot:
    removexattr(mount_point, "trusted.glusterfs.volume-id")
    removexattr(mount_point, "trusted.gfid")
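
  For reference, the same step from the shell would be something like this
(a minimal sketch; the brick path is just the example used later in this
message, and setfattr comes from the standard attr tools):

    # Remove the identity xattrs that a previous GlusterFS instance left on
    # the brick root, so that "gluster volume create" will accept it again.
    BRICK=/glusterfs/exports/hda   # example brick path
    setfattr -x trusted.glusterfs.volume-id $BRICK
    setfattr -x trusted.gfid $BRICK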

  And then we would populate glusterd/glusterd.info with an externally stored 
UUID (to make it consistent across boots).  These three actions would allow the 
CLI "gluster volume create" commands to run unimpeded - thanks to Amar for that 
detail.

  Note 1: we've only been experimenting with DHT/Distribute, so I don't know
whether other Gluster xlator modules have pedantic needs in addition to the
above.
  Note 2: my glusterd directory is in /etc (/etc/glusterd/glusterd.info),
whereas the current location in the popular distros is, I believe, /var/lib
(/var/lib/glusterd/glusterd.info), so I will refer to the relative path in
this message.
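
  As a rough illustration (not Saturn's actual code), re-creating glusterd's
identity at boot, using the /etc/glusterd path from Note 2 and the example
UUID that appears further below, might look like this:

    # Re-create glusterd's identity from externally stored state, before
    # glusterd starts for the first time on this boot.
    MY_UUID=6b481ebb-859a-4c2b-8b5f-8f0bba7c3b9a   # held outside the OS
    mkdir -p /etc/glusterd
    echo "UUID=${MY_UUID}" > /etc/glusterd/glusterd.info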


  But we have finally scaled out beyond the limits of our largest chassis
(down to 1TB free) and need to cluster to add more capacity via the next
chassis.  Over the past three nights I've had a chance to experiment with
GlusterFS 3.3.0 (I will be looking at 3.4.0 shortly) and create a
"distribute" volume between two clustered nodes.  To get a stateless outcome
we then need to be able to boot one node from scratch and have it re-join
the cluster and volume using only the "gluster" CLI command/s.

  For what it's worth, I couldn't find a way to do this.  The peer probing
model doesn't seem to allow an old node to rejoin the cluster.

  So many thanks to Mike of FunWithLinux for this post and for steering me
in the right direction:
    http://funwithlinux.net/2013/02/glusterfs-tips-and-tricks-centos/

  The trick seems to be (in addition to the non-cluster configs, above) to
manage the cluster membership outside of GlusterFS.  On boot, we
automatically populate the relevant peer file (glusterd/peers/{uuid}) with
the UUID, state=3, and hostname/IP address; one file for each other node in
the cluster (excluding the local node).  E.g.

    # cat /etc/glusterd/peers/ab2d5444-5a01-427a-a322-c16592676d29
      uuid=ab2d5444-5a01-427a-a322-c16592676d29
      state=3
      hostname1=192.168.179.102
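
  A boot-time sketch of that step (hypothetical variable names, filled from
the external configuration; one such fragment per remote peer):

    # Pre-seed glusterd's peer list so that no "gluster peer probe" is
    # needed after a rebuild.
    PEER_UUID=ab2d5444-5a01-427a-a322-c16592676d29   # remote node's UUID
    PEER_ADDR=192.168.179.102                        # remote node's address
    mkdir -p /etc/glusterd/peers
    printf 'uuid=%s\nstate=3\nhostname1=%s\n' "$PEER_UUID" "$PEER_ADDR" \
        > /etc/glusterd/peers/$PEER_UUID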

  Note that if you're using IP addresses as your node handle (as opposed to
host names) then you must retain the same IP address across boots for this
to work; otherwise you will have to modify the peer files on the
existing/running cluster nodes, which will require glusterd to be restarted
on them.

  When you do this in the startup process you can skip the "gluster peer
probe" and simply call "gluster volume create", as we did in the
non-clustered environment, on every node as it boots (on every boot,
including the first).  The nodes that are late to the party will be told
that the configuration already exists, and the clustered volume *should*
come up.
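
  For illustration, using the example volume name and brick paths from the
state listings further below, the boot-time call on each node would look
something like this (a sketch, not Saturn's actual commands):

    # Run on every node, on every boot; a node that arrives late will
    # simply be told that the volume already exists.
    gluster volume create myvolume \
        192.168.179.101:/glusterfs/exports/hda \
        192.168.179.102:/glusterfs/exports/hda

  Note that on the very first boot the newly created volume still needs a
"gluster volume start myvolume" before it will serve data.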

  I am still experimenting, but I say "should" because you can sometimes see a 
delay in the re-establishment of the clustered volume, and you can sometimes 
see the clustered volume fail to re-establish. When it fails to re-establish 
the solution seems to be a "gluster volume start" for that volume, on any node. 
 FWIW I believe I'm seeing this locally because Saturn tries to nicely stop all 
Gluster volumes on reboot, which is affecting the cluster (of course) - lol - a 
little more integration work to do.
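
  A crude fallback in the boot scripts (assuming the example volume name,
and assuming the "Status: Started" line in the "gluster volume info" output
is a reliable indicator) might look like this:

    # If the clustered volume has not re-established itself after a grace
    # period, start it explicitly (any one node can do this).
    sleep 30
    if ! gluster volume info myvolume | grep -q 'Status: Started'; then
        gluster volume start myvolume
    fi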
  

  The external state needed then looks like this on the first node (101):

    set gluster server        uuid 6b481ebb-859a-4c2b-8b5f-8f0bba7c3b9a
    set gluster peer0         uuid ab2d5444-5a01-427a-a322-c16592676d29
    set gluster peer0         ipv4_address 192.168.179.102
    set gluster volume0       name myvolume
    set gluster volume0       is_enabled 1
    set gluster volume0       uuid 00000000-0000-0000-0000-000000000000
    set gluster volume0       interface eth0
    set gluster volume0       type distribute
    set gluster volume0       brick0 /dev/hda
    set gluster volume0       brick1 192.168.179.102:/glusterfs/exports/hda

  And the external state needed looks like this on the second node (102):

    set gluster server        uuid ab2d5444-5a01-427a-a322-c16592676d29
    set gluster peer0         uuid 6b481ebb-859a-4c2b-8b5f-8f0bba7c3b9a
    set gluster peer0         ipv4_address 192.168.179.101
    set gluster volume0       name myvolume
    set gluster volume0       is_enabled 1
    set gluster volume0       uuid 00000000-0000-0000-0000-000000000000
    set gluster volume0       interface eth0
    set gluster volume0       type distribute
    set gluster volume0       brick0 192.168.179.101:/glusterfs/exports/hda
    set gluster volume0       brick1 /dev/hda

  Note that I assumed there was a per-volume UUID (currently all zeros) that
I would need to re-instate, but I haven't seen it yet (presumably it's one
of the values currently being removed from the mount point xattrs on each
boot).


  I hope that this information helps others who are trying to dynamically
provision and re-provision virtual/infrastructure environments.  I note that
it covers a topic that has not yet been written up on the Gluster site:

     HowTo - GlusterDocumentation
     http://www.gluster.org/community/documentation/index.php/HowTo
     [...]
     Articles that need to be written
     Troubleshooting
       - UUID's and cloning Gluster instances
       - Verifying cluster integrity
     [...]


  Please feel free to use this content to help contribute to that FAQ/HowTo 
document.


Cheers,


----- Original Message -----
>From: "Ian Latter" <address@hidden>
>To: "Amar Tumballi" <address@hidden>
>Subject:  Re: [Gluster-devel] glusterfs-3.3.0qa34 released
>Date: Wed, 18 Apr 2012 18:55:46 +1000
>
> 
> ----- Original Message -----
> >From: "Amar Tumballi" <address@hidden>
> >To: "Ian Latter" <address@hidden>
> >Subject:  Re: [Gluster-devel] glusterfs-3.3.0qa34 released
> >Date: Wed, 18 Apr 2012 13:42:45 +0530
> >
> > On 04/18/2012 12:26 PM, Ian Latter wrote:
> > > Hello,
> > >
> > >
> > >    I've written a work around for this issue (in 3.3.0qa35)
> > > by adding a new configuration option to glusterd
> > > (ignore-strict-checks) but there are additional checks
> > > within the posix brick/xlator.  I can see that volume starts
> > > but the bricks inside it fail shortly there-after, and that of
> > > the 5 disks in my volume three of them have one
> > > volume_id and two of them have another - so this isn't going
> > > to be resolved without some human intervention.
> > >
> > >    However, while going through the posix brick/xlator I
> > > found the "volume-id" parameter.  I've tracked it back
> > > to the volinfo structure in the glusterd xlator.
> > >
> > >    So before I try to code up a posix inheritance for my
> > > glusterd work around (ignoring additional checks so
> > > that a new volume_id is created on-the-fly / as-needed),
> > > does anyone know of a CLI method for passing the
> > > volume-id into glusterd (either via "volume create" or
> > > "volume set")?  I don't see one from the code ...
> > > glusterd_handle_create_volume does a uuid_generate
> > > and its not a feature of glusterd_volopt_map ...
> > >
> > >    Is a user defined UUID init method planned for the CLI
> > > before 3.3.0 is released?  Is there a reason that this
> > > shouldn't be permitted from the CLI "volume create" ?
> > >
> > >
> > We don't want to bring in this option to CLI. That is because we don't
> > think it is right to confuse USER with more options/values. 'volume-id'
> > is a internal thing for the user, and we don't want him to know about in
> > normal use cases.
> > 
> > In case of 'power-users' like you, If you know what you are doing, the
> > better solution is to do 'setxattr -x trusted.volume-id $brick' before
> > starting the brick, so posix translator anyway doesn't get bothered.
> > 
> > Regards,
> > Amar
> > 
> 
> 
> Hello Amar,
> 
>   I wouldn't go so far as to say that I know what I'm
> doing, but I'll take the compliment ;-)
> 
>   Thanks for the advice.  I'm going to assume that I'll 
> be revisiting this issue when we can get back into 
> clustering (replicating distributed volumes).  I.e. I'm
> assuming that on this path we'll end up driving out 
> issues like split brain;
>  
> https://github.com/jdarcy/glusterfs/commit/8a45a0e480f7e8c6ea1195f77ce3810d4817dc37
> 
> 
> Cheers,
> 
> 
> 
> --
> Ian Latter
> Late night coder ..
> http://midnightcode.org/
> 
> _______________________________________________
> Gluster-devel mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> 


--
Ian Latter
Late night coder ..
http://midnightcode.org/


