Thu, 29 Apr 2010

GELI + ZFS. Easy.

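Someone recently asked me how I would run ZFS on top of GELI devices. It's not that hard. Here's how I did it using two vnode-backed md(4) devices, since I didn't have any spare drives or slices lying around. The providers are keyed with a keyfile only (-K key -P on init, -k key -p on attach), so there is no passphrase prompt.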

wxs@ack wxs % truncate -s 10G a b  
wxs@ack wxs % sudo mdconfig -a -t vnode -f a
Password:
md0
wxs@ack wxs % sudo mdconfig -a -t vnode -f b
md1
wxs@ack wxs % dd if=/dev/random of=key bs=64 count=1
1+0 records in
1+0 records out
64 bytes transferred in 0.000145 secs (441506 bytes/sec)
wxs@ack wxs % sudo geli init -s 4096 -K key -P /dev/md0 

Metadata backup can be found in /var/backups/md0.eli and
can be restored with the following command:

	# geli restore /var/backups/md0.eli /dev/md0

wxs@ack wxs % sudo geli init -s 4096 -K key -P /dev/md1

Metadata backup can be found in /var/backups/md1.eli and
can be restored with the following command:

	# geli restore /var/backups/md1.eli /dev/md1

wxs@ack wxs % sudo geli attach -k key -p /dev/md0 
wxs@ack wxs % sudo geli attach -k key -p /dev/md1
wxs@ack wxs % sudo zpool create foo mirror /dev/md0.eli /dev/md1.eli
wxs@ack wxs % zpool list foo
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
foo   9.94G   364K  9.94G     0%  ONLINE  -
wxs@ack wxs % zpool status foo
  pool: foo
 state: ONLINE
 scrub: none requested
config:

	NAME         STATE     READ WRITE CKSUM
	foo          ONLINE       0     0     0
	  mirror     ONLINE       0     0     0
	    md0.eli  ONLINE       0     0     0
	    md1.eli  ONLINE       0     0     0

errors: No known data errors
wxs@ack wxs %
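
The transcript stops once the pool is online. Here is a minimal sketch of taking it back down and bringing it up again (after a reboot, say), assuming the same names as above: pool foo, backing files a and b, keyfile key, and units md0 and md1. The run helper and the DRY_RUN variable are made up for illustration and are not part of any tool; by default the helper only prints the commands, and real runs need root.

```sh
#!/bin/sh
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Export the pool, detach the GELI providers, then destroy the md(4) devices.
geli_zfs_down() {
    run zpool export foo
    run geli detach md0.eli md1.eli
    run mdconfig -d -u 0
    run mdconfig -d -u 1
}

# Recreate the md(4) devices, attach GELI with the keyfile, import the pool.
# Assumes mdconfig hands out md0 and md1 again, as it did in the transcript.
geli_zfs_up() {
    run mdconfig -a -t vnode -f a
    run mdconfig -a -t vnode -f b
    run geli attach -k key -p /dev/md0
    run geli attach -k key -p /dev/md1
    run zpool import foo
}
```

Calling geli_zfs_down or geli_zfs_up prints the command list; setting DRY_RUN=0 first makes it actually run them. The order matters in both directions: the .eli providers have to exist before ZFS can import the pool, and the pool has to be exported before the providers underneath it go away.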

posted at: 13:56 | path: /entries/geek

Mon, 08 Jun 2009

ZFS + NFS = Crash :(

I started to experience a crash recently that was triggered when building something in my tinderbox setup. This particular tinderbox is running on ZFS and uses NFS mounts on localhost. The panic and backtrace look like this:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x6dc
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80572e7f
stack pointer           = 0x28:0xffffff803e722530
frame pointer           = 0x28:0xffffff803e722550
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1030 (nfsd: service)
[thread pid 1030 tid 100140 ]
Stopped at      prison_priv_check+0xff: movl    0x6dc(%rsi),%eax
db> bt
Tracing pid 1030 tid 100140 td 0xffffff00029ea000
prison_priv_check() at prison_priv_check+0xff
priv_check_cred() at priv_check_cred+0x4c
secpolicy_vnode_access() at secpolicy_vnode_access+0x28
zfs_zaccess() at zfs_zaccess+0x1d5
zfs_freebsd_access() at zfs_freebsd_access+0xd0
VOP_ACCESS_APV() at VOP_ACCESS_APV+0x44
nfsrv_access() at nfsrv_access+0xf3
nfsrv3_access() at nfsrv3_access+0x386
nfssvc_program() at nfssvc_program+0x1fb
svc_run_internal() at svc_run_internal+0x6d2
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a0e1c, rsp = 0x7fffffffe6d8, rbp = 0x5 ---
db> 

Here are the relevant pieces of prison_priv_check():

/*
 * Check with permission for a specific privilege is granted within jail.  We
 * have a specific list of accepted privileges; the rest are denied.
 */
int
prison_priv_check(struct ucred *cred, int priv)
{

        if (!jailed(cred))
                return (0);

        switch (priv) {

...

                /*
                 * Depending on the global setting, allow privilege of
                 * mounting/unmounting file systems.
                 */
        case PRIV_VFS_MOUNT:
        case PRIV_VFS_UNMOUNT:
        case PRIV_VFS_MOUNT_NONUSER:
        case PRIV_VFS_MOUNT_OWNER:
                if (cred->cr_prison->pr_allow & PR_ALLOW_MOUNT)
                        return (0);
                else
                        return (EPERM);

...

Loading up the core in kgdb for analysis, it becomes very clear what is going on.

(kgdb) frame 12
#12 0xffffffff80572e7f in prison_priv_check (cred=0xffffff00168f5900, priv=334) at /usr/home/wxs/freebsd/src/head/sys/kern/kern_jail.c:3315
3315            switch (priv) {
(kgdb) p/x *cred
$7 = {cr_ref = 0x1, cr_uid = 0x0, cr_ruid = 0x0, cr_svuid = 0x0, 
  cr_ngroups = 0x1, cr_groups = {0x0 }, cr_rgid = 0x0, 
  cr_svgid = 0x0, cr_uidinfo = 0x0, cr_ruidinfo = 0x0, cr_prison = 0x0, 
  cr_vimage = 0x0, cr_flags = 0x0, cr_pspare = {0x0, 0x0}, cr_label = 0x0, 
  cr_audit = {ai_auid = 0x0, ai_mask = {am_success = 0x0, am_failure = 0x0}, 
    ai_termid = {at_port = 0x0, at_type = 0x0, at_addr = {0x0, 0x0, 0x0, 
        0x0}}, ai_asid = 0x0, ai_flags = 0x0}}
(kgdb) 

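It's clear that cred->cr_prison is bad: it is NULL, and the faulting instruction reads at offset 0x6dc from it, which matches the fault virtual address of 0x6dc. But this isn't the real meat of the problem. The first check in prison_priv_check() is to see if we are jailed, and that looks something like this: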

/*
 * Return 1 if the passed credential is in a jail, otherwise 0.
 */
int
jailed(struct ucred *cred)
{

        return (cred->cr_prison != &prison0);
}

Up until fairly recently this function used to contain:

        return (cred->cr_prison != NULL);

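So, because cred->cr_prison is NULL in our case, jailed() returns 1 (NULL is not equal to &prison0), and the !jailed(cred) check in prison_priv_check() evaluates to false when it should evaluate to true. Instead of returning early we fall through to the switch, where cred->cr_prison gets dereferenced. So now we know why we are crashing (a NULL pointer dereference), but we still don't know what is causing cr_prison to be NULL.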

Credentials like this are derived from a very small number of places. The reason this one is wrong is that the RPC code in the kernel doesn't know which credentials to assign when it handles the request. Luckily a workaround has been put in place while a proper solution is being worked on.

posted at: 12:48 | path: /entries/geek

Mon, 24 Nov 2008

ZFS Update on ack

wxs@ack wxs % mount
data on / (zfs, local, noatime)
devfs on /dev (devfs, local)
/dev/ad4s1a on /mnt/ad4s1a (ufs, local)
data/dump on /dump (zfs, local, noatime)
data/dump/cvs on /dump/cvs (zfs, local, noatime)
data/dump/incoming on /dump/incoming (zfs, local, noatime)
data/dump/mp3 on /dump/mp3 (zfs, NFS exported, local, noatime)
data/jails on /jails (zfs, local, noatime)
data/ncvs on /ncvs (zfs, local, noatime)
data/tinderbox on /tinderbox (zfs, NFS exported, local, noatime)
data/tmp on /tmp (zfs, local, noatime)
data/usr on /usr (zfs, local, noatime)
data/usr/home on /usr/home (zfs, local, noatime)
data/var on /var (zfs, local, noatime)
wxs@ack wxs % sudo zfs get version    
NAME                PROPERTY  VALUE               SOURCE
data                version   3                   -
data/dump           version   3                   -
data/dump/cvs       version   3                   -
data/dump/incoming  version   3                   -
data/dump/mp3       version   3                   -
data/jails          version   3                   -
data/ncvs           version   3                   -
data/tinderbox      version   3                   -
data/tmp            version   3                   -
data/usr            version   3                   -
data/usr/home       version   3                   -
data/var            version   3                   -
wxs@ack wxs % 

I upgraded ack (my personal development machine and fileserver) recently. Along with the upgrade I figured I would try out the new ZFS version, so I upgraded all the filesystems in this pool. There was only one problem, which I resolved by booting off the old slice I still had lying around and fixing my mistake.

So with this update and the ongoing work I should have a much more stable system (it was very stable before the update, so this isn't saying much). I'll also be able to test out some of the delegated administration aspects of this update, which sound very nice. Lastly, I hope to eventually test out the boot-from-ZFS support, which just recently hit the tree and is still highly experimental.
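
The post doesn't show the upgrade commands themselves, so here is a minimal sketch of what upgrading a pool and its filesystems usually looks like, assuming the pool name data from the listing above. The run helper, DRY_RUN and upgrade_pool are hypothetical names for illustration; by default nothing is executed, only printed.

```sh
#!/bin/sh
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Upgrade one pool and every filesystem beneath it to the newest on-disk
# versions the running system supports, then list the resulting versions.
# This is one-way: older ZFS code can no longer use the upgraded pool.
upgrade_pool() {
    pool=$1
    run zpool upgrade "$pool"
    run zfs upgrade -r "$pool"
    run zfs get -r version "$pool"
}
```

upgrade_pool data prints the three commands in order. Because the upgrade can't be undone, it's worth keeping a bootable environment with the new ZFS code around before running it for real.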

posted at: 23:33 | path: /entries/freebsd

Mon, 17 Nov 2008

New ZFS Features in FreeBSD

With this commit we have an updated ZFS in -current now. I intend to play with some of the new features as time permits. I'm especially interested in the delegated administration piece of it.

No, I'm not dead again. I've just not wanted to write much about what I've been doing since it's mostly been fairly easy work lately. I've already committed more things to FreeBSD this month than in any other month so far, so it's not like I've been slacking. I've got HoH work starting up this year, and I'll probably be putting a couple of posts up over there. If you're one of my few readers who comes by this time of year looking for HoH information, you can look at the new site run by Kym. Anyways, if I'm in your RSS reader don't delete me just yet. Maybe I'll pick up a decent project soon and write about it.

posted at: 17:15 | path: /entries/freebsd

Sun, 28 Sep 2008

/ on ZFS.

Prior to moving to NC (oh yeah, I may not have mentioned it here but I moved to the RTP area in North Carolina a few weeks ago) I took a quick stab at doing a migration from a regular setup to everything (except /boot) on ZFS. The first attempt failed miserably but I tried again tonight and got it working.

All the documentation I've found online details how to set it up without a currently populated system. This at least gave me a major hint as to what the final product should look like, even if it did not provide me with a simple step-by-step guide to get there. So after some hacking I had everything set up how I thought it should be and kicked off an rsync to move everything to the zpool. Upon reboot I went back into single user mode and double-checked everything before booting into multiuser. The machine is now up and running, though I'm sure I will have some bugs to iron out. I have noticed a repeatable crash during reboot, though, so I plugged in a serial cable and will attempt to debug it later in the week. The machine is sitting in a closet on the other side of my office, so I had to break out the obnoxiously long serial cable and run it behind my desk.

posted at: 20:41 | path: /entries/freebsd

Tue, 17 Jul 2007

Changing a ZFS Mount Point

Further evidence that ZFS rocks: it takes care of a lot of the small details that just become mundane after a while. As an example...

I have the following zpool setup on this box...

wxs@ack wxs > sudo zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
data                    696G   99.6G    596G    14%  ONLINE     -
wxs@ack wxs > sudo zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

errors: No known data errors
wxs@ack wxs > 

With the following ZFS setup on it...

wxs@ack wxs > sudo zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data  66.3G   390G  66.3G  /data
wxs@ack wxs > 

The filesystem is mounted automagically for me by ZFS...

wxs@ack wxs > mount                             
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad4s1e on /tmp (ufs, local, soft-updates)
/dev/ad4s1f on /usr (ufs, NFS exported, local, soft-updates)
/dev/ad4s1d on /var (ufs, local, soft-updates)
data on /data (zfs, NFS exported, local, noatime)
wxs@ack wxs > 

I want to move the mount point from /data to /d (it's shorter and I would rather not set up a nasty symlink).

wxs@ack wxs > sudo zfs set mountpoint=/d data   
wxs@ack wxs > 

And just like that, it's automagically changed the mountpoint property, created the new mount point directory and mounted the filesystem there for me. It even removed the old mount point.

wxs@ack wxs > mount
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad4s1e on /tmp (ufs, local, soft-updates)
/dev/ad4s1f on /usr (ufs, NFS exported, local, soft-updates)
/dev/ad4s1d on /var (ufs, local, soft-updates)
data on /d (zfs, NFS exported, local, noatime)
wxs@ack wxs > 

It's a nice touch.
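
For completeness, a minimal sketch of checking the property and undoing the change, assuming the dataset name data from above. The run helper, DRY_RUN and the two function names are hypothetical and only there for illustration; by default the commands are printed rather than executed.

```sh
#!/bin/sh
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Print the current mountpoint and whether it was set locally or is the default.
show_mountpoint() {
    run zfs get -H -o value,source mountpoint "$1"
}

# Drop the local override; a top-level dataset such as data falls back to /data.
reset_mountpoint() {
    run zfs inherit mountpoint "$1"
}
```

show_mountpoint data followed by reset_mountpoint data prints the two zfs invocations. Using zfs inherit instead of setting /data by hand means the property goes back to its default rather than staying a local override.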

posted at: 22:27 | path: /entries/freebsd

Sat, 26 May 2007

ZFS, finally.

wxs@ack wxs > mount
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad4s1e on /tmp (ufs, local, soft-updates)
/dev/ad4s1f on /usr (ufs, local, soft-updates)
/dev/ad4s1d on /var (ufs, local, soft-updates)
data on /data (zfs, local, noatime)
devfs on /usr/jails/t0n/dev (devfs, local)
devfs on /usr/jails/test/dev (devfs, local)
devfs on /usr/jails/bro/dev (devfs, local)
devfs on /usr/jails/bro2/dev (devfs, local)
devfs on /usr/jails/ntop/dev (devfs, local)
wxs@ack wxs > 

Did you catch that? Maybe I should repeat it; this time I'll bold the awesome.

wxs@ack wxs > mount
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad4s1e on /tmp (ufs, local, soft-updates)
/dev/ad4s1f on /usr (ufs, local, soft-updates)
/dev/ad4s1d on /var (ufs, local, soft-updates)
data on /data (zfs, local, noatime)
devfs on /usr/jails/t0n/dev (devfs, local)
devfs on /usr/jails/test/dev (devfs, local)
devfs on /usr/jails/bro/dev (devfs, local)
devfs on /usr/jails/bro2/dev (devfs, local)
devfs on /usr/jails/ntop/dev (devfs, local)
wxs@ack wxs > 

I ordered some more drives to put in ack (a machine at my apartment) to use with ZFS. While I was at it I ordered a new Linksys WRT54GL since I was sick of running an insecure 802.11b network. I'm now running 802.11g with WPA2 (PSK). I've added the drives to ack, set up a ZFS pool and am in the process of migrating data to it now. I'll be using this as a backup system for syn. Now all I need is some mounting brackets and I can put the two 80GB IDE drives I ripped out (lack of 3.5" drive bays for them) back in. I'll probably set them up as a zpool with a JBOD configuration.

I'm in the middle of copying data over so the USED numbers are low...

wxs@ack wxs > sudo zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
data                    696G   13.6G    682G     1%  ONLINE     -
wxs@ack wxs >

wxs@ack wxs > sudo zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

errors: No known data errors
wxs@ack wxs > 
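
The creation step isn't shown above, so here is a minimal sketch of how a pool like this is typically built, assuming the pool and disk names from the status output (data, ad6, ad8, ad10). The run helper, DRY_RUN and both function names are hypothetical, and turning off atime is my guess at where the noatime flag in the mount output comes from. By default the commands are only printed.

```sh
#!/bin/sh
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Build a single-parity raidz pool from the given disks and turn off
# access-time updates on its root filesystem.
create_raidz_pool() {
    pool=$1
    shift
    run zpool create "$pool" raidz1 "$@"
    run zfs set atime=off "$pool"
}

# Plain striped pool with no redundancy: the closest ZFS gets to JBOD.
create_striped_pool() {
    pool=$1
    shift
    run zpool create "$pool" "$@"
}
```

create_raidz_pool data ad6 ad8 ad10 prints the commands for the pool shown above. For the two 80GB drives, something like create_striped_pool scratch ad12 ad14 would do, where scratch, ad12 and ad14 are placeholder names and not taken from this machine.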

posted at: 14:15 | path: /entries/geek

Fri, 06 Apr 2007

ZFS hits -CURRENT

ZFS hits -CURRENT and I'm getting excited. Hopefully it can be fixed up for AMD64 since that is where most of my disks currently live.

And while I'm at it... This commit looks very interesting. I'll try to spend some time this weekend figuring out exactly how it works and where it's going. Things registering as services for jails sounds fun.

posted at: 10:56 | path: /entries/freebsd