########################################
Setting up the Sunfires, a.k.a. thumpers
########################################

Using Serial Console on a Sunfire
=================================

We have set up the ILOMs to respond both on serial and over ssh on the
management network.  They're named sunfireN-bmc (for similarity with the
others, rather inaccurately).  Once logged in, run ``start /SP/console`` to
connect to the serial console; to get out, send a newline, then ESC followed
by an open paren, ``(``.  (It's the strangest escape sequence I've ever seen,
but that's what Sun chose.)

Drive Enumeration Order
=======================

The labels on a sunfire are not in agreement with Linux's enumeration order,
though the pointers in the bottom left corner ("ATTENTION!") for boot disks are
accurate.  Linux enumerates the drives thusly::

             BACK OF MACHINE
    -----------------------------------
    ab af  t  x ar au aj an  l  p  d  h
    aa ae  s  w aq av ai am  k  o  c  g
     z ad  r  v ap at ah al  j  n  b  f
     y ac  q  u ao as ag ak  i  m  a  e
    -----------------------------------
             FRONT OF MACHINE

Please note the curious reversal of au and av.  I am not sure why.
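
When a drive needs pulling, it can help to turn a device name back into a
physical position.  The following is only a sketch: the grid is transcribed
from the diagram above (assuming each suffix from ``a`` to ``av`` appears
exactly once), and ``locate_drive`` is a made-up helper, not something
installed on the sunfires::

    # Drive suffixes as laid out in the chassis, back row first.
    THUMPER_GRID='ab af t x ar au aj an l p d h
    aa ae s w aq av ai am k o c g
    z ad r v ap at ah al j n b f
    y ac q u ao as ag ak i m a e'

    # locate_drive SUFFIX: print the physical position of /dev/sdSUFFIX.
    # Exits non-zero if the suffix is not in the grid.
    locate_drive() {
        printf '%s\n' "$THUMPER_GRID" | awk -v want="$1" '
            { for (i = 1; i <= NF; i++)
                  if ($i == want) {
                      printf "row %d from back, column %d from left\n", NR, i
                      found = 1
                  } }
            END { exit !found }'
    }

For example, ``locate_drive y`` prints ``row 4 from back, column 1 from left``,
which matches the boot disk pointer in the front left corner.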

Booting from USB Mass Storage Devices
=====================================

The sunfire BIOSes are bad at what they do.  They can in fact boot from USB
media, but USB devices get no separate entry in the boot selection screen
reached via F8.  Instead, you must enter Setup (via F2) and use the Boot
menu's Hard Disks option to promote the USB device over the actual hard disks,
and ensure that the boot order is set to use hard disks first.  It's ugly,
but there it is.

Installing a Sunfire, the Debian Way
====================================

These notes pertain to Jessie; they are probably relatively time-invariant.

1. Do a Debian install

   1. Use ``eth0`` as the primary interface.

   2. Use ``sdy`` and ``sdac`` as the root devices.
      Currently, we use an md mirror and LVM.

   3. Just before rebooting, grab a shell and::

        chroot /target
        grub-install /dev/sdy
        grub-install /dev/sdac
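
      To double-check that both halves of the mirror received a boot sector,
      something like this sketch may help.  It assumes that GRUB 2's boot
      sector contains the literal string ``GRUB`` (true of the versions I know
      of), and both function names are made up for illustration::

        # has_grub_mbr DEVICE: succeed if the first 512 bytes contain "GRUB".
        has_grub_mbr() {
            dd if="$1" bs=512 count=1 2>/dev/null | grep -q GRUB
        }

        # check_boot_disks DEVICE...: report on each device in turn;
        # the exit status is non-zero if any device lacks the marker.
        check_boot_disks() {
            status=0
            for disk in "$@"; do
                if has_grub_mbr "$disk"; then
                    echo "$disk: GRUB boot sector found"
                else
                    echo "$disk: no GRUB boot sector" >&2
                    status=1
                fi
            done
            return "$status"
        }

      Run it as ``check_boot_disks /dev/sdy /dev/sdac`` before rebooting.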

2. Adjust networking

   .. note:: You may need to tweak
      ``/etc/udev/rules.d/70-persistent-net.rules``. The controller that does
      belong to sunfire2 has a 10GbE card that enumerates too early, for
      example.

   In /etc/network/interfaces, ::

     allow-hotplug eth0
     iface eth0 inet dhcp
       pre-up ifconfig eth0 mtu 9000
     
   Then run::

     ifdown eth0 && ifup eth0
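
   To confirm that jumbo frames really make it across the network, a sketch
   along these lines can be used.  It relies on the ``-M do`` (don't fragment)
   option of the Linux iputils ``ping``; ``check_jumbo`` and ``icmp_payload``
   are hypothetical helper names, and the peer to ping is up to you::

     # Largest ICMP payload that fits in one frame of the given MTU:
     # subtract 20 bytes of IPv4 header and 8 bytes of ICMP header.
     icmp_payload() {
         echo $(( $1 - 28 ))
     }

     # check_jumbo IFACE PEER: ping PEER with full-size, unfragmentable
     # packets sized from IFACE's current MTU.
     check_jumbo() {
         mtu=$(cat "/sys/class/net/$1/mtu") || return 1
         ping -c 3 -M do -s "$(icmp_payload "$mtu")" "$2"
     }

   For an MTU of 9000 the payload works out to 8972 bytes.  If
   ``check_jumbo eth0 somehost`` fails while an ordinary ``ping`` succeeds,
   something on the path is not passing jumbo frames.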

3. Now add some packages and make some changes::

     apt-get install sudo deborphan vim strace tcpdump
     adduser localadmin sudo
     usermod -L root

4. Install the Ganglia reporting tool::

     apt-get install ganglia-monitor

   Modify /etc/ganglia/gmond.conf::

     cluster { 
       name = "Trinidad" 
       owner = "JHU ACM"
       latlong = "unspecified" 
       url = "unspecified" 
     } 
    
     udp_send_channel { 
       port = 8649 
       host = bigbrother
     } 

5.  Install OpenAFS and Kerberos tools::

      apt-get install openafs-client openafs-krb5 krb5-user

    * Our Kerberos realm is "ACM.JHU.EDU"; note the lack of "trinidad".

    * Our AFS cell is "acm.jhu.edu"; note the lack of "trinidad".

6. While that's going, you may as well make the machine serial-friendly:

   * Replace the contents of /etc/default/grub with::

       # Don't forget to run update-grub
       GRUB_DEFAULT=0
       GRUB_HIDDEN_TIMEOUT_QUIET=true
       GRUB_TIMEOUT=2
       GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
       GRUB_CMDLINE_LINUX_DEFAULT=""
       GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,9600n8 rootwait"
       GRUB_TERMINAL="console serial"
       GRUB_SERIAL_COMMAND="serial --speed=9600 --unit=0 --word=8 --parity=no --stop=1"

     And then run update-grub as the comment says. :)
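
   After the next reboot, a quick way to see whether the kernel picked up the
   serial settings is to look at its command line.  A minimal sketch
   (``console_args`` is a made-up helper name)::

     # console_args [FILE]: list the console= arguments on a kernel
     # command line, one per line (defaults to the running kernel's).
     console_args() {
         grep -o 'console=[^ ]*' "${1:-/proc/cmdline}"
     }

   With the settings above, ``console_args`` should print ``console=tty0``
   and ``console=ttyS0,9600n8``.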

7. Fetch the Kerberos keytab for this machine into ``/etc/krb5.keytab``, then
   make it readable by root only::

      chmod 400 /etc/krb5.keytab

8. Install Ceph.

   The Ceph Debian maintainers seem to have given up on having their packages in
   the official repository; experimental is (as of Nov 2016) two major versions
   behind. Therefore, you're going to need to add ``deb
   https://download.ceph.com/debian-jewel/ jessie main`` to
   ``/etc/apt/sources.list``. You may also need to install
   ``apt-transport-https`` because, well, Debian. In any case, once you've done
   that, you should just be able to ``apt-get update && apt-get install ceph``.
   Then copy the keyring and ``ceph.conf`` from an authoritative source.
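
   Adding the repository line can be scripted so that re-running it does no
   harm.  This is a sketch only; ``add_ceph_repo`` is a hypothetical helper,
   and the optional file argument exists purely so it can be tried out on a
   scratch file first::

     CEPH_REPO='deb https://download.ceph.com/debian-jewel/ jessie main'

     # add_ceph_repo [FILE]: append the repository line to FILE (default
     # /etc/apt/sources.list) unless that exact line is already present.
     add_ceph_repo() {
         list=${1:-/etc/apt/sources.list}
         grep -qxF "$CEPH_REPO" "$list" 2>/dev/null ||
             echo "$CEPH_REPO" >> "$list"
     }

   apt will probably also want the Ceph release key before it trusts the
   repository; check upstream's current instructions for how to fetch it.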

9. Install ZFS (note: as of this writing, sunfires 1 and 2 are installed by
   route a, and sunfires 0 and 3 have been reinstalled via route b)
   
   a. To get the packages from ZoL's archive, run the following commands. Note
      that ZoL no longer supports this archive. ::

         wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_2%7Ewheezy_all.deb
         dpkg -i ./zfsonlinux_2~wheezy_all.deb
         apt-get update
         apt-get install spl-dkms # (It's OK not to do this first except that it wastes time below)
         apt-get install zfsutils zfs-dkms zfs-initramfs
         # Note "-i -e", not "-ie": the latter leaves a stray backup file with an "e" suffix
         sed -i -e "s/ZFS_MOUNT='no'/ZFS_MOUNT='yes'/" /etc/default/zfs
         sed -i -e '/\$remote_fs/ s/$/ +zfs-mount/' /etc/insserv.conf

   b. The necessary packages are also available from jessie-backports. In order
      to effect a changeover, you'll need to remove the current zfs packages,
      then enable jessie-backports [#]_ in ``/etc/apt/sources.list`` (you will
      need ``main`` and ``contrib``), and finally::

         apt-get update
         apt-get install zfs-dkms zfs-initramfs # (Have patience...)

      Afterwards, adjust the configuration as stated in route a (the edits to
      ``/etc/default/zfs`` and ``/etc/insserv.conf``).
 
      .. [#] Do note that there is presently (Nov 13 2016) a bug in these
         packages that prevents dkms from building them properly. While the bug
         persists, you will need to create some symlinks::

            ln -s /var/lib/dkms/spl/0.6.5.8/build/spl_config.h /var/lib/dkms/spl/0.6.5.8/3.16.0-4-amd64/x86_64
            ln -s /var/lib/dkms/spl/0.6.5.8/build/module/Module.symvers /var/lib/dkms/spl/0.6.5.8/3.16.0-4-amd64/x86_64/module

         and then ``dkms install -m zfs -v 0.6.5.8`` (and have more patience).

10. Create some zpools::

       zpool create -o ashift=12 \
                    stor raidz /dev/sd{a,b,c,d,e,f,g,h,i,j,k} \
                         raidz /dev/sd{l,m,n,o,p,q,r,s,t,u,v} \
                         raidz /dev/sd{w,x,z,aa,ab,ad,ae,af,ag,ah,ai} \
                         raidz /dev/sda{j,k,l,m,n,o,p,q,r,s,t} \
                         spare /dev/sdav cache /dev/sdau
       zpool export stor && zpool import -d /dev/disk/by-id stor
       zfs set checksum=fletcher4 stor
       zfs set compression=lz4 stor
       zfs set atime=off stor
       zfs set xattr=sa stor
       zfs create stor/osd
       zfs set recordsize=1M stor/osd

    Land a crontab to keep scrubs going on a regular basis. Try to balance them,
    temporally, across the different sunfires so that we don't experience the
    whole cluster scrubbing at once::

      0 0 15 * * /sbin/zpool scrub stor
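
    One possible way to spread the scrubs out is to derive the day of the
    month from the machine number.  The one-week spacing below is an
    assumption for illustration, not necessarily the schedule actually
    deployed, and ``scrub_cron_line`` is a made-up helper::

      # scrub_cron_line N: crontab entry for sunfireN, spaced one week
      # apart per machine (days 1, 8, 15 and 22 for sunfires 0 through 3).
      scrub_cron_line() {
          echo "0 0 $(( 1 + 7 * $1 )) * * /sbin/zpool scrub stor"
      }

      for n in 0 1 2 3; do
          printf 'sunfire%s: %s\n' "$n" "$(scrub_cron_line "$n")"
      done

    Under this scheme sunfire2 gets exactly the line shown above, and no two
    machines scrub in the same week.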

11. Prevent updatedb (the worker for ``locate``) from traversing the ceph
    backing stores: In /etc/updatedb.conf, add ``/var/lib/ceph`` to the excluded
    path list (``PRUNEPATHS``).
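
    The edit can be scripted roughly as follows.  This sketch assumes that
    ``PRUNEPATHS`` sits on a single double-quoted line, as in Debian's stock
    file, and that GNU ``sed`` is in use; ``prune_ceph`` is a hypothetical
    helper name::

      # prune_ceph [FILE]: append /var/lib/ceph to PRUNEPATHS in FILE
      # (default /etc/updatedb.conf) unless it is already listed.
      prune_ceph() {
          conf=${1:-/etc/updatedb.conf}
          grep -q '^PRUNEPATHS=.*/var/lib/ceph' "$conf" ||
              sed -i -e 's|^PRUNEPATHS="\(.*\)"|PRUNEPATHS="\1 /var/lib/ceph"|' "$conf"
      }

    Running it twice leaves the file unchanged the second time.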

12. Create some OSDs::

       OSDIX=`ceph osd create`; echo ${OSDIX}
       zfs set mountpoint=/var/lib/ceph/osd/ceph-$OSDIX stor/osd
       ceph-osd -i ${OSDIX} --mkfs --mkkey
       ceph auth add osd.${OSDIX} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSDIX}/keyring
       ceph osd crush add osd.${OSDIX} 10 host=`hostname`

13. Update /afs/acm.jhu.edu/group/admins.pub/ceph.conf and release the volume,
    then run ``/etc/init.d/ceph start``