Survey of Xen and ARM Servers

Like many other people, we at prgmr.com have been watching the development of ARM servers with interest.  Unfortunately, everything generally available so far has either lacked ECC support or been geared towards NAS use, and none of it has been price-competitive with multi-core Xeon servers.

In 2014 this is likely to change.  The two main competitors are Applied Micro's x-gene and AMD's ARM-based Opteron A1100, codenamed Seattle.  The first-generation x-gene will have 8 cores at 2.5GHz, to be followed later by 16 cores at 3GHz, with a maximum of 256GiB of ECC RAM.  The a1100 will come with 4 or 8 cores at 2GHz and a maximum of 128GiB of ECC RAM.  Both claim good compute/watt ratios, though this needs to be measured.  We also care about price competitiveness: while the MSRP for the processors is reasonable, the price may be driven up by limited availability.


However, without going into too many more hardware details, both appear suitable for running xen.  I decided to research how far along xen support is for each of these processors.


One major difference between the two platforms is the software used to boot them.  Applied Micro has opted for u-boot.  U-boot is the standard bootloader for ARM platforms right now, but is not exactly common in the server world.  AMD has opted to use UEFI, which is the standard replacement for BIOS these days.  It has generally not been used for ARM platforms.  


Another difference is run-time capabilities after boot.  U-boot disappears after Linux has been loaded, while UEFI provides runtime services.  Historically, ARM platforms have used a lot of non-discoverable hardware, meaning that the Linux kernel had to be hand-tailored to each platform it was going to run on.  More recently, ARM Linux has moved to using device tree definitions of the hardware, which are supposed to be defined independently of the OS that is going to use them.  Like the initial ram disk, the device tree is typically supplied as an additional parameter to the bootloader.


But even with device tree, drivers are still highly customized between different SoCs.  UEFI should greatly reduce the number of drivers if it is implemented properly, but UEFI support is not upstreamed yet, and it is not clear to me whether the proposed patches are going to make it in.


While official support for arm64 (aarch64) is present in xen 4.4.0, xen support for these servers is still under heavy development.  For example, while live migration has been demo'ed for ARM, it is not slated for the 4.4 release according to the Xen roadmap.


Perhaps because of bootloader choices, x-gene appears to be closer to a shipping product.  There are already instructions on how to boot xen on an x-gene based server and I did not pursue this further.


I decided to see how hard it might be to boot xen on the a1100 by looking at booting xen under UEFI.  What I did was very roundabout, and this should be used as a starting place rather than as step-by-step instructions.  I only got as far as booting a dom0 kernel and did not take the time to compile the xen toolchain or boot a domU, though both of these have been done according to this xen.org blog post.


The a1100 development kit will be shipping with Fedora.  Fedora has documented their efforts porting to aarch64 and written a quick-start guide here that describes how to boot an emulated arm64 board with UEFI as the bootloader:


https://fedoraproject.org/wiki/Architectures/ARM/AArch64/QuickStart


I used this as a base for trying to boot xen under UEFI, though for reasons I'll go into later it might be better to try openSUSE (use aarch64-rootfs) or a debian variant. I did not try either.


UEFI can be used to either boot xen directly or to load grub2 which then loads xen.  The fedora image is using grub2, so I figured that this would probably be easiest to try.  


I mostly followed these instructions:


https://wiki.linaro.org/LEG/Engineering/Grub2/Xen_booting_on_FVP_Base_AEMv8A


Thank you Wei Fu!


The toolchain I actually used is 4.8-2013.0701, but the latest toolchain can be found at  http://releases.linaro.org/latest/components/toolchain/binaries/ - use "aarch64-linux-gnu".


For xen, I used the stable-4.4 branch and compiled using


make dist-xen XEN_TARGET_ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- CONFIG_EARLY_PRINT=fastmodel


The binary which goes onto the target can be found under dist/install/boot/ in the xen build directory.


To put xen on the target, it's easiest to attach the image to a loopback device (losetup /dev/loop0 img) and then use kpartx to break out the partitions (kpartx -av /dev/loop0).  For the fedora image, /dev/mapper/loop0p1 is the EFI partition, /dev/mapper/loop0p2 is the boot partition, and /dev/mapper/loop0p4 is the root file system.  The xen binary goes in the root directory of loop0p2, along with the kernels currently on there.
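The attach step, and the matching teardown once the image has been edited, can be sketched as a pair of shell functions.  This is only a sketch: the function names and the DRY_RUN switch are made up for illustration, and the real commands need root.

```shell
# Sketch only: attach_image, detach_image and DRY_RUN are illustrative names.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = image file, $2 = loop device (for example /dev/loop0)
attach_image() {
    run losetup "$2" "$1" && run kpartx -av "$2"
}

# $1 = loop device; unmount every partition before calling this
detach_image() {
    run kpartx -dv "$1" && run losetup -d "$1"
}
```

Running with DRY_RUN=1 first is a cheap way to check which loop device and image the commands will touch before doing it for real.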


We also need to add a device tree.  I found that the device tree definition which was already there wouldn't boot xen because the timer definition was incomplete, so I downloaded the most up-to-date one at


https://raw.githubusercontent.com/torvalds/linux/master/arch/arm64/boot/dts/foundation-v8.dts


With xen/grub as they are right now I was not able to boot the secondary CPUs, so in this file, remove the cpu@1 through cpu@3 nodes and leave only cpu@0.  To compile it, install device-tree-compiler and run "dtc -O dtb -o foundation-v8.dtb foundation-v8.dts".  Copy the resulting file into the boot partition.
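Since a leftover cpu node is easy to miss, a small guard around the dtc call may help.  The sketch below is an assumption-laden illustration (the function names are made up, and the grep pattern assumes nodes are written as "cpu@N {" as in the foundation-v8 source); it refuses to compile unless exactly one cpu node remains.

```shell
# Sketch only: count_cpu_nodes and build_dtb are illustrative names.

# Count lines that open a cpu@N node in a device tree source file.
count_cpu_nodes() {
    grep -c 'cpu@[0-9a-f]* *{' "$1"
}

# Compile foo.dts to foo.dtb, but only if a single cpu node is left.
build_dtb() {
    dts=$1
    dtb=${dts%.dts}.dtb
    n=$(count_cpu_nodes "$dts")
    if [ "$n" -ne 1 ]; then
        echo "expected 1 cpu node in $dts, found $n" >&2
        return 1
    fi
    dtc -O dtb -o "$dtb" "$dts"
}
```

Usage would be "build_dtb foundation-v8.dts", which writes foundation-v8.dtb next to the source.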


Then Linux needs to be compiled. I again used the instructions from the Linaro wiki article.  Copy the resulting Image to the boot partition.


Incidentally, this kernel is not the same as the Fedora kernel: besides being a different version, the Fedora kernel has EFI support.  The EFI patches in the Fedora kernel appear to still be outside of mainline, and the one branch I tried, uefi-for-3.16, didn't boot.  I did not try any other branches, nor did I try applying the patches to a kernel that boots.


The version of grub2 which is installed on the fedora image does not include multiboot support, so it needs to be replaced.  Multiboot support is not upstreamed, so (mostly) follow the instructions at the same wiki page.  The one exception is I found the default.cfg there did not work; this is what I used:


set root=(hd0,gpt1)

set prefix=($root)/EFI/fedora/


Mount the EFI partition, loop0p1, instead of the boot partition. I copied the resulting grub_v8.efi over EFI/fedora/grubaa64.efi.  Then I changed grub.cfg to be the following:


set pager=1
set timeout=5
menuentry 'ARM64 xen' {
   search --no-floppy --fs-uuid --set=root  4aa7fe0f-1bdc-4f41-8193-9562d2e5363e
   multiboot /xen no-bootscrub console=dtuart conswitch=x dtuart=serial0 dom0_mem=512M dom0_max_vcpus=1 debug=y
   module /Image root=/dev/vda4 ro  console=hvc0
   devicetree /foundation-v8.dtb
}


After this, undo the loopback device setup by unmounting all the partitions, running "kpartx -dv /dev/loop0" and then "losetup -d /dev/loop0".  Use efi-aarch64.sh to boot the Foundation model and you should get as far as systemd crashing and burning.  I didn't try the same kernel without xen, so the crash could just be related to the kernel, but trying something other than Fedora would also be a good idea.


Assuming AMD honors the GPL and the a1100 ships with a device tree that defines as much as the foundation-v8 model does, it appears that a minimal boot of a dom0 is likely doable with a couple of days of effort.

ext3, pvgrub and "block error -1 for op 0"

For testing purposes I was creating a large number of instances with very small root file systems. The partitions were formatted using ext3. About one out of twenty would fail to decompress bzImage with the message "block error -1 for op 0".  The files were all read correctly when the partition was mounted in the dom0.

When I increased the disk size from 32MiB to 64MiB and the partition size to 63MiB, the problem went away.  I also made sure the ratio between inodes and blocks was an integer, though this didn't seem to help.

Direct connect to HVM serial ports

Luke and I recently got this question:

"So with both pv and hvm domains you can pull out the virtual tty via xenstore-read /local/domain/[domain]/console/tty . However, with pv domains you can use the tty directly with something like screen /dev/tty/blah but with a hvm domain that ends up giving you a blank screen. What's even weirder is that when you look at the source of xm console it doesn't seem to differentiate between the two, and yet it works fine on both.

Any ideas?"

We are using xl on xen 4.3, not xm, on our test machine but in theory these should be doing mostly the same thing.

Using the command "strace -f xl console ubuntu-1 &> con" I found this in the output of strace:

[pid 30642] access("/dev/pts/2", R_OK|W_OK) = 0
[pid 30642] open("/dev/pts/2", O_RDWR|O_NOCTTY) = 8

Searching back for /dev/pts/2 I found the following:

[pid 30642] write(5, "/local/domain/356/serial/0/tty\0", 31) = 31

Therefore it looks like, for an HVM domain, the key to read is the serial one rather than console/tty:

/local/domain/[domain]/serial/[serial-num]/tty
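Putting that together, a direct connection might look like the sketch below.  The function names are made up for illustration, serial port 0 is assumed by default, and it relies on "xl domid" to turn a domain name into its numeric id.

```shell
# Sketch only: serial_tty_key and attach_serial are illustrative names.

# Build the xenstore key that holds the pty of an HVM guest's serial port.
# $1 = numeric domain id, $2 = serial port number (defaults to 0)
serial_tty_key() {
    echo "/local/domain/$1/serial/${2:-0}/tty"
}

# Look up the pty for a named domain and attach to it with screen.
# $1 = domain name, $2 = serial port number (defaults to 0)
attach_serial() {
    domid=$(xl domid "$1") || return 1
    tty=$(xenstore-read "$(serial_tty_key "$domid" "${2:-0}")") || return 1
    screen "$tty"
}
```

For the domain above this would be "attach_serial ubuntu-1", run in the dom0.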

migrate yourself, part 2.

Finally had a successful migration, end-to-end, without administrator intervention.

Several things developed over the course of this testing.

  1. we need config management more than ever.
  2. the migration script doesn't bother with /etc/sudoers.  this is a problem.
  3. full paths, please.
  4. corollary: sudo doesn't source .profile et al., so your path will stay the same.
  5. we still haven't caught all the error conditions that should cause us to bail out and run the cleanup function.

But still.  Success.  Now, if the rest of you on mares would just get moving, that would be just tops.  (And if you're on mares but didn't get the move email, drop a message on support@prgmr.com.)

(Also, a hearty "thank you" to user aahmedi, who tested the migration script over several frustrating back-and-forth intervals via email.  (Didn't work.  Try again now.  Still doesn't work.  Okay, try again.  Nope.  How about now?  And so forth.))

migrate yourself.

We're trying to move everyone off of mares because we are evil and oppressive hosts.  But I hate scheduling moves, especially since I've got a day job and don't like working nights.

So.  We have decided to throw technology at the problem.  I wrote up a move script based on the previously-existing (but still not rolled out) backup and restore scripts, and enabled it for some lucky users.

(Unfortunately, while this solves the scheduling problem, it introduces a new problem: users either wait for someone else to try it, or they hit amazing, horrible bugs.  Meanwhile I tear my hair out.)

But hey.  It's something.  We really do need to clear people off mares.  So if you got that email, you might want to give the move a shot.  (And your data is totally safe. . . in fact, extra-safe, because the move generates a backup.  Two backups if successful.)

Wow, been a while.  No explanation, no apologies.

Anyway.  Spent about a day working on trying to get the Sentry Power Tower XL working with powerman.  I'm not declaring victory here, because I still haven't got powerman working.  But I'm much farther along, and I got sufficiently annoyed at the lack of documentation to rant at Luke, whose remark was "you can fix this."

So.  Documentation on controlling the Sentry/Servertech Power Tower XL via SNMP, as written by someone who's never used SNMP before.

First, set up the box so that you can connect to it.  I used a Cisco-pinout RJ45 serial cable, 9600 8n1.  Power it up, sign in with the default username and password (admn/admn).  If necessary, reset the firmware by holding in the reset button next to the LCD on the front.

(Some notes: "front" is the side with all the plugs, regardless of how you actually choose to mount it.  Also, the reset button is an unlabeled pinhole.)

The online documentation on resetting was a little inaccurate, at least for the firmware version I originally had.  Nothing will happen when you initially push the button.  Hold down the button until the display changes to 3 horizontal lines, then release.  The display will change to one pair of horizontal lines in the middle of the display.

Now that we can log in, we need to upgrade the firmware.  The mechanism is pretty clever.  The box includes an ftp client that can download the firmware from a remote host and install it.  (It also includes an ftp server, apparently, but that's unrelated.)  We ran into trouble because we're on a masq'd network that doesn't allow active FTP.  If you don't have this problem, you can tell the box to download the firmware from ftp://ftp.servertech.com/pub/firmware/Sentry3/Version_5/v5.3/PTXL-PT2x-PT4x-48xx/ .

We downloaded the firmware locally and set up a quick FTP server.  Whatever.

Either way, set the ftp settings as appropriate.  Our settings are as shown:

                                                                               
Sentry: show ftp                                                               
                                                                               
   FTP Client Configuration                                                    
                                                                               
      Host:       172.16.10.206                                                
      Username:   prgmr                                                       
      Password:   ******                                                       
      Directory:  Downloads/                                                   
      Filename:   mrrpm-v53s.bin                                               
                                                                               
   FTP Automatic Update Configuration                                          
                                                                               
      Automatic Updates:  Disabled                                             
      Scheduled Day:      Everyday                                             
      Scheduled Hour:     12 AM                                                
                                                                               
   Command successful              

Then issue a 'restart ftpload'.  The box will download and flash its firmware.

Cool. Now you have a fully armed and operational Sentry Power Tower XL.  (Incidentally, much of this probably also applies to the Switched CDU.)

SNMP is going to be a little tougher, especially if you've never used it.  I had to install the appropriate software (on Ubuntu, apt-get install snmp worked fine).  Then I had to download MIBs, which translate the numeric keys returned by snmp into human-readable values.  To do this:

 * download mib file: wget ftp://ftp.servertech.com/pub/SNMP/sentry3/Sentry3.mib
 * copy mib file to snmp's search path: cp Sentry3.mib /usr/share/snmp/mibs
 * specify that you want to use this file: I did this by adding a "-m +Sentry3-MIB" to my snmp commands.  There are better ways.  Note that you use the name listed in the file, not the filename itself.
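One of those better ways, assuming the net-snmp command-line tools (which is what the Ubuntu snmp package installs): they read extra MIB module names from the MIBS environment variable, so setting it once in your shell avoids repeating -m on every command.

```shell
# Assumes the net-snmp tools; the leading + adds the module to the default list
# instead of replacing it.
export MIBS=+Sentry3-MIB
```

The same setting can be made permanent with a "mibs +Sentry3-MIB" line in snmp.conf.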

Now you should be able to get useful information.  Try:

$ snmpwalk -v2c -c public -m +Sentry3-MIB <host>

This should give you a long list of keys, including stuff like power usage.  Congratulations.

You should also be able to control the outlets.  This was where I got extremely annoyed and started ranting, because I had to resort to guessing magic numbers to figure this out.  There's probably some documentation somewhere, but I couldn't find it.  To spare you the trouble:

The OID you want is: Sentry3-MIB::outletControlAction.1.1.<outlet number>

The magic numbers are: 1 (on), 2 (off), 3 (reboot).

In our case, the appropriate invocation to reboot outlet 4 was:

$ snmpset -v2c -c private -m +Sentry3-MIB 172.16.10.242 Sentry3-MIB::outletControlAction.1.1.4 i 3

That's the translated OID, "i" for integer, and the value we want to set, "3".
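To avoid remembering the magic numbers, the call can be wrapped in a small shell function.  This is a sketch, not a tested tool: the function names, the DRY_RUN switch and the COMMUNITY variable are made up for illustration, and the .1.1 prefix in the OID is simply what worked on our unit.

```shell
# Sketch only: outlet_action_code, outlet_control, DRY_RUN and COMMUNITY
# are illustrative names, not part of any existing tool.

# Map an action name to the integer that outletControlAction expects.
outlet_action_code() {
    case "$1" in
        on)     echo 1 ;;
        off)    echo 2 ;;
        reboot) echo 3 ;;
        *)      echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# $1 = host, $2 = outlet number, $3 = on|off|reboot
# With DRY_RUN=1 the snmpset command is printed instead of executed.
outlet_control() {
    host=$1
    outlet=$2
    code=$(outlet_action_code "$3") || return 1
    oid="Sentry3-MIB::outletControlAction.1.1.$outlet"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo snmpset -v2c -c "${COMMUNITY:-private}" -m +Sentry3-MIB "$host" "$oid" i "$code"
    else
        snmpset -v2c -c "${COMMUNITY:-private}" -m +Sentry3-MIB "$host" "$oid" i "$code"
    fi
}
```

With this, the reboot above becomes "outlet_control 172.16.10.242 4 reboot".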

Now to actually figure out how to put powerman in front of this.

New update on prgmr.com servers

blog.prgmr.com and wiki.prgmr.com may now be accessed over IPv6.  We also now have SSHFP records in DNS for all of the dom0s.  If you notice any missing, please let support know.


birds down hard

replacing a drive;  older system without hot swap. 

packet loss at svtix.

we've had packet loss at svtix all day.  my provider tells me:

"Looks like there is packet loss reaching the IP through HE.net. We are investigating the issue."

edit:  looks like the problem is our fault, or at least is in our equipment.   we're working on it now.

fair and balanced.

Reminded Luke that someone who didn't know us might suspect from his blog that we run an extremely unstable service!  That would be a travesty.  I suggested that he make a daily post: "nothing to report" or "doin' fine" or "10 days without a fatal accident" or something like that.
