Recently in new features Category

Distros updated

We now have images for:

Fedora 17
CentOS 6.3

The Debian image has been updated from 6.0.0 to 6.0.3

I've also put a distrolist file in /distros on each of the dom0s where I've updated /distros, so you can tell what version the images on your dom0 are (they are always named distroXX.tar.gz, where XX is 32 or 64, for 32- or 64-bit).
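If you want to check what your dom0 has (assuming its /distros has already been updated), something like this will show the list and the image tarballs:

# cat /distros/distrolist
# ls -lh /distros/*.tar.gz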

We will eventually put the new images in /distros on every dom0; if you want them sooner, please email support@prgmr.com.

debian mirror

Our current Debian images are configured to use mirrors.kernel.org in /etc/apt/sources.list for package updates. It's normally a reliable server, but because it's down right now, we have set up mirrors.prgmr.com (running apt-cacher-ng on a VPS for now) for Debian packages. Eventually we plan to set up a dedicated server with a full Debian, CentOS and Ubuntu mirror, but this will help while kernel.org is down and until we get the hardware for the full mirror. To use mirrors.prgmr.com, set this in /etc/apt/sources.list:
deb http://mirrors.prgmr.com/debian/ squeeze main
deb-src http://mirrors.prgmr.com/debian/ squeeze main

deb http://security.debian.org/ squeeze/updates main
deb-src http://security.debian.org/ squeeze/updates main
Or just search and replace mirrors.kernel.org with mirrors.prgmr.com. If you are running Debian lenny, make sure your sources.list still says lenny (not squeeze) until you are ready to upgrade. Email support@prgmr.com if you have any questions. Thanks!
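If you'd rather do that search and replace from the command line, something like this should do it:

# sed -i 's/mirrors\.kernel\.org/mirrors.prgmr.com/g' /etc/apt/sources.list
# apt-get update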

on logging serial consoles.


So every now and again a customer will complain of a crashing domain. Occasionally, it is an early sign of a hardware problem that I need to deal with, so I don't want to just ignore it.

Now, the problem is that, as with a physical server, once the domain has rebooted, most of the information about why it crashed is gone. (What little is left is in /var/log on the guest, and as a general rule we don't like mucking around in the guest; that's your business, not ours.)

Now, on a physical server, we solve this by using a logging serial console. (I recommend Opengear if you have the money, and a used Cyclades if you don't. The 'buddy system' (making one server the console server for the next, then the next server the console server for the first) usually requires adding USB serial dongles, but is cheaper still for installations with only a few servers. I personally like the IOgear USB -> serial dongles Fry's has.)

I can turn on debug logging in xenconsoled, and that will log the console for all domains to a file (one file for each domain); then I can use those logs to troubleshoot the problem. The thing is, apparently some people have privacy concerns with this, so I haven't done it yet.
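On the Xen versions I've looked at, xenconsoled takes --log and --log-dir flags (check xenconsoled --help on yours). Restarted by hand it would look something like this, with one log file per guest showing up under the log directory; existing console sessions would need to reconnect:

# pkill xenconsoled
# xenconsoled --log=guest --log-dir=/var/log/xen/console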

Now, personally, I don't think serial consoles are that sensitive. I mean, it's common to leave terminals in data centers where passers-by can see the output. The logs would let me see what program is crashing, which may be sensitive, and depending on how you have the thing configured, I could see when people log in and log out.

So, I have several options.

  1. I could leave it as is, and continue to go back and forth guessing when someone asks me why something crashed after a reboot
  2. I can log all consoles and delete the data once a week or once a month
  3. I can apply a patch to log some people's consoles and not others, and let the user decide

Obviously, option 2 makes my life a /whole lot/ easier. Option 3 is better than option 1, but it still means maintaining an out-of-tree xenconsoled (or pushing the patch upstream).
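For what it's worth, the cleanup half of option 2 is trivial. Assuming the logs end up somewhere like /var/log/xen/console (the path is an assumption, not something we've deployed), a weekly cron job along these lines would do it:

#!/bin/sh
# /etc/cron.weekly/flush-console-logs (hypothetical): drop guest console logs older than a week
find /var/log/xen/console -type f -mtime +7 -delete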

Downtime should be under 10 minutes. You are being moved onto the new server, which means that our new, low-priced plans are available. (I'm not done tweaking the prices/RAM; I will probably even out the curve a little, but the price per megabyte of RAM won't be going up.)

I will be removing two of our legacy servers to make room for our new server. (One of them doesn't have customers on it; only customers on hydra should be impacted by this.)

It has begun; hind is going down now. After that, hydra.

No, I've not set up non-PAE servers, but NetBSD-CURRENT has supported x86_64 and i386-PAE (as a DomU only) for some time now. I just got around to testing and setting it up.

full instructions here: http://book.xen.prgmr.com/mediawiki/index.php/NetBSD_as_a_DomU

Or, if you are a new customer, I've included the NetBSD install kernel in the menu.lst of all my Debian x86_64 images.
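For reference, the extra menu.lst entry is along these lines; treat the kernel filename as a sketch (it depends on which snapshot you grab; the wiki page above has the real instructions):

title NetBSD install
root (hd0,0)
kernel /boot/netbsd-INSTALL_XEN3_DOMU.gz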

It seems to work (though it is still -current); it hasn't crashed on me yet.


IPv6 RDNS setup


So IPv6 rdns is a little different from IPv4 rdns. With IPv4 rdns, you split the IP address on byte boundaries, reverse the octets, and append in-addr.arpa. All data is represented in decimal, and you don't pad zeros. For example, to get the IPv4 rdns of 216.218.223.67 you look for a PTR record named 67.223.218.216.in-addr.arpa.

IPv6 rdns is similar on the surface: instead of in-addr.arpa you append ip6.arpa, but you split the address up in unexpected ways, too:

IPv6 addresses are written out in hex, in two-byte chunks separated by colon characters. IPv6 rdns writes out the full address in hex, padding out all the zeros, then splits it into 4-bit chunks (single hex characters) and reverses those.

So to get the rdns of, say, ns2.prgmr.com, IPv6 address 2001:470:1:41:a800:ff:fe50:3143,
you would look for a PTR record that looks like this: 3.4.1.3.0.5.e.f.f.f.0.0.0.0.8.a.1.4.0.0.1.0.0.0.0.7.4.0.1.0.0.2.ip6.arpa.

Note that you must pad out the zeros so that each two-byte chunk separated by the ':' character is represented by four hex characters.
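If you want to build the name by hand, a quick and dirty way (given the address already written out with all the zeros padded in) is:

# addr=2001:0470:0001:0041:a800:00ff:fe50:3143
# echo "$(echo $addr | tr -d ':' | rev | sed 's/./&./g')ip6.arpa."
3.4.1.3.0.5.e.f.f.f.0.0.0.0.8.a.1.4.0.0.1.0.0.0.0.7.4.0.1.0.0.2.ip6.arpa.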

Of course, dig -x does this for us....


 dig -x 2001:470:1:41:a800:ff:fe50:3143

; <<>> DiG 9.3.4-P1 <<>> -x 2001:470:1:41:a800:ff:fe50:3143
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 20189
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;3.4.1.3.0.5.e.f.f.f.0.0.0.0.8.a.1.4.0.0.1.0.0.0.0.7.4.0.1.0.0.2.ip6.arpa

traffic shaping: yet again.

[image: day-shaped.png]

Traffic shaping seems to have been vaguely successful.  We're trying EXTREMELY HARD to keep our 95th percentile below 10mbit, since it appears that going beyond that will cost us incredible added monies.

Note that I don't really want to limit to the ~2mbit outgoing that's being used now -- I'm not sure why the outgoing traffic is limited so hard, and I haven't got a test domain on that box to find out.  The goal is to put the dangerous customers into an "overage class" that gets a shared 10mbit, and let them fight over it -- but they should be getting 10mbit.  I'm theorizing that maybe their traffic relies on high incoming bandwidth -- but I don't really know.  I should be able to test once Luke wakes up.
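Roughly, the overage class would just be one more HTB class (see the tc setup a couple of entries further down the page) that several fw marks all point at; the class and mark numbers here are arbitrary:

# tc class add dev peth0 parent 1:1 classid 1:66 htb rate 10mbit ceil 10mbit
# tc filter add dev peth0 protocol ip parent 1:0 prio 1 handle 5 fw flowid 1:66
# tc filter add dev peth0 protocol ip parent 1:0 prio 1 handle 6 fw flowid 1:66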

The next thing I want to do in this area is generalize the script somewhat and make the domain creation scripts call it at boot.  I'm not sure what the done thing is here -- I'm guessing making it a vif parameter would be best, then having that pass through vif-bridge and call a vif-qos script (or similar.)

Okay, I guess that's what I'll do then.
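Something like this is what I have in mind, as a sketch only; the script name is made up, and whether your Xen hands the hotplug script the interface name in $vif (or makes you dig it out of $XENBUS_PATH) varies by version, so check before trusting it:

#!/bin/sh
# /etc/xen/scripts/vif-qos (hypothetical): named via script= in the domain's vif= line.
# Do the normal bridge setup first, then hang a default rate limit off the new vif.
dir=$(dirname "$0")
"$dir/vif-bridge" "$@"
if [ "$1" = "online" ]; then
    tc qdisc add dev "$vif" root tbf rate 1mbit latency 50ms burst 20k
fi

The config side would then be something like vif = [ 'bridge=xenbr0, script=vif-qos' ], possibly with an extra key of our own for the per-domain quota.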

traffic shaping: round 2

Started whacking domains with the b&width hammer, as per the directions noted a couple entries ago.

Further refinements will probably involve putting quotas directly in config files, with scripts to parse and automatically set limits at domain creation.  Ideally I'd also write a tool to re-assign domains to existing classes.

This title is, of course, a complete lie.  It looks like our disk layout scheme broke LVM snapshots.  To quote from our testing:

# lvcreate -s -L 100M -d hydra_domU/test -n test_snap
  Snapshots and mirrors may not yet be mixed.

That's some real well-supported technology there.  Google gives me two results for that error message, both of which are source diffs.

I'm not sure what to do about this.  Every so often I feel like abandoning LVM mirroring entirely and moving to LVM on MD, but that didn't exactly fill us with joy either.

I'm also considering bypassing the LVM-specific snapshot implementation and using the device mapper directly, but that worries me.  I would want to know why snapshots and mirrors can't be mixed before implementing snapshots anyway.
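For the morbidly curious, driving the device mapper by hand looks roughly like this. This is a sketch, not something we're running: test_cow stands in for a spare LV set aside to hold the copy-on-write data, and in real life you'd want writers quiesced and pointed at the snapshot-origin mapping while you set it up.

# sz=$(blockdev --getsz /dev/hydra_domU/test)    # origin size in 512-byte sectors
# dmsetup create test_origin --table "0 $sz snapshot-origin /dev/hydra_domU/test"
# dmsetup create test_snap --table "0 $sz snapshot /dev/hydra_domU/test /dev/hydra_domU/test_cow P 16"

The last two arguments to the snapshot target are "persistent" and the chunk size in sectors.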

Today I put my money where my mouth is and worked on traffic shaping.  I'm not 100% sure that this setup is correct -- we'll have to test it more before we put it in production.  Tentatively, though, here's how it works:

We're doing everything in the dom0.  Traffic shaping is, after all, a coercive technical solution.  Doing it in customer domUs would be silly.

First, we have to make sure that the packets on xenbr0 traverse iptables:

# echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables

This is so that we can mark packets according to which domU emitted them.  (There are other reasons, but that's the important one in terms of our traffic-shaping setup.)
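(That setting doesn't survive a reboot, so if it's staying, the usual fix is a line in /etc/sysctl.conf, assuming the bridge module is loaded by the time sysctl runs:)

net.bridge.bridge-nf-call-iptables = 1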

Next, we limit incoming traffic.  This is the easy part.  To limit vif "baldr" to 1mbit /s, with bursts up to 2mbit and max allowable latency 50ms:

# tc qdisc add dev baldr root tbf rate 1mbit latency 50ms peakrate 2mbit maxburst 40MB

This adds a queuing discipline, or qdisc, to the device "baldr".  Then we specify where to add it ("root",) and what sort of qdisc it is ("tbf").  Finally we specify the rate, latency, burst rate, and amount that can go at burst rate.
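To check that the qdisc took, and to watch its counters as the domU pulls traffic:

# tc -s qdisc show dev baldr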

Next we work on limiting outgoing traffic.  The policing filters might work, but they handle the problem by dropping packets, which is. . . bad.  Instead we're going to apply traffic shaping to the outgoing physical Ethernet device, peth0.

First, for each domU, we add a rule to mark packets from that network interface:

# iptables -t mangle -A FORWARD -m physdev --physdev-in baldr -j MARK --set-mark 5

Here the number 5 is an arbitrary integer.  Eventually we'll probably want to use the domain id, or something fancy.  We could also simply use tc filters directly that match on source IP address, but it feels more elegant to have everything keyed to the domain's "physical" network device.  Note that we're using physdev-in -- traffic that goes out from the domU comes in to the dom0.
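The per-rule packet counters are a quick way to confirm the marking is actually happening:

# iptables -t mangle -L FORWARD -n -v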

Next we create an HTB qdisc.  We're using HTB because it does what we want and has good documentation (available at http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm .)  We won't go over the HTB options in detail, since we're just lifting examples from the tutorial at this point:

# tc qdisc add dev peth0 root handle 1:  htb default 12

Then we make some classes to put traffic into.  Each class will get traffic from one domU.  (As the HTB docs explain, we're also making a parent class so that they can share surplus bandwidth.)

# tc class add dev peth0 parent 1: classid 1:1 htb rate 100mbit
# tc class add dev peth0 parent 1: classid 1:2 htb rate 1mbit

Now that we have a class for our domU's traffic, we need a filter that'll assign packets to it.

# tc filter add dev peth0 protocol ip parent 1:0 prio 1 handle 5 fw flowid 1:2

At this point traffic to and from the target domU is essentially shaped.  To prove it, we copied a 100MB file out, followed by another in.   Outgoing transfer speed was 203.5KB/s, while incoming was about 115KB/s.

This incoming speed is as expected, but the outgoing rate is a bit high.  Still, though, it's about what we're looking for.  Tomorrow we'll test this with more machines and heavier loads.
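A quick way to see which class the bytes are actually landing in while a test transfer runs:

# tc -s class show dev peth0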
