<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Xen hosting: Lessons from the Trenches</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/" />
    <link rel="self" type="application/atom+xml" href="http://blog.prgmr.com/xenophilia/atom.xml" />
    <id>tag:blog.prgmr.com,2008-03-02:/xenophilia/2</id>
    <updated>2013-05-23T22:10:58Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.37</generator>

<entry>
    <title>Hey, make sure you disable tso and gso in your guest</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/hey-make-sure-you-disable-tso.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.418</id>

    <published>2013-05-23T22:03:30Z</published>
    <updated>2013-05-23T22:10:58Z</updated>

    <summary><![CDATA[so yeah, srn found a bug;&nbsp;&nbsp;&nbsp; the NIC offloading stuff has never worked properly for virtual guests... but with the latest RHEL/CentOS kernel it's gone from 'you drop a few packets every now and then'&nbsp; to "takes down your...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[so yeah, srn found a bug;&nbsp;&nbsp;&nbsp; the NIC offloading stuff has never worked properly for virtual guests... but with the latest RHEL/CentOS kernel it's gone from 'you drop a few packets every now and then'&nbsp; to "takes down your guest entirely if you send just one packet"&nbsp;&nbsp; <br /><br />So yeah, uh,&nbsp; we'll change the starting image to add <br /><br />ethtool -K eth0 tso off gso off&nbsp;
<br /><br />to /etc/rc.local.&nbsp;&nbsp; Please do the same on your guest.<br /><br />details from srn:<br /><br />4 separate domu's have been seeing an instance of this bug - probably more will do so as they upgrade:
<br />
<br /><a class="moz-txt-link-freetext" href="http://xen.crc.id.au/bugs/view.php?id=3">http://xen.crc.id.au/bugs/view.php?id=3</a>
<br />
<br />This behavior on the dom0 side (disconnecting when it sees a packet that is too large) was 
introduced in 2.6.18-348.4.1.el5.&nbsp; It is not present in 2.6.18-348.3.1.el5.&nbsp; It is still present in 
2.6.18-348.6.1.el5 (latest.)
<br />
<br />40 of our servers have 2.6.18-348.4.1.el5.
<br />
<br />There is a bug fix:
<br />
<br /><a class="moz-txt-link-freetext" href="http://lists.xen.org/archives/html/xen-devel/2013-04/msg01328.html">http://lists.xen.org/archives/html/xen-devel/2013-04/msg01328.html</a>
<br />
<br />But I don't know what the status of that is WRT centos.&nbsp; I guess this redhat bug is related:
<br /><a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=957231">https://bugzilla.redhat.com/show_bug.cgi?id=957231</a>
<br />
<br />But without a redhat account we can't look.
<br />
<br />domu's can work around this (apparently with some performance impact) by running
<br />
<br />ethtool -K eth0 tso off gso off
<br />
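<br />If you want to check whether the workaround actually took on your guest, here is a minimal sketch in Python. It assumes that ethtool -k eth0 (lower-case k) lists one feature per line as "name: on" or "name: off"; older ethtool builds print the feature names with spaces and newer ones with hyphens, so the sketch accepts both. The function names are made up for illustration, and this is not something we ship in the image.
<br />
<pre><code>import subprocess

# Feature names as printed by `ethtool -k`. Hyphens are normalized to
# spaces before matching so both old and new output styles are accepted.
WANTED_OFF = ("tcp segmentation offload", "generic segmentation offload")


def offloads_still_on(ethtool_output):
    """Return the features from WANTED_OFF that the output reports as on."""
    still_on = []
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name = name.strip().replace("-", " ")
        state = value.split()
        if name in WANTED_OFF and state and state[0] == "on":
            still_on.append(name)
    return still_on


def check_interface(iface="eth0"):
    """Run `ethtool -k` on iface (assumed to exist) and report what is still on."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return offloads_still_on(result.stdout)
</code></pre>
<br />An empty list from check_interface("eth0") means both offloads are reported off; anything else means the ethtool -K line above has not been applied yet.
<br />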
<br />Considering we have 40 servers running 348.4.1 and only 4 people have been affected, is the best thing 
to do just to send a note out to the announce list / the blog and throw swatch on the console logs?
<br />
<br />I may poke at the centos virt mailing list and ask if they know if there's a timeline for applying 
the patch to netback I linked to above.
<br />
<br />]]>
        
    </content>
</entry>

<entry>
    <title>dhcp problems (now fixed)</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/dhcp-problems-now-fixed.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.417</id>

    <published>2013-05-21T21:38:23Z</published>
    <updated>2013-05-21T21:44:18Z</updated>

    <summary>The following subnets may have had issues acquiring a dhcp lease from approximately 03:30 to 14:30 PST 2013-05-21: 71.19.145, 71.17.146, 71.19.147, 71.19.149, 71.19.150, 71.19.154, 71.19.156, 71.19.157. The problem is we are running dhcrelay on those subnets, which forwards dhcp requests to another server, and they were using a cached...</summary>
    <author>
        <name>srn</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[The following subnets may have had issues acquiring a dhcp lease from approximately 03:30 to 14:30 PST 2013-05-21:<br /><br />71.19.145<br />71.17.146<br />71.19.147<br />71.19.149<br />71.19.150<br />71.19.154<br />71.19.156<br />71.19.157<br /><br />The problem is that we run dhcrelay on those subnets, which forwards dhcp requests to another server, and the relays were using a cached dns lookup that pointed to the dhcp server's old ip address rather than its current one.<br /> ]]>
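        <![CDATA[<br />For anyone running a relay of their own, a rough sketch in Python of the kind of check that would have caught this: compare the address the relay started with against what dns says now. The function name, the hostname and the addresses are all placeholders for illustration, not our real setup, and this is not a tool we actually run.<br />
<pre><code>import socket


def relay_target_is_stale(cached_ip, server_name, resolve=socket.gethostbyname):
    """Return True when the address the relay remembers no longer matches dns.

    cached_ip is whatever the relay resolved when it started; server_name is
    the dhcp server's hostname. resolve can be swapped out for testing.
    """
    return resolve(server_name) != cached_ip
</code></pre>
<br />If this returns True, the relay is still forwarding to the old address and needs to be restarted or pointed at the server's ip directly.<br /> ]]>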
        
    </content>
</entry>

<entry>
    <title>Outage in rack 05-11 - Luke&apos;s report</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/outage-in-rack-05-11.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.416</id>

    <published>2013-05-18T09:56:44Z</published>
    <updated>2013-05-18T10:35:33Z</updated>

    <summary><![CDATA[NOTE:&nbsp; please read the report written by srn.&nbsp; It's better: http://blog.prgmr.com/xenophilia/2013/05/unplanned-downtime-in-santa-cl.html So, a co-lo customer brings in a desktop so cheap that it has a manual 120-240 switch.&nbsp; My co-lo, of course, is 208. Everything made in the last 10 years auto-switches 100-240v. But...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[<br />NOTE:&nbsp; please read the report written by srn.&nbsp; It's better:<br /><br />http://blog.prgmr.com/xenophilia/2013/05/unplanned-downtime-in-santa-cl.html<br /><br />So, a co-lo customer brings in a desktop so cheap that it has a manual 120-240<br />switch.&nbsp; My co-lo, of course, is 208.<br /><br />Everything made in the last 10 years auto-switches 100-240v. But not this<br />garbage.<br /><br />Anyhow, he plugged it in, and this destroyed my PDU.&nbsp; We went to the<br />office and grabbed a spare, mounted it, and you should be back up (the<br />power cord situation in rack 05-11 is now much, much worse, due to plugging<br />it back in during a panic.)<br /><br />This is my fault, of course;&nbsp; I thought this person was pretty competent,<br />and I had a little room left in 05-11 and nowhere else, so I put him in with<br />the production stuff.&nbsp;&nbsp; Clearly, this was a mistake. <br /><br />Interestingly, this was on a sub-pdu (that was plugged in via a c19-&gt;c20;&nbsp; my main pdu has several c19 outlets as well as a bunch of c13)&nbsp; - and the sub-pdu works just fine still;&nbsp; it was the main PDU that completely fried. &nbsp; I need to learn more about electricity. &nbsp;&nbsp; &nbsp; <br /><br />anyhow, you should be back up now.&nbsp; I'm sorry.<br /><br /> ]]>
        
    </content>
</entry>

<entry>
    <title>unplanned downtime in santa clara</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/unplanned-downtime-in-santa-cl.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.415</id>

    <published>2013-05-18T08:33:54Z</published>
    <updated>2013-05-18T09:50:05Z</updated>

    <summary><![CDATA[UPDATE 2:49 PST: All servers should be back up. UPDATE: The problem was a dedicated server customer plugged in a computer with a configurable power supply at the wrong setting (120V instead of 240V) and it blew out the pdu.&nbsp; We...]]></summary>
    <author>
        <name>srn</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[UPDATE 2:49 PST: All servers should be back up.<br /><br />UPDATE: The problem was a dedicated server customer plugged in a computer with a configurable power supply at the wrong setting (120V instead of 240V) and it blew out the pdu.&nbsp; We picked up another pdu from the office and are installing it right now.<br />&nbsp;<br />Following servers are down, starting at about 01:10 PST:<br /><br />sword<br />chariot<br />branch<br />seashell<br />knife<br />cauldron<br />chessboard<br />horn<br />fuller<br />coins<br />council<br />bowl<br />waite<br />whetstone<br />taney<br />mantle<br />lozenges<br />rutledge<br />pearl<br /><br />Reasons are unknown (to me at least) at this time - will post with further updates as I have them.<br /> ]]>
        
    </content>
</entry>

<entry>
    <title>/etc/resolv.conf</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/etcresolvconf.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.414</id>

    <published>2013-05-16T12:54:46Z</published>
    <updated>2013-05-16T13:04:41Z</updated>

    <summary>Please replace any existing nameserver entries with the following: nameserver 71.19.155.120, nameserver 71.19.145.215...</summary>
    <author>
        <name>srn</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[Please replace any existing nameserver entries with the following:<br /><br />nameserver 71.19.155.120<br />nameserver 71.19.145.215 <br /><br /><br /> ]]>
        
    </content>
</entry>

<entry>
    <title>dhcp has been changed to the renumbered ips, dns to change shortly</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/dhcp-has-been-changed-to-the-r.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.413</id>

    <published>2013-05-16T11:31:31Z</published>
    <updated>2013-05-16T12:22:49Z</updated>

    <summary>In short, for those people who have been renumbered: username.xen.prgmr.com is becoming username.old.xen.prgmr.com; username.new.xen.prgmr.com is becoming username.xen.prgmr.com; reverse dns mappings should be preserved. The old IPs are only around for another day or so, so if you are not pingable on username.new.xen.prgmr.com PLEASE take...</summary>
    <author>
        <name>srn</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[In short, for those people who have been renumbered<br /><br />username.xen.prgmr.com is becoming username.old.xen.prgmr.com<br />username.new.xen.prgmr.com is becoming username.xen.prgmr.com<br /><br />reverse dns mappings should be preserved.<br /><br />The old IPs are only around for another day or so, so if you are not pingable on username.new.xen.prgmr.com PLEASE take action.&nbsp; <br /><br />We intend to send a follow-up email for those VPSes which are still not responding, but you are responsible for making sure your server has the correct network configuration; we do not edit files on your machine.&nbsp; <br /><br />As a side note, if you do not remember which server your vps is hosted on, current users should be able to log in as &lt;user&gt;@&lt;user&gt;.console.xen.prgmr.com<br /> ]]>
        
    </content>
</entry>

<entry>
    <title>Support</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/support.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.412</id>

    <published>2013-05-15T02:47:47Z</published>
    <updated>2013-05-15T02:54:06Z</updated>

    <summary><![CDATA[Just to let you know, support is getting a little backed up with all the moving going on and the IP address renumbering.&nbsp; Please be patient.&nbsp; We are attempting to get back to normal as soon as possible. If you need...]]></summary>
    <author>
        <name>Nicholas Bebout</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[Just to let you know, support is getting a little backed up with all the moving going on and the IP address renumbering.&nbsp; Please be patient.&nbsp; We are attempting to get back to normal as soon as possible.<br /><br />If you need something urgently, please go to #prgmr on Freenode IRC and ask for prgmrcom (Luke), srn_prgmr (Sarah), danihan (Dan), lifftchi (Chris), nb (Nick), or _will_ (Will) and give us your ticket number, and we will attempt to take care of it as soon as we can.&nbsp; Otherwise, we are attempting to get through the ticket queue as quickly as possible.<br /> ]]>
        
    </content>
</entry>

<entry>
    <title>Additional unplanned downtime</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/additional-unplanned-downtime.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.411</id>

    <published>2013-05-13T07:28:11Z</published>
    <updated>2013-05-13T07:34:58Z</updated>

    <summary><![CDATA[so yeah, we have a handful of dell C6100s in the bottom of each of the racks. &nbsp; These are me working on a deal where I help unixsurplus lease some of their stuff.&nbsp; Anyhow, this evening around 8:00pm my...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[so yeah, we have a handful of dell C6100s in the bottom of each of the racks. &nbsp; These are me working on a deal where I help unixsurplus lease some of their stuff.&nbsp; Anyhow, this evening around 8:00pm my time, Miles was swapping out one of the units.&nbsp; And here is why it's a bad idea to put these things on the bottom of the rack; &nbsp; the 4 blades all come out the back... and the L6-30 is right there in the way. &nbsp; Well, the upshot is that power got disconnected for the whole rack. &nbsp; <br /><br />This outage was exacerbated by me just doing the minimum to get servers up last night.&nbsp;&nbsp; some of the servers, for instance, defaulted to non-xen linux in the grub config (we often do this when troubleshooting the serial console: we manually select the xen&nbsp; kernel, and then fix menu.lst after the box is up.&nbsp; Well, I didn't fix menu.lst when I was configuring these this morning.)&nbsp; <br /><br /><br />So yeah, what are we going to do to fix this problem?&nbsp;&nbsp; well, first, I think the importance of being careful around power connectors has been impressed upon miles.&nbsp; but more importantly, now we are moving all the c6100 units up to mid rack height.&nbsp;&nbsp; <br /> ]]>
        
    </content>
</entry>

<entry>
    <title>wow... that was a lot more than an hour.</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/wow-that-was-a-lot-more-than-a.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.410</id>

    <published>2013-05-12T15:40:51Z</published>
    <updated>2013-05-12T15:45:53Z</updated>

    <summary><![CDATA[Wow.&nbsp; That was a lot more than an hour of downtime.&nbsp;&nbsp; I'm sorry.&nbsp;&nbsp; Several things went wrong.&nbsp; You know all those plans I was talking about with regards to having everything set and ready to go?&nbsp;&nbsp; Well, first? the console server wasn't all...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[Wow.&nbsp; That was a lot more than an hour of downtime.&nbsp;&nbsp; I'm sorry.&nbsp;&nbsp; Several<br />things went wrong.&nbsp; You know all those plans I was talking about with<br />regards to having everything set and ready to go?&nbsp;&nbsp; Well, first?<br />the console server wasn't all the way set up.&nbsp;&nbsp;&nbsp; Next?&nbsp; turns out, I had 5<br />pre-made opengear dongles, not 10 like I needed;&nbsp; I of course have plenty<br />of unconfigured db9-&gt;rj45 adaptors about, but the instructions I had were<br />by color (which is stupid, as there aren't standard colors for the<br />rj45-&gt;db9 dongles, and my new ones are different)&nbsp; and it took me a while<br />to figure out how to make a proper opengear rj45-&gt;db9 dongle.<br /><br />Then, the rest?&nbsp; mostly just being stupid and tired.&nbsp;&nbsp; Like we swapped the<br />network ports (which I had pre-configured) then stood around like idiots<br />wondering why the thing boots so slowly.<br /><br />Anyhow, I've got a 5 year lease on my four 5kw racks here in coresite<br />santa clara, so for you?&nbsp; the ordeal is over.&nbsp; Yes, at some point I'll<br />want to move you on to better hardware, but that usually comes with better<br />performance and usually more disk/ram and stuff, and often it can be done<br />at the user's convenience.&nbsp; (that's what I told the users who moved, anyhow.)<br /><br /><br />Anyhow, I think I'm not moving anything tonight.&nbsp; I will need to clean up the<br />co-lo, but most of my mistakes here were of the tired and stupid variety. &nbsp; <br /><br /><br /> ]]>
        
    </content>
</entry>

<entry>
    <title>so, uh, sorry about breaking networking on wilson</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/so-uh-sorry-about-breaking-net.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.409</id>

    <published>2013-05-11T13:07:34Z</published>
    <updated>2013-05-11T13:36:09Z</updated>

    <summary><![CDATA[I'm... kind of an idiot. &nbsp; See, I'm up writing this instead of sleeping, when I'm clearly stupid at this point. &nbsp; but yeah.&nbsp; Thing is?&nbsp; the new network consists of a force10 10G switch (4x fxp, 24xcx4) for all...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[I'm... kind of an idiot. &nbsp; See, I'm up writing this instead of sleeping, when I'm clearly stupid at this point. &nbsp; but yeah.&nbsp; Thing is?&nbsp; the new network consists of a force10 10G switch (4x fxp, 24xcx4) for all four racks, plus one woven trx100 (4 port 10g cx4, 48 port 1g rj45) switch per rack.&nbsp; I'm using lacp to aggregate the 10g links into one 40G link (it actually works pretty well: because you all have different mac addresses, the traffic is well-balanced.&nbsp;&nbsp; a common problem with layer two link aggregation is that usually it uses the source and dest mac addresses to figure out which link to send a packet down, so it's easy to get unbalanced traffic... but you all have different mac addresses, so it spreads traffic as well as a much larger network would.) <br /><br />Anyhow, uh, yeah.&nbsp;&nbsp; lemme check my notes.<br /><br />okay, on the woven?&nbsp; to put a port on a vlan, you do something like this:<br /><br />interface 0/3<br />vlan pvid 30<br /><br /><br />or so.&nbsp;&nbsp;&nbsp; Then, on the aggregate, you do something like:<br /><br />on the woven:<br />interface 3/1<br />vlan acceptframe vlanonly<br />vlan participation exclude 1<br />vlan participation include 30,33<br />vlan tagging 30,33<br />lacp collector max-delay 0<br />exit <br /><br /><br />See, I thought that the 'vlan tagging' thing was for access-mode links, to tag incoming packets that don't already have a vlan header with a particular vlan, so I removed (!) it&nbsp;&nbsp;&nbsp; from a working (!)&nbsp; vlan.&nbsp; so clearly, it was a stupid, stupid, undercaffeinated mistake.&nbsp;&nbsp; <br /><br />I figured the problem out shortly after making myself a cup of coffee.&nbsp;&nbsp; <br />]]>
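        <![CDATA[<br /><br />On the link aggregation bit above: here is a toy sketch in Python of how picking a link from the source and dest mac addresses spreads traffic. I don't know the actual hash either switch uses, so the XOR-of-the-last-byte rule, the pick_link name and every address here are assumptions for illustration only.<br />
<pre><code>from collections import Counter


def pick_link(src_mac, dst_mac, n_links=4):
    """Pick a member link from the XOR of the last byte of each mac address."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


# Made-up addresses: sixteen guests all talking to one router.
router = "00:16:3e:00:00:01"
guests = ["00:16:3e:aa:00:%02x" % i for i in range(16)]

# How many guest/router pairs land on each of the four links.
spread = Counter(pick_link(guest, router) for guest in guests)
</code></pre>
<br />With these sixteen made-up guests each of the four links carries four of the pairs. A single pair of hosts always hashes to the same link, which is the unbalanced case you get when there are only a few mac addresses on either side.<br />]]>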
        
    </content>
</entry>

<entry>
    <title>dhcp currently (kindof) broken, will fix after sleep</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/dhcp-currently-kindof-broken-w.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.408</id>

    <published>2013-05-11T13:05:25Z</published>
    <updated>2013-05-11T13:07:07Z</updated>

    <summary><![CDATA[uh, yeah, set your IPs statically for now.&nbsp; We're getting reports of DHCP being broken in some places but not others.&nbsp; It doesn't make sense to me right now.&nbsp; I'm going to set up a dhcrelay server on each subnet and...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[uh, yeah, set your IPs statically for now.&nbsp; We're getting reports of DHCP being broken in some places but not others.&nbsp; It doesn't make sense to me right now.&nbsp; I'm going to set up a dhcrelay server on each subnet and at each location, and that should cover it. &nbsp;&nbsp; ]]>
        
    </content>
</entry>

<entry>
    <title>An error in judgement</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/an-error-in-judgement.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.407</id>

    <published>2013-05-05T16:05:08Z</published>
    <updated>2013-05-05T16:16:59Z</updated>

    <summary><![CDATA[so yeah, I emailed the users on 15 servers, told them they'd have an hour of downtime "Saturday Evening"&nbsp; I figured I'd do 5 servers at a go.&nbsp;&nbsp; small number.&nbsp;&nbsp;&nbsp; So yeah, I came up with some spare rails, screwed 'em...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[so yeah, I emailed the users on 15 servers, told them they'd have an hour of downtime "Saturday Evening"&nbsp; <br /><br />I figured I'd do 5 servers at a go.&nbsp;&nbsp; small number.&nbsp;&nbsp;&nbsp; <br /><br />So yeah, I came up with some spare rails, screwed 'em in, set up the PDU, etc... did all the usual things you do before you move servers.&nbsp;&nbsp; But these things have a way of taking longer than you expect, and I also had to configure an unfamiliar switch (my force10 10gbe setup, which I still don't have configured)&nbsp; <br /><br />So after we set up the destination co-lo, we headed to the source co-lo (55 s. market.) - this means getting tickets to remove servers and arranging it so that the security folk know we are coming and verify our tickets before we shut things down (it always takes a few tries to get the ticket right to leave with hardware.) &nbsp; So anyhow, it was around 4:30AM local time when we started shutting things down.&nbsp; It's 9:00am now.&nbsp;&nbsp;&nbsp;&nbsp; I really should have called it off around 1:00am or so.&nbsp;&nbsp;&nbsp; <br /><br />I mean, the rest of our problems have to do with the fact that these are some of the oldest servers (both hardware wise and software wise) in the fleet.&nbsp;&nbsp; Well, that, and I'm not exactly at my mental best at this point.<br /><br />Anyhow, we're back up.&nbsp; I'm sorry.&nbsp;&nbsp; <br /><br /><br /> ]]>
        
    </content>
</entry>

<entry>
    <title>network outage at 4:00AM pst</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/network-outage-at-400am-pst.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.406</id>

    <published>2013-05-03T11:49:13Z</published>
    <updated>2013-05-03T11:51:47Z</updated>

    <summary><![CDATA[so, uh, yeah.&nbsp; stupid mistake on my part.&nbsp;&nbsp; approx. 20 mins. downtime (longer for IPv6)&nbsp;&nbsp;&nbsp; you are back now...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[so, uh, yeah.&nbsp; stupid mistake on my part.&nbsp;&nbsp; approx. 20 mins. downtime (longer for IPv6)&nbsp;&nbsp;&nbsp; you are back now<br />]]>
        
    </content>
</entry>

<entry>
    <title>and council today </title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/and-council-today.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.404</id>

    <published>2013-05-03T06:37:55Z</published>
    <updated>2013-05-03T08:50:39Z</updated>

    <summary><![CDATA[UPDATE 2013-05-03 1:42 PST : Drive is replaced, the raid is rebuilding.&nbsp; Estimated completion time is ~ 2013-05-04 10:30 PST. --- Why isn't it failing out of the raid? and that is an 'enterprise' -&nbsp; weird.&nbsp;&nbsp; anyhow, I failed the drive.&nbsp;&nbsp; (interestingly,...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[UPDATE 2013-05-03 1:42 PST : Drive is replaced, the raid is rebuilding.&nbsp; Estimated completion time is ~ 2013-05-04 10:30 PST.<br />---<br /><br />Why isn't it failing out of the raid? and that is an 'enterprise' -&nbsp; weird.&nbsp;&nbsp; anyhow, I failed the drive.&nbsp;&nbsp; (interestingly, this is one of those ancient servers that still has a 'consumer' drive- it was the 'enterprise' not the 'consumer' that is bad.)<br /><br />anyhow, I'm off to swap a drive.&nbsp;&nbsp; <br /><br /><br /><br /><br />1.00: (BMDMA stat 0x0)<br />ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)<br />ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0<br />ata1.00: (BMDMA stat 0x0)<br />ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)<br />ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0<br />ata1.00: (BMDMA stat 0x0)<br />ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)<br />end_request: I/O error, dev sdc, sector 283579720<br />SCSI device sdc: 2930277168 512-byte hdwr sectors (1500302 MB)<br />sdc: Write Protect is off<br />SCSI device sdc: drive cache: write back<br />SCSI device sdc: 2930277168 512-byte hdwr sectors (1500302 MB)<br />sdc: Write Protect is off<br />SCSI device sdc: drive cache: write back<br />ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0<br />ata1.00: (BMDMA stat 0x0)<br />ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error)<br />SCSI device sdc: 2930277168 512-byte hdwr sectors (1500302 MB)<br />sdc: Write Protect is off<br />SCSI device sdc: drive cache: write back<br />raid1: sdb2: redirecting sector 262614887 to another mirror<br /><br /><br /><br /><br /> ]]>
        
    </content>
</entry>

<entry>
    <title>rehnquist is back, on fresh hardware.</title>
    <link rel="alternate" type="text/html" href="http://blog.prgmr.com/xenophilia/2013/05/rehnquist-is-back-on-fresh-har.html" />
    <id>tag:blog.prgmr.com,2013:/xenophilia//2.403</id>

    <published>2013-05-02T11:45:19Z</published>
    <updated>2013-05-02T11:49:25Z</updated>

    <summary><![CDATA[and yeah. I'm going to take the marvell card out of Wilson and replace it with an LSI or something.&nbsp; It looks like the dom0 got hit harder by the crash than the DomUs, though I would expect some of...]]></summary>
    <author>
        <name>luke</name>
        <uri>http://prgmr.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://blog.prgmr.com/xenophilia/">
        <![CDATA[and yeah. I'm going to take the marvell card out of Wilson and replace it with an LSI or something.&nbsp; <br /><br />It looks like the dom0 got hit harder by the crash than the DomUs, though I would expect some of you will have to fsck.&nbsp;&nbsp; <br /><br /><br />http://twitter.com/prgmrcom&nbsp; got many of my updates this time, though as the evening wore on, my coherency wore thin, too.&nbsp; <br /><br /><br /><br /> ]]>
        
    </content>
</entry>

</feed>
