May 2012 Archives

major network outage today.

here's what I had on the front page when the blog was down:

complete network outage at my stuff at mpt and svtix since 8:40 or so
local time.   It looks like an upstream networking issue... less than a
month before our second line is installed.  embarrassing.   (I don't know
for sure it's them, but it looks that way, and their support line is ringing busy.)

Update 9:50 local time:  I have confirmation that it is an upstream
outage, so it's not my fault, but there isn't a lot I can do about it.
I do have a second upstream scheduled for install on the 15th that I will
be configuring to run redundantly, but that doesn't help us today.   I do
have confirmation that my current upstream is aware of and working on
the problem.


Update: 11:28 local time:
we appear to be back up.  No incident report as of yet, but I see the egi guys huddling around a rack;  it appears that they replaced a bad switch.
I'll have a full incident report and move all this info to blog.prgmr.com/xenophilia soon.


I thought I had an eta in there somewhere, eh.    I am prgmrcom on twitter, and I posted details there, and I kept #prgmr on irc.freenode.net updated.


Anyhow, this is my fault for only having one upstream.  I do have papers signed to get another upstream in on the 15th (cogent) for redundancy (and a lot more bandwidth)  -  and a third (he.net) is in the works but the dates are less clear.

New distros

We now can install
  • Debian 6.0.0
  • CentOS 6.2
  • Ubuntu 12.04 LTS
  • Fedora 16

If you want to install one of these on your domU and the files in /distros haven't been updated, ask support@prgmr.com and we can update the files in /distros on your dom0.


nb

hung disks on fuller

users on fuller have been down all day.  this is unacceptable.  everyone on fuller gets their last month refunded or their next two months free.


trying to figure out just what the problem is here, I rebooted it without starting xen.  poking disks;  it's rebuilding a raid, but interestingly, it is not doing much with sde2.  says it's spare, even though there are only 6 drives in the box and 6 drives in the raid. 

[root@fuller ~]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md1 : active raid10 sdf2[5] sde2[6](S) sdd2[3] sdc2[2] sdb2[1] sda2[0]
      5829088512 blocks 256K chunks 2 near-copies [6/5] [UUUU_U]
      [=>...................]  resync =  5.1% (298709376/5829088512) finish=477.1min speed=193158K/sec
     
md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      10482304 blocks [6/6] [UUUUUU]
     
unused devices: <none>
[root@fuller ~]# iostat /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2
Linux 2.6.18-274.17.1.el5 (fuller.prgmr.com)     05/07/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.35    0.00    2.42    0.50    0.00   96.73

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            962.13    122864.17       175.42  199596538     284968
sdb2            960.78    122845.27       175.42  199565832     284968
sdc2            962.79    122855.09       201.93  199581780     328048
sdd2            961.85    122845.50       202.01  199566192     328176
sde2              0.01         0.16         0.00        256          8
sdf2             10.00        26.79       186.53      43514     303016
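
For anyone who wants to pull the same facts out of /proc/mdstat with a script, here is a minimal Python sketch. It was not run on fuller, and the function names are made up for illustration. It assumes the line layout shown in the paste above: a `[total/active]` counter followed by a `[UUUU_U]` flag string, and `(S)` after a member that md considers a spare.

```python
import re

def degraded_slots(status_line):
    """Parse a status line such as '... [6/5] [UUUU_U]'.

    Returns (total_slots, active_slots, list_of_empty_slot_indexes).
    """
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", status_line)
    if m is None:
        raise ValueError("no [n/m] [UU_U] status found in line")
    total, active, flags = int(m.group(1)), int(m.group(2)), m.group(3)
    empty = [i for i, flag in enumerate(flags) if flag == "_"]
    return total, active, empty

def spares(device_line):
    """Return the member names that carry the (S) spare marker."""
    return re.findall(r"(\w+)\[\d+\]\(S\)", device_line)

# The two md1 lines from the paste above.
DEVICES = "md1 : active raid10 sdf2[5] sde2[6](S) sdd2[3] sdc2[2] sdb2[1] sda2[0]"
STATUS = "      5829088512 blocks 256K chunks 2 near-copies [6/5] [UUUU_U]"

print(degraded_slots(STATUS))  # (6, 5, [4])
print(spares(DEVICES))         # ['sde2']
```

On the output above this reports slot 4 as empty and sde2 as the only spare, which is the same thing the "says it's spare" observation is describing: six members are listed, but only five slots are filled.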


the interesting bit here is that if I read from any of the disks, I get good read speed, but the rebuild drops to zero;  even if I read from sdf, which the rebuild isn't really reading from, or sde which the rebuild is doing jack with.






avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.72    5.50    0.00   93.75

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2            550.00    281600.00         0.00     563200          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.87    5.37    0.00   93.72

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2            559.50    286464.00         0.00     572928          0
sdf2              0.00         0.00         0.00          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.87    5.37    0.00   93.75

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2            517.50    264960.00         0.00     529920          0


but if I don't read, it goes back to rebuilding just fine

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.03    0.00    0.00   94.97

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2171.50    277952.00         0.00     555904          0
sdb2           2164.50    277056.00         0.00     554112          0
sdc2           2163.00    276864.00         0.00     553728          0
sdd2           2151.00    275328.00         0.00     550656          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0



[root@fuller ~]# cat /proc/mstat
cat: /proc/mstat: No such file or directory
[root@fuller ~]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md1 : active raid10 sdf2[5] sde2[6](S) sdd2[3] sdc2[2] sdb2[1] sda2[0]
      5829088512 blocks 256K chunks 2 near-copies [6/5] [UUUU_U]
      [=>...................]  resync =  6.2% (362791616/5829088512) finish=503.5min speed=180921K/sec
     
md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      10482304 blocks [6/6] [UUUUUU]
     
unused devices: <none>
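
The finish= figure on the resync line can be roughly reproduced from the other numbers on that line. A minimal sketch, assuming the progress counters are in 1K blocks and speed= is in K/sec, so that remaining blocks divided by speed gives seconds. The function name is hypothetical, and the kernel's own calculation may round differently.

```python
import re

def resync_eta_minutes(progress_line):
    """Estimate minutes remaining from an mdstat resync/recovery line.

    Returns (estimated_minutes, minutes_reported_by_mdstat).
    """
    m = re.search(
        r"\((\d+)/(\d+)\)\s+finish=([\d.]+)min\s+speed=(\d+)K/sec",
        progress_line,
    )
    if m is None:
        raise ValueError("not a resync/recovery progress line")
    done, total = int(m.group(1)), int(m.group(2))
    reported = float(m.group(3))
    speed = int(m.group(4))          # K/sec, assumed same unit as the counters
    seconds_left = (total - done) / speed
    return seconds_left / 60.0, reported

# The two resync lines pasted so far.
LINES = [
    "[=>...................]  resync =  5.1% (298709376/5829088512) finish=477.1min speed=193158K/sec",
    "[=>...................]  resync =  6.2% (362791616/5829088512) finish=503.5min speed=180921K/sec",
]

for line in LINES:
    estimate, reported = resync_eta_minutes(line)
    print("estimated %.1f min, mdstat says %.1f min" % (estimate, reported))
```

Both estimates land within a fraction of a minute of what mdstat printed. The ETA is only as steady as the speed figure, though: it moved from 477 to 503 minutes between the two samples because the reported speed dropped.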



progress is being made:

[root@fuller ~]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md1 : active raid10 sdf2[5] sde2[6](S) sdd2[3] sdc2[2] sdb2[1] sda2[0]
      5829088512 blocks 256K chunks 2 near-copies [6/5] [UUUU_U]
      [=>...................]  resync =  7.6% (447680832/5829088512) finish=443.8min speed=202074K/sec
     
md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      10482304 blocks [6/6] [UUUUUU]


but the funny thing is that iostat shows that it's going gangbusters... then pauses for a few beats:

[root@fuller ~]# iostat 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2
Linux 2.6.18-274.17.1.el5 (fuller.prgmr.com)     05/07/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.21    0.00    2.22    1.12    0.00   96.46

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            852.44    109118.40       102.11  305205250     285616
sdb2            851.73    109107.42       102.11  305174536     285616
sdc2            852.83    109113.26       117.49  305190868     328608
sdd2            894.37    130276.11       117.53  364383592     328736
sde2             26.81     13536.28         0.00   37861112          8
sdf2             24.06      8506.04       108.53   23791466     303568

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    3.31    0.00    0.00   96.69

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1445.00    184960.00         0.00     184960          0
sdb2           1445.00    184960.00         0.00     184960          0
sdc2           1444.00    184832.00         0.00     184832          0
sdd2           1472.00    188416.00         0.00     188416          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.31    0.00    0.00   97.69

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1040.00    133504.00         0.00     133504          0
sdb2           1040.00    133504.00         0.00     133504          0
sdc2           1037.00    133248.00         0.00     133248          0
sdd2           1037.00    133248.00         0.00     133248          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.69    0.00    0.00   99.31

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            322.00     41728.00         0.00      41728          0
sdb2            322.00     41728.00         0.00      41728          0
sdc2            324.00     41856.00         8.00      41856          8
sdd2            326.00     41856.00         8.00      41856          8
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    3.06    0.00    0.00   96.94

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1290.00    165632.00         0.00     165632          0
sdb2           1290.00    165632.00         0.00     165632          0
sdc2           1289.00    165504.00         0.00     165504          0
sdd2           1289.00    165504.00         0.00     165504          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.38    0.00    0.00   97.62

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1037.00    133248.00         0.00     133248          0
sdb2           1037.00    133248.00         0.00     133248          0
sdc2           1039.00    133504.00         0.00     133504          0
sdd2           1040.00    133504.00         0.00     133504          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.43    0.06    0.00   97.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1039.00    133376.00         0.00     133376          0
sdb2           1038.00    133376.00         0.00     133376          0
sdc2           1037.00    133120.00         0.00     133120          0
sdd2           1036.00    133120.00         0.00     133120          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0


[root@fuller ~]# iostat 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2
Linux 2.6.18-274.17.1.el5 (fuller.prgmr.com)     05/07/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.21    0.00    2.22    1.11    0.00   96.46

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            852.87    109173.88       101.87  306096258     285616
sdb2            852.16    109164.20       101.87  306069128     285616
sdc2            853.26    109169.98       117.21  306085332     328616
sdd2            894.71    130283.24       117.25  365281640     328744
sde2             26.75     13503.74         0.00   37861112          8
sdf2             24.00      8485.59       108.27   23791466     303568

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.18    0.00    0.00   94.82

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2276.00    291328.00         0.00     291328          0
sdb2           2248.00    287744.00         0.00     287744          0
sdc2           2244.00    287232.00         0.00     287232          0
sdd2           2219.00    284032.00         0.00     284032          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.12    0.00    0.00   94.88

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2220.00    284160.00         0.00     284160          0
sdb2           2221.00    284288.00         0.00     284288          0
sdc2           2212.00    283136.00         0.00     283136          0
sdd2           2221.00    284288.00         0.00     284288          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.93    0.00    0.00   95.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2234.65    286035.64         0.00     288896          0
sdb2           2234.65    286035.64         0.00     288896          0
sdc2           2218.81    284007.92         0.00     286848          0
sdd2           2260.40    289330.69         0.00     292224          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.68    0.00    0.00   95.32

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2125.00    272000.00         0.00     272000          0
sdb2           2103.00    269184.00         0.00     269184          0
sdc2           2151.00    275328.00         0.00     275328          0
sdd2           2118.00    271104.00         0.00     271104          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.00    0.00    0.00   95.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2197.00    281216.00         0.00     281216          0
sdb2           2202.00    281856.00         0.00     281856          0
sdc2           2198.00    281344.00         0.00     281344          0
sdd2           2205.00    282240.00         0.00     282240          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.69    0.31    0.00   98.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            581.00     89728.00         0.00      89728          0
sdb2            532.00     93824.00         0.00      93824          0
sdc2            541.00     91776.00         0.00      91776          0
sdd2            572.00     90368.00         0.00      90368          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.24    0.00    0.00   94.76

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2135.00    273280.00         0.00     273280          0
sdb2           2119.00    271232.00         0.00     271232          0
sdc2           2118.00    271104.00         0.00     271104          0
sdd2           2102.00    269056.00         0.00     269056          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.93    0.00    0.00   95.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2112.00    270336.00         0.00     270336          0
sdb2           2128.00    272384.00         0.00     272384          0
sdc2           2118.00    271104.00         0.00     271104          0
sdd2           2157.00    276096.00         0.00     276096          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.56    0.00    0.00   97.44

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1120.00    143872.00         0.00     143872          0
sdb2           1105.00    141952.00         0.00     141952          0
sdc2           1100.00    141312.00         0.00     141312          0
sdd2           1109.00    142464.00         0.00     142464          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    3.75    0.00    0.00   96.25

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1652.00    211456.00         0.00     211456          0
sdb2           1651.00    211328.00         0.00     211328          0
sdc2           1666.00    213248.00         0.00     213248          0
sdd2           1645.00    210560.00         0.00     210560          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.12    0.00    0.00   98.88

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            496.00     64000.00         0.00      64000          0
sdb2            496.00     64000.00         0.00      64000          0
sdc2            470.00     60544.00         0.00      60544          0
sdd2            482.00     62080.00         0.00      62080          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.31    0.00    0.00   98.69

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            594.00     76416.00         0.00      76416          0
sdb2            594.00     76416.00         0.00      76416          0
sdc2            622.00     80128.00         0.00      80128          0
sdd2            610.00     78592.00         0.00      78592          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.00    0.00    0.00   99.94

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.87    0.00    0.00   99.13

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            337.00     43520.00         0.00      43520          0
sdb2            338.00     43520.00         0.00      43520          0
sdc2            337.00     43520.00         0.00      43520          0
sdd2            336.00     43520.00         0.00      43520          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.37    0.00    0.00   97.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1040.00    133376.00         0.00     133376          0
sdb2           1039.00    133376.00         0.00     133376          0
sdc2           1036.00    133120.00         0.00     133120          0
sdd2           1036.00    133120.00         0.00     133120          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.37    0.00    0.00   97.63

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1026.73    131928.71         0.00     133248          0
sdb2           1026.73    131928.71         0.00     133248          0
sdc2           1028.71    132182.18         0.00     133504          0
sdd2           1028.71    132182.18         0.00     133504          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.62    0.00    0.00   97.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           1158.00    148736.00         0.00     148736          0
sdb2           1163.00    149376.00         0.00     149376          0
sdc2           1154.00    148224.00         0.00     148224          0
sdd2           1140.00    146432.00         0.00     146432          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    5.17    0.00    0.00   94.83

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2           2209.00    282752.00         0.00     282752          0
sdb2           2225.00    284800.00         0.00     284800          0
sdc2           2225.00    284800.00         0.00     284800          0
sdd2           2244.00    287232.00         0.00     287232          0
sde2              0.00         0.00         0.00          0          0
sdf2              0.00         0.00         0.00          0          0
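
The pauses are easier to spot if the per-second reports are boiled down to one number each. Here is a minimal Python sketch that sums Blk_read/s across devices for every report and flags the ones that are all zero. It is an illustration, not something from the original debugging session: the function name is hypothetical, and SAMPLE is a trimmed excerpt (sda2 and sdb2 rows only) of three reports from the run above. The MiB/s conversion assumes 512-byte blocks, which is what the iostat man page documents for these columns.

```python
def interval_read_rates(iostat_text):
    """Return the total Blk_read/s (summed over devices) for each report."""
    totals = []
    in_table = False
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            in_table = False           # a blank line ends the device table
            continue
        if fields[0] == "Device:":
            totals.append(0.0)         # a new report starts here
            in_table = True
        elif in_table:
            totals[-1] += float(fields[2])   # third column is Blk_read/s
    return totals

SAMPLE = """\
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            594.00     76416.00         0.00      76416          0
sdb2            594.00     76416.00         0.00      76416          0

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            337.00     43520.00         0.00      43520          0
sdb2            338.00     43520.00         0.00      43520          0
"""

totals = interval_read_rates(SAMPLE)
stalled = [i for i, total in enumerate(totals) if total == 0.0]
mib_per_sec = [total * 512 / 2**20 for total in totals]  # assumes 512-byte blocks

print(totals)       # [152832.0, 0.0, 87040.0]
print(stalled)      # [1]
print(mib_per_sec)  # [74.625, 0.0, 42.5]
```

Fed the full `iostat 1` output instead of the excerpt, the same function should show the stall as a run of consecutive zero entries.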




currently the disk is 14% through its resync;  judging from what's
happening, it wouldn't surprise me if I have to do more intense data
recovery on sdf;  but it looks like it will be down all day.

I tried to bring it up and have it resync while running, but that hung
it within 5 minutes.   I may have made a huge mistake by buying these
2tb disks;  this is the first time I've replaced a 2tb disk in
production;  either there is something very wrong with one specific
drive in this server (I suspect sdf, but I have pathetically little
evidence for that, and I can't just swap it, as that's the counterpart
to the fresh drive, sde)  or I just don't understand how MD works.
(I'm furiously reading up on that now.)

so it started over.  but not really;  before, it was sync_action sync;  now it is sync_action recover.


[root@fuller md]# cat sync_action
recover
[root@fuller md]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md1 : active raid10 sdf2[5] sde2[6] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      5829088512 blocks 256K chunks 2 near-copies [6/5] [UUUU_U]
      [>....................]  recovery =  2.8% (54447296/1943029504) finish=256.8min speed=122564K/sec
     
md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      10482304 blocks [6/6] [UUUUUU]
     
unused devices: <none>
[root@fuller md]#

also the iostat weirdness has reversed:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.69    0.00    0.00   99.31

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2              0.00         0.00         0.00          0          0
sdb2              0.00         0.00         0.00          0          0
sdc2              0.00         0.00         0.00          0          0
sdd2              0.00         0.00         0.00          0          0
sde2           1712.00         0.00    219136.00          0     219136
sdf2           1713.00    219264.00         0.00     219264          0


So apparently sync means only 'check against the mirror', and as sdf's mirror is dead, it wasn't getting read.  Now it is rebuilding, and only sde (the spare) and sdf (the data) are active.
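
One detail in the two progress lines backs this up: the resync line counted against 5829088512 blocks, while the recovery line counts against 1943029504. A quick arithmetic check, sketched in Python, shows the second number is exactly one member's share of a 6-disk, 2-copy raid10. The size formula is the usual members times member size divided by copies, which ignores any chunk rounding, and the function name is made up for illustration.

```python
def raid10_array_blocks(member_blocks, members, copies):
    """Approximate usable size of a raid10 array, in the same units as member_blocks."""
    return member_blocks * members // copies

MEMBER_BLOCKS = 1943029504   # denominator on the 'recovery' line
ARRAY_BLOCKS = 5829088512    # md1 size, and the denominator on the 'resync' line

print(raid10_array_blocks(MEMBER_BLOCKS, members=6, copies=2) == ARRAY_BLOCKS)  # True
print(ARRAY_BLOCKS // MEMBER_BLOCKS)  # 3

# Same ETA arithmetic as mdstat, using the recovery line's own numbers.
done, speed_k_per_sec = 54447296, 122564
minutes_left = (MEMBER_BLOCKS - done) / speed_k_per_sec / 60.0
print(round(minutes_left, 1))  # 256.8
```

So the recovery appears to be measured against a single device's worth of data (what has to be written to sde), while the earlier resync was measured against the whole array. That would also explain why the percentage dropped back to 2.8% without the work really starting over.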

crock crashing

Crock crashed, and Luke rebooted it. It crashed again while it was starting the vps guests, so I'm going to try upgrading the kernel. If that doesn't work, it likely has hardware problems and we will need to put the disks in a new system.

Update: Crock is back up again with a new kernel. The raid also wanted to rebuild, so I let it do that in single user mode; I think that made it easier to start the vps guests afterward.

unclean reboot of crock

got paged just now, crock hung hard.   Can't even goose the hypervisor via serial, which is unusual.   I rebooted the thing  via the power port and it appears it is coming back. 

here's what the console log looked like when I logged in:

(XEN) traps.c:2232:d508 Domain attempted WRMSR 00000000c0010004 from 00000a0f:46a0982e to 00000000:0000abcd.
(XEN) traps.c:2232:d509 Domain attempted WRMSR 00000000c0010004 from 00008552:8b638e39 to 00000000:0000abcd.
(XEN) traps.c:2232:d510 Domain attempted WRMSR 00000000c0010004 from 00001561:1341c90b to 00000000:0000abcd.
[-- lsc@localhost attached -- Mon May  7 03:45:19 2012]
[-- Console down -- Mon May  7 03:45:20 2012]
[-- Console up -- Mon May  7 03:45:20 2012]


which is to say, nothin.

Anyhow, it looks like people are coming back up.  I'm going back to sleep.


So, uh, yeah, it crashed right after that and I called Nick, who worked on it while I slept.  he got crock up and running (we will have to wait until he gets up again for a report of what the problem was) but then he called me back with a hung disk on fuller, just before 8am, so it's my turn again.
all are rebuilding okay  (fuller's rebuild is moving slower than I expected).

branch needs a reboot to see the new disk.  sorry.