July 2010 Archives

whetstone reboot

We needed to reboot whetstone this morning because of this error:
BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ> [<ffffffff8025894a>] softlockup_tick+0xce/0xe0
 [<ffffffff8020df6c>] timer_interrupt+0x3a8/0x402
 [<ffffffff80258c34>] handle_IRQ_event+0x4e/0x96
 [<ffffffff80258d20>] __do_IRQ+0xa4/0x105
 [<ffffffff8020bd6c>] do_IRQ+0x44/0x4d
 [<ffffffff80351f4c>] evtchn_do_upcall+0x19e/0x256
 [<ffffffff80209d8e>] do_hypervisor_callback+0x1e/0x2c
 <EOI> [<ffffffff8035d93e>] show_rd_sect+0x0/0x68
 [<ffffffff802ee0bc>] __read_lock_failed+0x8/0x14
 [<ffffffff803494de>] get_device+0x17/0x20
 [<ffffffff804024cd>] .text.lock.spinlock+0x53/0x8a
 [<ffffffff8035d965>] show_rd_sect+0x27/0x68
 [<ffffffff802be588>] sysfs_read_file+0xa5/0x12c
 [<ffffffff8028031c>] vfs_read+0xcb/0x171
 [<ffffffff802806fb>] sys_read+0x45/0x6e
 [<ffffffff802097b2>] tracesys+0xab/0xb5

We have seen this before on some of our other dom0s, so we plan to eventually upgrade any dom0 that shows this problem to Xen 4. The downtime lasted 6 hours; users on whetstone will get a free month.

But no reboot. I'm rebuilding off a hot spare.

[lsc@branch ~]$ cat /proc/mdstat
Personalities : [raid1] 
md2 : active raid1 sdc2[0] sda2[1]
      478375424 blocks [2/2] [UU]
      
md1 : active raid1 sdf2[2] sdd2[3](F) sde2[4](F) sdb2[0]
      478375424 blocks [2/1] [U_]
      [=>...................]  recovery =  6.4% (30769792/478375424) finish=524.6min speed=14216K/sec
      
md0 : active raid1 sdf1[3] sdd1[4](F) sde1[5](F) sdc1[0] sdb1[1] sda1[2]
      10008384 blocks [4/4] [UUUU]
      
unused devices: <none>
[lsc@branch ~]$ 
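
For anyone who wants to watch a rebuild like this without eyeballing the file, here is a minimal sketch that parses the mdstat text above and reports which arrays are degraded, which members are marked (F), and roughly how long the recovery has left. The function name `parse_mdstat` and the dictionary layout are made up for illustration, and the estimate assumes the block counts are in 1K units to match the K/sec speed figure.

```python
import re

# Sample text copied from the /proc/mdstat output above
# (the trailing "unused devices" line is left out).
MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sdc2[0] sda2[1]
      478375424 blocks [2/2] [UU]

md1 : active raid1 sdf2[2] sdd2[3](F) sde2[4](F) sdb2[0]
      478375424 blocks [2/1] [U_]
      [=>...................]  recovery =  6.4% (30769792/478375424) finish=524.6min speed=14216K/sec

md0 : active raid1 sdf1[3] sdd1[4](F) sde1[5](F) sdc1[0] sdb1[1] sda1[2]
      10008384 blocks [4/4] [UUUU]
"""

def parse_mdstat(text):
    """Hypothetical helper: return {array name: info dict} for each md device."""
    arrays = {}
    current = None
    for line in text.splitlines():
        # Header line, e.g. "md1 : active raid1 sdf2[2] sdd2[3](F) ..."
        m = re.match(r"^(md\d+) : (\w+) (\w+) (.*)$", line)
        if m:
            name, _state, level, members = m.groups()
            failed = [d.split("[")[0] for d in members.split() if d.endswith("(F)")]
            current = arrays[name] = {
                "level": level, "failed": failed,
                "degraded": False, "recovery": None,
            }
            continue
        if current is None:
            continue
        # Status line, e.g. "478375424 blocks [2/1] [U_]"; "_" marks a missing member.
        m = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if m:
            current["degraded"] = "_" in m.group(3)
        # Progress line, e.g. "recovery =  6.4% (done/total) ... speed=14216K/sec"
        m = re.search(r"recovery\s*=\s*[\d.]+% \((\d+)/(\d+)\).*speed=(\d+)K/sec", line)
        if m:
            done, total, speed = map(int, m.groups())
            current["recovery"] = {
                "done": done, "total": total, "speed_k": speed,
                # Remaining blocks divided by the current speed, in minutes.
                "eta_min": (total - done) / speed / 60,
            }
    return arrays

if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(MDSTAT).items()):
        print(name, info)
```

Run against the output above, this flags md1 as the only degraded array, and the remaining-blocks-over-speed estimate comes out at roughly 525 minutes, close to the finish=524.6min the kernel printed.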

[root@stables ~]# xm create -c billing_e_test
Using config file "/etc/xen/billing_e_test".
Error: Creating domain failed: name=billing_e_test
[root@stables ~]# 

However, domains that are already up are still up. I can log into one of mine on that box, and all appears well.

Working on it.

Everyone should be back up. You can reboot your domains now, too.

dish reboot

We just needed to reboot dish.prgmr.com because of this error:
BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ> [<ffffffff8025894a>] softlockup_tick+0xce/0xe0
 [<ffffffff8020df6c>] timer_interrupt+0x3a8/0x402
 [<ffffffff80258c34>] handle_IRQ_event+0x4e/0x96
 [<ffffffff80258d20>] __do_IRQ+0xa4/0x105
 [<ffffffff8020bd6c>] do_IRQ+0x44/0x4d
 [<ffffffff80351f4c>] evtchn_do_upcall+0x19e/0x256
 [<ffffffff80209d8e>] do_hypervisor_callback+0x1e/0x2c
 <EOI> [<ffffffff8035d93e>] show_rd_sect+0x0/0x68
 [<ffffffff802ee0b9>] __read_lock_failed+0x5/0x14
 [<ffffffff803494de>] get_device+0x17/0x20
 [<ffffffff8040415d>] .text.lock.spinlock+0x53/0x8a
 [<ffffffff8035d965>] show_rd_sect+0x27/0x68
 [<ffffffff802be588>] sysfs_read_file+0xa5/0x12c
 [<ffffffff8028031c>] vfs_read+0xcb/0x171
 [<ffffffff802806fb>] sys_read+0x45/0x6e
 [<ffffffff802097b2>] tracesys+0xab/0xb5


Diagnostics ongoing.
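
Since this looks like the same lockup whetstone hit, here is a rough sketch of one way to check that two traces really match: strip the addresses, compare the symbol names in order, and list the frames whose addresses or offsets differ. Only the frames after `<EOI>` from each trace are included, and the names `parse_frames` and `signature` are invented for this example.

```python
import re

# Frames after <EOI>, copied from the whetstone trace.
WHETSTONE = """\
 <EOI> [<ffffffff8035d93e>] show_rd_sect+0x0/0x68
 [<ffffffff802ee0bc>] __read_lock_failed+0x8/0x14
 [<ffffffff803494de>] get_device+0x17/0x20
 [<ffffffff804024cd>] .text.lock.spinlock+0x53/0x8a
 [<ffffffff8035d965>] show_rd_sect+0x27/0x68
 [<ffffffff802be588>] sysfs_read_file+0xa5/0x12c
 [<ffffffff8028031c>] vfs_read+0xcb/0x171
 [<ffffffff802806fb>] sys_read+0x45/0x6e
 [<ffffffff802097b2>] tracesys+0xab/0xb5
"""

# Frames after <EOI>, copied from the dish trace.
DISH = """\
 <EOI> [<ffffffff8035d93e>] show_rd_sect+0x0/0x68
 [<ffffffff802ee0b9>] __read_lock_failed+0x5/0x14
 [<ffffffff803494de>] get_device+0x17/0x20
 [<ffffffff8040415d>] .text.lock.spinlock+0x53/0x8a
 [<ffffffff8035d965>] show_rd_sect+0x27/0x68
 [<ffffffff802be588>] sysfs_read_file+0xa5/0x12c
 [<ffffffff8028031c>] vfs_read+0xcb/0x171
 [<ffffffff802806fb>] sys_read+0x45/0x6e
 [<ffffffff802097b2>] tracesys+0xab/0xb5
"""

# One frame looks like "[<address>] symbol+0xoffset/0xsize".
FRAME = re.compile(r"\[<([0-9a-f]{16})>\]\s+(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def parse_frames(trace):
    """Return (symbol, address, offset) for every frame found in the text."""
    return [(m.group(2), int(m.group(1), 16), int(m.group(3), 16))
            for m in FRAME.finditer(trace)]

def signature(trace):
    """Symbol names only, in trace order; addresses and offsets are ignored."""
    return tuple(sym for sym, _addr, _off in parse_frames(trace))

same_path = signature(WHETSTONE) == signature(DISH)

# Frames that share a position but differ in address or offset.
differing = [w[0] for w, d in zip(parse_frames(WHETSTONE), parse_frames(DISH))
             if w != d]

if __name__ == "__main__":
    print("same call path:", same_path)
    print("frames with different address/offset:", differing)
```

By symbol name the two traces follow the same path, a read() of a sysfs file that ends up spinning in `__read_lock_failed`. Only the `__read_lock_failed` and `.text.lock.spinlock` frames differ in address or offset.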