Recently in troubleshooting Category

We've been having some mysterious packet loss issues that look a lot like we're oversubscribing a 50Mbps connection, but we're not: we have a 100Mbps commit on a gig port. Our upstream believes the problem is with our router, so I've been working on it. Anyhow, I found this on our Foundry router:

BR-charon#show ip traffic
IP Statistics
  1916350241 received, 52550054 sent, 585142657 forwarded
  629624 filtered, 67 fragmented, 78 reassembled, 2033812 bad header
  14173 no route, 0 unknown proto, 0 no buffer, 632845943 other errors
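For a sense of scale, this sketch expresses two of those counters as a share of packets received, using the figures above. The `pct` helper is hypothetical, and since these counters accumulate from the last reset, they describe the router's history rather than its current error rate.

```shell
#!/bin/sh
# Figures copied from the `show ip traffic` output above.
received=1916350241
other_errors=632845943
bad_header=2033812

# Hypothetical helper: $1 = counter, $2 = total.
# Prints 100 * $1 / $2 rounded to one decimal place.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

echo "other errors: $(pct "$other_errors" "$received")% of received"
echo "bad header:   $(pct "$bad_header" "$received")% of received"
```

It prints 33.0% for "other errors" and 0.1% for "bad header", so roughly a third of everything the router received was counted under "other errors".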

So we're swapping it out for a ProCurve 2824 running firmware that was released this year (downright modern!).

Anyhow, nobody should see more than 120 seconds or so of downtime. We're doing the move incrementally, and it should be done tonight.

Update: we are done.

GNU screen for collaboration

GNU Screen is often used to keep programs running while their user is logged out, but it is also very useful for collaboration, with more than one user connected to the same session.

To allow this, "multiuser on" must be set in the session, either in the screenrc file or by typing Ctrl-a : followed by the command inside the session. The screen binary must be setuid root (owned by root, with chmod u+s). The session owner must also explicitly allow the other users to connect with "addacl otherusernames", again either in the screenrc or in the session.

The other user then connects by giving the owner's username and the session name, like "screen -x nick/31346.pts-15.lion". Use screen -x while the owner is still attached; screen -r only resumes a session once the owner has detached. The owner can find the session name with screen -ls, or root can look in the owner's directory under /var/run/screen/, where each session appears as a named pipe with the session's name.

Once both users are connected, they both have the privileges of the session owner and can each see what the other is doing, which makes it easy to demonstrate system tasks or walk through an issue together.

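The same steps can be run from the owner's shell, since screen -X sends a command to a running session just as Ctrl-a : would inside it. This is a sketch: the session name "demo" and the account "otheruser" are hypothetical, and the setuid step assumes screen is installed at /usr/bin/screen (check with command -v screen).

```shell
# Once, as root: multiuser mode needs a setuid-root screen binary.
#   chown root /usr/bin/screen && chmod u+s /usr/bin/screen

# As the session owner: start a detached session named "demo" (hypothetical),
# then send it the commands that enable sharing.
screen -dmS demo
screen -S demo -X multiuser on
screen -S demo -X addacl otheruser

# As otheruser (hypothetical), attach alongside the owner, giving the
# owner's username before the slash:
#   screen -x ownername/demo
```

Putting "multiuser on" and "addacl otheruser" in the owner's ~/.screenrc instead applies them to every new session the owner starts.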
I guess this is the part where I admit that I'm really not all that good at networking.  I tend to make silly mistakes and misunderstand basic concepts.  Thankfully Luke is quite a bit better at it than I am.

So anyway. We were trying to figure out the cause of the error message "ip_conntrack: table full, dropping packet." Obviously it comes from iptables' connection tracking, and it tells us that packets are being dropped because the table of tracked connections is full. Turning to /proc:

 # cat /proc/net/ip_conntrack | grep "^udp" | wc -l
 31250
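A sketch that generalizes the one-liner above to every protocol, to see at a glance what is filling the table. It assumes the old ip_conntrack format, where the protocol name is the first field of each line; the newer /proc/net/nf_conntrack puts it in the third field. The conntrack_by_proto name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count conntrack entries per protocol, busiest first.
# $1 = a conntrack listing in ip_conntrack format (protocol in field 1);
# for /proc/net/nf_conntrack, change $1 to $3 inside the awk program.
conntrack_by_proto() {
    awk '{ count[$1]++ } END { for (p in count) print count[p], p }' "$1" |
        sort -rn
}

# Usage, as root:
#   conntrack_by_proto /proc/net/ip_conntrack
# then compare the total with the limit (ip_conntrack-era path):
#   cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
```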

Uh-huh. Long story short, it was pretty obviously a hammered DNS server running in a domU: every query from a different client address and port gets its own entry in the table.

The problem can be "solved" by adding a rule to the "raw" table that disables connection tracking for UDP packets:

 # iptables -t raw -I PREROUTING -p udp -j NOTRACK

At this point we broke local DNS resolution. Changing NOTRACK to ACCEPT (which, in the raw table, just lets packets go on to be tracked as before) fixed it again. Luke figured out the problem in short order: the request goes out from a high-numbered port, the answer comes back untracked, and there's nothing to associate the two, so the firewall can't recognize the answer as a reply to our own request. It was really a goddamn epiphany. So we've got a way of solving our problem, but it's not exactly clean.
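A narrower version of the rule, as a hedged sketch rather than the rule we settled on: keep tracking UDP addressed to the dom0 itself, so replies to its own DNS queries still match a stateful INPUT rule (untracked packets are in state UNTRACKED, not ESTABLISHED), and skip tracking only for traffic passing through to the domUs. 192.0.2.1 is a placeholder for the dom0's address, and this assumes nothing in the FORWARD chain depends on connection state.

```shell
# Hypothetical placeholder: replace with the dom0's own IP address.
HOST_IP=192.0.2.1

# Remove the blanket rule added above.
iptables -t raw -D PREROUTING -p udp -j NOTRACK

# Skip connection tracking only for UDP not addressed to this host.
# Replies to the host's own lookups are still tracked, so a rule like
#   iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# keeps matching them.
iptables -t raw -I PREROUTING -p udp ! -d "$HOST_IP" -j NOTRACK
```

On newer kernels and iptables releases, NOTRACK is superseded by -j CT --notrack, and older iptables versions spell the negation as -d ! address.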

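For right now, we're just increasing the limit on tracked connections until we figure out an iptables rule that turns off tracking for forwarded traffic. The raw table only has PREROUTING and OUTPUT chains, not FORWARD, so the NOTRACK rule has to pick out forwarded packets some other way.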
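The limit itself is a sysctl. A sketch with an example value: the names below are the ones used by the old ip_conntrack module (kernels using nf_conntrack call them net.netfilter.nf_conntrack_max and net.netfilter.nf_conntrack_count), and 131072 is only illustrative, since every entry costs kernel memory.

```shell
# Current limit and current number of tracked entries (ip_conntrack names).
sysctl net.ipv4.netfilter.ip_conntrack_max
sysctl net.ipv4.netfilter.ip_conntrack_count

# Raise the limit on the running kernel (example value).
sysctl -w net.ipv4.netfilter.ip_conntrack_max=131072

# Keep the new limit across reboots.
echo "net.ipv4.netfilter.ip_conntrack_max = 131072" >> /etc/sysctl.conf
```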

Ideologically speaking, we're not really interested in firewalling the clients' machines. Protecting our network, sure, but anything beyond that is your problem. Run whatever filters and firewalls you like. We're not really interested in connection tracking, and will be turning it off as soon as we figure out a way of doing so without disruption.

Alternate title:  Chris is smart.

So, upon opening my e-mail this evening (I'm still a little shell-shocked from day-job pager duty last week, so I'm getting a late start), I see a support question from a customer who complained of the above error. I see nothing obvious; Chris looks at it and immediately notices that xenconsoled, the Xen console daemon, is not running. The fix, as root:

/usr/sbin/xenconsoled

Problem solved.

About this Archive

This page is an archive of recent entries in the troubleshooting category.
