June 2008 Archives

The Samsung 750G drives (with 32M cache) are almost twice as fast as the Seagate 750G drives (with 16M cache) - even after removing the limiters (the Seagates came with jumpers that limited them to 1.5G SATA 1).

Anyhow, I've started to put system domains on the new server, Boar, and they are looking pretty good. 

new server status

So we got the new servers installed and set up... and then we realised that the disk was about half the speed we requested and, more importantly, that there were serious problems with random access: copying /dev/zero to the disk would lock things up to the point where you couldn't even log in on another VT.
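
(For the curious, "copying /dev/zero to the disk" here means something along the lines of the following, where the device name and count are placeholders, not the exact test we ran:

# dd if=/dev/zero of=/dev/sdb bs=1M count=10000

i.e. a plain sequential write to the raw device.)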

Obviously we're not putting customers on it until we figure it out; the server is in the garage right now for testing. 
The server is going through burn-in as we speak.

As I mentioned on the main page, we ran out of space the other day.  We are putting in a new server, Boar, and one of my ancient Catalyst switches, with 'port monitor' (SPAN) capabilities, so I will be un-breaking the bridge on lion, and bandwidthD and my inward-facing IDS will both continue to function.

This will require us to physically reconfigure the network (just moving cables - if we don't screw it up, downtime should be less than 60 seconds; no reboot or anything, just a few dropped packets).

traffic shaping: yet again.

[image: day-shaped.png]

Traffic shaping seems to have been vaguely successful.  We're trying EXTREMELY HARD to keep our 95th percentile below 10mbit, since it appears that going beyond that will cost us incredible added monies.

Note that I don't really want to limit to the ~2mbit outgoing that's being used now -- I'm not sure why the outgoing traffic is limited so hard, and I haven't got a test domain on that box to find out.  The goal is to put the dangerous customers into an "overage class" that gets a shared 10mbit, and let them fight over it -- but they should be getting 10mbit.  I'm theorizing that maybe their traffic relies on high incoming bandwidth -- but I don't really know.  I should be able to test once Luke wakes up.
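
Concretely, the overage class would just be one more HTB class in the hierarchy described in the shaping writeup below, with the dangerous domUs' fw filters all pointed at it instead of each getting a class of their own.  A rough sketch - the class id and mark numbers here are hypothetical and none of this is tested yet:

# tc class add dev peth0 parent 1: classid 1:10 htb rate 10mbit
# tc filter add dev peth0 protocol ip parent 1:0 prio 1 handle 6 fw flowid 1:10
# tc filter add dev peth0 protocol ip parent 1:0 prio 1 handle 7 fw flowid 1:10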

The next thing I want to do in this area is generalize the script somewhat and make the domain creation scripts call it at boot.  I'm not sure what the done thing is here -- I'm guessing making it a vif parameter would be best, then having that pass through vif-bridge and call a vif-qos script (or similar.)

Okay, I guess that's what I'll do then.
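
As a first stab, the vif-qos script itself could be about this simple.  This is an untested sketch, and the calling convention (vif name and rate as arguments, presumably passed along by vif-bridge) is an assumption, not a decision:

#!/bin/sh
# vif-qos (sketch): shape traffic headed into one domU's virtual interface.
# Would be called as e.g. "vif-qos vif5.0 1mbit"; the numbers are placeholders.
vif="$1"
rate="${2:-1mbit}"

# clear any old qdisc, then cap traffic we transmit on the vif
tc qdisc del dev "$vif" root 2>/dev/null
tc qdisc add dev "$vif" root tbf rate "$rate" latency 50ms maxburst 40MB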

traffic shaping: round 2

Started whacking domains with the b&width hammer, as per the directions noted a couple entries ago.

Further refinements will probably involve putting quotas directly in config files, with scripts to parse and automatically set limits at domain creation.  Ideally I'd also write a tool to re-assign domains to existing classes.

This title is, of course, a complete lie.  It looks like our disk layout scheme broke LVM snapshots.  To quote from our testing:

# lvcreate -s -L 100M -d hydra_domU/test -n test_snap
  Snapshots and mirrors may not yet be mixed.

That's some real well-supported technology there.  Google gives me two results for that error message, both of which are source diffs.

I'm not sure what to do about this.  Every so often I feel like abandoning LVM mirroring entirely and moving to LVM on MD, but that didn't exactly fill us with joy either.

I'm also considering bypassing the LVM-specific snapshot implementation and using the device mapper directly, but that worries me.  I would want to know why snapshots and mirrors can't be mixed before implementing snapshots anyway.
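
For reference, "using the device mapper directly" would look roughly like this - purely an untested sketch, with every name hypothetical:

ORIGIN=/dev/hydra_domU/test           # the LV we want to snapshot
COW=/dev/hydra_domU/test_cow          # a small LV to hold copy-on-write data
SECTORS=$(blockdev --getsz "$ORIGIN") # origin size in 512-byte sectors

# route writes to the origin through a snapshot-origin target
dmsetup create test_base --table "0 $SECTORS snapshot-origin $ORIGIN"

# the snapshot itself: Persistent exceptions, 8-sector (4KiB) chunks
dmsetup create test_snap --table "0 $SECTORS snapshot $ORIGIN $COW P 8"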

Today I put my money where my mouth is and worked on traffic shaping.  I'm not 100% sure that this setup is correct -- we'll have to test it more before we put it in production.  Tentatively, though, here's how it works:

We're doing everything in the dom0.  Traffic shaping is, after all, a coercive technical solution.  Doing it in customer domUs would be silly.

First, we have to make sure that the packets on xenbr0 traverse iptables:

# echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables

This is so that we can mark packets according to which domU emitted them.  (There are other reasons, but that's the important one in terms of our traffic-shaping setup.)
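
That setting doesn't survive a reboot, by the way; if this scheme sticks, the usual fix is a line in /etc/sysctl.conf (assuming the bridge module is loaded by the time sysctl runs):

net.bridge.bridge-nf-call-iptables = 1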

Next, we limit incoming traffic.  This is the easy part.  To limit the vif "baldr" to 1mbit/s, with bursts up to 2mbit and a maximum allowable latency of 50ms:

# tc qdisc add dev baldr root tbf rate 1mbit latency 50ms peakrate 2mbit maxburst 40MB

This adds a queuing discipline, or qdisc, to the device "baldr".  Then we specify where to add it ("root") and what sort of qdisc it is ("tbf").  Finally we specify the rate, latency, burst rate, and the amount of data that can go at the burst rate.

Next we work on limiting outgoing traffic.  The policing filters might work, but they handle the problem by dropping packets, which is... bad.  Instead we're going to apply traffic shaping to the outgoing physical Ethernet device, peth0.

First, for each domU, we add a rule to mark packets from that network interface:

# iptables -t mangle -A FORWARD -m physdev --physdev-in baldr -j MARK --set-mark 5

Here the number 5 is an arbitrary integer.  Eventually we'll probably want to use the domain id, or something fancy.  We could also simply use tc filters directly that match on source IP address, but it feels more elegant to have everything keyed to the domain's "physical" network device.  Note that we're using physdev-in -- traffic that goes out from the domU comes in to the dom0.
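
As a sketch of the "something fancy" version - untested, and assuming Xen's default vifN.0 naming for each domain's first interface - marking every running domU's traffic with its own domain id could look like this:

# mark each running domU's outbound traffic with its domain id (skipping dom0)
xm list | awk 'NR>1 && $2>0 {print $2}' | while read domid; do
    iptables -t mangle -A FORWARD -m physdev --physdev-in "vif${domid}.0" \
        -j MARK --set-mark "$domid"
done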

Next we create an HTB qdisc.  We're using HTB because it does what we want and has good documentation (available at http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm .)  We won't go over the HTB options in detail, since we're just lifting examples from the tutorial at this point:

# tc qdisc add dev peth0 root handle 1:  htb default 12

Then we make some classes to put traffic into.  Each class will get traffic from one domU.  (As the HTB docs explain, we're also making a parent class so that they can share surplus bandwidth.)

# tc class add dev peth0 parent 1: classid 1:1 htb rate 100mbit
# tc class add dev peth0 parent 1: classid 1:2 htb rate 1mbit

Now that we have a class for our domU's traffic, we need a filter that'll assign packets to it.

# tc filter add dev peth0 protocol ip parent 1:0 prio 1 handle 5 fw flowid 1:2

At this point traffic to and from the target domU is essentially shaped.  To prove it, we copied a 100MB file out, followed by another in.   Outgoing transfer speed was 203.5KB/s, while incoming was about 115KB/s.

This incoming speed is as expected, but the outgoing rate is a bit high.  Still, though, it's about what we're looking for.  Tomorrow we'll test this with more machines and heavier loads.

New rdns policy

Having rdns point back to me makes handling abuse reports much easier (that is, it makes it much more likely that I will get the complaint rather than my upstream), so I am going to require you to stay on a .xen.prgmr.com rdns entry until you have been a paying customer for 3 months.

As with everything, exceptions can be made, but if I don't know you, it's three months (or you can pay up front for three months, with the understanding that you won't get it back if I shut you down for AUP violations).

the prgmr.com AUP:
http://prgmr.com/aup.html

Pretty standard, except for the bit where I prohibit all bulk mail without my approval.
I'm not interested in hosting even most double opt-in lists - most of the larger lists, even if they are legitimately double opt-in, generate more complaints than I am willing to deal with at these prices.  If you are a legitimate mail sender, I would suggest you start with http://isipp.com






monthly price    RAM       disk      network transfer
$5               64MiB     5GiB      40GiB
$9               128MiB    10GiB     80GiB
$13              256MiB    20GiB     160GiB
$21              512MiB    40GiB     320GiB
$32              1024MiB   80GiB     640GiB
$64              2048MiB   160GiB    1280GiB
$128             4096MiB   320GiB    2560GiB

The new prices are only good on the new Core 2 Quad boxes, so existing customers may need to move - also, you will need to be added to the new billing system.  (Users who signed up within the last week will automatically be billed the new, lower rate when they come up for renewal.)

Snort IDS installed.

One way to make your network unattractive to spammers is to make setting up new accounts more expensive for the abuser - either through collecting AUP violation fees, or through high setup fees.  Of course, this is difficult with the real black hats, as they usually pay with fraudulently obtained credit cards.  It works OK for the 'grey' spammers - those who mail people who 'opted in' when they bought something, and now get tangentially related offers.

Another way to do it is to be more proactive about disconnecting abusive customers.  Most of the time, one can expect 4-24 hours between when the abuse is reported and when the provider does something about it - and in my experience, it takes quite a lot of abuse to generate a complaint in the first place; sometimes the abuse has been going on for a week or more before it hits someone with the spare time and the knowledge to complain.

So my thought is this: why not run an IDS, but instead of alerting on the constant stream of abuse coming in from the Internet, alert on abuse going out from your customers?  You could even then automatically kill the ports belonging to obviously compromised or abusive hosts.

So that's what I did tonight.  I set up a VPS on my new server, set my bridge to not remember MAC addresses (that is, I turned it into a hub), and installed snort on that VPS.  Right now it's pretty much just using the default rules and scanning all traffic, incoming and outgoing.  Next, I need to set up some good e-mail rules (I want to allow people to run secondary MX servers, but I want to prohibit mailing lists beyond a certain size without my prior approval... I've not quite figured out how to do that).
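
Incidentally, "not remembering MAC addresses" is a one-liner on a Linux bridge: setting the ageing time to zero turns off learning, so every frame gets flooded out every port, including the one the snort VPS sits on.  The bridge name here is whatever yours happens to be:

# brctl setageing xenbr0 0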

I figure if I'm going to be watching you, I should give you something back- so I have decided to give you access to see the snort alerts about people from the Internet trying to attack you. If you are on lion and interested, let me know via email.

I've got a shell script parsing the output and putting it in a file for each user to watch, if they like.  If it encounters an attack coming from one of my IP addresses, it e-mails me, meaning a worst-case response time of around 8 hours.  That's not a great response time if you count from when someone files an abuse report, but if you count from when the abuse starts (which is what this setup does), 8 hours isn't bad at all.
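
For the curious, the parsing script is nothing fancy.  A stripped-down, untested sketch of the idea - the paths, the customer IP list, and the mail recipient are all assumptions:

#!/bin/sh
# split snort's "fast" alerts into one file per customer IP, and mail me
# when the source of an alert looks like one of our own addresses.
ALERTS=/var/log/snort/alert            # assumed alert log location
OUTDIR=/var/spool/snort-per-user       # one output file per customer IP
IPLIST=/etc/prgmr/customer-ips         # hypothetical list, one IP per line

tail -F "$ALERTS" | while read -r line; do
    while read -r ip; do
        case "$line" in
          *"$ip"*)
            echo "$line" >> "$OUTDIR/$ip"
            # crude check: our IP appears before the "->", so it's the source
            case "$line" in
              *"$ip:"*" -> "*) echo "$line" | mail -s "outbound alert from $ip" root ;;
            esac
            ;;
        esac
    done < "$IPLIST"
done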