April 2010 Archives

So I had an incident the other day that highlighted a problem prgmr.com has. See, my standard pricing is $1/month for every 64MiB of RAM and 1.5GiB of disk, with a proportional share of CPU, plus another $4/month for abuse and support costs.

Now, the first problem here is that the small customers are usually okay with $4/month worth of support. I mean, they aren't paying me much. Larger customers, on the other hand, believe they are entitled to better support because they are paying more. There is a disconnect there, as from my point of view, they are paying the same $4/month. (For me, small customers who don't need help are by far the most profitable customer demographic.)
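To put rough numbers on that disconnect, here is a minimal sketch of the pricing arithmetic described above. The function and constant names are made up for illustration; this is not anything prgmr.com actually runs, and it only looks at the RAM-based part of the formula.

```python
# Hypothetical helper for illustration only.
SUPPORT_FEE = 4      # flat $/month for abuse and support, same for every customer
RAM_UNIT_MIB = 64    # each 64MiB unit of RAM costs $1/month

def monthly_price(ram_mib):
    """Return (total_dollars, support_share) for a plan with ram_mib MiB of RAM."""
    resource_cost = ram_mib / RAM_UNIT_MIB
    total = resource_cost + SUPPORT_FEE
    return total, SUPPORT_FEE / total

for ram in (64, 256, 1024):
    total, share = monthly_price(ram)
    print(f"{ram}MiB: ${total:.2f}/month, support is {share:.0%} of the bill")
```

The support fee stays at $4 while the bill grows, so a 64MiB customer is paying mostly for support and a 1024MiB customer mostly for resources, even though both bought the same amount of support.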

The second problem is that the support you get for $4/month is not very good. I have skimped on coverage rather than skimping on technical skill, but either way, my support sucks. You often have to wait a while to get an answer, which is bad.

Now, a big part of that is that sometimes it is a skilled job to figure out if the problem is mine (prgmr.com hardware or network) or yours, so having some unskilled worker online 24/7 won't really help us any more than an autoresponder would.

However, having crappy support means my service is significantly less useful. As I grow larger, coverage will improve; I mean, as it stands I probably get fewer than 10 tickets on a day with no new signups, so it's hard to justify hiring more people.

So, yeah; the incident was that some guy was disconnected because he was attacking other hosts. Now, almost always this just means the guy was compromised and is now part of a botnet.

So we emailed the guy and shut him down. The thing was, his email was on the VPS, so he never got that.

Now, in this case, the guy was past due ("the check is in the mail"), so I'm less sympathetic. Often, we email and then shut down 12 hours or so later. (This is a very 'soft' policy... it all depends on how 'legitimate' the customer looks. If you have an account name of 'bestwatches', well, we tend to shoot first and ask questions later. God, I hate watch spam. This customer got sorted into the 'shoot first' pile, I believe, because he was past due, though there was an account note that said the check was in the mail, so maybe that was a mistake.)

This is, I think, a reasonable abuse setup for $4/month. Note, I've never shut someone down for a mistaken abuse report (I've forwarded a few to customers... that was embarrassing.)

However, this clearly damages the value I provide to businesses.

Now, the first thing I need to fix is the email notification. Emailing a dude when I'm taking down his mailserver is just stupid. I think I need to start calling people when I shut things down.

Good god I hate the phone. And I think Nick hates it even more, and I don't really feel good about making employees talk to irate customers, so I'd be making the calls. But we're talking maybe one or two of these things a week at a thousand customers, so it's not that big of a deal for me to do it myself.

Next, right now, the policy is to boot 'em into single user mode; I think we should leave them with a fresh image, and their old image mounted read-only. Otherwise, they are much less likely to actually fix the problem. (You always format and re-install after a compromise.) Of course, this might piss off some customers even more. So many people think they can run 'virus scan' and be okay again.

That's still a pretty poor way to treat hacked customers. "Sorry dude, you got hacked. Here, I finished the job. Go now and rebuild from scratch." But anything else (well, besides booting 'em into single user mode and letting them deal with it, which just pushes the cost onto the customer) would be super expensive.

As I grow, support coverage will improve. I'll eventually get a support person from somewhere else who can look at tickets during the pre-noon hours when Nick and I are not available.

I'm especially looking for feedback on how to handle compromised domains. The more I think about it, the more I like the idea of giving the customer a completely fresh install with the old image mounted read-only for a certain number of days so they can retrieve any data that isn't backed up.
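A rough sketch of what that setup could look like, assuming a Xen-style domain config (those config files use Python-compatible syntax). The guest name, volume paths, and device names below are all hypothetical, picked just to show the shape of it.

```python
# Sketch of a Xen-style guest config after a compromise.
# Every name and path here is hypothetical.
name = "customer-example"
memory = 256  # MiB

disk = [
    # Fresh install: the new root filesystem, writable.
    "phy:/dev/vg_guests/customer-example-fresh,xvda1,w",
    # Old, compromised image: attached read-only so data can be copied off.
    "phy:/dev/vg_guests/customer-example-old,xvdb1,r",
]
```

Inside the guest, the customer would then mount the old device read-only (something like `mount -o ro /dev/xvdb1 /mnt/old`) and copy off whatever they need. Once the grace period runs out, the second disk entry gets dropped from the config.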

Here is what my upstream had to say:

"Dear Colocation Client -

Between the hours of 2am and 4am this morning, LIS's network was
subject to a distributed denial-of-service attack which resulted in
latency. The issue has since been resolved.

Sincerely,

LIS Colocation"

Most customers are at SVTIX right now, and were thus unaffected.