Recently in Business Category

on logging serial consoles.

So every now and again a customer will complain of a crashing domain. Occasionally, it is an early sign of a hardware problem that I need to deal with, so I don't want to just ignore it.

Now, the problem is that like a physical server, once the domain has rebooted, most of the information about why it crashed is gone. (and what little is left is in /var/log on the guest, and as a general rule we don't like mucking around in the guest. that's your business, not ours.)

Now, on a physical server, we solve this by using a logging serial console. (I recommend opengear if you have the money, and a used cyclades if you don't. The 'buddy system' (making one server the console server for the next, then the next server the console server for the first) usually requires adding usb serial dongles, but is cheaper still for installations with only a few servers. I personally like the IOgear brand usb -> serial dongles Fry's has.)

I can turn on debug logging in xenconsoled and that will log the console for all domains to a file (one file for each domain) then I can use those logs to troubleshoot the problem. The thing is, apparently some people have privacy concerns with this, so I haven't done it yet.

Now, personally, I don't think serial consoles are that sensitive. I mean, it's common to leave terminals in data centers where passers-by can see the output. The logs would let me see what program is crashing, which may be sensitive, and depending on how you have the thing configured, I could see when people log in and log out.

So, I have several options.

  1. I could leave it as is, continue to go back and forth and guess if someone asks me why something crashed after a reboot
  2. I can log all consoles and delete the data once a week or once a month
  3. I can apply a patch to log some people's consoles and not others, and let the user decide

Obviously, option 2 makes my life a /whole lot/ easier. Option 3 is better than option 1, but it still means maintaining an out of tree xenconsoled (or pushing it upstream)
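Option 2 mostly comes down to a cleanup job. Here is a minimal sketch of what that could look like, assuming the console logs land as one file per domain in a single directory. The function name, the directory layout and the seven-day default are all hypothetical, not anything xenconsoled provides.

```python
import os
import time


def prune_console_logs(log_dir, max_age_days=7, now=None):
    """Delete regular files in log_dir last modified more than max_age_days ago.

    log_dir is an assumed location for per-domain console logs.
    Returns the sorted list of file names that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Only touch plain files; leave directories and anything else alone.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Run from cron once a day, this would keep roughly a week of history. Note that it goes by modification time, so it only clears out logs for domains that have been quiet; a domain that writes to its console constantly would need real log rotation instead.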

So I had an incident the other day that highlighted a problem prgmr.com has. See, my standard pricing is $1/month per 64MiB ram and 1.5GiB disk, plus a proportional share of CPU, plus another $4/month for abuse and support costs.

Now, the first problem here is that the small customers are usually okay with $4/month worth of support. I mean, they aren't paying me much. Larger customers, on the other hand, believe they are entitled to better support because they are paying more. There is a disconnect there, as from my point of view, they are paying the same $4/month. (For me, small customers who don't need help are by far the most profitable customer demographic.)
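To put numbers on that disconnect, here is a rough sketch, assuming the pricing reduces to a flat $4 plus $1 per 64MiB unit of ram. The function names are made up for illustration.

```python
SUPPORT_FEE = 4     # flat $/month for abuse and support
PRICE_PER_UNIT = 1  # $/month per 64MiB ram unit


def monthly_price(units):
    """Total $/month for a domain with the given number of 64MiB units."""
    if units < 1:
        raise ValueError("need at least one 64MiB unit")
    return SUPPORT_FEE + PRICE_PER_UNIT * units


def support_share(units):
    """Fraction of the monthly bill that is the flat support fee."""
    return SUPPORT_FEE / monthly_price(units)


# A 64MiB customer pays $5, and 80% of that is support.
# A 1GiB customer (16 units) pays $20, and only 20% of that is support.
small = (monthly_price(1), support_share(1))
large = (monthly_price(16), support_share(16))
```

Both customers fund the same $4 of support, but the larger one sees a bill four times the size and expects support to match.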

The second problem is that the support you get for $4/month is not very good. I have skimped on coverage rather than skimping on technical skill, but either way, my support sucks. You often have to wait a while to get an answer, which is bad.

Now, a big part of that is that sometimes it is a skilled job to figure out if the problem is mine (prgmr.com hardware or network) or yours, so having some unskilled worker online 24/7 won't really help us any more than an autoresponder would.

However, having crappy support means my service is significantly less useful. As I grow larger, coverage will improve; I mean, as it stands I probably get fewer than 10 tickets on a day with no new signups, so it's hard to justify hiring more people.

So, yeah; the incident was that some guy was disconnected because he was attacking other hosts. Now, almost always this just means the guy was compromised and is now part of a botnet.

So we emailed the guy and shut him down. The thing was, his email was on the VPS, so he never got it.

Now, in this case, the guy was past due. "The check is in the mail," so I'm less sympathetic. Often, we email and then shut down 12 hours or so later. (This is a very 'soft' policy... it all depends on how 'legitimate' the customer looks. If you have an account name of 'bestwatches', well, we tend to shoot first and ask questions later. God, I hate watch spam. This customer got sorted into the 'shoot first' pile, I believe, because he was past due, though there was an account note that said the check was in the mail, so maybe that was a mistake.)

This is, I think, a reasonable abuse setup for $4/month. Note, I've never shut someone down for a mistaken abuse report (I've forwarded a few to customers... that was embarrassing.)

However, this clearly damages the value I provide to businesses.

Now, the first thing I need to fix is the email notification. Emailing a dude when I'm taking down his mailserver is just stupid. I think I need to start calling people when I shut things down.

Good god I hate the phone. And I think Nick hates it even more, and I don't really feel good about making employees talk to irate customers, so I'd be making the calls. But we're talking maybe one or two of these things a week at a thousand customers, so it's not that big of a deal for me to do it myself.

Next, right now, the policy is to boot 'em into single user mode; I think we should leave them with a fresh image, and their old image mounted read-only. Otherwise, they are much less likely to actually fix the problem. (You always format and re-install after a compromise.) Of course, this might piss off some customers even more. So many people think they can run 'virus scan' and be okay again.

That's still a pretty poor way to treat hacked customers. "sorry dude, you got hacked. Here, I finished the job. go now and rebuild from scratch" but anything else (well, besides booting 'em into single user mode and letting them deal with it, which just pushes the cost on to the customer) would be super expensive.

As I grow, support coverage will improve. I'll eventually get a support person from somewhere else who can look at tickets during the pre-noon hours when Nick and I are not available.

I'm especially looking for feedback on how to handle compromised domains. The more I think about it, the more I like the idea of giving the customer a completely fresh install with the old image mounted read-only for a certain number of days so they can retrieve any data that isn't backed up.

we're still backlogged a little, so provisioning will be slower than usual for a while longer, but the delay shouldn't exceed 24 hours, so you can order now if you like.  
soon being within a small number of days.   Tuesday at the latest.  Cogent and IPv4 only, but it has been a reliable site for me.  the guy who runs rippleweb is pretty good.  

double billing

Invoices went out tonight to everyone who paid me yesterday. Fortunately, I use paypal; this means that unless you pay those invoices twice, you are not getting charged twice. This is why I use paypal instead of credit cards: my billing system is not set up properly. (I know that's no excuse for a billing system in pieces like this, but I think this is better than charging your credit cards twice.)


the necessity of Kool-aid

I'm not the type to drink Kool-aid, so in running my business, I've not made any Kool-aid. I try to remain sober and keep in mind my advantages (and my competitors' advantages); when asked, I give an honest overview. This sometimes costs me deals, sometimes it costs me employees, but overall I think maintaining that humility is a win.

Like most technical people, I have a 'there is nothing new under the sun' attitude- almost everything is an incremental improvement over something else (certainly almost all businesses are)

Now, a lot of time that incremental improvement makes a big difference- it makes a lot of sense to focus on improving your incremental improvement as much as possible, and really, it makes sense to focus on marketing that incremental improvement, as that is the value you are providing over your competitors.

But many companies seem to want to put themselves forward as some kind of revolution... sometimes they ignore the good parts of how things have always been done and end up with a product that is better in some ways, but that fails in some of the basic ways that they would not have failed in had they stuck with 'what exists now plus our incremental improvement' rather than trying to re-invent wheels.

Other companies do the (I think rational) 'take what is common now, add some incremental improvements, sell' but then market it as if it was something completely new.

I give the example here of the so-called "cloud computing" providers- (You probably shouldn't call prgmr.com a cloud computing provider, not until we slightly improve our provisioning system, at least. I've been looking at eucalyptus as the way forward, as I think an API that is compatible with several providers is essential to providing a product that is actually useful to consumers.)

Cloud computing is [virtual] dedicated servers, with the incremental improvement of a nice provisioning system. Now, that's a real and very useful incremental improvement (and it does enable some fundamental changes in how data center space is thought about and managed), but you can take what you are doing on your current [virtual] dedicated servers and put it on a server at a 'cloud computing' provider with no other changes, assuming the cloud computing provider gives you static IP addresses. (Most do; frontend servers without a stable set of IPs are generally considered a Bad Idea.)

(this is why I see ec2 as a competitor even though they are a 'cloud' and I am not. I realize they don't know I exist, yet. But that's ok. I was here first, but nobody calls me a serious business guy. It's been over five years since I've worn a tie.)

Now, this incremental improvement of good provisioning is something that you have been able to implement on your own with your own hardware for a long time now. Look at tools like cobbler and koan. (Or really, if you need to clone your servers, systemimager.) Functionally it's about the same thing, but you needed access to your own boot server infrastructure. The new part is that there is now an easy way to do this with a small number of servers, and the cloud computing providers maintain the provisioning system for you (rather than you maintaining your own boot system, using systemimager or whatever other tools you like.)

Setting up a good provisioning system used to require a good bit of server and sysadmin infrastructure. The cloud computing providers have removed most of that barrier. (You still need SysAdmin resources on the application level to scale; taking a webapp from running on one server to running on two servers requires application level thought. Depending on the application, it goes from trivial to very hard. But that is a problem that must be solved at the application level.)

Charity and advertising

Running a business, of course, I need advertising. Now, personally I think that charity and advertising can go hand in hand. Doing something good often gets you press in ways that are more valuable than the kind of press you can directly buy. A linux user group saying that they host on my server is probably worth quite a lot of those pay-per-click search ads. It is cheaper for me, usually, and it supports causes I like. I like Open-Source software, and I recognise that it does need some support from commercial entities, and I also recognise that commercial entities like prgmr.com would not exist without open-source software. (for that matter, I wouldn't be able to do my dayjob without open-source software. Open-source is what allows me to be drastically more productive (and thus get paid more) than a windows reboot monkey.)

If you are running a computer or open-source related project that is generally not profit-seeking, email lsc@prgmr.com, and maybe we can reach an agreement. (You don't need to officially be non-profit... I'm writing off the cost of hosting you as an advertising expense, rather than writing off the retail value of the package as a charitable donation. Smaller writeoff for me, but much easier to defend.) At this moment, I'm not in a position to hand out free images that consume more than 10Mbps on the 95th percentile, but I hope to change that soon.
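For anyone unsure what "10Mbps on the 95th percentile" means in practice, here is a small sketch of the usual burstable-billing calculation: take periodic bandwidth samples over the billing period, throw away the top 5%, and look at the highest sample left. The sampling interval and the exact rounding vary between providers, so treat the details (and the function names) as assumptions.

```python
def percentile_95(samples_mbps):
    """Return the 95th percentile of bandwidth samples (nearest-rank method).

    samples_mbps: bandwidth readings in Mbps, e.g. one per 5 minutes.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    n = len(ordered)
    # Integer form of ceil(0.95 * n), avoiding floating point rounding.
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]


def within_limit(samples_mbps, limit_mbps=10):
    """True if the 95th percentile usage is at or below limit_mbps."""
    return percentile_95(samples_mbps) <= limit_mbps
```

The upshot is that short bursts are free: a project that sits at 1Mbps and spikes to 100Mbps for less than 5% of the samples still counts as 1Mbps.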

About this Archive

This page is an archive of recent entries in the Business category.
