What we’ve come to..

So I’ve been watching a bit of “America’s Got Talent” the last week or two.. and tonight, I stumbled across some of the videos on YouTube for “Britain’s Got Talent” from earlier this month.


This one I ended up in tears afterwards.


This one gave me “goosepimples” and tears as well.

And now, ladies and gentlemen, I ask you to consider what “America” thinks is Talent.

I think I’m moving to Britain.


ZOMG!!! WHAT A DAY!

Today was one of those days.. when everything goes wrong. Believe me…. Everything. Went. Wrong.

3am – pages started coming in that our equipment up on the tank in Bloomington Hills was having packet loss, going up and down, etc. Appears the backhaul must have blown out of alignment with the fun winds we’ve been having. I noticed our Candy Cane backhaul had been down since midnight. (sorry tomp.) I was tempted to go up to the candy cane and see what I could see, however, it was dark, and scary climbing rocks at night, with the wind. I opted to wait until the morning.
5am – page about BGP peering going down.. back up 5 minutes later.
6am – pages coming in about power outages on our UPSes. Then pages a few seconds later about power going back to normal. This and that. For a few minutes. Annoying. Then our Spam database (PostgreSQL) server rebooted (his name is Gibraltar). We determined that this is due to the 2-4ms time that it takes our UPSes to switch to battery during an outage. This server is a power hungry monster (16xSAS 15kRPM drives, dual Woodcrest CPUs, 8GB RAM, tons of fans). We figured it must not be able to handle a 2-4ms switchover (lack of capacitors? who knows). So, it took some hand holding to make sure that server came up right. Otherwise, mail wouldn’t flow.
6:30am – Once Gibraltar came up, I started ‘cvsup-ping’ the OS changes. It stopped after the first file and I logged into the CVSup server (mirrors). Noticed that the ethernet had been going up and down. What was really odd, was the “ was using my IP!” error. For some reason, this box (mirrors) had stolen the IP from Quillo (our old 1U web server box). There is no way he could have done that, since it had to have happened when I had started cvsup-ping. Soo, weird. Anyway, I fixed that issue on Mirrors, Gibraltar started cvsupping and all was good.
7:30am – Left home. Went to office to get ladder. Called Cory @ SGCity to get keys to the gate so I could drive my 4WD truck up. Upon arriving at the office right before 8am, Dan (one of our techs) said all the machines had been powered off and were having issues logging in. After a reboot of his machine, it resolved itself.
8:00am – Had Randy call SG Water to get up on the tank to check alignment. Drove up to candy cane. It appears that our backhaul unit that feeds our CandyCane access had blown over in the wind. Six concrete lag screws that were put into the concrete roof of our little ‘shack’ had all popped out. I tried rigging it back up with rocks and dumb tie down wire.. which worked until Randy came up with a “non-penetrating roof mount” around 9am.

It was one thing after another this morning. I ordered an “online” UPS (minuteman brand) to fix that 2-4ms problem for the one server that can’t handle it. Hopefully, it’ll be here before another “reboot”. The weird thing about that issue is that I have dual power supplies, plugged into different power strips, plugged into different batteries, on different power circuits. Yet, I still have the issue. I’m hoping to plug one of those power supplies into this new “online UPS” and that’ll solve my problem. *fingers crossed*.

Oh and our installer quit yesterday…well, we think. He just dropped off the truck with the phone/laptop/keys inside of the truck in the parking lot, without telling anybody. Soo congrats tomp on starting Friday. May your future be filled with many-a-non-problematic-install.

New hardware..

I had ordered two new servers (non-RAID, diskless) to replace two servers that like to crash and reboot whenever they darn well please. (see previous post).

Early this morning (12am to 1.30am) I swapped them out. The first server was ‘Zahara’, one of our mailbox storage servers. It’s where 1/4 of our mailboxes reside. A quick RAID card change, and disk swap and it was back up. No issues.

The second server was ‘Cobre’, our CPanel hosting server. This one was running an older (3ware 8506) RAID card, and I wanted to upgrade to a newer (3ware 9500S) RAID card to get better everything. Luckily for me 3ware released a ‘convert.exe’ file that will convert your old 8500 series RAID arrays into 9500/9550SX arrays. All it took was downloading the convert.exe to a floppy, making a boot floppy, and rebooting the box (before I swapped chassis and RAID cards) and running ‘convert *’. It marked the RAID array as workable on the new 9500 card. Yeah, I know. Too technical. But it was pretty amazing that it worked. I had tried it earlier at the office to make sure it would save my data, and it did.. but there’s always that chance that it might not work and then there goes my morning.. rebuilding a server from backups. Ugh.

Anywho.. after I ran the convert.exe script, I swapped the 4 drives into the new server, powered it up and all was good. Until it kernel panicked. Yeah, I know. It sucked. Sucked hard. I was worried that it’d still panic on startup each time. So I rebooted into single-user mode, ran fsck on my drives, and did a ‘make installworld’, which I had not done since installing my new kernel with support for the new RAID card. After that, another reboot and all came up. Well.. except my big Catalyst 6513 had some weird ARP issues with the IPs that the cpanel server was asking for. It just didn’t end, really. Well.. it did.. around 1.30am. After a fun hour and a half in a cold room working with hardware.

Sometimes I just want to be a chef.

Sunday Morning MySQL issues..

You’d think I’d get to sleep in most Sunday mornings. Well, I do. Most Sundays, that is. However, this Sunday is a different story. Our hosting server (cpanel) has had issues for quite some time and likes to reboot itself at random times.. usually it comes up fine. This time it did not.

The server runs ‘fsck’ to scan the drives for any errors when it reboots, since it doesn’t unmount them on the crash. I know, it sucks. This time, my data in the /var/db/mysql/mysql folder had some issues. That folder is where the MySQL permissions tables reside, so MySQL refused to start. What a joke.

So I moved the mysql folder to mysql_old and re-ran the mysql_install_db script to regenerate the permissions tables. Then I set my MySQL ‘root’ password. Lucky for me, cpanel backs up the MySQL user privileges in the users backup directory every night (or is it morning?).
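For the curious, those first steps look roughly like the sketch below. It is not the exact command history from that morning: the `--user` and `--datadir` options, the `CHANGE_ME` placeholder password, and the `run`/`rebuild_grant_tables` helper names are assumptions for illustration. It only prints the commands unless DRY_RUN=0 is set, and it assumes MySQL is stopped for the first two steps and started again (not shown) before the last one.

```shell
#!/bin/sh
# Sketch of the permissions-table rebuild. Only /var/db/mysql comes from the
# post; everything else here is an illustrative assumption.
DATADIR="${DATADIR:-/var/db/mysql}"
DRY_RUN="${DRY_RUN:-1}"                               # 1 = print only, 0 = really run
NEW_ROOT_PASSWORD="${NEW_ROOT_PASSWORD:-CHANGE_ME}"   # placeholder, not a real password

run() {
    # Echo the command (this includes the password argument, so mind your
    # scrollback); execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild_grant_tables() {
    # 1. Set the damaged permissions database aside instead of deleting it.
    run mv "$DATADIR/mysql" "$DATADIR/mysql_old"
    # 2. Regenerate fresh, empty permissions tables.
    run mysql_install_db --user=mysql --datadir="$DATADIR"
    # 3. Once the server is running again, give 'root' a password.
    run mysqladmin -u root password "$NEW_ROOT_PASSWORD"
}
```

Calling `rebuild_grant_tables` with the defaults just lists the three commands, which is a cheap way to sanity-check the paths before committing to anything at 6:30am.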

So some quick bash scripting later:

# Loop over each user's nightly privileges dump (glob instead of parsing ls)
for blah in /backup/cpbackup/daily/*/mysql.sql; do
    echo "Restoring.. $blah"
    mysql --password=secretmysqlpassword mysql < "$blah"
done

And I have all my user permissions back in the database. It probably would have been easier had I had a recent backup of the mysql database. But this way, I have the passwords intact.

Nothing like writing some bash scripting, and fixing server issues at 6:30am on a nice Sunday morning.

Back to bed..