GlennLog

Turning technology from mumbo-jumbo into rich tasty gumbo


November 22, 2005

Note to Self: Stop Upgrading Linux

This is a note to myself in the future, because I apparently keep forgetting.

I have three rack-mounted Linux servers at a co-location facility run by digital forest, a company the name of which I shall embroider in gold thread on a sampler to be mounted above my office computer. I've had a few peculiarities with them recently, including a weird problem in getting PHP to work reliably in specific ways in conjunction with Apache, so I decided to upgrade to Fedora Core 4 from Red Hat 9. Fedora is a project that continues the open-source, freely available Linux distribution that was at the heart of Red Hat's software. Red Hat now focuses on support-driven commercial versions of Linux designed for enterprises. (In fact, I also bought a copy of their Enterprise Linux distribution for one of these servers, though I'm rethinking that deployment.)

After performing some tests, including a full upgrade of a plain Red Hat 9 installation on an unused Linux box, I went in Sunday night expecting to be there three to four hours in the best case and six to seven in the worst case. Instead, I was there from about 7:45 pm until about 6:45 am.

What went wrong? I forgot a lesson I'd learned before. Despite many people's success with Linux upgrades, including using the yum software update tool to upgrade Linux while it was running and then rebooting into an entirely upgraded installation, I have rarely had good luck. Linux has poor revert positions. The Red Hat and Fedora installers don't leave an intact system, but rather write over software as they go. Unlike Mac OS X, with its Archive and Install option and its basic behavior of not rewriting boot blocks or replacing items until the installation is complete, Red Hat and Fedora just plow ahead. I should probably look into safer Linux distributions, but Red Hat works fine for me as a platform.

My path should have been to migrate services from one of the boxes to another, copying all non-system data, and then to wipe that box and install. In the event of failure, I'd still have working services. With a successful install, I'd customize it with my settings and move services back. Repeat as needed.
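
As a rough sketch of that one-box-at-a-time order, here is a small Python helper that only builds a checklist and never touches a machine. The function name, host names, and service names are all made up for illustration, and it assumes each box's neighbor has room to host its services for a while.

```python
from typing import Dict, List


def plan_rolling_reinstall(services_by_host: Dict[str, List[str]]) -> List[str]:
    """Return checklist steps for reinstalling hosts one at a time.

    Each host's services are parked on a neighbor before the host is wiped,
    so a failed install never takes a service offline.
    """
    hosts = list(services_by_host)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts so one can cover for the other")

    steps: List[str] = []
    for i, host in enumerate(hosts):
        # Use the next host in the list (wrapping around) as the temporary home.
        standby = hosts[(i + 1) % len(hosts)]
        services = services_by_host[host]
        for service in services:
            steps.append(f"copy {service} data and config from {host} to {standby}")
            steps.append(f"start {service} on {standby} and verify it")
        steps.append(f"wipe {host} and do a clean install")
        steps.append(f"verify {host} boots and joins the network")
        for service in services:
            steps.append(f"restore {service} to {host} and verify it")
            steps.append(f"stop {service} on {standby}")
    return steps


if __name__ == "__main__":
    # Hypothetical two-box setup; substitute real host and service names.
    plan = plan_rolling_reinstall({"server-a": ["web"], "server-b": ["database"]})
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

The point of the ordering is that the wipe step for a box never appears until its services are already running and verified somewhere else, and the next box isn't touched until the first one has its services back.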

Instead, I wound up with two servers down and hours of unhappiness sorting out what went wrong. The best thing I had was an installer that let me run "linux rescue," a limited version that let me mount volumes, copy files, and try again.

Before starting, I had made good backups of critical files on a fourth server, an Apple Xserve, and thus wasn't worried about losing critical data. Less important data has been backed up by digital forest, and I need to review whether any of those files need retrieval. Probably just a few.

I left in the morning (during rush hour, no less) having gotten my third machine to run all the services save one that the other two handled. This allowed me to bring up my most important Web site. The one I had to leave down with an apology note in response to all requests was isbn.nu, which has a high database dependency.

Before leaving, I'd sent email to Penguin Computing, the source of all goodness I now know, about the problems. By the time I got home, they'd answered my emails, and a series of emails and phone calls continued after I woke up from three hours' sleep to head back in. With their advice, we determined that there was probably a hardware fault with my highest-performance server, which used SCSI instead of IDE. It had a SCSI card not supported in Red Hat 9, so they had installed a customized version. But Fedora Core 4 and the Red Hat commercial version should both have worked fine. (I managed to do a full install of FC4, which hung on reboot at the SCSI load stage, while the commercial Red Hat booted once, refused to join a network, and then hung on reboots.)

They helped get me to a position where I was able to wipe and install one of the servers with Red Hat 9 and get it to a working position. Last night after my boy went to sleep, I turned it into a database server and re-enabled isbn.nu, which is a real moneymaker.

Today, Penguin issued me an RMA for the SCSI-equipped server, and the fine folks at digital forest unracked it, packed it, and shipped it overnight for me. Despite my installing unsupported software, the Penguin folks are going above and beyond by helping out. They may replace the SCSI hardware--I have a three-year on-site warranty, so they could have sent a tech, but that tech doesn't do software--but they'll also wipe and replace the faulty OS and make sure it works.

So, my lesson learned, I say to anyone reading this far and to my future self: Copy, wipe, install, restore. No more of this upgrade nonsense for production systems. Life's too short for server room all-nighters. The other tip: never try to handle two servers at once. If I'd tried on one and it had failed, I could have moved databases with great ease. Instead, I tried on two at the same time.

All praise to digital forest, for being a great, great place and having a 24 by 7 network operations center (new since their move to a wonderful new facility south of Boeing), and to Penguin Computing for their prompt and incredibly helpful efforts to get me running.

Posted by Glennf at November 22, 2005 10:32 PM
