HD Crash…

February 17th, 2010

So… here we go again: that all-too-familiar feeling of ‘uh-oh’. I was working on my PC the other night, doing a routine upgrade of KDE to the newly released version 4.4, when all of a sudden aptitude started spewing loads of errors about not being able to save files, with read-only errors all over the place. A quick check with mount and dmesg confirmed my suspicion: a hard drive was failing and the file system had been remounted read-only. The dmesg output showed the drive timing out on commands and reporting that it was unable to remap a sector.
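The "quick check with mount" can be sketched as a small helper that lists every mount point currently mounted read-only. This is only an illustration: the function name ro_mounts is made up for this post, and it assumes a Linux system where /proc/mounts is available.

```shell
# Hypothetical helper: print the mount point of every file system whose
# mount options include "ro". The optional argument is a mounts table in
# /proc/mounts format (defaults to /proc/mounts itself).
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# Usage:
#   ro_mounts
#   dmesg | grep -i -E 'remount|i/o error'
```

Some file systems are legitimately mounted read-only (a live CD's squashfs, for example), so the interesting entries are the ones that should have been read/write.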

Now, this puts me in a bit of a bind. This is an elderly PC that I scraped together from a Dell Dimension 8400 plus an additional HD and SATA card. As it doubles as the media server for my Xbox, it has to have some additional storage, hence the extra hard disk. The extra hard disk was a 300GB one; the system hard disk it came with (the one that started failing) was a 160GB one. I’ve routinely run LVM2 on my systems exactly for this case: being able to add disk space easily without fiddling with mounting additional drives, or running out of space on one partition while swimming in space on another. This time, that may have gotten the better of me though. I don’t run LVM on top of RAID, so if one drive fails, I’m up sh*t creek. My only luck may have been that I only very recently added the extra disk, and I hadn’t really started using it.

So, now I need to recover this data somehow. My plan of attack is as follows:

  • Put in another drive
  • Move the entire contents of the failing drive to the new drive
  • Try to reconstruct the LVM data from there.
  • Profit! :)

Step 1: find a new drive

We need a donor drive to transfer the data to. Preferably, use a drive that is bigger than the original. Nothing is worse than trying to rescue your precious data and finding out that the new drive is a few sectors short. Even if both drives are advertised as 160GB, that doesn’t mean they have the exact same sector count! Once we have the drive, hook it up to the system.
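One way to check the sector counts before copying anything is sketched below. The helper name donor_fits is invented for illustration, and /dev/sdc (failing drive) and /dev/sdb (donor) are placeholder device names that will differ per system; blockdev --getsz reports a device's size in 512-byte sectors.

```shell
# Hypothetical helper: succeed only when the donor has at least as many
# sectors as the failing drive.
#   $1 = sector count of the failing drive
#   $2 = sector count of the donor drive
donor_fits() {
    [ "$2" -ge "$1" ]
}

# Usage (placeholder device names):
#   src=$(sudo blockdev --getsz /dev/sdc)
#   dst=$(sudo blockdev --getsz /dev/sdb)
#   donor_fits "$src" "$dst" && echo "donor is large enough"
```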

In my case, I bought a 250GB Samsung drive, only to find out after I built it into my machine that it was a) refurbished (it had a complete Vista install on it) and b) dying! I immediately got a warning from Ubuntu that the SMART data indicated that the drive was failing… Not good. I removed the drive and replaced it with a slightly larger 160GB drive I had lying around. I will swap that drive out when I get a replacement for the refurbished one, but by then I can hopefully use the ‘pvmove’ command.

When connecting the drives, make sure that you hook up the donor drive to the cable that was originally connected to the failing drive, and connect the failing drive with an extra cable. This way, the device enumeration under Linux will be identical and the partitions we salvage will appear on the device files that LVM expects.

Step 2: re-create the partitions you had on the drive

This step, in retrospect, may not have been necessary. I booted the media server from an Ubuntu 9.10 live CD and re-created the partitions using GParted, ensuring that each partition was slightly larger than the corresponding partition on the failing disk. Note: GParted will complain that it is not capable of working with LVM2, but you can ignore that.

Step 3: get a copy of dd_rescue

For the copying of the data to succeed, we need a copy of dd_rescue. dd_rescue is a dd-like tool that copies data from a file or device but does not abort when it encounters a read error. Instead, it falls back to a ‘sector-at-a-time’ reading mode, logs the error and continues. Additionally, it has the option of reverse-copying, starting at the end of a device and working its way backwards, so that you can approach a problem area from both sides, maximizing the amount of a file or device that can be read.

Ubuntu 9.10 does not come with this by default, but the sources are tiny and luckily our live CD includes a working gcc. Compiling it is a breeze, consisting of just a ‘make’ call. We run it as follows:

$ sudo ./dd_rescue /dev/sdc1 /dev/sdb1

Of course, you will need to change /dev/sdc1 and /dev/sdb1 into the appropriate device files for your system. Also, be very careful about the order of these arguments. The first is the input file, the second is the output file. Mess these up and you will overwrite the data you want to recover with whatever was on the donor drive!
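Because swapping the two arguments is so destructive, it can help to have the command printed and sanity-checked before running it. The wrapper below is a hypothetical sketch, not part of dd_rescue: it only prints the command, and it refuses when source and destination are identical or when the destination shows up as a mounted device (a mounted partition is very unlikely to be the empty donor).

```shell
# Hypothetical helper: print the dd_rescue command for review instead of
# running it, and refuse obviously dangerous argument pairs.
#   $1 = source (failing partition), $2 = destination (donor partition)
#   $3 = optional mounts table in /proc/mounts format
rescue_cmd() {
    rc_src=$1
    rc_dst=$2
    rc_mounts=${3:-/proc/mounts}
    if [ -z "$rc_src" ] || [ -z "$rc_dst" ]; then
        echo "usage: rescue_cmd SOURCE DEST [MOUNTS_FILE]" >&2
        return 2
    fi
    if [ "$rc_src" = "$rc_dst" ]; then
        echo "refusing: source and destination are the same" >&2
        return 1
    fi
    # A destination that is currently mounted is almost certainly wrong.
    if [ -r "$rc_mounts" ] &&
       awk -v d="$rc_dst" '$1 == d { found = 1 } END { exit !found }' "$rc_mounts"; then
        echo "refusing: $rc_dst is currently mounted" >&2
        return 1
    fi
    echo "sudo ./dd_rescue $rc_src $rc_dst"
}

# Usage:
#   rescue_cmd /dev/sdc1 /dev/sdb1
```

These checks cannot tell which drive is the failing one; they only catch the mistakes that are detectable from the device names alone, so reading the printed command carefully is still the real safeguard.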

I had two partitions, so I executed this command twice. In retrospect I could have just specified the device files for the entire harddisk (i.e. /dev/sdc and /dev/sdb) and dd_rescue would have duplicated everything (including the partition table).

I got around 40 read errors on the first partition, and about 520 on the second. Now we have the drives duplicated, but don’t throw the faulty drive away just yet! If we manage to cock up the LVM restore, we can re-attempt the rescue.

Step 4: start salvaging using LVM

Now we have a copy of our physical volumes. We will try to activate the volumegroups they belong to by executing vgscan:

$ sudo vgscan -v
Wiping cache of LVM-capable devices
Wiping internal VG cache
Reading all physical volumes.  This may take a while...
Finding all volume groups
Finding volume group "htpc-server"
Found volume group "htpc-server" using metadata type lvm2

After this is done and our volumegroups have been found, we execute a vgchange:

$ sudo vgchange -a y
2 logical volume(s) in volume group "htpc-server" now active

vgdisplay will give us the names of all volumes and volumegroups:

$ sudo vgdisplay
--- Volume group ---
VG Name               htpc-server
System ID
Format                lvm2
Metadata Areas        3
Metadata Sequence No  5
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                2
Open LV               0
Max PV                0
Cur PV                3
Act PV                3
VG Size               428.24 GB
PE Size               4.00 MB
Total PE              109629
Alloc PE / Size       109629 / 428.24 GB
Free  PE / Size       0 / 0
VG UUID               2arhGV-xkVa-MUXt-38PD-XOmE-TaYN-tpPJBT
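As a quick sanity check on that output, the VG size should equal the number of physical extents times the extent size. A minimal sketch (the function name vg_size_gb is made up here, and it assumes this LVM2 version counts 1024 MB to the GB):

```shell
# Hypothetical helper: volume group size in GB from the extent count and
# the extent size in MB, assuming 1 GB = 1024 MB.
vg_size_gb() {
    awk -v pe="$1" -v mb="$2" 'BEGIN { printf "%.2f\n", pe * mb / 1024 }'
}

# Usage, with the "Total PE" and "PE Size" values from the output above:
#   vg_size_gb 109629 4
```

109629 extents of 4 MB come to 438516 MB, which matches the reported 428.24 GB, so all three physical volumes are accounted for.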

Finally, we can execute a full fsck on each of the logical volumes in the volume group, and if no unfixable errors are found, we can mount the filesystems and start assessing the real data loss.
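To run fsck we need the device file of each logical volume. Device-mapper doubles any hyphen inside a VG or LV name, so a volume in "htpc-server" shows up under /dev/mapper with "htpc--server" in its name. The helper below sketches that mapping; the function name lv_mapper_path and the LV name "media" are made up for illustration (the real names can be listed with lvs or lvdisplay).

```shell
# Hypothetical helper: build the /dev/mapper path for a logical volume.
# Hyphens inside the VG and LV names are doubled; a single hyphen then
# separates the two names.
lv_mapper_path() {
    lmp_vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lmp_lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$lmp_vg" "$lmp_lv"
}

# Usage ("media" is a placeholder LV name):
#   sudo lvs htpc-server
#   sudo fsck -n "$(lv_mapper_path htpc-server media)"   # check only, assuming the file system's checker supports -n
#   sudo mount -o ro "$(lv_mapper_path htpc-server media)" /mnt
```

Starting with a check-only pass and a read-only mount is a cautious choice: it shows how bad the damage is before anything gets written to the copy.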

Categories: Linux