My latest WTF moment!

Why hello there! It’s time once again for that story followed by the WTF moment!

Quick note before I go off: Yes, the server is Debian; however, my thought is that software RAID1 would be non-distro-specific. Besides, I would implement it on this system and my lappy too.

Last week (maybe Thursday or so) I came to sit down by my beloved latest install on an HP all-in-one. I went to access my server but could not ssh into her. OK, fine. Let's ping her. Check!
OK, fine again (or so I thought). Since I'm running her headless, it was time to dig up a 15-inch monitor from the garage and see what's going on.

And there it was… Stuck at the "hit Enter to continue or enter root password for maintenance" prompt.
Fine again, I thought. This has been happening a bit lately, but normally at boot-up with the old familiar BIOS warning that an imminent hard disk failure is approaching.

So I entered maintenance mode and, lo and behold, hard disk failure. OK, a bit of historical background: I run two 1.5 TB Seagates (one labeled Data, the other labeled Backup). Every night I cron an rsync to duplicate from Data to Backup. I also have a directory for deleted items as a just-in-case that, IIRC, rsync manages for me.

Cool - now the Data drive had failed, but that's OK; Backup was just hours old, so no biggie (heh, so I thought). I popped in a 1 TB drive with the intention of ordering a new 2 TB Seagate (Amazon, 50.00 USD). I wiped the old Windows 10 install that was on the temp drive, relabeled it to something I would remember was temp (duh, like Temp), set off an rsync, and went off to sleepy time.

Next day (last Friday night) after work, I went to check on all that was restored. To my surprise, sometime during the night the rsync crapped out. No, not just crapped out - drive crap-out. Annnnd, not the temp drive either!

You guessed it: it was Backup. I thought to myself, WTH kind of luck is this!!! I pulled Backup from the server, iced it (literally) for a few hours, then popped her into the external drive bay on this system. After a few removals and reinsertions of the drive, I was able to access it. OUTSTANDING!!! I pulled the Temp drive from the server, cfdisk'ed and mkfs'ed it again, and started another rsync while I had the chance.
I sat there, praying and monitoring it until 2:30 in the morning. It finished, with only a few dozen server OS tarballs from January lost. No biggie - I still had all of March and April of 2020.

I placed the now-populated Temp drive back into the server, then ordered the above-mentioned two 2 TB drives. Then I looked at the now-failed 1.5 TB drives, gathered the DOM codes, and did a lookup. HA! These two drives are only a few months apart … from 2010! They had been working in the server for nearly 10 years!

So now, a week later, I am back to where I was, but with an additional 1/2 TB on each drive. Which brings me to my WTF moment: why didn't I set up software RAID1 for the new Data and Backup drives?
Is there a way to do this without losing the data on the Data drive?
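From what I've read, the usual answer is yes: mdadm can build a RAID1 with one member marked `missing`, so you can create a degraded array on the empty drive, copy everything over, then hand the Data drive to the array as the second member. The device names and mount points below are made up for illustration - verify everything against `lsblk` before running anything like this:

```shell
# Sketch only - /dev/sdb1 is the new, empty drive's partition and
# /dev/sdc1 is the partition currently holding the data.

# 1. Create a degraded RAID1 with the second slot marked "missing".
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing

# 2. Put a filesystem on the array and copy the data onto it.
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/md0
rsync -aHAX /srv/data/ /mnt/md0/

# 3. After verifying the copy, add the old drive to the array;
#    mdadm overwrites it and mirrors everything onto it.
mdadm --add /dev/md0 /dev/sdc1

# 4. Watch the resync, then persist the config so it assembles at boot.
cat /proc/mdstat
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u   # Debian: rebuild initramfs so it knows the array
```

The catch is that between steps 2 and 3 the data lives on a single spindle, so don't add (and thereby wipe) the old drive until the copy is verified.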

Thoughts and comments please!!!


Wow, that's some story in there, @Chris.

Hopefully you didn't lose too much in there.

Yeah - it's a bit of a read, but I think that sometimes more paints a clearer picture, so I go for the visual.
But to your point, altman, I did lose some, but it is irrelevant compared to what could have been lost.

I'm not a cloud person, and if I were, that issue would never have come about. However, the cloud could introduce other issues. Either way, I suppose :wink:

Got lucky in a way, I guess.

Yep, I guess one gets caught either way, cloud or not.