Hey hey…
Hope you are doing well so far. 🙂
So recently I have been doing some fancy little thingies regarding my personal PC setup and a homelab.
One of these thingies was to finally implement a proper backup strategy that isn’t: “Hey, I have a hard drive; let me put things whenever on it…”
On that journey, I have learned an interesting thing or two and wanted to share that with whoever is actually reading this. 😄
So yeah… Here. We. Go.
In general: Having multiple copies of your data is a good thing. Before you go out and buy 100 hard drives to store these in your closet: Hold your horses.
You should also think about where you actually put your backups physically.
There’s always the possibility of something happening in your home: a fire, a flood, or a break-in, to give you some examples.
Of course, you can’t have a data center everywhere like those cloud providers - so what is the middle ground?
A good rule of thumb is the so-called “3-2-1” rule: keep three copies of your data, on two different types of storage media, with one copy stored off-site.
An implementation of this could be, say, two HDDs and one SSD, with one of the hard drives stored at a friend’s place.
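To make the rule a bit more tangible, here is a minimal sketch that checks a setup against it: at least three copies, at least two medium types, and at least one copy off-site. The `BackupCopy` class, its field names, and the labels are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str      # free-form label (hypothetical), e.g. "desk SSD"
    medium: str    # storage type, e.g. "ssd" or "hdd"
    offsite: bool  # True if the copy is stored outside your home


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Return True if the copies meet the 3-2-1 rule of thumb."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# The example from the text: two HDDs and one SSD, one HDD at a friend's place.
example_setup = [
    BackupCopy("desk SSD", "ssd", offsite=False),
    BackupCopy("closet HDD", "hdd", offsite=False),
    BackupCopy("HDD at a friend's place", "hdd", offsite=True),
]
```

Calling `satisfies_3_2_1(example_setup)` returns `True`; take away the drive at your friend’s place and it returns `False`.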
The cool thing is that this general rule-of-thumb can be expanded pretty easily.
For example, you can add one more copy of your data to improve redundancy, one more backup medium type to become less dependent on the failure rate of a single medium type, or one more storage location to improve geo-redundancy.
Each of these additions comes at a cost, either in money or in additional time spent on maintenance. Therefore, you need to weigh each factor accordingly.
As mentioned, with each factor added to our basic 3-2-1 rule, either effort or cost increases. So let’s look into that in a bit more detail (without making it so precise that it is outdated a week after publishing this…).
Regarding the “having X copies of your data” rule: that one is relatively simple. Ideally, you want each copy on a different storage device, so the amount of necessary storage space increases with every copy. That increases cost, which scales roughly linearly, depending on the storage type you get. Effort increases too, since you have to duplicate your data. In theory, one could put the different backups on different NASes, but this would a) pose a security risk due to possible unregulated access and b) wear out the backup disks quicker, in turn causing more drive failures.
When looking at the different storage types, it gets a lot more complex: there are around 10 types of storage media that one could realistically choose from without critically overpaying (sorry, floppy disks and other legacy media…).
They are categorized into four main types:
Flash Storage
Magnetic Storage
Optical Storage
Storage Medium | Advantages | Disadvantages |
---|---|---|
SSD (Solid State Drives) | - High speed and reliability. - No moving parts, less prone to mechanical failure. | - Expensive per GB. - Limited write endurance. - Data can degrade over time without power. |
SD-Cards | - Portable and widely available. - Compact size for easy storage. | - Not designed for long-term storage. - Prone to data corruption. - Limited durability. |
CF-Cards | - Durable and commonly used in professional cameras. - Moderate capacity options. | - High cost per GB. - Not optimized for long-term archival. |
USB-Drives | - Portable and affordable. - Easy to use and widely supported. | - Unreliable for long-term storage. - Limited write endurance. |
eMMC Storage | - Common in embedded systems. - Compact and integrated. | - Not user-replaceable. - Limited lifespan for archival purposes. |
HDDs (Hard Disk Drives) | - Cost-effective for large capacities. - Widely supported and replaceable. | - Susceptible to mechanical failure. - Magnetic data degrades over years. |
LTO Tapes | - Extremely long lifespan (20–30 years). - High capacity and cost-effective for large archives. | - Requires specialized hardware. - Slow read/write speeds compared to modern storage. |
CDs/DVDs | - Inexpensive and widely available. - Easy to distribute. | - Limited capacity. - Prone to physical damage and degradation (10–20 years lifespan). |
BluRays | - Higher capacity than DVDs. - Better longevity (20–50 years, depending on quality). | - Expensive compared to DVDs. - Requires specific hardware. |
MDisc | - Designed for archival storage (1000+ years lifespan). - Resistant to environmental degradation. | - High cost per disc. - Requires compatible drives for writing. |
This is also not mentioning cloud storage from the likes of Google, AWS, and so on, which is number four in that list… I didn’t want to directly compare it with the other types of storage media because, in contrast, it is not self-managed and comes with its own set of challenges and opportunities. More on that in Cloud Providers.
Lastly, the question arises of where you store that data and where you put it off-site. Generally, there are three philosophies regarding this:
Each of them has a set of advantages and disadvantages, of course; but in the end, you have to know what is best for you in your situation. Maybe you have some extra space somewhere off-site where you could store some spare hard drives. Or maybe you are fine with one of the other two options. Even if the drives were to fall into the wrong hands, you could always encrypt them with a tool like VeraCrypt, meaning nobody without the key could actually do anything with that data.
Whatever you choose; that choice depends on your trust towards other entities.
So… I have kinda touched upon that topic for a second, but network-attached storage (NAS) can be kinda cool. But what is that actually? In general, it describes every category of storage that can be accessed via a network connection. That would kinda include cloud storage from third-party providers, but for all intents and purposes, I am only talking about storage that you host locally and make accessible to the outside or just your local network.
Sounds cool, right? Your own cloud, with all the bells and whistles. Throw something like Nextcloud onto a small mini-PC or Raspberry Pi and you can even have fancy features like albums, media viewers, a fancy UI, or even automatic sync with your devices easily available.
The caveat is just that you have to host that by yourself. This comes with its own issues. To not make it ultra-scuffed, you need to buy some extra hardware, like that aforementioned mini-PC in some way, shape, or form. Then you would also probably want some enclosure to not have everything just lying around.
Well… And that is only the easy part. Next comes handling the networking to route everything correctly and expose the drive, the Nextcloud service, or whatever you use. Finally, there’s the almighty “security” looming above you. Having an exposed service also means that not only you can access it, but potentially other people too. Of course, pre-configured services come with authentication measures, but depending on how carelessly you set everything up, someone could always get into your system through some sort of backdoor, a misconfigured firewall (did I already mention the firewall that you would have to configure? 😂), an insecure password, or whatever.
Of course, there are pre-made systems that make a lot of these steps a lot easier or come pre-configured. These, naturally, come at an increased cost though. Just like in real life, it’s a compromise that you have to make.
So having that extra fanciness leaves you with some extra choices to make, plus possible extra effort and cost. But don’t let that intimidate you: if you are able to do it, the extra comfort is often worth it.
But strange guy on the internet: How does this link back to backups?
I mean… In the end, this NAS is just another copy of your data on another storage medium that could be potentially placed off-site.
You could also regard the NAS as your single source of truth for the state of your backups and then configure it so that, for example, whenever a hard drive gets plugged in, the NAS’s contents get mirrored onto the hard drive via an rsync script. The possibilities are endless and only limited by your imagination. This just serves as a starting point to build your setup. So yeah… I guess you got the main idea of that: NASes are cool. 😁
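To give a rough idea of what such a mirror script could look like, here is a minimal sketch that wraps `rsync` in Python. The function names and paths are made up for illustration, and the “run when a drive gets plugged in” part is left out; on Linux that would typically be wired up separately, for example with a udev rule.

```python
import subprocess
from pathlib import Path


def build_mirror_command(source: Path, target: Path) -> list[str]:
    """Build an rsync call that makes `target` a mirror of `source`."""
    return [
        "rsync",
        "-a",          # archive mode: recurse and keep permissions, times, symlinks
        "--delete",    # remove files from the target that are gone from the source
        f"{source}/",  # trailing slash: copy the contents, not the directory itself
        f"{target}/",
    ]


def mirror(source: Path, target: Path) -> None:
    """Run the mirror, but only if the target directory exists (drive mounted)."""
    if not target.is_dir():
        raise FileNotFoundError(f"backup target not available: {target}")
    # check=True raises CalledProcessError if rsync exits with an error code.
    subprocess.run(build_mirror_command(source, target), check=True)
```

Something like `mirror(Path("/srv/nas"), Path("/mnt/backup-drive"))` would then do the job (both paths are placeholders). Be careful with `--delete`: it makes the drive an exact mirror, so files deleted on the NAS also disappear from the drive on the next run.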
I have talked about a lot of things already, so let’s just complete this topic. Digital security is always kinda important, especially when you expose your drives to the outside world. Going into detail with that would take a while, and this is already getting too long for my liking. If you follow those tips at least somewhat, you are already on a good path.
However, there is also the topic of the physical security of your backups: securing them against intruders or physical damage, like fires or floods. For the latter, there are disaster-proof cases for hard drives, so that makes it kinda easy. Securing against intruders is a bit trickier. You could always employ the encryption strategy to render the files on those drives useless. Otherwise, this is just like hiding any other valuables at your place, and you know best where to put them. Maybe in some hidden cupboard, or in a safe? You know it best. 😁
Well… Cloud providers are also something I have touched upon a few times in this never-ending story. Their promise is simple: Easily scalable storage that is available with the press of a button and all of the backing up/security is already taken care of…
All of that is for just a small monthly fee.
Sounds cool, right? Well… It has its caveats. First, there is the aspect I already mentioned: “a small monthly fee”. Especially if you think about your backups in timeframes of years instead of months, this can become expensive. Then again, so can buying extra hard drives for small amounts of data. Just keep that aspect in mind and consider when it’s worth going for one or the other.
Next up, the main advantage, that someone else is taking care of your data, is also its greatest disadvantage. Whether it’s privacy, (planned) obsolescence, or security issues (either account security or data breaches)… You’ve got it all. So are you sure that you want to give that data to someone else, especially a big corporation like AWS, Google, or whoever else?
You could encrypt all of that data so that no one could do anything with it, even if it ended up in someone else’s hands, but in the end, it’s a decision that you need to make yourself.
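A quick back-of-the-envelope calculation helps with that decision. The sketch below only compares a one-time drive purchase with an accumulating monthly fee; it deliberately ignores things like electricity, drive replacements, and price changes, and the function name is made up for illustration.

```python
import math


def break_even_months(drive_cost: float, monthly_fee: float) -> int:
    """First month in which the total cloud fees exceed a one-time drive purchase."""
    if drive_cost < 0 or monthly_fee <= 0:
        raise ValueError("drive_cost must be >= 0 and monthly_fee must be > 0")
    # After floor(cost / fee) months the fees are at most equal to the drive
    # cost, so one month later they are strictly higher.
    return math.floor(drive_cost / monthly_fee) + 1
```

With purely hypothetical numbers, a drive for 120 and a plan for 5 per month, `break_even_months(120, 5)` gives 25: from the 25th month on, the cloud plan has cost more than the drive.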
Now that we’ve talked about a lot of different things, maybe let’s do some final touchups, that would be neat to keep in mind:
Backup Validation
Regularly check that your backups are actually working. Trust me, it’s frustrating to find out your backups are corrupted, especially in the moments you need them most. (Little tip: Hashes are a good way to verify your backup’s integrity quickly and somewhat automatically.) Test your backups every once in a while, as well as the process to recover those files. Maybe also see how long it takes to retrieve your backups, especially the off-site ones, just in case you actually need them urgently… 😁
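The hash tip can be sketched roughly like this: hash every file in the source and in the backup with SHA-256 and report everything that is missing or differs. The function names are made up for illustration, and this assumes the backup is a plain file copy, not an archive or an encrypted container.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_mismatches(source: Path, backup: Path) -> list[str]:
    """List files that are missing from the backup or differ from the source."""
    expected = build_manifest(source)
    actual = build_manifest(backup)
    return [name for name, digest in expected.items() if actual.get(name) != digest]
```

An empty list from `find_mismatches` means every source file exists in the backup with identical content. Keep in mind that this only tells you the copy matches right now; you could also store the manifest next to the backup and re-check it later to catch corruption on the backup drive itself.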
Error Correction
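Error correction can be cool to protect against corruption a bit: redundant data is stored alongside your files so that a limited amount of damage can be repaired. At least it gives you some added resilience, at the cost of additional storage capacity.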
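To show the trade-off between resilience and extra storage, here is a toy sketch using simple XOR parity: one extra block lets you rebuild any single lost block, as long as you know which one is gone. This is only an illustration of the principle; real tools rely on stronger codes (such as Reed-Solomon), and the function names here are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, and all blocks of equal size")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Create one parity block; this is the extra storage you pay for."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])
```

For three data blocks you store a fourth one as parity, so roughly a third more space. If the second block gets lost, `recover_block([first, third], parity)` gives it back.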
Automated Backups
Eliminate the human error factor by automating your backups. Whether through NAS software, rsync scripts, or third-party backup tools, automating ensures consistency without relying on your memory.
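As a starting point for such a script, here is a minimal sketch that copies a folder into a timestamped snapshot and deletes the oldest snapshots beyond a limit. The function name, the `snapshot-` prefix, and the default of seven snapshots are assumptions for illustration; the scheduling itself (for example via cron or a systemd timer) is not part of it.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_snapshot(source: Path, backup_root: Path, keep: int = 7) -> Path:
    """Copy `source` into a timestamped folder and prune the oldest snapshots."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    snapshot = backup_root / f"snapshot-{stamp}"
    shutil.copytree(source, snapshot)

    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(p for p in backup_root.glob("snapshot-*") if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return snapshot
```

Each run makes a full copy, which is simple but storage-hungry; dedicated backup tools usually work incrementally instead.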
I think I talked long enough about backups and that kind of stuff… So yeah… Just think about it and do something; you’ll still be better than most other people. 😁 Maybe adapt it to your circumstances, since everyone’s situation is different, and write everything down, so you don’t forget.
Well… Have fun and…
See ya