I'm an IT consultant specializing in digital archiving. AMA.
I work in a niche inside a niche. I deal with terabytes of storage, massive servers, a variety of storage tech, and I've been interested in computers in general for... around 40 years. (Yeah, I'm old.)
I have my own single-person company and have worked in 40+ US states, and done assignments in the UK and Norway.
The migration to cloud is a big deal. Learning about cloud storage is straightforward, but there's a huge number of new service offerings that don't neatly fit into the way the existing tech was built 25+ years ago. I'm "scaroused" at the idea of having to learn how all this works.
My organization is moving a bunch of on-prem stuff to the cloud over the next few years, and it's been interesting to see how things are changing. Azure has a TON of features, but it's overwhelming when I look at my deployment now and where I want to get to in the future. But I will get there, one piece at a time.
Can you get in touch with me? I work in archives, in IT, and have a nasty situation I'm looking for advice on from someone with experience in exactly this. Can we DM? Not sure how that works here.
Sadly, no... My niche is so very, very small that it's unlikely I can help your specific situation. It's also a self-preservation thing -- giving professional advice for free without contracts in place is a liability issue.
How did you end up in that niche? Was it a conscious decision or was it something that was thrust on you?
Follow-up question: Did you take any courses for the archiving portion of your job, or is it entirely self-taught? Any certifications or additional (formal) training?
Heh. I told my boss to fuck off after I got back from a vacation and she yelled at me because the people who were supposed to do my work couldn't do it -- because it was too technical.
I went back to my cube, cleared out my desk, and waited for security to escort me out. Three days later, my boss came to my cube and said "Go to the 11th floor and ask for Dave." then they walked away. I was sure it was an exit interview with HR. I put my box on my desk and went downstairs where... I got a job interview in the IT department, managing their new archive.
As part of the transfer to IT, I got a week's training in the USA, and several boxes of software manuals. Dave (my new IT mentor) said he wanted me to read all of them. He'd stop by, ask me which manual I'd read most recently, flip through it, read something, and say "Tell me about... X Y Z", and I'd have to barf out what I'd learned about storage management or database indexes, or server OS commands or functions.
After that, everything was self-taught. I ended up buying some old decommissioned server hardware from a friend that worked at the manufacturer, borrowing the install CDs from work, and building my own server to repeatedly fuck up / learn on.
Wow, what a story. The reason I asked is because I am (hopefully) coming from the other side: librarianship, trying to get into records management and archives and eventually into digital archives.
The insane storage capacities. The first archive I was responsible for had a total of 75GB of data. When I raised the alert that we didn't have an offsite backup, the enterprise backup team said, "Are you insane? You want us to back up 75 gigabytes EVERY WEEK?"
I have 128GB MicroSD cards the size of a fingernail... In my office I have over 250TB of storage, and I'm just some nerd. I routinely move hundreds of terabytes between systems for migrations.
Also... abstraction.
Many of the servers I connect to are VMs on servers using storage in SAN fabric using virtual IPs. Troubleshooting performance problems is difficult because nothing is strictly physical anymore. Another VM on a different physical server might be soaking up all the I/O bandwidth on SAN hardware I can't even see from my server. Even tape libraries can be entirely virtual, backed with cheap SATA disks, and a massive physical tape library in another datacentre that serves multiple sites. This "bends my noodle" more than I'd like to admit.
What is the best filesystem for archives? Btrfs, ZFS, ReFS?
How will quantum computers affect your field?
What is the 2023 bottleneck?
IMHO, it's permanent storage. I remember from 2005 up to 2010, we all wanted the fastest CPU, more GHz, more cores, etc., and the industry gave us that. Then during the last decade, we all wanted a more powerful GPU: more cores, more memory, more MHz. We also craved faster internet connections, and now we have optical fiber with 3.5Gb/sec at home for $120/month (impossible to imagine in 2014).
Nowadays, I have the impression that permanent storage is lagging behind. With new media being in 8K, video games averaging 100GB and whatnot, we regularly move around dozens of GB, even as casual users.
I'm only familiar with ZFS, and only in my lab, not in production... ZFS is great because it can self-heal files / re-allocate blocks. I tried it on SMR drives, and it's terrible; I advise against it. :)
ZFS is very good, but OFFSITE, TESTED BACKUPS are critical. There's 'reliable' storage (storage that can deal with a failure) and then there's backups. All the parity in the world won't save your data from a fire.
In my small office, I have about 100TB of data that's important to me, so I have a local copy, a backup in my office, and a stack of tapes at home about 1km away. Anything that affects both locations is outside my threat model, as I'll have bigger issues.
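One way to make "tested backups" concrete is to restore a copy somewhere and compare checksums against the source. The following is only a minimal sketch of that idea, not the tooling used in my office or at work; the function names (`sha256_of`, `build_manifest`, `compare_manifests`) are made up for illustration, and it assumes the restored copy is reachable as an ordinary directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source: dict[str, str], restored: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or different in the restored copy."""
    shared = source.keys() & restored.keys()
    return {
        "missing": sorted(source.keys() - restored.keys()),
        "extra": sorted(restored.keys() - source.keys()),
        "changed": sorted(name for name in shared if source[name] != restored[name]),
    }
```

A restore test passes when all three lists come back empty. Saving the source manifest next to each backup also lets you check a tape restore later, even if the original files have changed since.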
Archive.org, one of my favourite websites, has been targeted by litigation.
What do you feel will happen to this website in the years and decades ahead? And what will happen to digital archiving should the site be eliminated from the web?
That's more of a regulatory / legal grey area, because their mission is data preservation and sharing, which conflicts with the interests of copyright holders.
In my situation, ownership isn't in question, so it's not an issue.
Since I'm not a lawyer or prognosticator, I'll just say that copyright law has been manipulated to favour companies for too long, and government/politicians need to claw back some rights for their constituents. I'd personally like to see copyright cut short, and exceptions for digital lending carved out.
What do you see as the biggest risk to digital archives? Is there anywhere online I can learn more? My industry utilizes archives up to 25 years and I'd like to learn more about digital archive fundamentals!
Honestly? It's people not understanding that there's no such thing as a perfectly reliable storage medium, and that it's the PROCESS that keeps data safe.
Instead of saying "My RAID array has TWO hot spares", people should be saying "I have THREE copies on TWO different media, in TWO locations, and I tested my offsite backups within the last 30 days."
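That "three copies, two media, two locations, tested in the last 30 days" statement is simple enough to check mechanically. Here's a rough sketch under stated assumptions: the `Copy` record and `check_policy` function are hypothetical names invented for this example, and the thresholds are just the numbers from the sentence above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical record, for illustration only)."""
    name: str
    medium: str                      # e.g. "disk", "tape"
    location: str                    # e.g. "office", "home"
    offsite: bool = False
    last_restore_test: Optional[date] = None


def check_policy(copies: list[Copy], today: date, max_test_age_days: int = 30) -> list[str]:
    """Return a list of policy problems; an empty list means the policy is met."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than two media types")
    if len({c.location for c in copies}) < 2:
        problems.append("fewer than two locations")
    # At least one offsite copy must have a restore test inside the window.
    cutoff = today - timedelta(days=max_test_age_days)
    tested = [
        c for c in copies
        if c.offsite and c.last_restore_test is not None and c.last_restore_test >= cutoff
    ]
    if not tested:
        problems.append(f"no offsite copy restore-tested in the last {max_test_age_days} days")
    return problems
```

The point of writing it down this way is that the check is about process, not hardware: nothing in it cares how many hot spares the array has.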
In my world, due to the size of the archives, it's all proprietary software... So, consider learning large enterprise IT systems/software... Operating Systems, Storage Management, Tape Library management software, database engines, etc. I realize this is all moving to the cloud now, so regardless of which software/service stack you use, understand how all the pieces fit together, and become proficient at each of them, so you can be useful regardless of where the problem is. :)
Do you see a movement for more works in the public domain? Many museums and NASA are sharing much of their collections, but media at large does not have reliable repositories for many important works, even niche ones.
Again, that's a little outside my realm of expertise, because I'm archiving digital records for companies and organizations -- so there's no question about ownership or copyright, it's about legal and regulatory compliance (how long you're allowed to keep a document, etc.).
Personally speaking, I profoundly dislike the idea that works are protected for longer than a human lifetime. It's hard for society and technology to progress if ideas are locked up by copyright, patents, trademarks, and other intellectual property laws. There are patents that have been registered for which no product has been created for decades -- preventing someone from trying to make an idea a tangible thing, and that's dumb.
If you make a hit song, and you profit from 10 or 20 years of popularity, shouldn't that be enough? Shouldn't we encourage people to do more than coast for 20 years on one idea? Shouldn't the public benefit from that work falling into the public domain? Anyway, we're way off topic here. :)
Most people in the industries I've worked in (mainly SMB, MSP... sysadmin roles) seem to think that tape is an archaic method of doing backups, and anyone using tape is living in the past.
Additionally, for archival/backup software, what's the go-to for you? Both paid and FOSS, if you have options for both, I'd like to hear it. What makes it the go-to software?
Tape is awesome. Relatively inexpensive at scale, huge storage volumes, consumes almost no power compared to what it stores. But it has its time and place. That place is archival and long-term offsite backups that are very infrequently accessed. People aren't using it for what it's best at doing.
The backup/archive software I use for work is enterprise grade - Tivoli Storage Manager a.k.a. Spectrum Protect. In my office, I use Time Machine on the Macs, and simply 'tar' on Linux to back up specific important directories. Windows machines are backed up by their owners with various tools that I don't tend to concern myself with.
For the enterprise stuff, what makes it great is that it gives you a huge amount of control and flexibility and storage options. I love the idea of TSM/SP's 'incremental forever' backup methodology. It means you can roll back to any backup at any point in time, as long as you're storing enough historical versions of the files. The device support is also amazing, and I've built systems that can scale to be petabytes large with it.
For my office, I just use what I know is built in and reliable. I know every Linux system has tar, and every Mac has Time Machine. For my NAS device, I make copies of it with rsync to a USB-SATA enclosure with 5 drives, usually every 90 days or so, sooner if I've made a lot of changes.
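For anyone who wants the "tar up the important directories" approach in script form, here is a minimal sketch using Python's standard `tarfile` module. It is an illustration only, not the actual commands I run: the function names, the `backup-<timestamp>.tar.gz` naming, and the gzip compression are all assumptions made for the example.

```python
import tarfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_directories(directories: list[Path], destination: Path,
                       stamp: Optional[str] = None) -> Path:
    """Write the given directories into one timestamped .tar.gz archive."""
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    destination.mkdir(parents=True, exist_ok=True)
    archive_path = destination / f"backup-{stamp}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for directory in directories:
            # Store each tree under its own folder name rather than its full
            # path. Two directories with the same name would collide here.
            archive.add(str(directory), arcname=directory.name)
    return archive_path


def list_archive(archive_path: Path) -> list[str]:
    """Read the archive back and list the files it contains."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Reading the archive back with `list_archive` right after writing it is a cheap sanity check, but it is not a substitute for an actual restore test.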
In your opinion, what would be the best archival format for storing family photos and videos? Not something that relies on a ZFS server running for 20-plus years, but a "hard" copy like Blu-ray M-DISC, etc.