Thursday, November 13, 2014

Website Preservation

What was the last thing you googled? Was it interesting or helpful? Do you think it was important?
What about the last website you went to? Was it something for a class or was it social media? Be honest, it was Facebook, wasn't it?

When was the last time you typed in a URL and got an error message instead of the page you wanted?

We live in a time when our history isn't recorded on scrolls or bound in books. Websites are where our dramas play out; they are where we get our news 24/7 and what we turn to when we have questions. Webpages come and go in the blink of an eye. They are updated, redesigned, or taken down with ease. How often do you find yourself mourning the loss of a favorite blog or cursing the latest Facebook update? Is your answer "more often than I'll admit in public"?

In short: the internet is crazy important, and we all know it.

Webpages will be key sources for the historians of the future; they will be their reference material. Archivists agree that these pages need to be preserved. However, digital preservation is still at a stage where it raises more questions than answers, and believe me, these aren't questions you can Google.

The first major question to ask about website preservation is: what deserves to be preserved?

There isn't a universal answer. Some say major websites. Others want a sampling of culturally relevant material regardless of popularity. It's hard to know what will be important in five years, or ten, or fifty. It's also hard to say who should be in charge of making these decisions. The Library of Congress, the British Library, and the National Archives are working on these issues now (Roland & Bawden, 2012). But these major institutions should not be alone. Content creators need to start looking toward the future as well if historians are to have a full picture of our generation.
This isn't to say preservation institutions are sitting on their hands. Did you know the Library of Congress has your (okay, everyone's) tweets on record?

Once a website is deemed worthy of preservation, the next question is: who owns the content?

Digital data is tricky. You've already read about copyright and licensing, but websites create additional issues. Who owns what? Is it the person who put up the site? Are there multiple authors? Do the authors even have rights to their own work? Finding the answers to these questions can be time-consuming, and actually obtaining permission to preserve these pages can be difficult, at times preventing preservation altogether (Kastellec, 2012).

Once something has been preserved, the next question is: who can access the content, and how?
Remember that we are talking about preserving something that needs to remain usable. Look at Facebook again. How many layout changes have you personally seen? A lot, right? The site is constantly changing! So how do we make sure those changes stay visible in the archive and that each preserved version is still easy to navigate?

And lastly, how do we ensure that our preservation efforts can stand the test of time?
Digital data is created at a rapid pace, and our means of storing it are ever-changing. Our storage devices, like hard drives and optical discs, are not great for long-term use. Data can deteriorate, software bugs can affect files, and CDs can be scratched.

A lot of questions, right?

Websites are being preserved and have been for a while. They will be a major part of the future of digital preservation. There are still many issues to work out before we have a more universal system that ensures important data doesn't disappear into the ether, but at least these are issues archivists and other preservation professionals are looking at.

As budding professionals, do you have any answers?
What would you like to see preserved? Whose responsibility is it to archive the internet? What other problems are likely to come up as the internet continues to grow?


Kastellec, M. (2012). Practical limits to the scope of digital preservation. Information Technology and Libraries, 31(2), 63-71.

Kavcic-Colic, A. (2003). Archiving the Web–some legal aspects. Library Review, 52(5), 203-208.

Roland, L., & Bawden, D. (2012). The Future of History: Investigating the Preservation of Information in the Digital Age. Library & Information History, 28(3), 220-236. doi:10.1179/1758348912Z.00000000017


  1. Have you ever used the Wayback Machine from the Internet Archive?
    I've never tried using it for research purposes, but it is fun to play around with. It doesn't always work, but it is worth trying.

    1. I use the Wayback Machine all the time in my job as a prospect researcher. It is the most comprehensive website I have found for web archiving. Whenever I come across an expired page, I enter the link into the website, and I would estimate it comes back with results about 50% of the time. I also use Google's cache with varying degrees of success. Most of the pages I am looking at are old bios from previous employers or board membership pages.

  2. I went to the site, Claire, and had a little trouble navigating, but it seems interesting! It actually goes along with a question I had when reading through this post. Is it important to archive the actual look of websites? I think there is a valid argument for archiving content, regardless of how mundane it may seem at the time (à la Twitter posts), but as you point out in this post, Sarah, Facebook has changed its layout a number of times, so much so that I don't even know what it looked like when I joined 10 years ago! I'm sure Facebook has saved its past layouts, but is there any purpose for this? Or is it more of a novelty archive?

    Also, is it theoretically possible to archive ALL of the internet's content? It's being added to and changed at such an alarming rate that I would think it's just not possible. Did you come across anybody trying to take this on during your research? Just me being curious. But to tie it into one of your questions, I would put the responsibility for archiving on the host sites. Facebook ought to feel a responsibility to archive its content, Google should archive Blogger content, ESPN (or its parent company, Disney) should archive the content on its site, etc. Then, if some other party is interested in archiving content (for instance, the Library of Congress archiving Twitter feeds), that would be a cooperation between the two parties.