The Missing Link

Joanne Leong · Fri, Dec 15 2017 in Articles
“Nothing stays the same, it all gets crushed. It all gets broken. It all passes with time. Only the moment you're in has any meaning.”
- French novelist Guillaume Musso, Que serais-je sans toi? (Where Would I Be Without You?)

Remember that article or web page you bookmarked for future reference? What are the chances that it will no longer be there the next time you want to access it?

From Wikipedia:

In a 2003 experiment, Fetterly et al. [1] discovered that about one link out of every 200 disappeared each week from the Internet. McCown et al. (2005) discovered that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003, Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year. In 2014, bookmarking site Pinboard's owner Maciej Cegłowski reported a “pretty steady rate” of 5% link rot per year.
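To get a feel for what a weekly loss rate means over a longer period, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that the 1-in-200 weekly rate reported by Fetterly et al. stayed constant for a full year (52 weeks); the symbol S is introduced here and is not taken from the study:

```latex
S_{\text{year}} = \left(1 - \tfrac{1}{200}\right)^{52} = 0.995^{52} \approx 0.77
```

Under that assumption, roughly 23% of links would vanish within a single year, a far steeper rate than the 5% per year Pinboard reported for its bookmarks.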

Recognizing the Threat

Unknown to many, myself included until recently, missing links, or “link rot” [2], are a real and daunting threat to the invaluable information we access on the World Wide Web every day. For instance, random web pages were found to have a half-life of about 2 years (Koehler, 2002) [3], whilst links in scientific literature had a median lifespan of 9.3 years (Hennessey & Ge, 2013) [4]. Even academic resources stored in digital libraries (DLs), which are expected to persist longer than the average web page, are not spared from this decay: Nelson and Allen (2002) [5] found that within one year, 3% of DL objects were no longer available.
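A half-life translates directly into the fraction of pages still reachable after a given time. As a simplified sketch, assume decay proceeds at a constant rate, let t be the time in years, and let N_0 be the number of pages at t = 0 (these symbols are introduced here for illustration). Koehler's half-life of about 2 years then gives:

```latex
\frac{N(t)}{N_0} = \left(\tfrac{1}{2}\right)^{t/2}, \qquad
\frac{N(4)}{N_0} = \tfrac{1}{4}, \qquad
\frac{N(10)}{N_0} = \tfrac{1}{32} \approx 3\%
```

In other words, under this simplified model only about one page in 32 would still be around a decade later.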

In the face of link rot, bookmarking alone is inadequate and may eventually prove futile. (A bookmark saves only the URL of a page; it assumes the resource will remain there indefinitely, which, as we have seen, is often not the case.) How can we preserve our much-loved web pages and spare ourselves the disappointment of a “Page Not Found” or “This site can’t be reached”?

Discovering the Solution

Web archiving is one way in which information on the World Wide Web is collected and safeguarded from decaying links for future use. Many institutional web archiving initiatives [6] crawl the web continuously, voraciously archiving web pages before they disappear forever. One of the most widely known and publicly available web archives is the Wayback Machine, which lets anyone retrieve and revisit the history of the web. However, the Wayback Machine is the exception rather than the rule; many institutional archiving initiatives offer limited public access and give individuals no say over which web pages are saved. Also, when I tried retrieving pages from the Wayback Machine myself, I found it awfully slow and impractical for quick lookups.

Hence, I believe an ideal web archive would give users free rein over what digitised information is saved into a personal scrapbook of web articles, research papers, blog entries, social media posts, and other web material. And, as my experience with the Wayback Machine suggests, retrieving these saved pages also needs to be quick and effortless.

Bridging the Gap

PageDash was created with these concerns in mind, to offer a platform for personal digital safekeeping amid the ongoing problem of link rot. Besides letting users save copies of web pages for permanent access with a single click of a browser extension, PageDash enables users to organize a potentially cluttered scrapbook of web content into personalized folders (perfect for neat freaks like myself). Moreover, full-text search across all pages stored on PageDash makes retrieval fast and convenient. (Editor's note: Folders are still in development.)

Link rot is real, and it is a threat. As Internet users, we can do something about disappearing links and web pages, and keeping a personal, digitised web scrapbook is one way to go about it.

References

  1. Fetterly, D., Manasse, M., Najork, M., & Wiener, J. L. (2003). A large-scale study of the evolution of Web pages. Software: Practice and Experience, 34, 213–237. http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html
  2. Wikipedia. Link rot. https://en.wikipedia.org/wiki/Link_rot
  3. Koehler, W. (2002). Web page change and persistence: A four-year longitudinal study. Journal of the American Society for Information Science and Technology, 53(2), 162–171. http://onlinelibrary.wiley.com/doi/10.1002/asi.10018/abstract
  4. Hennessey, J. & Ge, S. X. (2013). A cross disciplinary study of link decay and the effectiveness of mitigation techniques. BMC Bioinformatics, 14(Suppl 14), S5. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S14-S5
  5. Nelson, M. & Allen, B. (2002). Object persistence and availability in digital libraries. D-Lib Magazine, 8(1). http://www.dlib.org/dlib/january02/nelson/01nelson.html
  6. Wikipedia. List of Web archiving initiatives. https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives
Joanne is a Masters in Clinical Psychology student who believes in the empowerment of individuals, and claims to have a greater affinity for people than technology. Her favourite days are rainy days spent indoors with a psychological thriller and a cuppa Earl Grey tea as companions. When she’s not busy trying to figure people out, she can be found scrolling through pictures of food and dogs, or any living thing with fur for that matter.