From the Desk of Kevin Tracy

2023-11-26

Should Archive Restoration Take A Higher Priority Than New Content?

The Jedi Archives from Star Wars: Episode II

Background of our Archive Dilemma

In 2021, I decided to ditch WordPress and, in the process, free myself from the creative chains and slow load times that come with the massive databases and unnecessary code in a bloated CMS. I opted to return to my roots and hand-code my website from scratch using HTML, CSS, and PHP. As I had hoped, hand coding my own web design has been enormously fun and extremely rewarding. The web design isn't complete, but I hope it never will be. I want to continue adding new features to the site for as long as I have the ability to create them. Early on, I knew I wanted to bring back as much of my archived content as I had available. At the time, I decided that I would slowly work on restoring the archives while simultaneously creating new content for this website. However, over the past month or so, I have begun to question the wisdom of taking that approach.

In fact, I have begun to suspect that I would be better off spending a significantly larger portion of my time restoring my archives rather than focusing on new content. Here's why:

Pros of Prioritizing Archives

Better Internal Linking

It's well known that links to your page are extremely important for SEO. While external links (links from other websites to your page) carry the most weight, they have the unfortunate disadvantage of being largely out of your control. Of lesser importance are internal links (links from one of your pages to another one of your pages). While these carry less weight, they still have some significance for SEO purposes, and they are 100% within your control! Search engines don't want to see a ton of dead-end pages on your website. They want to see a vibrant community of content linked both to your own web pages and to other credible web pages.

When you begin creating content on a new page, there's a lack of other pages to link to. Likewise, there is also a lack of other pages that can be used to link to your new content. If you're not churning out several pages a day, it can take a long time to generate enough content to create the vibrant cross linking between your pages that search engines want to see.

Having a huge pool of archived content at your disposal can mitigate this problem to a significant degree.

For example, when I wrote about the global proliferation of paid progressive protesters and the lack of conservative counterparts in 2022, and about the progressive protesters being paid to support terrorist organizations in 2023, I mentioned a personal story from my time with the Tea Party movement (before it went off the rails), when the SEIU paid and bused in counter-protesters to a 2009 event who didn't even know why they were there. It would have been really awesome if I had a photo gallery or video from that event already in my archived content that I could link to. Then, I could link to the new content from the archived page to give the new articles a boost the next time the archived content is crawled.

Unfortunately, now when I do eventually upload those archives, I'm going to be doing it in bulk and I won't take the time to consider all of the new content I've created that could be linked to on every page.

In the long run, the internal links would be much stronger if I just worried about the archives at first and then focused on new content.
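One way to see where internal links are missing is to count how many other pages on the site link to each page. The following is only a minimal sketch of that idea, not how this site is actually built: the function name `inbound_link_counts`, the `pages` dictionary (URL path mapped to raw HTML), and the `https://example.com` base address are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inbound_link_counts(pages, base="https://example.com"):
    """Count how many *other* pages on the same site link to each page.

    `pages` maps a URL path (e.g. "/about.html") to that page's HTML.
    `base` is a placeholder site address, not a real one.
    """
    counts = {path: 0 for path in pages}
    host = urlparse(base).netloc
    for path, markup in pages.items():
        parser = LinkCollector()
        parser.feed(markup)
        # Resolve relative links first so each target is counted once per page.
        targets = set()
        for href in parser.hrefs:
            resolved = urlparse(urljoin(base + path, href))
            if resolved.netloc == host:
                targets.add(resolved.path)
        for target in targets:
            if target in counts and target != path:
                counts[target] += 1
    return counts
```

Pages that come back with a count of zero are the ones nothing else on the site points to, which makes them the first candidates for a link from a related archive page or a newer article.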

Crawlers Find It Hard To Differentiate Between New and Old Content

If you mix new and old content frequently, as I have been doing, search engines have to figure out which content is new and worthy of indexing and what content is old and unworthy of indexing.

Because my website is composed mostly of "blog" pages, the major search engine doesn't like to index my content like it used to in the golden era of the internet. However, if that search engine has to decide between indexing a blog article I wrote this afternoon and a blog article I wrote 14 years ago, it's going to prefer the recent content, as it's more likely to be relevant to a web search being conducted today.

However, due to my approach of creating new content and a ton of archive pages at the same time, my website is in the unfortunate position of sharing one new article for every five archived pieces that I post. Nobody should expect a search engine to figure out which post was actually written recently out of the dozens that are just re-postings of old content from a decade ago.

It Would Be Nice Having the Archives Completed Already

Since I moved, I've spent about two or three hours a night pulling data from my massive 2009-2010 WordPress database and formatting it with HTML and inserting it into an archived page template.
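Much of that nightly routine is mechanical, so part of it could in principle be scripted. The sketch below is only an illustration in Python, not the process actually used for this site. It assumes each post has already been exported as a dictionary keyed by the standard WordPress `wp_posts` column names (`post_title`, `post_date`, `post_content`), that `post_date` is a string such as "2009-04-15 18:30:00", and that the content is trusted HTML with paragraphs separated by blank lines. The `PAGE` template and the `render_archive_page` function are hypothetical.

```python
import html
from string import Template

# Hypothetical archive page template; a real template would carry the
# site's full header, sidebar, and footer markup.
PAGE = Template("""<article class="archive">
  <h1>$title</h1>
  <time datetime="$date">$date</time>
$body
</article>
""")


def render_archive_page(row):
    """Turn one exported WordPress post row into an archive page fragment."""
    # WordPress stores paragraphs as blank-line-separated text and adds the
    # <p> tags at display time, so a static page has to add them itself.
    text = row["post_content"].replace("\r\n", "\n")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"  <p>{p}</p>" for p in paragraphs)
    return PAGE.substitute(
        title=html.escape(row["post_title"]),  # titles are plain text
        date=row["post_date"][:10],            # keep only YYYY-MM-DD
        body=body,
    )
```

Anything the old CMS generated on the fly, such as shortcodes, galleries, or embedded media, would still need manual attention, so a script like this only takes the edge off the repetition.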

That's time that could have been spent upgrading my left sidebar, creating new content, unpacking my office, or getting back into video creation and streaming. If I had focused on the archives immediately upon deciding to add the archives to this version of the site, the archives would likely be done (at least until I find some new archived content on an old hard drive or something).

Of course, there's a lot of content that wouldn't have been created in 2021 and early 2022 if I took that approach. Yet, due in part to the internal linking dilemma mentioned above, that time spent creating content largely went to waste as far as the search engines were concerned.

RSS and Feed Reader Compatibility

I want to create an RSS feed for KTracy.com and have for a while. The reason I haven't done it is because I'm fairly certain the RSS feed would capture my archived content whenever I added it. I do sincerely appreciate my readers' hardcore fandom of my website and I will always bend over backwards for you, so adding an RSS feed to my static, hand coded site is something I really want to do, but I know nobody will want to subscribe if almost all of what they see every morning are archived news stories from a decade or two earlier.

Due to these complexities, I won't even attempt this until the archives are done. In the meantime, the new content I'm taking the time to create is going to struggle unnecessarily to be seen.
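One possible way around the problem would be to build the feed from each post's original writing date rather than the date it was uploaded, and to leave out anything older than a cutoff. This is a rough Python sketch under those assumptions, not a plan for how the feed will be built. The `build_feed` function, the `written` field, the 30-day window, and the example.com addresses are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_feed(posts, now, max_age_days=30):
    """Return RSS 2.0 XML containing only posts written within max_age_days.

    Each post is a dict with "title", "url", and "written" (a timezone-aware
    datetime for when the post was originally written, not when it was uploaded).
    """
    cutoff = now - timedelta(days=max_age_days)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel details; a real feed would use the site's own.
    ET.SubElement(channel, "title").text = "Example Site"
    ET.SubElement(channel, "link").text = "https://example.com/"
    ET.SubElement(channel, "description").text = "New posts only"

    # Restored archive posts carry their original date, so they fall
    # outside the window and never reach subscribers.
    recent = [p for p in posts if p["written"] >= cutoff]
    for post in sorted(recent, key=lambda p: p["written"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
        ET.SubElement(item, "guid").text = post["url"]
        ET.SubElement(item, "pubDate").text = format_datetime(post["written"])
    return ET.tostring(rss, encoding="unicode")
```

With a filter like this, a post written in 2009 and uploaded today would be skipped, while an article written this week would appear in the feed as usual.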

Pros of Prioritizing New Content

Archive Restoration is Boring

I don't want to call archive restoration brainless work, but the biggest challenge in restoring your own archives is doing the repetitive task of converting one document to the target document's format time and time again without falling asleep or learning to hate your life for hours on end.

Creating websites should be fun. If you're doing the mundane day in and day out, there's a very real chance you're going to lose interest in the site before your archives are restored and you're ready to create real, modern content.

New Content is King

In one or two months' time, you're likely to be better off with a handful of pages created in the past two months than a mountain of pages created 10 years ago.

Having that mountain of archived content is fantastic and a solid foundation to build off of, but your website will be better off if you focus on creating that relevant modern content.

Having a Plan for Content Restoration

When I started archiving my old pages, I had an Archives Page and News page for the modern updates. In the moment, that made sense. Archived data could be stored one way and new stuff would be kept in more relevant places. However, after a year and a half, the problem became obvious.

What's the difference between a post that was written two years ago and a post that was written five or ten years ago? Honestly, the archives and news should be displayed, stored, and handled in the same way. Otherwise, at some point, the line between the two will seem extremely arbitrary. That's why KTracy.com announced that we were combining the archives and news back in December of 2022.

Had I focused on new content only for two or three years, I would have recognized that and saved myself a lot of trouble merging the two partway through the process and creating a huge mess of my .htaccess file.

In Hindsight, Would I Have Still Waited to Restore my Archives?

Regardless of the benefits of waiting, I still feel like I made a mistake when I chose to restore my archives slowly as I created new content.

Realistically, I don't see any way I could have avoided the pitfalls of creating content while restoring SOME of my archives. Practically speaking, on my computer are stored archives for 1997, 2001 through 2005, 2009, 2010, and 2013. Somewhere on a scratched up DVD-RW or a massive 50GB external USB-2 hard drive, there's a chance I have archived data for 2014-2017 and maybe even 2018-2019 on my old computer sitting under my desk. I can't be sure whether these other archives even exist, but if they do, they're not likely to turn up for a while. Even if I restored these known archives first, I would have likely ended up restoring the mystery archives while creating new content.

The difference is that the restoration of those archives (the ones that may or may not exist) would then have been something of an event. However, now that I've been slowly restoring archives for the past two years, these remaining archives that may or may not exist bring about some feelings of dread that the finish line is going to be moved at the last minute and I'll have another year or two left of double duty doing new content and archive restoration.

Truthfully, I'm tired of having the archives hanging over my head. I'm slowing down my normal content creation for the time being to try and push through the remaining 2010 archives as quickly as possible while simultaneously kicking myself for not doing this over a few months right after creating this website by hand.