From the Desk of Kevin Tracy

2023-09-18

.htaccess, Sitemap, Robots.txt, and WebP - Oh My!

You may have noticed that we haven't been sharing much about our website development in recent months. That's because almost all of that development has been happening on the backend, where ordinary users shouldn't see what's going on. Just to make sure you're caught up, here's what we've discussed publicly this year.

The Updates Thus Far

In February, I was complaining about Multcloud. I'm still somewhat reliant on Multcloud for the Recent Images, which is why the thing never seems to update. From what I can tell, Multcloud tasks max out after they've been run 125 times. IFTTT integration is also proving a little too unreliable for what I'm doing. Once Katie and I move out of the KTracy Tower to the much larger KTracy Estate, I'll be setting up a Raspberry Pi with the exclusive purpose of automating processes for KTracy.com.

In June, there was a brief update about investigating a static site generator (SSG) or CMS to make content creation a little easier. Just to fill you in, I've largely given up on finding a CMS or SSG that will do exactly what I need it to. I might be able to create something myself two or three years from now, which may be my best chance to get this done. (I also announced I was changing my tone a little bit, which I have.)

In early August, we eliminated our store and brought back Google Ads as a way to boost revenue a bit. Currently, there's no way for me to run the store from our shared office space. That said, when Katie and I move to the KTracy Estate, we'll likely re-open the store once things are ready. We're probably going to let the store be hosted on Square, since I use their payment processor when I'm doing comic conventions. That liberates us entirely from the need for WordPress and will free up server resources for continued growth.

What's Been Done

Focus on SEO

In the last few months, I've been spending a lot of time looking at the Google Search Console. While I refuse to sell my soul to Google or any third-party entity when it comes to the creation of content on KTracy.com, I would be foolish to think they don't offer valuable insights. Anyway, there were some alarming statistics.

For example, there are 431 pages on KTracy.com according to our most recently generated sitemap. According to the Google Search Console, 497 have been crawled. Of those, only 24 are indexed. A huge part of the problem is that when we transitioned the regular updates from the /press/ directory to /archives/, and later from /archives/ to /news/, we royally screwed up our SEO. Likewise, there are still links out on the web to old WordPress posts I wrote many, many years ago that are (or will one day be) restored to KTracy.com, but have new URLs. Most of this older content isn't indexed in Google anymore, but some of it still is, and it's still generating some traffic to KTracy.com. The trouble is that those old links don't take visitors directly to the content that's been restored. The bottom line is that I've been REALLY lazy when it comes to managing my redirects, and I've been paying a price for it for a long time. The last few months have been spent largely trying to solve these kinds of issues.
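To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the three figures quoted above; the variable names are made up for this sketch.

```python
# Figures quoted in the paragraph above.
pages_in_sitemap = 431
pages_crawled = 497
pages_indexed = 24


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of crawled URLs that actually made it into the index.
indexed_of_crawled = share(pages_indexed, pages_crawled)

# Google has crawled more URLs than the sitemap lists, so at least this
# many crawled URLs are not in the current sitemap (old or stray URLs).
crawled_beyond_sitemap = pages_crawled - pages_in_sitemap

print(f"Indexed: {indexed_of_crawled}% of crawled URLs")
print(f"Crawled URLs beyond the sitemap count: {crawled_beyond_sitemap}")
```

In other words, fewer than one in twenty crawled URLs is indexed, and Google knows about at least 66 URLs that the sitemap doesn't list.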

.htaccess File Changes

Over the years (over a decade), I've updated my .htaccess file by adding new code to it (just not redirects). Several WordPress installations later, it had become a complete mess. In addition to lacking the redirects necessary to make the site work, it was also creating a soft 404 error, which I learned Google really doesn't like. I tried to fix this about a year ago, but gave up. That problem has now been fixed. Likewise, rules have been established to force URLs with the /press/ and /archives/ directories to correct themselves. I'll likely share a post about this later, because I couldn't find a tutorial to do exactly what I wanted, but I figured it out on my own. There was also a ton of useless garbage in there that no longer applied.
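The actual .htaccess rules aren't shown here, but the idea behind "correcting" an old URL can be sketched in a few lines of Python. This is only an illustration of the mapping, and it assumes the rest of the path stayed the same when the directory was renamed, which may not hold for every page. The function name is hypothetical.

```python
import re

# Old section directories that were renamed over time; /news/ is the current one.
OLD_SECTION = re.compile(r"^/(?:press|archives)/")


def corrected_path(path):
    """Return the /news/ equivalent of an old /press/ or /archives/ path.

    Assumption for this sketch: everything after the directory name is
    unchanged. Paths that don't start with an old directory are returned
    as-is.
    """
    return OLD_SECTION.sub("/news/", path, count=1)


print(corrected_path("/press/2021/example.html"))
print(corrected_path("/archives/2022/example.html"))
print(corrected_path("/news/2023/example.html"))
```

Only a leading /press/ or /archives/ is rewritten, so a path that merely contains one of those words further down is left alone.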

After cleaning out the garbage, I did get started on the redirects from those old WordPress URLs, but that work isn't complete. There are thousands (perhaps tens of thousands) of old posts that need redirects, and I'm not going to add them all to the .htaccess file at once. Instead, I'm going to take a reactive approach. Every month, I plan to look at my AWStats traffic statistics. When I see old links that no longer work, I'll add them to an Excel spreadsheet. When those old links point to content that has already been archived on KTracy.com, I'll add a redirect to the .htaccess file. If not, I'll wait until those old posts are added to the News Archives and add their redirects later.

At some point, I plan on just writing a little tool that will combine everything and create the code I need. Until then, this process works well enough and will create about a dozen or so redirects every month for a while.
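That little tool doesn't exist yet, but a minimal sketch of what it might look like is below. It assumes the spreadsheet is exported to CSV with two hypothetical columns, old_path and new_url, and it emits Apache mod_alias Redirect lines. Rows without a new URL (posts that haven't been restored yet) are skipped, and so are duplicates. The sample paths are invented.

```python
import csv
import io


def build_redirects(csv_text):
    """Turn 'old_path,new_url' CSV rows into 'Redirect 301' lines.

    Assumptions for this sketch: the column names are old_path and new_url,
    and paths contain no spaces (those would need quoting in .htaccess).
    """
    lines = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        old = (row.get("old_path") or "").strip()
        new = (row.get("new_url") or "").strip()
        # Skip incomplete rows (post not restored yet) and repeated old paths.
        if not old or not new or old in seen:
            continue
        seen.add(old)
        lines.append(f"Redirect 301 {old} {new}")
    return lines


# Invented sample data: one restored post, one not yet restored, one duplicate.
SAMPLE = """old_path,new_url
/2011/05/old-post/,/news/2011/old-post.html
/2012/01/not-restored-yet/,
/2011/05/old-post/,/news/2011/old-post.html
"""

for line in build_redirects(SAMPLE):
    print(line)
```

The output can be pasted into the .htaccess file by hand, which keeps a human in the loop for the dozen or so new redirects each month.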

The soft 404 error was a problem that drove me crazy, until I realized it was hidden in some WordPress code that I thought was part of my redirect to https.

Sitemap is Back

So in September 2022, I added Sitemaps to KTracy.com. Then in October 2022, I got rid of Sitemaps because they were a pain in the butt to create and, when you think about it, the News page is practically a sitemap because every page that gets created is linked there. Well, it turns out that search engines really like sitemaps. So, in early August, I downloaded and began using Vladislav Hristov's Sitemap Generator.
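For anyone curious what a generated sitemap contains, here is a small Python sketch that lists the URLs in one and counts them. It assumes the generator writes the standard sitemaps.org XML format; the sample document and its example.com URLs are placeholders, not the real KTracy.com sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Placeholder sitemap with three entries.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/news/</loc></url>
  <url><loc>https://www.example.com/news/2023/example-post.html</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE_SITEMAP)
print(len(urls), "pages listed")
```

A count like this is a handy number to compare against what the Search Console reports as crawled.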

The problem was that Google and a dozen other search engines from around the world were still looking for old sitemaps that hadn't existed in a long, long time. I used that .htaccess file to redirect the old sitemaps to the new one, and created a ...

robots.txt file.

Basically, there were some directories being crawled that didn't need to be, and this was hurting my SEO ranking and using up system resources I would rather keep for my website's continued growth.
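A robots.txt file is easy to get subtly wrong, so it can be worth checking one before uploading it. The sketch below uses Python's built-in urllib.robotparser to test a few URLs against a sample file. The directory names (/cgi-bin/ and /drafts/) and the example.com URLs are hypothetical stand-ins, not the real rules on KTracy.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/
"""


def make_parser(robots_text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

for url in (
    "https://www.example.com/news/",
    "https://www.example.com/drafts/unfinished.html",
    "https://www.example.com/cgi-bin/script.cgi",
):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

Keep in mind that robots.txt is a request to well-behaved crawlers, not access control, so it shouldn't be the only thing protecting anything private.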

The impact to you

should be pretty minimal or non-existent for most normal visitors. If anything, you're less likely to get 404'd, and if you do, you're going to be directed to a 404 page instead of the KTracy.com home page.

This work behind the scenes wrapped up at just the right time. Katie and I are planning on moving in the next few weeks, and having lingering projects go uncompleted before a weeks-long move was bound to make the entire process more confusing and take considerably longer.

*.webp

There is one other change slowly rolling out to KTracy.com that will have a more noticeable effect for you, the reader. I'm in the process of replacing our *.jpg and *.png images with the more compressed *.webp images. Google and a couple of other resources I use to measure page performance suggest that this should cut load time on most pages by 0.6 to 1.1 seconds. I'm actually pretty excited about that, as slow page load speeds were one of the biggest annoyances that led me to drop WordPress ("page load times" was literally in the first paragraph I wrote with the newly designed site), and I've discovered a way to make even further progress when I thought I was already optimized.

So far, I've been pretty shocked by the file size difference between webp and jpeg. I can't wait to see how everything races to load when we get rid of jpeg and png from the theme.
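Converting the image files is only half the job, since the pages still have to point at the new files. As a rough sketch, the Python below rewrites .jpg, .jpeg, and .png references in double-quoted src attributes to .webp. It assumes each converted image keeps the same base name, it does not do the image conversion itself, and it leaves URLs with query strings untouched. The function name and sample HTML are made up for illustration.

```python
import re

# Matches src="...something.jpg|jpeg|png" (case-insensitive, double quotes only).
IMG_SRC = re.compile(r'(src=")([^"]+)\.(?:jpe?g|png)(")', re.IGNORECASE)


def use_webp(html):
    """Point double-quoted image src attributes at same-named .webp files."""
    return IMG_SRC.sub(r"\1\2.webp\3", html)


SAMPLE_HTML = (
    '<img src="/images/logo.png" alt="Logo">'
    '<img src="/images/photo.v2.JPG" alt="Photo">'
    '<script src="/js/app.js"></script>'
)

print(use_webp(SAMPLE_HTML))
```

Running something like this on a copy of the theme first makes it easy to diff the result and confirm that only image references changed.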