SEO For Engineers

I like to joke that SEO stands for "somebody else's obligation" because it's easy to point a finger when something goes wrong. Engineers know this pain. Lots of fingers get pointed at them, sometimes by "SEO people." But the reality is this: there is no such thing as search engine optimization unless a website's technical ducks are in a row.

Engineers have a responsibility to understand their role in SEO, and likewise, those who work with engineers have a responsibility to partner with them, not scapegoat them. That kind of relationship requires information to be shared openly and honestly.

I hope this article highlights important yet seldom-mentioned topics that are worth discussing not only within engineering circles, but also among the SEO professionals who rely on engineering teams.


Server Stability and Downtime

  • The "503 Service Unavailable" HTTP response code is the best way to handle planned or unexpected downtime. It has a minimal impact on search rankings compared to other 5XX responses.

  • Repeated 500, 502 and 504 HTTP response codes cause Google to suppress a webpage in the rankings or de-index it altogether. As a rough rule of thumb, each time a person or search crawler receives one of these response codes, expect to lose 5-10 organic visits over time.

  • Communicate quickly and send updates to stakeholders when something goes wrong. Otherwise, people will come looking for answers and get in the way of finding a solution.

  • Create a custom error page ("skin") and a tracking event for each harmful response code (e.g. 4XX and 5XX errors). Problems are easier to diagnose when a layperson can provide some details.

  • Be careful when calculating error rates. A complex web page might call the server 150 times as it loads to completion, which means raw log files will understate the frequency of harmful response codes that happen up front. Imagine a web page that is loaded twice: the first load responds with a "200 OK" status and fetches everything else on the page, while the second load responds with a "502 Bad Gateway" status and nothing else can load. The server was called 151 times in total and only one of those calls was a 502, yet the error rate for users is 50%, not the 0.7% the raw logs suggest. (A minimal calculation is sketched after this list.)

  • Resist the temptation to dismiss anecdotal evidence. Many bugs that are dismissed as "can't reproduce" are symptomatic of a bigger issue.
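
To make that error-rate arithmetic concrete, here is a minimal sketch in Python. The log format (one line per server call: a page-view identifier followed by a status code) is an assumption for illustration; real logs will need their own parsing.

      from collections import defaultdict

      def error_rates(log_lines):
          """Contrast the per-request error rate with the per-page-view error rate.

          Each log line is assumed to look like "<pageview_id> <status_code>".
          A page view counts as failed if any of its requests returned a 5XX.
          """
          total_requests = 0
          error_requests = 0
          failed_views = defaultdict(bool)   # pageview_id -> saw a 5XX?

          for line in log_lines:
              pageview_id, status = line.split()
              is_error = 500 <= int(status) <= 599
              total_requests += 1
              error_requests += is_error
              failed_views[pageview_id] = failed_views[pageview_id] or is_error

          per_request = error_requests / total_requests
          per_view = sum(failed_views.values()) / len(failed_views)
          return per_request, per_view

      # The example above: the first page view makes 150 successful requests,
      # the second dies immediately on a 502.
      log = ["view-1 200"] * 150 + ["view-2 502"]
      per_request, per_view = error_rates(log)
      print(f"per request: {per_request:.1%}   per page view: {per_view:.1%}")
      # per request: 0.7%   per page view: 50.0%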

Content Delivery Networks and Caching

  • Caching is no substitute for fundamental site optimization. Think of a cached page like a great photo on a dating profile: it's the first thing people see, but once a relationship starts, the other person gets to know the "real you." The same goes for users and search engines.

  • Similarly, AMP-enabled pages are no antidote for a slow mobile site.

  • Be aware of page size restrictions. For example, Akamai has a hard 1MB file size limit that results in a 500 response code when it is exceeded.

  • Merge internal logs with the CDN's logs; otherwise, more than 90% of a problem may go undetected, because most requests are served by the CDN and never reach the origin.

  • Consider using the "304 Not Modified" response code on large websites with lots of pages that don't change very often. (A minimal sketch follows this list.)

  • Look for queries that don't need to be dynamic (e.g. logic that populates listing pages that rarely change). Caching queries and scheduling refreshes can avoid unnecessary strain on the server.
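
As a rough sketch of the 304 idea above, assuming a Flask app and a hypothetical load_listing_page() helper that returns a page's HTML, last-modified time and ETag: compare the conditional request headers against what is stored, and short-circuit with an empty 304 when nothing has changed.

      from email.utils import format_datetime, parsedate_to_datetime
      from flask import Flask, Response, request

      app = Flask(__name__)

      def load_listing_page(slug):
          """Hypothetical lookup returning (html, last_modified, etag).
          last_modified is assumed to be a timezone-aware UTC datetime."""
          ...

      @app.route("/listings/<slug>")
      def listing(slug):
          html, last_modified, etag = load_listing_page(slug)

          # If the client (or CDN) already holds the current version, answer
          # with an empty 304 instead of re-sending the whole page.
          if request.headers.get("If-None-Match") == etag:
              return Response(status=304)
          ims = request.headers.get("If-Modified-Since")
          if ims and parsedate_to_datetime(ims) >= last_modified:
              return Response(status=304)

          resp = Response(html, mimetype="text/html")
          resp.headers["ETag"] = etag
          resp.headers["Last-Modified"] = format_datetime(last_modified, usegmt=True)
          return resp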

Rewrite Rules and Redirection Management

  • When changing URLs, make sure that redirections are verified at launch. This will carry over the maximum amount of the old pages' trust and equity. Letting URLs break and then fixing them later can be suicide: Google decays the value of a broken page over time.

  • Check to see if a rewrite flag or rule is causing a redirect chain. This can easily happen when a site began its life as HTTP and was later migrated to HTTPS. Some URLs ping-pong between secure and non-secure versions until reaching a final destination, and these extra hops decay the equity that the original URL had. (A small hop-tracing script is sketched after this list.)

  • When undoing or reversing a redirection, clear the CDN's cache to avoid a redirect loop.
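
A quick way to find the chains and ping-ponging described above is to follow each legacy URL one hop at a time and count the redirects. Here is a minimal sketch using the Python requests library; the URL list and hop limit are placeholders.

      from urllib.parse import urljoin
      import requests

      def trace_redirects(url, max_hops=10):
          """Follow a URL one redirect at a time and return (chain, final_status)."""
          chain = [url]
          for _ in range(max_hops):
              resp = requests.head(url, allow_redirects=False, timeout=10)
              if resp.status_code not in (301, 302, 303, 307, 308):
                  return chain, resp.status_code
              url = urljoin(url, resp.headers["Location"])   # Location may be relative
              chain.append(url)
          return chain, "gave up (possible loop)"

      # Flag anything that needs more than one hop to reach its destination.
      for legacy_url in ["http://example.com/old-page"]:     # placeholder URL list
          chain, final = trace_redirects(legacy_url)
          if len(chain) > 2:
              print(f"{len(chain) - 1} hops: {' -> '.join(chain)} (final status {final})")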

Bot Blocking

  • Err on the side of innocent until proven guilty. A website's power users are the ones who are most likely to resemble bots, because of "unnatural" browsing speeds or browser-installed plug-ins that may trip a honeypot. It may sound like a fringe case, but a single power user on a community website like Quora can attract 10,000 to 12,000 monthly visits.

  • A bot in Russia isn't automatically bad, and a bot in the United States isn't automatically good. Plenty of bad actors deploy bots from within Amazon's U.S.-based AWS servers. (A quick way to verify genuine Googlebot traffic is sketched after this list.)
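
Before blocking traffic that merely looks automated, verify the crawlers that matter. Google documents a reverse-then-forward DNS check for Googlebot; here is a minimal sketch of that check in Python (the sample IP is only an illustration).

      import socket

      def is_verified_googlebot(ip):
          """Reverse-resolve the IP, check that the hostname belongs to Google,
          then forward-resolve the hostname and confirm it maps back to the IP."""
          try:
              hostname, _, _ = socket.gethostbyaddr(ip)
              if not hostname.endswith((".googlebot.com", ".google.com")):
                  return False
              return ip in socket.gethostbyname_ex(hostname)[2]
          except OSError:
              return False

      print(is_verified_googlebot("66.249.66.1"))   # example crawl IP from the logs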

Latency and Pagespeed

  • Pick a measurement tool quickly (like Rigor, Lighthouse or PageSpeed Insights) and stick with it. Trends are more important than exact numbers and it's easy to waste time quibbling over a tool.

  • Mobile page speed matters, even when running an AMP version of the site. Google judges websites by their native mobile experiences (including speed, UX and other factors).

  • Demand that someone take ownership of each tracking pixel and tag that's added to a page, then make those stakeholders re-justify their tags every six months. Otherwise, the same stakeholders will keep requesting tags until the engineering team gets blamed for a slow site.

  • Server response time is especially important for sites with millions of pages. Google won't stick around for long when the servers are slow to respond. (A simple way to sample response times is sketched after this list.)

  • When running NGINX on a large website, make sure that on-the-fly Gzip compression isn't doing more harm than good (i.e. causing a bottleneck that slows the server response time).

  • Clear out anything that blocks the rendering of a page. This will improve a bunch of metrics all at once. (Even a plain-text website can get bottlenecked when loading JS, CSS and fonts.)

  • Focus on what happens during the first 200ms and 2s of page load. Some pages never load in "full" because of dynamic elements (like advertisements).
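
For the server-response-time point above, it helps to sample real URLs (not just the homepage) and watch the trend. Below is a rough sketch using the Python requests library; resp.elapsed only approximates time-to-first-byte, and the URLs are placeholders.

      import statistics
      import requests

      def sample_response_times(urls):
          """Return (median, worst) server response time in seconds for a URL sample."""
          timings = []
          for url in urls:
              # stream=True stops requests from downloading the body, so .elapsed
              # roughly reflects the time until the response headers arrived.
              resp = requests.get(url, timeout=30, stream=True)
              timings.append(resp.elapsed.total_seconds())
              resp.close()
          return statistics.median(timings), max(timings)

      median_s, worst_s = sample_response_times([
          "https://www.example.com/",                     # placeholder URLs; include
          "https://www.example.com/category/widgets",     # deep pages, not just home
      ])
      print(f"median: {median_s:.2f}s   worst: {worst_s:.2f}s")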

The Critical Rendering Path

  • Time-to-first-byte is an important metric, but just as important is what that first response contains. The browser should be able to construct the above-the-fold content before it has to open a new connection to the server.

  • Define the dimensions of page elements to avoid pages that jump and dance as they load. Users get frustrated when content moves around, and it makes the entire page feel slow even if it is loading quickly.

  • Read what Ilya Grigorik publishes on this subject. Even veteran developers may learn something.

"New" Technologies and Accessibility for Googlebot

  • Client-side rendering can mean SEO death. (Look at what happened to Hulu.) Google recommends providing its search crawler with a server-side rendered page, even if users will see a client-side rendered page, and does not consider this cloaking, even though it may feel like it. (A minimal sketch follows this list.)

  • Provide Googlebot with simple pagination in cases where users see "infinite scroll." 

  • Avoid using "block-level" links, even though they simplify the code. All of the extra content packed into a single <a> tag makes it harder for Googlebot to pass contextual value to the destination page.
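
Here is a minimal illustration of the server-side-rendering approach mentioned above (often called dynamic rendering), assuming a Flask app: known crawlers, detected by user agent, get a pre-rendered HTML snapshot, while everyone else gets the client-side app shell. The prerendered_html() and spa_shell() helpers are hypothetical.

      from flask import Flask, request

      app = Flask(__name__)

      CRAWLER_TOKENS = ("Googlebot", "bingbot")   # extend with other crawlers as needed

      def prerendered_html(path):
          """Hypothetical helper: return a cached, fully rendered HTML snapshot."""
          ...

      def spa_shell():
          """Hypothetical helper: return the normal client-side-rendered app shell."""
          ...

      @app.route("/", defaults={"path": ""})
      @app.route("/<path:path>")
      def serve(path):
          user_agent = request.headers.get("User-Agent", "")
          if any(token in user_agent for token in CRAWLER_TOKENS):
              # Crawlers get the same content users eventually see, just rendered
              # ahead of time on the server.
              return prerendered_html(path)
          return spa_shell()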

Staging and QA

  • Use the robots.txt file to disallow search engines from crawling staging and QA sites. (A quick check that the block is actually in place is sketched after this list.)

  • Register staging and QA sites in Google Search Console. It sounds counterintuitive (because search engines shouldn’t be allowed to find these domains), but if the test domain gets indexed accidentally, the entire domain can be de-indexed via Search Console.
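
One cheap QA check for the robots.txt point above: confirm from outside the network that the staging host really does block crawlers. Python's urllib.robotparser can fetch the live robots.txt and answer the question directly; the staging hostname below is a placeholder.

      from urllib.robotparser import RobotFileParser

      STAGING = "https://staging.example.com"     # placeholder staging hostname

      parser = RobotFileParser(f"{STAGING}/robots.txt")
      parser.read()

      # If either line prints True, the staging robots.txt is not blocking crawlers.
      for agent in ("Googlebot", "*"):
          print(agent, parser.can_fetch(agent, f"{STAGING}/"))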

Product Requirements

  • Get someone (preferably on the SEO team, but if there isn't one, a product manager) to define everything that must be built into a page, including the obvious things like <title> tags and other metadata. It's tedious, but not as tedious as auditing these critical tags after the fact.

Internal Linking

  • Links are the lifeblood of a website and of the web overall. Anything important should never be more than five clicks away from the homepage. As a corollary, be prepared to question the "great simplifiers" who want to eliminate landing pages, navigation links, etc. (A simple click-depth audit is sketched below.)
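
One way to audit the five-click rule is a breadth-first crawl from the homepage that records each page's click depth. The sketch below uses only the Python standard library and stays on a single host; the start URL and page cap are placeholders.

      from collections import deque
      from html.parser import HTMLParser
      from urllib.parse import urljoin, urlparse
      from urllib.request import urlopen

      class LinkExtractor(HTMLParser):
          def __init__(self):
              super().__init__()
              self.links = []

          def handle_starttag(self, tag, attrs):
              if tag == "a":
                  href = dict(attrs).get("href")
                  if href:
                      self.links.append(href)

      def click_depths(start_url, max_pages=500):
          """Breadth-first crawl: return {url: clicks-from-homepage} for one host."""
          host = urlparse(start_url).netloc
          depths = {start_url: 0}
          queue = deque([start_url])
          while queue and len(depths) < max_pages:
              url = queue.popleft()
              try:
                  html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
              except OSError:
                  continue
              extractor = LinkExtractor()
              extractor.feed(html)
              for href in extractor.links:
                  target = urljoin(url, href).split("#")[0]
                  if urlparse(target).netloc == host and target not in depths:
                      depths[target] = depths[url] + 1
                      queue.append(target)
          return depths

      depths = click_depths("https://www.example.com/")
      deep = [u for u, d in depths.items() if d > 5]
      print(f"{len(deep)} pages are more than five clicks from the homepage")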

Registrar and IP Management

  • Never, ever, let marketers send newsletters and promotional e-mails from the same IP addresses that the websites are hosted on. A rogue employee who violates the CAN-SPAM Act can get the entire website blacklisted.

  • Make sure someone takes the time to fill out the annual "is your contact information up-to-date" survey that registrars require. Failure to do so makes it easier for some bad actor to steal the domain away on a technicality.

JavaScript

  • A page that starts to render, then turns plain white, is often breaking because of an unclosed document.write() call.

  • Google will try to follow anything inside JavaScript that looks like a relative path, even when it isn't a real URL. This can pollute crawl error reports.

When Mistakes Happen

  • Act fast because Google is a fickle lover. It takes months to build a house and minutes to burn it down, so snuff that match quickly and take the time to teach everyone about fire safety!