I must admit I’m no expert on SEO; I just read a lot, and I think I’ve amassed a good sense of what makes a site relevant in the eyes of web crawlers such as Google’s.
Semantic HTML and Content Relevance
The absolute first step was semantic markup. That is, using HTML 5 tags, such as `<footer>`, and so on. For a full list of HTML 5 tags, visit MDN.
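For instance, a semantically marked-up page might be structured like this (the layout here is purely illustrative):

```html
<body>
  <header>
    <nav><!-- site navigation --></nav>
  </header>
  <main>
    <article>
      <h1>Post title</h1>
      <p>Post content…</p>
    </article>
  </main>
  <footer><!-- copyright, links --></footer>
</body>
```

Compare that with a soup of anonymous `<div>` elements: the semantic version tells a crawler which part of the page is the actual content and which parts are navigation chrome.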
This helps crawlers assign weight (importance) to each piece of HTML in your page. It also makes your pages future-proof: when and if crawlers start giving more importance to semantic markup, you’ll be ready for it. Additionally, it makes your CSS more semantic, too, which can’t hurt.
Always make sure your content is relevant to the keywords you aspire to be found for; don’t just spit a bunch of keywords onto your site and expect good things to happen. Your visitors won’t take your site seriously if you do that, and earning their interest is your end goal anyway.
What is the point of being well-ranked if your visitors don’t consume or appreciate the contents of your site?
The second step I took was adding Google Analytics to my solution. I later went on and added a couple of other services, Clicky and [New Relic](http://newrelic.com/ "New Relic Application Monitoring"), to improve my analytics and get at least some sort of uptime monitoring. All of these services are really easy to include in your application, and they provide a lot of value.
Analytics can tell you what pages users land on, what pages are the most linked to, where your users come from, as well as how your users behave and what they are looking for. In summary, it’s really important to know what’s going on with your site in the grand scheme of things, and figure out how to proceed, and analytics tools are a great way to accomplish just that.
You also definitely want to sign up for Webmaster Tools, which will be immensely helpful in putting it all together, and will also help you track your index status closely.
Without a proper AJAX crawling strategy, any other efforts to improve SEO are utterly useless. You need a good crawling strategy that allows search engines to get the content a regular surfer would find in each page.
The desired behavior, then, would be some sort of reflection of this mockup:
Once I had this down, it was just a matter of adding a little helper to every single GET route to handle crawler requests differently. If a request comes in and it matches one of the known web crawler user agents, a second request is triggered on behalf of the crawler, against the same resource, and through the headless browser.
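In Express terms, that helper could be sketched roughly like this. Note that the user agent list, `isCrawler`, and `renderWithHeadlessBrowser` are illustrative assumptions, not the actual implementation used on this blog:

```javascript
// Hypothetical sketch: detect known crawlers and re-render the page for them.
var crawlerAgents = /googlebot|bingbot|yandex|baiduspider|slurp/i;

function isCrawler (userAgent) {
  return crawlerAgents.test(userAgent || '');
}

// Express-style middleware factory. `renderWithHeadlessBrowser(url, done)`
// is assumed to fetch the same resource through a headless browser and
// invoke `done(err, html)` with the fully rendered markup.
function crawlerMiddleware (renderWithHeadlessBrowser) {
  return function (req, res, next) {
    if (req.method !== 'GET' || !isCrawler(req.headers['user-agent'])) {
      return next(); // a regular surfer: serve the page as usual
    }
    renderWithHeadlessBrowser(req.url, function (err, html) {
      if (err) { return next(err); }
      res.send(html); // serve the rendered markup to the crawler
    });
  };
}
```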
One last step, if you care about performance, is dumping this into a relatively short-lived file cache (meaning you’ll invalidate the cached page once a predetermined amount of time elapses), in order to save yourself a web request on subsequent calls made by a crawler against that resource.
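The short-lived cache could be sketched like this, assuming an in-memory map keyed by URL (a file-based cache would follow the same expiration logic):

```javascript
// Hypothetical sketch: cache entries expire after `maxAgeMs` milliseconds.
function createCache (maxAgeMs) {
  var entries = {};
  return {
    get: function (key) {
      var entry = entries[key];
      if (!entry) { return null; }
      if (Date.now() - entry.created > maxAgeMs) {
        delete entries[key]; // entry expired: invalidate it
        return null;
      }
      return entry.value;
    },
    set: function (key, value) {
      entries[key] = { value: value, created: Date.now() };
    }
  };
}
```

On a crawler request you’d consult the cache first, and only fall back to the headless browser on a miss.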
If you are curious about how to implement this, here is my take for this blog; it’s implemented in Node.
Note that this might not be the latest version; it’s the one contained in the v0.2 tag, although I don’t expect to change it much.
Once that’s settled, and working, you can do awesome stuff such as updating your
Metadata is crucial to being well-positioned in search results. There’s a lot of meta content you can enrich your site with. I’ll talk about some of the metadata you can include, particularly what I’ve chosen to include.
The single most important `<meta>` tag is `<meta name='description' content='...'>`. This tag should uniquely describe each page in your site, meaning different description tags should never share the same content attribute value. Keep the description brief, though not too short, since it’s the text users will see when your page gets a search results impression.
The `<meta name='keywords' content='...'>` tag is much discussed, and seems to have been mostly irrelevant for a while now, but it won’t do any harm should you decide to include it.
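Put together, the `<head>` of a page might carry something like this (the content values here are just placeholders):

```html
<meta name="description" content="A unique, brief description of this particular page">
<meta name="keywords" content="seo, crawlers, analytics, metadata">
```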
Provide Open Graph metadata
Open Graph is a set of meta properties pushed by Facebook. These are mostly useful when sharing links to your site on social networks. Try to include a relevant thumbnail, the actual title of each page, and an appropriate description of that page. You definitely should include these in your website if you care about SEO.
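A typical set of Open Graph tags might look like this (the values are illustrative):

```html
<meta property="og:title" content="The actual title of the page">
<meta property="og:description" content="An appropriate description of the page">
<meta property="og:image" content="http://example.com/thumbnail.png">
```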
Implement Schema.org microdata
Schema.org microdata allows you to mark up your site with attributes indicating what the content might be. You can read more about microdata on Google. This will help Google display your site in search results, and figure out the types of content published in your site.
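For a blog post, the microdata attributes could look something like this (the markup is a sketch, not lifted from this site):

```html
<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Post title</h1>
  <time itemprop="datePublished" datetime="2013-01-01">January 1st, 2013</time>
  <div itemprop="articleBody">Post content…</div>
</article>
```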
Search Engine Support
There are at least a couple of ways to help search engines get on the right track when indexing your website. You should take full advantage of these.
Provide a robots.txt file
There isn’t a lot to say about robots.txt. Don’t depend on crawlers honoring your rules; instead, make your site follow the REST guidelines to the letter (in particular, GET requests should never modify state), so that your unsuspecting data doesn’t get ravaged by a curious spider.
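A minimal robots.txt might look like this (the paths and URL are placeholders):

```text
User-agent: *
Disallow: /api/
Sitemap: http://example.com/sitemap.xml
```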
Publish a sitemap.xml
By submitting a sitemap, you can give hints to a web crawler about how you value your site’s content, and help it index your website. You can read more about sitemaps on Google or here.
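A bare-bones sitemap.xml entry looks like this; `changefreq` and `priority` are the hints referred to above (URL and values are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```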
Alternative traffic sources
It is always wise to provide users with alternative means of accessing your website’s content.
Implement OpenSearch Protocol
A while back I talked about implementing OpenSearch. This allows users to search your site directly from their browser’s address bar.
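The OpenSearch description document your site would serve is a small XML file along these lines (names and URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>My Blog</ShortName>
  <Description>Search this blog</Description>
  <Url type="text/html" template="http://example.com/search?q={searchTerms}"/>
</OpenSearchDescription>
```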
I don’t think feeds such as RSS need an introduction. I recommend publishing at least one feed of your site’s content. Keeping it up to date is just as important.