Search Engine Optimization

Oct. 22, 2020

These are some notes from my research on SEO techniques for a website. They are notes, not a definitive guide. They are based on the guidelines in Google's own docs and are therefore primarily geared towards improving a site's ranking in Google search.

  1. Make sure your site is registered with Google's crawler. Though the crawler will eventually find your site on its own, it's better to register it formally so that indexing can begin sooner.

    To verify, search for "site:yoursite.com" and see whether the results list the permanent pages within your site as separate search results. If they do, the site is already in Google's index. If not, go to Search Console and register your site. Registering a site requires creating a TXT record in DNS containing a unique string that Search Console provides. This verifies that you are indeed the owner of the domain you're trying to register.

  2. Provide a sitemap file with links that point to the important pages in your site.
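A sitemap is a small XML file in the sitemaps.org format. Below is a minimal sketch of generating one with Python's standard library. The `build_sitemap` helper and the page list are my own illustrations, not something taken from Google's docs.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string.

    `pages` is a list of (url, last_modified) pairs; last_modified is a
    date string such as "2020-10-21", or None to omit the <lastmod> tag.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    print(build_sitemap([
        ("https://www.yoursite.com/", "2020-10-21"),
        ("https://www.yoursite.com/about", None),
    ]))
```

The output would typically be saved as sitemap.xml at the site root and then submitted through Search Console.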

  3. Make sure your server handles the If-Modified-Since request header and returns a 304 response code if the page has not been modified since the date supplied in that header. To test this, first make a regular request to your site:

    $ curl -I https://www.yoursite.com
    HTTP/1.1 200 OK
    Server: nginx/1.19.0
    Date: Thu, 22 Oct 2020 01:45:53 GMT
    Content-Type: text/html
    Content-Length: 19963
    Last-Modified: Wed, 21 Oct 2020 08:12:22 GMT
    Connection: keep-alive
    Vary: Accept-Encoding
    ETag: "5f8fed66-4dfb"
    Accept-Ranges: bytes
    

    The Last-Modified response header indicates the last time the page was modified. Now make another request, supplying this time in the If-Modified-Since request header.

    $ curl -I https://www.yoursite.com -H "If-Modified-Since: Wed, 21 Oct 2020 08:12:22 GMT"
    HTTP/1.1 304 Not Modified
    Server: nginx/1.19.0
    Date: Thu, 22 Oct 2020 01:47:27 GMT
    Last-Modified: Wed, 21 Oct 2020 08:12:22 GMT
    Connection: keep-alive
    ETag: "5f8fed66-4dfb"
    

    The server should now respond with status code 304, indicating that the page has not changed since that time. When the crawler sees this response, it can skip downloading the page again, which reduces needless bandwidth consumption for both the server and the search engine crawler.
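Most web servers, such as the nginx in the output above, handle this for static files on their own. For pages generated by application code, the check comes down to a date comparison. Here is a minimal sketch in Python, assuming the application knows when the page last changed; the `conditional_status` function is a hypothetical helper, not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the client's date, else 200.

    `last_modified` must be a timezone-aware datetime.
    `if_modified_since` is the raw request header value, or None if absent.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # Unparseable header: ignore it and send the full page.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


if __name__ == "__main__":
    modified = datetime(2020, 10, 21, 8, 12, 22, tzinfo=timezone.utc)
    # Value the server would send in its Last-Modified response header.
    print(format_datetime(modified, usegmt=True))
    print(conditional_status(modified, "Wed, 21 Oct 2020 08:12:22 GMT"))
    print(conditional_status(modified, "Tue, 20 Oct 2020 08:12:22 GMT"))
```

With the dates from the curl session above, the first check yields 304 and the second, with an older date, yields 200.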

  4. If your site has portions that are not to be indexed by the search engine and you can't control access to them via user authentication, use robots.txt to tell the crawler to skip these pages. Typically this will not be required, but it's good to know. You can also use robots.txt to stop the crawler from fetching your site's static assets such as images and scripts.
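To check how a set of robots.txt rules would be interpreted before deploying them, Python's standard `urllib.robotparser` module can evaluate them locally. This is a rough sketch: the `/drafts/` and `/static/` paths and the `is_allowed` helper are made up for illustration, and Google's own parser may differ from Python's in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /static/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/about", "/drafts/post-1", "/static/app.js"):
        url = "https://www.yoursite.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Only `/about` is reported as allowed under these rules.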

  5. Avoid pages that require URL parameters. Instead, pages should use simple, clear, and well-terminated URLs.
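As a quick way to audit a site against item 5, the sketch below flags URLs that carry a query string. The `urls_with_parameters` helper and the sample URLs are hypothetical. It only inspects the URL text and does not fetch anything.

```python
from urllib.parse import urlsplit


def urls_with_parameters(urls):
    """Return the URLs from `urls` that include a query string."""
    return [u for u in urls if urlsplit(u).query]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    sample = [
        "https://www.yoursite.com/blog/seo-notes",
        "https://www.yoursite.com/page?id=42&view=full",
    ]
    for url in urls_with_parameters(sample):
        print("uses URL parameters:", url)
```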