
Robots.txt Setup for UAT and Production

This document explains how robots.txt is configured per environment so that search engines do not index the UAT/staging site.

Overview

The setup uses environment-based robots.txt selection:

  • Production (ENVIRONMENT=production): Uses standard robots.txt that allows search engine crawling
  • UAT (ENVIRONMENT=uat): Uses restrictive robots-uat.txt that blocks all search engines

Files

1. /static/robots.txt (Production)

Default robots.txt for production environment at https://iapp.co.th

  • Allows all search engines
  • Includes sitemap reference
  • Used when ENVIRONMENT=production or not set
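For reference, a production robots.txt consistent with the points above could look like the following; the exact contents of /static/robots.txt and the sitemap URL are assumptions, not copied from the repository:

User-agent: *
Allow: /

Sitemap: https://iapp.co.th/sitemap.xml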

2. /static/robots-uat.txt (UAT)

Restrictive robots.txt for UAT environment at https://uat.iapp.co.th

  • Blocks all search engine crawlers
  • Explicitly blocks: Googlebot, Bingbot, Slurp, DuckDuckBot, Baiduspider
  • Used when ENVIRONMENT=uat
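A robots-uat.txt matching the bullets above would look roughly like this; the actual file may order the entries differently or include additional crawlers:

User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: DuckDuckBot
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /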

3. /scripts/setup-robots.sh

Shell script that runs at container startup

  • Checks ENVIRONMENT variable
  • Copies appropriate robots.txt to build directory
  • Applies to both default and Thai (th) locales
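A minimal sketch of what such a script can look like is shown below. The build output directory (build/), the Thai locale path (build/th/), and the echo messages are assumptions based on Docusaurus defaults, not the project's actual script:

#!/bin/sh
# Sketch of /scripts/setup-robots.sh (paths and messages are assumptions)
set -e

BUILD_DIR="build"   # assumed Docusaurus build output directory

if [ "$ENVIRONMENT" = "uat" ]; then
  echo "robots: ENVIRONMENT=uat, applying restrictive robots-uat.txt"
  cp static/robots-uat.txt "$BUILD_DIR/robots.txt"
  # Apply the same restrictive file to the Thai locale build, if it exists
  if [ -d "$BUILD_DIR/th" ]; then
    cp static/robots-uat.txt "$BUILD_DIR/th/robots.txt"
  fi
else
  echo "robots: ENVIRONMENT=${ENVIRONMENT:-production}, keeping default robots.txt"
fi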

Docker Configuration

Production (docker-compose.yml)

environment:
  - NODE_ENV=production
  - ENVIRONMENT=production

UAT (docker-compose.uat.yml)

environment:
  - NODE_ENV=production
  - ENVIRONMENT=uat
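How the script gets invoked depends on the image's entrypoint. If the compose service starts the server directly, one way to chain the script in would be a command override along these lines (the command, port, and paths here are assumptions, not the project's actual configuration):

command: sh -c "/scripts/setup-robots.sh && npx docusaurus serve --host 0.0.0.0 --port 3000"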

How It Works

  1. The Docker build produces the Docusaurus site with the default robots.txt
  2. At container startup, /scripts/setup-robots.sh runs before the server starts
  3. The script checks the ENVIRONMENT variable:
    • If uat: copies robots-uat.txt over robots.txt in the build directory
    • If production or unset: keeps the default robots.txt (no action needed)
  4. docusaurus serve starts with the correct robots.txt in place

Deployment

UAT Deployment

docker-compose -f docker-compose.uat.yml up -d --build

Production Deployment

docker-compose up -d --build

Verification

Check UAT robots.txt

curl https://uat.iapp.co.th/robots.txt

Should return:

User-agent: *
Disallow: /

Check Production robots.txt

curl https://iapp.co.th/robots.txt

Should return:

User-agent: *
Allow: /
...

Additional SEO Protection

In addition to robots.txt, consider:

  1. Meta robots tag: Add <meta name="robots" content="noindex, nofollow"> in UAT
  2. X-Robots-Tag header: Configure nginx/reverse proxy to add X-Robots-Tag: noindex, nofollow (a quick way to verify this follows the list)
  3. Password protection: Add basic auth for UAT environment
  4. Remove from Google: If already indexed, use Google Search Console to request removal
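Once the X-Robots-Tag header from item 2 is in place at the proxy, it can be verified with a header-only request against the UAT host:

curl -I https://uat.iapp.co.th/ | grep -i x-robots-tag

This should print a line containing noindex, nofollow once the header is configured.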

Troubleshooting

robots.txt not updating

  1. Rebuild Docker image: docker-compose -f docker-compose.uat.yml up -d --build
  2. Check environment variable: docker exec <container> env | grep ENVIRONMENT
  3. Check script execution: docker logs <container> | grep robots
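If the environment variable and logs look right but the served file is still wrong, inspecting the file inside the container can help; the path below assumes the app lives in /app with the default build/ output directory:

docker exec <container> cat /app/build/robots.txt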

Still showing in Google

  1. Verify robots.txt is correct: curl https://uat.iapp.co.th/robots.txt
  2. Request removal in Google Search Console
  3. Consider adding noindex meta tags and X-Robots-Tag headers
