# Robots.txt Setup for UAT and Production

This document explains how robots.txt is configured per environment to prevent search engines from indexing the UAT/staging environment.
## Overview

The setup uses environment-based robots.txt selection:

- **Production** (`ENVIRONMENT=production`): uses the standard `robots.txt`, which allows search engine crawling
- **UAT** (`ENVIRONMENT=uat`): uses the restrictive `robots-uat.txt`, which blocks all search engines
## Files

### 1. `/static/robots.txt` (Production)

The default robots.txt for the production environment at https://iapp.co.th:

- Allows all search engines
- Includes a sitemap reference
- Used when `ENVIRONMENT=production` or when the variable is not set
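For reference, a minimal sketch of what this file might contain, consistent with the expected output shown under Verification below; the sitemap URL is an assumption, and the actual file in the repository may differ:

```txt
# /static/robots.txt (sketch): production, crawling allowed
User-agent: *
Allow: /

# Assumed sitemap location; adjust to the real path
Sitemap: https://iapp.co.th/sitemap.xml
```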
### 2. `/static/robots-uat.txt` (UAT)

The restrictive robots.txt for the UAT environment at https://uat.iapp.co.th:

- Blocks all search engine crawlers
- Explicitly blocks: Googlebot, Bingbot, Slurp, DuckDuckBot, Baiduspider
- Used when `ENVIRONMENT=uat`
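A sketch of the restrictive file, matching the crawler list above and the expected output shown under Verification below; the actual file may differ:

```txt
# /static/robots-uat.txt (sketch): UAT, all crawling blocked
User-agent: *
Disallow: /

# Explicit per-crawler blocks, as listed above
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: DuckDuckBot
Disallow: /

User-agent: Baiduspider
Disallow: /
```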
### 3. `/scripts/setup-robots.sh`

A shell script that runs at container startup:

- Checks the `ENVIRONMENT` variable
- Copies the appropriate robots.txt into the build directory
- Applies to both the default and Thai (th) locales
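A minimal sketch of the script's logic, assuming the Docusaurus build output is at `/app/build` (with the Thai locale at `/app/build/th`) and the source files at `/app/static`; the real paths in the repository may differ:

```sh
#!/bin/sh
# setup-robots.sh (sketch): select robots.txt based on ENVIRONMENT.
# BUILD_DIR and the /app/static paths are assumed locations.
BUILD_DIR="${BUILD_DIR:-/app/build}"

if [ "$ENVIRONMENT" = "uat" ]; then
  echo "setup-robots: ENVIRONMENT=uat, installing restrictive robots.txt"
  cp /app/static/robots-uat.txt "$BUILD_DIR/robots.txt"
  # The Thai locale is served from its own subdirectory
  [ -d "$BUILD_DIR/th" ] && cp /app/static/robots-uat.txt "$BUILD_DIR/th/robots.txt"
else
  echo "setup-robots: ENVIRONMENT=${ENVIRONMENT:-unset}, keeping default robots.txt"
fi
```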
## Docker Configuration

### Production (`docker-compose.yml`)

```yaml
environment:
  - NODE_ENV=production
  - ENVIRONMENT=production
```

### UAT (`docker-compose.uat.yml`)

```yaml
environment:
  - NODE_ENV=production
  - ENVIRONMENT=uat
```
## How It Works

1. The Docker container builds the Docusaurus site with the default robots.txt
2. At startup, `/scripts/setup-robots.sh` runs before the server starts
3. The script checks the `ENVIRONMENT` variable:
   - If `uat`: copies `robots-uat.txt` over `robots.txt` in the build directory
   - If `production` or empty: keeps the default robots.txt (no action needed)
4. Docusaurus serve starts with the correct robots.txt (see the startup sketch below)
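One way the startup order might be wired into the image; the exact entrypoint is not shown in this document, so treat the following Dockerfile excerpt as an assumption:

```dockerfile
# Dockerfile excerpt (sketch): run the robots setup before serving the site
COPY scripts/setup-robots.sh /scripts/setup-robots.sh
RUN chmod +x /scripts/setup-robots.sh

# Swap robots.txt first, then serve the built site
CMD ["/bin/sh", "-c", "/scripts/setup-robots.sh && npx docusaurus serve --host 0.0.0.0 --port 3000"]
```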
## Deployment

### UAT Deployment

```bash
docker-compose -f docker-compose.uat.yml up -d --build
```

### Production Deployment

```bash
docker-compose up -d --build
```
## Verification

### Check UAT robots.txt

```bash
curl https://uat.iapp.co.th/robots.txt
```

Should return:

```txt
User-agent: *
Disallow: /
```

### Check Production robots.txt

```bash
curl https://iapp.co.th/robots.txt
```

Should return:

```txt
User-agent: *
Allow: /
...
```
## Additional SEO Protection

In addition to robots.txt, consider:

- **Meta robots tag**: add `<meta name="robots" content="noindex, nofollow">` in UAT
- **X-Robots-Tag header**: configure nginx/the reverse proxy to add `X-Robots-Tag: noindex, nofollow` (see the sketch below)
- **Password protection**: add basic auth for the UAT environment
- **Remove from Google**: if already indexed, use Google Search Console to request removal
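A sketch of the header and basic-auth suggestions on the nginx side; the server name, upstream port, and htpasswd path are all assumptions about the reverse-proxy setup:

```nginx
# nginx (sketch): UAT hardening; adjust names and paths to the real setup
server {
    server_name uat.iapp.co.th;

    # Tell crawlers not to index or follow anything on this host
    add_header X-Robots-Tag "noindex, nofollow" always;

    # Basic auth for the whole UAT site (htpasswd path is assumed)
    auth_basic "UAT";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:3000;  # assumed upstream port
    }
}
```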
## Troubleshooting

### robots.txt not updating

- Rebuild the Docker image: `docker-compose -f docker-compose.uat.yml up -d --build`
- Check the environment variable: `docker exec <container> env | grep ENVIRONMENT`
- Check script execution: `docker logs <container> | grep robots`

### Still showing in Google

- Verify robots.txt is correct: `curl https://uat.iapp.co.th/robots.txt`
- Request removal in Google Search Console
- Consider adding noindex meta tags and X-Robots-Tag headers