# About CrawlGraderBot
CrawlGraderBot is the web crawler that powers the CrawlGrader API — a technology detection and website intelligence service. This page explains what the bot does, what data it collects, and how you can control its access to your site.
## How to identify it

CrawlGraderBot always identifies itself with the following User-Agent string:

```
CrawlGraderBot/1.0 (+https://crawlgrader.com/bot)
```
| Property | Value |
|---|---|
| User-Agent | CrawlGraderBot/1.0 (+https://crawlgrader.com/bot) |
| Respects robots.txt | Yes, always |
| Respects Crawl-delay | Yes |
| Max request rate | 1 scan (at most 3 requests) per domain per scan cycle |
| Pages fetched per visit | Homepage + robots.txt + sitemap.xml only |
| JavaScript rendering | No — standard HTTP GET only |
| Reverse DNS | *.crawlgrader.com |
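The reverse-DNS entry in the table can be used to confirm that a visitor claiming to be CrawlGraderBot really is, using forward-confirmed reverse DNS (the same pattern used to verify major search crawlers). A minimal sketch — only the `*.crawlgrader.com` pattern comes from the table; the helper names and any example hostnames are illustrative:

```python
import socket

def is_crawlgrader_host(hostname: str) -> bool:
    """True if a reverse-DNS hostname falls under crawlgrader.com."""
    hostname = hostname.rstrip(".").lower()
    return hostname == "crawlgrader.com" or hostname.endswith(".crawlgrader.com")

def verify_bot_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS: the PTR record must point under
    crawlgrader.com, and that hostname must resolve back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)        # reverse (PTR) lookup
    except OSError:
        return False
    if not is_crawlgrader_host(hostname):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)  # forward (A) lookup
    except OSError:
        return False
    return ip in addrs
```

The suffix check guards against lookalike names such as `crawlgrader.com.attacker.net`, and the forward lookup guards against spoofed PTR records.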
## What it does

CrawlGraderBot makes standard HTTP GET requests — identical to those any web browser makes — to your homepage, robots.txt, and sitemap.xml. It reads the server's response headers, DNS records, and TLS certificate metadata. This data powers the CrawlGrader API, which provides technology detection and infrastructure signals to API subscribers.
No public reports or pages are generated for any domain. Data is only accessible through the CrawlGrader API.
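A single scan cycle as described above can be sketched roughly as follows. This is a hypothetical illustration, not the bot's actual source; the function names are made up, and the real bot additionally inspects the metadata listed in the next section:

```python
import urllib.request

USER_AGENT = "CrawlGraderBot/1.0 (+https://crawlgrader.com/bot)"
SCAN_PATHS = ("/", "/robots.txt", "/sitemap.xml")  # the only pages fetched

def scan_urls(domain: str) -> list[str]:
    """The complete set of URLs requested in one scan cycle."""
    return [f"https://{domain}{path}" for path in SCAN_PATHS]

def fetch_metadata(url: str) -> dict:
    """One transparent GET; only the status code and headers are kept here."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return {"url": url, "status": resp.status, "headers": dict(resp.headers)}
```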
## What data we collect
CrawlGraderBot reads only publicly visible, technical metadata that any visitor or tool (such as curl, dig, or a web browser) can observe:
- HTTP response headers — server software, security headers, CDN identification, caching directives
- TLS certificate metadata — issuer, certificate type, TLS version
- DNS records — nameservers, MX provider category, SPF/DMARC presence
- HTML meta tags — generator, canonical URL, hreflang, Schema.org type
- Cookie names only — for technology detection (e.g., analytics, CMS identification). We never read or store cookie values.
- robots.txt & sitemap.xml — crawl directives, page count, content freshness signals
- Connection timing — TTFB, page size, compression type
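The "cookie names only" rule above can be illustrated with a short sketch (a hypothetical helper, not the bot's implementation): parse each `Set-Cookie` header and keep just the names, discarding every value.

```python
from http.cookies import SimpleCookie

def cookie_names(set_cookie_headers: list[str]) -> list[str]:
    """Extract cookie names from Set-Cookie headers; values are never kept."""
    names: list[str] = []
    for header in set_cookie_headers:
        cookie = SimpleCookie()
        cookie.load(header)
        names.extend(cookie.keys())  # names only; values are dropped here
    return names
```

A name like `_ga` is enough to identify Google Analytics, and `PHPSESSID` a PHP backend, without ever touching the session data the values carry.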
## What we never collect
- Personal information of any kind (names, emails, phone numbers, addresses)
- Employee data, headcounts, or salary information
- Cookie values, session tokens, or authentication data
- Content behind login or authentication
- User-generated content, comments, or forum posts
- Page content, article text, or images
- IP addresses of the servers we scan (we store domain names only)
- Any data from pages other than the homepage, robots.txt, and sitemap.xml
## How to block CrawlGraderBot

CrawlGraderBot fully respects robots.txt. To block it, add the following to your site's robots.txt file:

```
User-agent: CrawlGraderBot
Disallow: /
```
Once blocked, your domain will be excluded from all future scan cycles and any stored data will be purged within 24 hours.
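You can confirm a block rule behaves as intended before deploying it, using Python's standard-library robots.txt parser entirely locally (`example.com` below is a placeholder):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: CrawlGraderBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The full User-Agent string matches the "CrawlGraderBot" token,
# so the Disallow rule applies and the fetch is refused.
blocked = not parser.can_fetch(
    "CrawlGraderBot/1.0 (+https://crawlgrader.com/bot)", "https://example.com/"
)
```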
## How to allow CrawlGraderBot

No action is needed. If your site allows general crawling, CrawlGraderBot will work automatically. To explicitly allow it while blocking other bots:

```
User-agent: CrawlGraderBot
Allow: /

User-agent: *
Disallow: /
```
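In robots.txt, a group naming a specific user agent takes precedence over the `*` wildcard group, so an explicit allow coexists with a blanket block. This can be checked locally with the standard-library parser (`example.com` and `SomeOtherBot` are placeholders):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: CrawlGraderBot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The named group matches CrawlGraderBot first, so it is allowed...
bot_ok = parser.can_fetch("CrawlGraderBot", "https://example.com/")
# ...while an unnamed crawler falls through to the wildcard block.
other_ok = parser.can_fetch("SomeOtherBot", "https://example.com/")
```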
## Why allow CrawlGraderBot?

CrawlGraderBot behaves like a well-mannered search engine crawler:

- Performs one scan (at most 3 requests) per domain per scan cycle (typically monthly)
- No aggressive crawling, no deep spidering, no resource-heavy rendering
- Respects all robots.txt directives including Crawl-delay
- Identifies itself transparently with a full User-Agent and this info page
- Does not publish any public-facing report or page about your domain
## Reproducibility

Every data point CrawlGraderBot collects can be independently verified by anyone using standard tools:

```bash
curl -I https://yourdomain.com                 # HTTP headers
dig yourdomain.com MX TXT NS                   # DNS records
openssl s_client -connect yourdomain.com:443   # TLS certificate
curl https://yourdomain.com/robots.txt         # Crawl rules
curl https://yourdomain.com/sitemap.xml        # Sitemap
```
## Contact
Questions or concerns about CrawlGraderBot? Email us at bot@crawlgrader.com. We respond within 24 hours.