# About CrawlGraderBot
CrawlGraderBot is the web crawler that powers the CrawlGrader API — a technology detection and website intelligence service. This page explains what the bot does, what data it collects, and how you can control its access to your site.
## How to identify it

CrawlGraderBot always identifies itself with the following User-Agent string:

```
CrawlGraderBot/1.0 (+https://crawlgrader.com/bot)
```
| Property | Value |
|---|---|
| User-Agent | CrawlGraderBot/1.0 (+https://crawlgrader.com/bot) |
| Respects robots.txt | Yes, always |
| Respects Crawl-delay | Yes |
| Max request rate | 1 scan (at most 3 requests) per domain per scan cycle |
| Pages fetched per visit | Homepage + robots.txt + sitemap.xml only |
| JavaScript rendering | No — standard HTTP GET only |
| Reverse DNS | *.crawlgrader.com |
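The reverse-DNS entry in the table can be used to confirm that a visitor claiming to be CrawlGraderBot really is, using forward-confirmed reverse DNS (the same pattern used to verify major search crawlers). A minimal sketch — only the `*.crawlgrader.com` pattern comes from the table; the helper names and any example hostnames are illustrative:

```python
import socket

def is_crawlgrader_host(hostname: str) -> bool:
    """True if a reverse-DNS hostname falls under crawlgrader.com."""
    hostname = hostname.rstrip(".").lower()
    return hostname == "crawlgrader.com" or hostname.endswith(".crawlgrader.com")

def verify_bot_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS: the PTR record must point under
    crawlgrader.com, and that hostname must resolve back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)        # reverse (PTR) lookup
    except OSError:
        return False
    if not is_crawlgrader_host(hostname):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)  # forward (A) lookup
    except OSError:
        return False
    return ip in addrs
```

The suffix check guards against lookalike names such as `crawlgrader.com.attacker.net`, and the forward lookup guards against spoofed PTR records.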
## What it does

CrawlGraderBot makes standard HTTP GET requests — identical to those any web browser makes — to your homepage, robots.txt, and sitemap.xml. It reads the server's response headers, DNS records, and TLS certificate metadata. This data powers the CrawlGrader API, which provides technology detection and infrastructure signals to API subscribers.
No public reports or pages are generated for any domain. Data is only accessible through the CrawlGrader API.
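A single scan cycle as described above can be sketched roughly as follows. This is a hypothetical illustration, not the bot's actual source; the function names are made up, and the real bot additionally inspects the metadata listed in the next section:

```python
import urllib.request

USER_AGENT = "CrawlGraderBot/1.0 (+https://crawlgrader.com/bot)"
SCAN_PATHS = ("/", "/robots.txt", "/sitemap.xml")  # the only pages fetched

def scan_urls(domain: str) -> list[str]:
    """The complete set of URLs requested in one scan cycle."""
    return [f"https://{domain}{path}" for path in SCAN_PATHS]

def fetch_metadata(url: str) -> dict:
    """One transparent GET; only the status code and headers are kept here."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return {"url": url, "status": resp.status, "headers": dict(resp.headers)}
```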
## What data we collect
CrawlGraderBot reads only publicly visible, technical metadata that any visitor or tool (such as curl, dig, or a web browser) can observe:
- HTTP response headers — server software, security headers, CDN identification, caching directives
- TLS certificate metadata — issuer, certificate type, TLS version
- DNS records — nameservers, MX provider category, SPF/DMARC presence
- HTML meta tags — generator, canonical URL, hreflang, Schema.org type
- Cookie names only — for technology detection (e.g., analytics, CMS identification). We never read or store cookie values.
- robots.txt & sitemap.xml — crawl directives, page count, content freshness signals
- Connection timing — TTFB, page size, compression type
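The "cookie names only" rule above can be illustrated with a short sketch (a hypothetical helper, not the bot's implementation): parse each `Set-Cookie` header and keep just the names, discarding every value.

```python
from http.cookies import SimpleCookie

def cookie_names(set_cookie_headers: list[str]) -> list[str]:
    """Extract cookie names from Set-Cookie headers; values are never kept."""
    names: list[str] = []
    for header in set_cookie_headers:
        cookie = SimpleCookie()
        cookie.load(header)
        names.extend(cookie.keys())  # names only; values are dropped here
    return names
```

A name like `_ga` is enough to identify Google Analytics, and `PHPSESSID` a PHP backend, without ever touching the session data the values carry.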
## What we never collect
- Personal information of any kind (names, emails, phone numbers, addresses)
- Employee data, headcounts, or salary information
- Cookie values, session tokens, or authentication data
- Content behind login or authentication
- User-generated content, comments, or forum posts
- Page content, article text, or images
- IP addresses of the servers we scan (we store domain names only)
- Any data from pages other than the homepage, robots.txt, and sitemap.xml
## How to block CrawlGraderBot

CrawlGraderBot fully respects robots.txt. To block it, add the following to your site's robots.txt file:

```
User-agent: CrawlGraderBot
Disallow: /
```
Once blocked, your domain will be excluded from all future scan cycles and any stored data will be purged within 24 hours.
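You can confirm a block rule behaves as intended before deploying it, using Python's standard-library robots.txt parser entirely locally (`example.com` below is a placeholder):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: CrawlGraderBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The full User-Agent string matches the "CrawlGraderBot" token,
# so the Disallow rule applies and the fetch is refused.
blocked = not parser.can_fetch(
    "CrawlGraderBot/1.0 (+https://crawlgrader.com/bot)", "https://example.com/"
)
```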
## How to allow CrawlGraderBot

No action is needed. If your site allows general crawling, CrawlGraderBot will work automatically. To explicitly allow it while blocking other bots:

```
User-agent: CrawlGraderBot
Allow: /

User-agent: *
Disallow: /
```
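In robots.txt, a group naming a specific user agent takes precedence over the `*` wildcard group, so an explicit allow coexists with a blanket block. This can be checked locally with the standard-library parser (`example.com` and `SomeOtherBot` are placeholders):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: CrawlGraderBot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The named group matches CrawlGraderBot first, so it is allowed...
bot_ok = parser.can_fetch("CrawlGraderBot", "https://example.com/")
# ...while an unnamed crawler falls through to the wildcard block.
other_ok = parser.can_fetch("SomeOtherBot", "https://example.com/")
```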
## Why allow CrawlGraderBot?

CrawlGraderBot behaves like a well-mannered search engine crawler:

- Performs one scan (at most 3 requests) per domain per scan cycle (typically monthly)
- No aggressive crawling, no deep spidering, no resource-heavy rendering
- Respects all robots.txt directives including Crawl-delay
- Identifies itself transparently with a full User-Agent and this info page
- Does not publish any public-facing report or page about your domain
## Reproducibility

Every data point CrawlGraderBot collects can be independently verified by anyone using standard tools:

```bash
curl -I https://yourdomain.com                 # HTTP headers
dig yourdomain.com MX TXT NS                   # DNS records
openssl s_client -connect yourdomain.com:443   # TLS certificate
curl https://yourdomain.com/robots.txt         # Crawl rules
curl https://yourdomain.com/sitemap.xml        # Sitemap
```
## Contact
Questions or concerns about CrawlGraderBot? Email us at bot@crawlgrader.com. We respond within 24 hours.