# 📜 List of AI Web-Crawlers
Welcome to the repository dedicated to maintaining a list of AI web-crawlers, aimed at helping website owners manage and block specific crawlers using the `robots.txt` file. This resource is valuable for anyone wishing to control access to their website by various AI-driven bots.
## 🚫 robots.txt
This `robots.txt` file serves as a guideline for web crawlers, explicitly blocking access for a list of known AI-driven bots.
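Each rule in the file pairs a `User-agent` line with a `Disallow` directive. A minimal sketch of the pattern (the bot names shown are examples from the list below; the actual file in this repository contains many more):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

`Disallow: /` asks the named crawler to stay away from the entire site. Note that `robots.txt` is advisory: well-behaved crawlers honor it, but it does not technically enforce the block.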
## ⛔ Blocked AI Web-Crawlers
The `robots.txt` file blocks a comprehensive list of AI web-crawlers, such as:
- Google bots (including AdsBot and Google-Extended)
- Applebot
- FacebookBot
- Amazonbot
- OpenAI bots (GPTBot and ChatGPT-User)
- Anthropic Claude bots (ClaudeBot and Claude-Web)
- PerplexityBot
- Anthropic's general bot
- Cohere's bot
- Diffbot
- img2dataset crawler
- FriendlyCrawler
- Bytespider (ByteDance)
- CCBot
- Omgili crawlers
- Peer39 crawlers
- Russian state-sponsored crawlers (e.g., Awakari)
- YouBot
For the exact rules and bot names, please refer to the `robots.txt` file in this repository.
## 🛠 Contributing
We welcome contributions to enhance and expand this list. If you know of any AI web-crawlers that should be added, follow these steps to contribute:
1. Fork the repository.
2. Create a new branch.
3. Add your entry to the `robots.txt` file.
4. Open a merge request.
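A new entry should follow the same pattern as the existing rules. For example, to block a hypothetical crawler named `ExampleBot` (the name here is a placeholder, not a real bot):

```
User-agent: ExampleBot
Disallow: /
```

When adding an entry, use the bot's exact user-agent token as published by its operator, since `robots.txt` matching is based on that token.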
## 📧 Need an Account?
If you would like to contribute but don't have an account on git.cyberwa.re, please contact @revengeday@corteximplant.com on the Fediverse to request an account.