# 📜 List of AI Web-Crawlers

Welcome to the repository dedicated to maintaining a list of AI web-crawlers. It aims to help website owners manage and block specific crawlers using the `robots.txt` file, and is useful for anyone who wants to control which AI-driven bots can access their website.

## 🚫 robots.txt

The `robots.txt` file in this repository serves as a guideline for web crawlers: it explicitly disallows access for a list of known AI-driven bots. Compliance with `robots.txt` is voluntary, so it only affects crawlers that choose to honor it.

## ⛔ Blocked AI Web-Crawlers

The `robots.txt` file blocks a comprehensive list of AI web-crawlers, such as:

- Google crawlers (including AdsBot and Google-Extended)
- Applebot
- FacebookBot
- Amazonbot
- OpenAI bots (GPTBot and ChatGPT-User)
- Anthropic Claude bots (ClaudeBot and Claude-Web)
- PerplexityBot
- Anthropic's general bot
- Cohere's bot
- Diffbot
- img2dataset crawler
- Various other crawlers (e.g., FriendlyCrawler and ByteDance's Bytespider)
- CCBot (Common Crawl)
- Omgili crawlers
- Peer39 crawlers
- Russian state-sponsored crawlers (e.g., Awakari)
- YouBot

For the exact rules and bot names, refer to the `robots.txt` file in this repository.

## 🛠 Contributing

Contributions that expand and improve this list are welcome. If you know of an AI web-crawler that should be added, follow these steps:

1. **Fork** the repository.
2. **Create** a new branch.
3. **Add** your entry to the `robots.txt` file.
4. **Open** a merge request.

## 📧 Need an Account?

If you would like to contribute but don't have an account on [git.cyberwa.re](https://git.cyberwa.re), contact [@revengeday@corteximplant.com](https://corteximplant.com/@revengeday) on the Fediverse to request one.
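## 🧪 Checking an Entry

Before opening a merge request, it can help to confirm that a new entry really disallows the intended user agent. The sketch below is a minimal example, assuming Python 3 and its standard-library `urllib.robotparser` module. `SAMPLE_ROBOTS_TXT` is a hypothetical excerpt, not the real file, and `is_blocked` is a hypothetical helper name. Python's parser is simpler than RFC 9309: it matches user agents by substring and applies the first matching rule rather than the most specific one. Treat it as a quick sanity check, not as a statement of how every crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of robots.txt rules; the real file in this
# repository lists many more user agents.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `user_agent` is not allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Agents without their own group fall back to the "*" group (Allow: /).
    for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
        print(agent, "blocked" if is_blocked(SAMPLE_ROBOTS_TXT, agent) else "allowed")
```

Running the script prints `GPTBot blocked`, `ClaudeBot blocked` and `SomeOtherBot allowed`. To check the complete list, replace `SAMPLE_ROBOTS_TXT` with the contents of this repository's `robots.txt`.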