# 📜 List of AI Web-Crawlers

Welcome to the repository dedicated to maintaining a list of AI web-crawlers, intended to help website owners manage and block specific crawlers through the `robots.txt` file. This resource is useful for anyone who wants to control which AI-driven bots can access their website.

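## 🚫 robots.txt

The `robots.txt` file in this repository gives web crawlers instructions about what they may fetch, and explicitly disallows a list of known AI-driven bots. These rules are advisory: crawlers that follow the Robots Exclusion Protocol honor them, but the file cannot technically stop a bot that chooses to ignore it.
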
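To illustrate how such rules apply, the sketch below feeds a small, hypothetical `robots.txt` (not this repository's file) to Python's standard-library `urllib.robotparser` and asks whether a given crawler may fetch a URL. The agent names, the example URL, and the `is_allowed` helper are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the same spirit as this repository's file:
# two AI crawlers are disallowed site-wide, every other agent is allowed.
# The repository's own robots.txt is the authoritative list of agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "SomeSearchBot"):
        print(agent, is_allowed(agent, "https://example.com/blog/post"))
```

The check uses the crawler's product token (for example `GPTBot`), not the full User-Agent header; agents that match no specific group fall through to the `User-agent: *` group.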
## ⛔ Blocked AI Web-Crawlers

The `robots.txt` file blocks a comprehensive list of AI web-crawlers, such as:

- Google bots (including AdsBot and Google-Extended)
- Applebot
- FacebookBot
- Amazonbot
- OpenAI bots (GPTBot and ChatGPT-User)
- Anthropic Claude bots (ClaudeBot and Claude-Web)
- PerplexityBot
- Anthropic's general bot
- Cohere's bot
- Diffbot
- img2dataset crawler
- Other crawlers such as FriendlyCrawler and Bytespider
- CCBot (Common Crawl)
- Omgili crawlers
- Peer39 crawlers
- Russian state-sponsored crawlers (e.g., Awakari)
- YouBot

For the exact rules and bot names, please refer to the `robots.txt` file in this repository.

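If you maintain your own list, the blocking rules can also be generated rather than written by hand. The sketch below assumes the product tokens in `AI_CRAWLER_TOKENS`, a best-effort mapping of the crawlers named above that may differ from the exact entries in this repository's `robots.txt`; crawlers named only loosely above (for example the Awakari crawler) are left out rather than guessed. `build_robots_txt` is a hypothetical helper.

```python
# Assumed product tokens for the crawlers listed above; check the
# repository's robots.txt for the authoritative names.
AI_CRAWLER_TOKENS = [
    "AdsBot-Google", "Google-Extended", "Applebot", "FacebookBot",
    "Amazonbot", "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
    "PerplexityBot", "anthropic-ai", "cohere-ai", "Diffbot",
    "img2dataset", "FriendlyCrawler", "Bytespider", "CCBot",
    "omgili", "omgilibot", "peer39_crawler", "YouBot",
]


def build_robots_txt(tokens: list[str]) -> str:
    """Build robots.txt text that disallows every path for each token.

    One group is emitted per crawler. Duplicate tokens are skipped
    (compared case-insensitively), keeping the first spelling seen.
    """
    seen: set[str] = set()
    groups: list[str] = []
    for token in tokens:
        key = token.lower()  # user-agent matching is case-insensitive
        if key in seen:
            continue
        seen.add(key)
        groups.append(f"User-agent: {token}\nDisallow: /\n")
    # A blank line between groups keeps the file easy to read and diff.
    return "\n".join(groups)


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLER_TOKENS), end="")
```

Each crawler gets its own group for readability; RFC 9309 also allows several `User-agent` lines to share a single `Disallow: /` rule.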
## 🛠 Contributing

We welcome contributions to enhance and expand this list. If you know of any AI web-crawlers that should be added, follow these steps to contribute:

1. **Fork** the repository.
2. **Create** a new branch.
3. **Add** your entry to the `robots.txt` file.
4. **Open** a merge request.

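Before opening a merge request, it can help to confirm that a new entry does not duplicate an existing one. The following is a hypothetical helper, not part of the repository: it assumes `robots.txt` sits at the repository root and compares `User-agent` values case-insensitively, since RFC 9309 requires case-insensitive matching of product tokens.

```python
from collections import Counter
from pathlib import Path


def duplicate_user_agents(robots_txt: str) -> list[str]:
    """Return User-agent values that appear more than once (case-insensitive)."""
    counts: Counter[str] = Counter()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            counts[value.strip().lower()] += 1
    return sorted(agent for agent, n in counts.items() if n > 1)


if __name__ == "__main__":
    path = Path("robots.txt")  # assumed location: repository root
    if path.is_file():
        dupes = duplicate_user_agents(path.read_text(encoding="utf-8"))
        print("Duplicate User-agent entries:", ", ".join(dupes) if dupes else "none")
    else:
        print(f"{path} not found; run this from the repository root.")
```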
## 📧 Need an Account?

If you would like to contribute but don't have an account on [git.cyberwa.re](https://git.cyberwa.re), please contact [@revengeday@corteximplant.com](https://corteximplant.com/@revengeday) on the Fediverse to request an account.