commit f4fd4aeac6132f9dcab060e860e5ff726c5d87ae
Author: revengeday
Date:   Fri Jul 5 20:57:12 2024 +0000

    Add README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2b55d86
--- /dev/null
+++ b/README.md
@@ -0,0 +1,46 @@
+# 📜 List of AI Web-Crawlers
+
+Welcome to the repository dedicated to maintaining a list of AI web-crawlers, aimed at helping website owners manage and block specific crawlers using the `robots.txt` file. This resource is valuable for anyone who wants to control which AI-driven bots may access their website.
+
+## 🚫 robots.txt
+
+The `robots.txt` file in this repository serves as a guideline for web crawlers, explicitly blocking access for a list of known AI-driven bots.
+
+
+## ⛔ Blocked AI Web-Crawlers
+
+The `robots.txt` file blocks a comprehensive list of AI web-crawlers, such as:
+
+- Google bots (including AdsBot and Google-Extended)
+- Applebot
+- FacebookBot
+- Amazonbot
+- OpenAI bots (GPTBot and ChatGPT-User)
+- Anthropic Claude bots (ClaudeBot and Claude-Web)
+- PerplexityBot
+- Anthropic's general bot
+- Cohere's bot
+- Diffbot
+- img2dataset crawler
+- Various friendly crawlers (e.g., FriendlyCrawler and Bytespider)
+- CCBot
+- Omgili crawlers
+- Peer39 crawlers
+- Russian state-sponsored crawlers (e.g., Awakari)
+- YouBot
+
+For the exact rules and bot names, please refer to the `robots.txt` file in this repository.
+
+## 🛠 Contributing
+
+We welcome contributions to enhance and expand this list. If you know of an AI web-crawler that should be added, follow these steps to contribute:
+
+1. **Fork** the repository.
+2. **Create** a new branch.
+3. **Add** your entry to the `robots.txt` file.
+4. **Open** a merge request.
+
+## 📧 Need an Account?
+
+If you would like to contribute but don't have an account on [git.cyberwa.re](https://git.cyberwa.re), please contact [@revengeday@corteximplant.com](https://corteximplant.com/@revengeday) on the Fediverse to request an account.
+
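For reference, a minimal sketch of the blocking pattern the README describes is shown below. The user-agent tokens are taken from the list in the README; the repository's actual `robots.txt` is the authoritative source and may list different or additional user agents.

```txt
# Illustrative sketch only: disallow a few of the AI crawlers named above.
# See the repository's robots.txt for the complete, authoritative rule set.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Note that `robots.txt` is advisory: well-behaved crawlers honor these directives, but they are not an enforcement mechanism.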