AI tools like Grammarly, ChatGPT, and DALL-E are increasingly part of daily life. These tools require training, which in turn requires vast quantities of data. Many AI developers meet that need with data gathered from the internet. Many people would prefer that their data not be used to train AI models or sold for that purpose.
Because many AI developers have already scraped much of the web, and many service providers make it difficult or impossible to protect your data, it's unlikely that anyone can prevent all training use of that information. However, as you build or expand your Davidson Domains site, you can include a file that directs AI bots not to crawl your site (and collect your data and content). The file is called robots.txt. You can write your own version, but keeping it current requires ongoing research into which crawlers exist and what they call themselves.
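To give a sense of what such a file contains: robots.txt is plain text, and each rule names a crawler (by its user-agent string) and tells it which parts of the site it may not visit. The sketch below is illustrative only; it lists a few well-known AI crawler user agents, while the maintained file described below covers many more.

```
# Illustrative excerpt only; a maintained robots.txt lists many more AI crawlers.

# OpenAI's web crawler
User-agent: GPTBot
Disallow: /

# Common Crawl's bot, whose archives are widely used for AI training
User-agent: CCBot
Disallow: /

# Google's token for opting out of AI training
User-agent: Google-Extended
Disallow: /
```

Here, `Disallow: /` means "do not crawl any page on this site."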
Fortunately, one GitHub user provides and maintains an AI-crawler-blocking robots.txt file. To acquire and install this file, follow these steps:
- Visit the GitHub site.
- Locate and click the Download icon at the upper right.
- When prompted, save the file as robots.txt.
- Open your domain's dashboard, locate the Files section, and click File Manager. It will open in a new tab or window.
- From the gray toolbar, click Upload. In the tab or window that opens, drag and drop your robots.txt file to upload it.
- Close the Upload page and return to the File Manager. Refresh the page and verify that robots.txt appears.
That's it. Your site is now marked off-limits to the AI crawlers listed in the file. You can stay up to date by periodically repeating this process (use the File Manager to delete your old copy of robots.txt before uploading the new one).
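If you want to double-check that the file is publicly visible, you can fetch it from your site. The snippet below is a minimal sketch that uses a placeholder address; swap in your own Davidson Domains address. (Visiting the same address in a browser works just as well.)

```python
from urllib.request import urlopen

# Placeholder address: replace with your own Davidson Domains site.
url = "https://yourdomain.example/robots.txt"

# Fetch robots.txt and print it. If the upload worked, you should see the
# list of blocked AI user agents rather than an error page.
with urlopen(url, timeout=10) as response:
    print(response.read().decode("utf-8"))
```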
Note: The effectiveness of this tactic depends entirely on the developers of the crawler bots respecting robots.txt and following its instructions; compliance is voluntary. So far, most do.