Combating AI Bots

Once again I am requesting that Brave's bots use a proper user agent to identify themselves, or better yet, that Brave publish the IP ranges it crawls from, like Google and Bing do. I fully understand the reasoning behind https://search.brave.com/help/brave-search-crawler, but the internet is a changed place with the advent of AI, and this is no longer a valid approach.

Let me explain.

I manage hosting for a plethora of websites, two of which are very large forums (eevblog.com and forums.realgm.com). These two forums are seeing an ENORMOUS amount of bot traffic originating from IP ranges that belong to data centers such as Microsoft, Google Cloud, OVH, and DigitalOcean. A huge amount of the traffic is coming from GTT.net, Tencent, Amazon, and Bytedance.

Most of these bots fake normal user agents to avoid being filtered out, ignore crawl rate limits, and take smaller sites offline once they decide to start crawling them. As such, we have had to take measures to protect these sites, such as forcing all IPs that originate from data centers to always go through a CAPTCHA challenge.

This has been very successful in solving the bot problem. To ensure that our search rankings are not affected, we use the IP ranges published by search engines like Google, Bing, Yandex, etc., or identify the search engines by IP network ownership (e.g., AppleBot), and let these through.
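A minimal sketch of how such an allow-list check can work, using Python's standard `ipaddress` module. The ranges shown are illustrative examples only; a real deployment would fetch the JSON range files that Google, Bing, etc. publish and refresh them periodically:

```python
import ipaddress

# Illustrative allow-list. In practice these come from the search
# engines' published range files and are refreshed on a schedule.
TRUSTED_CRAWLER_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),   # example Googlebot-style range
    ipaddress.ip_network("157.55.39.0/24"),   # example Bingbot-style range
]

def is_trusted_crawler(ip: str) -> bool:
    """Return True if the client IP falls inside a published crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_CRAWLER_RANGES)

# Requests matching a trusted range bypass the CAPTCHA; everything
# else originating from a data-center range gets challenged.
```

Because Brave publishes no such ranges, its crawler inevitably falls into the "challenge" bucket along with the abusive bots.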

To give an idea of how problematic this is, here are the bandwidth stats for EEVBlog before and after implementing anti-bot measures (note that each large reduction in traffic came as we found and flagged another abusive data-center-owned range):

The problem is that Brave's policy is now impacting search results. EEVBlog is already showing up as a CAPTCHA prompt in your search results.

We have no way to fix this, and with the way AI is going, eventually all websites will end up listed like this in your search engine, making it useless.

You need to re-assess this policy ASAP before your search engine becomes unusable.

Perhaps even evaluate a middle ground: crawl as you do now, but if you get a CAPTCHA response, re-queue the crawl from one of your published IP ranges with a bot that identifies itself as your crawler. You could even cache the fact that a domain requires this, so you don't need to repeat the anonymous attempt on future crawls.
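The proposed fallback could be sketched as follows. This is a hedged illustration only: the fetch helpers, the CAPTCHA check, and the in-memory cache are hypothetical stand-ins, not Brave's actual crawler code (which would persist the cache and inspect real responses):

```python
# Domains known to challenge anonymous crawls; persisted in practice.
DOMAINS_REQUIRING_IDENTIFIED_CRAWL = set()

def looks_like_captcha(body: str) -> bool:
    """Crude check; a real crawler would inspect status codes and markup."""
    return "captcha" in body.lower()

def fetch_anonymous(url: str) -> str:
    """Stub: a crawl from an unpublished IP with a generic user agent."""
    return "<html>CAPTCHA challenge</html>"  # what a protected site returns

def fetch_identified(url: str) -> str:
    """Stub: a crawl from a published IP range, identified as the crawler."""
    return "<html>page content</html>"

def crawl(url: str, domain: str) -> str:
    if domain in DOMAINS_REQUIRING_IDENTIFIED_CRAWL:
        return fetch_identified(url)  # skip the anonymous attempt entirely
    response = fetch_anonymous(url)
    if looks_like_captcha(response):
        # Remember the domain and re-queue with identification.
        DOMAINS_REQUIRING_IDENTIFIED_CRAWL.add(domain)
        return fetch_identified(url)
    return response
```

The cache means the extra round trip only happens the first time a domain challenges the crawler; subsequent crawls go straight to the identified path.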

@gnif

Note to self, in order to be clear on expected search results . . .

Currently using Brave Browser (iOS).

Brave Search (AI assistance and Discussions settings are Disabled) result shows the EEVBlog Cloudflare CAPTCHA link - re OP’s concern:

DuckDuckGo Search (all settings switches are Disabled, except 2 re page breaks) result shows what I imagine EEVBlog and OP wish:

And a year 2023 post about Brave Search Crawling:

This topic was automatically closed after 60 days. New replies are no longer allowed.