What to know

  • Cloudflare alleges Perplexity’s AI bots are bypassing blocks to crawl restricted websites.
  • Cloudflare says Perplexity uses alternate IP addresses and disguises its user agents.
  • Perplexity denies intentionally evading website restrictions or acting deceptively.
  • The dispute highlights growing tensions over AI data collection practices.

Cloudflare, a major internet infrastructure provider, has accused Perplexity, an AI-powered search startup, of using “stealth crawling” tactics to access websites that have explicitly blocked its bots. The controversy centers on how Perplexity’s AI gathers data from the web, raising fresh questions about transparency and consent in the age of AI-driven search.

According to Cloudflare, Perplexity’s bots have been accessing websites that have set rules to block them. Typically, websites can tell bots not to crawl their content through a file called robots.txt, or they can block requests from known bot IP addresses. However, Cloudflare claims Perplexity has been getting around these restrictions by using a variety of cloud hosting providers and rotating IP addresses, making it difficult for sites to identify and block its crawlers.
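To illustrate the robots.txt mechanism, here is a minimal Python sketch using the standard library's urllib.robotparser module. The bot name ExampleAIBot, the example.com URL, and the rules themselves are hypothetical and do not come from Cloudflare's report. The sketch shows the check a well-behaved crawler would run before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is disallowed everywhere,
# every other user agent is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", page))  # False
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", page))   # True
```

The check runs on the crawler's side. A robots.txt file states a site's wishes but does not enforce them, which is why sites also fall back on blocking IP addresses.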

Cloudflare’s report says that Perplexity’s bots do not always identify themselves honestly. Instead of sending a user agent string that clearly identifies the crawler as Perplexity’s, the bots sometimes present themselves as regular browsers or other services. This practice, known as “user agent spoofing,” can make it nearly impossible for website administrators to know when Perplexity is crawling their sites.
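A simplified sketch of the kind of server-side filter a site might use shows why spoofing and IP rotation matter. This is an illustration only, not Cloudflare's detection method; the bot token exampleaibot and the address ranges (reserved documentation ranges) are hypothetical.

```python
import ipaddress

# Hypothetical block rules: a user agent token and a known crawler network.
BLOCKED_AGENT_TOKENS = ("exampleaibot",)
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


def should_block(user_agent: str, remote_ip: str) -> bool:
    """Block a request if its user agent or source address matches a rule."""
    agent = user_agent.lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # Declared bot: caught by the user agent rule.
    print(should_block("ExampleAIBot/1.0", "198.51.100.7"))  # True
    # Browser-like user agent from a listed address: caught by the IP rule.
    print(should_block("Mozilla/5.0", "192.0.2.10"))         # True
    # Browser-like user agent from an unlisted address: not caught.
    print(should_block("Mozilla/5.0", "198.51.100.7"))       # False
```

The last case is the gap Cloudflare describes: a crawler that sends a browser-like user agent from an address outside the known ranges matches neither rule.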

Perplexity responded to the allegations by denying any intentional wrongdoing. The company said it is not deliberately trying to evade website restrictions or mislead anyone about its crawling activities. Perplexity maintains that it respects robots.txt and other standard web protocols, and that any issues are unintentional or stem from third-party infrastructure providers.

This dispute comes at a time when AI companies are under increasing scrutiny for how they collect and use data from the open web. Many publishers and website owners are concerned about AI bots scraping their content without permission, especially as AI-powered tools become more prevalent and influential. Cloudflare’s claims add fuel to the ongoing debate about the responsibilities of AI companies and the rights of website owners.

For now, Cloudflare says it will continue to monitor Perplexity’s activities and work with website owners to help them block unwanted crawlers. The company also encourages transparency from all AI firms about how they collect data and interact with the web.