WooCommerce sites are prone to aggressive crawling, particularly of query strings, add-to-cart and add-to-wishlist links.
In this post and video we share a robots.txt template you can use on your own site. It will help reduce load on your server, improve site speed and improve your SEO through better crawling from Google.
Click this link to get the sample robots.txt file in the video: https://www.wpspeedfix.com/woocommerce-robots.txt
It’s also included inline below so you can copy and paste it.
Click play on the video to learn more – notes on the various recommendations in the video are below.
Head over to our FREE Site Audit page and provide some detail on where you’re stuck or what you’re looking to achieve, and one of the team will review your site and tell you how we can help.
User-agent: *
#Added to slow down aggressive crawlers from causing a denial of service attack
Crawl-delay: 5
# Block crawling of login pages
Disallow: /wp-admin/
Disallow: /*wp-login.php*
Disallow: /my-account/*
# Block search pages
Disallow: *s=*
Disallow: */search/*
# Block add to cart and wishlist links, some themes link directly to these and can cause high CPU usage
Disallow: *add-to-cart*
Disallow: *add_to_wishlist*
# Block common query strings, you may want to block other filter strings if your theme has a sidebar filter
Disallow: *currency=*
Disallow: /*?orderby*
# Block the WordPress feed URLs
Disallow: */feed/*
# Block Plesk and Cpanel smart update test site crawling
Disallow: *wp-toolkit*
# We need this Allow rule because we blocked wp-admin earlier
Allow: /wp-admin/admin-ajax.php
# Change this to your sitemap link
Sitemap: https://www.yourdomain.com/sitemap_index.xml
# Block aggressive SEO tools
# See more at https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/blob/master/robots.txt/robots.txt
User-agent: AhrefsBot
Disallow: /
User-agent: Semrush
Disallow: /
User-agent: SemrushBot
Disallow: /
Implement a Crawl Delay
Aggressive crawlers can chew up your server’s CPU resources. Adding a Crawl-delay directive tells bots to wait a certain amount of time between requests. For example, Crawl-delay: 5 means bots may only crawl once every 5 seconds. Not all crawlers honour the crawl delay (Google, for example, ignores it), but that doesn’t mean it’s completely useless.
Note: For very large sites (thousands of products), a high crawl delay might prevent full indexing, so adjust accordingly.
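To sanity-check the directive, Python’s standard-library robots.txt parser can read the delay back. A minimal sketch (the rules and user-agent string here are illustrative, not a full copy of the template):

```python
from urllib.robotparser import RobotFileParser

# Parse a small robots.txt snippet and read the crawl delay back
# for the wildcard user-agent.
robots_txt = """\
User-agent: *
Crawl-delay: 5
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.crawl_delay("*"))  # 5 (seconds between requests)
```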
Block Login and Account Pages
There is no reason for search engines to crawl standard WordPress login areas or customer account pages. You should specifically disallow:
/wp-admin/
/wp-login.php
/my-account/
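A well-behaved crawler evaluates these rules roughly as sketched below, again using Python’s standard-library parser. Note one assumption: the stdlib parser does plain prefix matching only, so the wildcard form /*wp-login.php* from the template is written here as a simple /wp-login.php prefix, and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Simplified login/account rules; the stdlib parser treats each
# Disallow value as a path prefix.
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /wp-login.php",
    "Disallow: /my-account/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/"))           # False
print(rp.can_fetch("Googlebot", "https://example.com/my-account/orders/"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/shop/"))               # True
```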
Disallow Internal Search Results
Allowing Google to index your internal search results is often a vector for negative SEO attacks. It is best practice to block crawlers from accessing these dynamic search pages to prevent them from being indexed.
Stop “Add to Cart” & “Add to Wishlist” Crawling
A major performance killer on WooCommerce sites occurs when bots crawl “Add to Cart” or “Add to Wishlist” links. This forces the server to process these actions as if real customers were performing them 3 to 5 times a second, causing massive load spikes. These query strings should be disallowed.
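The add-to-cart rules in the template rely on wildcard matching, where * matches any run of characters and $ anchors the end of the path (the semantics Google documents for robots.txt). The helper below is a hypothetical illustration of how such a rule is tested against URL paths; the same logic applies to the query-string rules like /*?orderby*:

```python
import re

# Hypothetical helper: Google-style robots.txt pattern matching.
# "*" matches any run of characters; "$" anchors the end of the path.
def robots_rule_matches(pattern: str, path: str) -> bool:
    regex = "".join(
        ".*" if ch == "*" else "$" if ch == "$" else re.escape(ch)
        for ch in pattern
    )
    # Rules match from the start of the path.
    return re.match(regex, path) is not None

print(robots_rule_matches("*add-to-cart*", "/shop/widget/?add-to-cart=42"))  # True
print(robots_rule_matches("*add_to_wishlist*", "/?add_to_wishlist=7"))       # True
print(robots_rule_matches("/*?orderby*", "/shop/?orderby=price"))            # True
print(robots_rule_matches("*add-to-cart*", "/shop/widget/"))                 # False
```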
Filter Out Dynamic Query Strings
WooCommerce uses many filters, such as ?orderby= or currency switchers. Generally, you do not want these crawled as they create duplicate content issues and waste crawl budget.
Caution: Ensure your site structure doesn’t rely on these filters for standard navigation before blocking them.
Essential Inclusions: Admin-Ajax and Sitemaps
While you want to block most of /wp-admin/, you must allow /wp-admin/admin-ajax.php. WooCommerce relies heavily on Ajax for functionality like product variations, and blocking it can break your site’s features for both users and bots.
Always include a direct link to your sitemap_index.xml at the bottom of your robots file so crawlers can easily find your preferred URLs.
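The reason the Allow line works is that Google resolves conflicts between Allow and Disallow by the most specific (longest) matching rule. A small sketch of that tie-breaking, with the two rules from the template (simplified to prefix matching):

```python
# Google-style longest-match tie-breaking: the most specific (longest)
# matching rule wins, so the Allow for admin-ajax.php overrides the
# broader /wp-admin/ disallow.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

def is_allowed(path: str) -> bool:
    verdict, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in RULES:
        if path.startswith(prefix) and len(prefix) > best_len:
            verdict, best_len = kind, len(prefix)
    return verdict == "allow"

print(is_allowed("/wp-admin/admin-ajax.php"))  # True
print(is_allowed("/wp-admin/options.php"))     # False
print(is_allowed("/shop/"))                    # True
```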
Block Aggressive SEO Tools
Tools like Ahrefs and Semrush can be very aggressive when crawling. If you don’t need their data, blocking them can save server resources and keep your site’s competitive data private.
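The per-bot stanzas at the bottom of the template send these tools away entirely while leaving other crawlers unaffected. A quick sketch of that behaviour with the stdlib parser (example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Per-bot stanzas: blocked bots get "Disallow: /", everyone else is
# unaffected because no other stanza applies to them.
rules = [
    "User-agent: AhrefsBot",
    "Disallow: /",
    "",
    "User-agent: SemrushBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("AhrefsBot", "https://example.com/shop/"))   # False
print(rp.can_fetch("SemrushBot", "https://example.com/"))       # False
print(rp.can_fetch("Googlebot", "https://example.com/shop/"))   # True
```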
Don’t Use Search Queries or Filters In Your Menu Structure
It’s not mentioned in the video, but we regularly see sites that have built their menu system in WooCommerce partly using URLs that are search strings. In some cases, menus link to filters directly. This is generally a bad idea for both speed and SEO, because these pages are not cached by default and they’re not real pages, so they don’t count from an SEO perspective.
Setting up caching for these query strings can help in some instances but you’re better off using real pages instead of query strings. How to achieve this really depends on the number of products, types of products and how you’re using the filters and searches. Broadly speaking, product categories and product tags are generally the best way to create various groupings of products.
Use Cloudflare Firewall Rules To Protect From Malicious Crawlers
While genuine crawlers like Googlebot honour the robots.txt file, many crawlers do not and will continue to aggressively crawl add-to-cart URLs and other URLs they shouldn’t.
For this reason we also use Cloudflare Firewall rules to filter this traffic. If you check this post on Cloudflare Firewall Rules for WordPress it’ll walk you through some of the most common rules we use for WordPress and WooCommerce sites.