“Indexed, though blocked by robots.txt” tells you that Google has indexed URLs that you blocked them from crawling using the robots.txt file on your website. In most cases, this will be a straightforward issue where you blocked crawling in your robots.txt file. But there are a few additional conditions that can trigger the problem, so let’s go through the following troubleshooting process to diagnose and fix things as efficiently as possible.

You can see that the first step is to ask yourself whether you want Google to index the URL.

If you don’t want the URL indexed, just add a noindex meta robots tag and make sure to allow crawling, assuming the page is canonical. If you block a page from being crawled, Google may still index it, because crawling and indexing are two different things. Unless Google can crawl a page, they won’t see the noindex meta tag and may still index it because it has links. If the URL canonicalizes to another page, don’t add a noindex meta robots tag. Just make sure proper canonicalization signals are in place, including a canonical tag on the canonical page, and allow crawling so signals pass and consolidate correctly.

If you do want the URL indexed, you need to figure out why Google can’t crawl it and remove the block. The most likely cause is a crawl block in robots.txt, but there are a few other scenarios where you may see messages saying that you’re blocked. Let’s go through these in the order you should probably be looking for them.

The easiest way to see the issue is with the robots.txt tester in GSC, which will flag the blocking rule. If you know what you’re looking for or you don’t have access to GSC, you can navigate to /robots.txt to find the file. We have more information in our robots.txt article, but you’re likely looking for a disallow statement like: Disallow: /

There may be a specific user-agent mentioned, or it may block everyone. If your site is new or has recently launched, you may want to look for: User-agent: *

It’s possible that someone already fixed the robots.txt block and resolved the issue before you started looking into it. However, if the problem appears to be resolved but reappears shortly after, you may have an intermittent block.

You’ll want to remove the disallow statement causing the block. How you do this varies depending on the technology you’re using.

If the issue impacts your entire website, the most likely cause is that you checked a setting in WordPress to disallow indexing. This mistake is common on new websites and following website migrations. Make sure ‘Search Engine Visibility’ is unchecked.

If you’re using the Yoast SEO plugin, you can directly edit the robots.txt file to remove the blocking statement. Similar to Yoast, Rank Math allows you to edit the robots.txt file directly.

If you have FTP access to the site, you can directly edit the robots.txt file to remove the disallow statement causing the issue. Your hosting provider may also give you access to a File Manager that allows you to access the robots.txt file directly.

Intermittent issues can be more difficult to troubleshoot because the conditions causing the block may not always be present. What I’d recommend is checking the history of your robots.txt file. For instance, in the GSC robots.txt tester, if you click the dropdown, you’ll see past versions of the file that you can click and see what they contained. The Wayback Machine also has a history of the robots.txt files for the websites they crawl. You can click on any of the dates they have data for and see what the file included on that particular day.
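To make the disallow statements above concrete, here is a sketch of the two patterns you’re likely to find (the `/private/` path is a hypothetical example; the two groups represent two alternative files, not one):

```text
# Pattern 1 — blanket block: every crawler is disallowed from the entire site.
# This is the version often left over from a staging site or a new launch.
User-agent: *
Disallow: /

# Pattern 2 — targeted block: all crawlers are disallowed from one directory only.
User-agent: *
Disallow: /private/
```

Either pattern can produce the “Indexed, though blocked by robots.txt” message; the blanket block is the one to look for first on a new or recently migrated site.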
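If you’d rather confirm a block programmatically than eyeball the file, Python’s standard-library `urllib.robotparser` can evaluate a robots.txt against a given user-agent and URL. This is a minimal sketch; the robots.txt contents, domain, and paths are hypothetical examples, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A URL under the disallowed directory is blocked...
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False

# ...while a URL outside it is crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))  # True
```

In practice you could feed the parser your live file with `parser.set_url("https://yourdomain.com/robots.txt")` followed by `parser.read()`, then test the specific URLs GSC is flagging.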