Why Broken Link Checker Reports Working Links as Broken (And How to Fix It)
One of the most common issues users encounter with the Broken Link Checker plugin is the frustrating experience of it flagging perfectly functional links as broken. This often manifests as links returning HTTP status codes like 403 Forbidden, 401 Unauthorized, 500 Internal Server Error, or simply timing out. If you're seeing a sudden influx of these false positives, you're not alone. This guide explains the common reasons behind these false reports and the steps you can take to resolve them.
Why Does This Happen?
Broken Link Checker is a powerful tool, but it relies on successfully connecting to and receiving a valid response from a remote server. Several factors can interfere with this process, causing a working link to appear broken from the scanner's perspective.
- Server Security & Firewalls: Many modern websites employ security measures like Cloudflare's Bot Fight Mode, Wordfence, or other firewall rules. These systems are designed to block automated traffic, which can include the Broken Link Checker's scanning requests. A 403 Forbidden or 401 Unauthorized error is a classic sign of being blocked by a security rule.
- Aggressive Scanning: The local version of the plugin sends its requests from your server's IP address. If it sends too many requests to the same remote site in a short period, that site can temporarily blacklist the IP, leading to timeouts or 5xx errors.
- Query Parameter Stripping: The cloud scanner (BLC 2.0+) removes query parameters from URLs by default to improve performance. However, this breaks links that rely on those parameters to function correctly, such as YouTube playlists or Google Calendar links, resulting in 404 errors.
- Server Configuration: Proxies, specific server settings, or even the use of self-signed SSL certificates on a development site can prevent the scanner from successfully completing a connection.
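As a rough aid for triage, the causes listed above can be mapped to the status codes they tend to produce. The sketch below is only a heuristic: the mapping, the function names, and the `(url, status)` input shape are assumptions made for illustration, not anything the plugin defines or exports.

```python
from collections import defaultdict

# Hypothetical mapping from reported status codes to a likely cause.
# These are triage heuristics, not behaviour defined by the plugin.
LIKELY_CAUSE = {
    401: "blocked by a security rule or login wall",
    403: "blocked by a firewall or bot protection",
    404: "possibly a stripped query parameter (cloud scanner)",
}


def likely_cause(status):
    """Return a best-guess explanation for a reported status.

    `status` is an int HTTP status code, or None for a timeout.
    """
    if status is None:
        return "timeout: possible rate limiting or IP block"
    if status in LIKELY_CAUSE:
        return LIKELY_CAUSE[status]
    if 500 <= status <= 599:
        return "server error: possible rate limiting or IP block"
    return "no common false-positive pattern; check the link manually"


def group_by_cause(reports):
    """Group (url, status) pairs by their likely cause."""
    groups = defaultdict(list)
    for url, status in reports:
        groups[likely_cause(status)].append(url)
    return dict(groups)
```

Grouping flagged links this way makes it easier to see whether one cause, such as bot protection, accounts for most of the reports.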
Common Solutions to Try
1. Address Security & Firewall Conflicts
If you are seeing many 403 Forbidden errors, first check whether the remote site sits behind a service like Cloudflare. The Broken Link Checker team has been asked to submit their details to Cloudflare as a "good bot," but until that happens there is little an individual user can do about outgoing checks to other sites. If the block comes from your own site's firewall (e.g., Wordfence), you may need to whitelist the Broken Link Checker cloud scanner's IP addresses and user agent so that it is not blocked.
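One way to check whether a 403 came from Cloudflare is to inspect the response headers, since responses served through Cloudflare typically carry a `Server: cloudflare` header and a `CF-RAY` header. The helper below is a minimal sketch under that assumption; the function name is hypothetical, and a match only suggests that Cloudflare answered the request, not which rule blocked it.

```python
def looks_like_cloudflare_block(status, headers):
    """Guess whether a 403 response was served by Cloudflare.

    `headers` is a mapping of response header names to values.
    Header names are compared case-insensitively.
    """
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    behind_cloudflare = (
        normalized.get("server", "") == "cloudflare" or "cf-ray" in normalized
    )
    return behind_cloudflare and status == 403
```

For example, `looks_like_cloudflare_block(403, {"Server": "cloudflare"})` returns `True`, while the same status from a response with `Server: nginx` and no `CF-RAY` header returns `False`.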
2. Recheck and Dismiss Links
For a small number of false positives, you can manually recheck and dismiss them. In the cloud dashboard, click on the individual link and use the "Ignore" or "Dismiss" option. For a large number of links, this process can be tedious, and a bulk action option is a frequently requested feature from users.
3. Understand Cloud vs. Local Scanner Behavior
The cloud-based scanner in BLC 2.0 operates differently from the old local version. It scans all links it finds on the front end of your site and strips query parameters by default. Be aware that this can lead to more false positives for certain types of links. Manually ignoring these links is currently the primary workaround.
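To see why stripping query parameters breaks some links, it helps to reproduce the effect on a sample URL. The sketch below illustrates the outcome only and is not the scanner's actual code; the function names and the `EXAMPLE_ID` playlist value are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return `url` without its query string (the fragment is kept)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


def depends_on_query(url):
    """Return True if stripping the query string would change the URL."""
    return strip_query(url) != url


# EXAMPLE_ID is a placeholder, not a real playlist identifier.
original = "https://www.youtube.com/playlist?list=EXAMPLE_ID"
print(strip_query(original))  # https://www.youtube.com/playlist
```

Without the `list` parameter the URL no longer identifies a playlist, so the remote server cannot return the page the original link pointed to. Running `depends_on_query` over your flagged links is a quick way to find candidates for the ignore list.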
4. Check for IP Blocking (Local Scanner)
If you are using the local version of the plugin and suddenly see a high number of timeouts or 403 errors on links that were previously fine, your server's IP address may have been banned by the remote websites. Your own access logs only record incoming requests, so the plugin's outgoing checks will not appear there. Instead, compare how a flagged URL responds when requested from the server with how it responds in your browser. The solution often involves waiting for the ban to be lifted or, in some complex cases, configuring the plugin to use a proxy server.
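A quick way to test for an IP ban is to request a flagged URL once from the server itself and compare the result with what you see in a browser on another network. The script below is a sketch using only Python's standard library; the user agent string and function names are assumptions chosen for illustration, and it does not reproduce how the plugin performs its checks.

```python
import urllib.error
import urllib.request

# Hypothetical user agent string; substitute one appropriate for your checks.
USER_AGENT = "Mozilla/5.0 (compatible; manual-link-recheck)"


def build_request(url):
    """Build a GET request for `url` that sends USER_AGENT."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def recheck(url, timeout=10):
    """Request `url` once and return (status, note).

    `status` is the HTTP status code, or None if no response was received.
    """
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.status, "ok"
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403 or 500.
        return error.code, str(error.reason)
    except OSError as error:
        # No HTTP response at all: DNS failure, refused connection, or timeout.
        return None, str(error)
```

If `recheck` returns a 403 or `None` from the server while the same URL loads normally elsewhere, a block on the server's IP address is a plausible explanation.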
When to Seek Further Help
If you have tried the steps above and are still experiencing a high volume of inaccurate broken link reports, the issue may be more complex. The community on the Broken Link Checker support forums can be a valuable resource. When asking for help, be prepared to provide specific examples of the URLs being flagged and the error codes they are returning.
Remember, while false positives are annoying, they are often a sign of robust security on the web. A combination of understanding the scanner's limitations and proactively managing your ignore list is key to maintaining an efficient workflow with Broken Link Checker.
Related Support Threads
- Large number of timeouts being reported (https://wordpress.org/support/topic/large-number-of-timeouts-being-reported/)
- BLC is auditing subdomain of main domain (https://wordpress.org/support/topic/blc-is-auditing-subdomain-of-main-domain/)
- Scan initiation failed (https://wordpress.org/support/topic/scan-initiation-failed/)
- False broken link (https://wordpress.org/support/topic/false-broken-link-3/)
- How do we tell when indexing was last run? (https://wordpress.org/support/topic/how-do-we-tell-when-indexing-was-last-run/)
- BLC used for phishing (https://wordpress.org/support/topic/blc-used-for-phishing/)
- How do I find out what 500 errors are about? (https://wordpress.org/support/topic/how-do-i-find-out-what-500-errors-are-about/)
- BLC Operations (https://wordpress.org/support/topic/blc-operations/)
- Nuclear Option? (https://wordpress.org/support/topic/nuclear-option/)
- BLC ignores /etc/hosts (https://wordpress.org/support/topic/blc-ignores-etc-hosts/)
- Look for links in option for BLC 2.0 (https://wordpress.org/support/topic/look-for-links-in-option-for-blc-2-0/)
- BLC 2.0 100% free but $250? (https://wordpress.org/support/topic/blc-2-0-100-free-but-250/)
- BLC dauert zu lange [BLC takes too long] (https://wordpress.org/support/topic/blc-dauert-zu-lange/)
- Only detecting 3 links (https://wordpress.org/support/topic/only-detecting-3-links/)
- BLC plugin is set to Local Engine and can not perform any Cloud Engine action. (https://wordpress.org/support/topic/blc-plugin-is-set-to-local-engine-and-can-not-perform-any-cloud-engine-action/)
- Could BLC be behind the IP spamming our server ? (https://wordpress.org/support/topic/could-blc-be-behind-the-ip-spamming-our-server/)
- BLC keeps reporting links that are not broken (https://wordpress.org/support/topic/blc-keeps-reporting-links-that-are-not-broken/)
- Setting a proxy user and password (https://wordpress.org/support/topic/setting-a-proxy-user-and-password/)
- User Access (https://wordpress.org/support/topic/user-access-22/)
- BLC identifying deleted links as broken links (https://wordpress.org/support/topic/blc-identifying-deleted-links-as-broken-links/)
- BLC reporting working links as broken (https://wordpress.org/support/topic/blc-reporting-working-links-as-broken/)
- BLC works for main domain but not for subdomain (https://wordpress.org/support/topic/blc-works-for-main-domain-but-not-for-subdomain/)
- BLC 2.0 False Positives on common services with manadory query parameters (https://wordpress.org/support/topic/blc-2-0-false-positives-on-common-services-with-manadory-query-parameters/)
- Another instance of BLC is already working (https://wordpress.org/support/topic/another-instance-of-blc-is-already-working/)
- BLC 2.0 Fails to initialize on HTTPS sites using a self-signed certificate (https://wordpress.org/support/topic/blc-2-0-fails-to-initialize-on-https-sites-using-a-self-signed-certificate/)
- BLC 2.0 false positives when issued a CloudFlare Challenge page during scan (https://wordpress.org/support/topic/blc-2-0-false-positives-when-issued-a-cloudflare-challenge-page-during-scan/)
- A couple of questions having moved to BLC Cloud (https://wordpress.org/support/topic/a-couple-of-questions-having-moved-to-blc-cloud/)
- CloudFlare Bot Fight Mode Conflict (https://wordpress.org/support/topic/cloudflare-bot-fight-mode-conflict/)
- Cannot remove cookie set by BLC (https://wordpress.org/support/topic/cannot-remove-cookie-set-by-blc/)
- Large log file (https://wordpress.org/support/topic/large-log-file-3/)