Links added by editors to the English Wikipedia mainspace are automatically saved to the Wayback Machine within about 24 hours (in practice, not every link gets saved, for various reasons). This is done by a program called "NoMore404", which the Internet Archive runs and maintains; other language Wikipedias are included. It monitors the EventStreams API, extracts newly added external URLs, and saves a snapshot to the Wayback Machine. This system became active sometime after 2015, though earlier efforts were also made. In addition, sometime after 2012, archive.today (also known as archive.is) attempted to archive all external links then existing on Wikipedia. The attempt was incomplete, but a significant number of links were added to archive.today during this period, making it a major archival source that fills gaps in coverage. Archive.today is still making some automated archives as of 2020, though the extent and frequency of its coverage are unknown.
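A minimal sketch of that pipeline is below: subscribe to the public Wikimedia EventStreams feed and ask the Wayback Machine to snapshot a URL via its Save Page Now endpoint. This is not NoMore404's actual code; how new external URLs are extracted from each edit is glossed over, and error handling is omitted.

```python
# Sketch only: follow Wikimedia's public recent-changes stream and request a
# Wayback Machine snapshot of a URL via Save Page Now. Assumes the real
# extraction of newly added external URLs happens elsewhere.
import json
import requests

STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"
SAVE = "https://web.archive.org/save/"

def save_to_wayback(url: str) -> None:
    """Ask the Wayback Machine to take a snapshot of `url`."""
    requests.get(SAVE + url, timeout=60)

def watch_recent_changes() -> None:
    """Follow the server-sent-events stream of edits on Wikimedia wikis."""
    with requests.get(STREAM, stream=True, timeout=60) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            change = json.loads(line[len("data: "):])
            if change.get("wiki") == "enwiki" and change.get("type") == "edit":
                # A real system would diff the revision to find newly added
                # external URLs and pass each one to save_to_wayback();
                # here we only show where that hook would go.
                print("edit on", change.get("title"))

if __name__ == "__main__":
    watch_recent_changes()
```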
As of 2015, there is a Wikipedia bot and tool called WP:IABOT that automates fixing link rot. It runs continuously, checking every article on Wikipedia for dead links, adding archives to the Wayback Machine (if they are not yet there), and replacing dead links in the wikitext with an archived version. The bot runs automatically, but it can also be directed by end users through its web interface, which is available when viewing any page's history: near the top of the page, on the "External tools" line, choose the "Fix dead links" option.
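As a hedged illustration of that workflow (not IABot's real implementation), the sketch below checks whether a URL is dead, asks the public Wayback Machine availability API for the closest snapshot, and naively swaps the dead URL for its archive URL in a piece of wikitext. Real edits use citation template parameters rather than plain string replacement.

```python
# Simplified IABot-style workflow: detect a dead link, look up a Wayback
# snapshot, and substitute it in the wikitext. Illustrative only.
import requests

def is_dead(url: str) -> bool:
    """Treat connection failures and 4xx/5xx responses as a dead link."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=30)
        return resp.status_code >= 400
    except requests.RequestException:
        return True

def wayback_snapshot(url: str) -> str | None:
    """Return the closest archived copy of `url`, if the Wayback Machine has one."""
    api = "https://archive.org/wayback/available"
    data = requests.get(api, params={"url": url}, timeout=30).json()
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

def fix_citation(wikitext: str, url: str) -> str:
    """Naive stand-in for the bot's edit: swap a dead URL for its archive URL."""
    if is_dead(url):
        archive = wayback_snapshot(url)
        if archive:
            return wikitext.replace(url, archive)
    return wikitext
```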
As of 2015, the periodic bot WP:WAYBACKMEDIC checks for link rot in the archive links themselves. Archive databases are dynamic: archives move or go missing, new ones are added, and so on. This bot maintains existing archive links on the English Wikipedia. It also archives resources on request at WP:URLREQ. It is a flexible tool that can carry out many custom jobs, such as URL migrations and moves, usurped domains, and soft-404 discovery and repair.
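The kind of check such a bot might make on an existing archive link can be sketched as: does the link still resolve, and does a "200 OK" response actually look like an error page (a soft 404)? The phrase list and size threshold below are illustrative assumptions, not WaybackMedic's real heuristics.

```python
# Rough soft-404 check for an existing archive link. Heuristics are assumptions.
import requests

ERROR_PHRASES = ("page not found", "got an http 404", "page cannot be crawled")

def check_archive_link(archive_url: str) -> str:
    """Classify an archive URL as 'ok', 'dead', or 'soft404'."""
    try:
        resp = requests.get(archive_url, timeout=60)
    except requests.RequestException:
        return "dead"
    if resp.status_code >= 400:
        return "dead"
    body = resp.text.lower()
    # A tiny page or one containing a known error phrase is likely a soft 404.
    if len(body) < 512 or any(p in body for p in ERROR_PHRASES):
        return "soft404"
    return "ok"
```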
Check for archived versions at one of the many web archive services. The "Big 3" archive services are web.archive.org, webcitation.org, and archive.today. Together they account for over 90% of all archives on Wikipedia, with web.archive.org alone making up over 80% of all archive links. Other archive services are listed at WP:WEBARCHIVES.
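One way to check for archived versions programmatically is the Wayback Machine's public CDX API, sketched below; the other services (archive.today, webcitation.org) have to be checked separately. The example URL is a placeholder.

```python
# List recent Wayback Machine captures of a URL via the public CDX API.
import requests

def list_wayback_captures(url: str, limit: int = 5) -> list[dict]:
    """Return up to `limit` captures of `url` from the Wayback CDX API."""
    api = "https://web.archive.org/cdx/search/cdx"
    params = {"url": url, "output": "json", "limit": limit}
    resp = requests.get(api, params=params, timeout=30)
    rows = resp.json() if resp.text.strip() else []
    if not rows:
        return []
    header, *captures = rows  # first row is the column header
    return [dict(zip(header, row)) for row in captures]

# Each capture's 'timestamp' and 'original' fields rebuild a snapshot URL of
# the form https://web.archive.org/web/<timestamp>/<original>.
print(list_wayback_captures("example.com"))
```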
Place {{dead link|date=March 2023}} after the dead citation, immediately before the closing </ref> tag if applicable, leaving the original link intact. Marking dead links signals to editors and to link-rot bots that this link needs to be replaced with an archive link. Placing {{dead link}} also auto-categorizes the article into the "Articles with dead external links" project category, and into a specific monthly date-range category based on the date= parameter. Do not delete a citation just because it has been tagged with {{dead link}} for a long time.
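For instance, in wikitext the tag sits inside the reference, right after the citation (the URL and title below are placeholders):

```wikitext
<ref>{{cite web |url=http://example.com/report |title=Example report}}{{dead link|date=March 2023}}</ref>
```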
This crawl overview pane updates while crawling, so you can see the number of client error (4XX) links you have at a glance. In the instance above, there are 9 client errors, which is 0.18% of the links discovered in the crawl.
This is a useful post for finding broken links within a website, but what about outgoing links that are broken? I could use a free web service, but I wondered if this was possible within Screaming Frog.
Hello, I was wondering if, after getting the broken links list, it is possible to export it together with the Inlinks info. That way you would have in one report both the broken link and where on my page it is located. Thanks
Hi,
I am sorry, I need more details about this one too. When we run a report in crawl mode and go to response codes, does it also scan the img src tags?
I wonder if it is possible to detect broken images inside img src.
Thank you.
Hi Dan,
when I crawl hyp-a.de, I get a list of all external links for the whole website. But the page -a.de/sponsoring/nachwuchsförderung.html contains a broken link, and this link is not discovered and not marked as broken.
I think those are the main points! Obviously the SEO Spider will also report on all response codes (like redirects, whether they are 301s or 302s, server errors, etc.) and plenty of other data at the same time anyway, so the tool does more than just find broken links.
Someone recommended Xenu for checking for broken links, then I remembered this post. I use Screaming Frog almost daily now when quoting clients and diagnosing on-page SEO issues. Thanks again for making such a great tool.
Is it possible to show the text that a broken link is linked to? So far it seems as if the Broken Link Report produces a list of broken links, but does not show the text that the broken links were found in.
I tried many tools to find broken links for my projects, but this one is really great. The interface and functions of this software are great and help fix 404 errors very efficiently. Thank you!
When I use Google Search Console / Webmaster Tools, it reports that I have 138,000 broken URLs on my website. When I did a crawl using ScreamingFrog and filtered for 404 errors, I found fewer than 2,000. What explains this discrepancy?
I think Screaming Frog is one of the best software out there for on-page analysis. Thank you. One question: how does it find broken links pointing to a website (404 errors)? Does Screaming Frog use a third-party service like Ahrefs? Thanks
Hi guys, I use the Screaming Frog tool to find attractive expired domains, which I use to build my PBNs. We can also use found domains to get some strong BLs via 301 redirection. SF seems very useful for searching for expired links on edu or gov websites. Regards, Pawel
May I know if your plugin can do the following?
1) Search for broken links on a website using different IP addresses? The reason is that my site has links that redirect users to different links based on their country, so I would need to check whether the various redirects work when users click on the link.
2) Search hidden pages on my website for broken links.
Great Tutorial. Thank you very much for detailed guidance. It is very helpful. I love the screaming frog tool for finding broken links and doing keyword research on seasonal pages, really speeds up the process. :)
Is this tool helpful when you are checking for broken links in an intranet portal of a software vendor website that requires login credentials every time you want to access the webpages, for example broken links in a partnerworld internal portal?
Great tutorial with very detailed instructions! I have a question: are the 404 links reported by ScreamingFrog the same as the 404 crawl errors reported by Google Search Console? Thank you! :)
Thoughtfully designed tool for getting complete info on broken links. I used this tool for my site and got really amazing results. Now I have no worries about broken links on any site. I haven't found any other single tool with this level of polish. Highly recommended. The comments section is very helpful too, and thanks to the author for addressing each and every comment.
I love your tutorial. We started using Screaming Frog three years ago. It's a no-brainer and one of my favorite tools.
At first it was a little bit confusing, but after a short time of use I now use the different options intuitively. Screaming Frog has a lot of features to optimize and identify technical problems on my clients' websites. The technical analysis is comprehensive but totally necessary. I use a lot of the off-page SEO analysis and the broken links checker. Thank you for your tutorial.
If you want the link checker to exclude checking certain links or domains (like your library catalog or any proxied links), add those links / domains to the Exclusions list! We recommend adding domains for any sites that require a login, as the Link Checker does not have any way to log into the site to verify that the link is valid, so it will always be listed as "broken" in your report (i.e., a false positive).
The link checker is enabled 24/7, checking for broken links every 30 minutes on the hour and half-hour in all regions except the U.S. For customers in the U.S. region, the link checker is enabled between the following hours to optimize system performance:
To optimize performance of the link checker, links are only checked on visited pages in published or private guides while the link checker is enabled (see above section for details). When a user views a page, it will be scanned by the link checker the next time it is scheduled to run. That page will not be scanned again for 30 days following its last scan.
Any regular or admin user can access the link checker report. Although regular users can see every link listed in the report, they can only modify links for the assets they own. Admin users, however, can modify every link because they have the ability to edit any asset in your system.
It can also return false positives if it's checking links that require a password before the page loads. Since the Link Checker cannot log into that site to verify the page, it returns the link as "bad". Or the server it's checking might refuse the connection altogether, as noted in the bullet points above. Please add domains that require logins to the Exclusions list.
Links can be dismissed from the report individually or in bulk. You may dismiss a link from the report after you have edited it (and corrected the broken link), if it was a false positive that you can ignore, or for any other reason that you might not want to see it on the report. :)
If you find that you're adding multiple links from the same site to the exclusions list, you can add the site's domain name to the list instead. This will prevent all URLs from that domain from appearing in future link checker reports.