The robots.txt file of the non-public weblog of Google’s John Mueller turned a spotlight of curiosity when somebody on Reddit claimed that Mueller’s weblog had been hit by the Useful Content material system and subsequently deindexed. The reality turned out to be much less dramatic than that but it surely was nonetheless a little bit bizarre.
From An X (Twitter) Submit To Subreddit
The saga of John Mueller’s robots.txt began when somebody tweeted about it on X (previously generally known as Twitter):
.@JohnMu FYI, your web site received utterly deindexed in Google. It appears Google went loopy 😱
H/T @seb_skowron & @ziptiedev pic.twitter.com/RGq6GodPsG
— Tomek Rudzki (@TomekRudzki) March 13, 2024
Then it received picked up on Reddit by somebody after the tweet who posted that John Mueller’s web site was deindexed, posting that it fell afoul of Google’s algorithm. However as ironic as that will be that was by no means going to be the case as a result of all it took was a number of seconds to get a load of the web site’s robots.txt to see that one thing unusual was occurring.
Right here’s the highest a part of Mueller’s robots.txt which contains a commented Easter egg for these taking a peek.
The primary bit that’s not seen daily is a disallow on the robots.txt. Who makes use of their robots.txt to inform Google to not crawl their robots.txt?
Now we all know.
The subsequent a part of the robots.txt blocks all engines like google from crawling the web site and the robots.txt.
In order that most likely explains why the positioning is deindexed in Google. However it doesn’t clarify why it’s nonetheless listed by Bing.
I requested round and Adam Humphreys, an internet developer and search engine optimisation(LinkedIn profile), steered that it could be that Bingbot hasn’t been round Mueller’s web site as a result of it’s a largely inactive web site.
Adam messaged me his ideas:
“Person-agent: *
Disallow: /topsy/
Disallow: /crets/
Disallow: /hidden/file.htmlIn these examples the folders and that file in that folder wouldn’t be discovered.
He’s saying to disallow the robots file which Bing ignores however Google listens to.
Bing would ignore improperly applied robots as a result of many don’t know find out how to do it. “
Adam additionally steered that perhaps Bing disregarded the robots.txt file altogether.
He defined it to me this manner:
“Sure or it chooses to disregard a directive to not learn an directions file.
Improperly applied robots instructions at Bing are probably ignored. That is essentially the most logical reply for them. It’s a instructions file.”
The robots.txt was final up to date someday between July and November of 2023 so it may very well be that Bingbot hasn’t seen the most recent robots.txt. That is smart as a result of Microsoft’s IndexNow net crawling system prioritizes environment friendly crawling.
One among directories blocked by Mueller’s robots.txt is /nofollow/ (which is a bizarre identify for a folder).
There’s principally nothing on that web page besides some web site navigation and the phrase, Redirector.
I examined to see if the robots.txt was certainly blocking that web page and it was.
Google’s Wealthy Outcomes tester did not crawl the /nofollow/ webpage.
See additionally: 8 Widespread Robots.txt Points And How To Repair Them
John Mueller’s Rationalization
Mueller gave the impression to be amused that a lot consideration was being paid to his robots.txt and he revealed an evidence on LinkedIn of what was occurring.
He wrote:
“However, what’s up with the file? And why is your web site deindexed?
Somebody steered it could be due to the hyperlinks to Google+. It’s potential. And again to the robots.txt… it’s advantageous – I imply, it’s how I need it, and crawlers can take care of it. Or, they need to have the ability to, in the event that they comply with RFC9309.”
Subsequent he stated that the nofollow on the robots.txt was merely to cease it from being listed as an HTML file.
He defined:
“”disallow: /robots.txt” – does this make robots spin in circles? Does this deindex your web site? No.
My robots.txt file simply has a variety of stuff in it, and it’s cleaner if it doesn’t get listed with its content material. This purely blocks the robots.txt file from being crawled for indexing functions.
I may additionally use the x-robots-tag HTTP header with noindex, however this manner I’ve it within the robots.txt file too.”
Mueller additionally stated this concerning the file dimension:
“The scale comes from checks of the varied robots.txt testing instruments that my group & I’ve labored on. The RFC says a crawler ought to parse at the very least 500 kibibytes (bonus likes to the primary one who explains what sort of snack that’s). You need to cease someplace, you can make pages which can be infinitely lengthy (and I’ve, and many individuals have, some even on goal). In follow what occurs is that the system that checks the robots.txt file (the parser) will make a lower someplace.”
He additionally stated that he added a disallow on prime of that part within the hopes that it will get picked up as a “blanket disallow” however I’m unsure what disallow he’s speaking about. His robots.txt file has precisely 22,433 disallows in it.
He wrote:
“I added a “disallow: /” on prime of that part, so hopefully that will get picked up as a blanket disallow. It’s potential that the parser will lower off in a clumsy place, like a line that has “permit: /cheeseisbest” and it stops proper on the “/”, which might put the parser at an deadlock (and, trivia! the permit rule will override when you’ve got each “permit: /” and “disallow: /”). This appears most unlikely although.”
Thriller Of Deindexed Web site Solved
It seems that it wasn’t the robots.txt that brought about Mueller’s web site to drop out. He had used Search Console to do this.
Mueller revealed what he did in a subsequent LinkedIn put up:
“I used the Search Console instrument to attempt one thing out. I’d make a quick restoration if I hit the suitable button :-).
Google has a people.txt. I don’t actually know what I’d place in mine. Do you’ve got one?”
Mueller’s web site swiftly returned to the search index.
Screenshot Displaying Mueller’s Web site Has Returned To Search Index
And there it’s. John Mueller’s bizarre robots.txt.
Robots.txt viewable right here: