Search Engine Optimization

A rel=canonical corner case

by Matt Cutts on May 17, 2011

I answered an interesting rel=canonical question over email today and thought I’d blog about it. If you’re not familiar with rel=canonical, read these pages first. Then watch this video about rel=canonical vs. 301s, especially the second half:

Okay, I sometimes get a question about whether Google will always use the url from rel=canonical as the preferred url. The answer is that we take rel=canonical urls as a strong hint, but in some cases we won’t use them:
- For example, if we think you’re shooting yourself in the foot by accident (pointing a rel=canonical toward a non-existent/404 page), we’d reserve the right not to use the destination url you specify with rel=canonical.
- Another example where we might not go with your rel=canonical preference: if we think your website has been hacked and the hacker added a malicious rel=canonical. I recently tweeted about that case. On the “bright” side, if a hacker can control your website enough to insert a rel=canonical tag, they usually do far more malicious things like insert malware, hidden or malicious links/text, etc.
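One way to avoid the first case, pointing rel=canonical at a page that doesn't exist, is to check each canonical destination before you publish. The Python sketch below is a hypothetical pre-deployment check, not anything Google runs. `canonical_target_ok` and `head_status` are illustrative names, and the sketch assumes that an HTTP HEAD request is a good enough proxy for whether the target page exists.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request to url (urlopen follows redirects)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; .code carries the status.
        return err.code


def canonical_target_ok(
    canonical_url: str,
    fetch_status: Callable[[str], int] = head_status,
) -> bool:
    """Hypothetical check: True only if the rel=canonical target answers 200 OK.

    A 404 (or any other non-200 status) suggests the canonical points at a
    page that does not exist.
    """
    return fetch_status(canonical_url) == 200
```

The status lookup is passed in as a parameter so you can test the check without network access, for example with `fetch_status=lambda url: 404`.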

I wanted to talk today about another case in which we won’t use rel=canonical. First off, here’s a thought exercise: should Google trust rel=canonical if we see it in the body of the HTML? The answer is no, because some websites let people edit content or HTML on pages of the site. If Google trusted rel=canonical in the HTML body, we’d see far more attacks where people would drop a rel=canonical on part of a web page to try to hijack it.

Okay, so now we come to another corner case where we probably won’t trust a rel=canonical: if we see weird stuff in your HEAD section. For example, if you start to insert regular text, or other tags that we normally only see in the BODY of a page, into the HEAD of a document, we may assume that someone just forgot to close the HEAD section. We don’t allow rel=canonical in the BODY (because, as I mentioned, people would spam that), so we might not trust rel=canonical in those cases, especially if it comes after the regular text or tags that we normally only see in the BODY of a page.

But in general, as long as your HEAD looks fairly normal, things should be fine. If you really want to be safe, you can make sure that the rel=canonical is the first or one of the first things in the HEAD section. Again, things should be fine either way, but if you want an easy rule of thumb: put the rel=canonical toward the top of the HEAD.
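To make this rule of thumb concrete, here is a hypothetical lint check built on Python’s standard `html.parser`. It flags a rel=canonical that appears outside HEAD, or inside HEAD but after stray text or a tag that normally belongs in BODY. The tag lists are assumptions for illustration only. Google’s actual parsing rules aren’t public and are surely more nuanced.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumed, non-exhaustive list of tags that normally belong only in BODY.
BODY_ONLY_TAGS = {"p", "div", "span", "img", "a", "h1", "h2", "h3",
                  "table", "ul", "ol", "form", "iframe"}
# Tags whose text content is normal in HEAD and should not count as "regular text".
HEAD_TEXT_TAGS = {"title", "script", "style", "noscript"}


class CanonicalPlacementChecker(HTMLParser):
    """Records where each rel=canonical link appears relative to HEAD/BODY content."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.saw_body_content = False  # body-only tag or stray text seen inside HEAD
        self.open_text_tag: Optional[str] = None
        self.findings: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False
        elif tag in HEAD_TEXT_TAGS:
            self.open_text_tag = tag
        elif tag in BODY_ONLY_TAGS and self.in_head:
            self.saw_body_content = True
        elif tag == "link" and "canonical" in (attr_map.get("rel") or "").lower().split():
            href = attr_map.get("href")
            if not self.in_head:
                self.findings.append(f"outside HEAD: {href}")
            elif self.saw_body_content:
                self.findings.append(f"after body-like content in HEAD: {href}")
            else:
                self.findings.append(f"ok: {href}")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == self.open_text_tag:
            self.open_text_tag = None

    def handle_data(self, data):
        # Non-whitespace text directly in HEAD (not inside <title>, <script>, ...)
        # looks like someone forgot to close HEAD.
        if self.in_head and self.open_text_tag is None and data.strip():
            self.saw_body_content = True


def check_canonical(html: str) -> list[str]:
    """Return one finding per rel=canonical link found in html."""
    checker = CanonicalPlacementChecker()
    checker.feed(html)
    checker.close()
    return checker.findings
```

Because the checker records whether anything body-like came *before* the canonical, putting the link near the top of HEAD keeps it in the “ok” state even if something odd appears later in HEAD.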


Search Engineering at Google

by Matt Cutts on May 8, 2011

I’m always a fan of Googlers doing more communication and more videos, so when some fellow search quality folks made a video about working at Google, I said I’d be happy to post it:

You can find out more info and apply to be a search engineer at Google if you’re interested.


Overdoing url removals

by Matt Cutts on April 3, 2011

If you have a lot of urls that you don’t want in Google anymore, you can make the pages return a 404 and wait for Googlebot to recrawl/reindex the pages. This is often the best way. You can also block out an entire directory or a whole site in robots.txt and then use our url removal tool to remove the entire directory from Google’s search results.
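Before you use the removal tool on a directory, it’s worth confirming that robots.txt actually blocks the directory you mean. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The `/old-catalog/` path, the example.com urls, and the `is_blocked` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-catalog/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

For example, `is_blocked(ROBOTS_TXT, "https://example.com/old-catalog/item-1")` should be true while a url outside that directory is not. A robots.txt block by itself only stops crawling. Removal from search results still goes through the removal request described above.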

What I would not recommend is sending tons (as in, thousands or even tens of thousands) of individual url removal requests to the url removal tool. And I would definitely not recommend making lots (as in, dozens or even more) of Webmaster Central accounts just to remove your own urls. If we see that happening to a point that we consider excessive or abusive, we reserve the right to look at those requests and respond by, e.g., broadening, cancelling, or narrowing the requests.

So if you’re sending huge numbers of requests to our url removal tool, it might be a good idea to take a step back and ask whether you should be removing at the directory level instead.
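If you have a long list of urls to remove, one quick way to take that step back is to count how many of them share a top-level directory. The Python sketch below is a hypothetical helper, and its `min_urls` threshold of 50 is an arbitrary assumption. It groups urls by the first directory in their path and lists the directories that account for many of them. Those directories are candidates for a single directory-level removal, but check first that nothing you want to keep lives under them.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_directory(url: str) -> str:
    """Return the top-level directory of url's path, e.g. '/old-catalog/', or '/' for root files."""
    path = urlsplit(url).path
    parent = path.rsplit("/", 1)[0]  # drop the final file name
    segments = [s for s in parent.split("/") if s]
    return f"/{segments[0]}/" if segments else "/"


def directory_candidates(urls: list[str], min_urls: int = 50) -> list[tuple[str, int]]:
    """Non-root directories containing at least min_urls of the urls, largest first."""
    counts = Counter(top_directory(u) for u in urls)
    counts.pop("/", None)  # files at the site root can't be removed as a directory
    return [(d, n) for d, n in counts.most_common() if n >= min_urls]
```

With 60 urls under `/old-catalog/` plus a couple of unrelated pages, `directory_candidates` would suggest `/old-catalog/` as a single directory-level removal instead of 60 individual requests.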


An interesting essay on search neutrality

by Matt Cutts on January 25, 2011

(Just as a reminder: while I am a Google employee, the following post is my personal opinion.)

Recently I read a fascinating essay that I wanted to comment on. I found it via Ars Technica and it discusses “search neutrality” (PDF link, but I promise it’s worth it). It’s written by James Grimmelmann, an associate professor at New York Law School. The New York Times called Grimmelmann “one of the most vocal critics” of the proposed Google Books agreement, so I was curious to read what he had to say about search neutrality.

What I discovered was a clear, cogent essay that calmly dissects the idea of “search neutrality” that was proposed in a New York Times editorial. If you’re at all interested in search policies, how search engines should work, or what “search neutrality” means when people ask search engines for information, advice, and answers, I highly recommend it. Grimmelmann considers eight potential meanings for search neutrality throughout the article. As Grimmelmann says midway through the essay, “Search engines compete to give users relevant results; they exist at all only because they do. Telling a search engine to be more relevant is like telling a boxer to punch harder.” (emphasis mine)

On the notion of building a completely transparent search engine, Grimmelmann says:

A fully public algorithm is one that the search engine’s competitors can copy wholesale. Worse, it is one that websites can use to create highly optimized search-engine spam. Writing in 2000, long before the full extent of search-engine spam was as clear as it is today, Introna and Nissenbaum thought that the “impact of these unethical practices would be severely dampened if both seekers and those wishing to be found were aware of the particular biases inherent in any given search engine.” That underestimates the scale of the problem. Imagine instead your inbox without a spam filter. You would doubtless be “aware of the particular biases” of the people trying to sell you fancy watches and penis pills–but that will do you little good if your inbox contains a thousand pieces of spam for every email you want to read. That is what will happen to search results if search algorithms are fully public; the spammers will win.

And Grimmelmann independently hits on the reason that Google is willing to take manual action on webspam:

Search-engine-optimization is an endless game of loopholing. …. Prohibiting local manipulation altogether would keep the search engine from closing loopholes quickly and punishing the loopholers–giving them a substantial leg up in the SEO wars. Search results pages would fill up with spam, and users would be the real losers.

I don’t believe all search engine optimization (SEO) is spam. Plenty of SEOs do a great job making their clients’ websites more accessible, relevant, useful, and fast. Of course, there are some bad apples in the SEO industry too.

Grimmelmann concludes:

The web is a place where site owners compete fiercely, sometimes viciously, for viewers and users turn to intermediaries to defend them from the sometimes-abusive tactics of information providers. Taking the search engine out of the equation leaves users vulnerable to precisely the sorts of manipulation search neutrality aims to protect them from.

Really though, you owe it to yourself to read the entire essay. The title is “Some Skepticism About Search Neutrality.”
