
Google no longer likes my book reviews :-(

Books + Ideas, Technology — June 2011

My web site dannyreviews.com has ranked well in Google searches for more than a decade, but in the last few months Google has taken a dislike to it. This appears to be a result of the search algorithm change called "Panda".

  • Many (maybe half?) of my reviews have been removed from Google's index in favour of duplicate copies on other web sites. For example, a search like "Her Smoke Rose Up Forever Danny Yee review" should find the review on my own web site but instead finds a duplicate copy. (Another example: "dirt the erosion of civilizations danny yee".)
  • My web site used to rank in the top ten for a search on "book reviews", but now ranks around 60th. This kind of drop is apparently a "-50 penalty".
  • Overall traffic to my book reviews is down more than 60%.

So it looks like Google thinks I'm some kind of search engine spammer and is penalising my content accordingly. I haven't changed my navigation structure at all in recent years, and certainly haven't done anything spammy, but it seems I'm a false positive (or "collateral damage") in Google's war on spam.

I had always assumed that, since my site has been around forever - at an earlier location which still 301 redirects, it goes back to 1994! - and is hardly "thin content", it would never have problems like this.

What to do?

I never used to worry about duplicate content much, since I figured Google would always be smart enough to know that a) my own copies of my reviews came first and b) my site was the "more significant" one. But it seems that I can't rely on that any more, so maybe I need to start sending out a lot of DMCA takedown notices <urgh>.

I've never really done any SEO (Search Engine Optimisation). The obvious stuff - good TITLEs, simple spiderable structure, etc - I mostly got right from the very beginning. (Though if I could go back in time I'd abandon the mixed-case and underscores in URLs.) And attempting to game Google always seemed like a stupid arms race to me. My approach to "optimisation" has been to optimise for my readers and assume that that would (most of the time) work as Google optimisation, since Google was aiming at the same target (useful content).

And I've never done much in the way of PR, with self-advertisement really not being my thing. A problem here is that everyone wants links, so it's not clear that getting more of them is going to help convince Google I'm not a spammer (or a "content farm").

I'm reluctant to change too much now, both from fear of triggering other problems and because I've already fine-tuned my site for what I consider optimal usability.

It would be nice if there were some way to provide feedback to Google so they can tweak their algorithm to avoid this kind of false positive, but given that everyone who is affected (including spammers and borderline cases) would swamp any such feedback mechanism, it's not clear how they can provide one. But I have submitted a "reconsideration request" using Google's webmaster tools [response: "no manual penalty is in place"], I have a friend at Google who might be able to find me a contact, and I can always hope that someone like Matt Cutts takes an interest.

Update

I started discussions about this on Google's webmaster help forum, an Amazon Associates forum, and Webmaster World.

  • The most popular recommendation is that I should update the site aesthetics and make it look more modern (Google having taken against old sites that are resting on their laurels, users preferring more features, etc). I've been planning a front page redesign anyway, so adding more text to the front page is at least feasible. (Any idea where I can find a "designer" who can cope with a "no images" constraint?)

    There are a few people (thanks Durant!) who understand that the site looks the way it does for a reason. And I'm not convinced that every web site needs to be "social networking enabled".

  • There's a general consensus that I should whack any and all duplicate copies with DMCA notices. This still seems like an endless fight to me, and I think Google should be able to distinguish the original/principal copy from duplicate content fairly reliably, but I've sent a notice off to the hosting service of the most prominent site copy.
  • The Webmaster World posters lean more to "it's not broken, don't try to 'fix' it".
  • No one seconded my theory that I have too many link pages. But I've removed the 1000+ "link to all reviews" title and index pages (leaving only the "full chronological index" doing that). Those pages were unwieldy anyway, and my link checker used to complain about them.

Manual trackback: Literary Saloon

20 Comments

  1. Thinking about what lexically distinguishes your content from that of other genuine content websites, perhaps your vocabulary is too diverse and this generates a false positive for word salad.

    Comment by Mark L — June 2011
  2. A lexical explanation is possible, but seems less likely to me than a "link" one.

    It seems that internal links on my web site are no longer being "counted" - that's why reviews that have no external (incoming) links are not being ranked at all. So my guess is that I'm deemed to have too many internal links to each review, all with the same or very similar anchor text - and too many links to the top page all with "book reviews" in them, hence the penalty there.

    I have 1200+ reviews, but nearly 400 index pages (subject pages, publisher pages, author pages, and the 50 author and title letter pages), which may be considered too many. They all seem to have a place, though! I could delete the alphabetical author and title indices, I guess, since search will mostly work there. And I could delete the "boring" publisher pages and just keep the ones for small independent publishers and university presses.

    It seems to me that the fact that the site has grown organically - with maybe four to ten new pages every month over more than a decade - and that it has original content and plenty of links from a broad range of other sites should be enough to convince Google that I'm not a "content farm". OTOH, the people producing automated content farms have an incentive to make them hard to distinguish from sites like mine, and maybe they can replicate the incremental growth, auto-generate plausible looking content one way or another, and buy, beg or steal the links from other sites.

    Comment by danny — June 2011
  3. Or maybe, thanks to your help, Mark, my reviews have unnaturally low levels of spelling and grammar mistakes, suggesting generation by a robot rather than a human.

    Comment by danny — June 2011
  4. Danny you may be a victim of providing what the user wants right away.

    Add a new template and at the bottom of each review add links to 4-5 related (if possible) reviews. Use a freelancer if needed for the script and template; it cuts the costs.

    Comment by Name — June 2011
  5. I can't imagine my meagre suggestions have any effect given the exceptionally low level of grammar and spelling errors to start with, especially in comparison to the typical standards out there on the web.

    Comment by Mark L — June 2011
  6. Most of the people at WebmasterWorld don't have a clue when it comes to real search engine optimization. They're mostly into gaming Google.

    You don't mention when you lost your traffic. The timing of the traffic loss is critical to understanding WHY you lost the traffic. If your traffic loss occurred around one of the known Google Panda updates (February 23, April 11, May 9) it could be that those 1000+ link pages were indeed your problem -- and Google has said there is no recovery in sight (yet) for Panda-downgraded sites, although *some* sites may recover with Panda 2.2.

    One issue that I see with your site is use of negative pixel placement in your style, which Google decided they would no longer tolerate last year. Whether that caused your problem is anyone's guess. It's not like Google changes the rules one day and the next day all infractions are targeted.

    Your page titles don't seem very optimal to me, but then I don't know what queries you're trying to rank for.

    When I search on random book titles I see some pretty highly respected (and well-linked) sites ranking. If I add the words "book review" after the title, I STILL see some pretty highly respected/linked sites. It could be you're no longer able to compete in that arena for lack of authority.

    And though you say you have never done any SEO for the site it could be that some of the link value pointing to the sites that do link to you has declined, and hence they are no longer able to help you with their reduced value/authority.

    One thing you might consider doing is moving your Amazon links to an iframe. It's conceivable (though I think doubtful) that your site has been tagged as made-for-advertising.

    Are any pages on your site still performing as previously? Have any pages improved over previous referral traffic? How are your new reviews doing? I see one on the front page for a query on the book name. That seems good but would you expect it to be higher?

    Comment by Michael Martinez — June 2011
  7. Michael, the drop off started in mid-April, and the symptoms seem similar to those other people are describing, so it looks Panda-related.

    There's no negative pixel placement in the book review styling. It's single-column with no frills! (There might be some on this site, as I didn't realise it was frowned on.)

    I've spent a fair bit of time fine-tuning my page titles, given the constraints that they are auto-generated. Surely "<book title> (<author>) - book review" can't be that bad!

    I'm sure the value of my incoming links has declined, along with my "early mover" advantage generally. But that would explain a slow drop in rankings and traffic, not being supplanted by random duplicates or relegation to nowhere at all.

    It's hard to tell how individual queries have changed, as I haven't tracked them so I have no baseline. I don't necessarily expect to rank at the top of search results when there are real authorities - not to mention Amazon! - but I expect to - and always used to - rank ahead of sites with no original information.

    Comment by danny — June 2011
  8. Danny, "There's no negative pixel placement in the book review styling." -- There's at least one occurrence on dannyreviews.com. I don't know how widespread it is. Nor can I argue whether it did or did not contribute to your ranking loss.

    "I've spent a fair bit of time fine-tuning my page titles, given the constraints that they are auto-generated. Surely '<book title> (<author>) - book review' can't be that bad!"

    Good? Bad? I don't know your query space. But if people are not searching on that kind of expression, then it's not helping. For example, there is a book title on your front page, "Talking about Detective Fiction". If I plug that title into the Google Adwords Keyword Suggestion Tool, I see query traffic for that title only (and not much at that).

    "I'm sure the value of my incoming links has declined, along with my 'early mover' advantage generally. But that would explain a slow drop in rankings and traffic, not being supplanted by random duplicates or relegation to nowhere at all."

    Not necessarily. The fact your traffic tanked in mid-April does suggest Panda could be your issue. But does that mean your site was downgraded by Panda, or that the sites linking to yours were downgraded by Panda, or that both happened, or that something else happened?

    You can't assume anything, really.

    And without a baseline for search referral traffic, you're kind of stuck for figuring out what happened.

    Comment by Michael Martinez — June 2011
  9. You're right, there's a margin: -1px on the top page. I'll try to get rid of that in the redesign. (But are negative margins really a problem? Do you have a source for this?)

    One effect of the Panda filter is to downgrade (or completely remove/relegate) my reviews when there are any duplicates at all. Mostly the duplicates don't rank anywhere (because they're on low-profile sites and have no links), but their mere presence is enough to push my copies out of the index. Consider, for example, the page http://tech.groups.yahoo.com/group/bsg-goa/message/1395 - that doesn't show up anywhere on general searches, but throw in my name or some text specific to my review, and that's what comes up (and my own copy doesn't).

    I can't really go around DMCAing every copy of my reviews anywhere, as that would take forever. And it's actually nice to see my reviews being forwarded to the Botanical Society of Goa mailing list!

    When things were normal with my web site (as they were for ten years), none of these copies appeared in Google SERPs at all, or if they did they ranked nowhere and didn't interfere with the original copies on my web site.

    I have all the server logs ever for my site, so I can track referral changes. But it's pretty obvious what's gone wrong.

    Comment by danny — June 2011
  10. Michael, I'm still not sure what you're saying about my review TITLEs. They are of the form "[book title] ([author]) - book review" (if there's enough room in a 60 character title). An example is
    "A Song of Ice and Fire (George R.R. Martin) - book review"

    Since these are auto-generated (from the bibliographic information with each review), I'm hard put to see how this could be significantly improved on.

    The aim of my TITLEs is not to attract random Google traffic - that would make me a spammer! - but to accurately describe my pages so that searchers (and others) know whether they are of interest or not.

    Comment by danny — June 2011
  11. Ignore Michael,
    he's a troll, going from site to site to comment without having a clue.

    If you had a drastic move around April 11th it's Panda; this is a specific Panda issue, not links GRADUALLY losing their power.

    What you could do is change the title to better guess what users are searching for, maybe 'Book title review - Book Title by FirstName LastName.'

    Maybe, just maybe your template is similar to those sitesell crap? Google is letting a robot do the ranking, remember, and long pages without columns have a strike against them?

    Comment by Name — June 2011
  12. Sitesell? It's possible "single column" could be something picked up by a machine learning algorithm trained on spam, but so could any other feature. Actually, given this is probably some kind of neural net or otherwise non-linear, it's likely to be the interaction of different features that matters: I have some combination of single-column layout plus lexical complexity plus Amazon links plus link structure plus backlink distribution plus ... that just happens to match the approach taken by some spammers. (Or perhaps it's not just bad luck, if there are spammers who have actively modelled content farms on sites like mine.)

    Putting "book title" twice into the TITLE doesn't really leave room for anything else, and seems redundant anyway. I've actually put a lot of thought into this. I have some hideously ad hoc code that tries to make the best use possible of the space in TITLEs, down to stripping out middle initials in author names or dropping editors (but not authors) to surnames, adding "book review" if that will fit or just "review", and so forth.

    I've also thought about whether line-height should be 1.5em or 1.4em for optimal readability, what the width of the review text should be, and a pile of other such issues, so it's always a bit frustrating when people confuse "lack of frills" with "not designed"!

    Comment by danny — June 2011
  13. Danny, want to rank or what :)?
    Simple design didn't help you with Panda. So add another combo in title and mention those words a few times in each page.

    Just try things.

    Comment by Name — June 2011
  14. See my following post in this blog! Chasing Google's current search results is stumbling around in the dark trying to catch a moving target. Can it really make sense to "just try" effectively random things?

    I'll stick to optimising my web site for my users, which means using the most informative TITLEs possible. And putting words into my reviews for external reasons is definitely not happening - I'm really, really picky about my choice of words anyway, and will never use a word that sounds good or reads well if it's not accurate.

    Comment by danny — June 2011
  15. Danny: "'A Song of Ice and Fire (George R.R. Martin) - book review'"

    The question is: Does anyone search for that expression? If not, are they searching for subsets of the expression? If not, then your autogenerated titles are not helping you. That's NOT search engine optimization.

    Danny: "The aim of my TITLEs is not to attract random Google traffic"

    That doesn't make you a spammer. Flooding the search engine with worthless content and links would make you a spammer.

    The average 1,000-word article is relevant to millions of random queries. You cannot NOT be relevant to random queries. What you want to do, IF YOUR GOAL IS TO OPTIMIZE FOR SEARCH, is to pick expressions that people search on and use them in your page titles and in your body copy.

    People DO search on book titles but ranking for an individual book title is not as easy as ranking for a more structured query.

    It's highly doubtful that a single column layout is your problem. It could be that there is insufficient original commentary in your reviews to pass the Panda quality test.

    It could also be, as has been suggested, that there are too many ads on your pages. It's hard to say.

    In any event, I don't think your page titles are optimum -- especially since you point out that they are autogenerated.

    Comment by Michael Martinez — June 2011
  16. Danny: "But are negative margins really a problem? Do you have a source for this?"

    Read this post by Googler Maile Ohye from last year: http://maileohye.com/html-text-indent-not-messing-up-your-rankings/

    Comment by Michael Martinez — June 2011
  17. http://maileohye.com/html-text-indent-not-messing-up-your-rankings/

    This says "don't hide text", with "text-indent: -9999px" as one example.

    Comment by danny — June 2011
  18. More specifically, she advised (in a bright yellow box): "If possible, it’s still best to avoid techniques such as 'text-indent:-9999px' or 'margin:-4000px' or 'left:-2000em'."

    There is no point in arguing whether -1px is as significant as -9999px -- obviously there is a huge world of difference.

    But does the document classifier that looks at these things care? We don't know.

    I pointed it out as something to consider. I have simply been advising people to avoid using negative placements of any significance in their CSS because we DON'T know whether there is a built-in tolerance.

    It's just something for you to consider.

    Comment by Michael Martinez — June 2011
  19. That seems way too paranoid to me. You can also use CSS attributes like font and bgcolor to hide text, so maybe we should avoid those too...

    Comment by danny — June 2011
  20. I realise I'm unusually hard core about this, but as far as I'm concerned sticking words into web pages or TITLEs just to attract search engine traffic is spam.

    A word is either the best word to use in a particular context or it's not. It's the best word if it most accurately communicates what I want to convey. If it's not the best word, that it will attract searchers doesn't make it so.

    And I don't want random referrals from search engines anyway! There's no point attracting people who are only going to be disappointed. Many of my reviews are of obscure books about obscure topics and only people interested or potentially interested in those topics should be finding them in their search results.

    But I think this conversation is going around in circles now.

    Comment by danny — June 2011
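
Editorial sketch: the TITLE scheme described in comments 10 and 12 (the "[book title] ([author]) - book review" form, a 60 character budget, stripping middle initials, dropping editors to surnames, and falling back from "book review" to "review") could look roughly like the Python below. This is not the site's actual code, which is not shown anywhere in the post. The function names, the is_editor flag and the order in which the fallbacks are tried are all assumptions made for illustration.

```python
import re

MAX_TITLE = 60  # character budget mentioned in comment 10


def strip_middle_initials(name: str) -> str:
    """Drop dotted initials between the first and last parts of a name."""
    parts = name.split()
    if len(parts) <= 2:
        return name
    # Keep middle parts that are real words; drop ones like "Q." or "R.R."
    middle = [p for p in parts[1:-1] if not re.fullmatch(r"(?:[A-Z]\.)+", p)]
    return " ".join([parts[0], *middle, parts[-1]])


def make_title(book: str, person: str, is_editor: bool = False,
               limit: int = MAX_TITLE) -> str:
    """Return the most informative title that fits within the limit.

    Assumed fallback order: full name, then name without middle
    initials, then (editors only) surname; for each name form,
    try "book review" before the shorter "review".
    """
    names = [person, strip_middle_initials(person)]
    if is_editor:
        names.append(person.split()[-1])  # editors may shrink to a surname
    candidate = ""
    for name in names:
        for suffix in ("book review", "review"):
            candidate = f"{book} ({name}) - {suffix}"
            if len(candidate) <= limit:
                return candidate
    # Nothing fit: return the shortest form tried, even though it is too long.
    return candidate


print(make_title("A Song of Ice and Fire", "George R.R. Martin"))
```

With the example from comment 10 the full form fits in the budget, so nothing is shortened. A real version would need more cases than this (multiple authors, very long book titles, initials written without full stops), which is presumably where the "hideously ad hoc" part comes in.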
