Google search is bad and getting worse. Here's how search is evolving in the era of AI.

Google Bard vs. OpenAI ChatGPT displayed on a mobile phone, with the OpenAI and Google logos on screen, in this photo illustration taken on 7 February 2023 in Brussels, Belgium.
Google's attempt at understanding what humans mean from a search is poor, writes Mason Pelt.
  • Mason Pelt writes about marketing and technology and has studied the evolution of search. 
  • He says Google search is worse than ever, and the company isn't financially incentivized to fix it.
  • Pelt also says that AI-powered search is problematic, especially at processing natural language. 

Google grants access to information kings didn't have 50 years before. I have consumed so much content — books, podcasts, movies, articles, songs, and possibly the PhD thesis of a woman from Chesapeake. I cannot remember it all.

Like most, I'll occasionally use Google to find a specific but only half-recalled crumb of content. Increasingly I use services other than Google because Google sucks as a search engine.

No, Grammarly, I don't mean "Google search could be better." Google search is worse than it was three years ago. 

People use Google search in two ways

People use Google either to find general information, where any credible source is acceptable, or to look for specific results.

Searching, "who is Neil Gaiman," or "list of the endless in the Neil Gaiman series" will likely give searchers the answers they seek. 

But ask with less specificity, incorrect information, and synonyms, "list of the eternals from the Marvel comics books by Neil Gaiman" and Google fails to return an answer about the DC Comics series The Sandman. 

A human could justifiably struggle to answer the same question. This is a fundamental limitation of indexing an evolving glob of information.

Indexing growing information is complex

You don't need to keep an index for a few books on a nightstand. If you have no memory of one or more books, just read the dust jackets. This solution doesn't scale.

At libraries with rooms of shelves crammed with books, indexing is a process. Library classification is complex, but every book has its place. Staff spend their days shelf reading: looking for out-of-place books and putting them where they belong.

Google is the shop with an index of the web. Per a Google help page, "that index is similar to an index in a library, which lists information about all the books the library has available." Instead of books, Google indexes webpages.

Google was the first search engine to use bibliometrics as part of an algorithm to sort and rank results based on quality and relevance to a search. People used this index of webpages to find the specific in the everything. The web has grown exponentially, shifting as pages are changed, deleted, replaced, and moved.
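To make the bibliometrics idea concrete, here is a minimal sketch of link-based ranking in the style of PageRank, the citation-inspired algorithm Google is known for. The tiny link graph, the page names, and the damping factor of 0.85 are illustrative assumptions, not Google's data or production code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links, treating each link like a citation.

    links maps a page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages "cite" the hub page.
toy_web = {
    "hub": ["blog"],
    "blog": ["hub"],
    "shop": ["hub"],
    "forum": ["hub", "blog"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
print(best, round(ranks[best], 3))
```

In this toy graph the page with the most incoming links ends up on top, which is also why marketers chase links.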

Ranking at the top of a highly searched term on Google can mean millions of dollars

It's like a high-profit marathon that never ends, and only pays out while you're winning. The incentives mean Google has been playing cat and mouse with marketers trying to beat the algorithm since the early days.

For a few years now, Google, the Kleenex of online search, has been widely seen as worse than it once was. Marissa Mayer, a former Google and Yahoo executive, implied in an interview with Freakonomics Radio that Google's rapid quality decline is the result of a larger internet of lower average quality.

Between the volume of information on the internet and those who seek to manipulate the results, Google has an uphill battle. Bing and DuckDuckGo face the same challenges.

Google isn't focused on improving search

Google is the entrenched behemoth. The company really can't capture more search market share. Google owns the largest mobile operating system, and the largest web browser. Revenue for at least the next several years is likely to increase almost by default.

Google has economic incentives not to worry about being the best search engine. Any publicly traded company with a money-printing machine guaranteed to work predictably for the next few years would focus on reducing cost, and finding the next honey pot.

Google and many other players seem to view AI as the next disruptive tech, and they are all focused on winning the arms race for the best dumb AI. That means testing and training the machine. Google, first with RankBrain and later with BERT (names of search algorithm updates), incorporates far more machine learning into search than the competition.

Google executives — who again, want to make money — seem willing to turn a dial that lowers search quality in the present for a profitable future. Ideally, without distracting headlines about how they are promoting bleach as a COVID-19 cure. Slightly worse search results may even raise Google's ad revenues.

These are not the only issues plaguing Google. The search results are also biased toward larger websites, especially for controversial topics. Even when searching for specific content, like a blog post's title, Google tends not to show small websites.

Needle in a haystack

All search engines have to prioritize ranking multiple web pages with similar keywords somehow. Even the most advanced machine learning is abysmal at processing natural language. With enough competing results, a non-fungible piece of content can be buried.

I ran into this problem looking for a quote from Neil Gaiman for use in a forthcoming article. I vividly remember not just reading but hearing Gaiman read the story of sending his publisher the pitch for American Gods and receiving a mockup of a book cover in response.

Google and Bing both failed me. I searched in vain for a semi-specific bit of content mentioning the words "Neil Gaiman" and "American Gods" and "Email" or possibly "letter" and "publisher" or perhaps "agent" or maybe "editor" and about the word "cover." That sentence's chaotic grammatical mess is a window into the Google search results pages. 

Measured by volume of articles online, American Gods is Gaiman's most successful work. Thousands of pages containing all of those words or synonyms exist. A blog post teasing Robert McGinnis creating artwork for the covers of the novel's paperback rerelease has all these keywords, but is a different story.
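The search described above is roughly an AND of OR-groups. A minimal sketch, assuming naive whole-word matching and three made-up page snippets, shows how a page telling a different story can still satisfy every keyword group. Nothing here reflects how Google or Bing actually match queries.

```python
import re

# Each inner set is a group of acceptable alternatives; a page must
# match at least one term from every group (AND of ORs).
QUERY = [
    {"gaiman"},
    {"american gods"},
    {"email", "letter"},
    {"publisher", "agent", "editor"},
    {"cover"},
]


def matches(text, query=QUERY):
    """Return True if the text contains a term from every group."""
    lowered = text.lower()
    return all(
        any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in group)
        for group in query
    )


# Hypothetical snippets, written for this example only.
wanted = ("Gaiman wrote a letter to his agent and editor pitching "
          "American Gods, and got a mock-up of the cover back.")
decoy = ("Gaiman's publisher announced in a letter to readers that the "
         "American Gods paperbacks will get a new cover.")
unrelated = "Gaiman talks about The Sandman."

for name, text in [("wanted", wanted), ("decoy", decoy), ("unrelated", unrelated)]:
    print(name, matches(text))
```

Both the wanted page and the decoy pass the filter, so everything then depends on how the engine ranks them.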

I finally asked everyone's favorite oracle, the generative pre-trained transformer AI, ChatGPT. Its answer: Neil Gaiman discussed the idea for American Gods in his blog post "American Gods and the Hugo Awards," which was posted on his website on May 14, 2001. In the post, he mentioned that he had emailed the idea for the book to his publisher.

AI is flawed

Problem is, the blog post seemingly no longer exists on the live web. ChatGPT has yet to become a reliable source for citations. While I was researching the same article, ChatGPT told me that Billy McFarland was listed on the Forbes "30 Under 30" for "Technology" in 2017 and also on the list for "Finance" in 2013.

These are both untrue. Barring a conspiracy that Forbes removed the embarrassing Fyre Festival guy from online archives but did not remove Martin Shkreli, ChatGPT is wrong.

After hours of searching, I found the quote. Not from a search engine, or ChatGPT, but from remembering where I once saw it. A few sentences, from the novel's intro.

And then, during a stopover in Iceland, I stared at a tourist diorama of the travels of Leif Erickson, and it all came together. I wrote a letter to my agent and my editor that explained what the book would be. I wrote "American Gods" at the top of the letter, certain I could come up with a better title. A couple of weeks later, my editor sent me a mock-up of the book cover.

AI-powered search is problematic AF

As mentioned, Google doesn't like SEO and has a financial incentive both for slightly worse search results and for prioritizing building Wensleydale over all else. AI, in every current iteration, is bad at natural language processing. Google's over-reliance on poor natural language processing can be seen across the search results pages.

Search "natural remedy" in Google, even with quotation marks, and you'll see results for "home remedy" and "herbal medicine." Google even boldfaces "home remedies" in the search engine results as if it were what was searched, but these are not at all the same thing.

People take many drugs at home that are not natural. I can buy RAD 140 legally as a research chemical and use it at home to treat a muscle-wasting disease (I'm not recommending you do that). But RAD 140 is fully created in a lab. It's a home remedy, not a natural one.

In fairness to Google, Bing and other search engines do treat "natural remedy" and "home remedy" as the same thing. Google is just far and away the worst offender.
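The difference between honoring quotation marks and expanding a query with assumed synonyms can be sketched in a few lines. The synonym table and the page snippet below are hypothetical, and this is not how any search engine actually implements matching. It only illustrates the behavior described above.

```python
# Hypothetical synonym table, assumed for illustration only.
SYNONYMS = {"natural remedy": ["home remedy", "herbal medicine"]}


def exact_phrase_match(phrase, text):
    """What quotation marks are supposed to mean: the phrase itself appears."""
    return phrase.lower() in text.lower()


def expanded_match(phrase, text):
    """Match the phrase or anything the engine treats as a synonym."""
    candidates = [phrase] + SYNONYMS.get(phrase.lower(), [])
    return any(candidate.lower() in text.lower() for candidate in candidates)


# Made-up page text for this example.
page = "RAD 140 is a lab-made compound some people use as a home remedy."

print(exact_phrase_match("natural remedy", page))
print(expanded_match("natural remedy", page))
```

A page about a lab-made compound fails the exact-phrase test but passes once "home remedy" is treated as the same thing as "natural remedy."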

Google's attempt at understanding what humans mean from a search is poor. I assume the company is leveraging user behavior, like relative click-through rate, time on site, and return to the search results page, as training data for its AI projects.

Google crowns kings from many versions

Intellectual property exists differently in the physical and digital worlds. Online, a copyrighted work can (with some exemptions in law) live on one webpage that millions of people view at once. In the physical world, millions of people reading a book at once requires millions of copies of the book.

Online, the same article may appear many places, perhaps with slight differences in formatting, title, links on the page, or user comments. The broad internet tends to work best with a sort of syndication model that generates copies of the same content on various platforms. But the article is not entirely different, any more than different printings of a book are different books. 

Cory Doctorow uses POSSE (post own site, share everywhere). As Doctorow said in a tweet, "[POSSE] allows me to maintain control over my work while still meeting my audience where they are, on platforms whose scale makes them hard to rely on." I lifted from his approach when I opted for a syndication-heavy model to distribute my writing.

Google (unlike the non-Google internet and the real world) generally wants a single best source of anything reigning as canonical. I won't get deep into the technical explanation of canonicalization, but suffice it to say, Google wants a single page to be the source of any given article. This creates problems for searchers.
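As a rough illustration of canonicalization, duplicate detection can be sketched as grouping pages whose normalized text is identical and keeping one URL per group. The URLs, the authority scores, and the "highest score wins" rule are all assumptions made for this sketch. Google's actual signals are not described in this article. Pages can also suggest a preferred URL themselves with a rel="canonical" link element, which this sketch ignores.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing and collapsing punctuation and spaces."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_canonical(pages):
    """Group duplicate pages and keep the highest-scoring URL of each group.

    pages is a list of (url, text, score) tuples.
    Returns {canonical_url: [duplicate urls that would be hidden]}.
    """
    groups = defaultdict(list)
    for url, text, score in pages:
        groups[fingerprint(text)].append((score, url))
    result = {}
    for members in groups.values():
        members.sort(reverse=True)  # highest score first
        canonical = members[0][1]
        result[canonical] = [url for _, url in members[1:]]
    return result


# Hypothetical syndicated copies of one article plus an unrelated page.
pages = [
    ("https://example.com/my-blog/post", "Google search is bad.", 0.2),
    ("https://example.org/big-site/post", "Google Search is bad!", 0.9),
    ("https://example.net/other", "A different article.", 0.5),
]
canon = pick_canonical(pages)
for url, hidden in canon.items():
    print(url, hidden)
```

Under these assumptions the small blog's copy is folded into the larger site's copy, which is the outcome a searcher looking for that specific copy would run into.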

Being unable to find the exact copy they are seeking can be an issue for searchers. They may be looking for the specific website that uses screenshots instead of embedded content, or the one with the option to join the mailing list. They may want the website with the fewest ads, the one without a paywall, or the one that isn't on Medium because you dated a developer at the company and they were an asshole. Someone may also want to see how many places syndicated the article.

Perhaps they don't even care about the article at all and want to find a user comment. Or they just want to find a website again, and are searching for it the best way they know how.

Bing and DuckDuckGo are both usable for these purposes. Google no longer is.

The thoughts expressed are those of the author.

Mason Pelt writes about how marketers and technology shape the world. 

Read the original article on Business Insider


from Business Insider https://ift.tt/6yip3cU
