
Thursday, February 16, 2023

At a Crossroads? The Intersection of AI and Digital Searching


Microsoft's foray into next generation searching powered by Artificial Intelligence is raising concerns.

Take, for example, Kevin Roose, a technology columnist for The New York Times, who has tried Bing and conversed with the AI chatbot built into it (powered by the same OpenAI technology behind ChatGPT). He describes his experience as "unsettling." (Roose's full article here). 

Initially, Roose was so impressed by Bing's new capabilities that he decided to make Bing his default search engine, replacing Google. (It should be noted that Google recognizes the threat to its search-engine dominance and is planning to add its own AI capabilities.) But a week later, Roose had changed his mind, more alarmed by the emergent possibilities of AI than impressed by the first blush of wonderment produced by AI-powered searching. He now thinks either AI isn't ready for public release or people aren't ready for contact with AI.

Roose pushed the AI, which called itself 'Sydney,' beyond its intended purpose: helping people with relatively simple searches. His two-hour conversation probed existential and dark questions that left him "unable to sleep afterwards." Admittedly, that's not a normal search experience, and Microsoft acknowledged that's why only a handful of testers have access to its nascent product at the moment.

All this gives the feeling that we will soon be at a crossroads: what we know about search engines and strategies is about to change. How much isn't certain, but there are already a couple of warnings:

  • AI seems more polished than it is. One of the complaints from testers like Roose is that AI returns "confident-sounding" results that are inaccurate or out-of-date. A classic in this regard is Google's costly mistake of publishing an answer generated by its own AI bot (known as Bard) to the question, "what telescope was the first to take pictures of a planet outside the earth's solar system?" Bard came back with a wrong answer, but no one at Google fact-checked it. As a result, Google's parent company Alphabet lost $100 billion in market value. (source)
  • AI makes it easier to use natural language queries. Instead of the whole telescope question in the bullet above, current search-box strategy would suggest TELESCOPE FIRST PLANET OUTSIDE "SOLAR SYSTEM" is just as effective a place to start. Entering that query in Google, the top result is a NASA press release from Jan 11, 2023, which doesn't exactly answer the question, but is probably why Bard decided that it did. Apparently AI takes a very human leap, thinking it found the answer to the question when, in fact, the information answers a different question: "what telescope was the first to confirm a planet's existence outside the earth's solar system?" This demonstrates one of the five problems students have with searching: misunderstanding the question. AI isn't yet ready to take care of that problem.

There's much more to come on this topic.

Tuesday, July 3, 2012

Tips from Google: What's Missing?

One of my colleague's recent bookmarks caught my eye:  How to solve impossible problems.  

The link is to a story by John Tedesco of the San Antonio Express-News about Google search guru Daniel Russell, who posed a daunting challenge to a room full of investigative journalists:

What’s the phone number of the office where this picture was snapped?


Here's the photo:

What makes this challenge difficult is that there is no direct information about the office from which the picture was snapped.

According to the article, "(Russell) wasn’t asking for a phone number for the skyscraper in the picture, which sounds hard enough. He wanted the phone number of the precise office where the photographer was standing when the picture was taken.  Nothing in that office was even in the photo. Yet in a few minutes, Russell, a research scientist at Google, revealed the answer by paying attention to small details and walking us through a series of smart Google searches."

Yes, most of us don't put Google's full power to use. Advanced features can make searching more surgical.  The article goes on to illustrate Boolean modifiers (what works and doesn't) as well as operators many people haven't tried lately, if ever. It's a good summary; take a look.

But Google is all about finding; it says nothing about how good a result may be. Most students stop there too. We laugh when we hear "If it's on the Internet, it must be true," but that's how students actually behave. We're getting better at finding. We've made little progress at evaluating.

It's really not Google's business to tell us what to believe. And we resist attempts at interference when it comes to second-guessing what we want to see--although search engines are paying attention to what we click and are influenced by our choices.  Which is why it becomes all the more important that we develop good investigative habits.

Spoiler Alert
I managed to find an answer I'm pretty sure is right, but there is still some conjecture involved. If you'd like to solve Russell's challenge, go ahead. Answers are easy to find, thanks to Google.  Here's Russell's blog, and some answers.  Did I/they get it right?

Saturday, March 31, 2012

Google's Really Advanced Search

You've no doubt heard of Advanced Search. How about Really Advanced Search?

I spotted this at the bottom of a Google Search Results page today and had to take a look.

Among the search features, including the usual ones, are:
  • words almost, but not quite entirely unlike:
  • rhyming slang for:
  • this exact word or phrase, whose sum of unicode code points is a mersenne prime:
  • subtext or innuendo for:
and this:
  • the words , but not , unless they contain either the intersection of phrases , , and or a gerund in which case the disjunction of and will also be taken into account (on Tuesdays). 
At the bottom of the page are several more links, under "You can also..."
Some of the features of Really Advanced Search might make a good coding project. They might also be a challenge to explain. But they really make a better April Fools' joke.

Friday, February 17, 2012

The Slow Death of the Link: Operator?

For a number of years, one of the best investigative tools for checking which websites link to another has been the link: operator. Over the past couple of years, its effectiveness has diminished. I hope it's not on its way out as a free tool.

Here's how it works: Let's say you want to see all (actually a sample) of the webpages that link to a site you want to investigate. Martinlutherking.org is a common example for this purpose. The query is
link:martinlutherking.org
...and the results would reveal the urls of pages that link to the martinlutherking.org home page. This list would contain links from the martinlutherking.org site itself (most sites link to their home page), educational institutions warning about bias on the site (a red flag) and hate groups (another red flag).

Google was the go-to search engine for this until a couple years ago, when they really pared back the number of results they returned. At that time, I advised using Yahoo.com as the search engine. Yahoo's Site Explorer would return many more results than Google.

Within the past few months that has changed. Yahoo merged its Site Explorer with Bing, and that search capacity is now part of the Bing Webmaster set of tools. If I want to see who links to my site 21cif.com, I have to create a Webmaster account and download an XML file from Bing (BingSiteAuth.xml) which the search engine uses to collect data on my traffic, etc. That's 1) not as user-friendly as it used to be, and 2) if a site doesn't include the XML file, I doubt any information could be retrieved.

So here's what happens today if I search for link:21cif.com:

Google: 48 results, many of which are from other 21cif.com pages
Yahoo: 1 result

From using other webmaster sites like majesticseo.com, where I had to create yet another account, I know there are 394 referring domains, 193 of which are educational and 16 governmental. Too bad I can't see what they are without a paid subscription. This seriously impairs one's ability to check the 'references' of a site by seeing who voluntarily links to it.

Checking inbound links is not only of interest to a webmaster, it helps searchers be more critical consumers of information.

For investigative purposes, a readily available link: operator has been a staple. Now the challenge is: what is out there that can replace it? For the time being, I'm recommending going back to Google.

Tuesday, February 14, 2012

Search Practice

Since last April, Google has published a daily search challenge called 'agoogleaday'.

These little challenges have only one correct answer but many ways to arrive at the answer. Since these are similar to the Search Challenges found in my blog, I thought I'd take a closer look.

Agoogleaday was created by Dan Russell as a daily trivia game to encourage creativity and search practice. Unlike the Internet Search Challenges found here, there is no timer or focus on a specific search technique or strategy and the search engine returns results only prior to April 2011. More on that shortly.

A nice feature of the puzzles is the hints that show effective keywords. This kind of scaffolding could be helpful to students. I found that I was able to solve some without searching at all, since I knew the answer to begin with. But the notion of practicing search skills has value.

Why return results that are no newer than April 2011? According to the author, this is to prevent people from spoiling the puzzles for others by posting the answers online. This doesn't prevent people from posting the answers; it only prevents the Deja Google search engine from retrieving them. At one time I was concerned about this with my Search Challenges as well, but it hasn't proved to be a problem. In fact, people have posted the challenge questions online hoping someone will provide an answer. Most of the answers I've seen are incorrect, which ironically makes the challenges even better and drives home the point that you need to evaluate the information you find online.

One aspect of agoogleaday has, for me, a less-than-positive connotation for learning: 'every search has one right answer.' While that may be appropriate for trivia puzzles, it is not how information usually works. There is seldom one right answer for significant questions. If the questions educators are asking students have only one right answer, we're not requiring enough thought from students. Or as David Thornburg has quipped, don't ask students questions that can be answered by searching Google (or posted by spoilers). You can still use a search engine. You just have to use your head to figure out a good answer.

That makes it more challenging both for the teacher and the student. And that's a good thing.

Monday, December 21, 2009

Googling and Vetting

I found today's post by woodsiegirl pretty interesting. She's a librarian in a British law firm (I'm guessing). Apparently--as we know from our own behavior--it's not just kids and teens who fail to evaluate information online.

Entitled, "I'm sorry it doesn't exist," Woodsiegirl's post and others' comments describe how lawyers fall into the trap of thinking that anything they want is on Google and what they find there doesn't need to be scrutinized.

This comment by Jennie, who also works with attorneys, is particularly telling:

"(T)he amount of times I’ve had to deal with this!
The favourite is the “well, I found it on Google” option. Obviously, Google knows that Scots, English and many other laws are different, and will therefore give the location-appropriate results, yes?  No.
These are fully qualified legal professionals, yet they don’t even take 10 seconds to check jurisdiction of materials that they’re using. It’s kinda scary…"

The recommendation put forth by Woodsiegirl is for information literacy skills to be introduced in primary school, "if we are to avoid having another generation grow up with no appreciation of how to find, evaluate and analyse information."

While it is scary that a lawyer may not be scrutinizing the relevance or accuracy of information for a case, I submit we all struggle with this same tendency. And we know better. That should be even scarier since the results could affect us directly and adversely.

Why the tendency to behave as if information googled is, practically speaking, the same as information vetted?  There could be a host of factors--and you are welcome to share your views on this--but I will limit my comments to two:
  1. Quick-serve information promotes a misperception that information is to be consumed quickly
  2. Published information has inherent value
Briefly, the first point: when we need information fast, we tend to take it at face value. In and out. Hit it and move on to the next thing. We're busy, and evaluation slows us down. For me, this is a big reason why kids don't evaluate. Adults are no different. We don't want to slow down, especially when we can get our hands on so much information so fast. One solution would be to take information from vetted sources. But that's often the problem: there may not be a vetted source, just prolific ones. Google may not have any plans to intervene where the credibility of the information it serves is concerned. That doesn't mean someone else won't eventually try.

The second point is one we adults learned in primary school that is hard to unlearn: if it exists in print some authority thinks it is good. Now that everyone (not just the teacher, editor or publisher) has authority, information should be suspect. We can only overcome this fault by using reason and constantly jolting ourselves back to our senses with examples of what happens when we don't evaluate what we read. Thanks for the reminder, Woodsiegirl!

Today's challenge: remember, information needs to be vetted.

Monday, September 28, 2009

Revisiting Stop Words and Clutter Words

The final item on the Query Checklist that I'm revisiting is #7: Did I use any stop words or clutter words?

Briefly, stop words are terms ignored by the search engine: common parts of speech that don't add significant content such as pronouns, prepositions and conjunctions. Google lists some of its exceptions to the "every word counts" rule. Here's a more complete list of overlooked words.

One way to tell whether a word is being overlooked is to examine the query results. Consider the query here are all the stop words (without quotes). In Google, all the words will appear in bold if the exact phrase is found (you don't need quotes to return an exact phrase). If only certain words from the query were used to find a matching result, only those words will be shown in bold. In my query example, the second snippet contains the word 'the' but it does not appear in bold. Yahoo is similar.

One way to guarantee ALL the words are used is to link them with the AND operator (returning results containing all the words but not necessarily in the order you used them) or putting quotes around the phrase (returning results containing the exact phrase you submitted).

Stop words are so common they add little to the uniqueness of a query, and uniqueness is what drives you to well-matched, meaningful results. Students might be tempted to include stop words in a natural language query (e.g., I want a list of all the stop words), thinking this means something to the search engine. The query list stop words gets to the point.
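The idea of trimming a natural language query down to its content-bearing terms can be sketched in a few lines of Python. This is purely illustrative: the stop-word list below is a small sample I chose for the example, not any search engine's actual list.

```python
# Illustrative sample of stop words (not any engine's real list).
STOP_WORDS = {"i", "want", "a", "of", "all", "the", "here", "are"}

def to_keywords(query: str) -> str:
    """Keep only the content-bearing terms of a query."""
    return " ".join(w for w in query.split() if w.lower() not in STOP_WORDS)

print(to_keywords("I want a list of all the stop words"))
# -> list stop words
```

The output matches the to-the-point query suggested above: the natural language framing adds nothing the engine needs.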

In a similar vein, clutter words are less common than stop words but don't add value to the query. In fact, they may detract from it, forcing the search engine to look for words you think are important but do not occur with the information you are seeking. Clutter words include unnecessary redundancies (like earthquake AND damage--in which case damage is redundant: it's hard to write about earthquakes without referring to damage or destruction or a bunch of other words you might not have used). Verbs, adjectives and adverbs are often clutter terms as well. A good rule of thumb to keep in mind is "if you can't clearly see it, don't use the word." Stick to objects--nouns and numbers.

All in all, the Query Checklist has held up well over the past few years. Once the list is internalized it can help you cut down on search time and produce more relevant results.

Next time: It's probably time for another Search Challenge!

Tuesday, September 15, 2009

Revisiting Word Order


When does the order of keywords matter?

The ninth item of the query checklist was always last because keyword order mattered the least. This remains largely the case.

Take a query I used today while doing some IMSA program planning: business ethics simulation. There are five other ways to order the terms. But does it make any difference?
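Those orderings are easy to enumerate. Here's a quick Python sketch using the post's three keywords; checking the resulting queries against live engines is left to the reader, since ranked results change over time.

```python
from itertools import permutations

# Enumerate every ordering of the three-keyword query from the post.
terms = ["business", "ethics", "simulation"]
queries = [" ".join(p) for p in permutations(terms)]

for q in queries:
    print(q)
# 3! = 6 distinct orderings in all
```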

Analyzing the top ten results in Google, Bing and Yahoo, here's how many different results were obtained when the order was switched (a total of 60 different results per engine is theoretically possible):
14 - Google
15 - Bing
15 - Yahoo
A few other insights are worth mentioning:

Google returned the identical top result no matter the keyword order. The second and third slots were filled consistently by the same two pages with minimal alternation. In all, six returns were common across all possible keyword combinations. Queries that returned the most diverse results were: business ethics simulation, ethics simulation business and ethics business simulation. I'm not sure what to make of this observation, but I thought I'd mention it nonetheless. Any ideas?

Compared to Google, Bing was more varied in its ranking of results. No page was consistently the top result, although five pages appeared in the top ten on all trials. While Bing produced one more unique page than Google, several pages were from the same site. Of greater interest, Bing and Google returned a number of pages not replicated by the other (see below).

Yahoo, like Google, consistently returned the identical top page no matter what the query order. The second return was also identical across all queries, although this page was related to the first, so not entirely a unique return. Again, five of the same results were found with every query. Yahoo did not return Google's top return at all, but both Google and Bing included Yahoo's top result.

All three search engines combined produced a total of 31 unique returns. If I had stopped after entering the first query--business ethics simulation--the three search engines would have yielded 21 different pages. Fifteen additional queries netted only 10 additional, unique pages. Probably not worth the effort.

Pages unique to each search engine:
7 - Google
4 - Bing
9 - Yahoo
What to make of this? The biggest lesson, it seems to me, is that searching different databases is more worthwhile than playing with word order. Without looking past the first page of each, I netted more than twice as many highly ranked results as if I had used Google alone. (Whether the results are all that relevant is a matter for investigation.) By contrast, I netted only 4-5 new pages by sticking with one search engine and varying the keyword order.

Based on the number of unique results, if you're not using Yahoo, you might consider adding it to your list of go-to search engines.

Some differences are obtained by changing the word order, but maybe not enough (in this case) to warrant going through all the permutations. In general, stick with the natural language order of the words. It seems natural to say business ethics simulation. The other forms seem a bit awkward or forced. Since search engines look for words in relationship to one another, and this is the order most people might use when writing about business ethics simulations, it's good enough. I'm sure there are cases you can think of when a particular order works better. If there are, post your reply.

There's one case when order is highly important: when operators are used. The operator modifies the keywords around it, so if placed in the wrong order, the results may be wildly unpredictable. For example: business OR ethics OR simulation (a student favorite when they stumble upon the OR operator).
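The set logic behind that pitfall can be shown with a toy document collection (entirely invented for this example): OR is a union, so containing any one term qualifies a page, while AND is an intersection requiring all of them.

```python
# Toy collection (invented for illustration): each page is represented
# by the set of terms it contains.
docs = {
    "d1": {"business", "ethics", "simulation"},
    "d2": {"business", "news"},
    "d3": {"ethics", "philosophy"},
    "d4": {"simulation", "games"},
    "d5": {"cooking"},
}
terms = {"business", "ethics", "simulation"}

# OR: a page matches if it contains ANY term (non-empty intersection).
or_hits = {d for d, words in docs.items() if words & terms}
# AND: a page matches only if it contains ALL terms (subset test).
and_hits = {d for d, words in docs.items() if terms <= words}

print(len(or_hits), "pages match the OR query")    # -> 4
print(len(and_hits), "pages match the AND query")  # -> 1
```

Three ORed keywords match nearly everything in the collection, which is why that student favorite returns a flood of loosely related results.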

Next time: revisiting the optimal number of keywords.