
AI’s Reality Check: The Importance of Double-Checking Your Digital Assistant

Updated: Jan 16

Approach AI with a blend of curiosity and caution

In a world where AI is becoming increasingly integrated into our daily tools and services, I decided to put these AI systems to the test with a simple yet intriguing challenge: fetching movie-related information. Here’s how they fared.

The setup is simple: I am on an IGN Entertainment page that reviews Jason Statham’s new movie, The Beekeeper. I want these AI tools to give me a summary of the page along with the rating and the author of the review, plus the IMDb and Rotten Tomatoes ratings, if possible.

First, I used Microsoft Edge’s summarization feature. I asked about the movie rating and its author. The AI confidently responded with a rating of 5 and named Alex McLevy as the author.

Plot twist: the actual rating was 6, and the author was Emma Stefansky.

The page says 6, bright and bold, but Copilot AI thinks it’s 5.

The author’s name, Emma Stefansky, is right there on the page, but Copilot thinks it’s Alex McLevy.

Next, I turned to Perplexity AI. Here, the AI threw in a surprising twist by claiming IGN hadn’t rated the movie, and it incorrectly credited the review to the film’s director, David Ayer. A curious mix-up, indeed!

Perplexity claims IGN has not rated the movie

The review of the movie was apparently written by the director himself, David Ayer.

Google Bard was the next contender. It impressively captured the “Okay” sentiment of the review but stumbled on the rating number. However, it scored points by accurately identifying the reviewer’s name.

Bard got the “Okay” part but not the highlighted number 6.

Next came the most popular of them all, the one that started this AI era: OpenAI’s ChatGPT, with the browser plugin. It got pretty much everything right and missed only the rating.

ChatGPT gets almost everything right!

Next, I tried OpenAI’s ChatGPT 4 model, which integrates DALL-E, browsing with Microsoft Bing, and analysis (code interpreter). Surprisingly, this version hit a dead end and couldn’t decipher any of the movie details, suggesting that the issue might lie with Bing’s search capabilities.

It sort of went off the rails here… is Bing to blame?

Finally, for a fun comparison, I used the Ask on Page feature in The Browser Company’s Arc browser, powered by Anthropic’s Claude AI. Impressively, it nailed both the rating and the author with just a question. However, it couldn’t fetch ratings from IMDb or Rotten Tomatoes, likely due to its lack of internet access.

Both the rating and the reviewer info with just a single question

It didn’t get the IMDb and Rotten Tomatoes scores, but that’s a given

While AI systems offer remarkable conveniences, they are far from infallible. My journey through various AI tools revealed a consistent theme: these systems don’t always stick to the facts and often get things wrong.

This realization is a crucial reminder in our tech-savvy era. As we increasingly lean on AI for information and decision-making, we must remain vigilant. These AI tools, from Microsoft Edge’s summarization feature to ChatGPT and others, demonstrated a range of errors — from minor inaccuracies to significant misinformation. Such discrepancies highlight the inherent risks in relying solely on AI for accurate information.

The implications are significant. Whether it’s a movie rating or something more consequential, these tools can mislead, albeit unintentionally. It underscores the importance of cross-verifying information, especially when accuracy is paramount.
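For anyone who wants to make that cross-check a habit, here is a minimal sketch in Python. It is only an illustration, not how I ran the test: the `ReviewFacts` class and the `find_mismatches` function are hypothetical names I made up, and the values are simply the ones reported above, typed in by hand after reading the page myself. Only the three tools that gave concrete values for both fields are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewFacts:
    """Rating and author of a review (hypothetical helper type)."""
    rating: Optional[int]   # None means the tool reported no rating
    author: Optional[str]


def find_mismatches(claimed: ReviewFacts, actual: ReviewFacts) -> list:
    """Return the names of the fields where the AI answer differs from the page."""
    mismatches = []
    if claimed.rating != actual.rating:
        mismatches.append("rating")
    if claimed.author != actual.author:
        mismatches.append("author")
    return mismatches


# What the IGN page actually says, read by a human.
PAGE = ReviewFacts(rating=6, author="Emma Stefansky")

# What each tool answered in the test described above.
ANSWERS = {
    "Edge Copilot": ReviewFacts(rating=5, author="Alex McLevy"),
    "Perplexity": ReviewFacts(rating=None, author="David Ayer"),
    "Arc Ask on Page": ReviewFacts(rating=6, author="Emma Stefansky"),
}

for tool, claimed in ANSWERS.items():
    wrong = find_mismatches(claimed, PAGE)
    status = "OK" if not wrong else "double-check " + ", ".join(wrong)
    print(f"{tool}: {status}")
```

The point of the sketch is that the reference values come from the source itself, not from another AI tool; the comparison is only as trustworthy as that manual read.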

As users, we should approach AI with a blend of curiosity and caution, always ready to double-check its outputs. This mindful approach ensures we harness the benefits of AI while safeguarding ourselves against its limitations and inaccuracies.
