AI’s Reality Check: The Importance of Double-Checking Your Digital Assistant

Vinit Nair
Jan 12, 2024

Updated: Jan 16, 2024

Approach AI with a blend of curiosity and caution

In a world where AI is becoming increasingly integrated into our daily tools and services, I decided to put these AI systems to the test with a simple yet intriguing challenge: fetching movie-related information. Here’s how they fared.

The setup is simple: I am on an IGN Entertainment page that reviews Jason Statham’s new movie, The Beekeeper. I want these AI tools to give me a summary of the page along with the review’s rating and author and, if possible, the IMDb and Rotten Tomatoes ratings.

First, I used Microsoft Edge’s summarization feature. I asked for the review’s rating and its author. The AI confidently responded with a rating of 5 and named Alex McLevy as the author.

Plot twist: the actual rating was 6, and the author was Emma Stefansky.

[Image: The page says 6, bright and bold, but Copilot AI thinks it’s 5.]

[Image: The author’s name, Emma Stefansky, is right there on the page, but Copilot thinks it’s Alex McLevy.]

Next, I turned to Perplexity AI. Here, the AI threw in a surprising twist by claiming IGN.com hadn’t rated the movie and incorrectly credited the review to the film’s director, David Ayer. A curious mix-up, indeed!

[Image: Perplexity claims IGN has not rated the movie.]

[Image: The review of the movie was apparently written by the director himself, David Ayer.]

Google Bard was the next contender. It impressively captured the “Okay” sentiment of the review but stumbled on the rating number. However, it scored points by accurately identifying the reviewer’s name.

[Image: Bard got the “Okay” part but not the highlighted number 6.]

Next came the most popular of them all, the one that started this AI era: OpenAI’s ChatGPT, with the browser plugin. It got nearly everything right and missed only the IMDb rating.

Next, I tried OpenAI’s ChatGPT 4 model, which integrates DALL-E, browsing with Microsoft Bing, and analysis (Code Interpreter). Surprisingly, this version hit a dead end and couldn’t decipher any of the movie details, suggesting that the issue might lie with Bing’s search capabilities.

[Image: It sort of went off the rails here. Is Bing to blame?]

Finally, for a fun comparison, I used the Ask on Page feature in The Browser Company’s Arc browser, powered by Anthropic’s Claude AI. Impressively, it nailed both the rating and the author with just a single question. However, it couldn’t fetch ratings from IMDb or Rotten Tomatoes, likely due to its lack of internet access.

[Image: Both the rating and the reviewer info with just a single question.]

[Image: It didn’t get the IMDb and Rotten Tomatoes scores, but that’s a given.]

While AI systems offer remarkable conveniences, they are far from infallible. My journey through various AI tools revealed a consistent theme: these systems don’t always stick to the facts and often get things wrong.

This realization is a crucial reminder in our tech-savvy era. As we increasingly lean on AI for information and decision-making, we must remain vigilant. These AI tools, from Microsoft Edge’s summarization feature to ChatGPT and others, demonstrated a range of errors — from minor inaccuracies to significant misinformation. Such discrepancies highlight the inherent risks in relying solely on AI for accurate information.

The implications are significant. Whether it’s a movie rating or something more consequential, these tools can mislead, albeit unintentionally. It underscores the importance of cross-verifying information, especially when accuracy is paramount.

As users, we should approach AI with a blend of curiosity and caution, always ready to double-check its outputs. This mindful approach ensures we harness the benefits of AI while safeguarding ourselves against its limitations and inaccuracies.
