Apple researchers: LLMs don’t do formal reasoning

A big new paper from several AI researchers at Apple has been making the rounds. It concludes that large language models (LLMs) don’t do formal reasoning and can be easily distracted by minor, irrelevant information. It’s a pretty dense 20+ page paper, but Gary Marcus has an excellent and still in-depth summary on his site.

Marcus’ key takeaway from Apple’s AI research:

“We found no evidence of formal reasoning in language models… Their behavior is better explained by sophisticated pattern matching—so fragile, in fact, that changing names can alter results by ~10%!”

Based on both his expertise and Apple’s findings, Marcus makes a definitive statement about LLMs’ reasoning capabilities:

There is just no way can you build reliable agents on this foundation, where changing a word or two in irrelevant ways or adding a few bit of irrelevant info can give you a different answer.

These findings align with my experience. While getting high-quality, accurate responses from LLMs is possible, it often requires careful prompting and iteration. LLMs are excellent tools that I use every day, and I’m actively helping build AI products at my day job, but like most tools they have their limitations. What’s particularly noteworthy to me is that these same limitations were documented back in 2019, and while LLMs have made remarkable progress in many areas, their fundamental reasoning capabilities haven’t improved at nearly the same pace.

So what does all this mean? Does it mean AI tools are dead? Not at all. I am a big proponent of human-in-the-loop AI solutions that leverage the strengths of AI and iteratively improve with human review and intervention. With human oversight, model monitoring, and great AI product designers (of course), we can build powerful AI tools that help us do the work we all need to get done every day, even if those tools can’t do formal reasoning.

The Disciplines Companies Need to Get the Most Out of Gen AI (cough, design, cough)

Don’t want to just throw money away investing in Gen AI? The HBR article The 6 Disciplines Companies Need to Get the Most Out of Gen AI says your company should focus on behavioral change, controlled experimentation, measurement of business value, data management, human capital development, and systems thinking.

Know what most of those sound like to me? Design. In other words, your company should invest in design.

Thoughts on How to Speak Machine

Picked up John Maeda’s How to Speak Machine: Computational Thinking for the Rest of Us and read through it pretty quickly, but I’m not sure it was a worthwhile read for me, and I don’t think I’d recommend it to anyone already familiar with modern software design.

Maeda spends most of the book’s 200 pages explaining the basics and extolling the value of UX research, product design, agile delivery, and iterative development, and comparatively few on the book’s actual premise.

Both Amazon and Maeda frame the book as a way for designers to understand “the complex world of AI and machine learning”, but while it hints at AI’s transformative potential, these mentions are more speculative than practical. There really aren’t any actionable insights or detailed explorations of how AI can concretely impact design work today.


Zelda to star in new The Legend of Zelda: Echoes of Wisdom game

Finally, Zelda herself will be the main protagonist in a mainline game in the series named after her (not counting the CD-i games). Yesterday, Nintendo surprise-announced The Legend of Zelda: Echoes of Wisdom, coming to Switch as soon as this September.

Looks pretty darn great. Like a cross between the Link’s Awakening remake and some of the open-ended puzzle-solving ideas from Tears of the Kingdom.

While I like that the gameplay isn’t just a palette-swapped, Zelda-for-Link hack and slash, I do hope that Zelda is occasionally able to solve some of her problems by murdering a bunch of moblins with a sword. Pretty excited for this one; it might be one last great hurrah for the Switch before the next console comes out in 2025.

Use AI to highlight and summarize podcasts with Snipd

Been enjoying using Snipd lately for capturing podcast notes/highlights.

Anytime there’s something I want to capture, I mark it, and their AI generates transcripts and highlights. Those highlights are then automatically exported to my PKM (Obsidian), making them easy to find for future reference. Worth checking out if you listen to a lot of podcasts, especially if you’re already using some sort of system to save highlights.

Check out this example of a snippet from a recent episode of Cal Newport’s podcast:

I still use Overcast for all my tech/sports/entertainment podcast listening, but having a separate app for my business/design/productivity shows that helps me remember what I learn has been really useful.

In the path of totality

Wasn’t sure what all the hubbub was about beforehand, but super glad I took the time to go outside and observe what turned out to be a pretty incredible sight. The actual 3 minutes (minus the 2 minutes blocked by the clouds) during total occlusion were pretty remarkable.

It was extra special getting to see it next door to my kids’ elementary school and hearing all the chatter, screaming, and squealing the kids were doing.

Thoughts on Deep Work

As a person who is constantly multitasking, I’m worried I’ve broken my brain. Well, at least that’s my takeaway after reading Cal Newport’s book Deep Work, which is about the importance of deep, focused work at a time when it’s easier than ever to be in a constant state of distraction.

I’ve been listening to Newport’s podcast for a bit and reading his newsletter, which often touches on similar topics, but Deep Work is where he first outlined many of his ideas on focus, distraction, and the huge opportunity for knowledge workers who go the other direction to get ahead in their lives and careers:

The Deep Work Hypothesis: The ability to perform deep work is becoming increasingly rare at exactly the same time it is becoming increasingly valuable in our economy. As a consequence, the few who cultivate this skill, and then make it the core of their working life, will thrive. (Page 14)

The book is structured around the following main ideas:

  1. The Value of Deep Work: Newport posits that deep work is crucial for mastering complicated information and producing better results in less time. This skill is becoming scarcer in our increasingly distracted world but is more valuable in the knowledge economy.
  2. Working Deeply: Newport offers strategies for cultivating a deep work habit. This includes working with greater intensity and creating rituals and routines that foster the ability to concentrate deeply. Key suggestions include minimizing distractions, scheduling deep work blocks, and adopting a work philosophy that accommodates deep work. (More on those philosophies in a bit.)
  3. Enhancing Focus: In the last several sections of the book, Newport outlines further strategies to cultivate deep work. He emphasizes the importance of embracing boredom to train the brain for focus, advises limiting social media to minimize distractions, and recommends reducing shallow tasks through delegation and careful scheduling. Together, these practices aim to create a productive environment where deep, concentrated work prevails, ensuring that valuable, skill-enhancing tasks receive the attention they deserve.

As a person already bought into the idea that deep work is valuable and that meetings and emails are the devil, I got the most value out of the book’s actionable advice and strategies that demonstrate how I can implement deep work principles to improve my productivity, satisfaction, and success.

Texas WR Xavier Worthy is fastest man in NFL Combine history

Texas Longhorns wide receiver Xavier Worthy is the fastest man in NFL Combine history after running a record 4.21-second 40-yard dash yesterday.

What gives me goosebumps is the way the crowd gasps and then cheers when the originally announced time of 4.22 comes up. So cool. Just a human doing something better than any other human before, that’s all. Watch the NFL Network simulcam version below, with Worthy’s run overlaid on John Ross’ former record run, to see Worthy’s incredible initial acceleration:

Truly insane speed.