In this newsletter:
- Claude Opus 4.7 checks its own work before responding
- Perplexity Personal Computer automates your to-do list
- Canva AI assistant creates designs using tools and layers
Plus, you’ll find new AI tools and this week’s top AI news headlines!
⚡ Anthropic's Claude Opus 4.7 Checks Its Own Work Before Responding
Anthropic released Claude Opus 4.7, its most powerful publicly available model, outperforming OpenAI's GPT-5.4 and Google's Gemini 3.1 Pro on key benchmarks, including agentic coding, tool use, computer use, and financial analysis.
What's new:
- Available today across Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry
- API pricing unchanged at $5/$25 per million tokens
- Processes images up to 2,576 pixels on the longest edge (roughly 3.75 megapixels), a threefold increase from previous versions
- New "effort" parameter with xhigh (extra high) setting between high and max for granular control over depth of reasoning
- Task budgets in public beta let developers set a hard ceiling on token spend for autonomous agents
- Updated tokenizer improves text processing efficiency, but the same input can map to more tokens: roughly 1.0–1.35x the previous count, depending on the content
- Claude Code gets a new /ultrareview command that flags subtle design flaws and logic gaps like a senior human reviewer
- Auto mode extended to Max plan users for autonomous decisions without constant permission prompts
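To get a feel for what the tokenizer change could mean for a bill, here is a minimal Python sketch. It assumes the $5/$25 pricing refers to input and output tokens respectively, and that the 1.0–1.35x range applies to input tokens only. The function name and the example workload are hypothetical, not something Anthropic published.

```python
# Illustrative only: prices come from the bullet list above; the rest is assumed.
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (assumed meaning of "$5")
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (assumed meaning of "$25")


def estimate_cost(input_tokens: int, output_tokens: int,
                  tokenizer_multiplier: float = 1.0) -> float:
    """Return the estimated USD cost of a workload.

    tokenizer_multiplier is a hypothetical knob that scales the input
    token count to model the reported 1.0-1.35x change.
    """
    if not 1.0 <= tokenizer_multiplier <= 1.35:
        raise ValueError("multiplier outside the reported 1.0-1.35x range")
    scaled_input = input_tokens * tokenizer_multiplier
    total = scaled_input * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 200k output tokens.
    best = estimate_cost(2_000_000, 200_000, 1.0)
    worst = estimate_cost(2_000_000, 200_000, 1.35)
    print(f"best case:  ${best:.2f}")
    print(f"worst case: ${worst:.2f}")
```

So "pricing unchanged" does not necessarily mean "bill unchanged": input-heavy workloads are the ones most exposed to the upper end of the range.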
Performance highlights:
- GDPVal-AA knowledge work: Elo score of 1753, beating GPT-5.4 (1674) and Gemini 3.1 Pro (1314)
- SWE-bench Pro agentic coding: 64.3% task resolution vs 53.4% for Opus 4.6
- GPQA Diamond graduate-level reasoning: 94.2%
- arXiv visual reasoning with tools: 91.0% vs 84.7% for Opus 4.6
- XBOW visual-acuity tests: 98.5% vs 54.5% previously
- BigLaw Bench: 90.9%
- CyberGym vulnerability reproduction: 73.1%
- GPT-5.4 still leads in agentic search (89.3% vs 79.3%), multilingual Q&A, and raw terminal-based coding
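For a sense of what the GDPVal-AA Elo gaps would mean head to head, here is a rough Python sketch using the standard Elo expected-score formula. Whether GDPVal-AA uses the conventional 400-point scale is an assumption, so treat the results as illustrative rather than as official figures.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (0.5 means evenly matched).

    The 400-point scale is the chess convention; it is assumed here,
    not confirmed for GDPVal-AA.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings as reported in the list above.
opus, gpt, gemini = 1753, 1674, 1314

print(f"Opus 4.7 vs GPT-5.4:        {elo_expected_score(opus, gpt):.2f}")
print(f"Opus 4.7 vs Gemini 3.1 Pro: {elo_expected_score(opus, gemini):.2f}")
```

Under that assumption, the 79-point lead over GPT-5.4 works out to roughly a 61% expected preference rate, while the 439-point lead over Gemini 3.1 Pro comes to roughly 93%.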
How it works differently:
- Devises its own verification steps before reporting a task complete
- Internal test example: built a Rust-based text-to-speech engine from scratch, then independently fed generated audio through a separate speech recognizer to verify output against a Python reference
- Follows instructions literally, requiring legacy prompt libraries to be re-tuned
- Older models "read between the lines"; Opus 4.7 executes the exact text provided
Enterprise feedback:
- Intuit VP of Technology: the ability to "catch its own logical faults during planning phase" is a game-changer for velocity
- Replit President: higher quality at lower cost for log analysis and bug hunting, "feels like a better coworker"
- Cognition CEO: can work coherently "for hours" and pushes through difficult problems that cause models to stall
- Notion AI Lead: 14% improvement in multi-step workflows, 66% reduction in tool-calling errors, feels like a "true teammate"
- Factory Droids: carries work through to validation steps rather than "stopping halfway"
- Harvey Head of Applied Research: "noticeably smarter handling of ambiguous document editing tasks"
Cybersecurity controls:
- Systems designed to detect and block requests suggesting high-risk cyberattacks like automated vulnerability exploitation
- New Cyber Verification Program allows vulnerability researchers, penetration testers, and red-teamers to apply for access for defensive purposes
- More powerful Mythos model remains restricted to a small number of external enterprise partners for cybersecurity testing through Project Glasswing
Why it matters:
The razor-thin margins between Opus 4.7, GPT-5.4, and Gemini 3.1 Pro signal that the frontier model race has plateaued, with improvements measured in single percentage points, forcing companies to compete on reliability rather than raw intelligence.
Self-verification addresses the hallucination problem that kept enterprises from trusting AI for autonomous work, shifting AI from a creative assistant to a tool that can run unsupervised.
👀 Read more about the Claude Opus 4.7 updates!
💻 Perplexity's Personal Computer Automates Your To-Do List
Perplexity released Personal Computer, bringing the multi-model orchestration of Perplexity Computer to your local machine to work across files, native apps, connectors, and the web.
What's new:
- Rolling out today to Perplexity Max subscribers, prioritizing users on the waitlist
- Works across local files, native applications, connectors, and web in one system
- Available for Mac, works best on Mac mini for 24/7 availability
- Press both CMD keys in Notes to activate Personal Computer
- Reads your Notes to-do list, reasons about how to accomplish each task, then works across local files, iMessage, email, connected apps, and the web to complete it
- Sorts messy Downloads folder into clear project folders with sensible names and structure
- Compares local files against information on the web and uses both to help make decisions or complete tasks
- Voice activation to carry out actions on Mac
- Lets you start tasks from your phone and manage them on the go
Security and control:
- Files created in a secure sandbox
- Actions are auditable and reversible
- Designed to keep the user in the loop on sensitive actions
- You can see what it's doing, step in when needed, and stay in control of important decisions
Why it matters:
Perplexity is turning your Mac from a tool you control manually into an AI assistant that handles tasks on its own across your files, apps, and the web, competing directly with Apple's upcoming AI features.
The Mac mini setup means it runs 24/7 and completes work while you're away, but giving AI access to your messages, emails, and personal files raises the question of whether people will trust a third-party company with that level of access to their private information.
👀 Read more about Perplexity's Personal Computer features!
🎨 Canva's AI Assistant Now Creates Designs by Calling Tools and Using Layers
Canva upgraded its AI assistant so users can describe what they want; the assistant then calls the required tools and creates editable, layered designs with multiple options.
What's new:
- Launching in research preview this week, rolling out to all users in the coming weeks
- Uses Canva's AI model to create editable designs from text prompts
- The bot calls required tools automatically and provides multiple design options
- Uses layers to make designs, giving flexibility to tweak different aspects
- New integrations with Slack, Gmail, Google Drive, Calendar, and Zoom let the AI bot build context by reading emails, conversations, files, and meeting data
- Web research skills let the AI bot browse the internet to complete tasks
- Scheduling feature for repeatable tasks to run in the background (creates a draft for review and posting)
- AI code generator now imports HTML
- Generates spreadsheets from text prompts that describe what you want
Performance improvements:
- Lucid Origin image-generation model now 5x faster and 30x cheaper
- I2V image-to-video model now 7x faster and 17x cheaper
Competition:
- Adobe launched the Firefly AI assistant this week, which uses the company's apps to do tasks
- Figma added AI agent support last month with the MCP server
- Canva integrates with Anthropic, Google, and OpenAI for agentic workflows
Why it matters:
Canva is positioning itself as the final destination for editing and publishing AI-generated content, competing with Adobe and Figma for control of the design workflow as AI tools proliferate.
The Slack, Gmail, and Google Drive integrations turn Canva from a design tool into a context-aware assistant that reads your emails and meetings to create relevant content, though giving a design platform access to your communications raises privacy questions about how that data gets used.
👀 Read more about Canva's AI Assistant upgrades!
My Latest LinkedIn & X/Twitter Posts:
- How top operators are finally getting the recognition they deserve (view post)
- How to build full AI agents in n8n with one prompt (view post)
- 15 ChatGPT prompts to structure research from idea to output (view post)
- 5 ChatGPT prompt frameworks to get better results (view post)
- 15 powerful ChatGPT prompts to supercharge your workflow (view post)
In partnership with Kit:
As a creator, your time should be spent doing what you love, not juggling a dozen tools just to run your business.
Kit gives you everything you need in one place:
✅ Build and grow your email list (I use Kit for my newsletter)
✅ Easily monetize with paid newsletters and digital products
✅ Automate your emails with triggers and custom workflows
✅ Track what’s working and optimize with powerful insights
It’s not just email. It’s your entire creator business, simplified.
Join a thriving community of successful creators.
Use Kit for Free — and Start Building Smarter
New AI Tools to Boost Your Productivity:
- VM0: AI teammate that works across tools for research, outreach, and reports.
- Metadata Reactor: Generates YouTube titles, tags, and descriptions from thumbnails.
- Waikay: Analyzes how AI tools represent your brand and finds SEO gaps.
- Geekflare AI Chat: Accesses top AI models in one shared workspace.
- Calyo: Finds matching creators and automates outreach campaigns.
- WhyIQ: Simulates visitors to find landing page conversion issues.
- Novella: Sidebar with AI tools for video creation workflows.
- DetectMyAI: Detects AI-written and AI-paraphrased text.
- Aisa: Certifies AI skills through a short conversation test.
- SayTXT: Converts books, PDFs, and articles into audio.
- APIClaw: Provides structured Amazon product data for AI agents.
- Stageflow: Generates Etsy product photos from uploads.
- Fello AI: Mobile AI chatbot app with multiple model access.
- AI Vocal Remover: Removes vocals or instrumentals from audio tracks.
- Claude: Opus 4.7 model for advanced reasoning, coding, and long-context work.
This Week's Top AI News Headlines:
- Anthropic Launches Claude Design, an AI Prototyping Tool that Turns Text Prompts into App Mockups and Challenges Figma’s Design Workflow (View Article)
- Salesforce Launches Headless 360 to Turn its Customer Data Platform into Backend Infrastructure for Autonomous AI Agents (View Article)
- OpenAI Debuts GPT-Rosalind, a Limited-Access Life Sciences AI Model, and Expands Codex GitHub Plugin With Broader Developer Integrations (View Article)
- Luma Launches AI-Powered Production Studio with Faith-Based Media Company Wonder Project to Create Film and TV Content Faster (View Article)
- GPT-5.4 Cyber Signals OpenAI’s Push into Next-Generation Security AI Designed to Detect Threats and Strengthen Cyber Defense (View Article)
- Google Adds Nano Banana-Powered Image Generation to Gemini’s Personal Intelligence Features, Expanding Personalized AI Creativity Tools (View Article)
- Physical Intelligence, a Robotics Startup, Says its New AI Robot Brain Can Solve Tasks it Was Never Specifically Trained to Perform (View Article)
- OpenAI Upgrades Codex With Desktop Control, Letting its AI Coding Agent Use Apps on Your Computer to Challenge Anthropic’s Claude Code (View Article)
- Anthropic’s Chief Product Officer Leaves Figma Board After Reports He Plans to Launch Claude Design Tool that Could Compete With Figma (View Article)
- Google Adds Side-by-Side Web Browsing to AI Mode, Letting Users Explore Search Results While Chatting with Gemini AI (View Article)
- Roblox Expands its AI Assistant With Agentic Tools that Help Creators Plan, Build, and Test Games Automatically (View Article)
- Google Blocked More Ads but Banned Fewer Advertisers as AI Reshaped How the Company Detects and Enforces Policy Violations (View Article)
- Runway CEO Says AI Could Help Hollywood Make 50 Lower-Cost Films Instead of Betting on One $100 Million Blockbuster (View Article)
- Meta Raises Quest 3 and Quest 3S Prices Due to RAM Shortage, Increasing Costs of its VR Headsets Amid Supply Constraints (View Article)
- “Tokenmaxxing” Trend is Making Developers Less Productive as Excessive AI Prompting Creates More Work Than it Saves (View Article)
Work With Me:
If you enjoy this newsletter, please forward it to your friends and colleagues.
Follow me on LinkedIn and X/Twitter to see my latest posts.
Have a wonderful week!
Andrew Bolis