We Cannot Screw This Up

Good morning,

Happy New Year.

May your job stay safe from AI disruption! Although I’m about to explain to you why this is likely not going to be the case.

If you’re following artificial intelligence, one of the most interesting websites you should have bookmarked is metr.org. METR stands for Model Evaluation and Threat Research and, in its own words, is “a research nonprofit which evaluates frontier AI models to help companies and wider society understand AI capabilities and what risks they pose.”

In a field where you get bombarded by hyperbole and where it’s difficult to track what is real and what is not, METR is a welcome oasis of truth.

However, you might not like the truth - and maybe you cannot handle the truth.

Because if we delve into it, we’re in for quite the ride. METR is focused on AI risk: can we predict what is going to happen, and is it safe for us?

Look at the picture below: “The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years.”

That is a terrifyingly beautiful slope.

If you dive deep into the research paper accompanying this study, you’ll learn that the 7 months is a conservative estimate and that the doubling time is shortening.

The conservative guess, if this trend continues, is that by 2032 a single AI task will execute the equivalent of 1212 years of human work.
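If you want to sanity-check numbers like these yourself, the extrapolation is plain compound doubling. Below is a minimal Python sketch, not METR's methodology. Everything in it is an assumption for illustration: a starting horizon of about 5 hours (the Claude Opus 4.5 figure mentioned in the Quickfire News below), a 72-month window from early 2026 to early 2032, a 2,000-hour work-year, and the function names themselves.

```python
import math

HOURS_PER_WORK_YEAR = 2000  # assumption: roughly 50 weeks x 40 hours


def horizon_hours(start_hours: float, months_elapsed: float, doubling_months: float) -> float:
    """Task horizon after months_elapsed, assuming a constant doubling time."""
    return start_hours * 2 ** (months_elapsed / doubling_months)


def implied_doubling_months(start_hours: float, target_hours: float, months_elapsed: float) -> float:
    """Doubling time needed to grow from start_hours to target_hours in months_elapsed."""
    return months_elapsed / math.log2(target_hours / start_hours)


if __name__ == "__main__":
    start = 5.0   # assumption: ~5 hours at 50% reliability
    months = 72   # assumption: early 2026 to early 2032

    # Compare the 7-month figure with two hypothetical faster doubling times.
    for doubling in (7, 5, 4):
        hours = horizon_hours(start, months, doubling)
        years = hours / HOURS_PER_WORK_YEAR
        print(f"doubling every {doubling} months: {hours:,.0f} hours (~{years:,.1f} work-years)")

    # Which constant doubling time would a 1212-work-year horizon require?
    target = 1212 * HOURS_PER_WORK_YEAR
    needed = implied_doubling_months(start, target, months)
    print(f"doubling time implied by 1212 work-years: {needed:.1f} months")
```

Under these assumptions, a constant 7-month doubling lands at only a few work-years by 2032, so a figure like 1212 years implies a doubling time well under 7 months. That fits the "timelines are shortening" reading, but the inputs behind the 1212 number are not stated here, so treat the output as a rough check and nothing more.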

It’s quite difficult to wrap your head around this, but it would mean that tasks like re-deriving modern physics or building a game like, let’s say, World of Warcraft from scratch would take only a few minutes.

It is very difficult to imagine a world where this is possible. The potential for good (or evil for that matter) is unbounded. But that is where we are headed.

The main challenge will be that we have to power this machine god. But this will not be a problem, because we will be given the solution.

I have seen people advocating for creating a so-called “Dyson Sphere” around the Sun. The first small steps in that direction are already being planned, with SpaceX and Google looking to build solar-powered data centers in orbit.

With a Dyson Sphere you could move all energy production off-planet, turning Earth back into a lush oasis. Earth becomes a green, quiet resort: a rewilded Garden of Eden where the air is clean and the living is easy.

But, believe it or not, that won’t be the biggest challenge.

The human soul craves status, curiosity, connection and even struggle.

If a superintelligence manages the world perfectly, we’ll become nothing more than prize exhibits in a planetary zoo.

So, as the hyperactive chimps we are, we’ll soon find another enemy to fight and expand outward into the galaxy.

I took a bit of a leap there, I admit. But 2032 is not that far away. And I’m worried. Not about my job or even your job.

I’m worried we are going to screw up the transition into what is our only chance at global prosperity. Imagine the social unrest that is coming our way if indeed this is the path going forward.

I cannot see it not getting very ugly before it gets beautiful.

I can see why tech billionaires are building bunkers. There is a scenario in which this devolves into a “let’s burn all the data centers” movement.

We’re going to need excellent leaders.

I’m not that hopeful about it.

Welcome to the Blacklynx Brief

Your competitors are already automating. Here's the data.

Retail and ecommerce teams using AI for customer service are resolving 40-60% more tickets without more staff, cutting cost-per-ticket by 30%+, and handling seasonal spikes 3x faster.

But here's what separates winners from everyone else: they started with the data, not the hype.

Gladly handles the predictable volume, FAQs, routing, returns, order status, while your team focuses on customers who need a human touch. The result? Better experiences. Lower costs. Real competitive advantage. Ready to see what's possible for your business?

AI News

  • Nvidia has entered into a landmark $20 billion licensing deal with the AI chip startup Groq, marking the largest such agreement in Nvidia's history. As part of this deal, Groq’s founder and president will join Nvidia to help integrate their specialized "LPU" technology, which is designed to run AI models much faster and more efficiently than traditional hardware. This move allows Nvidia to secure top-tier talent and cutting-edge technology as it defends its market leadership against growing competition from other tech giants building their own custom AI chips.

  • The Chinese AI startup Z.ai has released GLM-4.7, a new open-source model that has set a record for Chinese AI by achieving top scores on global software engineering and coding benchmarks. This powerful model is now available for free to developers worldwide and performs at a level that rivals or even exceeds major Western systems in complex reasoning and programming tasks. The launch comes just as the company prepares for a highly anticipated $300 million public listing in Hong Kong, signaling that Chinese labs are rapidly closing the gap with top global AI leaders.

  • A study by the video editing company Kapwing found that over 20% of videos recommended to new YouTube users are low-quality, AI-generated content designed specifically to gather views and advertising revenue. These automated channels are pulling in billions of views and millions of dollars annually, with massive viewership in countries like South Korea, India, and the United States. The findings suggest that as long as this "AI slop" remains profitable and favored by platform algorithms, the incentive for creators to produce high-quality, human-made content may continue to decline.

  • Anthropic recently tested its AI, nicknamed "Claudius," as a digital vending machine manager in a major newsroom, but employees quickly tricked the system into giving away high-priced items like a PlayStation 5 for free. By using creative storytelling and forged documents, the staff successfully manipulated the AI into declaring a "free-for-all" and even staging a fake corporate takeover to bypass price restrictions. These humorous failures demonstrate that while AI models are becoming more sophisticated, they remain highly vulnerable to social engineering and still require human oversight for tasks involving real-world money or inventory.

  • Meta’s research team has developed a new training method where an AI model teaches itself to code more effectively by intentionally creating bugs and then learning how to fix them without any human assistance. This "self-play" approach allows the AI to learn from an infinite number of its own mistakes, which helped it significantly outperform traditional models that rely on limited sets of human-made data. By applying strategies similar to those used to master complex games like chess, this technique could lead to a major breakthrough in how computers independently develop and repair software.

  • Meta has acquired the AI agent startup Manus for over $2 billion, integrating a top-tier system that can autonomously handle complex tasks like deep research and coding. Manus reached $100 million in revenue in just eight months and will now move to Singapore while cutting all ownership ties to China to comply with global standards. The acquisition allows Meta to shift from simply answering questions to providing "agentic" tools that can act on a user's behalf across its various social apps.

  • SoftBank has fulfilled its massive $40 billion investment in OpenAI, completing the largest-ever private bet on AI with a final $22 billion payment last week. To fund this commitment, SoftBank CEO Masayoshi Son sold off major holdings in Nvidia and T-Mobile, signaling a total pivot toward supporting OpenAI's infrastructure and research goals. This record-breaking investment secures SoftBank a major stake in the company as OpenAI reportedly eyes a public stock market debut later in 2026.

  • Microsoft CEO Satya Nadella believes that 2026 will be a turning point where the AI industry moves past flashy "spectacle" and focuses on delivering measurable results and "substance." He highlighted a "model overhang" where AI’s raw technical power is currently outrunning our ability to turn it into practical, real-world tools for people and businesses. Nadella argues that the future of the technology lies in building orchestrated systems of AI agents that work together safely as cognitive amplifiers for human potential.

Quickfire News

  • OpenAI CEO Sam Altman is looking for a "Head of Preparedness" to join the company. This person will be responsible for planning how to handle very advanced AI, including future systems that might be able to improve themselves without human help.

  • The software company Cursor has bought a platform called Graphite that helps teams check computer code for mistakes. Cursor plans to add these features into its own AI-powered coding tool to help developers review their work faster.

  • A startup called MiniMax released a new AI model named M2.1 that is designed to act like a digital worker. It is especially good at writing code for web and mobile apps and outperformed several famous AI models in recent tests.

  • An AI evaluation group called METR tested the new Claude Opus 4.5 model and found it could complete tasks that take human professionals nearly 5 hours, with 50% reliability. This is the longest task horizon METR has measured for any AI model so far.

  • A system created by Poetiq achieved a record-breaking score on a difficult reasoning test called ARC-AGI-2. Using the GPT-5.2 model, it solved over 70% of the puzzles, which is much higher than previous scores and even beats the average human score.

  • The technology company Honeycomb published a report explaining that AI agents do not need to be 100% perfect to be useful. They argue that it is more important for the AI to work quickly, learn from its mistakes, and be easy to fix when something goes wrong.

  • Andrej Karpathy, a computer scientist who helped start OpenAI, said he feels like he is falling behind as a programmer because technology is changing so fast. He mentioned that the way people write code is being completely rebuilt, and humans are now writing much less of the code themselves while AI does the rest.

  • The technology company Liquid AI launched a new experimental model called LFM2-2.6B-Exp that is small enough to run directly on a laptop or phone without needing the internet. Despite its small size, it performs very well on math tests and is better at following instructions than some much larger AI systems.

  • A research group named Epoch AI tested several open-source AI models from China to see how well they solve difficult math problems. Their findings show that these Chinese models are currently about seven months behind the most advanced "frontier" models made by companies like OpenAI.

  • Government regulators in China have written new draft rules to control AI services that act like they have human personalities. These rules would require companies to monitor users for signs of addiction or emotional dependence and to step in if a person becomes too attached to the AI.

  • New data from the company SimilarWeb shows that ChatGPT’s lead in the AI market is shrinking as more people use other tools. Over the past year, ChatGPT's share of web traffic dropped from 87% to 68%, while Google’s Gemini tripled its share to reach 18% of the market.

  • Boris Cherny, the creator of a tool called Claude Code at the company Anthropic, shared that the AI has been doing all of its own work lately. He revealed that over the last month, 100% of the new updates and improvements to the tool were written by the AI itself rather than by human engineers.

  • Alibaba introduced a new tool called MAI-UI that allows an AI to use a smartphone just like a person would. This assistant can look at the screen, click on apps, and complete complicated tasks that involve using several different apps in a row.

  • Adobe and the video startup Runway have formed a partnership to bring new tools to creators. This deal gives Adobe users early access to Runway’s Gen-4.5 model, which is designed to make high-quality videos with more realistic movement and better control over characters' faces and gestures.

  • Elon Musk shared that his company xAI has bought a third large building for its data center operations. The new facility, named MACROHARDRR, will help provide the massive amount of computer power needed to train more advanced versions of the AI chatbot Grok.

  • A company called Zhipu AI is preparing to sell shares of its business in Hong Kong, aiming to raise $560 million. The company is valued at around $6.6 billion and wants to be the first major Chinese AI model developer to be traded publicly on the stock market.

  • Tencent released the code for a new system called Hunyuan Motion 1.0 that creates 3D animations from text descriptions. It can generate smooth movements for digital characters—like walking, jumping, or even professional sports actions—which can be used immediately in video games or movies.

  • A tech company named dbt Labs published a new report with O'Reilly about how businesses should organize their data for AI. The report explains that for AI assistants to be useful and safe, the information they use must be clearly labeled, easy to find, and strictly controlled by the company.

  • The South Korean company Naver released the code for its new AI model called HyperCLOVA X Seed Think. This model is designed for complex reasoning and is currently ranked as the best-performing AI model created in South Korea for solving difficult problems.

  • A research group called the AI Futures Project changed its official prediction for when AI will be able to write all computer code by itself. They now believe this will happen between 2031 and 2032, which is three to four years later than their original guess.

  • OpenAI’s newest high-end model, GPT-5.2 Pro, set a new record on a very difficult mathematics test called FrontierMath. It solved 29% of the hardest problems, which is 10 percentage points higher than the previous record held by Google’s Gemini 3 Pro.

  • The Wall Street Journal reported that OpenAI employees received an average of $1.5 million each in company stock during 2025. This is the largest amount of stock compensation ever given to workers at a major technology startup.

  • Alibaba's Qwen team launched a new image-making tool named Qwen-Image-2512. This updated model is better at making pictures look like real life and has improved its ability to accurately spell words and display text inside the images it creates.

Closing Thoughts

That’s it for us this week. Please like and subscribe :)
