AI ‘godfather’ Yoshua Bengio joins British project to prevent AI disasters

Safeguarded AI’s goal is to create AI systems that can offer quantitative assurances, such as risk scores, about their impact on the real world, says David “davidad” Dalrymple, program director for Safeguarded AI at ARIA. The intention is to supplement human testing with a mathematical analysis of the damage potential of new systems.

The project aims to build safety mechanisms into AI systems by combining scientific models of the world, which are essentially simulations of reality, with mathematical proofs. These proofs would include explanations of the AI’s work, and humans would be tasked with verifying that the AI model’s safety checks are correct.
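As a loose illustration of the kind of guarantee involved (not drawn from the project itself), here is a toy sketch in Lean 4 of a machine-checked safety property: a theorem stating that a hypothetical controller’s output can never exceed a fixed safe limit. The names `safeLimit`, `clampedOutput`, and `output_within_limit` are all invented for this example.

```lean
-- Toy illustration only: a "quantitative assurance" expressed as a
-- machine-checked theorem, not code from the Safeguarded AI program.

def safeLimit : Nat := 100

-- A controller output that is clamped before it reaches the real world.
def clampedOutput (raw : Nat) : Nat :=
  if raw ≤ safeLimit then raw else safeLimit

-- The proof a human (or a proof checker) can verify: for every possible
-- raw output, the clamped output never exceeds the safe limit.
theorem output_within_limit (raw : Nat) : clampedOutput raw ≤ safeLimit := by
  unfold clampedOutput
  split
  · assumption
  · exact Nat.le_refl safeLimit
```

The point of such a proof is that it covers every possible input, whereas testing can only ever sample a finite number of cases.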

Bengio says he wants to help ensure that future AI systems can’t cause serious harm.

“Right now we’re hurtling toward a fog beyond which there may be an abyss,” he says. “We don’t know how far away the precipice is, or whether there is one at all, so it could be years, decades, and we don’t know how severe it could be… We need to build tools to clear that fog and make sure we don’t fall into the abyss, if there is one.”

Tech companies have no way to provide mathematical guarantees that AI systems will behave as programmed, he adds, and this unreliability could lead to catastrophic consequences.

Dalrymple and Bengio argue that current techniques for mitigating the risks of advanced AI systems – such as red-teaming, in which humans probe AI systems for flaws – have serious limitations and cannot be relied upon to ensure that critical systems don’t go off track.

Instead, they hope the program will yield new ways to secure AI systems that rely less on human effort and more on mathematical certainty. The vision is to build an AI “gatekeeper” tasked with understanding and reducing the safety risks of other AI agents. This gatekeeper would ensure that AI agents operating in critical sectors, such as transport or energy systems, behave the way we want them to. The aim is to collaborate with companies early on to understand how AI safety mechanisms could be useful for different sectors, Dalrymple says.

The complexity of advanced systems means we have no choice but to use AI to protect AI, Bengio argues. “That’s the only way, because at some point these AIs get too complicated. Even the ones we have now, we can’t really break down their responses into human-comprehensible sequences of reasoning steps,” he says.