Getting children, and even adults, to tidy up is hard enough; teaching an AI agent to handle a task like organizing objects is harder still. Success hinges on a set of core visuomotor skills: approaching an object, grasping it, lifting it, opening a container, and placing the object inside. To perform more complicated actions, these basic skills must be executed in the right sequence.
Control tasks such as tidying a table or stacking objects require an agent to decide how, when, and where to coordinate the six joints of a robotic arm and its fingers. At any given moment there is an enormous number of possible action combinations, and chaining them in the correct order poses a significant challenge, which is part of what makes reinforcement learning such a fascinating area of study.
Methods such as reward shaping, apprenticeship learning, or learning from demonstrations can help address these issues, but they typically require detailed knowledge of the task. Learning complex control tasks from scratch, with only limited prior knowledge, remains a major challenge.
Recently, DeepMind introduced a new learning paradigm called SAC-X (Scheduled Auxiliary Control) to tackle these problems. The key idea behind SAC-X is that, to master complex tasks from scratch, an agent must first learn and refine a set of basic skills. Just as a baby must develop coordination and balance before it can crawl or walk, an agent should start with simple skills that improve its internal coordination and give it a better grasp of how more complex actions are built up.
Researchers tested SAC-X in simulation and on real robots, on tasks including stacking objects and clearing a table. The general approach of the auxiliary tasks is to encourage the agent to explore its own sensory space: activating a finger's tactile sensor, sensing force at the wrist, pushing a joint toward its maximum angle, or moving an object within the camera's field of view. If an auxiliary goal is achieved, the agent receives a simple reward; otherwise, it receives nothing.
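To make the idea concrete, here is a minimal sketch of sparse, binary auxiliary rewards of the kind described above. The observation keys, thresholds, and function names are illustrative assumptions, not DeepMind's actual implementation.

```python
import numpy as np

def touch_reward(observation):
    """1.0 if any fingertip tactile sensor exceeds a threshold, else 0.0 (assumed keys)."""
    return 1.0 if np.max(observation["finger_touch"]) > 0.1 else 0.0

def object_moved_reward(observation, prev_observation):
    """1.0 if the tracked object moved noticeably within the camera's view."""
    displacement = np.linalg.norm(
        observation["object_pos"] - prev_observation["object_pos"])
    return 1.0 if displacement > 0.01 else 0.0

def max_joint_angle_reward(observation):
    """1.0 if any arm joint is near its limit, encouraging exploration of the joint range."""
    return 1.0 if np.max(np.abs(observation["joint_angles"])) > 2.5 else 0.0
```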
The simulated agent eventually mastered the complex "stacking" task. It could also decide its own "intention": whether to pursue an auxiliary task or the external goal. Crucially, by making extensive use of off-policy learning, the agent can detect and learn from reward signals whenever they occur. For example, while trying to pick up an object, the agent might accidentally complete a stack, which triggers the stacking reward. Scheduling which intention to pursue matters because the simple auxiliary tasks only rarely lead to an external reward, and the schedule lets the agent build a personalized learning curriculum from the knowledge it has already collected. This turns out to be effective across a broad range of settings, especially when external rewards are sparse.
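One way to picture the off-policy reuse described above is a replay buffer in which every stored transition carries a reward vector, one entry per task, so an accidental success (such as an unintended stack) still produces a learning signal for that task's critic. The buffer layout and reward functions below are assumptions for illustration only, not the paper's exact data structures.

```python
from collections import deque
import random

class MultiTaskReplayBuffer:
    def __init__(self, reward_fns, capacity=100_000):
        self.reward_fns = reward_fns          # dict: task name -> reward function
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, next_obs):
        # Evaluate every task's sparse reward on the same transition.
        rewards = {name: fn(next_obs) for name, fn in self.reward_fns.items()}
        self.buffer.append((obs, action, rewards, next_obs))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Usage: the "stack" critic can then be trained off-policy on data gathered
# while the agent was pursuing a completely different intention
# (stack_reward here is a hypothetical reward function).
# buffer = MultiTaskReplayBuffer({"touch": touch_reward, "stack": stack_reward})
```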
The agent decides on its next intention through a scheduling module, which is itself refined during training with a meta-learning algorithm that aims to maximize progress on the main task, considerably improving data efficiency. After exploring a variety of internal auxiliary tasks, the agent learned how to stack objects and tidy the table.
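Below is a hedged sketch of such a scheduler. SAC-X learns its scheduler during training; here a simple bandit-style stand-in prefers intentions whose recent episodes yielded the most main-task reward, with a temperature for exploration. All names and details are illustrative, not the paper's algorithm.

```python
import math
import random
from collections import defaultdict

class SimpleScheduler:
    def __init__(self, task_names, temperature=1.0, history=50):
        self.task_names = list(task_names)
        self.temperature = temperature
        self.history = history
        self.main_task_returns = defaultdict(list)  # intention -> recent main-task returns

    def update(self, intention, main_task_return):
        returns = self.main_task_returns[intention]
        returns.append(main_task_return)
        del returns[:-self.history]                 # keep a sliding window

    def choose(self):
        # Softmax over the average main-task return observed per intention.
        scores = [
            sum(self.main_task_returns[t]) / max(len(self.main_task_returns[t]), 1)
            for t in self.task_names
        ]
        weights = [math.exp(s / self.temperature) for s in scores]
        return random.choices(self.task_names, weights=weights, k=1)[0]
```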
Evaluations show that SAC-X can solve these tasks from scratch using the same underlying set of auxiliary tasks. Excitingly, SAC-X was also able to learn a pick-and-place task from scratch on a real robotic arm in the lab, something that has been difficult in the past because learning directly on hardware demands high data efficiency; researchers usually train an agent in simulation and then transfer it to the real robot.
DeepMind researchers believe SAC-X represents a major step toward learning complex tasks from scratch. It allows arbitrary auxiliary tasks to be defined: general ones, such as activating a sensor, or tasks tailored to a researcher's needs. In this sense, SAC-X is a versatile reinforcement learning method, applicable not only to control and robotics tasks but to sparse-reward reinforcement learning problems in general.