Getting children, and adults, to tidy up after themselves can be a challenge, but getting an AI agent to do the same is far harder. Success depends on mastering several core visuomotor skills: approaching an object, grasping and lifting it, opening a container, and placing the object inside. To make matters more complicated, these skills must be applied in the right sequence.
Control tasks such as cleaning a table or stacking objects require an agent to decide how, when, and where to coordinate the nine joints of its robotic arm and fingers to achieve the goal. At any given moment there is a vast number of possible action combinations, and the need to execute them in the right order poses a serious challenge, which makes this an exciting and active area of reinforcement learning research.
Techniques such as reward shaping, apprenticeship learning, and learning from demonstrations can mitigate some of these issues, but they typically rely on substantial prior knowledge of the task at hand. Learning complex control tasks with minimal prior knowledge remains a major open challenge in the field.
Recently, DeepMind introduced a new learning paradigm called Scheduled Auxiliary Control (SAC-X), designed to tackle this problem. The idea behind SAC-X is that, to master complex tasks from scratch, an agent must first learn to explore and master a set of basic skills. Just as a baby must develop coordination and balance before crawling or walking, an agent should first acquire simple skills that improve its internal coordination and prepare it for more complex tasks.
Researchers tested SAC-X both in simulation and on real robots, on tasks such as stacking objects and tidying up a table. The guiding principle behind the auxiliary tasks is to encourage the agent to explore its own sensory space: for example, activating a tactile sensor on a finger, sensing force at the wrist, maximizing a joint angle, or making an object move within the camera's view. Each task yields a simple reward if its goal is achieved, and no reward otherwise.
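To make this reward structure concrete, here is a minimal Python sketch of such sparse auxiliary rewards. The observation keys ("finger_touch", "object_pixel_xy", "block_on_base") and the thresholds are illustrative assumptions, not taken from DeepMind's code:

```python
import numpy as np

# Assumed observation layout: a dict of raw sensor readings.
# Each auxiliary reward is a sparse predicate that pays 1.0 only
# when its sensory event occurs, and 0.0 otherwise.
TOUCH_THRESHOLD = 0.1   # assumed activation level for the tactile sensor
MOVE_THRESHOLD = 1.0    # assumed pixel displacement that counts as "moved"

def touch_reward(obs, prev_obs):
    """1 if any finger tactile sensor is activated, else 0."""
    return 1.0 if np.max(obs["finger_touch"]) > TOUCH_THRESHOLD else 0.0

def move_in_view_reward(obs, prev_obs):
    """1 if the object's position in the camera image changed, else 0."""
    shift = np.linalg.norm(obs["object_pixel_xy"] - prev_obs["object_pixel_xy"])
    return 1.0 if shift > MOVE_THRESHOLD else 0.0

def stack_reward(obs, prev_obs):
    """Sparse external reward: 1 once the block rests on the base object."""
    return 1.0 if obs["block_on_base"] else 0.0

# One reward head per task; the agent learns a separate policy and
# Q-function for each, and switches between them during an episode.
TASK_REWARDS = {
    "touch": touch_reward,
    "move": move_in_view_reward,
    "stack": stack_reward,
}
```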
In simulation, the agent eventually mastered the complex task of stacking. At each point the agent decides on its own current "intention", that is, which task to pursue next, whether an auxiliary task or the externally defined target task. Crucially, through extensive use of off-policy learning, the agent can detect and learn from reward signals for tasks it was not pursuing: for instance, while picking up or moving an object, it might accidentally complete a stack, which triggers the stacking reward. Scheduling which intention to pursue matters because a well-chosen sequence of simple tasks can lead the agent to otherwise rare external rewards. Building a personalized learning curriculum from the knowledge collected so far proves effective across a broad range of domains, especially where external rewards are sparse.
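One plausible way to realize this kind of retrospective learning, sketched below under my own assumptions rather than taken from the paper's implementation, is to store each transition once and annotate it with every task's reward, so that experience collected while pursuing one intention can train the learners for all the others:

```python
from collections import deque
import random

class MultiTaskReplayBuffer:
    """Stores each transition once, annotated with the reward every task
    would have paid, so off-policy learners for all intentions can reuse
    the same experience."""

    def __init__(self, reward_fns, capacity=100_000):
        self.reward_fns = reward_fns   # dict: task name -> reward function
        self.buffer = deque(maxlen=capacity)

    def add(self, prev_obs, action, obs):
        # Evaluate all reward heads retrospectively: a grasp executed while
        # pursuing "move" may incidentally pay off the "stack" reward too.
        rewards = {name: fn(obs, prev_obs) for name, fn in self.reward_fns.items()}
        self.buffer.append((prev_obs, action, rewards, obs))

    def sample(self, task, batch_size=64):
        # Each task's critic trains on the shared data with its own reward.
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        return [(o, a, r[task], o2) for (o, a, r, o2) in batch]
```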
The agent decides which intention to pursue next through a scheduling module, which is itself improved during training via meta-learning. The scheduler aims to maximize progress on the main task, which significantly improves data efficiency.
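As a simplified stand-in for that idea, not DeepMind's exact algorithm, the sketch below keeps a running estimate of the main-task return that follows each choice of intention and samples the next intention via a softmax over those estimates. The class name, temperature, and learning rate are illustrative:

```python
import math
import random

class Scheduler:
    """Picks the next intention so as to maximize expected main-task return,
    updating its estimates from the main-task reward observed afterwards."""

    def __init__(self, tasks, temperature=1.0, lr=0.1):
        self.tasks = tasks
        self.temperature = temperature          # exploration/exploitation knob
        self.lr = lr
        self.value = {t: 0.0 for t in tasks}    # running return estimates

    def next_intention(self):
        # Boltzmann (softmax) sampling over estimated main-task returns.
        weights = [math.exp(self.value[t] / self.temperature) for t in self.tasks]
        return random.choices(self.tasks, weights=weights)[0]

    def update(self, task, main_task_return):
        # Exponential moving average of the main-task reward earned
        # while the given intention was active.
        self.value[task] += self.lr * (main_task_return - self.value[task])
```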
After exploring its internal auxiliary tasks, the agent learned how to stack and tidy up objects. Evaluations show that SAC-X can solve all of these problems from scratch using the same underlying set of auxiliary tasks. Even more exciting, in the lab SAC-X learned a pick-and-place task from scratch directly on a real robotic arm, something that has traditionally been difficult because it demands high data efficiency; the usual workaround has been to train an agent in simulation and then transfer it to the real robot.
DeepMind researchers believe SAC-X marks an important step toward learning tasks from scratch, where only the overall goal needs to be defined. SAC-X allows arbitrary auxiliary tasks to be specified: they can be general (like activating a sensor) or tailored to whatever a researcher needs. In this sense, SAC-X is a general-purpose reinforcement learning method that applies broadly to sparse-reward environments, not just to control and robotics.