AI’s evolution in the world of video games and its transfer to the real world
The use of AI in gaming is nothing new and has evolved immensely since the early days of video games.
Initially, the computer-controlled players (the "CPU") simply executed a set of more or less basic rules; today they are extremely complex systems capable of performing millions of calculations in a split second to determine the best strategy for defeating human players.
The height of this evolution came in 2017, when DeepMind’s AlphaZero program made headlines for teaching itself to master chess, shogi (Japanese chess), and Go using deep reinforcement learning (deep RL).
What was most novel about AlphaZero was precisely its use of reinforcement learning. In this technique there is no “output label” (a label being understood here as a classification of the result achieved), so it is not supervised learning, where a human guides the process; and although these algorithms learn on their own, they are not unsupervised either, since unsupervised methods try to group samples according to some measure of distance between them. Real-world problems, moreover, involve multiple variables that are usually interrelated, depend on broader business cases, and give rise to larger scenarios in which decisions must be made.
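A quick illustrative contrast (the toy data shapes below are my own examples, not from the article) makes the distinction easier to see:

```python
# Supervised: every input comes paired with the correct output label.
supervised_example = ([5.1, 3.5, 1.4], "setosa")

# Unsupervised: inputs only; the algorithm groups them by some distance measure.
unsupervised_batch = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1]]

# Reinforcement learning: no labels; the agent only observes transitions,
# each carrying a scalar reward for the action it just took.
rl_transition = ("state_t", "action_t", +1.0, "state_t+1")
```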
In supervised (or unsupervised) learning models such as neural networks, decision trees, and k-nearest neighbors (KNN), the aim is to “minimize the cost function,” that is, to reduce error. In RL the aim is instead to “maximize the reward,” and this can be achieved even while making mistakes or acting sub-optimally along the way. RL therefore proposes a new approach for making our machines learn, built on the following two components (a runnable sketch of the full loop follows the lists below):
- The agent: the model we want to train so that it learns to make decisions.
- The environment: the context in which the agent interacts and “moves,” subject to whatever constraints and rules apply at any given time.
Between the two there is a feedback relationship made up of the following links:
- Action: the possible moves the agent can take at a given moment.
- State (of the environment): indicators showing the condition of the environment’s various components at that moment.
- Reward (or punishment!): each action the agent takes earns a reward or a penalty that tells it whether it is performing well or poorly.
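To make the loop concrete, here is a minimal, self-contained sketch in Python: tabular Q-learning on a toy one-dimensional “corridor.” The environment, rewards, and hyperparameters are illustrative assumptions of mine, not anything from AlphaZero or this article; they simply show the agent-environment-action-state-reward cycle end to end.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 is the goal
ACTIONS = [-1, +1]    # the agent's possible actions: move left or right
EPISODES = 500
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """The environment: applies an action, returns (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + action))
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.01, False     # small penalty ("punishment") per step

# The agent: a table of estimated values, one per (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(EPISODES):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Print the learned policy: the preferred action in each state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

Run it and the printed policy shows the agent has learned to head right toward the goal, purely from rewards, with no labeled examples of “correct” moves.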
Real-world applications are as diverse as they are exciting. Take robotic arms: instead of teaching them how to move instruction by instruction, we can let them make “blind” attempts and reward them when they get it right (see the sketch after this paragraph). RL can also be applied in other settings that interact with the real world, such as industrial machinery and predictive maintenance, as well as in finance, for example to decide how to compose an investment portfolio without human intervention.
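As a hedged sketch of that robotic-arm idea: instead of scripting each motion, one can define a reward that grows as the gripper approaches a target. The two-link planar arm and the distance-based reward below are illustrative assumptions, not any particular robot’s API:

```python
import math

LINK1, LINK2 = 1.0, 0.8  # link lengths of a hypothetical 2-joint planar arm

def end_effector(theta1, theta2):
    """Forward kinematics: joint angles -> gripper (x, y) position."""
    x = LINK1 * math.cos(theta1) + LINK2 * math.cos(theta1 + theta2)
    y = LINK1 * math.sin(theta1) + LINK2 * math.sin(theta1 + theta2)
    return x, y

def reward(theta1, theta2, target=(1.2, 0.9)):
    """Higher reward the closer the gripper gets; a bonus when it 'gets it right'."""
    x, y = end_effector(theta1, theta2)
    distance = math.hypot(x - target[0], y - target[1])
    return (10.0 if distance < 0.05 else 0.0) - distance  # success bonus + dense shaping
```

An RL algorithm, such as the Q-learning loop sketched above, would then adjust the joint angles by trial and error to maximize this reward.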
In this vein, and with clear ties to autonomous driving and simulation, Sony has released new updates to Gran Turismo Sophy (GT Sophy), its AI racing agent, which is capable of beating the best players in the world. Such agents are an extreme, state-of-the-art example of AI systems: the drivers must execute complex tactical maneuvers to pass or block opponents while operating their vehicles at the limits of their capabilities.
GT Sophy was trained using the RL techniques described above, combining cutting-edge learning algorithms and training scenarios developed by Sony AI with Gran Turismo Sport, a realistic driving simulator, and leveraging Sony Interactive Entertainment’s (SIE’s) cloud gaming infrastructure for large-scale training.
AI systems trained in simulated environments are helping to establish simulation as a training ground for highly complex applications such as autonomous driving, where AI plays a predominant role in the control systems, an approach that can undoubtedly be extrapolated to other domains such as industry, finance, and medicine.
Author: Ángel Cristóbal Lázaro