The modelling framework acts as a common interface to define the problem to be solved by our DRL algorithms. This modelling architecture is the linchpin that connects all other constituents of the tool, such as the environment models and algorithms. The modelling framework is structured around the language of Markov Decision Processes (MDPs), where the following problem elements have to be defined: stages (decision epochs), states (characterization of the problem at a decision epoch), decisions (possible decisions), rewards (costs or rewards corresponding with a decision in a state), and the transition function (how a decision in a state brings us to another state).
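To make these MDP elements concrete, the sketch below defines a toy single-item inventory problem in the same structure: states, decisions, rewards, and a transition function. The class name, method names, and cost parameters are illustrative assumptions, not the actual DynaPlex interface.

```python
import random
from dataclasses import dataclass


@dataclass
class InventoryMDP:
    """Toy single-item inventory MDP; the state is the current stock level."""
    max_stock: int = 10
    holding_cost: float = 1.0
    stockout_cost: float = 5.0

    def initial_state(self) -> int:
        return self.max_stock // 2

    def decisions(self, state: int) -> range:
        # Possible decisions at this epoch: order quantities that fit in stock.
        return range(self.max_stock - state + 1)

    def transition(self, state: int, decision: int, rng: random.Random):
        # Random demand arrives; a decision in a state brings us to a new state.
        demand = rng.randint(0, 4)
        next_state = max(state + decision - demand, 0)
        # Reward: negative holding and stockout costs for this transition.
        reward = -(self.holding_cost * next_state
                   + self.stockout_cost * max(demand - state - decision, 0))
        return next_state, reward


rng = random.Random(42)
mdp = InventoryMDP()
state = mdp.initial_state()
state, reward = mdp.transition(state, decision=2, rng=rng)
```

An algorithm that accepts any object with this shape can be applied to every problem expressed through the framework, which is exactly the decoupling the common interface provides.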
The algorithmic framework will be developed to solve the problems defined through our modelling framework. This algorithmic framework unifies the broad RL field: various problem types (e.g., model-free versus model-based, online versus offline), value or policy approximations (e.g., parametric, non-parametric, linear, kernel-based, neural networks), and algorithmic forms, such as policy gradient algorithms (e.g., REINFORCE), value-based algorithms (e.g., Q-learning, Deep Q-Network), actor-critic algorithms (e.g., A3C, TD3, SAC), trust region policy optimization (TRPO), proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), and many more.
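As a minimal example of one of the value-based forms listed above, the sketch below runs tabular Q-learning on a toy chain problem where the agent must walk right to reach a rewarding goal state. The environment, reward, and hyperparameter values are assumptions chosen for illustration; they are not DynaPlex code.

```python
import random
from collections import defaultdict


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: actions 0 = left, 1 = right;
    reward 1 only upon reaching the rightmost (goal) state."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy next-state value.
            best_next = max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q


Q = q_learning()
```

After training, moving right dominates moving left in the start state, so the greedy policy reaches the goal. Deep variants such as DQN replace the table `Q` with a neural network, but the update rule has the same shape.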
Given the diversity in RL solution methodologies, insight is needed into the applicability of certain algorithmic forms to specific classes of logistics problems. This is achieved by creating a mapping based on certain problem features, as well as establishing hyper-heuristic forms that could automate this mapping. Furthermore, we support the user with parameter tuning, especially the tuning of hyperparameters in neural networks. This problem includes selecting good architectures (e.g., graph-based, feedforward, or recurrent) and the right numbers of layers and nodes for the neural networks.
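Such a feature-based mapping could take a shape like the sketch below, which routes a description of problem characteristics to candidate algorithm families. The feature names, rules, and suggested algorithms are illustrative assumptions, not a validated DynaPlex heuristic.

```python
def suggest_algorithms(features: dict) -> list:
    """Map problem features to candidate algorithm families.

    Expected keys (all illustrative assumptions):
      'model_available': bool  -- is a transition model known?
      'action_space': 'discrete' | 'continuous'
    """
    suggestions = []
    if features.get("model_available"):
        # A known transition model enables model-based planning/search.
        suggestions.append("AlphaZero-style approximate policy iteration")
    if features.get("action_space") == "continuous":
        # Continuous actions favour deterministic or soft actor-critic methods.
        suggestions += ["DDPG", "TD3", "SAC"]
    else:
        # Discrete actions are a natural fit for value-based and PPO methods.
        suggestions += ["DQN", "PPO"]
    return suggestions


discrete = suggest_algorithms({"model_available": True,
                               "action_space": "discrete"})
continuous = suggest_algorithms({"action_space": "continuous"})
```

A hyper-heuristic would go one step further and learn or search over such rules automatically rather than hand-coding them.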
Researchers from Eindhoven University of Technology (TU/e) and University of Twente (UT) have been working separately on extensive RL code bases. More specifically, Willem van Jaarsveld (TU/e) has worked on algorithms in the spirit of AlphaZero and other model-based algorithms typically relying on approximate policy iteration, whereas Martijn Mes (UT) has worked on approximate value iteration methods, typically denoted Approximate Dynamic Programming.
Willem van Jaarsveld conceived the idea of creating a generic DRL toolbox that could solve any stochastic sequential decision problem formulated as an MDP, much like CPLEX can solve any deterministic problem formulated as a mathematical program.
In December 2020, researchers from TU/e and UT joined forces by applying for funding from TKI Dinalog for the further development of DynaPlex. The project was granted in early 2021, after which several PhD students could be hired on this project.
The DynaPlex toolbox now includes both policy- and value-based (D)RL methods, making it suitable for solving a large range of logistics problems.
Lead Developer & PI DynaPlex Project
PI DynaPlex Project
PhD Student DynaPlex Project
PI DynaPlex Project
PI DynaPlex Project
PI DynaPlex Project
In addition to the lead developers shown above, many other scientists are contributing to DynaPlex.