The modelling framework acts as a common interface to define the problem to be solved by our DRL algorithms. This modelling architecture is the linchpin that connects all other constituents of the tool, such as the environment models and algorithms. The modelling framework is structured around the language of Markov Decision Processes (MDPs), where the following problem elements have to be defined: stages (decision epochs), states (characterization of the problem at a decision epoch), decisions (possible decisions), rewards (costs or rewards corresponding with a decision in a state), and the transition function (how a decision in a state brings us to another state).
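To make these MDP elements concrete, the sketch below defines a toy single-item inventory problem in the same structure: states, decisions, rewards, and a transition function. The class name, method names, and cost parameters are illustrative assumptions, not the actual DynaPlex interface.

```python
import random
from dataclasses import dataclass


@dataclass
class InventoryMDP:
    """Toy single-item inventory MDP; the state is the current stock level."""
    max_stock: int = 10
    holding_cost: float = 1.0
    stockout_cost: float = 5.0

    def initial_state(self) -> int:
        return self.max_stock // 2

    def decisions(self, state: int) -> range:
        # Possible decisions at this epoch: order quantities that fit in stock.
        return range(self.max_stock - state + 1)

    def transition(self, state: int, decision: int, rng: random.Random):
        # Random demand arrives; a decision in a state brings us to a new state.
        demand = rng.randint(0, 4)
        next_state = max(state + decision - demand, 0)
        # Reward: negative holding and stockout costs for this transition.
        reward = -(self.holding_cost * next_state
                   + self.stockout_cost * max(demand - state - decision, 0))
        return next_state, reward


rng = random.Random(42)
mdp = InventoryMDP()
state = mdp.initial_state()
state, reward = mdp.transition(state, decision=2, rng=rng)
```

An algorithm that accepts any object with this shape can be applied to every problem expressed through the framework, which is exactly the decoupling the common interface provides.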
The algorithmic framework will be developed to solve the problems defined through our modelling framework. This algorithmic framework unifies the broad RL field: various problem types (e.g., model-free versus model-based, online versus offline), value or policy approximations (e.g., parametric, non-parametric, linear, kernel-based, neural networks), and algorithmic forms, such as policy gradient algorithms (e.g., REINFORCE), value-based algorithms (e.g., Q-learning, Deep Q-Network), actor-critic algorithms (e.g., A3C, TD3, SAC), trust region policy optimization (TRPO), proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), and many more.
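As a minimal example of one of the value-based forms listed above, the sketch below runs tabular Q-learning on a toy chain problem where the agent must walk right to reach a rewarding goal state. The environment, reward, and hyperparameter values are assumptions chosen for illustration; they are not DynaPlex code.

```python
import random
from collections import defaultdict


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: actions 0 = left, 1 = right;
    reward 1 only upon reaching the rightmost (goal) state."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy next-state value.
            best_next = max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q


Q = q_learning()
```

After training, moving right dominates moving left in the start state, so the greedy policy reaches the goal. Deep variants such as DQN replace the table `Q` with a neural network, but the update rule has the same shape.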
Given the diversity in RL solution methodologies, insight is needed into the applicability of certain algorithmic forms to specific classes of logistics problems. This is achieved by creating a mapping based on certain problem features, as well as establishing hyper-heuristic forms that could automate this mapping. Furthermore, we support the user with parameter tuning, especially the tuning of hyperparameters in neural networks. This problem includes selecting good architectures (e.g., graph-based, feedforward, or recurrent) and the right numbers of layers and nodes for the neural networks.
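Such a feature-based mapping could take a shape like the sketch below, which routes a description of problem characteristics to candidate algorithm families. The feature names, rules, and suggested algorithms are illustrative assumptions, not a validated DynaPlex heuristic.

```python
def suggest_algorithms(features: dict) -> list:
    """Map problem features to candidate algorithm families.

    Expected keys (all illustrative assumptions):
      'model_available': bool  -- is a transition model known?
      'action_space': 'discrete' | 'continuous'
    """
    suggestions = []
    if features.get("model_available"):
        # A known transition model enables model-based planning/search.
        suggestions.append("AlphaZero-style approximate policy iteration")
    if features.get("action_space") == "continuous":
        # Continuous actions favour deterministic or soft actor-critic methods.
        suggestions += ["DDPG", "TD3", "SAC"]
    else:
        # Discrete actions are a natural fit for value-based and PPO methods.
        suggestions += ["DQN", "PPO"]
    return suggestions


discrete = suggest_algorithms({"model_available": True,
                               "action_space": "discrete"})
continuous = suggest_algorithms({"action_space": "continuous"})
```

A hyper-heuristic would go one step further and learn or search over such rules automatically rather than hand-coding them.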
Researchers from Eindhoven University of Technology (TU/e) and University of Twente (UT) have been working separately on extensive RL code bases. More specifically, Willem van Jaarsveld (TU/e) has worked on algorithms in the spirit of AlphaZero and other model-based algorithms typically relying on approximate policy iteration, whereas Martijn Mes (UT) has worked on approximate value iteration methods, typically denoted Approximate Dynamic Programming.
Willem van Jaarsveld conceived the idea of creating a generic DRL toolbox that could solve any stochastic sequential decision problem formulated as an MDP, much like CPLEX can solve any deterministic problem formulated as a mathematical program.
In December 2020, researchers from TU/e and UT joined forces by applying for funding from TKI Dinalog for the further development of DynaPlex. The project was granted in early 2021, after which several PhD students could be hired on this project.
The DynaPlex toolbox now includes both policy- and value-based (D)RL methods, making it suitable for solving a large range of logistics problems.
Lead Developer & PI DynaPlex Project
PI DynaPlex Project
PhD Student DynaPlex Project
PI DynaPlex Project
PI DynaPlex Project
PI DynaPlex Project
In addition to the lead developers shown above, many other scientists are contributing to DynaPlex.