Optimized dynamic vehicle routing policies with applications
Embargoed until: Indefinite

Permanent Link: https://hdl.handle.net/2144/32029

Abstract
This dissertation addresses two applications: (a) optimizing dynamic vehicle routing policies for warehouse forklift dispatching, and (b) reward collection by a group of air vehicles in a three-dimensional mission space. For the first application, we successfully deployed an inexpensive mobile Wireless Sensor Network in a commercial warehouse served by a fleet of forklifts, with the aim of improving forklift dispatching and reducing the costs associated with delays in loading and unloading delivery trucks. The forklifts were instrumented with sensor nodes that collect an array of information in an event-driven manner, including each forklift's physical location, usage time, bumping/collision history, and battery status. A hypothesis-testing algorithm was implemented to estimate forklift locations from these measurements. Combined with inventory information, this data was fed into an Actor-Critic stochastic optimization method to generate dispatching decisions.
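As a rough illustration of the Actor-Critic idea mentioned above, the sketch below shows a generic one-step actor-critic update with linear function approximation and a softmax policy over candidate forklifts. Everything here (feature dimension, step sizes, the placeholder features and reward) is a hypothetical stand-in for exposition, not the dissertation's actual formulation.

import numpy as np

# Minimal actor-critic sketch: TD(0) critic plus policy-gradient actor,
# with hypothetical dimensions and step sizes (assumptions, not the
# thesis's parameters).

rng = np.random.default_rng(0)
d, n_actions = 8, 4                 # feature dim, candidate forklifts (assumed)
theta = np.zeros((n_actions, d))    # actor (policy) parameters
w = np.zeros(d)                     # critic (value) parameters
alpha_w, alpha_theta, gamma = 0.05, 0.01, 0.99

def policy(phi):
    # Softmax over action preferences theta_a . phi.
    prefs = theta @ phi
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def actor_critic_step(phi, phi_next, a, reward):
    # One TD(0) critic update followed by a policy-gradient actor update.
    global w
    delta = reward + gamma * (w @ phi_next) - (w @ phi)   # TD error
    w = w + alpha_w * delta * phi                         # critic update
    p = policy(phi)
    for b in range(n_actions):                            # actor update
        theta[b] += alpha_theta * delta * ((b == a) - p[b]) * phi

# Hypothetical usage, with random vectors standing in for real sensor
# and inventory features:
phi = rng.standard_normal(d)
a = rng.choice(n_actions, p=policy(phi))
actor_critic_step(phi, rng.standard_normal(d), a, reward=1.0)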
For the second application, we considered a setting in which mobile air vehicles (agents) fly through a forest containing obstacles. The agents "chase" potentially moving targets that carry rewards, which the agents collect by approaching them. We cast the problem in a Markov Decision Process framework. To seek a policy that maximizes the long-term average reward collection while mitigating the curse of dimensionality, we proposed an approximate dynamic programming algorithm termed the Distributed Actor-Critic Algorithm. Motivated by the way animals move while hunting for food, we incorporated several bio-inspired features into the control policy structure. Simulation results demonstrate that policies with these bio-inspired features achieve a higher reward collection rate than their non-bio-inspired counterparts, by 40% in some examples. We also considered a setting in which the targets are intelligent and move away from the agents to minimize the reward collected; this problem is formulated as a Pursuit-Evasion Game. Assuming that the targets also use an Actor-Critic method to optimize their control policies, we showed that the game converges to a Local Nash Equilibrium. Furthermore, we proposed an Actor-Critic with Simulated Annealing (ACSA) algorithm and established that the game converges to a Nash Equilibrium. Simulation results show that the ACSA algorithm achieves a higher reward collection rate for both stationary and moving targets.
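The abstract does not spell out the ACSA mechanics, but a common way to combine actor-critic exploration with simulated annealing is to govern the policy's softmax temperature with a cooling schedule, so early play explores broadly and later play concentrates on the learned best actions. The sketch below illustrates that generic idea only; the schedule and constants are assumptions, not the thesis's algorithm.

import numpy as np

# Generic annealed-exploration sketch: softmax policy whose temperature
# cools over time. T0, the cooling rate, and the floor T_min are
# illustrative assumptions.

def annealed_policy(prefs, t, T0=1.0, cooling=0.999, T_min=1e-3):
    # Softmax with temperature T0 * cooling**t, floored at T_min.
    T = max(T0 * cooling**t, T_min)
    z = (prefs - prefs.max()) / T
    p = np.exp(z)
    return p / p.sum()

prefs = np.array([0.2, 0.5, 0.1])        # hypothetical action preferences
print(annealed_policy(prefs, t=0))       # early: near-uniform exploration
print(annealed_policy(prefs, t=5000))    # late: mass concentrates on best action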
Description
Thesis (Ph.D.)--Boston University

PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.