dc.contributor.authorPaschalidis, Ioannis Ch.en_US
dc.contributor.authorBelta, C. A.en_US
dc.contributor.authorWang, J.en_US
dc.contributor.authorDing, X. C.en_US
dc.contributor.authorLahijanian, M.en_US
dc.contributor.authorMoazzez-Estanjini, R.en_US
dc.date.accessioned2016-08-26T02:10:59Z
dc.date.accessioned2016-09-29T15:45:25Z
dc.date.available2016-09-29T15:45:25Z
dc.date.issued2011
dc.identifier.citationR. Moazzez-Estanjini, X.-C. Ding, M. Lahijanian, J. Wang, C. A. Belta, and I. Ch. Paschalidis. "Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control." Proceedings of the 50th IEEE Conference on Decision and Control, 2011.
dc.identifier.otherhttp://arxiv.org/abs/1108.4698v2
dc.identifier.urihttps://hdl.handle.net/2144/18014
dc.description.abstractWe consider the problem of finding a control policy for a Markov Decision Process (MDP) that maximizes the probability of reaching a given set of states while avoiding another set of states. This problem is motivated by applications in robotics, where it arises naturally when probabilistic models of robot motion must satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. The algorithm is of the actor-critic type and uses a least-squares temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show that it converges to a policy corresponding to a stationary point in the parameter space. Simulation results confirm the effectiveness of the proposed solution.en_US
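
The abstract describes an actor-critic algorithm whose critic uses least-squares temporal difference (LSTD) learning on an MDP recast as a Stochastic Shortest Path (SSP) reach-avoid problem. Below is a minimal illustrative sketch of that style of method, not the paper's exact algorithm; the toy chain MDP, one-hot features, softmax policy parameterization, and step sizes are all hypothetical choices made for the example.

# Illustrative sketch (not the authors' exact algorithm) of an actor-critic
# loop with an LSTD critic on a small SSP reach-avoid problem:
# reach GOAL while avoiding the absorbing TRAP state.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, START, TRAP, GOAL = 5, 2, 0, 4   # chain of states 0..4 (toy choice)
ACTIONS = [-1, +1]                          # move left / move right
N_FEAT = N_STATES                           # one-hot state features for the critic

def phi(s):
    f = np.zeros(N_FEAT)
    f[s] = 1.0
    return f

def step(s, a):
    """Toy dynamics: intended move succeeds w.p. 0.8, else the state is
    unchanged. Reward 1 on reaching GOAL, 0 otherwise; GOAL and TRAP absorb,
    so the undiscounted value of a state is its probability of reaching GOAL."""
    s2 = min(max(s + a, 0), N_STATES - 1) if rng.random() < 0.8 else s
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 in (GOAL, TRAP)

def policy_probs(theta, s):
    """Softmax (Boltzmann) policy over the two actions -- one standard
    parsimonious parameterization, assumed here for illustration."""
    prefs = theta[s]                        # theta has shape (N_STATES, 2)
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

theta = np.zeros((N_STATES, len(ACTIONS)))
alpha = 0.05                                # actor step size

for it in range(200):
    # Critic: accumulate the LSTD normal equations A w = b from sample paths.
    A = 1e-3 * np.eye(N_FEAT)               # small ridge term keeps A invertible
    b = np.zeros(N_FEAT)
    episodes = []
    for _ in range(20):
        s, traj = START, []
        for _ in range(50):
            ai = rng.choice(2, p=policy_probs(theta, s))
            s2, r, done = step(s, ACTIONS[ai])
            traj.append((s, ai, r, s2, done))
            nxt = np.zeros(N_FEAT) if done else phi(s2)   # undiscounted SSP
            A += np.outer(phi(s), phi(s) - nxt)
            b += r * phi(s)
            s = s2
            if done:
                break
        episodes.append(traj)
    w = np.linalg.solve(A, b)               # LSTD estimate: V(s) ~ phi(s)^T w

    # Actor: policy-gradient step along TD errors computed from the critic.
    for traj in episodes:
        for (s, ai, r, s2, done) in traj:
            v2 = 0.0 if done else phi(s2) @ w
            delta = r + v2 - phi(s) @ w     # temporal difference error
            grad_log = -policy_probs(theta, s)
            grad_log[ai] += 1.0             # grad of log softmax w.r.t. theta[s]
            theta[s] += alpha * delta * grad_log

print("Estimated P(reach GOAL from START):", phi(START) @ w)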
dc.language.isoen_US
dc.relation.ispartofProceedings of the 50th IEEE Conference on Decision and Control
dc.relation.ispartofseriesDecision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on;
dc.relation.replaceshttp://hdl.handle.net/2144/17758
dc.relation.replaces2144/17758
dc.rightsAttribution 4.0 Internationalen_US
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.subjectMarkov decision processesen_US
dc.subjectRobotics (cs.RO)en_US
dc.subjectRoboticsen_US
dc.subjectRobot motion controlen_US
dc.subjectActor-critic methodsen_US
dc.subjectDynamic programmingen_US
dc.subjectSystems and control (cs.SY)en_US
dc.subjectOptimization and control (math.OC)en_US
dc.titleLeast squares temporal difference actor-critic methods with applications to robot motion controlen_US
dc.typeArticleen_US
dc.identifier.doi10.1109/CDC.2011.6160485
pubs.notesOther: This publication does not fall under the open access policy because it was completed before February 11, 2015. When it was harvested from arXiv, the citation did not have a publication date.en_US
pubs.notesEmbargo: No embargoen_US
pubs.organisational-groupBoston Universityen_US
pubs.organisational-group/Boston University/College of Engineeringen_US
pubs.organisational-group/Boston University/College of Engineering/Department of Electrical & Computer Engineeringen_US

