Boston University Libraries OpenBU

    Q-learning for robust satisfaction of signal temporal logic specifications

    Date Issued
    2016-01-01
    Publisher Version (DOI)
    10.1109/CDC.2016.7799279
    Author(s)
    Aksaray, Derya
    Jones, Austin
    Kong, Zhaodan
    Schwager, Mac
    Belta, Calin
    Permanent Link
    https://hdl.handle.net/2144/29722
    Citation (published version)
    Derya Aksaray, Austin Jones, Zhaodan Kong, Mac Schwager, Calin Belta. 2016. "Q-Learning for Robust Satisfaction of Signal Temporal Logic Specifications." In 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, December 12-14, 2016.
    Abstract
    This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems in which the desired STL specification is enforced by maximizing either the probability of satisfaction or the expected robustness degree, i.e., a measure quantifying the quality of satisfaction. We discuss why Q-learning is not directly applicable to these problems: under the quantitative semantics of STL, neither the probability of satisfaction nor the expected robustness degree is in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of the STL synthesis problems that can be solved via Q-learning, and we derive performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.
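    The abstract's key point is that the STL objectives must be recast into the cumulative-reward form that Q-learning can optimize. The sketch below is not the paper's algorithm; it is a minimal, hypothetical tabular Q-learning loop on a toy finite MDP, with a made-up reward standing in for an STL robustness degree, intended only to illustrate the standard objective form that Q-learning expects. All dynamics, parameters, and the reward function here are illustrative assumptions.

    ```python
    import random
    from collections import defaultdict

    # Hypothetical tabular Q-learning sketch. The reward is a crude stand-in
    # for an STL robustness degree; the paper's actual construction and its
    # approximation of the robustness objective are not reproduced here.

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration
    EPISODES, HORIZON = 500, 20
    ACTIONS = ["left", "stay", "right"]

    def robustness_reward(state):
        """Positive when the state lies inside the desired region [3, 6];
        larger the deeper inside it is (an assumed robustness-like signal)."""
        return min(state - 3, 6 - state)

    def step(state, action):
        """Toy stochastic dynamics on integer states 0..9, unknown to the learner."""
        move = {"left": -1, "stay": 0, "right": 1}[action]
        if random.random() < 0.1:            # occasional slip
            move = random.choice([-1, 0, 1])
        return max(0, min(9, state + move))

    Q = defaultdict(float)
    for _ in range(EPISODES):
        state = random.randint(0, 9)
        for _ in range(HORIZON):
            if random.random() < EPSILON:    # epsilon-greedy exploration
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state = step(state, action)
            reward = robustness_reward(next_state)
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update toward the one-step bootstrapped target
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state

    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(10)}
    print(policy)
    ```

    The point of the toy reward is that it is additive over time steps, which is exactly the form Q-learning handles; the probability of satisfaction and the exact robustness degree of an STL formula are not additive in this way, which is why the paper introduces an approximation before applying Q-learning.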
    Collections
    • BU Open Access Articles [3664]
    • ENG: Mechanical Engineering: Scholarly Papers [245]

