LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks

S. Ullah, M. Han, S. Pujar, H. Pearce, A. Coskun, G. Stringhini. 2024. "LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks." 2024 IEEE Symposium on Security and Privacy (SP), pp. 862-880. https://doi.org/10.1109/sp54263.2024.00210