Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability Prediction

Jan 14, 2024
Huanting Wang
,
Zhanyong Tang
,
Shin Hwei Tan
,
Chunwei Xia
,
Zheng Wang
Abstract
Deep learning (DL) has emerged as a viable means of identifying software bugs and vulnerabilities. The success of DL relies on having a suitable representation of the problem domain. However, existing DL-based solutions for learning program representations have limitations: they either cannot capture deep, precise program semantics or suffer from poor scalability. We present Concoction, the first DL system to learn program representations by combining static source code information and dynamic program execution traces. Concoction employs unsupervised active learning techniques to determine a subset of important paths for which to collect dynamic symbolic execution traces. By implementing a focused symbolic execution solution, Concoction brings the benefits of static and dynamic code features while reducing the expensive symbolic execution overhead. We integrate Concoction with fuzzing techniques to detect function-level code vulnerabilities in C programs from 20 open-source projects. In 200 hours of automated concurrent test runs, Concoction successfully uncovered vulnerabilities in all tested projects, identifying 54 unique vulnerabilities and yielding 37 new, unique CVE IDs. Concoction also significantly outperforms 16 prior methods by providing higher accuracy and lower false positive rates.
Type
Publication
The 46th IEEE/ACM International Conference on Software Engineering (ICSE), the premier software engineering conference (Artifacts Evaluated!)