Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability Prediction

Jan 14, 2024

Huanting Wang

Zhanyong Tang

Shin Hwei Tan

Chunwei Xia

Zheng Wang

Abstract

Deep learning (DL) has emerged as a viable means for identifying software bugs and vulnerabilities. The success of DL relies on having a suitable representation of the problem domain. However, existing DL-based solutions for learning program representations have limitations: they either cannot capture deep, precise program semantics or suffer from poor scalability. We present Concoction, the first DL system to learn program representations by combining static source code information and dynamic program execution traces. Concoction employs unsupervised active learning techniques to determine a subset of important paths for which to collect dynamic symbolic execution traces. By implementing a focused symbolic execution solution, Concoction brings the benefits of static and dynamic code features while reducing the expensive symbolic execution overhead. We integrate Concoction with fuzzing techniques to detect function-level code vulnerabilities in C programs from 20 open-source projects. In 200 hours of automated concurrent test runs, Concoction has successfully uncovered vulnerabilities in all tested projects, identifying 54 unique vulnerabilities and yielding 37 new, unique CVE IDs. Concoction also significantly outperforms 16 prior methods by providing higher accuracy and lower false positive rates.

Type

Conference paper

Publication

The 46th IEEE/ACM International Conference on Software Engineering (ICSE), the premier software engineering conference (Artifacts Evaluated)

Last updated on Jan 14, 2024

Authors

Chunwei Xia

Lecturer (Assistant Professor)
