Vulnerabilities threaten the security of information systems, so it is crucial to detect and patch them before attacks happen. However, existing vulnerability detection methods suffer from long-term dependency problems, out-of-vocabulary issues, bias towards either global or local features, and coarse detection granularity. This paper proposes an automatic framework for detecting vulnerabilities in source code based on a hybrid neural network. First, the inputs are transformed into an intermediate representation with explicit structure information using Low Level Virtual Machine intermediate representation (LLVM IR) and backward program slicing. This transformation significantly reduces both the size of the samples and the size of the vocabulary. A hybrid neural network model is then applied to extract high-level vulnerability features, learning from both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The former learn local vulnerability features, such as buffer size, while the latter learn global features, such as data dependency. The extracted features are the concatenated outputs of the CNN and the RNN. Experiments on the SARD dataset validate the method: it achieves an F1-score of 98.6% and an accuracy of 99.0%, outperforming state-of-the-art methods.
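To illustrate why backward program slicing shrinks the samples, the following minimal sketch walks a toy data-dependency map backwards from a slicing criterion. It is not the paper's implementation: the function name `backward_slice`, the statement ids, and the dependency map are hypothetical, and the paper operates on LLVM IR rather than on a hand-written map.

```python
from collections import deque


def backward_slice(deps, criterion):
    """Return the sorted statement ids the criterion transitively depends on.

    `deps` maps a statement id to the ids of statements it directly
    depends on (a hypothetical, hand-written dependency map).
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for pred in deps.get(stmt, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


# Hypothetical toy program, one statement per id:
#   1: n = read_int()
#   2: buf = alloc(16)
#   3: msg = "hello"
#   4: log(msg)
#   5: copy(buf, src, n)   <- slicing criterion (a potentially unsafe call)
deps = {5: [1, 2], 4: [3]}

slice_ids = backward_slice(deps, 5)
print(slice_ids)
```

In this toy case, statements 3 and 4 do not influence the call at statement 5, so they fall outside the slice. Only the statements relevant to the criterion remain as input to a downstream model.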
               