Buffer overflow detection in C code - Word embedding and Random Forest algorithm

Project information

  • Category: Machine learning
  • Client: University Kelaniya
  • Project URL: project link

The Buffer Overflow Detection project is an interactive web application built with the Streamlit framework. Its primary goal is to identify potential buffer overflow vulnerabilities in uploaded C source files. It combines a trained machine learning model, a code preprocessing step, and a simple web interface so that users can check their code for this class of security risk.

Technical Details

  1. File Uploading: Users can upload C code files through the Streamlit user interface for analysis.
  2. Model Loading: The project employs a trained machine learning model, specifically a Random Forest classifier, loaded using the 'joblib' library.
  3. Code Processing: Uploaded code files are cleaned and processed, removing irrelevant lines and assigning line numbers.
  4. Prediction and Filtering: The loaded model predicts vulnerabilities for each code line, and the vulnerable lines are filtered based on the predictions.
  5. Results Presentation: Detected vulnerable lines are displayed, including line numbers, code snippets, and vulnerability probabilities.
  6. Dataset Creation: The project also involves the creation of a dataset by analyzing C code files, identifying vulnerable and non-vulnerable lines, and compiling them into a structured dataframe.
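
Steps 3 to 5 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the rule for what counts as an "irrelevant" line, the function names `clean_code` and `flag_vulnerable`, and the 0.5 threshold are all hypothetical. The `toy_scorer` function is only a stand-in so the sketch runs on its own; in the real application that role would presumably be played by a callable that extracts features from each line and returns `model.predict_proba(features)[:, 1]` for a Random Forest loaded with `joblib.load`.

```python
import re

# Hypothetical rule for "irrelevant" lines: blank lines, comment-only lines,
# preprocessor directives, and lone braces. The project's real rule is not documented.
IRRELEVANT = re.compile(r"^\s*(//.*|/\*.*\*/|#.*|[{}])?\s*$")


def clean_code(source):
    """Return (line_number, code) pairs for the lines worth classifying."""
    rows = []
    for number, line in enumerate(source.splitlines(), start=1):
        if IRRELEVANT.match(line):
            continue
        rows.append((number, line.strip()))
    return rows


def flag_vulnerable(rows, predict_proba, threshold=0.5):
    """Keep rows whose predicted vulnerability probability reaches the threshold."""
    probabilities = predict_proba([code for _, code in rows])
    return [
        (number, code, prob)
        for (number, code), prob in zip(rows, probabilities)
        if prob >= threshold
    ]


def toy_scorer(lines):
    # Stand-in for the trained model (assumption), used only to make the sketch runnable.
    return [0.9 if "strcpy(" in line or "gets(" in line else 0.1 for line in lines]


sample = """#include <string.h>
// copy user input
int main(int argc, char **argv) {
    char buf[8];
    strcpy(buf, argv[1]);
    return 0;
}
"""

rows = clean_code(sample)
for number, code, prob in flag_vulnerable(rows, toy_scorer):
    print(f"line {number}: {code}  (p={prob:.2f})")
```

Keeping the original line numbers through the cleaning step is what lets the results view point the user back to the exact line in the uploaded file.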

This approach suits buffer overflow detection because the problem reduces to classifying individual lines of code. Each line is converted into a numeric feature vector (the word embedding named in the project title), which yields the kind of tabular input a Random Forest classifier handles well, and the ensemble of decision trees can capture non-linear patterns across those features. Flagging risky lines early lets developers address buffer overflow risks proactively.
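
To make the tabular framing concrete, the dataset-creation step can be pictured as producing one labeled row per code line. The sketch below is an assumption-heavy illustration: the project does not describe how lines are labeled, so the rule used here (a call to a C library function that performs no bounds checking) and the names `UNSAFE_CALL` and `build_dataset` are hypothetical.

```python
import re

# Hypothetical labeling rule: calls to C functions that do no bounds checking.
# The project's actual labeling method is not described.
UNSAFE_CALL = re.compile(r"\b(gets|strcpy|strcat|sprintf|vsprintf)\s*\(")


def build_dataset(files):
    """Turn {filename: source} into one labeled row per non-blank code line."""
    rows = []
    for filename, source in files.items():
        for number, line in enumerate(source.splitlines(), start=1):
            code = line.strip()
            if not code:
                continue
            label = 1 if UNSAFE_CALL.search(code) else 0
            rows.append({"file": filename, "line": number, "code": code, "label": label})
    return rows


files = {
    "demo.c": "char buf[8];\nstrcpy(buf, src);\nstrncpy(buf, src, sizeof buf - 1);\n",
}
dataset = build_dataset(files)
for row in dataset:
    print(row["line"], row["label"], row["code"])
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame(rows)` to obtain the structured dataframe; the `code` column would then be embedded into numeric features before a classifier is trained on the `label` column.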