To understand what a piece of software does, you can look at its source code, if available. However, if you want more than a high-level description of a given vulnerability, your options are usually limited. Write-ups can be hard to find for less popular vulnerabilities, and those you do find will likely vary widely in quality and formatting. Here, I would like to present a solution to this problem, which was developed over the last half year. Please note: this is the result of my 20% work time plus spare-time activity and not an officially supported Google product.
tl;dr:
- The vulnerable code database (Vulncode-DB) is a database of vulnerabilities and, where available, their vulnerable (open-source) code. It runs at vulncode-db.com and its code is available at github.com/google/vulncode-db.
- It’s launched as an experimental alpha version mostly for demonstration purposes. The application might be unreliable, contains many bugs *badumtss* and is not feature complete. Please set your expectations accordingly.
- If you’re interested in project news feel free to follow @vulncodedb.
Credits:
- My colleague Timo Schmid (@bluec0re) contributed essential parts to this project.
- My team and everyone else who helped with feedback.
Let’s look at a concrete example. To understand how Heartbleed (CVE-2014-0160) works in detail, you might consider reading its MITRE CVE entry or its National Vulnerability Database (NVD) entry. After doing so, you learn that it is some kind of out-of-bounds read problem and that the files d1_both.c and t1_lib.c are relevant, but you still don’t have any code in front of you and don’t know which specific parts of these files are affected. A good write-up might save the day for a popular bug like this one, but you don’t really want to repeat the above steps every time you want to learn about a new vulnerability or bug pattern. Let’s look at a proposal to make this less painful at scale.
Vulncode-DB
The database mostly builds upon the NVD and CVE data sets. Accordingly, you can directly view more than 100k entries with their default descriptions and available data. In addition, the project intends to collect data such as relevant files, code regions and comments for entries with available code. Specifically, the database distinguishes three entry types:
- Basic entry – No code is available, for example because the software is proprietary or because no patch is known or assigned for this vulnerability.
- Entry with known patch – A patch has been assigned and its contents are automatically displayed as a heuristic for the defective code. The entry has not been annotated yet.
- Annotated entry – Relevant files and regions have been annotated.
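To make the three types a bit more tangible, here is a minimal sketch of how such a distinction could be modeled. This is not the project's actual schema: the class names, the fields and the `entry_type` method are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EntryType(Enum):
    BASIC = "basic entry"
    KNOWN_PATCH = "entry with known patch"
    ANNOTATED = "annotated entry"


@dataclass
class Annotation:
    # Hypothetical shape: a commented line range inside one file.
    path: str
    first_line: int
    last_line: int
    comment: str


@dataclass
class Entry:
    # Hypothetical fields; the real data model may look quite different.
    cve_id: str
    description: str
    patch_url: Optional[str] = None
    annotations: List[Annotation] = field(default_factory=list)

    def entry_type(self) -> EntryType:
        # Annotations take precedence over a bare patch link.
        if self.annotations:
            return EntryType.ANNOTATED
        if self.patch_url:
            return EntryType.KNOWN_PATCH
        return EntryType.BASIC


if __name__ == "__main__":
    # Placeholder identifier and URL, not real data.
    entry = Entry("CVE-0000-0000", "placeholder description")
    print(entry.entry_type().value)
    entry.patch_url = "https://example.com/some/patch"
    print(entry.entry_type().value)
```

In this sketch the type is derived from the data rather than stored, so an entry would move up a category as soon as a patch link or an annotation is attached.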
Let’s have a look at concrete examples for each category:
- Adobe Acrobat Reader – Out-of-bounds read bug – As neither code nor a patch is available, this is just displayed as a normal entry.
- Linux Kernel SNMP NAT module bug – This entry was not annotated. However, the application automatically detected a github.com link in the references (currently with a simple regex) and fetched the relevant context, such as the files affected by the patch. As can be seen, a patch is often already a useful heuristic to show files and areas that might be relevant for a vulnerability.
- Heartbleed or Python 2.5 buffer overflow bug – This is a rough sketch of what an annotated entry could look like. In addition to the normal data, you can see file sections with short annotations. Please note that relevant lines are highlighted and irrelevant lines are folded away for a better overview.
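The link detection mentioned for the second example can be sketched roughly as follows. The pattern and the names `GITHUB_COMMIT_RE`, `CommitRef` and `find_commit_refs` are my own assumptions for illustration and not the regex the application actually uses; the URLs in the usage part are placeholders.

```python
import re
from typing import List, NamedTuple


class CommitRef(NamedTuple):
    owner: str
    repo: str
    sha: str


# Assumed pattern: matches https://github.com/<owner>/<repo>/commit/<sha>.
GITHUB_COMMIT_RE = re.compile(
    r"https?://github\.com/"
    r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/commit/(?P<sha>[0-9a-fA-F]{7,40})"
)


def find_commit_refs(reference_urls: List[str]) -> List[CommitRef]:
    """Pick out the references that point at a GitHub commit."""
    refs = []
    for url in reference_urls:
        match = GITHUB_COMMIT_RE.search(url)
        if match:
            refs.append(
                CommitRef(match["owner"], match["repo"], match["sha"].lower())
            )
    return refs


if __name__ == "__main__":
    # Placeholder reference list, not taken from a real NVD entry.
    references = [
        "https://example.com/advisory/1",
        "https://github.com/example-org/example-repo/commit/0123abcdef0",
    ]
    for ref in find_commit_refs(references):
        print(ref.owner, ref.repo, ref.sha)
```

A heuristic like this only finds patches that are linked as GitHub commits. References pointing to other hosts, mailing lists or pull requests would need their own handling.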
There are more than 100k entries in the first category, as they are straightforward to obtain from the NVD data set. However, there are only about 3.5k entries in the second category and fewer than five in the third, as this is still in a proof-of-concept state.
Making this crowd-sourced (think Wikipedia) and allowing anyone to annotate, with content moderation, could improve this considerably.
Usage
If the simplified view is insufficient and you would like more context about the code, you can switch to the detailed repository view. Consider this example, where you can see the repository contents as well as the content of the currently opened file, with the ability to jump between annotated or patched areas. Such an entry is created through this annotation view, which is currently only available for demonstration purposes.
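To give an idea of where "patched areas" can come from, here is a small sketch that reads line ranges out of the hunk headers of a unified diff and finds the next range to jump to. It is an illustration under my own assumptions and not the application's code; `patched_regions` and `next_region` are hypothetical names. Note that a hunk range also covers the unchanged context lines around a change, so it is only a rough marker.

```python
import re
from typing import List, Optional, Tuple

# Unified diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_HEADER_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

Region = Tuple[int, int]  # (first_line, last_line), both inclusive


def patched_regions(diff_text: str) -> List[Region]:
    """Return the line ranges of the pre-patch file touched by each hunk."""
    regions = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER_RE.match(line)
        if not match:
            continue
        start = int(match.group(1))
        # A missing count means the hunk covers exactly one line.
        count = int(match.group(2)) if match.group(2) is not None else 1
        # A count of 0 is a pure insertion; keep a one-line anchor for it.
        regions.append((start, start + max(count, 1) - 1))
    return regions


def next_region(regions: List[Region], current_line: int) -> Optional[Region]:
    """Return the first region that starts after current_line, if any."""
    for first, last in sorted(regions):
        if first > current_line:
            return (first, last)
    return None


if __name__ == "__main__":
    # Made-up diff for a single file.
    diff = (
        "--- a/foo.c\n"
        "+++ b/foo.c\n"
        "@@ -10,7 +10,8 @@ int f(void)\n"
        " context\n"
        "@@ -42 +43,2 @@\n"
        "-old\n"
        "+new\n"
        "+newer\n"
    )
    regions = patched_regions(diff)
    print(regions)
    print(next_region(regions, 12))
```

The sketch uses the pre-patch line numbers because the vulnerable code is the version before the fix. For a diff spanning several files, the ranges would additionally have to be grouped by file name.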
Goals
Making real-world examples of vulnerable code universally accessible and useful. In particular:
- Education – Giving insights into how a vulnerability works under the hood and communicating real-world problem patterns in code.
- Tooling and research purposes – Providing a data set to develop (static) source code analysis tooling and to test it with benchmarks.
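As a rough idea of what such a benchmark could look like, the following sketch scores the findings of a hypothetical analysis tool against annotated code regions. The data shapes and the `score` function are assumptions for illustration only. The database does not currently offer such an export, and the file names and line numbers below are made up.

```python
from typing import Iterable, Tuple

Finding = Tuple[str, int]      # (file path, reported line)
Region = Tuple[str, int, int]  # (file path, first line, last line), inclusive


def _inside(finding: Finding, region: Region) -> bool:
    path, line = finding
    region_path, first, last = region
    return path == region_path and first <= line <= last


def score(
    findings: Iterable[Finding], regions: Iterable[Region]
) -> Tuple[float, float]:
    """Return (precision, recall) of tool findings against annotated regions.

    Precision: share of findings that land inside any annotated region.
    Recall: share of annotated regions hit by at least one finding.
    """
    findings = list(findings)
    regions = list(regions)
    hits = [f for f in findings if any(_inside(f, r) for r in regions)]
    found = [r for r in regions if any(_inside(f, r) for f in findings)]
    precision = len(hits) / len(findings) if findings else 0.0
    recall = len(found) / len(regions) if regions else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up annotations and tool output.
    annotated = [("a.c", 10, 20), ("b.c", 5, 5)]
    reported = [("a.c", 12), ("a.c", 99)]
    precision, recall = score(reported, annotated)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Counting at the region level rather than per line is a design choice in this sketch: a tool that flags one line inside an annotated region gets full credit for that region.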
Next steps
- Collecting feedback for this project. Please share your opinion and potential use-cases. The project lives and dies with public interest as it’s a large amount of work for only a few individuals.
- Improving the quality and stability of this project and making the current state more useful to you.
- Collecting more data. Once we get this to a more stable version, we would like to open up content creation. Generally, you should be able to create entries for any kind of vulnerability, even one without an assigned CVE identifier.
Conclusion
The Vulncode-DB project strives to collect vulnerable (open-source) code and make it available to everyone. Please try out the prototype at vulncode-db.com and share your feedback in the comment section or on Twitter @vulncodedb. In particular, we are interested in features you think are missing or things that should be done differently.
If you have more questions, please also take a look at vulncode-db.com/about.