Generate a CSV file of all cve2stix CVEs

I have imported all cve2stix data into ArangoDB using stix2arango

```shell
python3 utilities/arango_cti_processor/insert_archive_cve.py \
    --database forum_demo
```

I need to export all the CVEs in CSV format for another tool we use that does not understand STIX.

Right now I’m using this query:

```
FOR doc IN nvd_cve_vertex_collection
    FILTER doc.type == "vulnerability"
    SORT doc.modified DESC
    RETURN {
        id: doc.id,
        modified: doc.modified,
        name: doc.name,
        description: doc.description
    }
```

Selecting “show all results” in the UI crashes the browser; the export only really works when I batch the results 10,000 at a time.

I can do this manually by using LIMIT in AQL:

```
FOR doc IN nvd_cve_vertex_collection
    FILTER doc.type == "vulnerability"
    SORT doc.modified DESC
    LIMIT 0, 10000
    RETURN {
        id: doc.id,
        modified: doc.modified,
        name: doc.name,
        description: doc.description
    }
```

And then re-running the query, increasing the offset each time, until I’ve paged through all the results. Does anyone have any advice on how I can automate this logic?

Give this code a try (make sure to update the configuration section). Instead of paging with LIMIT offsets, it uses ArangoDB’s HTTP cursor API: the initial POST to /_api/cursor returns the first batch of results along with a cursor id, and each follow-up PUT to /_api/cursor/{cursor-id} returns the next batch until hasMore is false:

```python
import csv
import requests

# Configuration
arangodb_host = "http://127.0.0.1:8529"
database_name = "forum_demo_database"  # stix2arango appends the _database suffix to the --database value
arangodb_user = "USERNAME"
arangodb_password = "PASSWORD"
chunk_size = 1000  # Number of documents per cursor batch

# AQL query template
aql_query = """
FOR doc IN nvd_cve_vertex_collection
    FILTER doc.type == "vulnerability"
    SORT doc.modified DESC
    RETURN {
        id: doc.id,
        modified: doc.modified,
        name: doc.name,
        description: doc.description
    }
"""

# Initialize CSV file
csv_file = "nvd_cve_vertex_collection.csv"
csv_headers = ["id", "modified", "name", "description"]

with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=csv_headers)
    writer.writeheader()

    # Pagination loop using cursor
    query_url = f"{arangodb_host}/_db/{database_name}/_api/cursor"
    response = requests.post(query_url, json={"query": aql_query, "batchSize": chunk_size}, auth=(arangodb_user, arangodb_password))
    response.raise_for_status()

    result = response.json()
    total_fetched = 0

    while True:
        documents = result.get("result", [])

        # Write the current chunk to the CSV file
        if documents:
            for doc in documents:
                writer.writerow(doc)
            total_fetched += len(documents)
            print(f"Fetched {len(documents)} documents, total fetched: {total_fetched}")
        else:
            break

        # Check if we have more documents to fetch
        if not result.get("hasMore"):
            break

        # Fetch the next batch using the cursor ID
        cursor_id = result.get("id")
        response = requests.put(f"{query_url}/{cursor_id}", auth=(arangodb_user, arangodb_password))
        response.raise_for_status()
        result = response.json()

print(f"Total fetched documents: {total_fetched}")
print(f"Results have been written to {csv_file}")

The resulting CSV file will look like this:

```txt
id,modified,name,description
vulnerability--a6069912-af3f-5775-98f7-e810c11df4a9,2024-06-30T23:15:02.563Z,CVE-2024-1135,"Gunicorn fails to properly validate Transfer-Encoding headers, leading to HTTP Request Smuggling (HRS) vulnerabilities. By crafting requests with conflicting Transfer-Encoding headers, attackers can bypass security restrictions and access restricted endpoints. This issue is due to Gunicorn's handling of Transfer-Encoding headers, where it incorrectly processes requests with multiple, conflicting Transfer-Encoding headers, treating them as chunked regardless of the final encoding specified. This vulnerability allows for a range of attacks including cache poisoning, session manipulation, and data exposure."
vulnerability--ca2fca16-3e75-5bce-b90c-d8092572f236,2024-06-30T23:15:02.443Z,CVE-2023-48733,An insecure default to allow UEFI Shell in EDK2 was left enabled in Ubuntu's EDK2. This allows an OS-resident attacker to bypass Secure Boot.
```
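
As a quick sanity check after the export, you can compare the row count in the CSV against a server-side count. A sketch, again assuming the placeholder connection details from the configuration section (COLLECT WITH COUNT INTO is standard AQL for counting matches):

```python
import csv
import requests

# Placeholder values, copied from the configuration section above
arangodb_host = "http://127.0.0.1:8529"
database_name = "forum_demo_database"
auth = ("USERNAME", "PASSWORD")

# Count matching documents server-side
count_query = """
FOR doc IN nvd_cve_vertex_collection
    FILTER doc.type == "vulnerability"
    COLLECT WITH COUNT INTO total
    RETURN total
"""
response = requests.post(
    f"{arangodb_host}/_db/{database_name}/_api/cursor",
    json={"query": count_query},
    auth=auth,
)
response.raise_for_status()
server_count = response.json()["result"][0]

# Count data rows in the exported file (DictReader skips the header row)
with open("nvd_cve_vertex_collection.csv", newline="", encoding="utf-8") as f:
    csv_count = sum(1 for _ in csv.DictReader(f))

print(f"Server count: {server_count}, CSV rows: {csv_count}")
```

If the two numbers match, nothing was dropped between batches.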