Can someone describe the structure of a CPE?


I am looking at implementing CPEs for SBOMs, but I’m very new to CPEs. I’m struggling to find any detail on how a CPE string is structured.

For example;

cpe:2.3:a:google:earth:-:*:*:*:*:*:*:*

What does each part of the CPE string represent?

The answer you’re looking for is in the spec document:

But that will take you a few hours to digest, so the tl;dr version…

Here is the CPE 2.3 formatted string schema (2.3 is the latest version, and I’ve never seen older versions being used);

cpe:2.3:<part>:<vendor>:<product>:<version>:<update>:<edition>:<language>:<sw_edition>:<target_sw>:<target_hw>:<other>

Where:

  • cpe: always cpe
  • 2.3: the CPE version (the latest is currently 2.3). Note: when parsing, check that this is 2.3, as older versions do not have the same structure.
  • <part>: The part attribute SHALL have one of these three string values:
    • a for applications,
    • o for operating systems,
    • h for hardware devices
  • <vendor>: describes or identifies the person or organisation that manufactured or created the product
  • <product>: describes or identifies the most common and recognisable title or name of the product
  • <version>: vendor-specific alphanumeric strings characterising the particular release version of the product
  • <update>: vendor-specific alphanumeric strings characterising the particular update, service pack, or point release of the product.
  • <edition>: assigned the logical value ANY (*) except where required for backward compatibility with version 2.2 of the CPE specification
  • <language>: valid language tags as defined by [RFC5646]
  • <sw_edition>: characterises how the product is tailored to a particular market or class of end users.
  • <target_sw>: characterises the software computing environment within which the product operates.
  • <target_hw>: characterises the instruction set architecture (e.g., x86) on which the product being described or identified operates
  • <other>: capture any other general descriptive or identifying information which is vendor- or product-specific and which does not logically fit in any other attribute value

Here is an example of a CPE string (for Apple QuickTime v7.71.80.42);

cpe:2.3:a:apple:quicktime:7.71.80.42:*:*:*:*:*:*:*

Where;

  • part: a (application)
  • vendor: apple
  • product: quicktime
  • version: 7.71.80.42
  • update: *
  • edition: *
  • language: *
  • sw_edition: *
  • target_sw: *
  • target_hw: *
  • other: *
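
The breakdown above can be reproduced with a small parser. This is only a minimal sketch in Python: the names CPE_FIELDS, split_unescaped and parse_cpe23 are made up for illustration and are not from the spec or any library. It assumes the cpe:2.3: form shown above with exactly 11 attributes, checks the version as suggested earlier, and does not validate the attribute values themselves.

```python
from typing import Dict, List

# Attribute names in the order they appear after "cpe:2.3:".
CPE_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def split_unescaped(text: str, sep: str = ":") -> List[str]:
    """Split on sep, leaving backslash-escaped characters untouched."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape pair intact (e.g. "\:" or "\\").
            current.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Return a dict of the 11 attributes; values stay in their raw form."""
    pieces = split_unescaped(cpe)
    # "cpe" + "2.3" + 11 attributes = 13 pieces.
    if len(pieces) != 13 or pieces[0] != "cpe" or pieces[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, pieces[2:]))
```

Calling parse_cpe23("cpe:2.3:a:apple:quicktime:7.71.80.42:*:*:*:*:*:*:*") gives a dict with part "a", vendor "apple", product "quicktime", version "7.71.80.42" and "*" for everything else. Splitting by hand rather than with str.split(":") matters because a colon can appear escaped inside a value.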

You’ll see hyphens - and asterisks * in CPE strings. From the spec (6.2.3.1):

If a field contains only an asterisk, it is unbound to the logical value ANY. If a field contains only a hyphen, it is unbound to the logical value NA.

And 5.3.1:

An attribute of a WFN MAY be assigned one of these logical values:

  1. ANY (i.e., “any value”). The logical value ANY SHOULD be assigned to an attribute when there are no restrictions on acceptable values for that attribute.
  2. NA (i.e., “not applicable/not used”). The logical value NA SHOULD be assigned when there is no legal or meaningful value for that attribute, or when that attribute is not used as part of the description. This includes the situation in which an attribute has an obtainable value that is null.

A good example of when a - is used is an initial release, where an attribute is part of the description but no term is commonly used for its value. Take the initial release of an application, before any updates have been released. That very first release might be known as “Acme Product 1.0”.

cpe:2.3:a:acme:product:1.0:-:*:*:*:*:*:*

A few months later the vendor releases an update and, once it is applied, the platform is known as “Acme Product 1.0 Update 1”.

cpe:2.3:a:acme:product:1.0:update1:*:*:*:*:*:*

Also from the spec (5.2):

It is important to note that when a CPE Name is being created for a specific platform type (i.e. a specific release of an application) then the hyphen should be used at all times (when a vendor term is not available) instead of a blank component. The blank component is to be used only when a more general platform type is being enumerated. (i.e. any version of an application)

The * value is a little simpler to get your head around. Essentially, if a * is present, it means any value for that attribute. For example, if the version attribute contains a *, e.g.

cpe:2.3:a:acme:product:*:*:*:*:*:*:*:*

It means ANY version of that product that exists.
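
To make the * and - handling concrete, here is a rough Python sketch. The helper names logical_value and field_matches are hypothetical, and the comparison is a deliberate simplification rather than the matching algorithm from the CPE specifications: it only looks at a whole field and ignores wildcards embedded inside a value.

```python
def logical_value(field: str) -> str:
    """Map a raw field to ANY, NA, or the value itself."""
    if field == "*":
        return "ANY"
    if field == "-":
        return "NA"
    # Anything else, including an escaped hyphen such as "\-", is a real value.
    return field


def field_matches(pattern: str, actual: str) -> bool:
    """Simplified check: does one pattern field accept one actual field?"""
    wanted = logical_value(pattern)
    if wanted == "ANY":
        return True  # no restriction on this attribute
    if wanted == "NA":
        return logical_value(actual) == "NA"
    # Assumption for this sketch: compare values case-insensitively.
    return wanted.lower() == actual.lower()
```

With this, field_matches("*", "1.0") is true for any version, while field_matches("-", "update1") is false because the pattern says the update attribute is not applicable.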

One final word of warning…

When parsing CPE strings, you will occasionally see escape characters in the match string (especially to escape the interpretation of * and - characters).

The following are examples of where escape characters need to be used:

  • foo\-bar (hyphen is quoted)
  • \"oh_my\!\" (quotation marks and exclamation point are quoted)
  • g\+\+ (plus signs are quoted)
  • 9\.? (period is quoted, question mark is unquoted)
  • sr* (asterisk is unquoted)
  • big\$money (dollar sign is quoted)
  • foo\:bar (colon is quoted)
  • back\\slash_software (backslash is quoted)
  • with_quoted\~tilde (tilde is quoted)
  • *SOFT* (single unquoted asterisk at beginning and end)
  • 8\.?? (two unquoted question marks at end)
  • *8\.?? (one unquoted asterisk at beginning, two unquoted question marks at end)

For example, the CPE string cpe:2.3:a:apple:swiftnio_http\\/2:1.19.1:*:*:*:*:swift:*:* has a backslash \ to escape the \ present in the product string.

This text should be interpreted literally as cpe:2.3:a:apple:swiftnio_http\/2:1.19.1:*:*:*:*:swift:*:*

For full information about when escaping is needed, see 5.3.2 Restrictions on attribute-value strings in the specification.
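
As a rough illustration of the quoted/unquoted distinction in the escape examples, the Python sketch below walks a single attribute value, drops the backslashes, and records any unquoted * or ? it finds. The function name unquote is made up for this example, and it does not check that the value is legal under the spec's rules.

```python
from typing import List, Tuple


def unquote(value: str) -> Tuple[str, List[str]]:
    """Return (text with escapes removed, list of unquoted wildcards)."""
    text: List[str] = []
    wildcards: List[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Quoted character: keep it literally and skip the backslash.
            text.append(value[i + 1])
            i += 2
        else:
            if ch in "*?":
                # Unquoted * or ? keeps its special wildcard meaning.
                wildcards.append(ch)
            text.append(ch)
            i += 1
    return "".join(text), wildcards
```

For instance, unquote(r"foo\-bar") returns ("foo-bar", []), so the hyphen is a plain character rather than NA, while unquote(r"9\.?") returns ("9.?", ["?"]) because the question mark was left unquoted.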