I would be willing to bet that for every hundred readers of this post, there will be more than 10 SIEM or XDR tools being used amongst you.
This is problematic because it adds a large amount of friction to sharing detection content.
What is needed is a detection rule standard. A way to write a query once and use it everywhere, on any system.
That is the ambitious aim of the Sigma project.
Sigma has been around for about three years, though in the last year it has seen a marked increase in adoption.
Sigma is for log files what Snort is for network traffic and YARA is for files.
The Sigma Rule format has been designed to accommodate conversion into other query languages to match the systems on which the rules will be used. For example, a Sigma Rule could be translated into Splunk’s query language, SPL, or Google Chronicle’s YARA-L 2.0 queries.
Before jumping into the detail, let me first explain the foundations of Sigma Rules.
The YAML Structure
Sigma Rules are structured in a YAML format.
As I go through this tutorial, I recommend having some example rules open to cross-reference this theory to real world implementations.
The Sigma project maintains a body of public rules in the core Sigma repository here.
When writing Sigma Rules, I find it helpful to think of them as five core sections, three of which are metadata:
- Metadata
  - General info: descriptive information about the rule
  - Versioning: how to handle and track changes to the rule
  - Tags: how to add further classification to the rule
- Log sources: describes the log data to which the detection is meant to be applied
- Detection: the logic to identify something in the log
The log sources and detection sections are the most important parts, and are what I’ll focus on in this tutorial.
1. Metadata
1.1 General info
The main purpose of these properties is to help manage and maintain rules. I won’t explain the obvious ones here and will instead link to the official specification. My advice would be to take a look at some existing Sigma Rules to see some example values used for these properties:
- id (required)
- title (required)
- status (optional)
- description (optional)
- author (optional)
- license (optional)
- references (optional)
- level (optional)
Here’s an example of a partial Sigma rule incorporating ONLY the properties above:
id: 929a690e-bef0-4204-a928-ef5e620d6fcc
title: Test rule
status: experimental
description: Just a demo for the blog
author: Someone
license: MIT
references:
- https://www.example.com
level: high
1.2 Versioning
I showed that id is a required property for a Sigma Rule.
Sigma rules should be identified by a globally unique identifier using this id attribute. For this purpose randomly generated UUIDs (version 4) are recommended but not mandatory.
To help version Sigma Rules when rules are updated, the date (the date the rule was created) and modified (the date it was last modified) values can be used in the format YYYY/MM/DD.
For example;
id: 929a690e-bef0-4204-a928-ef5e620d6fcc
title: Test rule
date: 2020/01/01
modified: 2022/01/01
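As a minimal sketch of how these versioning fields could be filled in programmatically, the Python below generates a version 4 UUID and formats the dates in the YYYY/MM/DD form described above. The helper name new_rule_header is hypothetical and not part of any Sigma tooling.

```python
import uuid
from datetime import date


def new_rule_header(title: str, created: date) -> dict:
    """Build the id/title/date/modified fields for a brand new rule.

    Hypothetical helper for illustration only.
    """
    stamp = created.strftime("%Y/%m/%d")  # the date format used in this post
    return {
        "id": str(uuid.uuid4()),  # randomly generated UUID (version 4)
        "title": title,
        "date": stamp,
        "modified": stamp,  # a new rule has not been modified yet
    }


header = new_rule_header("Test rule", date(2020, 1, 1))
print(header)
```

When the rule is later edited, only the modified value would change; the id stays the same unless one of the reasons listed below applies.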
Rule ids can change for the following reasons:
- Major version changes of the rule. E.g. a different rule logic.
- Derivation of a new rule from an existing one, or refinement of a rule in a way that both are kept active.
- Merge of rules
To keep track of relationships between detections, Sigma rules may also contain references to related rule ids along with a description of the relationship. For example:
id: 929a690e-bef0-4204-a928-ef5e620d6fcc
related:
    - id: 08fbc97d-0a2f-491c-ae21-8ffcfd3174e9
      type: derived
    - id: 3c1d5b7e-9f42-4a6b-8d10-5e7f2a9c4b61
      type: obsoletes
Here the current rule (929a690e-bef0-4204-a928-ef5e620d6fcc) is derived from another rule (08fbc97d-0a2f-491c-ae21-8ffcfd3174e9) and replaces (obsoletes) a second rule (3c1d5b7e-9f42-4a6b-8d10-5e7f2a9c4b61).
This can also be used to link rules. For example, you might link two rules that have some other relationship, in which case you just use a different related.id property.
That said, it’s probably better to use Rule Collections for this purpose.
1.3 Tags
A Sigma rule can be categorised with tags. Tags can be anything, but Sigma ships with some predefined tags which I’d recommend you use where possible (or send a pull request / create an issue on the Sigma repo with proposals for new tags).
A tag ultimately provides more contextual information about the rule.
Tags are namespaced (a . is used as the separator, e.g. in attack.t1059.001 the namespace is attack).
The predefined Sigma tags have the following five namespaces:
- attack.: MITRE ATT&CK
- car.: MITRE Cyber Analytics Repository
- tlp.: Traffic Light Protocol
- cve.: NVD CVEs
- detection.: Indicates the type of rule
Example;
tags:
- attack.defense_evasion
- attack.t1027
- attack.execution
- attack.t1059.001
- car.2016-04-005
- tlp.amber
Of course, you can use any tag you wish, e.g.
tags:
- david.tag
Though, categorising on the standard tags will allow for easier discoverability of related rules.
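To illustrate the namespacing, here is a small sketch that splits each tag on its first . and groups the tags by namespace. The function names are hypothetical and only model the convention described above.

```python
def split_tag(tag: str) -> tuple:
    """Split a tag into (namespace, name) on the first dot only."""
    namespace, _, name = tag.partition(".")
    return namespace, name


def group_by_namespace(tags: list) -> dict:
    """Group tag names under their namespace, keeping the input order."""
    grouped = {}
    for tag in tags:
        namespace, name = split_tag(tag)
        grouped.setdefault(namespace, []).append(name)
    return grouped


tags = ["attack.t1027", "attack.t1059.001", "car.2016-04-005", "tlp.amber"]
print(group_by_namespace(tags))
```

Splitting on the first dot only matters for tags such as attack.t1059.001, where the name itself contains a dot.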
2. Log sources
The logsource attribute describes the log data to which the detection is meant to be applied, and has a number of sub-attributes that allow it to be very specific:
- category: used to select all log files written by a certain group of products, like firewalls or web server logs.
  - e.g. firewall, web, antivirus
- product: used to select all log outputs of a certain product, e.g. all Windows Eventlog types including “Security”, “System”, “Application” and the new log types like “AppLocker” and “Windows Defender”.
  - e.g. windows, apache
- service: used to select only a subset of a product’s logs, like the “sshd” on Linux or the “Security” Eventlog on Windows systems.
  - e.g. sshd, applocker
- definition: used to describe the logsource, including some information on the log verbosity level or configurations that have to be applied.
  - e.g. INFO, DEBUG
Read the Log Source specification here.
A list of standard logsources can be viewed here.
You’ll see these in most of the public rules in the SigmaHQ repository. For example;
logsource:
    product: aws
    service: cloudtrail
Sigma compiles the logsource sub-attributes using AND statements, so for the last example I am saying the logsource must be product: aws AND service: cloudtrail.
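The AND behaviour can be sketched in a few lines of Python. This is only an illustration of the logic, not how any Sigma backend is implemented; the function name and the shape of the event_source dictionary are assumptions.

```python
LOGSOURCE_KEYS = ("category", "product", "service")


def logsource_matches(rule_logsource: dict, event_source: dict) -> bool:
    """Return True only if every sub-attribute set by the rule matches (AND)."""
    return all(
        event_source.get(key) == value
        for key, value in rule_logsource.items()
        if key in LOGSOURCE_KEYS  # other keys, such as definition, are skipped
    )


rule_logsource = {"product": "aws", "service": "cloudtrail"}
print(logsource_matches(rule_logsource, {"product": "aws", "service": "cloudtrail"}))
print(logsource_matches(rule_logsource, {"product": "aws", "service": "guardduty"}))
```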
You can also only pass one logsource attribute per rule, and thus by definition Sigma Rules are specific to logsources.
You may also see a definition field within logsource description. This can also provide more information about how to onboard the log data source correctly so it can be detected by the Sigma Rule.
For example;
logsource:
    product: windows
    category: ps_script
    definition: Script Block Logging must be enabled
Above, the author is noting that unless Script Block Logging is enabled for Windows PowerShell, the rule won’t work properly. The definition property is purely informational.
Generally, you’ll want to use a pre-existing logsource. Of course, if a log source does not exist for your log type, most likely when custom products, logs, or field naming is used, then you can specify a non-standard logsource using the logic defined.
Sigma does not restrict what a Sigma logsource can be defined as, meaning you can use Sigma for just about any kind of logsource within your SIEM.
With the use of Pipelines, you can specify granular field-mapping, and logsource-mapping to ensure that your Sigma rules get converted correctly to the intended SIEM format downstream.
In short, this means fields in the detection part of the Sigma Rule map to those used in the SIEM.
For now, park that thought though. I will revisit pipelines once we’ve covered detections. The takeaway here should be: you should first refer to the standard logsources, and use these where possible.
A shortcut is also to take a look at existing Sigma Rules to see if a rule exists with a logsource you’re using.
3. Detections
Inside a Sigma Rule, the detection attribute holds the actual logic that determines when the rule will be triggered.
Here is the detection specification defined by Sigma… it’s long and fairly complex, I’ll try and simplify here as best I can.
The detection section contains a set of sub-attributes that represent searches on log data and how they should be evaluated:
- Selections: What you actually wish to select/search from the log data
- Conditions: How the Selections or filters are to be evaluated
Let me show you how these work using a few examples.
Selections
Each Sigma detection is categorised and split up into groups called selections. Each selection contains the definition for the detection itself.
At the most basic level, lists are the simplest way to define a selection (what the Sigma specification calls a Search Identifier). They contain strings that are applied to the full log message and are linked with a logical OR statement.
logsource:
    product: windows
    service: system
detection:
    keywords:
        - 4728
        - 4729
        - 4730
    condition: keywords
In this example, keywords matches on 4728 OR 4729 OR 4730.
The naming of the field keywords under detection in this example is arbitrary.
For example, this detection would work in the same way
logsource:
    product: windows
    service: system
detection:
    selection:
        - 4728
        - 4729
        - 4730
    condition: selection
However, you should use a standard for the value when creating your own Sigma rules. In other words, make the selection names descriptive and obvious to the reader.
The rule above is simple: it will search for any of the three values in Windows logs, in any field used by the SIEM. This is, of course, very inefficient.
That’s where searching by field comes in. For example, let’s narrow the last rule down to only search for the specified values in a field called EventID:
logsource:
    product: windows
    service: system
detection:
    selection:
        EventID:
            - 4728
            - 4729
            - 4730
    condition: selection
Now only the EventID field in downstream tools will be searched for the values 4728 OR 4729 OR 4730.
You can also use key value selections to search for single fields. For example,
logsource:
    product: windows
    service: system
detection:
    selection:
        EventID: 6416
    condition: selection
You can also pass multiple fields to be joined with an AND statement:
logsource:
    product: windows
    service: system
detection:
    selection:
        EventID: 6416
        ClassName: DiskDrive
    condition: selection
In this example I am matching on any events where the EventID=6416 AND the ClassName=DiskDrive. I could add more field/values to filter on, if needed.
You can also combine these concepts together.
logsource:
    product: windows
    service: system
detection:
    selection:
        EventLog: Security
        EventID:
            - 517
            - 1102
    condition: selection
Here the selection matches on EventLog=Security AND (EventID=517 OR EventID=1102).
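The two rules at work here (fields are joined with AND, list values with OR) can be modelled with a short Python sketch. It is a simplified illustration rather than how a real backend evaluates rules: it ignores wildcards and modifiers, and comparing values as case-insensitive strings is an assumption made for brevity.

```python
def value_matches(expected, actual) -> bool:
    """Compare as case-insensitive strings so 517 and "517" are treated alike."""
    return str(expected).casefold() == str(actual).casefold()


def selection_matches(selection: dict, event: dict) -> bool:
    """Every field must match (AND); a list of values matches if any one does (OR)."""
    for field, expected in selection.items():
        if field not in event:
            return False
        candidates = expected if isinstance(expected, list) else [expected]
        if not any(value_matches(c, event[field]) for c in candidates):
            return False
    return True


selection = {"EventLog": "Security", "EventID": [517, 1102]}
print(selection_matches(selection, {"EventLog": "Security", "EventID": 1102}))
print(selection_matches(selection, {"EventLog": "System", "EventID": 517}))
```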
You should be aware that there are two special values that can be used as field values.
- An empty value is defined with ''
- A non-existent value is defined with null
To demonstrate…
logsource:
    product: windows
    service: system
detection:
    selection:
        EventLog: Security
        EventID:
            - ''
    condition: selection
Would mean the EventID field should be present, but have no value.
logsource:
    product: windows
    service: system
detection:
    selection:
        EventLog: Security
        EventID:
            - null
    condition: selection
Would mean the EventID field should not be present.
You can also use wildcards in the value string. For example, using * as an unbounded-length wildcard:
logsource:
    product: windows
    service: system
detection:
    selection:
        EventLog: Security
        EventID: 5*
    condition: selection
Would match on any EventID starting with 5 (e.g. 500, 5121, etc.).
? is used to replace a single mandatory character, for example:
logsource:
    product: windows
    service: system
detection:
    selection:
        FileName: prog?.exe
    condition: selection
Would match on any FileName where the ? is replaced by any single character (e.g. prog1.exe, prog2.exe, proga.exe).
The backslash character \ is used for escaping of wildcards * and ? as well as the backslash character itself. Escaping of the backslash is necessary if it is followed by a wildcard, depending on the desired result.
For example, if you wanted to match on a literal ? or *, you’d need to use a \ to show that it should not be used as a wildcard:
logsource:
    product: windows
    service: system
detection:
    keywords:
        - question\?
        - star\*
    condition: keywords
Would match on the literal values question? or star*.
Similarly, if a backslash is to be matched on (e.g. a Windows path) it also needs to be escaped.
logsource:
    product: windows
    service: system
detection:
    selection:
        FilePath: \\example.exe
    condition: selection
Would match on the value \example.exe.
I won’t cover all escapes here, you can read more about them here.
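To make the wildcard and escaping rules concrete, the sketch below translates a Sigma-style value into a Python regular expression. It is a simplified model of the rules described above (a backslash escapes *, ? and itself, and is otherwise literal), not the conversion logic of any particular backend. The function names and the choice of case-insensitive matching are assumptions.

```python
import re


def sigma_pattern_to_regex(pattern: str):
    """Translate * and ? wildcards, honouring backslash escapes, into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "*?\\":
            out.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
        elif ch == "*":
            out.append(".*")  # any number of characters, including none
            i += 1
        elif ch == "?":
            out.append(".")  # exactly one character
            i += 1
        else:
            out.append(re.escape(ch))  # everything else, lone backslash included
            i += 1
    return re.compile("".join(out), re.IGNORECASE | re.DOTALL)


def matches(pattern: str, value: str) -> bool:
    """True if the whole value matches the pattern."""
    return sigma_pattern_to_regex(pattern).fullmatch(value) is not None


print(matches("prog?.exe", "prog1.exe"))
print(matches(r"star\*", "star*"))
```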
So far I’ve only covered one selection in a rule. In fact, a Sigma Rule detection can have many selections. That’s where conditions come into play.
For example;
detection:
    selection_1:
        CommandLine:
            - DumpCreds
            - invoke-mimikatz
    selection_2:
        CommandLine:
            - rpc
            - token
            - crypto
    selection_3:
        CommandLine:
            - bitcoin
    condition: selection_1 or selection_2 or selection_3
Here I use three selections. The condition property defines the combinations of selections that should trigger a detection.
In this case, if one of the three Search Identifiers (selection_1 OR selection_2 OR selection_3) is true, then a detection should be triggered.
Conditions
Conditions can be defined using a variety of Operators, all of which I will cover in the next tutorial.
There is a range of conditions that can be defined for a detection.
There are currently eight operators that can prove useful in tuning the extensibility and accuracy of a rule by describing how the selections should be considered. We’ve already covered two…
- Exact match
  - e.g. selection_1
- Logical AND/OR
  - e.g. selection_1 or selection_2
  - e.g. selection_1 and selection_2
x/all of them
Before, I showed the example condition: selection_1 or selection_2 or selection_3.
This could actually be mapped in a simpler way…
detection:
    selection_1:
        CommandLine:
            - DumpCreds
            - invoke-mimikatz
    selection_2:
        CommandLine:
            - rpc
            - token
            - crypto
    selection_3:
        CommandLine:
            - bitcoin
    condition: 1 of them
This denotes that at least 1 of the selections should be true (selection_1 OR selection_2 OR selection_3).
You can require more than one selection to match using the format x of them, e.g. 2 of them, 3 of them, etc.
x/all of selection
Instead of them, you can be more granular and name specific selections.
This condition type can be very useful for overriding the default behaviour of lists (where items are considered with OR operators). For example,
detection:
    selection_1:
        - EVILSERVICE
        - svchost.exe -n evil
    selection_2:
        - token
        - rpc
        - crypto
    condition: 2 of selection_1 and selection_2
Here both values (2) for selection_1 must be true and at least one value from selection_2 must be true. The condition is overriding the default list behaviour for selection_1.
* wildcards (i.e. any number of characters) can also be used at arbitrary positions in the condition pattern. For example:
logsource:
    product: windows
    service: system
detection:
    selection_1:
        EventLog: Security
        EventID:
            - 517
            - 1102
    selection_2:
        EventID: 6416
        ClassName: DiskDrive
    keywords_filter:
        - error
        - failure
    condition: 1 of selection* and not 1 of keywords_*
Here either selection_1 OR selection_2 must be true, however these events must not contain error OR failure (keywords_filter).
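The x of them and pattern-based conditions can be sketched as follows, given a dictionary recording whether each selection matched an event. This is only a model of the logic: the x_of helper is hypothetical, and using fnmatchcase to expand the * pattern is an assumption (it also gives special meaning to ? and [, which Sigma condition patterns do not).

```python
from fnmatch import fnmatchcase


def x_of(results: dict, quantifier: str, pattern: str) -> bool:
    """Evaluate "<quantifier> of <pattern>" against per-selection match results.

    quantifier is "all" or a number as a string; pattern is "them" or a
    selection name that may contain * wildcards.
    """
    names = [n for n in results if pattern == "them" or fnmatchcase(n, pattern)]
    hits = sum(1 for n in names if results[n])
    if quantifier == "all":
        return len(names) > 0 and hits == len(names)
    return hits >= int(quantifier)


# Which selections matched one particular event
results = {"selection_1": False, "selection_2": True, "keywords_filter": False}

# One selection matched, and nothing in the keywords_* group did
triggered = x_of(results, "1", "selection*") and not x_of(results, "1", "keywords_*")
print(triggered)
```

If keywords_filter had matched as well, the negated part would make the whole condition false and the event would be filtered out.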
Negation with not
Conditions can be especially useful for filtering use-cases, which is where the not operator comes in handy. Take this example:
detection:
    search_identifier_1:
        EventID: 4738
    search_identifier_2:
        PasswordLastSet: null
    condition: search_identifier_1 and not search_identifier_2
Here I am using the and not condition to say the log line must contain EventID EQUALS 4738 but not PasswordLastSet EQUALS null.
Using Parentheses
As conditions need to become more complex, parentheses (brackets) can be used to add more complex logic to the expression. The part of the condition in parentheses will be considered first. For example:
detection:
    keywords_1:
        - DumpCreds
        - invoke-mimikatz
    keywords_2:
        - rpc
        - token
        - crypto
    keywords_3:
        - bitcoin
    condition: (keywords_1 and keywords_2) or (keywords_2 and keywords_3)
In this example, a detection is triggered if any value from keywords_1 and any value from keywords_2, OR any value from keywords_2 and any value from keywords_3, are seen in the same log line.
Brackets are evaluated with the highest precedence by downstream Sigma tooling.
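The grouping can be checked with a quick sketch that evaluates the three keyword lists against a raw log line and then combines them exactly as the condition does. The helper names are hypothetical, and treating keywords as case-insensitive substrings of the full line is a simplifying assumption.

```python
def keywords_hit(keywords: list, log_line: str) -> bool:
    """True if any keyword appears anywhere in the log line (logical OR)."""
    line = log_line.casefold()
    return any(str(k).casefold() in line for k in keywords)


keywords_1 = ["DumpCreds", "invoke-mimikatz"]
keywords_2 = ["rpc", "token", "crypto"]
keywords_3 = ["bitcoin"]


def rule_triggers(log_line: str) -> bool:
    """(keywords_1 and keywords_2) or (keywords_2 and keywords_3)"""
    k1 = keywords_hit(keywords_1, log_line)
    k2 = keywords_hit(keywords_2, log_line)
    k3 = keywords_hit(keywords_3, log_line)
    return (k1 and k2) or (k2 and k3)


print(rule_triggers("invoke-mimikatz token::elevate"))
print(rule_triggers("bitcoin miner started"))
```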
Dealing with false positive detections
Once you deploy a rule to one of your security tools, analysts will start to discover some incorrect detections.
You can try to avoid false positive detections by tuning the rule to ignore known erroneous triggers, for example…
detection:
    selection:
        - 'rm /var/log/syslog'
        - 'rm -r /var/log/syslog'
        - 'rm -f /var/log/syslog'
        - 'rm -rf /var/log/syslog'
        - 'mv /var/log/syslog'
        - ' >/var/log/syslog'
        - ' > /var/log/syslog'
    false_positives:
        - '/syslog.'
    condition: selection and not false_positives
However, it’s not always possible to account for every reason false positives occur. This is where the informational falsepositives field can be used to describe a list of known false positives which can occur from a detection.
falsepositives:
- PIM (Privileged Identity Management) generates this event each time 'eligible role' is enabled.
- Legitimate administration activities
When the rule is triggered, the falsepositives information inside the rule can help an analyst triaging alerts decide how to proceed (or whether to follow up on it at all).
Dealing with detections
When a rule is deployed, it is only a matter of time before it triggers a detection. When that happens, the analyst handling the detection event might want to start a deeper investigation.
The fields attribute inside a Sigma Rule can help an analyst decide the next steps by listing log fields that could be interesting in further analysis of the event.
fields:
- CommandLine
- ParentCommandLine
Value modifiers
Value modifiers change how the values in a selection are interpreted. There are two types:
- Transformation Modifier: transform values into different values. Furthermore, this type of modifier is also able to change the logical operation between values.
- Type Modifier: change the type of a value. The value itself might also be changed by such a modifier, but the main purpose is to tell the backend that a value should be handled differently (e.g. it should be treated as a regular expression when the re modifier is used). More on backends in a later tutorial.
Transformation Modifiers
A full list of Transformation Modifiers is available to view here.
Let’s examine them one-by-one.
contains
contains puts * wildcards around the values, such that the value is matched anywhere in the field. Here is an example:
detection:
    selection:
        CommandLine|contains:
            - DumpCreds
            - invoke-mimikatz
    condition: selection
This is the same as using:
detection:
    selection:
        CommandLine:
            - '*DumpCreds*'
            - '*invoke-mimikatz*'
    condition: selection
It’s also important to point out how modifiers are written: they are appended after the field name with a | character.
In the first example, the selection matches the CommandLine field in the log data and uses the Transformation Modifier contains in order to check if the keywords DumpCreds OR invoke-mimikatz are present in the field.
For example, this detection would match on CommandLine="Has detected DumpCreds" OR CommandLine="DumpCreds" OR CommandLine="now invoke-mimikatz".
If I did not use the contains Value Modifier, like so:
detection:
    selection:
        CommandLine:
            - DumpCreds
            - invoke-mimikatz
    condition: selection
Now only an exact match for the CommandLine field would match. This would be either CommandLine="DumpCreds" OR CommandLine="invoke-mimikatz". CommandLine="Has detected DumpCreds" would not match.
startswith / endswith
You might want to use a more specific Transformation Modifier, like startswith or endswith.
detection:
    selection:
        CommandLine|startswith:
            - DumpCreds
            - invoke-mimikatz
    condition: selection
Here the CommandLine field in the log line must start with either DumpCreds or invoke-mimikatz.
all
The all Transformation Modifier can also prove very useful on occasion. As noted, lists of values are treated by default using the logical OR statement. This modifier changes this to AND. It is combined with contains here, because a field cannot exactly equal two different values at once.
detection:
    selection:
        CommandLine|contains|all:
            - DumpCreds
            - invoke-mimikatz
    condition: selection
In this example, I am now saying the CommandLine field in the log line must have both DumpCreds AND invoke-mimikatz in its value.
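The modifiers covered so far can be modelled in a few lines of Python. This is a rough sketch of their semantics only, not how converters implement them; the field_matches helper and its list-of-modifiers argument are assumptions for illustration.

```python
def field_matches(value: str, needles: list, modifiers: list) -> bool:
    """Model contains / startswith / endswith / all for a single field value."""
    haystack = value.casefold()

    def one(needle: str) -> bool:
        n = needle.casefold()
        if "contains" in modifiers:
            return n in haystack
        if "startswith" in modifiers:
            return haystack.startswith(n)
        if "endswith" in modifiers:
            return haystack.endswith(n)
        return haystack == n  # no modifier: exact match

    combine = all if "all" in modifiers else any  # all switches OR to AND
    return combine(one(n) for n in needles)


needles = ["DumpCreds", "invoke-mimikatz"]
print(field_matches("Has detected DumpCreds", needles, ["contains"]))
print(field_matches("Has detected DumpCreds", needles, ["contains", "all"]))
```

The second call is false because only one of the two values is present, which is exactly the difference the all modifier makes.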
windash
windash adds a variant of each value in which all - occurrences are replaced with /, so that both the dash and the forward-slash form of a command-line flag are matched.
detection:
    selection:
        CommandLine|windash|contains:
            - '-s '
            - '-f '
            - '-t '
            - '-m '
            - '-a '
            - '-u '
Here -s would also be searched as /s in the log. The same is true for the other items in the list.
As you can see above, modifiers can also be chained using a | (e.g. fieldname|mod1|mod2:). The value modifiers are applied to the value in the given order.
This example logically reads: the CommandLine field should be searched for any of the list items, in either their dash or forward-slash form, and the field only needs to contain the converted value.
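A simplified sketch of this chain in Python: each value is first expanded into its dash and forward-slash forms, then a contains check is applied. Keeping the original dash form alongside the converted one is an assumption about converter behaviour, and real converters may generate further variants. The function names are hypothetical.

```python
def windash_variants(value: str) -> list:
    """Return the value plus a variant with every - replaced by /."""
    variants = [value]
    swapped = value.replace("-", "/")
    if swapped != value:
        variants.append(swapped)
    return variants


def windash_contains(command_line: str, values: list) -> bool:
    """Apply windash first, then contains, mirroring the modifier order."""
    return any(
        variant in command_line
        for value in values
        for variant in windash_variants(value)
    )


flags = ["-s ", "-f ", "-t "]
print(windash_variants("-s "))
print(windash_contains("tool.exe /s target", flags))
```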
base64
The base64 modifier denotes that the value will be base64 encoded in the log.
detection:
    selection_destination:
        Destination|base64:
            - 'WriteProcessMemory'
            - 'This program cannot be run in DOS mode'
            - 'This program must be run under Win32'
    condition: selection_destination
This is essentially looking for either:
- WriteProcessMemory, which will be V3JpdGVQcm9jZXNzTWVtb3J5 in the log, OR
- This program cannot be run in DOS mode, which will be VGhpcyBwcm9ncmFtIGNhbm5vdCBiZSBydW4gaW4gRE9TIG1vZGU= in the log, OR
- This program must be run under Win32, which will be VGhpcyBwcm9ncmFtIG11c3QgYmUgcnVuIHVuZGVyIFdpbjMy in the log
base64offset
If a value might appear somewhere in a base64-encoded value the representation might change depending on the position in the overall value. There are three variants for shifts by zero to two bytes and except the first and last byte the encoded values have a static part in the middle that can be recognized.
detection:
    selection_destination:
        Destination|base64offset|contains:
            - 'WriteProcessMemory'
            - 'This program cannot be run in DOS mode'
            - 'This program must be run under Win32'
    condition: selection_destination
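To see where the three variants come from, the sketch below encodes a value at byte shifts of zero, one and two, then trims the leading and trailing characters that depend on neighbouring bytes. It is one way to compute the static middle part described above and is not taken from any specific converter; the offset tables and function name are assumptions.

```python
import base64

# Leading characters to drop for a shift of 0, 1 or 2 padding bytes
START_OFFSETS = (0, 2, 3)
# Trailing characters to drop, indexed by (len(value) + shift) % 3
END_OFFSETS = (None, -3, -2)


def base64_offset_variants(value: bytes) -> list:
    """Return the three base64 fragments that stay stable wherever value sits."""
    variants = []
    for shift in range(3):
        encoded = base64.b64encode(b" " * shift + value).decode("ascii")
        start = START_OFFSETS[shift]
        end = END_OFFSETS[(len(value) + shift) % 3]
        variants.append(encoded[start:end])
    return variants


variants = base64_offset_variants(b"WriteProcessMemory")
print(variants)

# Whatever precedes or follows the value, one variant appears in the encoding
blob = base64.b64encode(b"xx" + b"WriteProcessMemory" + b"(").decode("ascii")
print(any(v in blob for v in variants))
```

This is why base64offset is paired with contains in the rule: each variant is only a fragment of the full encoded string.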
Type Modifiers
Sometimes Transformation Modifiers do not quite suit what you are trying to achieve, particularly with more complex/varying values that need to be detected.
Currently only one Type Modifier exists, for Regular Expressions (re). Here is an example of it being used:
detection:
    search_identifier_1:
        - CommandLine|re: '\$PSHome\[\s*\d{1,3}\s*\]\s*\+\s*\$PSHome\['
        - CommandLine|re: '\$ShellId\[\s*\d{1,3}\s*\]\s*\+\s*\$ShellId\['
        - CommandLine|re: '\$env:Public\[\s*\d{1,3}\s*\]\s*\+\s*\$env:Public\['
    condition: search_identifier_1
Here the CommandLine field value in the log line must match at least one of the Regular Expressions defined.
You might be tempted to use the Regular Expressions Type Modifier a lot, but avoid it where possible: it can create downstream conversion issues, and in many cases a Transformation Modifier is better supported during rule conversion.