5 tips for using the Rego language for Open Policy Agent (OPA)

September 18, 2020

Editor's note

This blog originally appeared on fugue.co. Fugue joined Snyk in 2022 and is a key component of Snyk IaC.

At Fugue, we’re pretty fond of Open Policy Agent (OPA), and we’ve written a lot of Rego code to keep cloud resources secure. So we’ve put together the most valuable lessons we’ve learned in the process. You can also use OPA and the Rego language to implement policy as code, so that your policies are enforced automatically.

1. De Morgan’s laws are your new best friends

If I had to restrict this list to just one item, it would be this one. De Morgan’s laws are two transformation rules from Boolean logic. Applied to Rego, they give you a way to transform a rule while keeping its behavior exactly the same.

They’re commonly written as:

¬(A ∧ B) ⇔ ¬A ∨ ¬B
¬(A ∨ B) ⇔ ¬A ∧ ¬B

Or, in code:

!(a && b) <=> !a || !b
!(a || b) <=> !a && !b

But perhaps an example makes things clearer. The following two statements are equivalent and are an example of applying the first law:

This is not a pizza with ham and mushroom.
This pizza does not have ham or does not have mushroom.

They allow you to transform AND clauses into OR clauses and vice versa, all while keeping the code behaving exactly the same way. Wait a minute – you may ask – if these transformations keep my code behaving the way it is, then why are they useful?

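The key idea is that Rego, as a query language, is heavily biased towards disjunctions (OR statements): a rule with several bodies holds if any one of them holds, and a body that iterates with _ holds if it succeeds for at least one element. For example, checking whether someone in the group is qualified to cut a pizza can be written as: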

default allow = false

allow {
  input.people[_].profession == "mathematician"
}

However, how do we check if everyone in the group is qualified to cut pizzas? We can, uhm, create a group of the qualified people first? And then check the length?

qualified_pizza_cutters[name] = person {
  person = input.people[name]
  person.profession == "mathematician"
}

default allow = false

allow {
  count(qualified_pizza_cutters) == count(input.people)
}

This is not great. Aside from being hard to read, a statement like this is hard for a query planner to optimize: we need to build and count both sets in full, even though we could already bail out as soon as we see a single non-mathematician in the group.

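This is where De Morgan’s laws come in. Their quantified form says that “everyone in the group can cut the pizza” is the same as “it is not the case that someone in the group cannot cut the pizza”, so we can always write the query as a disjunction by negating parts of it. We simply introduce an auxiliary rule that expresses the negated statement: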

cannot_cut_pizza {
  input.people[_].profession != "mathematician"
}

And then writing the policy is easy:

default allow = false

allow {
  not cannot_cut_pizza
}

There we go! Efficient and easy to read.

Conclusion: If a Rego policy seems hard to write, always consider if the negated policy is easier to write, and then negate it again! Yay logic!
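The sketch below puts the existential rule and its De Morgan counterpart side by side so both can be exercised against sample inputs. It assumes OPA 0.59 or later, because it uses the newer if/contains syntax enabled by import rego.v1 rather than the syntax shown above; the package name pizza.cutting and the rule name someone_can_cut are placeholders. Two deliberate changes from the rules above: the helper uses not ... == instead of !=, because != is undefined (not true) for a person without a profession field, so that person would silently count as qualified; and allow requires input.people to be an array, because a missing list would otherwise leave cannot_cut_pizza undefined and make allow true.

```rego
package pizza.cutting

import rego.v1

# "Someone can cut": iterating with _ inside a body means "there exists".
someone_can_cut if {
  input.people[_].profession == "mathematician"
}

# Negated helper: true if at least one person is NOT a mathematician.
# Using `not ... ==` (instead of !=) also catches people with no profession field.
cannot_cut_pizza if {
  some person in input.people
  not person.profession == "mathematician"
}

# "Everyone can cut" == "not (someone cannot cut)", by De Morgan.
default allow := false

allow if {
  # Guard: without this, a missing input.people would also make allow true.
  is_array(input.people)
  not cannot_cut_pizza
}
```

The empty-group case deserves a conscious decision either way: both this rule and the count-based version above allow an empty list of people.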

2. or or any?

As we just saw, Rego is easier to read and write if you have your OR statements at the top level (as separate rule bodies) and your AND statements nicely one after the other inside the rule bodies. However, as programmers, it usually doesn’t take long before you come across that one case where the idiomatic approach doesn’t work well.

In Rego, this often comes up when a rule body already contains a good number of expressions, e.g.:

probably_a_pizza {
  input.shape == "circle"
  input.ingredients[_] == "cheese"
  input.ingredients[_] == "tomato_sauce"
}

What if we want to allow pizza slices as well? We can either duplicate the whole body:

probably_a_pizza {
  input.shape == "circle"
  input.ingredients[_] == "cheese"
  input.ingredients[_] == "tomato_sauce"
} {
  input.shape == "circular_sector"
  input.ingredients[_] == "cheese"
  input.ingredients[_] == "tomato_sauce"
}

Well, that’s not great. In most cases we can introduce an auxiliary rule:

has_a_pizza_shape {
  input.shape == "circle"
} {
  input.shape == "circular_sector"
}

probably_a_pizza {
  has_a_pizza_shape
  input.ingredients[_] == "cheese"
  input.ingredients[_] == "tomato_sauce"
}

That works fine in this case. Sometimes, however, we’ll have so many auxiliary rules that it gets hard to even come up with good names for them! In those rare cases, we can just pretend there’s an OR statement in Rego called any:

probably_a_pizza {
  any([input.shape == "circle", input.shape == "circular_sector"])
  input.ingredients[_] == "cheese"
  input.ingredients[_] == "tomato_sauce"
}

This is a perfectly readable way of writing this. When used in moderation, any can make your life a lot easier.
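Recent OPA releases mark any and all as deprecated built-ins (check the policy reference for your version), and a module that imports rego.v1 is rejected if it calls them. When every alternative is an equality check, as with the pizza shapes, set membership with the in keyword expresses the same OR without a helper rule. A minimal sketch, assuming OPA 0.59 or later; the package name pizza.shape and the rule pizza_shapes are placeholders:

```rego
package pizza.shape

import rego.v1

# All accepted shapes in one set; adding a shape is a one-line change.
pizza_shapes := {"circle", "circular_sector"}

probably_a_pizza if {
  # The OR: true if input.shape equals any member of the set.
  input.shape in pizza_shapes
  # The ANDs: each of these must hold as well.
  "cheese" in input.ingredients
  "tomato_sauce" in input.ingredients
}
```

For alternatives that are not simple equality checks, the auxiliary-rule approach above still applies.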

3. any and all comprehensions

Continuing down the same path, any and all are extremely useful in more places, especially around list comprehensions – these two features make a combination that’s really greater than the sum of its parts.

If we look back at the first example, here’s how we can use all together with a list comprehension to express the same policy:

allow {
  all([person.profession == "mathematician" | person = input.people[_]])
}

I think this is really just as readable as the version we constructed following De Morgan’s laws. It usually depends on context which one is more idiomatic.

Is it easier to reason about the negated policy? Then we’ll follow the De Morgan transformation.
Or is it easier to think about the things we’re about to police as a list or collection? Then a list comprehension is appropriate.
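Newer OPA versions also offer the every keyword (enabled by import rego.v1), which states the “for all” directly instead of building an intermediate list of booleans, and which keeps working where the deprecated all built-in is rejected. A sketch of the first example’s policy written that way, assuming OPA 0.59 or later; the package name pizza.everyone is a placeholder:

```rego
package pizza.everyone

import rego.v1

default allow := false

# Allow only if every person in the group is a mathematician.
allow if {
  every person in input.people {
    person.profession == "mathematician"
  }
}
```

Like all([]) and the De Morgan version, every succeeds for an empty group; a person without a profession field makes it fail, because the comparison inside the body is undefined for that person.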

4. use walk to browse entire document trees

Rego queries always terminate – which is great, but it comes at a price: it’s not, in general, possible to write recursive rules.

Why are recursive rules necessary? Well, they’re very useful for dealing with recursive input documents, such as deeply nested trees. Fortunately, OPA provides some built-in functions to help here.

For example, we could be writing a policy to monitor discussion threads:

[
  {
    "author": "Alice",
    "message": "I think New York is at least as good as Italy for pizza",
    "replies": [
      {
        "author": "Bob",
        "message": "It's because of the water!",
        "replies": [
          {
            "author": "Alice",
            "message": "I've heard that before but I'm not convinced.",
            "replies": []
          }
        ]
      }
    ]
  },
  {
    "author": "Charlie",
    "message": "You are a horrible person.",
    "replies": []
  }
]

How do we obtain the messages in Rego? We can try something like:

messages[message] {
  message = input[_].message
} {
  message = input[_].replies[_].message
} {
  message = input[_].replies[_].replies[_].message
}

This works for a fixed nesting depth, but we can’t always assume one, and besides, it’s not super easy to read. Instead, you can use walk to recursively traverse the tree and obtain every .message field:

messages[message] {
  [_, value] := walk(input)
  message = value.message
}

Once we have those messages in a regular Rego rule, we can write an idiomatic policy to deny discussion threads containing swear words.

deny {
  swear_words := {"hawaii", "pineapple"}
  contains(lower(messages[_]), swear_words[_])
}

One real-life example where this comes up is Terraform child modules: we use walk in Regula to flatten the child modules in a way that’s not very different from flattening the messages in the discussion thread.
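Here are the walk-based messages rule and the deny rule combined into one self-contained package, so the recursion can be checked against a cut-down copy of the thread above. It assumes OPA 0.59 or later for the rego.v1 syntax; the package name threads is a placeholder.

```rego
package threads

import rego.v1

# Words that trigger a deny; matching is done on lower-cased text.
swear_words := {"hawaii", "pineapple"}

# walk(input) yields a [path, value] pair for every nested value at any
# depth, so every object with a "message" key contributes its message.
# Values without that key (strings, arrays, ...) are simply skipped.
messages contains message if {
  [_, value] := walk(input)
  message := value.message
}

# `contains` in the head above is the partial-set keyword;
# `contains(...)` below is the string built-in.
deny if {
  some message in messages
  some word in swear_words
  contains(lower(message), word)
}
```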

5. Dynamically loading packages and rules

One of the interesting design aspects of Rego is how the whole “universe” of rules and data is nested under the same document. Whether you’re accessing user input, data from JSON or YAML files, or rules from your packages, it’s all just references:

input.people[0].order
data.recipes.pizza.pepperoni
data.policies.strict.allow

Because we can enumerate over references in Rego using variables or wildcards, we can dynamically look at the set of packages that were loaded. This allows you to build an architecture where you simply add new policy files and summarize all of them in a report.

Placing the policies in separate packages also has the benefit that they can be debugged separately and that there is no chance for rule names to conflict. We’ll place all of them in the data.policies namespace. The first one uses De Morgan’s laws to check that there is cheese on the pizza:

package policies.cheese

contains_cheese {
  input.toppings[_] == "cheese"
}

deny["must contain cheese"] {
  not contains_cheese
}

The second policy verifies that two people won’t start fighting over the last slice:

package policies.slices

deny["must be able to share with two people"] {
  input.slices % 2 != 0
}

Once we’ve agreed on a namespace to put our policies in, it’s relatively straightforward to summarize them: we grab every deny in data.policies and pretty-print a human-readable report.

package summary

report = msg {
  denies := {m | m := data.policies[_].deny[_]}
  msg := sprintf("%d policies failed\n%s", [
    count(denies),
    concat("\n", [sprintf("- %s", [m]) | denies[m]]),
  ])
}

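Together with some jq to extract the actual values, this can produce a nice-looking report right in your terminal. We’ll use the -I flag (short for --stdin-input) to read the input document from stdin here; the JSON line in the transcript is what we type into stdin.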

$ opa eval -I 'data.summary.report' -d . --format json | \
    jq -r '.result | .[] | .expressions | .[] | .value'
{"toppings": ["crab", "tomato sauce"], "slices": 6}
1 policies failed
- must contain cheese

There are many more advanced tricks that can be done by accessing packages as data – for example, transforming the input into a different format depending on whether a package declares input_version = 1 or input_version = 2. All within Rego!
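Since package names are just keys under data.policies, the summary can also report which packages failed, not only how many deny messages there are. A minimal sketch, assuming OPA 0.59 or later for the rego.v1 syntax; failed_packages is a hypothetical rule that would sit next to report in the summary package:

```rego
package summary

import rego.v1

# Each key under data.policies is a package name (e.g. "cheese", "slices").
# A package counts as failed if its deny set has at least one message.
failed_packages contains name if {
  some name, policy in data.policies
  count(policy.deny) > 0
}
```

With the crab-and-tomato input above, only the cheese package produces a deny message, so the expected result is a set containing just "cheese".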