November 27, 2023

Keepin' It Legal: How to Automate License Checks in Your CI Pipeline

Karim Shakirov
DevOps Engineer

Hey there DevOps wizards, code gurus, and IT maestros! Ever found yourself buried in a mountain of code dependencies? Worried about staying on the right side of open-source licenses? We've got you covered. Let's talk about how you can automate license checks in your CI pipelines. 🚀

Why Should You Care?

Like many of you, we're big fans of open-source components. But hey, we also want to play nice and respect everyone's rights. So we decided to auto-approve certain “safe” licenses like Apache 2.0 and MIT, and to review any other license type we bump into “on the go,” approving or rejecting the particular packages case by case. Sound like you? Then read on!

The Setup: GitHub Actions

We're using GitHub Actions as our go-to CI tool. Don't worry, the process is straightforward—just a bit of Python and some GitHub magic. We're making a reusable workflow for our many microservices, and we're storing them in our GHA-Store repo.

Interested in scaling GitHub Actions? Subscribe to our newsletter (on the right); we've got more on that in upcoming posts.

Your First Step: Reusable Workflow File

Here's a snippet for initiating the reusable workflow:

on:
  workflow_call:
    inputs:
      inherit-inputs:
        required: false
        type: string


env:
  CHECKOUT_REF: ${{ fromJSON(inputs.inherit-inputs).ref || github.event.pull_request.head.sha }}

Nothing too crazy, right?

This sets up the workflow to be reusable and passes inputs down from the parent workflow to the child. Because a reusable workflow doesn't receive its parent's inputs by default, we pack them into JSON and pass that string as a single argument to this workflow.

CHECKOUT_REF shows how we get the value back out. I'll show you the packing side at the end of this post, where we call this workflow.
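
To make the unpacking concrete, here is a minimal Python sketch of what the `fromJSON(...).ref || ...` expression does. The function name `resolve_checkout_ref` and the sample values are hypothetical, and the handling of empty or `null` input is an assumption about what a caller might send, not documented GitHub behavior.

```python
import json


def resolve_checkout_ref(inherit_inputs, pr_head_sha):
    """Mimic: fromJSON(inputs.inherit-inputs).ref || pr_head_sha."""
    # The caller packs all of its inputs into one JSON string.
    # Assumption: the string may be empty, "null", or "{}" when the run
    # was not started manually.
    parsed = json.loads(inherit_inputs) if inherit_inputs else None
    ref = parsed.get("ref") if isinstance(parsed, dict) else None
    # An empty or missing ref falls back to the PR head commit.
    return ref or pr_head_sha


manual = json.dumps({"ref": "release-1.2"})
print(resolve_checkout_ref(manual, "abc123"))  # ref given by the caller
print(resolve_checkout_ref("{}", "abc123"))    # falls back to the PR head
```

The fallback mirrors the `||` in the workflow expression: any falsy ref (missing or empty) yields the pull request's head commit.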

Let's Get to Work: The Main Job

1. Fetch the Source Code and License Config

Your job starts with fetching your repo and a separate repo containing your license configuration. Here's the code:

jobs:
  deps-lic-check:
    runs-on: ubuntu-latest
    steps:
      - name: Generate token
        id: generate_token
        uses: tibdex/github-app-token@v1
        with:
          app_id: ${{ secrets.APP_ID }}
          private_key: ${{ secrets.PRIVATE_KEY }}

      - name: Checkout current repo
        uses: actions/checkout@v4
        with:
          ref: ${{ env.CHECKOUT_REF }}
          fetch-depth: 1

      - name: Checkout configuration
        uses: actions/checkout@v3
        with:
          repository: perfectscale-io/gha-store # replace to your org/repo
          ref: v1
          token: ${{ steps.generate_token.outputs.token  }}
          path: gha-store
          fetch-depth: 1

The "Checkout current repo" step is simple, but pay attention to "ref". We set it so that a pull request run checks out the PR's own head commit, because by default a pull request run checks out an ephemeral merge commit of the PR into the target branch.

To access the configuration, we need to check out another repo (inside our org). To do so, we created a GitHub App with a specific set of permissions.

As a result, we end up with the following working directory structure:

pkg/
main.go
go.mod
go.sum
gha-store/

2. Setting Up Go and Vendor

If you're dealing with a Golang repo, you'll need to set up Go and configure access to your private modules like so:

      - uses: actions/setup-go@v4
        with:
          go-version: 'stable'

      - name: Setup vendor
        run: |
          go env -w GOPRIVATE="github.com/perfectscale-io" # replace to your org
          git config --global url." "
        env:
          TOKEN: x-access-token:${{steps.generate_token.outputs.token}}

Here we set up Go and again reuse the token from the "Generate token" step, exposed as TOKEN, so that private modules can be fetched.
If you don't have private dependencies, you can omit the "Setup vendor" step.

3. Generate Software Bill of Materials (SBOM)

Why SBOM? It gives us a single, machine-readable list of all our dependencies.
For Golang, we use the "CycloneDX/gh-gomod-generate-sbom@v2" action from CycloneDX.

Note: we chose OWASP CycloneDX because it is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction. It is managed by the CycloneDX Core Working Group and backed by the OWASP Foundation.

As output, the step generates a JSON file with all our dependencies.
Take a look:

      - name: Generate SBOM
        uses: CycloneDX/gh-gomod-generate-sbom@v2
        with:
          version: v1
          args: app -licenses=true -assert-licenses=true -json=true -output sbom.json .

Because SBOM is a standard format for dependency information, the same check can later cover other languages without refactoring, which keeps our code simple.
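
For orientation, the part of the SBOM that the check relies on is the `components` list. Below is a minimal sketch with a hand-written sample, not real generator output: the package names are made up, and `license_labels` is a hypothetical helper. In CycloneDX JSON, a license entry holds either a `license` object with an `id` or a `name`, or an `expression`.

```python
import json

# Hand-written sample in the shape of a CycloneDX JSON SBOM (illustrative only).
SAMPLE_SBOM = json.loads("""
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "example.com/lib-a", "version": "v1.0.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "example.com/lib-b", "version": "v2.1.0",
     "licenses": [{"license": {"name": "Custom License"}}]},
    {"name": "example.com/lib-c", "version": "v0.3.0",
     "licenses": [{"expression": "MIT OR Apache-2.0"}]},
    {"name": "example.com/lib-d", "version": "v0.0.1"}
  ]
}
""")


def license_labels(component):
    """Return every license label found on one component."""
    labels = []
    for entry in component.get("licenses", []):
        lic = entry.get("license", {})
        label = lic.get("id") or lic.get("name") or entry.get("expression")
        if label:
            labels.append(label)
    return labels


for component in SAMPLE_SBOM["components"]:
    print(component["name"], license_labels(component))
```

Note the last component: it has no `licenses` key at all, so it yields an empty list.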

The only thing left is to parse this SBOM and compare the dependency licenses with the allowed list.
To do that, we run a simple Python script directly in a GitHub Actions step using "shell: python".

But before we start, let's look at our "allowed-licenses.yaml" (or "configuration") file:

---
# array of allowed licenses
allowed:
  - MIT
  - Apache-2.0
# array of package names to ignore during check
ignore:
  - github.com/golang/protobuf

Here we have 2 arrays:

  1. allowed - array ([]string) of allowed licenses
  2. ignore - array ([]string) of packages that we skip during the check, whatever their license

We store this configuration file in the "helpers" folder of our gha-store repo and can update it as often as we want. Every new workflow run picks up the updated configuration.
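
Since this file is edited by hand, a small sanity check before use can save a confusing failure later. This is a sketch and not part of the workflow in this post: `normalize_config` is a hypothetical helper that takes the already-parsed YAML (a dict) and assumes a missing key should be treated as an empty list.

```python
def normalize_config(config):
    """Return (allowed, ignore) as sets of strings, failing on bad shapes."""
    config = config or {}  # an empty YAML file parses to None
    result = []
    for key in ("allowed", "ignore"):
        values = config.get(key) or []  # assumption: missing key means empty
        if not isinstance(values, list) or not all(
                isinstance(v, str) for v in values):
            raise ValueError(f"'{key}' must be a list of strings")
        result.append(set(values))
    return tuple(result)


allowed, ignore = normalize_config({
    "allowed": ["MIT", "Apache-2.0"],
    "ignore": ["github.com/golang/protobuf"],
})
print(sorted(allowed), sorted(ignore))
```

Using sets also makes the later membership checks independent of list order and duplicates.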

4. Python Magic: Analyzing Licenses

Now, the real fun begins! We've written a Python script that does all the heavy lifting.

      - name: Analyze licenses
        shell: python
        run: |
          import sys
          import json
          import yaml
          def generate_markdown_table(non_allowed_licenses):
              table = "| License ID | Packages |\n"
              table += "|------------|----------|\n"
              for license_id, packages in non_allowed_licenses.items():
                  table += f"| {license_id} | {', '.join(packages)} |\n"
              return table
          def extract_licenses(json_data):
              license_package_mapping = {}
              for component in json_data['components']:
                  package_name = component['name']
                  licenses = component.get('licenses', [])
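                  for lic in licenses:
                      # An entry may carry an SPDX "id", a free-form "name",
                      # or an "expression" instead of a "license" object.
                      lic_obj = lic.get('license', {})
                      license_id = (lic_obj.get('id') or lic_obj.get('name')
                                    or lic.get('expression', 'UNKNOWN'))
                      license_package_mapping.setdefault(
                          license_id, []).append(package_name)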
              return license_package_mapping
          def check_licenses(license_package_mapping, allowed_licenses, ignored_packages):
              non_allowed_licenses = {}
              for license_id, packages in license_package_mapping.items():
                  if license_id not in allowed_licenses:
                      non_ignored_packages = [
                          pkg for pkg in packages if pkg not in ignored_packages]
                      if non_ignored_packages:
                          non_allowed_licenses[license_id] = non_ignored_packages
              return non_allowed_licenses
          with open('sbom.json', 'r') as f:
              json_data = json.load(f)
          with open('gha-store/helpers/allowed-licenses.yaml', 'r') as f:
              yaml_data = yaml.safe_load(f)
          extracted_licenses = extract_licenses(json_data)
          allowed_licenses = yaml_data['allowed']
          ignored_packages = yaml_data['ignore']
          non_allowed_licenses = check_licenses(
              extracted_licenses, allowed_licenses, ignored_packages)
          if non_allowed_licenses:
              print("Non-allowed licenses and their packages:")
              for license_id, packages in non_allowed_licenses.items():
                  print(f"License: {license_id}, Packages: {', '.join(packages)}")
              markdown_table = generate_markdown_table(non_allowed_licenses)
              with open("result.md", "w") as f:
                  f.write("# Non-Allowed Licenses and Their Packages\n")
                  f.write(markdown_table)
              sys.exit(1)
          else:
              with open("result.md", "w") as f:
                  f.write("# No problems with Licenses\n")
              print("All licenses are allowed.")
              sys.exit(0)

Here we compare the dependency licenses from the SBOM with the allowed licenses and log every package that has a non-allowed license and is not ignored.
We also generate a markdown table with the "failed" packages.
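
One thing to keep in mind: a component with no `licenses` entry in the SBOM never reaches the comparison, so it passes silently. If you would rather see those packages, a sketch like the following could list them. `find_unlicensed` and the sample data are hypothetical, and whether to fail the build on such packages is a policy choice this post does not cover.

```python
def find_unlicensed(components, ignored_packages):
    """Return names of components that carry no license information."""
    ignored = set(ignored_packages)
    return [
        component["name"]
        for component in components
        if not component.get("licenses") and component["name"] not in ignored
    ]


# Illustrative components in the shape used by the script above.
components = [
    {"name": "example.com/lib-a", "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "example.com/lib-b", "licenses": []},
    {"name": "example.com/lib-c"},
    {"name": "example.com/lib-d"},
]
print(find_unlicensed(components, ["example.com/lib-d"]))
```

Here lib-b and lib-c are reported, while lib-d is skipped because it is on the ignore list.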

5. Keeping It User-Friendly: GitHub Actions Summary

We want the output to be as developer-friendly as possible, so we use GitHub Actions Summary:

      - name: Update summary
        if: always()
        run: echo -e "$(cat result.md)" >> $GITHUB_STEP_SUMMARY

No magic, just GitHub Actions.
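
If you would rather write the summary from the Python step itself, the same effect can be had by appending to the file that GITHUB_STEP_SUMMARY points to. The sketch below is an alternative, not what our workflow does; `append_summary` is a hypothetical helper, and the temporary file only stands in for the real summary file so the example runs locally.

```python
import os
import tempfile


def append_summary(markdown, summary_path=None):
    """Append Markdown to the job summary file and return its path."""
    # On a runner, GITHUB_STEP_SUMMARY holds the path of the summary file.
    path = summary_path or os.environ["GITHUB_STEP_SUMMARY"]
    with open(path, "a", encoding="utf-8") as summary:
        summary.write(markdown.rstrip("\n") + "\n")
    return path


# Local demonstration with a temporary file standing in for the summary.
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as tmp:
    demo_path = tmp.name
append_summary("# Non-Allowed Licenses and Their Packages", demo_path)
append_summary("| License ID | Packages |", demo_path)
with open(demo_path, encoding="utf-8") as f:
    print(f.read())
os.remove(demo_path)
```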

6. Don't Let It Slide: Slack Notifications

If anything goes south, you'll get a Slack notification. Trust me, you want to set this up!

      - name: Notify about license issues
        if: failure()
        uses: slackapi/slack-github-action@v1.23.0
        with:
          payload: |
            {"text": "License issue in ${{ github.event.repository.name }}\nRef: ${{ github.ref_name }}\nWorkflow url: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.GLOBAL_SLACK_DEPENDENCY_LICENSE }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

To set this up, create a new Slack application and add it to your workspace. Then add a new incoming webhook and save its URL as a global secret, for example "GLOBAL_SLACK_DEPENDENCY_LICENSE" as we do.
You can read more about it in the Slack documentation: https://api.slack.com/messaging/webhooks
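
The payload in the step above is plain JSON with a single "text" field, assembled by string interpolation. If you ever build it in a script instead, letting a JSON library do the escaping is safer. A minimal sketch, with `build_slack_payload` and all sample values being hypothetical:

```python
import json


def build_slack_payload(repo_name, ref_name, server_url, repository, run_id):
    """Build the incoming-webhook body as a JSON string."""
    run_url = f"{server_url}/{repository}/actions/runs/{run_id}"
    text = (
        f"License issue in {repo_name}\n"
        f"Ref: {ref_name}\n"
        f"Workflow url: {run_url}"
    )
    # json.dumps escapes quotes and newlines, so unusual ref names
    # cannot break the payload.
    return json.dumps({"text": text})


payload = build_slack_payload(
    "my-service", "feature/x", "https://github.com", "my-org/my-service", 42)
print(payload)
```

The sketch only builds the body; sending it is still left to the Slack action or an HTTP client of your choice.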

Call me ...

Our reusable workflow is finished. Now we need to call it:

# Perfectscale-Go-Workflow.yaml

name: PSC Golang Workflow

on:
  push:
    branches:
      - 'main'
  pull_request:
  workflow_dispatch:
    inputs:
      ref:
        type: string
        description: 'Optional: Commit, branch etc.'
        required: false

permissions:
  checks: write
  id-token: write

jobs:
  deps-lic-check:
    uses: perfectscale-io/gha-store/.github/workflows/_deps_lic_check.yaml@v2
    with:
      inherit-inputs: ${{ toJSON(inputs) }}
    secrets: inherit

As promised, this is how the parent's inputs are passed to the child workflow: toJSON(inputs) packs them into the single "inherit-inputs" argument. Beyond that, we just call the reusable workflow to check our licenses.

Bringing It All Together

And there you have it, folks! Automating license checks is easier than you thought. With just a bit of Python and GitHub Actions, you're all set to ensure you're not stepping on any legal landmines.
So, go ahead, give it a try, and let us know how it goes. Happy coding! 🎉

Full reusable workflow:

on:
  workflow_call:
    inputs:
      inherit-inputs:
        required: false
        type: string


env:
  CHECKOUT_REF: ${{ fromJSON(inputs.inherit-inputs).ref || github.event.pull_request.head.sha }}

jobs:
  deps-lic-check:
    runs-on: ubuntu-latest
    steps:
      - name: Generate token
        id: generate_token
        uses: tibdex/github-app-token@v1
        with:
          app_id: ${{ secrets.APP_ID }}
          private_key: ${{ secrets.PRIVATE_KEY }}

      - uses: actions/checkout@v4
        with:
          ref: ${{ env.CHECKOUT_REF }}
          fetch-depth: 1

      - uses: actions/checkout@v3
        with:
          repository: perfectscale-io/gha-store
          ref: v2
          token: ${{ steps.generate_token.outputs.token  }}
          path: gha-store
          fetch-depth: 1

      - uses: actions/setup-go@v4
        with:
          go-version: 'stable'

      - name: Setup vendor
        run: |
          go env -w GOPRIVATE="github.com/perfectscale-io"
          git config --global url." "
        env:
          TOKEN: x-access-token:${{steps.generate_token.outputs.token}}

      - name: Generate SBOM
        uses: CycloneDX/gh-gomod-generate-sbom@v2
        with:
          version: v1
          args: app -licenses=true -assert-licenses=true -json=true -output sbom.json .

      - name: Analyze licenses
        shell: python
        run: |
          import sys
          import json
          import yaml
          def generate_markdown_table(non_allowed_licenses):
              table = "| License ID | Packages |\n"
              table += "|------------|----------|\n"
              for license_id, packages in non_allowed_licenses.items():
                  table += f"| {license_id} | {', '.join(packages)} |\n"
              return table
          def extract_licenses(json_data):
              license_package_mapping = {}
              for component in json_data['components']:
                  package_name = component['name']
                  licenses = component.get('licenses', [])
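                  for lic in licenses:
                      # An entry may carry an SPDX "id", a free-form "name",
                      # or an "expression" instead of a "license" object.
                      lic_obj = lic.get('license', {})
                      license_id = (lic_obj.get('id') or lic_obj.get('name')
                                    or lic.get('expression', 'UNKNOWN'))
                      license_package_mapping.setdefault(
                          license_id, []).append(package_name)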
              return license_package_mapping
          def check_licenses(license_package_mapping, allowed_licenses, ignored_packages):
              non_allowed_licenses = {}
              for license_id, packages in license_package_mapping.items():
                  if license_id not in allowed_licenses:
                      non_ignored_packages = [
                          pkg for pkg in packages if pkg not in ignored_packages]
                      if non_ignored_packages:
                          non_allowed_licenses[license_id] = non_ignored_packages
              return non_allowed_licenses
          with open('sbom.json', 'r') as f:
              json_data = json.load(f)
          with open('gha-store/helpers/allowed-licenses.yaml', 'r') as f:
              yaml_data = yaml.safe_load(f)
          extracted_licenses = extract_licenses(json_data)
          allowed_licenses = yaml_data['allowed']
          ignored_packages = yaml_data['ignore']
          non_allowed_licenses = check_licenses(
              extracted_licenses, allowed_licenses, ignored_packages)
          if non_allowed_licenses:
              print("Non-allowed licenses and their packages:")
              for license_id, packages in non_allowed_licenses.items():
                  print(f"License: {license_id}, Packages: {', '.join(packages)}")
              markdown_table = generate_markdown_table(non_allowed_licenses)
              with open("result.md", "w") as f:
                  f.write("# Non-Allowed Licenses and Their Packages\n")
                  f.write(markdown_table)
              sys.exit(1)
          else:
              with open("result.md", "w") as f:
                  f.write("# No problems with Licenses\n")
              print("All licenses are allowed.")
              sys.exit(0)

      - name: Update summary
        if: always()
        run: echo -e "$(cat result.md)" >> $GITHUB_STEP_SUMMARY

      - name: Notify about license issues
        if: failure()
        uses: slackapi/slack-github-action@v1.23.0
        with:
          payload: |
            {"text": "License issue in ${{ github.event.repository.name }}\nRef: ${{ github.ref_name }}\nWorkflow url: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.GLOBAL_SLACK_DEPENDENCY_LICENSE }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

Common Open Source Software (OSS) Library Licenses

Open source software (OSS) libraries typically use a variety of licenses, but some are more common than others. Always read and understand the terms of a license before using or contributing to a project.

| License | Type | Permissive / Restrictive | Usage Examples | Obligations |
|---------|------|--------------------------|----------------|-------------|
| MIT License | Permissive | Permissive | Commercial and non-commercial projects, proprietary software. | Include the original license and copyright notice in any copy or substantial portion of the software. |
| GNU GPL (v2 or v3) | Copyleft | Restrictive | Free and open source projects, may impact commercial use and proprietary software. | Distribute derivative works under the same license. Provide access to the source code of the software when distributing binaries. |
| Apache License | Permissive | Permissive | Commercial and non-commercial projects, widely used in corporate environments. | Include the original copyright, license, and notice in any copy or substantial portion of the software. |
| BSD License (2 or 3-Clause) | Permissive | Permissive | Commercial and non-commercial projects, allows for use in proprietary software. | Include the original copyright, license, and disclaimer in any copy or substantial portion of the software. |
| Creative Commons | Various | Varies | Not typically used for software, but for creative works like images, music, etc. Usage varies based on specific Creative Commons license chosen. | Varied obligations depending on the specific Creative Commons license (e.g., attribution, non-commercial use, share-alike). |
| Mozilla Public License (MPL) | Copyleft | Partially Restrictive | Free and open source projects, allows for proprietary derivative works. Mozilla products often use this license. | Distribute any derivative works under the MPL, provide access to the source code of derivative works, include an original copy of the MPL with any substantial portions. |
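
The `allowed` list in the configuration uses SPDX identifiers rather than the family names in the table. As a rough sketch, the mapping below ties the two together. The grouping mirrors the table, `LICENSE_FAMILIES` and `family_of` are hypothetical names, and the identifiers are a small sample, not the full SPDX list.

```python
# Sample SPDX identifiers grouped by the categories from the table above.
LICENSE_FAMILIES = {
    "permissive": ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"],
    "partially restrictive": ["MPL-2.0"],
    "restrictive": ["GPL-2.0-only", "GPL-2.0-or-later",
                    "GPL-3.0-only", "GPL-3.0-or-later"],
}


def family_of(spdx_id):
    """Return the category for a known SPDX identifier, else 'unknown'."""
    for family, ids in LICENSE_FAMILIES.items():
        if spdx_id in ids:
            return family
    return "unknown"


for spdx_id in ("MIT", "MPL-2.0", "GPL-3.0-only", "CC-BY-4.0"):
    print(spdx_id, "->", family_of(spdx_id))
```

Anything that comes back as "unknown" is a candidate for the manual review described at the start of this post.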
