OpenTelemetry and GitLab (failed experiment)

by Sebastien Mirolo on Mon, 9 Sep 2024

As Web traffic has spiked over the summer, the engineering team is looking to upgrade the logging infrastructure to get better visibility into "soft errors" - i.e. 502, 503, 504 and the like. The team has been relying more and more on a self-hosted GitLab instance for day-to-day activity, and has seen many new Monitoring features coming out of GitLab. Soon the idea of shipping OpenTelemetry events to GitLab started to look appealing.

Installing OpenTelemetry

We recently migrated to Amazon Linux 2023. The expectation was that we could just run a dnf install opentelemetry, but things turned out to be a bit more complicated than that.

First, there do not seem to be native OpenTelemetry packages in any Linux distribution. Second, AWS has its own version of the OpenTelemetry collector, called ADOT, but that one does not seem to have been packaged for AL2023 either.

Since most of the interesting receivers live in the opentelemetry-collector-contrib repository, not the core opentelemetry-collector repository, we will build and install the upstream collector instead of the AWS one. That's when we found out there are in fact pre-built .rpm packages available on GitHub. After poking around, it seems we don't need the core otelcol service when we are running otelcol-contrib, so we are only going to install the latter.

Terminal
$ dnf install https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.109.0/otelcol-contrib_0.109.0_linux_amd64.rpm
...
Created symlink /etc/systemd/system/multi-user.target.wants/otelcol-contrib.service → /usr/lib/systemd/system/otelcol-contrib.service.
...
AttributeError: module 'libdnf.conf' has no attribute 'ConfigParser_substitute'

$ cat /usr/lib/systemd/system/otelcol-contrib.service
...
EnvironmentFile=/etc/otelcol-contrib/otelcol-contrib.conf
...
User=otelcol-contrib
Group=otelcol-contrib
...

$ cat /etc/otelcol-contrib/otelcol-contrib.conf
...
OTELCOL_OPTIONS="--config=/etc/otelcol-contrib/config.yaml"

So at this point, we are going to modify /etc/otelcol-contrib/config.yaml and see what happens.
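
To get something running first, a minimal configuration - just a sketch, assuming we read from the systemd journal and print events to the collector's own output through the debug exporter - could look like this:

/etc/otelcol-contrib/config.yaml
receivers:
  journald:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [journald]
      exporters: [debug]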

Shipping OpenTelemetry events to GitLab

Reading through Get started with monitoring your application in GitLab, it seems we can forward logs to GitLab - at least in version 17.4+ (you can find the version you are running by going to the "/help" URL path on your GitLab instance). On further investigation though, the Logs feature is only available in the Ultimate edition, not the Enterprise Edition we are running.

We can nonetheless find Error Tracking, Alerts and Incidents menus in the Monitor section of the GitLab instance. A few clicks through the help pages later, we discover that GitLab's integrated Error Tracking is not available on self-hosted instances. You will need to install Sentry alongside GitLab - which defeats our purpose of reducing the number of services to manage.

GitLab can accept alerts from any source via a webhook receiver. We tried to export events as GitLab alerts through the otlphttp exporter.

/etc/otelcol-contrib/config.yaml
...
exporters:
  otlphttp:
    logs_endpoint: https://example.com:4318
    headers:
      Authorization: Bearer *****
      Content-Type: application/json
    encoding: json
    compression: none

At first we got a 400 HTTP status code. The problem here is that the "body of the POST request is a payload either in binary-encoded Protobuf format or in JSON-encoded Protobuf format" (see OTLP/HTTP). Adding encoding: json and compression: none enabled the alert to be created on GitLab.
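
For completeness, the otlphttp exporter also has to be wired into a logs pipeline; a minimal sketch, assuming the journald receiver used earlier:

/etc/otelcol-contrib/config.yaml
...
service:
  pipelines:
    logs:
      receivers: [journald]
      exporters: [otlphttp]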

Unfortunately, the format of the JSON posted to the GitLab API is not at all what we were expecting.

HTTP Request posted to GitLab
{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "body": {
                "kvlistValue": {
                  "values": [
                    {
                      "key": "_HOSTNAME",
                      "value": {
                        "stringValue": "localhost.localdomain"
                      }
                    },
                    ...
                  ]
                }
              },
              "spanId": "",
              "traceId": "",
              "attributes": [
                {
                  "key": "gitlab_environment_name",
                  "value": {
                    "stringValue": "production"
                  }
                }
              ],
              "timeUnixNano": "1726516472972892000",
              "observedTimeUnixNano": "1726516611162060543"
            }
          ]
        }
      ]
    }
  ]
}

It does not seem possible to map fields within GitLab itself anymore (or is that just another edition issue?). Unfortunately, we cannot do the transform within OpenTelemetry itself either, as the wrapper format around log events is fixed by the OTLP specification.
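
For comparison, GitLab's HTTP alert endpoint expects a flat JSON payload whose fields map directly onto an alert, something along these lines (field names taken from the GitLab alert integration documentation, values purely illustrative):

Expected GitLab alert payload (illustrative)
{
  "title": "user1 authenticated successfully",
  "description": "authentication success reported by sshd",
  "hosts": ["localhost.localdomain"],
  "severity": "low",
  "gitlab_environment_name": "production"
}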

GitLab can also be configured to receive Prometheus alerts: the documentation states that "Alerts are expected to be formatted for a Prometheus webhook receiver" - with no clue what that means at this point. The alertmanager exporter is still in active development, and not available in the OpenTelemetry release we are using. The prometheus exporter doesn't have enough flexibility for us to target GitLab's Prometheus webhooks. The prometheusremotewrite exporter can be configured to talk to a GitLab Prometheus webhook (through endpoint and headers), but it only supports metric data types, with the caveat that "Non-cumulative monotonic, histogram, and summary OTLP metrics are dropped by this exporter."
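
As far as we can tell, "formatted for a Prometheus webhook receiver" refers to the JSON document that Alertmanager POSTs to its webhook_config receivers; trimmed down, it looks roughly like this (a sketch based on the Alertmanager documentation):

Alertmanager webhook payload (sketch)
{
  "version": "4",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighErrorRate",
        "severity": "critical"
      },
      "annotations": {
        "description": "Too many 502/503/504 responses"
      },
      "startsAt": "2024-09-09T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}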

By using a connector, we are able to convert log events into metrics, and by adding a deltatocumulative processor we can then trigger the GitLab webhook endpoint (see the following config).

/etc/otelcol-contrib/config.yaml
...
exporters:
  prometheusremotewrite:
    endpoint: https://gitlab.example.com/prometheus/alerts/notify.json
    headers:
      Authorization: Bearer *****
      Content-Type: application/json
    compression: none

processors:
  deltatocumulative:

connectors:
  count:

service:

  pipelines:

    metrics:
      receivers: [count]
      processors: [deltatocumulative]
      exporters: [prometheusremotewrite]

    logs:
      receivers: [journald]
      exporters: [count]

Unfortunately, we get a 400 HTTP error code with little help on what is incorrect. Analysis of the HTTP request headers shows that the prometheusremotewrite exporter sets the content encoding to snappy - the Remote Write protocol expects snappy-compressed Protobuf payloads. Setting encoding on prometheusremotewrite is invalid, and compression: none seems to have no effect.

We could maybe build a small server around django-grpc-framework to investigate the HTTP request generated by the prometheusremotewrite exporter, but at this point it looks like there will be no way to connect the otelcol-contrib output directly to GitLab.
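
Rather than a full Django project, a throwaway HTTP server built on the Python standard library would be enough to dump whatever the exporter sends (the port below is arbitrary):

dump_request.py (throwaway sketch)
#!/usr/bin/env python
# Point the prometheusremotewrite `endpoint` at http://localhost:9201/
# and print the request line, headers and body size of each POST.
from http.server import BaseHTTPRequestHandler, HTTPServer

class DumpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(self.requestline)
        print(self.headers)
        print("body: %d bytes" % len(body))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9201), DumpHandler).serve_forever()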

Transforming log events

On a side note, we successfully created attributes on log events.

At first the attributes processor looked like it would do the job, but since we are looking to produce a title attribute such as "{username} authenticated successfully", we need a more powerful processor capable of concatenating strings. The transform processor does the job. Finding reference documentation for OTTL statements wasn't as straightforward as expected, but it is here.

We couldn't find an extract-from-regex function. Since the transform processor "executes the conditions and statements against the incoming telemetry in the order specified in the config", we will make a copy of the message and then run a replace_pattern statement on it.

/etc/otelcol-contrib/config.yaml
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(attributes["title"], body["MESSAGE"])
          - replace_pattern(attributes["title"], ".*authentication success;.*user=(?P<username>.*)$", "$$username authenticated successfully")
          - set(attributes["hosts"], body["_HOSTNAME"])
          - set(attributes["gitlab_environment_name"], "production")

service:

  pipelines:

    logs:
      receivers: [journald]
      processors: [transform]
      exporters: [debug]
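
To check that the attributes actually show up, we can restart the collector and tail its systemd unit, since the debug exporter writes to the collector's standard output (captured by the journal by default):

Terminal
$ sudo systemctl restart otelcol-contrib
$ journalctl -u otelcol-contrib -f
...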

More to read

You might also like to read Shipping ssh login events through OpenTelemetry, Fast-tracking server errors to a log aggregator on S3, or Logging Docker containers output through journald to syslog.

More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a SaaS application hosting platform.
