
Prometheus and Aegis

A 10 minute read.

Last time in the christine dot website cinematic universe:

Unix sockets started to be used to grace the cluster. Things were at peace. Then, a realization came through:

Mara is hmm
<Mara> What about Prometheus? Doesn't it need a direct line of fire to the service to scrape metrics?

This would not do! Without observability the people of the Discord wouldn't have a live feed of the infrastructure falling over! This cannot stand! Look, our hero takes action!

Cadey is percussive-maintenance
<Cadey> It will soon!

To help keep an eye on all of the services I run, I use Prometheus for collecting metrics. For an example of the kind of metrics I collect, see here (1). In the configuration that I have, Prometheus runs on a server in my apartment and reaches out to my other machines to scrape metrics over the network. This worked great when I had my major services listen over TCP: I could just point Prometheus at the backend port over my tunnel.

When I started using Unix sockets for hosting my services, this stopped working. It became very clear very quickly that I needed some kind of shim. This shim needed to do the following things:

The Go standard library has a tool for reverse proxying: net/http/httputil#ReverseProxy. Maybe we could build something with this?

Mara is hmm
<Mara> The documentation seems to imply it will use the network by default. Wait, what's this Transport field?


type ReverseProxy struct {
  // ...

  // The transport used to perform proxy requests.
  // If nil, http.DefaultTransport is used.
  Transport http.RoundTripper

  // ...
}
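
To make that Transport field a bit more concrete: http.RoundTripper is an interface with a single RoundTrip method, so anything that can turn a request into a response will do. Here is a minimal sketch, assuming a made-up cannedTransport type and proxyBody helper (neither is part of Aegis), that shows a ReverseProxy happily serving a response without ever touching the network:

```go
package main

import (
  "fmt"
  "io"
  "net/http"
  "net/http/httptest"
  "net/http/httputil"
  "strings"
)

// cannedTransport is a hypothetical RoundTripper used only for this
// illustration. It never dials anything and always answers with body.
type cannedTransport struct{ body string }

// RoundTrip satisfies http.RoundTripper by building a response in memory.
func (c cannedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
  return &http.Response{
    Status:        "200 OK",
    StatusCode:    http.StatusOK,
    Proto:         "HTTP/1.1",
    ProtoMajor:    1,
    ProtoMinor:    1,
    Header:        make(http.Header),
    Body:          io.NopCloser(strings.NewReader(c.body)),
    ContentLength: int64(len(c.body)),
    Request:       req,
  }, nil
}

// proxyBody sends one request through a ReverseProxy that uses the canned
// transport and returns the body the client would have received.
func proxyBody(body string) string {
  rp := &httputil.ReverseProxy{
    // The director has nothing to rewrite because the transport ignores the URL.
    Director:  func(req *http.Request) {},
    Transport: cannedTransport{body: body},
  }

  rec := httptest.NewRecorder()
  rp.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/anything", nil))
  return rec.Body.String()
}

func main() {
  fmt.Print(proxyBody("hello from nowhere\n"))
}
```

If a transport can answer from thin air, it can certainly answer from a Unix socket instead of a TCP connection.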

Mara is hmm
<Mara> So a transport is a RoundTripper, which is an interface with a single method that takes a request and returns a response somehow. The proxy uses http.DefaultTransport by default, which talks over the network. So at a minimum we're gonna need:
  • a ReverseProxy
  • a Transport
  • a dialing function
    • Right?

Yep! Unix sockets can be used like normal sockets, so all you need is something like this:


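func proxyToUnixSocket(w http.ResponseWriter, r *http.Request) {
  // The last element of the request path picks the service:
  // /printerfacts maps to <sockdir>/printerfacts.sock.
  name := path.Base(r.URL.Path)

  fname := filepath.Join(*sockdir, name+".sock")
  _, err := os.Stat(fname)
  if os.IsNotExist(err) {
    http.NotFound(w, r)
    return
  }

  // Every connection this transport opens goes to the Unix socket,
  // no matter what network and address the HTTP stack asks for.
  ts := &http.Transport{
    Dial: func(_, _ string) (net.Conn, error) {
      return net.Dial("unix", fname)
    },
    DisableKeepAlives: true,
  }

  rp := httputil.ReverseProxy{
    // Rewrite every request so it asks the backend for /metrics.
    // The host is a placeholder because the dialer above ignores it.
    Director: func(req *http.Request) {
      req.URL.Scheme = "http"
      req.URL.Host = "aegis"
      req.URL.Path = "/metrics"
      req.URL.RawPath = "/metrics"
    },
    Transport: ts,
  }
  rp.ServeHTTP(w, r)
}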

Mara is hmm
<Mara> So in this handler:


name := path.Base(r.URL.Path)

fname := filepath.Join(*sockdir, name+".sock")
_, err := os.Stat(fname)
if os.IsNotExist(err) {
  http.NotFound(w, r)
  return
}

ts := &http.Transport{
  Dial: func(_, _ string) (net.Conn, error) {
    return net.Dial("unix", fname)
  },
  DisableKeepAlives: true,
}

Mara is hmm
<Mara> You have the socket path built from the URL path, and then you return connections to that path ignoring what the HTTP stack thinks it should point to?

Yep. Then the rest is really just boilerplate:


package main

import (
  "flag"
  "log"
  "net"
  "net/http"
  "net/http/httputil"
  "os"
  "path"
  "path/filepath"
)

var (
  hostport = flag.String("hostport", "[::]:31337", "TCP host:port to listen on")
  sockdir  = flag.String("sockdir", "./run", "directory full of unix sockets to monitor")
)

func main() {
  flag.Parse()

  log.SetFlags(0)
  log.Printf("%s -> %s", *hostport, *sockdir)

  http.DefaultServeMux.HandleFunc("/", proxyToUnixSocket)

  log.Fatal(http.ListenAndServe(*hostport, nil))
}
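
If you want to poke at this idea without deploying anything, here is a rough end-to-end sketch. It is not the Aegis source: newSocketProxy and scrapeThroughProxy are names I made up for the demo, the socket directory is passed as an argument instead of read from a flag, and it uses DialContext where the handler above uses Dial. It starts a fake service on a Unix socket in a temporary directory, puts the proxy in front of it on a loopback TCP port, and fetches the metrics the way a scraper would:

```go
package main

import (
  "context"
  "fmt"
  "io"
  "log"
  "net"
  "net/http"
  "net/http/httptest"
  "net/http/httputil"
  "os"
  "path"
  "path/filepath"
)

// newSocketProxy is a hypothetical variant of proxyToUnixSocket that takes
// the socket directory as an argument so it can be exercised in isolation.
func newSocketProxy(sockdir string) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    fname := filepath.Join(sockdir, path.Base(r.URL.Path)+".sock")
    if _, err := os.Stat(fname); os.IsNotExist(err) {
      http.NotFound(w, r)
      return
    }

    rp := httputil.ReverseProxy{
      Director: func(req *http.Request) {
        req.URL.Scheme = "http"
        req.URL.Host = "aegis" // placeholder, the dialer ignores it
        req.URL.Path = "/metrics"
        req.URL.RawPath = "/metrics"
      },
      Transport: &http.Transport{
        // Ignore the requested network and address, dial the socket instead.
        DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
          var d net.Dialer
          return d.DialContext(ctx, "unix", fname)
        },
        DisableKeepAlives: true,
      },
    }
    rp.ServeHTTP(w, r)
  })
}

// scrapeThroughProxy serves fake metrics on <tmpdir>/<name>.sock, fronts
// them with the proxy, and returns what a GET to /<name> sees.
func scrapeThroughProxy(name string) (string, error) {
  dir, err := os.MkdirTemp("", "aegis-demo")
  if err != nil {
    return "", err
  }
  defer os.RemoveAll(dir)

  ln, err := net.Listen("unix", filepath.Join(dir, name+".sock"))
  if err != nil {
    return "", err
  }
  mux := http.NewServeMux()
  mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "%s_hits 1\n", name)
  })
  backend := &http.Server{Handler: mux}
  go backend.Serve(ln) // returns once backend.Close is called
  defer backend.Close()

  front := httptest.NewServer(newSocketProxy(dir))
  defer front.Close()

  resp, err := http.Get(front.URL + "/" + name)
  if err != nil {
    return "", err
  }
  defer resp.Body.Close()

  body, err := io.ReadAll(resp.Body)
  return string(body), err
}

func main() {
  body, err := scrapeThroughProxy("demo")
  if err != nil {
    log.Fatal(err)
  }
  fmt.Print(body)
}
```

The fake backend only knows about /metrics, so getting its output back from /demo is a decent sign that both the path rewrite and the socket dialing are doing their jobs.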

Now all that's needed is to build a NixOS service out of this:


{ config, lib, pkgs, ... }:
let cfg = config.within.services.aegis;
in
with lib; {
  # Mara\ This describes all of the configuration options for Aegis.
  options.within.services.aegis = {
    enable = mkEnableOption "Activates Aegis (unix socket prometheus proxy)";

    # Mara\ This is the IPv6 host:port that the service should listen on.
    # It's IPv6 because this is $CURRENT_YEAR.
    hostport = mkOption {
      type = types.str;
      default = "[::1]:31337";
      description = "The host:port that aegis should listen for traffic on";
    };

    # Mara\ This is the folder full of unix sockets. In the previous post we
    # mentioned that the sockets should go somewhere like /tmp, however this
    # may be a poor life decision: 
    # https://lobste.rs/s/fqqsct/unix_domain_sockets_for_serving_http#c_g4ljpf
    sockdir = mkOption {
      type = types.str;
      default = "/srv/within/run";
      example = "/srv/within/run";
      description =
        "The folder that aegis will read from";
    };
  };

  # Mara\ The configuration that will arise from this module if it's enabled
  config = mkIf cfg.enable {
    # Mara\ Aegis has its own user account to keep things tidy. It doesn't need
    # root to run so we don't give it root.
    users.users.aegis = {
      createHome = true;
      description = "tulpa.dev/cadey/aegis";
      isSystemUser = true;
      group = "within";
      home = "/srv/within/aegis";
    };

    # Mara\ The systemd service that actually runs Aegis.
    systemd.services.aegis = {
      wantedBy = [ "multi-user.target" ];

      # Mara\ These correlate to the [Service] block in the systemd unit.
      serviceConfig = {
        User = "aegis";
        Group = "within";
        Restart = "on-failure";
        WorkingDirectory = "/srv/within/aegis";
        RestartSec = "30s";
      };

      # Mara\ When the service starts up, run this script.
      script = let aegis = pkgs.tulpa.dev.cadey.aegis;
      in ''
        exec ${aegis}/bin/aegis -sockdir="${cfg.sockdir}" -hostport="${cfg.hostport}"
      '';
    };
  };
}

Cadey is enby
<Cadey> Then I just flicked it on for a server of mine:


within.services.aegis = {
  enable = true;
  hostport = "[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705";
  sockdir = "/srv/within/run";
};

Cadey is enby
<Cadey> And then tested it with curl:


$ curl http://[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705/printerfacts
# HELP printerfacts_hits Number of hits to various pages
# TYPE printerfacts_hits counter
printerfacts_hits{page="fact"} 15
printerfacts_hits{page="index"} 23
printerfacts_hits{page="not_found"} 17
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.06
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5296128
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1617458164.36
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 911777792

Cadey is aha
<Cadey> And there you go! Now we can make Prometheus point to this and we can save Christmas!

Mara is happy
<Mara> :D


This is another experiment in writing these kinds of posts using more of a Socratic method. I'm trying to strike a balance with a limited pool of stickers while I wait for more stickers/emoji to come in. Feedback is always welcome.

(1): These metrics are not perfect because of the level of caching that Cloudflare does for me.


This article was posted on April 5, 2021. Facts and circumstances may have changed since publication. Please contact me before jumping to conclusions if something seems wrong or unclear.

Tags: prometheus o11y


The art for Mara was drawn by Selicre.

The art for Cadey was drawn by ArtZora Studios.