Docker Init

Syazwan

tl;dr

docker run --init

What is it?

In Unix-based computer operating systems, init is the first process started during booting of the computer system. Init is a daemon process that continues running until the system is shut down.

Init is typically assigned process identifier 1.

—Wikipedia

I don't care

  • Zombies use resources. If the kernel process table fills up, no new processes can be created
  • "I only run 1 service per container!" But are you sure the service doesn't spawn processes?
  • How many have written custom init scripts?
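
To make the zombie point concrete, here is a minimal sketch (not taken from the talk) that forks a child, lets it exit without reaping it, and checks whether the kernel still tracks its PID. The helper names `process_table_has` and `zombie_demo` are made up for illustration, and the 0.2 s sleep is an assumption that the child has exited by then.

```python
import os
import time


def process_table_has(pid):
    """Return True while the kernel still keeps an entry for pid."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the entry exists but belongs to another user
    return True


def zombie_demo():
    """Fork a child that exits at once; report table state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately, skipping cleanup handlers
    time.sleep(0.2)  # assumption: enough time for the child to exit
    before = process_table_has(pid)  # dead but not reaped: the entry is still there
    _, status = os.waitpid(pid, 0)  # reaping releases the entry
    after = process_table_has(pid)
    return before, after, os.WEXITSTATUS(status)
```

Calling `zombie_demo()` returns a tuple of (entry present before reaping, entry present after reaping, child exit code). A PID 1 that never calls wait leaves every adopted orphan in the "before" state.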

When to use?

  • Proper signal handling from host machine (SIGTERM spam anyone?)
  • Long-running service that potentially spawns many children (zombies)
  • More than 1 service (!) (restarts)
  • Exit code forwarding
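
The three core jobs in the list above (signal forwarding, reaping, exit code forwarding) can be sketched in a few lines. This is an illustrative sketch only, not how tini or dumb-init are implemented; `run_as_init`, `exit_code_from_status` and the `FORWARDED` tuple are hypothetical names, and the 128 + N convention for signal deaths is borrowed from common shell behaviour.

```python
import os
import signal

# Signals passed on to the child; this selection is an assumption of the sketch.
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT)


def exit_code_from_status(status):
    """Map a wait status to a shell-style exit code (128 + N for signal N)."""
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def run_as_init(argv):
    """Run argv as a child, forward signals, reap every child, return its code."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # command not found or not executable

    def forward(signo, _frame):
        try:
            os.kill(child, signo)
        except ProcessLookupError:
            pass  # child already gone

    previous = {s: signal.signal(s, forward) for s in FORWARDED}
    try:
        while True:
            # wait() returns for ANY child, including orphans adopted by PID 1
            pid, status = os.wait()
            if pid == child:
                return exit_code_from_status(status)
    finally:
        # Put the old handlers back so the caller is left unchanged
        for s, handler in previous.items():
            signal.signal(s, signal.SIG_DFL if handler is None else handler)
```

A wrapper script would end with something like `sys.exit(run_as_init(sys.argv[1:]))`. The sketch ignores several edge cases (signals arriving between fork and handler installation, process groups, TTY handling), which is exactly why the ready-made inits exist.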

But, bash can adopt orphans!!!

  • Gotcha: single cmd will be exec'd
    $ docker run --rm ubuntu:xenial bash -c 'ps'
      PID TTY          TIME CMD
        1 pts/0    00:00:00 ps
    $ docker run --rm ubuntu:xenial bash -c 'echo; ps'
      PID TTY          TIME CMD
        1 ?        00:00:00 bash
        8 ?        00:00:00 ps
  • No signal forwarding
  • Unreliable exit code (e.g. for restart policies)

Roll your own (and miss plenty of edge cases)

              
                #!/bin/bash

                # Exit on error
                set -e
                # Let functions inherit ERR traps which otherwise they won't
                # Equivalent to -o errtrace
                set -E

                cleanup() {
                ...
                }
                trap cleanup EXIT
                trap 'exit 1' HUP INT QUIT TERM ERR
              
            

phusion wrote 400+ lines of init script:
baseimage-docker/image/bin/my_init

              
#!/usr/bin/python3 -u
# -*- coding: utf-8 -*-

import argparse
import errno
import json
import os
import os.path
import re
import signal
import stat
import sys
import time

ENV_INIT_DIRECTORY = os.environ.get('ENV_INIT_DIRECTORY', '/etc/my_init.d')

KILL_PROCESS_TIMEOUT = int(os.environ.get('KILL_PROCESS_TIMEOUT', 30))
KILL_ALL_PROCESSES_TIMEOUT = int(os.environ.get('KILL_ALL_PROCESSES_TIMEOUT', 30))

LOG_LEVEL_ERROR = 1
LOG_LEVEL_WARN = 1
LOG_LEVEL_INFO = 2
LOG_LEVEL_DEBUG = 3

SHENV_NAME_WHITELIST_REGEX = re.compile('\W')

log_level = None

terminated_child_processes = {}

_find_unsafe = re.compile(r'[^\w@%+=:,./-]').search


class AlarmException(Exception):
    pass


def error(message):
    if log_level >= LOG_LEVEL_ERROR:
        sys.stderr.write("*** %s\n" % message)


def warn(message):
    if log_level >= LOG_LEVEL_WARN:
        sys.stderr.write("*** %s\n" % message)


def info(message):
    if log_level >= LOG_LEVEL_INFO:
        sys.stderr.write("*** %s\n" % message)


def debug(message):
    if log_level >= LOG_LEVEL_DEBUG:
        sys.stderr.write("*** %s\n" % message)


def ignore_signals_and_raise_keyboard_interrupt(signame):
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    raise KeyboardInterrupt(signame)


def raise_alarm_exception():
    raise AlarmException('Alarm')


def listdir(path):
    try:
        result = os.stat(path)
    except OSError:
        return []
    if stat.S_ISDIR(result.st_mode):
        return sorted(os.listdir(path))
    else:
        return []


def is_exe(path):
    try:
        return os.path.isfile(path) and os.access(path, os.X_OK)
    except OSError:
        return False


def import_envvars(clear_existing_environment=True, override_existing_environment=True):
    if not os.path.exists("/etc/container_environment"):
        return
    new_env = {}
    for envfile in listdir("/etc/container_environment"):
        name = os.path.basename(envfile)
        with open("/etc/container_environment/" + envfile, "r") as f:
            # Text files often end with a trailing newline, which we
            # don't want to include in the env variable value. See
            # https://github.com/phusion/baseimage-docker/pull/49
            value = re.sub('\n\Z', '', f.read())
        new_env[name] = value
    if clear_existing_environment:
        os.environ.clear()
    for name, value in new_env.items():
        if override_existing_environment or name not in os.environ:
            os.environ[name] = value


def export_envvars(to_dir=True):
    if not os.path.exists("/etc/container_environment"):
        return
    shell_dump = ""
    for name, value in os.environ.items():
        if name in ['HOME', 'USER', 'GROUP', 'UID', 'GID', 'SHELL']:
            continue
        if to_dir:
            with open("/etc/container_environment/" + name, "w") as f:
                f.write(value)
        shell_dump += "export " + sanitize_shenvname(name) + "=" + shquote(value) + "\n"
    with open("/etc/container_environment.sh", "w") as f:
        f.write(shell_dump)
    with open("/etc/container_environment.json", "w") as f:
        f.write(json.dumps(dict(os.environ)))


def shquote(s):
    """Return a shell-escaped version of the string *s*."""
    if not s:
        return "''"
    if _find_unsafe(s) is None:
        return s

    # use single quotes, and put single quotes into double quotes
    # the string $'b is then quoted as '$'"'"'b'
    return "'" + s.replace("'", "'\"'\"'") + "'"


def sanitize_shenvname(s):
    """Return string with [0-9a-zA-Z_] characters"""
    return re.sub(SHENV_NAME_WHITELIST_REGEX, "_", s)


# Waits for the child process with the given PID, while at the same time
# reaping any other child processes that have exited (e.g. adopted child
# processes that have terminated).

def waitpid_reap_other_children(pid):
    global terminated_child_processes

    status = terminated_child_processes.get(pid)
    if status:
        # A previous call to waitpid_reap_other_children(),
        # with an argument not equal to the current argument,
        # already waited for this process. Return the status
        # that was obtained back then.
        del terminated_child_processes[pid]
        return status

    done = False
    status = None
    while not done:
        try:
            # https://github.com/phusion/baseimage-docker/issues/151#issuecomment-92660569
            this_pid, status = os.waitpid(pid, os.WNOHANG)
            if this_pid == 0:
                this_pid, status = os.waitpid(-1, 0)
            if this_pid == pid:
                done = True
            else:
                # Save status for later.
                terminated_child_processes[this_pid] = status
        except OSError as e:
            if e.errno == errno.ECHILD or e.errno == errno.ESRCH:
                return None
            else:
                raise
    return status


def stop_child_process(name, pid, signo=signal.SIGTERM, time_limit=KILL_PROCESS_TIMEOUT):
    info("Shutting down %s (PID %d)..." % (name, pid))
    try:
        os.kill(pid, signo)
    except OSError:
        pass
    signal.alarm(time_limit)
    try:
        try:
            waitpid_reap_other_children(pid)
        except OSError:
            pass
    except AlarmException:
        warn("%s (PID %d) did not shut down in time. Forcing it to exit." % (name, pid))
        try:
            os.kill(pid, signal.SIGKILL)
        except OSError:
            pass
        ...


...


def run_command_killable_and_import_envvars(*argv):
    run_command_killable(*argv)
    import_envvars()
    export_envvars(False)


...
        filename = os.path.join(ENV_INIT_DIRECTORY, name)
        if is_exe(filename):
            info("Running %s..." % filename)
            run_command_killable_and_import_envvars(filename)

    # Run /etc/rc.local.
    if is_exe("/etc/rc.local"):
        ...


def start_runit():
    info("Booting runit daemon...")
    pid = os.spawnl(os.P_NOWAIT, "/usr/bin/runsvdir", "/usr/bin/runsvdir",
                    "-P", "/etc/service")
    info("Runit started as PID %d" % pid)
    return pid


def wait_for_runit_or_interrupt(pid):
    status = waitpid_reap_other_children(pid)
    return (True, status)


def shutdown_runit_services(quiet=False):
    if not quiet:
        debug("Begin shutting down runit services...")
    os.system("/usr/bin/sv -w %d force-stop /etc/service/* > /dev/null" % KILL_PROCESS_TIMEOUT)


def wait_for_runit_services():
    debug("Waiting for runit services to exit...")
    done = False
    while not done:
        ...


...
            info("Runit exited with status %d" % exit_status)
        else:
            ...
            shutdown_runit_services()
            if not runit_exited:
                stop_child_process("runit daemon", runit_pid)
            wait_for_runit_services()
            run_post_shutdown_scripts()

# Parse options.
parser = argparse.ArgumentParser(description='Initialize the system.')
parser.add_argument('main_command', metavar='MAIN_COMMAND', type=str, nargs='*',
                    ...)

...
    exit(2)
finally:
    if args.kill_all_on_exit:
        kill_all_processes(KILL_ALL_PROCESSES_TIMEOUT)
              
            

Just Stop.

Init options

  • supervisord
  • monit
  • runit
  • tini
  • dumb-init
  • s6

tini

https://github.com/krallin/tini

- Docker engine was using grimes until late 2016

- Docker CE with tini officially released early 2017 as part of v1.13 (but not running by default!)
See PRs #26061 & #28037

- Images without tini will work with tini without any code change. Zero config. So use it.

tini included

docker/engine/blob/master/Dockerfile#L157


              FROM base AS tini
              RUN apt-get update && apt-get install -y cmake vim-common
              COPY hack/dockerfile/install/install.sh ./install.sh
              ENV INSTALL_BINARY_NAME=tini
              COPY hack/dockerfile/install/$INSTALL_BINARY_NAME.installer ./
              RUN PREFIX=/build ./install.sh $INSTALL_BINARY_NAME
              ...
              COPY --from=tini /build/ /usr/local/bin/
            
              
              ~ docker run --rm -it alpine sh
              / # ps
              PID   USER     TIME   COMMAND
                  1 root       0:00 sh
                  8 root       0:00 ps
              
              ~ docker run --rm -it --init alpine sh
              / # ps
              PID   USER     TIME   COMMAND
                  1 root       0:00 /dev/init -- sh
                  8 root       0:00 sh
                  9 root       0:00 ps
              / #
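
A quick way to check from inside a container whether PID 1 is an init is to read /proc/1/cmdline, which is what the `ps` output above reflects. The following is a heuristic sketch: the `KNOWN_INITS` set is an assumption (basenames of the tools named in this deck plus the /dev/init path shown above), and the function names are hypothetical.

```python
import os

# Basenames treated as "an init" here; this list is an assumption of the sketch.
KNOWN_INITS = {"init", "docker-init", "tini", "dumb-init", "s6-svscan"}


def argv_from_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into a list."""
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def looks_like_init(argv):
    """Heuristic: does argv[0] name one of the known init programs?"""
    return bool(argv) and os.path.basename(argv[0]) in KNOWN_INITS


def pid1_argv(proc_root="/proc"):
    """Read the command line of PID 1 (Linux only)."""
    with open(os.path.join(proc_root, "1", "cmdline"), "rb") as f:
        return argv_from_cmdline(f.read())
```

On Linux, `looks_like_init(pid1_argv())` could be logged at service startup as a warning when it is false.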
              
              
            

init in Compose requires file format 2.2 (or 3.7 in the v3 series)

              
              version: '2.2'
              services:
                web-1:
                  image: alpine:latest
                  init: true

                web-2:
                  image: alpine:latest
                  init: /usr/libexec/docker-init
              
              
            

Make it default in case users forget to add init config

              
              ENV TINI_VERSION=v0.18.0
              RUN set -x \
                  && curl -fSL "https://github.com/krallin/tini/releases/download/$TINI_VERSION/tini" -o /usr/local/bin/tini \
                  && curl -fSL "https://github.com/krallin/tini/releases/download/$TINI_VERSION/tini.asc" -o /usr/local/bin/tini.asc \
                  && export GNUPGHOME="$(mktemp -d)" \
                  && gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys 6380DC428747F6C393FEACA59A84159D7001A4E5 \
                  && gpg --batch --verify /usr/local/bin/tini.asc /usr/local/bin/tini \
                  && rm -r "$GNUPGHOME" /usr/local/bin/tini.asc \
                  && chmod +x /usr/local/bin/tini \
                  && tini -h

              ENTRYPOINT ["/usr/local/bin/tini", "--"]
              
            

dumb-init

https://github.com/Yelp/dumb-init

              
                # Installing
                # via apt
                apt install dumb-init

                # via pip
                pip install dumb-init

                # download binary during build
                RUN wget -O /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.2/dumb-init_1.2.2_amd64
                RUN chmod +x /usr/local/bin/dumb-init

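                # Using (the path matches the binary downloaded above;
                # the apt package installs to /usr/bin/dumb-init instead)
                ENTRYPOINT ["/usr/local/bin/dumb-init", "--"]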
              
            

tini or dumb-init?

  • compiled against glibc or musl, both < 1MB
  • need signal rewrite? dumb-init
  • need subreaping? tini
  • not sure? Any of them will do!
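
"Signal rewrite" means the init translates one signal into another before passing it to the child, for example when the orchestrator sends SIGTERM but the service only shuts down gracefully on a different signal. Below is a minimal sketch of the idea, not dumb-init's actual code; `rewrite_signal`, `forward_signal` and `demo` are hypothetical names, and the SIGTERM to SIGUSR1 mapping is chosen only because it is easy to observe.

```python
import os
import signal
import subprocess
import sys


def rewrite_signal(signo, rewrites):
    """Return the signal to deliver for signo; unmapped signals pass through."""
    return rewrites.get(signo, signo)


def forward_signal(pid, signo, rewrites):
    """Forward signo to pid after rewriting; a target of 0 drops the signal."""
    target = rewrite_signal(signo, rewrites)
    if target == 0:
        return None
    os.kill(pid, target)
    return target


def demo():
    """Send SIGTERM through a TERM -> USR1 rewrite and report how the child died."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    forward_signal(child.pid, signal.SIGTERM, {signal.SIGTERM: signal.SIGUSR1})
    return child.wait()  # negative value: terminated by that signal number
```

In `demo()` the child never sees SIGTERM; its return code reflects the rewritten signal instead.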

s6-overlay

https://github.com/just-containers/s6-overlay

Stages

  1. Startup of s6
  2. User's files:
    • Fix ownership & perms: /etc/fix-attrs.d
    • Exec init scripts: /etc/cont-init.d
    • Services: /etc/services.d
  3. Shutdown & cleanup: /etc/cont-finish.d
    (with SIGTERM followed by SIGKILL after grace period)
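
The "SIGTERM followed by SIGKILL after grace period" step is the same pattern `stop_child_process` uses in the phusion script. A minimal sketch with the standard library follows; `stop_with_grace`, `demo` and the `STUBBORN` child are hypothetical names, and the grace period value is arbitrary.

```python
import subprocess
import sys

# Child that ignores SIGTERM, announces readiness, then sleeps.
STUBBORN = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_grace(proc, grace_seconds):
    """SIGTERM first; SIGKILL if the process is still alive after the grace period."""
    proc.terminate()  # sends SIGTERM
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL, which cannot be caught or ignored
        return proc.wait()


def demo(grace_seconds=0.5):
    """Stop a child that ignores SIGTERM and return its (negative) return code."""
    proc = subprocess.Popen([sys.executable, "-c", STUBBORN], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has installed its handler
    code = stop_with_grace(proc, grace_seconds)
    proc.stdout.close()
    return code
```

A well-behaved service exits within the grace period and reports SIGTERM; the stubborn one in `demo()` is only stopped by the follow-up SIGKILL.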

About 3MB uncompressed

S6

              
                FROM alpine:3.8
                ADD https://github.com/just-containers/s6-overlay/releases/download/v1.21.8.0/s6-overlay-amd64.tar.gz /tmp/
                RUN tar zxvf /tmp/s6-overlay-amd64.tar.gz -C /

                ENTRYPOINT ["/init"]
              
              
                $ docker build -t s6base .
              
            
              
                $ docker run --rm -it s6base sh
              
              
                [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
                [s6-init] ensuring user provided files have correct perms...exited 0.
                [fix-attrs.d] applying ownership & permissions fixes...
                [fix-attrs.d] done.
                [cont-init.d] executing container initialization scripts...
                [cont-init.d] done.
                [services.d] starting services
                [services.d] done.
                / # ^C
                / # [cmd] sh exited 130
                [cont-finish.d] executing container finish scripts...
                [cont-finish.d] done.
                [s6-finish] waiting for services.
                [s6-finish] syncing disks.
                [s6-finish] sending all processes the TERM signal.
                [s6-finish] sending all processes the KILL signal and exiting.
              
            

Example use-case

              
                docker/etc/
                ├── cont-init.d
                │   ├── 001-apache-modules
                │   └── 001-disable-opcache-for-dev
                ├── cont-finish.d
                │   └── cleanup
                ├── fix-attrs.d
                │   └── www
                └── services.d
                    └── php-fpm
                        └── run
              
            

cont-init.d/001-apache-modules

              
                #!/usr/bin/with-contenv sh

                a2enmod rewrite proxy proxy_fcgi proxy_http setenvif > /dev/null
                a2enconf php7.2-fpm > /dev/null
              
            

cont-init.d/001-disable-opcache-for-dev

              
                #!/usr/bin/with-contenv sh

                if [ "$MY_ENV" = "development" ]; then
                  sed -i /etc/php/7.2/fpm/php.ini -e 's/opcache.enable=1/opcache.enable=0/'
                fi
              
            

cont-finish.d/cleanup

              
                #!/usr/bin/with-contenv sh

                echo "Cleaning up..."
              
            

fix-attrs.d/www

              
                /path recurse account file_perm dir_perm
              
              
                /var/www/html/ true www-data,1001:1001 0644 0755
              
            

Recursively set ownership to www-data, falling back to 1001:1001 if that account does not exist. Files get mode 0644, directories 0755.
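
To make the five fields explicit, here is a small parser for one such line. It is a sketch based only on the format shown above (path, recurse, account with optional `,uid:gid` fallback, file mode, directory mode); `FixAttr` and `parse_fix_attr` are hypothetical names, and paths containing spaces are not handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FixAttr:
    path: str
    recurse: bool
    account: str
    fallback: Optional[Tuple[int, int]]  # (uid, gid) used if the account is unknown
    file_mode: int
    dir_mode: int


def parse_fix_attr(line):
    """Parse one 'path recurse account file_perm dir_perm' line into a FixAttr."""
    path, recurse, account, fmode, dmode = line.split()
    fallback = None
    if "," in account:
        account, ids = account.split(",", 1)
        uid, gid = ids.split(":")
        fallback = (int(uid), int(gid))
    # Modes are written in octal, so parse them with base 8
    return FixAttr(path, recurse == "true", account, fallback,
                   int(fmode, 8), int(dmode, 8))
```

For the line above, `parse_fix_attr` yields account `www-data` with fallback `(1001, 1001)` and modes `0o644` / `0o755`.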

services.d/php-fpm/run

              
                #!/usr/bin/execlineb -P
                php-fpm7.2 -F
              
            

Starts PHP-FPM in the foreground. The container environment is not passed (no with-contenv).

services.d/php-fpm/finish

              
                #!/usr/bin/execlineb -S1
                if { s6-test ${1} -ne 0 }
                if { s6-test ${1} -ne 256 }

                s6-svscanctl -t /var/run/s6/services
              
            

Terminate all services in /var/run/s6/services/
if php-fpm exits with a non-zero code (256 means it was killed by a signal and is skipped)

Other features

Drop privileges

              
                #!/usr/bin/execlineb -P
                s6-setuidgid www-data
                nginx -g "daemon off;"
              
            

À la gosu, su-exec, et al.
(check out other s6-* bins)

Container env

              
                #!/usr/bin/with-contenv sh
              
            

Use with-contenv helper or set S6_KEEP_ENV=1

Don't change permission

              
                S6_READ_ONLY_ROOT=1
              
            

Without this, S6 will
chown -R root:root /etc/{cont-init.d,cont-finish.d,fix-attrs.d,services.d}/
With this, S6 will copy those folders to /var/run/s6/etc and leave your mounted init scripts untouched

execline

  • An execline script is a single argv, made of a chain of programs designed to perform their action then exec() into the next one
  • execlineb: launcher that parses text file, converts it to argv then executes into that argv
  • simpler than sh, which in turn makes it more portable and faster
  • :( size limit: the whole script must fit in a single argv
  • :( no signal handling
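
The chain-loading idea (do one small thing, then exec() into the rest of the command line) is not specific to execline. A minimal sketch of a single link of such a chain follows; `setenv_then_exec` and `run_chain` are hypothetical names, and the fork in `run_chain` exists only so the calling process survives the exec.

```python
import os
import sys


def setenv_then_exec(name, value, argv):
    """Set one environment variable, then replace this process with argv."""
    os.environ[name] = value
    os.execvp(argv[0], argv)  # does not return on success


def run_chain(name, value, argv):
    """Fork so the caller survives; the child chain-loads into argv."""
    pid = os.fork()
    if pid == 0:
        try:
            setenv_then_exec(name, value, argv)
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Because each link replaces itself with the next program, no shell stays resident in between, which is also why such a chain has no place to put signal handlers.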

...and lots more

S6_LOGGING
S6_BEHAVIOUR_IF_STAGE2_FAILS
...

Thanks