One of the features of Fedora 17 is the /usr merge, put forward by Harald Hoyer and Kay Sievers[1]. In the time since this feature has been proposed repetitive discussions took place all over the various Free Software communities, and usually the same questions were asked: what the reasons behind this feature were, and whether it makes sense to adopt the same scheme for distribution XYZ, too.
Especially in the Non-Fedora world it appears to be socially unacceptable to actually have a look at the Fedora feature page (where many of the questions are already brought up and answered) which is very unfortunate. To improve the situation I spent some time today to summarize the reasons for the /usr merge independently. I'd hence like to direct you to this new page I put up which tries to summarize the reasons for this, with an emphasis on the compatibility point of view:
Note that even though this page is in the systemd wiki, what it covers is mostly orthogonal to systemd. systemd supports both systems with a merged /usr and with a split /usr, and the /usr merge should be interesting for non-systemd distributions as well.
Primarily I put this together to have a nice place to point all those folks who continue to write me annoyed emails, even though I am actually not even working on all of this...
Enjoy the read!
Footnotes:
[1] And not actually by me, I am just a supportive spectator and am not doing any work on it. Unfortunately some tech press folks created the false impression I was behind this. But credit where credit is due, this is all Harald's and Kay's work.
posted at: 22:29 | path: /projects | permanent link to this entry | 16 comments
Last October we published a wishlist for plumbing related features we'd like to see added to the Linux kernel. Three months later it's time to publish a short update, and explain what has been implemented in the kernel, what people have started working on, and what's still missing.
The full, updated list is available on Google Docs.
In general, I must say that the list turned out to be a great success. It shows how awesome the Open Source community is: Just ask nicely and there's a good chance they'll fulfill your wishes! Thank you very much, Linux community!
We'd like to thank everybody who worked on any of the features on that list: Lucas De Marchi, Andi Kleen, Dan Ballard, Li Zefan, Kirill A. Shutemov, Davidlohr Bueso, Cong Wang, Lennart Poettering, Kay Sievers.
Of the items on the list 5 have been fully implemented and are already part of a released kernel, or already merged for inclusion for the next kernels being released.
For 4 further items patches have been posted, and I am hoping they'll get merged eventually. Davidlohr, Wang, Zefan, Kirill, it would be great if you'd continue working on your patches, as we think they are following the right approach[1] even if there was some opposition to them on LKML. So, please keep pushing to solve the outstanding issues and thanks for your work so far!
Footnotes
[1] Yes, I still believe that tmpfs quota should be implemented via resource limits, as everything else wouldn't work, as we don't want to implement complex and fragile userspace infrastructure to racily upload complex quota data for all current and future UIDs ever used on the system into each tmpfs mount point at mount time.
posted at: 21:26 | path: /projects | permanent link to this entry | 5 comments
Here's the twelfth installment of my ongoing series on systemd for Administrators:
One of the core features of Unix systems is the idea of privilege separation between the different components of the OS. Many system services run under their own user IDs thus limiting what they can do, and hence the impact they may have on the OS in case they get exploited.
This kind of privilege separation only provides very basic protection however, since in general system services run this way can still do at least as much as a normal local users, though not as much as root. For security purposes it is however very interesting to limit even further what services can do, and shut them off a couple of things that normal users are allowed to do.
A great way to limit the impact of services is by employing MAC technologies such as SELinux. If you are interested to secure down your server, running SELinux is a very good idea. systemd enables developers and administrators to apply additional restrictions to local services independently of a MAC. Thus, regardless whether you are able to make use of SELinux you may still enforce certain security limits on your services.
In this iteration of the series we want to focus on a couple of these security features of systemd and how to make use of them in your services. These features take advantage of a couple of Linux-specific technologies that have been available in the kernel for a long time, but never have been exposed in a widely usable fashion. These systemd features have been designed to be as easy to use as possible, in order to make them attractive to administrators and upstream developers:
All options described here are documented in systemd's man pages, notably systemd.exec(5). Please consult these man pages for further details.
All these options are available on all systemd systems, regardless if SELinux or any other MAC is enabled, or not.
All these options are relatively cheap, so if in doubt use them. Even if you might think that your service doesn't write to /tmp and hence enabling PrivateTmp=yes (as described below) might not be necessary, due to today's complex software it's still beneficial to enable this feature, simply because libraries you link to (and plug-ins to those libraries) which you do not control might need temporary files after all. Example: you never know what kind of NSS module your local installation has enabled, and what that NSS module does with /tmp.
These options are hopefully interesting both for administrators to secure their local systems, and for upstream developers to ship their services secure by default. We strongly encourage upstream developers to consider using these options by default in their upstream service units. They are very easy to make use of and have major benefits for security.
A very simple but powerful configuration option you may use in systemd service definitions is PrivateNetwork=:
... [Service] ExecStart=... PrivateNetwork=yes ...
With this simple switch a service and all the processes it consists of are entirely disconnected from any kind of networking. Network interfaces became unavailable to the processes, the only one they'll see is the loopback device "lo", but it is isolated from the real host loopback. This is a very powerful protection from network attacks.
Caveat: Some services require the network to be operational. Of course, nobody would consider using PrivateNetwork=yes on a network-facing service such as Apache. However even for non-network-facing services network support might be necessary and not always obvious. Example: if the local system is configured for an LDAP-based user database doing glibc name lookups with calls such as getpwnam() might end up resulting in network access. That said, even in those cases it is more often than not OK to use PrivateNetwork=yes since user IDs of system service users are required to be resolvable even without any network around. That means as long as the only user IDs your service needs to resolve are below the magic 1000 boundary using PrivateNetwork=yes should be OK.
Internally, this feature makes use of network namespaces of the kernel. If enabled a new network namespace is opened and only the loopback device configured in it.
Another very simple but powerful configuration switch is PrivateTmp=:
... [Service] ExecStart=... PrivateTmp=yes ...
If enabled this option will ensure that the /tmp directory the service will see is private and isolated from the host system's /tmp. /tmp traditionally has been a shared space for all local services and users. Over the years it has been a major source of security problems for a multitude of services. Symlink attacks and DoS vulnerabilities due to guessable /tmp temporary files are common. By isolating the service's /tmp from the rest of the host, such vulnerabilities become moot.
For Fedora 17 a feature has been accepted in order to enable this option across a large number of services.
Caveat: Some services actually misuse /tmp as a location for IPC sockets and other communication primitives, even though this is almost always a vulnerability (simply because if you use it for communication you need guessable names, and guessable names make your code vulnerable to DoS and symlink attacks) and /run is the much safer replacement for this, simply because it is not a location writable to unprivileged processes. For example, X11 places it's communication sockets below /tmp (which is actually secure -- though still not ideal -- in this exception since it does so in a safe subdirectory which is created at early boot.) Services which need to communicate via such communication primitives in /tmp are no candidates for PrivateTmp=. Thankfully these days only very few services misusing /tmp like this remain.
Internally, this feature makes use of file system namespaces of the kernel. If enabled a new file system namespace is opened inheritng most of the host hierarchy with the exception of /tmp.
With the ReadOnlyDirectories= and InaccessibleDirectories= options it is possible to make the specified directories inaccessible for writing resp. both reading and writing to the service:
... [Service] ExecStart=... InaccessibleDirectories=/home ReadOnlyDirectories=/var ...
With these two configuration lines the whole tree below /home becomes inaccessible to the service (i.e. the directory will appear empty and with 000 access mode), and the tree below /var becomes read-only.
Caveat: Note that ReadOnlyDirectories= currently is not recursively applied to submounts of the specified directories (i.e. mounts below /var in the example above stay writable). This is likely to get fixed soon.
Internally, this is also implemented based on file system namspaces.
Another very powerful security option in systemd is CapabilityBoundingSet= which allows to limit in a relatively fine grained fashion which kernel capabilities a service started retains:
... [Service] ExecStart=... CapabilityBoundingSet=CAP_CHOWN CAP_KILL ...
In the example above only the CAP_CHOWN and CAP_KILL capabilities are retained by the service, and the service and any processes it might create have no chance to ever acquire any other capabilities again, not even via setuid binaries. The list of currently defined capabilities is available in capabilities(7). Unfortunately some of the defined capabilities are overly generic (such as CAP_SYS_ADMIN), however they are still a very useful tool, in particular for services that otherwise run with full root privileges.
To identify precisely which capabilities are necessary for a service to run cleanly is not always easy and requires a bit of testing. To simplify this process a bit, it is possible to blacklist certain capabilities that are definitely not needed instead of whitelisting all that might be needed. Example: the CAP_SYS_PTRACE is a particularly powerful and security relevant capability needed for the implementation of debuggers, since it allows introspecting and manipulating any local process on the system. A service like Apache obviously has no business in being a debugger for other processes, hence it is safe to remove the capability from it:
... [Service] ExecStart=... CapabilityBoundingSet=~CAP_SYS_PTRACE ...
The ~ character the value assignment here is prefixed with inverts the meaning of the option: instead of listing all capabalities the service will retain you may list the ones it will not retain.
Caveat: Some services might react confused if certain capabilities are made unavailable to them. Thus when determining the right set of capabilities to keep around you need to do this carefully, and it might be a good idea to talk to the upstream maintainers since they should know best which operations a service might need to run successfully.
Caveat 2: Capabilities are not a magic wand. You probably want to combine them and use them in conjunction with other security options in order to make them truly useful.
To easily check which processes on your system retain which capabilities use the pscap tool from the libcap-ng-utils package.
Making use of systemd's CapabilityBoundingSet= option is often a simple, discoverable and cheap replacement for patching all system daemons individually to control the capability bounding set on their own.
Resource Limits may be used to apply certain security limits on services being run. Primarily, resource limits are useful for resource control (as the name suggests...) not so much access control. However, two of them can be useful to disable certain OS features: RLIMIT_NPROC and RLIMIT_FSIZE may be used to disable forking and disable writing of any files with a size > 0:
... [Service] ExecStart=... LimitNPROC=1 LimitFSIZE=0 ...
Note that this will work only if the service in question drops privileges and runs under a (non-root) user ID of its own or drops the CAP_SYS_RESOURCE capability, for example via CapabilityBoundingSet= as discussed above. Without that a process could simply increase the resource limit again thus voiding any effect.
Caveat: LimitFSIZE= is pretty brutal. If the service attempts to write a file with a size > 0, it will immeidately be killed with the SIGXFSZ which unless caught terminates the process. Also, creating files with size 0 is still allowed, even if this option is used.
For more information on these and other resource limits, see setrlimit(2).
Devices nodes are an important interface to the kernel and its drivers. Since drivers tend to get much less testing and security checking than the core kernel they often are a major entry point for security hacks. systemd allows you to control access to devices individually for each service:
... [Service] ExecStart=... DeviceAllow=/dev/null rw ...
This will limit access to /dev/null and only this device node, disallowing access to any other device nodes.
The feature is implemented on top of the devices cgroup controller.
Besides the easy to use options above there are a number of other security relevant options available. However they usually require a bit of preparation in the service itself and hence are probably primarily useful for upstream developers. These options are RootDirectory= (to set up chroot() environments for a service) as well as User= and Group= to drop privileges to the specified user and group. These options are particularly useful to greatly simplify writing daemons, where all the complexities of securely dropping privileges can be left to systemd, and kept out of the daemons themselves.
If you are wondering why these options are not enabled by default: some of them simply break seamntics of traditional Unix, and to maintain compatibility we cannot enable them by default. e.g. since traditional Unix enforced that /tmp was a shared namespace, and processes could use it for IPC we cannot just go and turn that off globally, just because /tmp's role in IPC is now replaced by /run.
And that's it for now. If you are working on unit files for upstream or in your distribution, please consider using one or more of the options listed above. If you service is secure by default by taking advantage of these options this will help not only your users but also make the Internet a safer place.
posted at: 02:26 | path: /projects | permanent link to this entry | 6 comments
Arun put an awesome article up, detailing how PulseAudio compares to Android's AudioFlinger in terms of power consumption and suchlike. Suffice to say, PulseAudio rocks, but go and read the whole thing, it's worth it.
Apparently, AudioFlinger is a great choice if you want to shorten your battery life.
posted at: 16:31 | path: /projects | permanent link to this entry | 2 comments
In the past weeks we have been working on a major new addition to systemd that will hopefully positively change the Linux ecosystem in a number of ways. But see for yourself, check out the full explanation on what we have implemented on the design document we put up on Google Docs.
posted at: 16:28 | path: /projects | permanent link to this entry | 26 comments
At LinuxCon Europe/ELCE I had the chance to moderate the kernel hackers panel with Linus Torvalds, Alan Cox, Paul McKenney and Thomas Gleixner on stage. I like to believe it went quite well, but check it out for yourself, as a video recording is now available online:
For me personally I think the most notable topic covered was Control Groups, and the clarification that they are something that is needed even though their implementation right now is in many ways less than perfect. But in the end there is no reasonable way around it, and much like SMP, technology that complicates things substantially but is ultimately unavoidable.
Other videos from ELCE are online now, too.
posted at: 16:53 | path: /projects | permanent link to this entry | 3 comments
At the Kernel Summit in Prague last week Kay Sievers and I lead a session on developing shared userspace libraries, for kernel hackers. More and more userspace interfaces of the kernel (for example many which deal with storage, audio, resource management, security, file systems or a number of other subsystems) nowadays rely on a dedicated userspace component. As people who work primarily in the plumbing layer of the Linux OS we noticed over and over again that these libraries written by people who usually are at home on the kernel side of things make the same mistakes repeatedly, thus making life for the users of the libraries unnecessarily difficult. In our session we tried to point out a number of these things, and in particular places where the usual kernel hacking style translates badly into userspace shared library hacking. Our hope is that maybe a few kernel developers have a look at our list of recommendations and consider the points we are raising.
To make things easy we have put together an example skeleton library we dubbed libabc, whose README file includes all our points in terse form. It's available on kernel.org:
The git repository and the README.
This list of recommendations draws inspiration from David Zeuthen's and Ulrich Drepper's well known papers on the topic of writing shared libraries. In the README linked above we try to distill this wealth of information into a terse list of recommendations, with a couple of additions and with a strict focus on a kernel hacker background.
Please have a look, and even if you are not a kernel hacker there might be something useful to know in it, especially if you work on the lower layers of our stack.
If you have any questions or additions, just ping us, or comment below!
posted at: 01:46 | path: /projects | permanent link to this entry | 45 comments
If you make it to Prague the coming week for the LinuxCon/ELCE/GStreamer/Kernel Summit/... superconference, make sure not to miss:
All of that at the Clarion Hotel. See you in Prague!
posted at: 01:31 | path: /projects | permanent link to this entry | 0 comments
Two weeks ago we published a Plumber's Wishlist for Linux. So far, this has already created lively discussions in the community (as reported on LWN among others), and patches for a few of the items listed have already been posted (thanks a lot to those who worked on this, your contributions are much appreciated!).
We have now prepared a second version of the wish list. It includes a number of additions (tmpfs quota! hostname change notifications! and more!) and updates to the previous items, including links to patches, and references to other interesting material.
We hope to update this wishlist from time, so stay tuned!
And now, go and read the new wishlist!
posted at: 20:41 | path: /projects | permanent link to this entry | 3 comments
Nice one, Google suspended my Google+ account because I created it under, well, my name, which is "Lennart Poettering", and Google+ thinks that wasn't my name, even though it says so in my passport, and almost every document I own and I was never aware I had any other name. This is ricidulous. Google, give me my name back! This is a really uncool move.
posted at: 18:50 | path: /projects | permanent link to this entry | 18 comments
I am currently collecting questions for the kernel developer panel at LinuxCon in Prague. If there's something you'd like the panelists to respond to, please post it on the thread, and I'll see what I can do. Thank you!
posted at: 15:38 | path: /projects | permanent link to this entry | comments
Google announced today that they'll be shutting down Google Code Search in January. I am quite sure that this would be a massive loss for the Free Software community. The ability to learn from other people's code is a key idea of Free Software. There's simply no better way to do that than with a source code search engine. The day Google Code Search will be shut down will be a sad day for the Free Software community.
Of course, there are a couple of alternatives around, but they all have one thing in common: they, uh, don't even remotely compare to the completeness, performance and simplicity of the Google Code Search interface, and have serious usability issues. (For example: koders.com is really really slow, and splits up identifiers you search for at underscores, which kinda makes it useless for looking for almost any kind of code.)
I think it must be of genuine interest to the Free Software community to have a capable replacement for Google Code Search, for the day it is turned off. In fact, it probably should be something the various foundations which promote Free Software should be looking into, like the FSF or the Linux Foundation. There are very few better ways to get Free Software into the heads and minds of engineers than by examples -- examples consisting of real life code they can find with a source code search engine. I believe a source code search engine is probably among the best vehicles to promote Free Software towards engineers. In particular if it itself was Free Software (in contrast to Google Code Search).
Ideally, all software available on web sites like SourceForge, Freshmeat, or github should be indexed. But there's also a chance for distributions here: indexing the sources of all packages a distribution like Debian or Fedora include would be a great tool for developers. In fact, a distribution offering this functionality might benefit from such functionality, as it attracts developer interest in the distribution.
It's sad that Google Code Search will be gone soon. But maybe there's something positive in the bad news here, and a chance to create something better, more comprehensive, that is free, and promotes our ideals better than Google ever could. Maybe there's a chance here for the Open Source foundations, for the distributions and for the communities to create a better replacement!
posted at: 23:05 | path: /projects | permanent link to this entry | 15 comments
Hofkirche, Dresden, Saxony, Germany
Bastei, Saxon Switzerland, Saxony, Germany
Fürstenzug, Dresden, Saxony, Germany
Near California State Route 46, California, USA
Near Generals Highway, California, USA
Near Generals Highway, California, USA, a bit further down the road.
Parish Church in Poznan, Poland
posted at: 21:32 | path: /photos | permanent link to this entry | 6 comments
Here's a mail we just sent to LKML, for your consideration. Enjoy:
Subject: A Plumber’s Wish List for Linux We’d like to share our current wish list of plumbing layer features we are hoping to see implemented in the near future in the Linux kernel and associated tools. Some items we can implement on our own, others are not our area of expertise, and we will need help getting them implemented. Acknowledging that this wish list of ours only gets longer and not shorter, even though we have implemented a number of other features on our own in the previous years, we are posting this list here, in the hope to find some help. If you happen to be interested in working on something from this list or able to help out, we’d be delighted. Please ping us in case you need clarifications or more information on specific items. Thanks, Kay, Lennart, Harald, in the name of all the other plumbers An here’s the wish list, in no particular order: * (ioctl based?) interface to query and modify the label of a mounted FAT volume: A FAT labels is implemented as a hidden directory entry in the file system which need to be renamed when changing the file system label, this is impossible to do from userspace without unmounting. Hence we’d like to see a kernel interface that is available on the mounted file system mount point itself. Of course, bonus points if this new interface can be implemented for other file systems as well, and also covers fs UUIDs in addition to labels. * CPU modaliases in /sys/devices/system/cpu/cpuX/modalias: useful to allow module auto-loading of e.g. cpufreq drivers and KVM modules. Andy Kleen has a patch to create the alias file itself. CPU ‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct bus_type cpu’ needs to be introduced to allow proper CPU coldplug event replay at bootup. This is one of the last remaining places where automatic hardware-triggered module auto-loading is not available. And we’d like to see that fix to make numerous ugly userspace work-arounds to achieve the same go away. * expose CAP_LAST_CAP somehow in the running kernel at runtime: Userspace needs to know the highest valid capability of the running kernel, which right now cannot reliably be retrieved from header files only. The fact that this value cannot be detected properly right now creates various problems for libraries compiled on newer header files which are run on older kernels. They assume capabilities are available which actually aren’t. Specifically, libcap-ng claims that all running processes retain the higher capabilities in this case due to the “inverted” semantics of CapBnd in /proc/$PID/status. * export ‘struct device_type fb/fbcon’ of ‘struct class graphics’ Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other without the need to match on the device name. * allow changing argv[] of a process without mucking with environ[]: Something like setproctitle() or a prctl() would be ideal. Of course it is questionable if services like sendmail make use of this, but otoh for services which fork but do not immediately exec() another binary being able to rename this child processes in ps is of importance. * module-init-tools: provide a proper libmodprobe.so from module-init-tools: Early boot tools, installers, driver install disks want to access information about available modules to optimize bootup handling. * fork throttling mechanism as basic cgroup functionality that is available in all hierarchies independent of the controllers used: This is important to implement race-free killing of all members of a cgroup, so that cgroup member processes cannot fork faster then a cgroup supervisor process could kill them. This needs to be recursive, so that not only a cgroup but all its subgroups are covered as well. * proper cgroup-is-empty notification interface: The current call_usermodehelper() interface is an unefficient and an ugly hack. Tools would prefer anything more lightweight like a netlink, poll() or fanotify interface. * allow user xattrs to be set on files in the cgroupfs (and maybe procfs?) * simple, reliable and future-proof way to detect whether a specific pid is running in a CLONE_NEWPID container, i.e. not in the root PID namespace. Currently, there are available a few ugly hacks to detect this (for example a process wanting to know whether it is running in a PID namespace could just look for a PID 2 being around and named kthreadd which is a kernel thread only visible in the root namespace), however all these solutions encode information and expectations that better shouldn’t be encoded in a namespace test like this. This functionality is needed in particular since the removal of the the ns cgroup controller which provided the namespace membership information to user code. * allow making use of the “cpu” cgroup controller by default without breaking RT. Right now creating a cgroup in the “cpu” hierarchy that shall be able to take advantage of RT is impossible for the generic case since it needs an RT budget configured which is from a limited resource pool. What we want is the ability to create cgroups in “cpu” whose processes get an non-RT weight applied, but for RT take advantage of the parent’s RT budget. We want the separation of RT and non-RT budget assignment in the “cpu” hierarchy, because right now, you lose RT functionality in it unless you assign an RT budget. This issue severely limits the usefulness of “cpu” hierarchy on general purpose systems right now. * Add a timerslack cgroup controller, to allow increasing the timer slack of user session cgroups when the machine is idle. * An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or something like that), i.e. a way to attach sender cgroup membership to messages sent via AF_UNIX. This is useful in case services such as syslog shall be shared among various containers (or service cgroups), and the syslog implementation needs to be able to distinguish the sending cgroup in order to separate the logs on disk. Of course stm SCM_CREDENTIALS can be used to look up the PID of the sender followed by a check in /proc/$PID/cgroup, but that is necessarily racy, and actually a very real race in real life. * SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary control message should carry the process name as available in /proc/$PID/comm.
posted at: 01:22 | path: /projects | permanent link to this entry | 11 comments
Earlier today I gave a presentation at the Technical University Berlin about things you need to know, things you should expect and things you shouldn't expect when your are aspiring to become a successful Free Software Hacker.
I have put my slides up on Google Docs in case you are interested, either because you are the target audience (i.e. a university student) or because you need inspiration for a similar talk about the same topic.
The first two slides are in German language, so skip over them. The interesting bits are all in English. I hope it's quite comprehensive (though of course terse). Enjoy:
In case your feed reader/planet messes this up, here's the non-embedded version.
Oh, and thanks to everybody who reviewed and suggested additions to the the slides on +.
posted at: 22:05 | path: /projects | permanent link to this entry | 5 comments
PulseAudio 1.0 is out now. It's awesome. Get it while it is hot!
I'd like to thank Colin Guthrie and Arun Raghavan (and all the others involved) for getting this release out of the door!
posted at: 16:07 | path: /projects | permanent link to this entry | comments
Here's the eleventh installment of my ongoing series on systemd for Administrators:
In a previous episode of this series I covered how to convert a SysV init script to a systemd unit file. In this story I hope to explain how to convert inetd services into systemd units.
Let's start with a bit of background. inetd has a long tradition as one of the classic Unix services. As a superserver it listens on an Internet socket on behalf of another service and then activate that service on an incoming connection, thus implementing an on-demand socket activation system. This allowed Unix machines with limited resources to provide a large variety of services, without the need to run processes and invest resources for all of them all of the time. Over the years a number of independent implementations of inetd have been shipped on Linux distributions. The most prominent being the ones based on BSD inetd and xinetd. While inetd used to be installed on most distributions by default, it nowadays is used only for very few selected services and the common services are all run unconditionally at boot, primarily for (perceived) performance reasons.
One of the core feature of systemd (and Apple's launchd for the matter) is socket activation, a scheme pioneered by inetd, however back then with a different focus. Systemd-style socket activation focusses on local sockets (AF_UNIX), not so much Internet sockets (AF_INET), even though both are supported. And more importantly even, socket activation in systemd is not primarily about the on-demand aspect that was key in inetd, but more on increasing parallelization (socket activation allows starting clients and servers of the socket at the same time), simplicity (since the need to configure explicit dependencies between services is removed) and robustness (since services can be restarted or may crash without loss of connectivity of the socket). However, systemd can also activate services on-demand when connections are incoming, if configured that way.
Socket activation of any kind requires support in the services themselves. systemd provides a very simple interface that services may implement to provide socket activation, built around sd_listen_fds(). As such it is already a very minimal, simple scheme. However, the traditional inetd interface is even simpler. It allows passing only a single socket to the activated service: the socket fd is simply duplicated to STDIN and STDOUT of the process spawned, and that's already it. In order to provide compatibility systemd optionally offers the same interface to processes, thus taking advantage of the many services that already support inetd-style socket activation, but not yet systemd's native activation.
Before we continue with a concrete example, let's have a look at three different schemes to make use of socket activation:
The three schemes provide different performance characteristics. After the service finishes starting up the performance provided by the first two schemes is identical to a stand-alone service (i.e. one that is started without a super-server, without socket activation), since the listening socket is passed to the actual service, and code paths from then on are identical to those of a stand-alone service and all connections are processes exactly the same way as they are in a stand-alone service. On the other hand, performance of the third scheme is usually not as good: since for each connection a new service needs to be started the resource cost is much higher. However, it also has a number of advantages: for example client connections are better isolated and it is easier to develop services activated this way.
For systemd primarily the first scheme is in focus, however the other two schemes are supported as well. (In fact, the blog story I covered the necessary code changes for systemd-style socket activation in was about a service of the second type, i.e. CUPS). inetd primarily focusses on the third scheme, however the second scheme is supported too. (The first one isn't. Presumably due the focus on the third scheme inetd got its -- a bit unfair -- reputation for being "slow".)
So much about the background, let's cut to the beef now and show an inetd service can be integrated into systemd's socket activation. We'll focus on SSH, a very common service that is widely installed and used but on the vast majority of machines probably not started more often than 1/h in average (and usually even much less). SSH has supported inetd-style activation since a long time, following the third scheme mentioned above. Since it is started only every now and then and only with a limited number of connections at the same time it is a very good candidate for this scheme as the extra resource cost is negligble: if made socket-activatable SSH is basically free as long as nobody uses it. And as soon as somebody logs in via SSH it will be started and the moment he or she disconnects all its resources are freed again. Let's find out how to make SSH socket-activatable in systemd taking advantage of the provided inetd compatibility!
Here's the configuration line used to hook up SSH with classic inetd:
ssh stream tcp nowait root /usr/sbin/sshd sshd -i
And the same as xinetd configuration fragment:
service ssh {
socket_type = stream
protocol = tcp
wait = no
user = root
server = /usr/sbin/sshd
server_args = -i
}
Most of this should be fairly easy to understand, as these two fragments express very much the same information. The non-obvious parts: the port number (22) is not configured in inetd configuration, but indirectly via the service database in /etc/services: the service name is used as lookup key in that database and translated to a port number. This indirection via /etc/services has been part of Unix tradition though has been getting more and more out of fashion, and the newer xinetd hence optionally allows configuration with explicit port numbers. The most interesting setting here is the not very intuitively named nowait (resp. wait=no) option. It configures whether a service is of the second (wait) resp. third (nowait) scheme mentioned above. Finally the -i switch is used to enabled inetd mode in SSH.
The systemd translation of these configuration fragments are the following two units. First: sshd.socket is a unit encapsulating information about a socket to listen on:
[Unit] Description=SSH Socket for Per-Connection Servers [Socket] ListenStream=22 Accept=yes [Install] WantedBy=sockets.target
Most of this should be self-explanatory. A few notes: Accept=yes corresponds to nowait. It's hopefully better named, referring to the fact that for nowait the superserver calls accept() on the listening socket, where for wait this is the job of the executed service process. WantedBy=sockets.target is used to ensure that when enabled this unit is activated at boot at the right time.
And here's the matching service file sshd@.service:
[Unit] Description=SSH Per-Connection Server [Service] ExecStart=-/usr/sbin/sshd -i StandardInput=socket
This too should be mostly self-explanatory. Interesting is StandardInput=socket, the option that enables inetd compatibility for this service. StandardInput= may be used to configure what STDIN of the service should be connected for this service (see the man page for details). By setting it to socket we make sure to pass the connection socket here, as expected in the simple inetd interface. Note that we do not need to explicitly configure StandardOutput= here, since by default the setting from StandardInput= is inherited if nothing else is configured. Important is the "-" in front of the binary name. This ensures that the exit status of the per-connection sshd process is forgotten by systemd. Normally, systemd will store the exit status of a all service instances that die abnormally. SSH will sometimes die abnormally with an exit code of 1 or similar, and we want to make sure that this doesn't cause systemd to keep around information for numerous previous connections that died this way (until this information is forgotten with systemctl reset-failed).
sshd@.service is an instantiated service, as described in the preceeding installment of this series. For each incoming connection systemd will instantiate a new instance of sshd@.service, with the instance identifier named after the connection credentials.
You may wonder why in systemd configuration of an inetd service requires two unit files instead of one. The reason for this is that to simplify things we want to make sure that the relation between live units and unit files is obvious, while at the same time we can order the socket unit and the service units independently in the dependency graph and control the units as independently as possible. (Think: this allows you to shutdown the socket independently from the instances, and each instance individually.)
Now, let's see how this works in real life. If we drop these files into /etc/systemd/system we are ready to enable the socket and start it:
# systemctl enable sshd.socket ln -s '/etc/systemd/system/sshd.socket' '/etc/systemd/system/sockets.target.wants/sshd.socket' # systemctl start sshd.socket # systemctl status sshd.socket sshd.socket - SSH Socket for Per-Connection Servers Loaded: loaded (/etc/systemd/system/sshd.socket; enabled) Active: active (listening) since Mon, 26 Sep 2011 20:24:31 +0200; 14s ago Accepted: 0; Connected: 0 CGroup: name=systemd:/system/sshd.socket
This shows that the socket is listening, and so far no connections have been made (Accepted: will show you how many connections have been made in total since the socket was started, Connected: how many connections are currently active.)
Now, let's connect to this from two different hosts, and see which services are now active:
$ systemctl --full | grep ssh sshd@172.31.0.52:22-172.31.0.4:47779.service loaded active running SSH Per-Connection Server sshd@172.31.0.52:22-172.31.0.54:52985.service loaded active running SSH Per-Connection Server sshd.socket loaded active listening SSH Socket for Per-Connection Servers
As expected, there are now two service instances running, for the two connections, and they are named after the source and destination address of the TCP connection as well as the port numbers. (For AF_UNIX sockets the instance identifier will carry the PID and UID of the connecting client.) This allows us to invidiually introspect or kill specific sshd instances, in case you want to terminate the session of a specific client:
# systemctl kill sshd@172.31.0.52:22-172.31.0.4:47779.service
And that's probably already most of what you need to know for hooking up inetd services with systemd and how to use them afterwards.
In the case of SSH it is probably a good suggestion for most distributions in order to save resources to default to this kind of inetd-style socket activation, but provide a stand-alone unit file to sshd as well which can be enabled optionally. I'll soon file a wishlist bug about this against our SSH package in Fedora.
A few final notes on how xinetd and systemd compare feature-wise, and whether xinetd is fully obsoleted by systemd. The short answer here is that systemd does not provide the full xinetd feature set and that is does not fully obsolete xinetd. The longer answer is a bit more complex: if you look at the multitude of options xinetd provides you'll notice that systemd does not compare. For example, systemd does not come with built-in echo, time, daytime or discard servers, and never will include those. TCPMUX is not supported, and neither are RPC services. However, you will also find that most of these are either irrelevant on today's Internet or became other way out-of-fashion. The vast majority of inetd services do not directly take advantage of these additional features. In fact, none of the xinetd services shipped on Fedora make use of these options. That said, there are a couple of useful features that systemd does not support, for example IP ACL management. However, most administrators will probably agree that firewalls are the better solution for these kinds of problems and on top of that, systemd supports ACL management via tcpwrap for those who indulge in retro technologies like this. On the other hand systemd also provides numerous features xinetd does not provide, starting with the individual control of instances shown above, or the more expressive configurability of the execution context for the instances. I believe that what systemd provides is quite comprehensive, comes with little legacy cruft but should provide you with everything you need. And if there's something systemd does not cover, xinetd will always be there to fill the void as you can easily run it in conjunction with systemd. For the majority of uses systemd should cover what is necessary, and allows you cut down on the required components to build your system from. In a way, systemd brings back the functionality of classic Unix inetd and turns it again into a center piece of a Linux system.
And that's all for now. Thanks for reading this long piece. And now, get going and convert your services over! Even better, do this work in the individual packages upstream or in your distribution!
posted at: 20:46 | path: /projects | permanent link to this entry | 23 comments
Here's the tenth installment of my ongoing series on systemd for Administrators:
Most services on Linux/Unix are singleton services: there's usually only one instance of Syslog, Postfix, or Apache running on a specific system at the same time. On the other hand some select services may run in multiple instances on the same host. For example, an Internet service like the Dovecot IMAP service could run in multiple instances on different IP ports or different local IP addresses. A more common example that exists on all installations is getty, the mini service that runs once for each TTY and presents a login prompt on it. On most systems this service is instantiated once for each of the first six virtual consoles tty1 to tty6. On some servers depending on administrator configuration or boot-time parameters an additional getty is instantiated for a serial or virtualizer console. Another common instantiated service in the systemd world is fsck, the file system checker that is instantiated once for each block device that needs to be checked. Finally, in systemd socket activated per-connection services (think classic inetd!) are also implemented via instantiated services: a new instance is created for each incoming connection. In this installment I hope to explain a bit how systemd implements instantiated services and how to take advantage of them as an administrator.
If you followed the previous episodes of this series you are probably aware that services in systemd are named according to the pattern foobar.service, where foobar is an identification string for the service, and .service simply a fixed suffix that is identical for all service units. The definition files for these services are searched for in /etc/systemd/system and /lib/systemd/system (and possibly other directories) under this name. For instantiated services this pattern is extended a bit: the service name becomes foobar@quux.service where foobar is the common service identifier, and quux the instance identifier. Example: serial-getty@ttyS2.service is the serial getty service instantiated for ttyS2.
Service instances can be created dynamically as needed. Without further configuration you may easily start a new getty on a serial port simply by invoking a systemctl start command for the new instance:
# systemctl start serial-getty@ttyUSB0.service
If a command like the above is run systemd will first look for a unit configuration file by the exact name you requested. If this service file is not found (and usually it isn't if you use instantiated services like this) then the instance id is removed from the name and a unit configuration file by the resulting template name searched. In other words, in the above example, if the precise serial-getty@ttyUSB0.service unit file cannot be found, serial-getty@.service is loaded instead. This unit template file will hence be common for all instances of this service. For the serial getty we ship a template unit file in systemd (/lib/systemd/system/serial-getty@.service) that looks something like this:
[Unit] Description=Serial Getty on %I BindTo=dev-%i.device After=dev-%i.device systemd-user-sessions.service [Service] ExecStart=-/sbin/agetty -s %I 115200,38400,9600 Restart=always RestartSec=0
(Note that the unit template file we actually ship along with systemd for the serial gettys is a bit longer. If you are interested, have a look at the actual file which includes additional directives for compatibility with SysV, to clear the screen and remove previous users from the TTY device. To keep things simple I have shortened the unit file to the relevant lines here.)
This file looks mostly like any other unit file, with one distinction: the specifiers %I and %i are used at multiple locations. At unit load time %I and %i are replaced by systemd with the instance identifier of the service. In our example above, if a service is instantiated as serial-getty@ttyUSB0.service the specifiers %I and %i will be replaced by ttyUSB0. If you introspect the instanciated unit with systemctl status serial-getty@ttyUSB0.service you will see these replacements having taken place:
$ systemctl status serial-getty@ttyUSB0.service serial-getty@ttyUSB0.service - Getty on ttyUSB0 Loaded: loaded (/lib/systemd/system/serial-getty@.service; static) Active: active (running) since Mon, 26 Sep 2011 04:20:44 +0200; 2s ago Main PID: 5443 (agetty) CGroup: name=systemd:/system/getty@.service/ttyUSB0 └ 5443 /sbin/agetty -s ttyUSB0 115200,38400,9600
And that is already the core idea of instantiated services in systemd. As you can see systemd provides a very simple templating system, which can be used to dynamically instantiate services as needed. To make effective use of this, a few more notes:
You may instantiate these services on-the-fly in .wants/ symbolic links in the file system. For example, to make sure the serial getty on ttyUSB0 is started automatically at every boot, create a symlink like this:
# ln -s /lib/systemd/system/serial-getty@.service /etc/systemd/system/getty.target.wants/serial-getty@ttyUSB0.service
systemd will instantiate the symlinked unit file with the instance name specified in the symlink name.
You cannot instantiate a unit template without specifying an instance identifier. In other words systemctl start serial-getty@.service will necessarily fail since the instance name was left unspecified.
Sometimes it is useful to opt-out of the generic template for one specific instance. For these cases make use of the fact that systemd always searches first for the full instance file name before falling back to the template file name: make sure to place a unit file under the fully instantiated name in /etc/systemd/system and it will override the generic templated version for this specific instance.
The unit file shown above uses %i at some places and %I at others. You may wonder what the difference between these specifiers are. %i is replaced by the exact characters of the instance identifier. For %I on the other hand the instance identifier is first passed through a simple unescaping algorithm. In the case of a simple instance identifier like ttyUSB0 there is no effective difference. However, if the device name includes one or more slashes ("/") this cannot be part of a unit name (or Unix file name). Before such a device name can be used as instance identifier it needs to be escaped so that "/" becomes "-" and most other special characters (including "-") are replaced by "\xAB" where AB is the ASCII code of the character in hexadecimal notation[1]. Example: to refer to a USB serial port by its bus path we want to use a port name like serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0. The escaped version of this name is serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0. %I will then refer to former, %i to the latter. Effectively this means %i is useful wherever it is necessary to refer to other units, for example to express additional dependencies. On the other hand %I is useful for usage in command lines, or inclusion in pretty description strings. Let's check how this looks with the above unit file:
# systemctl start 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service' # systemctl status 'serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service' serial-getty@serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0.service - Serial Getty on serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0 Loaded: loaded (/lib/systemd/system/serial-getty@.service; static) Active: active (running) since Mon, 26 Sep 2011 05:08:52 +0200; 1s ago Main PID: 5788 (agetty) CGroup: name=systemd:/system/serial-getty@.service/serial-by\x2dpath-pci\x2d0000:00:1d.0\x2dusb\x2d0:1.4:1.1\x2dport0 └ 5788 /sbin/agetty -s serial/by-path/pci-0000:00:1d.0-usb-0:1.4:1.1-port0 115200 38400 9600
As we can see the while the instance identifier is the escaped string the command line and the description string actually use the unescaped version, as expected.
(Side note: there are more specifiers available than just %i and %I, and many of them are actually available in all unit files, not just templates for service instances. For more details see the man page which includes a full list and terse explanations.)
And at this point this shall be all for now. Stay tuned for a follow-up article on how instantiated services are used for inetd-style socket activation.
Footnotes
[1] Yupp, this escaping algorithm doesn't really result in particularly pretty escaped strings, but then again, most escaping algorithms don't help readability. The algorithm we used here is inspired by what udev does in a similar case, with one change. In the end, we had to pick something. If you'll plan to comment on the escaping algorithm please also mention where you live so that I can come around and paint your bike shed yellow with blue stripes. Thanks!
posted at: 05:11 | path: /projects | permanent link to this entry | 11 comments
Make sure to read the summary of the Boot & Init Microconf at the Linux Plumbers Conference 2011 In Santa Rosa, CA. It was a fantastic conference (at the social event we took busses from the appetizers to the mains...), and this summary should give you quite a good idea what we discussed there. Highly recommended read.
posted at: 17:56 | path: /projects | permanent link to this entry | 0 comments
Kay Sievers, Harald Hoyer and I will tour the US in the next weeks. If you have any questions on systemd, udev or dracut (or any of the related technologies), then please do get in touch with us on the following occasions:
Linux Plumbers Conference, Santa Rosa, CA, Sep 7-9th
Google, Googleplex, Mountain View, CA, Sep 12th
Red Hat, Westford, MA, Sep 13-14th
As usual LPC is going to rock, so make sure to be there!
posted at: 15:37 | path: /projects | permanent link to this entry | 3 comments
I just finished putting together a text on the systemd wiki explaining what to do to write a syslog service that is nicely integrated with systemd, and does all the right things. It's supposed to be a checklist for all syslog hackers:
rsyslog already implements everything on this list afaics, and that's pretty cool. If other implementations want to catch up, please consider following these recommendations, too.
I put this together since I have changed systemd 35 to set StandardOutput=syslog as default, so that all stdout/stderr of all services automatically ends up in syslog. And since that change requires some (minimal) changes to all syslog implementations I decided to document this all properly (if you are curious: they need to set StandardOutput=null to opt out of this default in order to avoid logging loops).
Anyway, please have a peek and comment if you spot a mistake or something I forgot. Or if you have questions, just ask.
posted at: 23:14 | path: /projects | permanent link to this entry | 8 comments
The Linux cgroup hierarchies of the various kernel controllers are a shared resource. Recently many components of Linux userspace started making use of these hierarchies. In order to avoid that the various programs step on each others toes while manipulating this shared resource we have put together a list of recommendations. Programs following these guidelines should work together nicely without interfering with other users of the hierarchies.
These guidelines are available in the systemd wiki. I'd be very interested in feedback, and would like to ask you to ping me in case we forgot something or left something too vague.
And please, if you are writing software that interfaces with the cgroup tree consider following these recommendations. Thank you.
posted at: 16:25 | path: /projects | permanent link to this entry | 2 comments
Just wanted to draw your attention to the Desktop Summit Wiki. If you are attending the Desktop Summit in Berlin you might find some interesting information in the Wiki.
Go to the main page of the Wiki here. You are welcome to edit and add additional information to the Wiki. To edit the Wiki authenticate with the same credentials you used to sign up for the conference at the Desktop Summit web site.
See you on friday!
posted at: 22:56 | path: /projects | permanent link to this entry | 1 comments
Read the first part of the announcements.
And now there are more exciting announcements:
Only 5 days are now left to beginning of the conference. The first event will already take place on Friday August 5th, at c-base at U/S Jannowitzbrücke, starting at 4pm. The conference programme itself will begin on Saturday August 6th, 10am (though do come earlier, for registration, if you didn't register at the c-base event already!). Note that the primary entrance to the Desktop Summit is in the north-eastern corner of the main building of Humboldt University. That's on Dorotheenstr./Hegelplatz, and not on Unter den Linden.
See you on Friday at c-base!
posted at: 18:08 | path: /projects | permanent link to this entry | 3 comments
In case you missed them, there have been a couple of exciting announcements around the Desktop Summit in Berlin, Germany.
And there will be more exciting announcements coming!
See you in 14 days! Oh, and if you still haven't registered, do so now. It's free, and if you don't register you might not get on the WLAN at the conference right-away.
posted at: 21:15 | path: /projects | permanent link to this entry | 0 comments
Here's the ninth installment of my ongoing series on systemd for Administrators:
So, here's a bit of an opinion piece on the /etc/sysconfig/ and /etc/default directories that exist on the various distributions in one form or another, and why I believe their use should be faded out. Like everything I say on this blog what follows is just my personal opinion, and not the gospel and has nothing to do with the position of the Fedora project or my employer. The topic of /etc/sysconfig has been coming up in discussions over and over again. I hope with this blog story I can explain a bit what we as systemd upstream think about these files.
A few lines about the historical context: I wasn't around when /etc/sysconfig was introduced -- suffice to say it has been around on Red Hat and SUSE distributions since a long long time. Eventually /etc/default was introduced on Debian with very similar semantics. Many other distributions know a directory with similar semantics too, most of them call it either one or the other way. In fact, even other Unix-OSes sported a directory like this. (Such as SCO. If you are interested in the details, I am sure a Unix greybeard of your trust can fill in what I am leaving vague here.) So, even though a directory like this has been known widely on Linuxes and Unixes, it never has been standardized, neither in POSIX nor in LSB/FHS. These directories very much are something where distributions distuingish themselves from each other.
The semantics of /etc/default and /etc/sysconfig are very losely defined only. What almost all files stored in these directories have in common though is that they are sourcable shell scripts which primarily consist of environment variable assignments. Most of the files in these directories are sourced by the SysV init scripts of the same name. The Debian Policy Manual (9.3.2) and the Fedora Packaging Guidelines suggest this use of the directories, however both distributions also have files in them that do not follow this scheme, i.e. that do not have a matching SysV init script -- or not even are shell scripts at all.
Why have these files been introduced? On SysV systems services are started via init scripts in /etc/rc.d/init.d (or a similar directory). /etc/ is (these days) considered the place where system configuration is stored. Originally these init scripts were subject to customization by the administrator. But as they grew and become complex most distributions no longer considered them true configuration files, but more just a special kind of programs. To make customization easy and guarantee a safe upgrade path the customizable bits hence have been moved to separate configuration files, which the init scripts then source.
Let's have a quick look what kind of configuration you can do with these files. Here's a short incomprehensive list of various things that can be configured via environment settings in these source files I found browsing through the directories on a Fedora and a Debian machine:
Now, let's go where the beef is: what's wrong with /etc/sysconfig (resp. /etc/default)? Why might it make sense to fade out use of these files in a systemd world?
What to use instead? Here are a few recommendations of what to do with these files in the long run in a systemd world:
Of course, there's one very good reason for supporting these files for a bit longer: compatibility for upgrades. But that's is really the only one I could come up with. It's reason enough to keep compatibility for a while, but I think it is a good idea to phase out usage of these files at least in new packages.
If compatibility is important, then systemd will still allow you to read these configuration files even if you otherwise use native systemd unit files. If your sysconfig file only knows simple options EnvironmentFile=-/etc/sysconfig/foobar (See systemd.exec(5) for more information about this option.) may be used to import the settings into the environment and use them to put together command lines. If you need a programming language to make sense of these settings, then use a programming language like shell. For example, place an short shell script in /usr/lib/<your package>/ which reads these files for compatibility, and then exec's the actual daemon binary. Then spawn this script instead of the actual daemon binary with ExecStart= in the unit file.
And this is all for now. Thank you very much for your interest.
posted at: 00:34 | path: /projects | permanent link to this entry | 17 comments
If you plan to attend Desktop Summit in Berlin this year, then please REGISTER NOW!
If you do not register, then this means you will have to wait in the signup queue at the conference for substantially longer and might miss a talk or two. You will not get onto the conference WLAN right from the beginning of the conference (access is authenticated and personalized, only people who sign up will get access credentials). Your personal badge will not be ready right-away. If not enough people register we will also have to cut down on the available catering and the parties. We rely on the registration numbers to plan, and if you come but don't sign up before you make it very hard for us to plan. Registration is free, so what are you waiting for?
I am pretty sure you want to avoid all of this right? For your own benefit and for the benefit of everybody else attending the conference, go and register for the conference right-away!
Also, we are still looking for more volunteers for session chairs and runners at the conference. This is your chance to introduce your favourite Open Source hacker on stage! Please consider volunteering and read the Call for Volunteers. Add yourself to the list on the wiki page, today. If you sign up you'll earn yourself the gratitude of the GNOME and KDE communities, and you'll receive the exclusive team T-shirts!
Thank you!
posted at: 00:49 | path: /projects | permanent link to this entry | 3 comments
Here's yes another interview with yours truly. It's on LinuxFR, so I hope you understand some fr_FR.
posted at: 15:28 | path: /projects | permanent link to this entry | 6 comments
It has been way to long since I posted the first episode of my systemd for Developers series. Here's finally the second part. Make sure you read the first episode of the series before you start with this part since I'll assume the reader grokked the wonders of socket activation.
This time we'll focus on adding socket activation support to real-life software, more specifically the CUPS printing server. Most current Linux desktops run CUPS by default these days, since printing is so basic that it's a must have, and must just work when the user needs it. However, most desktop CUPS installations probably don't actually see more than a handful of print jobs each month. Even if you are a busy office worker you'll unlikely to generate more than a couple of print jobs an hour on your PC. Also, printing is not time critical. Whether a job takes 50ms or 100ms until it reaches the printer matters little. As long as it is less than a few seconds the user probably won't notice. Finally, printers are usually peripheral hardware: they aren't built into your laptop, and you don't always carry them around plugged in. That all together makes CUPS a perfect candidate for lazy activation: instead of starting it unconditionally at boot we just start it on-demand, when it is needed. That way we can save resources, at boot and at runtime. However, this kind of activation needs to take place transparently, so that the user doesn't notice that the print server was not actually running yet when he tried to access it. To achieve that we need to make sure that the print server is started as soon at least one of three conditions hold:
Of course, the desktop is not the only place where CUPS is used. CUPS can be run in small and big print servers, too. In that case the amount of print jobs is substantially higher, and CUPS should be started right away at boot. That means that (optionally) we still want to start CUPS unconditionally at boot and not delay its execution until when it is needed.
Putting this all together we need four kind of activation to make CUPS work well in all situations at minimal resource usage: socket based activation (to support condition 1 above), hardware based activation (to support condition 2), path based activation (for condition 3) and finally boot-time activation (for the optional server usecase). Let's focus on these kinds of activation in more detail, and in particular on socket-based activation.
To implement socket-based activation in CUPS we need to make sure that when sockets are passed from systemd these are used to listen on instead of binding them directly in the CUPS source code. Fortunately this is relatively easy to do in the CUPS sources, since it already supports launchd-style socket activation, as it is used on MacOS X (note that CUPS is nowadays an Apple project). That means the code already has all the necessary hooks to add systemd-style socket activation with minimal work.
To begin with our patching session we check out the CUPS sources. Unfortunately CUPS is still stuck in unhappy Subversion country and not using git yet. In order to simplify our patching work our first step is to use git-svn to check it out locally in a way we can access it with the usual git tools:
git svn clone http://svn.easysw.com/public/cups/trunk/ cups
This will take a while. After the command finished we use the wonderful git grep to look for all occurences of the word "launchd", since that's probably where we need to add the systemd support too. This reveals scheduler/main.c as main source file which implements launchd interaction.
Browsing through this file we notice that two functions are primarily responsible for interfacing with launchd, the appropriately named launchd_checkin() and launchd_checkout() functions. The former acquires the sockets from launchd when the daemon starts up, the latter terminates communication with launchd and is called when the daemon shuts down. systemd's socket activation interfaces are much simpler than those of launchd. Due to that we only need an equivalent of the launchd_checkin() call, and do not need a checkout function. Our own function systemd_checkin() can be implemented very similar to launchd_checkin(): we look at the sockets we got passed and try to map them to the ones configured in the CUPS configuration. If we got more sockets passed than configured in CUPS we implicitly add configuration for them. If the CUPS configuration includes definitions for more listening sockets those will be bound natively in CUPS. That way we'll very robustly listen on all ports that are listed in either systemd or CUPS configuration.
Our function systemd_checkin() uses sd_listen_fds() from sd-daemon.c to acquire the file descriptors. Then, we use sd_is_socket() to map the sockets to the right listening configuration of CUPS, in a loop for each passed socket. The loop corresponds very closely to the loop from launchd_checkin() however is a lot simpler. Our patch so far looks like this.
Before we can test our patch, we add sd-daemon.c and sd-daemon.h as drop-in files to the package, so that sd_listen_fds() and sd_is_socket() are available for use. After a few minimal changes to the Makefile we are almost ready to test our socket activated version of CUPS. The last missing step is creating two unit files for CUPS, one for the socket (cups.socket), the other for the service (cups.service). To make things simple we just drop them in /etc/systemd/system and make sure systemd knows about them, with systemctl daemon-reload.
Now we are ready to test our little patch: we start the socket with systemctl start cups.socket. This will bind the socket, but won't start CUPS yet. Next, we simply invoke lpq to test whether CUPS is transparently started, and yupp, this is exactly what happens. We'll get the normal output from lpq as if we had started CUPS at boot already, and if we then check with systemctl status cups.service we see that CUPS was automatically spawned by our invocation of lpq. Our test succeeded, socket activation worked!
The next trigger is hardware activation: we want to make sure that CUPS is automatically started as soon as a local printer is found, regardless whether that happens as hotplug during runtime or as coldplug during boot. Hardware activation in systemd is done via udev rules. Any udev device that is tagged with the systemd tag can pull in units as needed via the SYSTEMD_WANTS= environment variable. In the case of CUPS we don't even have to add our own udev rule to the mix, we can simply hook into what systemd already does out-of-the-box with rules shipped upstream. More specifically, it ships a udev rules file with the following lines:
SUBSYSTEM=="printer", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"
SUBSYSTEM=="usb", KERNEL=="lp*", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"
SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ENV{ID_USB_INTERFACES}=="*:0701??:*", TAG+="systemd", ENV{SYSTEMD_WANTS}="printer.target"
This pulls in the target unit printer.target as soon as at least one printer is plugged in (supporting all kinds of printer ports). All we now have to do is make sure that our CUPS service is pulled in by printer.target and we are done. By placing WantedBy=printer.target line in the [Install] section of the service file, a Wants dependency is created from printer.target to cups.service as soon as the latter is enabled with systemctl enable. The indirection via printer.target provides us with a simple way to use systemctl enable and systemctl disable to manage hardware activation of a service.
To ensure that CUPS is also started when there is a print job still queued in the printing spool, we write a simple cups.path that activates CUPS as soon as we find a file in /var/spool/cups.
Well, starting services on boot is obviously the most boring and well-known way to spawn a service. This entire excercise was about making this unnecessary, but we still need to support it for explicit print server machines. Since those are probably the exception and not the common case we do not enable this kind of activation by default, but leave it to the administrator to add it in when he deems it necessary, with a simple command (ln -s /lib/systemd/system/cups.service /etc/systemd/system/multi-user.target.wants/ to be precise).
So, now we have covered all four kinds of activation. To finalize our patch we have a closer look at the [Install] section of cups.service, i.e. the part of the unit file that controls how systemctl enable cups.service and systemctl disable cups.service will hook the service into/unhook the service from the system. Since we don't want to start cups at boot we do not place WantedBy=multi-user.target in it like we would do for those services. Instead we just place an Also= line that makes sure that cups.path and cups.socket are automatically also enabled if the user asks to enable cups.service (they are enabled according to the [Install] sections in those unit files).
As last step we then integrate our work into the build system. In contrast to SysV init scripts systemd unit files are supposed to be distribution independent. Hence it is a good idea to include them in the upstream tarball. Unfortunately CUPS doesn't use Automake, but Autoconf with a set of handwritten Makefiles. This requires a bit more work to get our additions integrated, but is not too difficult either. And this is how our final patch looks like, after we commited our work and ran git format-patch -1 on it to generate a pretty git patch.
The next step of course is to get this patch integrated into the upstream and Fedora packages (or whatever other distribution floats your boat). To make this easy I have prepared a patch for Tim that makes the necessary packaging changes for Fedora 16, and includes the patch intended for upstream linked above. Of course, ideally the patch is merged upstream, however in the meantime we can already include it in the Fedora packages.
Note that CUPS was particularly easy to patch since it already supported launchd-style activation, patching a service that doesn't support that yet is only marginally more difficult. (Oh, and we have no plans to offer the complex launchd API as compatibility kludge on Linux. It simply doesn't translate very well, so don't even ask... ;-))
And that finishes our little blog story. I hope this quick walkthrough how to add socket activation (and the other forms of activation) to a package were interesting to you, and will help you doing the same for your own packages. If you have questions, our IRC channel #systemd on freenode and our mailing list are available, and we are always happy to help!
posted at: 00:46 | path: /projects | permanent link to this entry | 18 comments
Here's another interview with yours truly. It's on IBM developerWorks Brasil, so I hope you understand some pt_BR.
posted at: 16:33 | path: /projects | permanent link to this entry | 0 comments
GNOMErs, the Desktop Summit in Berlin, Germany is approaching quickly!
Submit your BoF for the Desktop Summit BoF days NOW! Deadline is July 3rd, this sunday!
Sign up as a volunteer for the Desktop Summit NOW! Deadline is July 18th!
posted at: 23:20 | path: /projects | permanent link to this entry | 0 comments
It has been a while since I blogged photos of my various travels, although I have visited quite a number of countries in the past 12 months, and travelled overland in a number of them. Here are a few selected shots from three: India (November/December), Thailand (January), Japan (June).
These pictures are from Kyoto, Nara and Takayama in Honshu, Japan.
All this is Bangkok, Thailand. Particular interest deserve the gold-based patterns used widely to adorn Thai architecture:
And finally India (one picture NSFW!):
This is Mumbai, Ellora, Ajanta, Aurangabad (in Maharashtra); Mandu, Sanchi, Gwalior, Khajuraho (Madhya Pradesh); Orchha, Varanasi (Uttar Pradesh); Bangalore, Mysore (Karnataka).
posted at: 19:09 | path: /photos | permanent link to this entry | 4 comments
The Desktop Summit 2011 in Berlin, Germany Needs Your Help!
Are you attending the Desktop Summit? Are you interested in helping the GNOME and KDE communities organize this year's Summit? Do you want to work with other Free Software enthusiasts to make the Desktop Summit rock? Would you like to own one of the exclusive Desktop Summit Team T-Shirts?
If so, please sign up as a volunteer for the Desktop Summit!
Read the full Call For Volunteers!
Read Patricia's original announcement.
Or go directly and sign up as a volunteer.
posted at: 17:32 | path: /projects | permanent link to this entry | 0 comments
The Desktop Summit schedule for the talks and presentations has been published a couple of weeks ago. Now it is time to open the 2nd Call for Participation, this time for Workshops and BoFs.
If you'd like to run a workshop, BoF, hack session or training/teaching session, then please submit it here. If you do it will appear in the printed schedule and get a prominent time slot assigned. BoFs, workshops, hack sessions and training/teaching sessions can also be added after the deadline of July 3rd, and even be registered ad-hoc at the conference, but if you register your slot in advance we can make sure people will find it in the printed schedule, will know about it, can plan to attend it and we can do everything to make sure a lot of people show up.
Note that BoF/workshop proposals are unrestricted, i.e. there is no program committee that will accept or reject submissions: we have a lot of room and we'll accept liberally what is submitted.
For GNOMErs: this part of the conference is supposed to be much like the Boston GNOME summit, but with a printed schedule. So please be welcome to submit your sessions like you'd want to have them take place at the GNOME summit as well.
Also see Jonnor's original announcement.
So, hurry, file your session request right-away and before July 3rd!
posted at: 09:59 | path: /projects | permanent link to this entry | 0 comments
If you are into computers and live in Germany I am sure you know the c't computer magazine. The current edition 13/2011 (p. 172) contains two articles contributed by Thorsten Leemhuis, Kay Sievers and yours truly on the topic of systemd. Awesome read. Now, run to your local kiosk and grab a c't and study the two articles. Go now, quick!
posted at: 11:45 | path: /projects | permanent link to this entry | 0 comments
Golem.de has an interview with yours truly. When I watched I learned so much! If you understand the German language then you might too! (and only then because it is in Goethe's tongue).
posted at: 22:02 | path: /projects | permanent link to this entry | 1 comments
Fedora 15 is out. Get it while it is hot! It is probably the biggest distribution release of a all time with being first in shipping both GNOME 3 and systemd.
Since this is the first distribution release based on systemd, it might be interesting to read up on what it is all about. Here's a little compilation of the available documentation for systemd.
Here's the full list of all man pages.
Some of the systemd for Administrators blog posts are available in Russian language, too.
And, if you still have questions after all of this, please join our mailing list, or our IRC channel #systemd on irc.freenode.org. Alternatively, if you are looking for paid consulting services for systemd contact our friends at ProFUSION.
posted at: 16:54 | path: /projects | permanent link to this entry | 8 comments
The Paper Committee of the Desktop Summit 2011, in Berlin, Germany is happy to announce that the conference programme is now published.
And yes, it is an absolutely rocking programme.
See you in Berlin!
posted at: 17:08 | path: /projects | permanent link to this entry | 0 comments
The Linux Plumbers Conference 2011 in Santa Rosa, CA, USA is coming nearer (Sep. 7-9). Together with Kay Sievers I am running the Boot&Init track, and together with Mark Brown the Audio track.
For both tracks we still need proposals. So if you haven't submitted anything yet, please consider doing so. And that quickly. i.e. if you can arrange for it, last sunday would be best, since that was actually the final deadline. However, the submission form is still open, so if you submit something really, really quickly we'll ignore the absence of time travel and the calendar for a bit. So, go, submit something. Now.
What are we looking for? Well, here's what I just posted on the audio related mailing lists:
So, please consider submitting something if you haven't done so yet. We are looking for all kinds of technical talks covering everything audio plumbing related: audio drivers, audio APIs, sound servers, pro audio, consumer audio. If you can propose something audio related -- like talks on media controller routing, on audio for ASOC/Embedded, submit something! If you care for low-latency audio, submit something. If you care about the Linux audio stack in general, submit something. LPC is probably the most relevant technical conference on the general Linux platform, so be sure that if you want your project, your work, your ideas to be heard then this is the right forum for everything related to the Linux stack. And the Audio track covers everything in our Audio Stack, regardless whether it is pro or consumer audio.
And here's what I posted to the init related lists:
So, please consider submitting something if you haven't done so yet. We are looking for all kinds of technical talks covering everything from the BIOS (i.e. CoreBoot and friends), over boot loaders (i.e. GRUB and friends), to initramfs (i.e. Dracut and friends) and init systems (i.e. systemd and friends). If you have something smart to say about any of these areas or maybe about related tools (i.e. you wrote a fancy new tool to measure boot performance) or fancy boot schemes in your favourite Linux based OS (i.e. the new Meego zero second boot ;-)) then don't hesitate to submit something on the LPC web site, in the Boot&Init track!
And now, quickly, go to the LPC website and post your session proposal in the Audio resp. Boot&Init track! Thank you!
posted at: 23:30 | path: /projects | permanent link to this entry | 1 comments
As some of you might know Fedora 15 went Gold a couple of days ago. The first big distribution based on systemd will be released 2011-05-24. Mark the date!
In little over a year systemd went from nowhere to became a core piece of Fedora. This wasn't possible without the numerous folks who worked with us on getting systemd right, supplied patches, chased bugs, tested releases and posted comments and generally made sure everything was in shape for the big release.
At this point we'd like to thank everybody who contributed and a few folks in particular:
A. Costa Adrian Spinu Alexey Shabalin Andreas Jaeger Andrew Edmunds Andrey Borzenkov Bill Nottingham Brandon Philips Brendan Jones Brett Witherspoon Chris E Ferron Christian Ruppert Conrad Meyer Daniel J Walsh Dave Reisner Eric Paris Fabian Henze Fabiano Fidêncio Florian Kriener Franz Dietrich Greg Kroah-Hartman Gustavo Sverzut Barbieri Harald Hoyer James Laska Jan Engelhardt Jeff Mahoney Jesse Zhang Jóhann B. Guðmundsson Karel Zak Koen Kooi Lucas De Marchi Ludwig Nussel Luis Felipe Strano Moraes Maarten Lankhorst Malcolm Studd Marc-Antoine Perennou Martin Mikkelsen Matthew Miller Matthias Clasen Matthias Schiffer Michael Biebl Michael Olbrich Michael Tremer Michał Piotrowski Michal Schmidt Mike Kazantsev Mike Kelly Miklos Vajna Milan Broz Ozan Çağlayan Paul Menzel Pavol Rusnak Rahul Sundaram Rainer Gerhards Ran Benita Ray Strode Robert Gerus Sedat Dilek Tero Roponen Thierry Reding Tollef Fog Heen Tomasz Torcz Tom Callaway Tom Gundersen Toshio Kuratomi William Jon McCann Wulf C. Krueger Zbigniew Jędrzejewski-Szmek
And everybody else who I (or git shortlog) forgot.
Thank you!
Lennart and Kay
BTW, the interface stability promise is valid now.
posted at: 14:01 | path: /projects | permanent link to this entry | 4 comments
It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.
Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I excercise this liberty quite agressively. So yes, if you comment here, I might censor you. If you don't want to be censored your are welcome to comment on your own blog instead.