Frequently Asked Questions
Container Startup Issues
If your container is not starting, or not behaving as you would expect,
the first thing to do is to look at the console logs generated by the
container, using the lxc console --show-log CONTAINERNAME command.
In this example, we will investigate a RHEL 7 system in which systemd
cannot start.
# lxc console --show-log systemd
Console log:
Failed to insert module 'autofs4'
Failed to insert module 'unix'
Failed to mount sysfs at /sys: Operation not permitted
Failed to mount proc at /proc: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
The errors here say that /sys and /proc cannot be mounted, which is expected in an unprivileged container. However, LXD does mount these filesystems automatically if it can.
The container requirements specify that every container must come with
empty /dev, /proc, and /sys folders, and that /sbin/init must exist.
If those folders don’t exist, LXD will be unable to mount to them, and
systemd will then try to. As this is an unprivileged container, systemd
does not have the ability to do this, and it then freezes.
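The requirements above can be checked against an unpacked image before it is used. The following is a minimal sketch, assuming the root filesystem has been extracted to a local directory. The function name check_rootfs is hypothetical and not part of LXD, and it only checks that the paths exist, not that the folders are empty.

```shell
#!/bin/sh
# check_rootfs is a hypothetical helper, not an LXD command.
# It reports which of the required paths are missing from an
# unpacked container root filesystem.
check_rootfs() {
    rootfs="$1"
    missing=0

    # /dev, /proc and /sys must exist as directories so that they
    # can be mounted over when the container starts.
    for dir in dev proc sys; do
        if [ ! -d "$rootfs/$dir" ]; then
            echo "missing directory: /$dir"
            missing=1
        fi
    done

    # /sbin/init must exist. The -L test also accepts a symlink whose
    # target only resolves from inside the container.
    if [ ! -e "$rootfs/sbin/init" ] && [ ! -L "$rootfs/sbin/init" ]; then
        echo "missing file: /sbin/init"
        missing=1
    fi

    return "$missing"
}
```

It could be called as check_rootfs /path/to/rootfs, where the path is a placeholder for wherever the image was unpacked. A non-zero exit status means at least one required path is missing.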
To see the environment before anything is changed, you can explicitly
change the init in a container using the raw.lxc config parameter. This
is equivalent to setting init=/bin/bash on the Linux kernel command line.
lxc config set systemd raw.lxc 'lxc.init.cmd = /bin/bash'
Here is what it looks like:
root@lxc-01:~# lxc config set systemd raw.lxc 'lxc.init.cmd = /bin/bash'
root@lxc-01:~# lxc start systemd
root@lxc-01:~# lxc console --show-log systemd
Console log:
[root@systemd /]#
root@lxc-01:~#
Now that the container has started, you can look in it and see that things are not running as well as expected.
root@lxc-01:~# lxc exec systemd bash
[root@systemd ~]# ls
[root@systemd ~]# mount
mount: failed to read mtab: No such file or directory
[root@systemd ~]# cd /
[root@systemd /]# ls /proc/
sys
[root@systemd /]# exit
Because LXD tries to auto-heal, it did create some of the folders when it was starting up. Shutting down and restarting the container will fix the problem, but the original cause is still there - the template does not contain the required files.
Networking Issues
In a larger production environment, it is common to have multiple VLANs and have LXD clients attached directly to those VLANs. Be aware that if you are using netplan and systemd-networkd, you will encounter some bugs that could cause catastrophic issues.
Do not use systemd-networkd with netplan and bridges based on VLANs
At the time of writing (2019-03-05), netplan cannot assign a random MAC
address to a bridge attached to a VLAN. It always picks the same MAC
address, which causes layer 2 issues when you have more than one machine
on the same network segment. It also has difficulty creating multiple
bridges. Make sure you use network-manager instead. An example
config is below, with a management address of 10.61.0.25, and VLAN102
being used for client traffic.
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    eth0:
      dhcp4: no
      accept-ra: no
      # This is the 'Management Address'
      addresses: [ 10.61.0.25/24 ]
      gateway4: 10.61.0.1
      nameservers:
        addresses: [ 1.1.1.1, 8.8.8.8 ]
    eth1:
      dhcp4: no
      accept-ra: no
      # A bogus IP address is required to ensure the link state is up
      addresses: [ 10.254.254.25/32 ]
  vlans:
    vlan102:
      accept-ra: no
      dhcp4: no
      id: 102
      link: eth1
  bridges:
    br102:
      accept-ra: no
      dhcp4: no
      interfaces: [ "vlan102" ]
      # A bogus IP address is required to ensure the link state is up
      addresses: [ 10.254.102.25/32 ]
      parameters:
        stp: false
Things to note
eth0 is the Management interface, with the default gateway.
vlan102 uses eth1.
br102 uses vlan102, and has a bogus /32 IP address assigned to it.
The other important thing is to set stp: false, otherwise the bridge
will sit in learning state for up to 10 seconds, which is longer
than most DHCP requests last. As there is no possibility of
cross-connecting and causing loops, this is safe to do.
Beware of ‘port security’
Many switches do not allow MAC address changes, and will either drop traffic with an incorrect MAC or disable the port entirely. If you can ping an LXD instance from the host, but are not able to ping it from a different host, this could be the cause. To diagnose this, run tcpdump on the uplink (in this case, eth1). You will see either ‘ARP Who has xx.xx.xx.xx tell yy.yy.yy.yy’ requests, to which you send responses that are never acknowledged, or ICMP packets going in and out successfully but never being received by the other host.
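The capture described above might look like the following sketch, assuming tcpdump is installed, the commands are run as root, and the uplink is eth1 as in the example config. The function name capture_uplink and the packet limit of 50 are assumptions made for illustration. The -e option prints the link-level header, so the source and destination MAC addresses of each packet are visible.

```shell
#!/bin/sh
# capture_uplink is a hypothetical wrapper around tcpdump; run it as root.
capture_uplink() {
    iface="${1:-eth1}"   # uplink interface; eth1 in the example config

    # Fail early if the interface does not exist on this host.
    if [ ! -e "/sys/class/net/$iface" ]; then
        echo "interface $iface not found" >&2
        return 1
    fi

    # -n: do not resolve names, -e: print MAC addresses,
    # -c 50: stop after 50 packets. Only ARP and ICMP are captured.
    tcpdump -n -e -i "$iface" -c 50 'arp or icmp'
}
```

Running capture_uplink eth1 on the host while pinging the instance from a different machine should show whether ARP replies and ICMP packets leave the uplink with the instance's MAC address.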
Do not run privileged containers unless necessary
A privileged container can do things that affect the entire host. For example, it can use things in /sys to reset the network card, which will reset it for the entire host, causing network blips. Almost everything can be run in an unprivileged container. For things that require unusual privileges, such as mounting NFS filesystems inside the container, you may need to use bind mounts instead.
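As a sketch of how this could be checked and worked around with the lxc client: the helper names is_privileged and bind_mount_host_dir, and the device name hostdata, are hypothetical. The bind mount assumes the NFS export is already mounted on the host. Note that lxc config get reads the instance's own configuration, so a value inherited from a profile may not be reported.

```shell
#!/bin/sh
# Both helpers are hypothetical wrappers around the lxc client.

# Succeeds if security.privileged is set to true on the instance itself.
is_privileged() {
    [ "$(lxc config get "$1" security.privileged)" = "true" ]
}

# Bind mounts a host directory (for example an NFS export that is
# already mounted on the host) into the container as a disk device.
# "hostdata" is an arbitrary device name chosen for this example.
bind_mount_host_dir() {
    container="$1"
    source_dir="$2"
    target_dir="$3"
    lxc config device add "$container" hostdata disk \
        source="$source_dir" path="$target_dir"
}
```

For example, bind_mount_host_dir CONTAINERNAME /mnt/nfs /mnt/nfs would expose the host's /mnt/nfs at the same path inside the container, where both paths are placeholders.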