Category Archives: Linux

TopIcons-plus for Gnome (v18)

This is another off-topic post as it is not related to security.

It has been a while since I released the TopIcons-plus Gnome-Shell extension.

I had not advertised it here because it was not really ready or stable, but now I believe it is taking shape.

How is TopIcons-plus useful?

The Gnome developers want to kill system tray icons, which are displayed in what they call the legacy tray.

Such icons are familiar to everybody: messaging programs like RocketChat or Telegram, e-mail clients like Thunderbird, Dropbox, KeepassX, etc.

Gnome designers think such a design belongs to the past, is flawed in many ways (status or menu?) and should be unnecessary in modern environments with a dock and a powerful notification system.

I will not comment on that, and I actually believe they are right.

However, the legacy tray they propose is horrible. It is hidden most of the time and you have to click to open it before you can access your icons. It is very painful, and it is done on purpose, to send a clear message to application developers that it should not be used anymore.

Well, but what about the existing applications?

They are not going away all of a sudden. As a user, I still need them.

And this is open source, mostly developed in people’s free time: developers are not going to re-implement everything just for the Gnome ecosystem…

That is where I think an extension like TopIcons-plus is useful. It removes the hassle of this legacy tray by bringing back the icons to the top bar, so they are always visible.

Latest release

It comes with extra features, like styling (opacity, desaturation, size) and positioning.

The latest release should be in pretty good shape. If you don’t want to use the GitHub code, be patient: it should get validated on the Gnome website within the next few days.

Enjoy!

TopIcons-Plus v18, tray icons centered

One more rant against the Linux Intel graphics driver

Some quick notes that may help random Linux users looking for similar issues.

I am, like many, the unfortunate user of a laptop with Intel graphics (a Thinkpad T460, to be precise). Why unfortunate? Because the graphics driver provided by Intel sucks.

i915, as it is called, really has been sucking for years, and it is known for that (just google it if you don’t believe me).

For the sake of completeness, here is the exact model with which I experienced some issues:

%  lspci
00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
...

For performance, remove the X11 Intel driver

First, the Intel X11 module generally under-performs, so I just removed it to make X11 use the generic modesetting driver instead. These are the instructions for Fedora (24), but you can do something similar on virtually any distribution:

% dnf remove xorg-x11-drv-intel

Do not worry, it just removes the X11 part of the driver, not the kernel driver itself.

Log out, log back in, job done: you should see less lag with desktop environments like gnome-shell.

For stability, disable RC6

I experienced frequent, daily freezes of my work session. The display would totally hang or display a blank screen, forcing me to cold reboot the computer.

Here is an extract of the dmesg kernel traces leading to the crash (it is a bit lengthy, but it may help people find this post):

oct. 06 11:00:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:00:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:01:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:01:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:01:43 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:02:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:02:22 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:04:08 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=6364 end=6365) time 287 us, min 954, max 959, scanline start 950, end 967
oct. 06 11:13:58 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=41777 end=41778) time 340 us, min 954, max 959, scanline start 946, end 967
oct. 06 11:20:18 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=64583 end=64584) time 284 us, min 954, max 959, scanline start 946, end 964
oct. 06 11:20:33 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=65517 end=65518) time 284 us, min 1073, max 1079, scanline start 1071, end 1091
oct. 06 11:28:27 localhost.localdomain kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
oct. 06 11:31:53 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=106339 end=106340) time 285 us, min 1073, max 1079, scanline start 1066, end 1086
oct. 06 11:33:58 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=113803 end=113804) time 287 us, min 954, max 959, scanline start 948, end 966
oct. 06 11:35:13 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=118345 end=118346) time 285 us, min 1073, max 1079, scanline start 1062, end 1081
oct. 06 11:52:59 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=182278 end=182279) time 282 us, min 1073, max 1079, scanline start 1064, end 1084
oct. 06 12:01:29 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=212893 end=212894) time 284 us, min 1073, max 1079, scanline start 1068, end 1088
oct. 06 12:02:44 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=217395 end=217396) time 282 us, min 1073, max 1079, scanline start 1068, end 1088
oct. 06 12:02:49 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=217642 end=217643) time 247 us, min 954, max 959, scanline start 949, end 964
oct. 06 12:03:54 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=221597 end=221598) time 281 us, min 1073, max 1079, scanline start 1067, end 1086
oct. 06 12:05:49 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=228446 end=228447) time 290 us, min 954, max 959, scanline start 948, end 966
oct. 06 12:17:32 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:18:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:18:23 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:18:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:19:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:19:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:19:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:52 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:20:52 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [2322], reason: Engine(s) hung, action: reset
oct. 06 12:20:52 localhost.localdomain kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
oct. 06 12:20:52 localhost.localdomain kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
oct. 06 12:20:52 localhost.localdomain kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
oct. 06 12:20:52 localhost.localdomain kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
oct. 06 12:20:52 localhost.localdomain kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
oct. 06 12:20:52 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:20:54 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:05 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:05 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [2532], reason: Engine(s) hung, action: reset
oct. 06 12:21:05 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:07 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:15 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:15 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [2532], reason: Engine(s) hung, action: reset
oct. 06 12:21:15 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:17 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:27 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:27 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [2322], reason: Engine(s) hung, action: reset
oct. 06 12:21:27 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:29 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:37 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:37 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [2322], reason: Engine(s) hung, action: reset
oct. 06 12:21:37 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:39 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:43 localhost.localdomain kernel: ------------[ cut here ]------------
oct. 06 12:21:43 localhost.localdomain kernel: WARNING: CPU: 0 PID: 1109 at drivers/gpu/drm/i915/intel_display.c:13533 intel_atomic_commit+0x13b8/0x1470 [i915]
oct. 06 12:21:43 localhost.localdomain kernel: pipe A vblank wait timed out
oct. 06 12:21:43 localhost.localdomain kernel: Modules linked in: tun nfnetlink_queue nfnetlink_log uas usb_storage xt_nat veth rfcomm ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack dm_thin_pool dm_persistent_data dm_bio_prison loop ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security ebtable_filter ebtables ip6table_filter ip6_tables vmnet(O) ppdev parport_pc parport vboxpci(O) vboxnetadp(O) vboxnetflt(O) fuse vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) vboxdrv(O) cmac bnep cpufreq_stats vfat fat arc4 iTCO_wdt snd_soc_skl iTCO_vendor_support
oct. 06 12:21:43 localhost.localdomain kernel:  snd_soc_skl_ipc snd_hda_codec_hdmi snd_soc_sst_ipc intel_rapl snd_soc_sst_dsp x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_ext_core intel_powerclamp snd_hda_codec_generic coretemp snd_soc_sst_match kvm_intel snd_soc_core kvm snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_hda_codec iwlmvm snd_hda_core mac80211 irqbypass intel_cstate intel_rapl_perf snd_hwdep snd_seq snd_seq_device btusb snd_pcm btrtl uvcvideo btbcm btintel videobuf2_vmalloc videobuf2_memops joydev bluetooth videobuf2_v4l2 iwlwifi i2c_i801 snd_timer videobuf2_core cfg80211 rtsx_pci_ms videodev memstick media mei_me mei shpchp thinkpad_acpi intel_pch_thermal snd soundcore rfkill wmi tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c dm_crypt hid_logitech_hidpp hid_logitech_dj 8021q garp
oct. 06 12:21:43 localhost.localdomain kernel:  stp llc mrp i915 rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel e1000e i2c_algo_bit drm_kms_helper ghash_clmulni_intel drm serio_raw ptp pps_core rtsx_pci video fjes
oct. 06 12:21:43 localhost.localdomain kernel: CPU: 0 PID: 1109 Comm: systemd-logind Tainted: G     U  W  O    4.7.5-200.fc24.x86_64 #1
oct. 06 12:21:43 localhost.localdomain kernel: Hardware name: LENOVO 20FNCTO1WW/20FNCTO1WW, BIOS R06ET42W (1.16 ) 09/20/2016
oct. 06 12:21:43 localhost.localdomain kernel:  0000000000000286 0000000018e0c148 ffff8800d283b850 ffffffffb63daaaf
oct. 06 12:21:43 localhost.localdomain kernel:  ffff8800d283b8a0 0000000000000000 ffff8800d283b890 ffffffffb60a0b0b
oct. 06 12:21:43 localhost.localdomain kernel:  000034dd00000000 ffff88040f607000 0000000000000000 0000000000000000
oct. 06 12:21:43 localhost.localdomain kernel: Call Trace:
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb63daaaf>] dump_stack+0x63/0x84
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60a0b0b>] __warn+0xcb/0xf0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60a0b8f>] warn_slowpath_fmt+0x5f/0x80
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60e4483>] ? finish_wait+0x53/0x70
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc05046a8>] intel_atomic_commit+0x13b8/0x1470 [i915]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60e46e0>] ? prepare_to_wait_event+0xf0/0xf0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc0380ba7>] drm_atomic_commit+0x37/0x60 [drm]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc03e21e8>] restore_fbdev_mode+0x238/0x260 [drm_kms_helper]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc03e45d4>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc03e464d>] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc051ea4a>] intel_fbdev_set_par+0x1a/0x60 [i915]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb645a6b6>] fb_set_var+0x236/0x460
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60e4004>] ? __wake_up+0x44/0x50
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb67ea562>] ? down_write+0x12/0x40
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64caabb>] ? tty_unthrottle+0x3b/0x60
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb645074f>] fbcon_blank+0x30f/0x350
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64db0b2>] do_unblank_screen+0xd2/0x1a0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64d0ef6>] vt_ioctl+0x4f6/0x1270
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64c537a>] tty_ioctl+0x35a/0xc50
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb625f909>] ? dput+0xd9/0x260
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb625b4b2>] do_vfs_ioctl+0xa2/0x5d0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60be9b8>] ? task_work_run+0x88/0xb0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb625ba59>] SyS_ioctl+0x79/0x90
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb67ec572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
oct. 06 12:21:43 localhost.localdomain kernel: ---[ end trace 9f62268cfd97b6cb ]---

As seen above, the hang would always happen after a while, once the graphics chip had entered the RC6 power-saving mode.
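To check whether your own machine shows the same pattern, a rough sketch is to count the relevant messages in a kernel log. The function name `count_gpu_hangs` is made up for this illustration; the three patterns are simply copied from the extract above.

```shell
#!/bin/sh
# Hypothetical helper: summarize i915 trouble in a kernel log read from stdin.
# The patterns are taken from the messages in the extract above.
count_gpu_hangs() {
    awk '
        /GPU HANG/              { hangs++ }
        /Atomic update failure/ { atomic++ }
        /FIFO underrun/         { underrun++ }
        END {
            # counters that were never incremented print as 0
            printf "gpu_hangs=%d atomic_failures=%d fifo_underruns=%d\n",
                   hangs, atomic, underrun
        }
    '
}
```

For example, `journalctl -k -b -1 | count_gpu_hangs` would summarize the kernel messages of the previous boot, which is handy right after a forced reboot.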

After searching various forums and wikis, I applied the suggested solution of completely disabling the RC6 mode. Add i915.enable_rc6=0 to the kernel options in your grub configuration file (it is the last option on the GRUB_CMDLINE_LINUX line below):

%  cat /etc/default/grub 
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora/root rd.luks.uuid=luks-a9d14a0e-6c22-4976-919a-d216bd69d563 rd.lvm.lv=fedora/swap resume=/dev/dm-2 quiet splash i915.enable_rc6=0"
GRUB_DISABLE_RECOVERY="true"

Then, just rebuild the grub configuration. If you are on a UEFI system (as root):

grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Or, for legacy BIOS:

grub2-mkconfig -o /boot/grub2/grub.cfg
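If you are not sure which of the two cases applies, here is a small sketch that picks the path for you. It relies on the fact that /sys/firmware/efi only exists when the kernel was booted through UEFI. The function name `grub_cfg_path` and its optional argument are assumptions made for the example; the two paths are the Fedora ones given above.

```shell
#!/bin/sh
# Sketch: choose between the two grub.cfg locations given above.
# The optional argument (a directory to test instead of /sys/firmware/efi)
# only exists to make the function easy to try out.
grub_cfg_path() {
    efi_dir=${1:-/sys/firmware/efi}
    if [ -d "$efi_dir" ]; then
        echo /boot/efi/EFI/fedora/grub.cfg   # UEFI boot
    else
        echo /boot/grub2/grub.cfg            # legacy BIOS boot
    fi
}
```

As root, this could be used as `grub2-mkconfig -o "$(grub_cfg_path)"`.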

Finally reboot and you are done.
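After the reboot, it is worth checking that the option really reached the kernel. A minimal sketch, assuming the option was passed on the kernel command line as shown above (the function name `rc6_cmdline_value` is made up for the example):

```shell
#!/bin/sh
# Sketch: print the value of i915.enable_rc6 found on a kernel command line.
# Reads /proc/cmdline by default; another file can be passed for testing.
# Prints nothing when the option is absent.
rc6_cmdline_value() {
    cmdline_file=${1:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" |
        sed -n 's/^i915\.enable_rc6=//p' |
        tail -n 1   # keep the last occurrence if the option is repeated
}
```

On a machine configured as above, `rc6_cmdline_value` should print 0.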

There is a caveat, however: it will probably increase battery drain. With Powertop, I measured a consumption increase of around 6 W (8 to 13 W), which caused my battery life to drop from approximately 10h to 5h30.

That is still enough, and an acceptable price to pay to work reliably without risking a complete system hang.

But if I had to buy a computer personally, I would make sure that it has an Nvidia card. Yeah, I know that their proprietary blob has its caveats too, but from what I heard it is probably more stable.

Graphics drivers have always been a problem for “Linux on the desktop”.

References

  • https://wiki.archlinux.org/index.php/intel_graphics
  • https://wiki.gentoo.org/wiki/Intel

Lessons learned with Docker, Nodejs apps and volumes

Context

I have kept playing with Docker recently, just for fun and to learn.

It is very powerful, but still young. It quickly shows some limits when it comes to security or persistence. There are some workarounds, but they are more or less complex and more or less hacky.

Indeed, I had some issues with Etherpad, which is a Node.js application, and its integration into Docker.

Initially, I made something quite simple, so my Dockerfile ended like this:

USER etherpad
CMD ["node","/opt/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js"]

Thus, I simply start the app as a low-privileged user.

It worked, but I had two issues:

  1. Docker was not able to stop it nicely. Instead, it timed out after 10 sec and finally killed the app and the container altogether.
  2. No persistence of any kind, of course.

I decided to tackle these two issues to understand what was going on behind the scenes.

The PID 1 issue

I could not understand immediately the first issue: why was Docker unable to terminate the container properly?

After wandering down the wrong paths for a few hours (trying to get by with Node.js tools like nodemon or supervisor), I finally found some good articles explaining that Docker lacks an init system to catch signals. This causes issues with applications started as PID 1, which do not react to the usual termination signals, and with Bash (the shell doesn’t forward the signals it receives).

I am not going to repeat poorly what has already been explained very well, so I encourage you to read these two excellent posts:

You will also find a lot of bug reports about this issue in the Docker GitHub repository, and a lot of hacky or overkill solutions.

In my opinion, the most elegant solution among them is to use a very simple launcher program, dedicated to catching and handling signals.

I chose to use dumb-init, as it is well packaged (there are plenty of options) and seems to be well maintained.

So, after installing dumb-init in the Dockerfile, the CMD line should now look like this:

USER etherpad
CMD ["dumb-init","node","/opt/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js"]

And indeed, as expected, docker stop now works flawlessly.
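A quick way to see what actually runs as PID 1 is to ask /proc. This is only a small sketch relying on the Linux /proc filesystem; the function name `pid1_name` is made up for the example.

```shell
#!/bin/sh
# Sketch: print the command name of the process running as PID 1.
# Relies on the Linux /proc filesystem.
pid1_name() {
    cat /proc/1/comm
}
```

Run inside the container (for instance through `docker exec`), it should report dumb-init with the CMD line above, and node with the initial one.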

Volume permissions

This is where I had the toughest issue, although it is supposed to be straightforward with volumes.

Volumes make it possible to share files or folders between the host and containers, or between containers only. There are plenty of possibilities, nicely illustrated on this blog:

And it works very well… as long as your application runs as root.

In my case, for instance, Etherpad runs with a low privileged user, which is highly recommended. At startup, it creates a sqlite database, etherpad.db,  in its ./var folder.

Mounting a volume, of any kind, over the ./var folder, would result in a folder with root only permissions. Subsequently, of course, the launch of Etherpad from the CMD command would fail miserably.

Simple solutions like chown in the Dockerfile don’t work, because they apply before the mount. The mount occurs at runtime and works like a standard Linux mount: it is created by the docker daemon, with root permissions, over possibly existing data.

My solution was to completely change the way Etherpad is started. I now use an external script which is started at runtime:

  1. First, it applies the appropriate permissions to the mounted volume with chown,
  2. Then, it starts Etherpad with a low privileged user thanks to a su hack.

So now the Dockerfile ends with:

VOLUME /opt/etherpad-lite/var
ADD run-docker.sh ./bin/
CMD ["./bin/run-docker.sh"]

And here is the script:

#!/bin/bash

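# Fix ownership at runtime, once the volume is mounted
chown -R etherpad:etherpad /opt/etherpad-lite/var
# Drop privileges and start Etherpad behind dumb-init
su etherpad -s /bin/bash -c "dumb-init node /opt/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js"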
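As an optional safety net, the script could verify that the ownership change really took effect before dropping privileges. The helper below is a hypothetical addition, not part of my actual script; it assumes GNU stat, which Debian-based images ship.

```shell
#!/bin/sh
# Sketch: succeed only if directory $1 is owned by numeric user id $2.
# Assumes GNU stat (coreutils).
owned_by_uid() {
    [ "$(stat -c %u "$1")" = "$2" ]
}
```

In run-docker.sh, it could sit right after the chown, for example `owned_by_uid /opt/etherpad-lite/var "$(id -u etherpad)" || exit 1`, so the container fails early with a clear cause instead of Etherpad crashing on its database.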

I use a data volume for persistency, so the run command looks like this:

docker run -d --name etherpad -p 80:9001 -v etherpad:/opt/etherpad-lite/var -t debian-etherpad

Far from being ideal, but it works. I really hope some features are coming to bring more options in this area, especially in the Dockerfile.

Some final thoughts

Overall, we can still hope for a lot of improvements in security, because when I look at many Dockerfiles around, I see two behaviors:

  • A lot of people don’t care and everything is happily running as root, from unauthenticated third-party images or binaries…
  • Some people do care but end up with dirty hacks, because there is no other way to do so.

It is scary and so far from the Linux philosophy. Let’s wait for the enhancements to come.

You can find the complete updated Dockerfile on this github page.

While we are on this topic, have a look at this nice post with some tips and tricks for Docker.

A journey with Btrfs

Why Btrfs?

I have recently tested Btrfs as the file system for my /home partition (which was previously on ext4).

I have been impressed by what this file system makes possible, but also came to the conclusion that it is not for me.

As a quick reminder, the goal of this file system is to bring to Linux a fully featured file system similar to ZFS. Some of these features promise a lot of awesomeness: snapshots, native RAID, automatic defragmentation and repairs, etc.

Wouldn’t it be cool to have such a file system for your data? Among these features, snapshotting really is a killer feature. See it as a global git for all your data. You can track the history of any file, diff its versions and revert to a chosen one, anytime and online.

Btrfs has been under development for a while and the work is still ongoing. However, the first stable version was finally released last year.

Many people warn that it is not production ready yet. It seems obvious for critical production systems, under heavy load or using the most advanced features (e.g. RAID). But what about a simple /home, mainly using snapshots (which have been around for a while)?

You will see that there are still some issues with virtualization.

Disclaimer 1: this is in no way a review or a benchmark of Btrfs. Consider it simply as some feedback for my specific use case.

Getting ready

This chapter is a summary of procedures found in various resources, along with my feedback.

Disclaimer 2: First of all, make several backups of your entire /home, and make sure that they are operational and complete. Beware that there is obviously some inherent risk for your data in manipulating your home partition, so do not come back to insult me if you lose any data.

First, note that there is a conversion utility btrfs-convert, to convert an existing ext4 partition to btrfs. While this sounds cool, it did not work well with my partition, leading to many corrupted inodes.

So my advice is to just make a good backup of your home:

% rsync -av /home /your/backup/
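Since the next step destroys the original data, it is worth checking the copy first. Here is a rough sketch using diff; the function name `verify_backup` is made up for the example, and it only compares file names and contents, not ownership or permissions.

```shell
#!/bin/sh
# Sketch: compare a source tree with its backup copy, file by file.
# Only names and contents are compared, not ownership or permissions.
verify_backup() {
    src=$1
    copy=$2
    if diff -r -q "$src" "$copy" > /dev/null; then
        echo "backup matches"
    else
        echo "backup differs"
        return 1
    fi
}
```

With the rsync command above (no trailing slash on /home), the copy ends up in /your/backup/home, so the check would be `verify_backup /home /your/backup/home`. Run it while logged out, like the rest of the procedure, so that files do not change in between.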

Then, log out and format the partition as root:

# mount | grep home
/dev/mapper/system-home on /home type ext4 (rw,noatime,data=ordered)
# umount /home
# mkfs.btrfs /dev/mapper/system-home

Change the file system and its options in /etc/fstab. For example:

/dev/system/home     /home     ext4     defaults,noatime     1 1

should become (also note the change on the last digit):

/dev/system/home   /home    btrfs  defaults,noatime,ssd,space_cache,compress=lzo    1 0
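The last digit is the sixth fstab field, the fsck pass number: 0 means no check at boot, which is the usual choice for Btrfs because it does not rely on a boot-time fsck. To double-check the edit before re-mounting, a small sketch (the function name `fstab_entry` is made up for the example):

```shell
#!/bin/sh
# Sketch: print the file system type and the fsck pass number (6th field)
# declared for a mount point in an fstab-style file.
fstab_entry() {
    mount_point=$1
    fstab_file=${2:-/etc/fstab}
    awk -v mp="$mount_point" '
        $1 !~ /^#/ && $2 == mp { print $3, $6 }   # skip comment lines
    ' "$fstab_file"
}
```

With the line above in place, `fstab_entry /home` should print `btrfs 0`.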

Re-mount /home and you are done!

Snapper

My main reason for testing Btrfs was the snapshot feature, in the hope of keeping a version history of each file and avoiding accidental deletions and changes.

Of course, one could use the Btrfs commands and implement snapshots manually. But why reinvent the wheel?

The guys behind snapper already made a service especially for that. It is basically a wrapper over Btrfs that makes automatic snapshots in the background, based on your frequency settings, and eases their handling.

Once installed, it can be enabled with the following command:

# snapper -c home create-config /home

It has the effect of creating a configuration file, where you can adjust the number of snapshots you want to keep per day, week, month, etc. Of course, don’t keep too much data as it will waste free space, especially if you happen to move large amounts of data. Hourly and daily snapshots are OK, as they would be cleaned up quickly. But monthly or yearly snapshots would consume a lot of space and would be pretty useless for a /home.

Here is what I used, without consuming much more than 10 GB:

# subvolume to snapshot
SUBVOLUME="/home"

# filesystem type
FSTYPE="btrfs"

# users and groups allowed to work with config
ALLOW_USERS=""
ALLOW_GROUPS=""

# sync users and groups from ALLOW_USERS and ALLOW_GROUPS to .snapshots
# directory
SYNC_ACL="no"

# start comparing pre- and post-snapshot in background after creating
# post-snapshot
BACKGROUND_COMPARISON="yes"

# run daily number cleanup
NUMBER_CLEANUP="yes"

# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="10"
NUMBER_LIMIT_IMPORTANT="5"

# create hourly snapshots
TIMELINE_CREATE="yes"

# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"

# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="10"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="2"
TIMELINE_LIMIT_MONTHLY="0"
TIMELINE_LIMIT_YEARLY="0"

# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"

# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"

Now, let’s play a little. In the following sequence, we create a file containing “Hello World!”, we then create a manual snapshot, change the file and display the differences:

# vim test.txt
# snapper -c home create --description "before test"
# vim test.txt
# sudo snapper -c home list
Type   | # | Pre # | Date                     | User | Cleanup  | Description  | Userdata
-------+---+-------+--------------------------+------+----------+--------------+---------
single | 0 |       |                          | root |          | current      | 
single | 1 |       | Sun Mar 13 19:44:21 2016 | root |          | before test  | 
single | 2 |       | Sun Mar 13 19:45:12 2016 | root |          | created test | 
single | 3 |       | Sun Mar 13 19:52:39 2016 | root |          | update test  | 
single | 4 |       | Sun Mar 13 20:00:01 2016 | root | timeline | timeline     | 
single | 5 |       | Sun Mar 13 21:00:01 2016 | root | timeline | timeline     | 
single | 6 |       | Sun Mar 13 22:00:01 2016 | root | timeline | timeline     | 
# snapper -c home diff 1..0
--- "/home/.snapshots/2/snapshot/phocean/test.txt" 2016-03-13 19:44:53.370641373 +0100
+++ "/home/phocean/test.txt" 2016-03-13 19:45:27.226586459 +0100
@@ -1 +1,2 @@
Hell World!
+Good bye.
@@ -0,0 +1,2 @@
+Hell World!
+Good bye

Neat, isn’t it? Now, what if we decide to restore the file to this snapshot:

snapper -c home undochange 1..0 /home/phocean/test.txt

That’s it!

Note that all these operations can be done against the entire partition (no argument needed), a folder or a file.

Pros

Regarding regular files, I had no issue at all. After a week of intensive use, I already had the occasion to enjoy the benefits of having snapshots and being able to restore a file.

On the performance side, even though I haven’t done any benchmark, it is at least as fast as ext4. It is said that under some conditions, compression can give a big boost to read rates.

On the compression side, on my partition of 400 GB, it allowed me to reclaim around 20 GB of space. Of course, the gain you can expect is totally related to the sorts of files you have (you won’t gain much on files that are already compressed or encrypted).

Cons

As warned on the official wiki itself, you should not use Btrfs as-is with database or virtualization solutions.

To quote the official wiki:

Files with a lot of random writes can become heavily fragmented (10000+ extents) causing trashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or large amount a RAM.

Indeed, I quickly experienced some issues with Virtualbox. Under heavy I/O, with several machines running at the same time, I had the guest file systems corrupted more than once, and so badly that the guest machine was unrecoverable (even with snapshots). Sometimes I got plenty of ext4 errors, and sometimes it just froze while copying a bunch of files or doing an apt-get upgrade...

The workarounds did not work for me:

  1. I did not even test disabling CoW for the whole partition. It kills one of the main advantages of using Btrfs.
  2. I tried disabling CoW for the whole VM folder. While the corruption frequency decreased, it still occurred after a while.
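For reference, the second workaround is usually done with the No_COW file attribute. The sketch below is a generic illustration, not the exact commands I ran, and the function name `make_nocow_dir` is made up. Keep in mind that only files created in the directory afterwards inherit the attribute, so existing disk images have to be copied into it again.

```shell
#!/bin/sh
# Sketch: create a directory and try to set the No_COW attribute (+C) on it.
# On Btrfs, files created in it afterwards inherit the attribute; files that
# already contain data are not converted.
make_nocow_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    if chattr +C "$dir" 2>/dev/null; then
        echo "No_COW set on $dir"
    else
        # typically happens on file systems other than Btrfs
        echo "could not set No_COW on $dir"
    fi
}
```

You can check the result with `lsattr -d` on the directory, which should list a C flag.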

So, I would simply advise against putting any virtual machine on a Btrfs partition until this issue gets sorted out for good. I use virtual machines intensively at work and need them to be reliable.

Conclusion

Btrfs is awesome and pretty stable at this time, unless you need to host virtual machines. You could still have a dedicated ext4 partition for your VMs, and enjoy Btrfs for the rest of your home.

To be honest, I did not bother (not wanting to manage several partitions), and switched back to ext4 for all, in the expectation of better days. I am not sure if this should be addressed on the Btrfs, or the Virtualbox side (or both).


Misc rants on Linux desktop, Mac OS and Antivirus

Linux desktop is in bad shape…

The culprits? Unity and Gnome 3. I am not talking about KDE, as I never felt good with it. I had tried KDE 4 and it did not change my opinion, not to mention that I suffered from several bugs.

Unity? Like many people, I just don’t get it. It is pretty clumsy and feels unfinished. I also suffered from a lot of performance issues like this one, which are never fixed and make it a pain to use daily.
Gnome 3? Actually, I liked it. It looks nice, and is pretty fast and smooth. What I like the most is the workflow. It really makes the use of workspaces logical and efficient. But… it did not work for me! Instability, again and again.
You will tell me that I should have stayed with Gnome 2 or gone to XFCE / Openbox / etc. I have used all of them. They have qualities, sure, but we are in 2012 and I want something with more features.

Conclusion: it is sad that after so many years, Linux is not yet ready for the desktop, because some guys decided to break everything again instead of doing incremental enhancements. Why so suddenly break things that work? I don’t get it. I felt really frustrated, with the feeling that I was at the same point as 5 years ago, dealing with the same kind of bugs. I have long been a Linux advocate and I believed I was right a few years back when I told people it was promising and superior to the competition (Windows XP at the time). Now years have passed, and I started to feel I was lying, or hiding the truth, which is that the Linux desktop failed and went nowhere.
Yes, I just got tired of fighting with the computer to get basic things done. And considering the Linus post and several reactions in the comments, I am not alone in this case.

… so I gave a try to Apple…

I recently got a MacBook Pro. The main reason is that I wanted a very stable workstation to focus on my work. It was hard to admit after so many years of using it, but I came to the conclusion that a Linux desktop could not meet this requirement anymore.

So I am going to be with Mac OS Lion for a while (though I am certainly not closing the door to the Linux desktop forever). I have to say that it is a nice OS and it is damned stable. It is good to have something that works out of the box, without any frustration or need to customize things to have something suitable.

And what about the stability of Mac OS? It is full of eye candy, but is it stable?

At first, I actually had some serious trouble. It was freezing almost every day, forcing a cold reboot. I was starting to seriously doubt the stability of Mac OS when I found by chance that the freeze occurred every time Sophos Antivirus started an update…

Antivirus and Mac OS…

Wait, what? Antivirus? On Mac OS? I know that will be the reaction of many Mac users. I also think it is useless, but for a different reason than most of them.
Of course, I don’t buy the “Mac OS is secure” marketing. Actually, it has the least secure kernel around, even though it benefits from a robust Unix architecture.
No, my point is that antivirus products all fail anyway. In forensic analysis, we cannot even trust an antivirus scan to decide whether a machine is clean or not. Instead, we have to use specific tools and memory acquisition to make sure.
The simple reason is that malware can always work around signature-based detection. There are hundreds of ways to do so: changing binary headers, code obfuscation, encryption, hooking (see rootkits and bootkits).
Ok, antivirus vendors claim that they also offer behavioral detection, sandboxes, etc. Yes, that’s a good move, but they can’t check all of the system activity, and again there are many ways to bypass these checks. So why bother?
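
To make the point about signature-based detection concrete, here is a minimal Python sketch. It is a toy, not how any real antivirus engine works: the "signature" is simply a SHA-256 hash of the whole file, and the sample bytes, `SIGNATURES`, `is_flagged` and `xor_encode` are all names made up for this illustration. Real engines use smarter byte patterns, but the weakness is the same in spirit.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad files.
KNOWN_BAD_SAMPLE = b"MZ\x90\x00 pretend this is a malicious payload"
SIGNATURES = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def is_flagged(data: bytes) -> bool:
    """Return True if the exact content matches a known signature."""
    return hashlib.sha256(data).hexdigest() in SIGNATURES


def xor_encode(data: bytes, key: int) -> bytes:
    """Trivial 'obfuscation': XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


original = KNOWN_BAD_SAMPLE
obfuscated = xor_encode(original, 0x5A)

print(is_flagged(original))                      # the untouched sample matches
print(is_flagged(obfuscated))                    # the encoded copy does not
print(xor_encode(obfuscated, 0x5A) == original)  # yet it decodes back intact
```

The encoded copy carries exactly the same payload (a tiny decoding stub would restore it at run time), but the scanner no longer recognizes it.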

I mean, I still think it matters to have an antivirus on Windows, especially for people who are not too techy. At least it will detect the most basic threats and raise alarms. There are thousands of such threats on Windows, and on this point an antivirus offers a simple way to defeat them (though awareness and education are certainly more important).

But on Mac OS, and on Linux as well, there are very few threats. Once again, it is not that they are that much more secure, but at the time of writing, it is a fact.

So to summarize:

  • very few threats on Mac OS and Linux
  • antivirus products still rely heavily on signature-based detection

You see: if there is not much to detect, an antivirus is just overhead. It will only eat some resources and fail anyway against upcoming threats.
Just keeping the system up-to-date is certainly the best thing to do so far.

Well, so why did I install an antivirus? I was actually using it for my forensic analysis of Windows machines. It gave me a local scanner that I could run on dumped suspicious processes without having to connect to Viruscan. That was convenient when I was traveling without a connection, but I can live without it.

About Sophos for Mac OS

On top of that, this piece of software was crashing my laptop. The update part seems to run with root privileges, and for some reason it locks up the system (not only mine, look at the forums). Not to mention that such a component may give malicious code more room to exploit the kernel…

A shame, a pure piece of crap. Now that I removed it, I am enjoying an uptime of about 30 days!

Conclusion

Sophos Antivirus for Mac OS is pure crap: hurry up and remove it if it happens to be on your computer.

Anyway, you don’t need an antivirus on Mac OS. Moreover, it seems that several vendors offer solutions that lack maturity and testing on this platform. So you would actually degrade your system’s stability and security if you installed one of these.

And Mac OS is a nice Unix-based desktop alternative to get work done, even though sadly it is not open-source.

openSUSE and Windows samba share

By default, access from openSUSE, or any Linux box, to a Windows Vista CIFS share is broken.
The cause is that Samba speaks NTLM while Vista speaks only NTLMv2.

Googling provided the solution, but it does not seem to be well known yet.

A workaround is to make Vista more flexible about the authentication it accepts from clients.

If you are running the Ultimate or Business version of Vista :

  • Run secpol.msc
  • Go to Local Policies / Security Options
  • Find Network Security : LAN Manager authentication level
  • Change the setting from Send NTLMv2 response only to Send LM & NTLM – use NTLMv2 session security if negotiated

If you are running the Home version, you will have to edit the registry manually :

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\LMCompatibilityLevel

If it doesn’t already exist, create a DWORD value named LmCompatibilityLevel and set its value to 1.
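
If you would rather not click through regedit, the same change can be packaged as a .reg file and imported. Below is a small Python sketch that only builds the text of such a file. The function name `build_reg_file` is made up for this example, and writing the file to disk and importing it on the Vista machine is left to you. I believe regedit accepts a plain-text file in this layout, but treat that as an assumption and back up the registry first.

```python
# Registry key from the article; LmCompatibilityLevel is a value inside it.
LSA_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa"


def build_reg_file(level: int = 1) -> str:
    """Return the text of a .reg file that sets LmCompatibilityLevel."""
    # Windows defines levels 0 to 5 for this setting.
    if not 0 <= level <= 5:
        raise ValueError("LmCompatibilityLevel must be between 0 and 5")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{LSA_KEY}]",
        # DWORD values are written as 8 hexadecimal digits.
        f'"LmCompatibilityLevel"=dword:{level:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_reg_file(1)
print(reg_text)
```

Level 1 corresponds to the "Send LM & NTLM – use NTLMv2 session security if negotiated" policy mentioned above.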

Or, on the Linux side, in the [global] section of smb.conf :
client ntlmv2 auth = yes
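
To check quickly whether a given smb.conf already carries this setting, something like the following Python sketch can help. It is a deliberately naive reader, not Samba's own parser: it ignores includes and line continuations, the sample configuration is invented, and `client_ntlmv2_auth` is a name chosen for this example.

```python
from typing import Optional

SAMPLE_SMB_CONF = """\
# hypothetical smb.conf used only for this example
[global]
    workgroup = WORKGROUP
    client ntlmv2 auth = yes

[share]
    path = /srv/share
"""


def client_ntlmv2_auth(conf_text: str) -> Optional[bool]:
    """Return the [global] 'client ntlmv2 auth' setting, or None if unset."""
    section = None
    result = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "global" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Normalize case and inner whitespace before comparing the key.
        if " ".join(key.lower().split()) == "client ntlmv2 auth":
            result = value.strip().lower() in ("yes", "true", "1")
    return result


print(client_ntlmv2_auth(SAMPLE_SMB_CONF))
```

A result of None means the option is not set explicitly, in which case Samba falls back to its built-in default, which depends on the version you run.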

More resources there.