One more rant against the Linux Intel graphics driver

Some quick notes that may help other Linux users running into similar issues.

I am, like many, the unfortunate user of a laptop with Intel graphics (a Thinkpad T460, to be precise). Why unfortunate? Because the graphics driver provided by Intel sucks.

i915, as it is called, has really been sucking for years, and it is known for that (just google it if you don’t believe me).

For the sake of completeness, here is the exact model with which I experienced some issues:

%  lspci
00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
...

For performance, remove the X11 Intel driver

First, the Intel X11 module generally under-performs, so I simply removed it so that X11 falls back to the generic modesetting driver. These are the instructions for Fedora (24), but you can do something similar on virtually any distribution (as root):

% dnf remove xorg-x11-drv-intel

Do not worry, it just removes the X11 part of the driver, not the kernel driver itself.

Log out, log back in, job done: you should see less lag with desktop environments like gnome-shell.
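To check which driver X11 actually picked up after logging back in, the Xorg log can be inspected. This is a minimal sketch, not part of my original steps: the function name detect_xorg_driver is made up for illustration, the two log locations are the usual ones but may differ on your system, and the "modeset(" and "intel(" prefixes are assumed to be how the two drivers tag their log lines.

```shell
#!/bin/sh
# Guess which X11 video driver is active from an Xorg log file.
# Assumption: the modesetting driver logs lines like "(II) modeset(0): ..."
# and the Intel driver logs lines like "(II) intel(0): ...".
detect_xorg_driver() {
    if grep -qF '(II) modeset(' "$1"; then
        echo modesetting
    elif grep -qF '(II) intel(' "$1"; then
        echo intel
    else
        echo unknown
    fi
}

# Usual log locations (assumption): per-user for rootless Xorg, /var/log otherwise.
for log in "$HOME/.local/share/xorg/Xorg.0.log" /var/log/Xorg.0.log; do
    if [ -r "$log" ]; then
        echo "$log: $(detect_xorg_driver "$log")"
    fi
done
```

If it prints "modesetting", the removal worked as intended. If neither log file exists, your display manager may be sending the Xorg output to the journal instead.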

For stability, disable RC6

I experienced frequent, daily freezes of my work session. The display would totally hang or display a blank screen, forcing me to cold reboot the computer.

Here is an extract of the kernel log leading to the crash (it is a bit lengthy, but that may help people find this post):

oct. 06 11:00:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:00:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:01:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:01:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:01:43 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:02:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:02:22 localhost.localdomain kernel: [drm] RC6 on
oct. 06 11:04:08 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=6364 end=6365) time 287 us, min 954, max 959, scanline start 950, end 967
oct. 06 11:13:58 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=41777 end=41778) time 340 us, min 954, max 959, scanline start 946, end 967
oct. 06 11:20:18 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=64583 end=64584) time 284 us, min 954, max 959, scanline start 946, end 964
oct. 06 11:20:33 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=65517 end=65518) time 284 us, min 1073, max 1079, scanline start 1071, end 1091
oct. 06 11:28:27 localhost.localdomain kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
oct. 06 11:31:53 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=106339 end=106340) time 285 us, min 1073, max 1079, scanline start 1066, end 1086
oct. 06 11:33:58 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=113803 end=113804) time 287 us, min 954, max 959, scanline start 948, end 966
oct. 06 11:35:13 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=118345 end=118346) time 285 us, min 1073, max 1079, scanline start 1062, end 1081
oct. 06 11:52:59 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=182278 end=182279) time 282 us, min 1073, max 1079, scanline start 1064, end 1084
oct. 06 12:01:29 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=212893 end=212894) time 284 us, min 1073, max 1079, scanline start 1068, end 1088
oct. 06 12:02:44 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=217395 end=217396) time 282 us, min 1073, max 1079, scanline start 1068, end 1088
oct. 06 12:02:49 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=217642 end=217643) time 247 us, min 954, max 959, scanline start 949, end 964
oct. 06 12:03:54 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=221597 end=221598) time 281 us, min 1073, max 1079, scanline start 1067, end 1086
oct. 06 12:05:49 localhost.localdomain kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=228446 end=228447) time 290 us, min 954, max 959, scanline start 948, end 966
oct. 06 12:17:32 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:18:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:18:23 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:18:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:19:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:19:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:19:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:01 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:25 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:44 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:20:52 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:20:52 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [2322], reason: Engine(s) hung, action: reset
oct. 06 12:20:52 localhost.localdomain kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
oct. 06 12:20:52 localhost.localdomain kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
oct. 06 12:20:52 localhost.localdomain kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
oct. 06 12:20:52 localhost.localdomain kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
oct. 06 12:20:52 localhost.localdomain kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
oct. 06 12:20:52 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:20:54 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:05 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:05 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [2532], reason: Engine(s) hung, action: reset
oct. 06 12:21:05 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:07 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:15 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:15 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [2532], reason: Engine(s) hung, action: reset
oct. 06 12:21:15 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:17 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:27 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:27 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [2322], reason: Engine(s) hung, action: reset
oct. 06 12:21:27 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:29 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:37 localhost.localdomain kernel: [drm] stuck on render ring
oct. 06 12:21:37 localhost.localdomain kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [2322], reason: Engine(s) hung, action: reset
oct. 06 12:21:37 localhost.localdomain kernel: drm/i915: Resetting chip after gpu hang
oct. 06 12:21:39 localhost.localdomain kernel: [drm] RC6 on
oct. 06 12:21:43 localhost.localdomain kernel: ------------[ cut here ]------------
oct. 06 12:21:43 localhost.localdomain kernel: WARNING: CPU: 0 PID: 1109 at drivers/gpu/drm/i915/intel_display.c:13533 intel_atomic_commit+0x13b8/0x1470 [i915]
oct. 06 12:21:43 localhost.localdomain kernel: pipe A vblank wait timed out
oct. 06 12:21:43 localhost.localdomain kernel: Modules linked in: tun nfnetlink_queue nfnetlink_log uas usb_storage xt_nat veth rfcomm ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack dm_thin_pool dm_persistent_data dm_bio_prison loop ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security ebtable_filter ebtables ip6table_filter ip6_tables vmnet(O) ppdev parport_pc parport vboxpci(O) vboxnetadp(O) vboxnetflt(O) fuse vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) vboxdrv(O) cmac bnep cpufreq_stats vfat fat arc4 iTCO_wdt snd_soc_skl iTCO_vendor_support
oct. 06 12:21:43 localhost.localdomain kernel:  snd_soc_skl_ipc snd_hda_codec_hdmi snd_soc_sst_ipc intel_rapl snd_soc_sst_dsp x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_ext_core intel_powerclamp snd_hda_codec_generic coretemp snd_soc_sst_match kvm_intel snd_soc_core kvm snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_hda_codec iwlmvm snd_hda_core mac80211 irqbypass intel_cstate intel_rapl_perf snd_hwdep snd_seq snd_seq_device btusb snd_pcm btrtl uvcvideo btbcm btintel videobuf2_vmalloc videobuf2_memops joydev bluetooth videobuf2_v4l2 iwlwifi i2c_i801 snd_timer videobuf2_core cfg80211 rtsx_pci_ms videodev memstick media mei_me mei shpchp thinkpad_acpi intel_pch_thermal snd soundcore rfkill wmi tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c dm_crypt hid_logitech_hidpp hid_logitech_dj 8021q garp
oct. 06 12:21:43 localhost.localdomain kernel:  stp llc mrp i915 rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel e1000e i2c_algo_bit drm_kms_helper ghash_clmulni_intel drm serio_raw ptp pps_core rtsx_pci video fjes
oct. 06 12:21:43 localhost.localdomain kernel: CPU: 0 PID: 1109 Comm: systemd-logind Tainted: G     U  W  O    4.7.5-200.fc24.x86_64 #1
oct. 06 12:21:43 localhost.localdomain kernel: Hardware name: LENOVO 20FNCTO1WW/20FNCTO1WW, BIOS R06ET42W (1.16 ) 09/20/2016
oct. 06 12:21:43 localhost.localdomain kernel:  0000000000000286 0000000018e0c148 ffff8800d283b850 ffffffffb63daaaf
oct. 06 12:21:43 localhost.localdomain kernel:  ffff8800d283b8a0 0000000000000000 ffff8800d283b890 ffffffffb60a0b0b
oct. 06 12:21:43 localhost.localdomain kernel:  000034dd00000000 ffff88040f607000 0000000000000000 0000000000000000
oct. 06 12:21:43 localhost.localdomain kernel: Call Trace:
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb63daaaf>] dump_stack+0x63/0x84
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60a0b0b>] __warn+0xcb/0xf0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60a0b8f>] warn_slowpath_fmt+0x5f/0x80
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60e4483>] ? finish_wait+0x53/0x70
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc05046a8>] intel_atomic_commit+0x13b8/0x1470 [i915]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60e46e0>] ? prepare_to_wait_event+0xf0/0xf0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc0380ba7>] drm_atomic_commit+0x37/0x60 [drm]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc03e21e8>] restore_fbdev_mode+0x238/0x260 [drm_kms_helper]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc03e45d4>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc03e464d>] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffc051ea4a>] intel_fbdev_set_par+0x1a/0x60 [i915]
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb645a6b6>] fb_set_var+0x236/0x460
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60e4004>] ? __wake_up+0x44/0x50
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb67ea562>] ? down_write+0x12/0x40
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64caabb>] ? tty_unthrottle+0x3b/0x60
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb645074f>] fbcon_blank+0x30f/0x350
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64db0b2>] do_unblank_screen+0xd2/0x1a0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64d0ef6>] vt_ioctl+0x4f6/0x1270
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb64c537a>] tty_ioctl+0x35a/0xc50
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb625f909>] ? dput+0xd9/0x260
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb625b4b2>] do_vfs_ioctl+0xa2/0x5d0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb60be9b8>] ? task_work_run+0x88/0xb0
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb625ba59>] SyS_ioctl+0x79/0x90
oct. 06 12:21:43 localhost.localdomain kernel:  [<ffffffffb67ec572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
oct. 06 12:21:43 localhost.localdomain kernel: ---[ end trace 9f62268cfd97b6cb ]---

As seen above, the hang would always happen after a while, when the graphics chip goes into the RC6 power-saving mode.
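If you want to know whether your own freezes match this pattern, you can look for the same messages in the kernel log of the previous boot. A minimal sketch, assuming a systemd journal that persists across reboots; the helper name count_gpu_hangs is made up for illustration:

```shell
#!/bin/sh
# Count "GPU HANG" messages in kernel log text read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_gpu_hangs() {
    grep -c 'GPU HANG' || true
}

# Typical usage (reading the journal may require root):
#   journalctl -k -b -1 | count_gpu_hangs
```

A non-zero count right before a forced reboot suggests you are hitting the same kind of hang.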

After searching various forums and wikis, I applied the suggested fix of completely disabling RC6. Add i915.enable_rc6=0 to the kernel options (the GRUB_CMDLINE_LINUX line) in your grub configuration file:

%  cat /etc/default/grub 
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora/root rd.luks.uuid=luks-a9d14a0e-6c22-4976-919a-d216bd69d563 rd.lvm.lv=fedora/swap resume=/dev/dm-2 quiet splash i915.enable_rc6=0"
GRUB_DISABLE_RECOVERY="true"

Then regenerate the grub configuration (as root). If you are on a UEFI system:

grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Or, for legacy BIOS:

grub2-mkconfig -o /boot/grub2/grub.cfg
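If you are not sure which of the two cases applies, the presence of the /sys/firmware/efi directory tells you whether the system was booted through UEFI. A small sketch built on that check, using the two Fedora paths from above; the function name grub_cfg_path and its optional argument are only there for illustration:

```shell
#!/bin/sh
# Print the grub.cfg path matching the current boot mode.
# The kernel creates /sys/firmware/efi only when booted through UEFI.
# The optional argument overrides that directory (useful for testing).
grub_cfg_path() {
    efi_dir=${1:-/sys/firmware/efi}
    if [ -d "$efi_dir" ]; then
        echo /boot/efi/EFI/fedora/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}

# Only prints the command; run it yourself as root once it looks right.
echo "grub2-mkconfig -o $(grub_cfg_path)"
```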

Finally, reboot and you are done.
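After the reboot, it is worth confirming that the option really reached the kernel. A minimal sketch: the helper name rc6_disabled_on_cmdline is made up for illustration, and the module parameter file is assumed to exist, which holds for kernels that still have the enable_rc6 parameter (like the 4.7 kernel in the trace above) but not necessarily for newer ones.

```shell
#!/bin/sh
# Succeed if the given kernel command line (one string) contains i915.enable_rc6=0.
rc6_disabled_on_cmdline() {
    case " $1 " in
        *" i915.enable_rc6=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /proc/cmdline ] && rc6_disabled_on_cmdline "$(cat /proc/cmdline)"; then
    echo "RC6 is disabled on the kernel command line"
else
    echo "i915.enable_rc6=0 not found on the kernel command line"
fi

# The i915 module also exposes the value it is using, when the parameter exists.
param=/sys/module/i915/parameters/enable_rc6
if [ -r "$param" ]; then
    echo "enable_rc6 = $(cat "$param")"
fi
```

If the option does not show up, the grub configuration was probably written to the wrong file for your boot mode.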

There is a caveat, however: it will probably cause some battery drain. With PowerTOP, I measured a consumption increase of around 5 W (from 8 W to 13 W), which caused my battery life to drop from approximately 10h to 5h30.

Still enough, and an acceptable price to pay to work reliably without risking a complete system hang.

But if I had to buy a computer for myself, I would make sure that it has an NVIDIA card. Yeah, I know that their proprietary blob has its caveats too, but from what I have heard it is probably more stable.

Graphics drivers have always been a problem for “Linux on the desktop”.

References

  • https://wiki.archlinux.org/index.php/intel_graphics
  • https://wiki.gentoo.org/wiki/Intel