| .. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) |
| .. [see the bottom of this file for redistribution information] |
| |
| Reporting regressions |
| +++++++++++++++++++++ |
| |
| "*We don't cause regressions*" is the first rule of Linux kernel development; |
| Linux founder and lead developer Linus Torvalds established it himself and |
| ensures it's obeyed. |
| |
| This document describes what the rule means for users and how the Linux kernel's |
| development model ensures that all reported regressions are addressed; aspects |
| relevant for kernel developers are left to |
| Documentation/process/handling-regressions.rst. |
| |
| |
| The important bits (aka "TL;DR") |
| ================================ |
| |
| #. It's a regression if something running fine with one Linux kernel works worse |
| or not at all with a newer version. Note, the newer kernel has to be compiled |
| using a similar configuration; the detailed explanations below describe this |
| and other fine print. |
| |
| #. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst; |
| it already covers all aspects important for regressions, which are repeated |
| below for convenience. Two of them are important: start your report's subject |
| with "[REGRESSION]" and CC or forward it to `the regression mailing list |
| <https://lore.kernel.org/regressions/>`_ (regressions@lists.linux.dev). |
| |
| #. Optional, but recommended: when sending or forwarding your report, make the |
| Linux kernel regression tracking bot "regzbot" track the issue by specifying |
| when the regression started like this:: |
| |
| #regzbot introduced: v5.13..v5.14-rc1 |
| |
| |
| All the details on Linux kernel regressions relevant for users |
| ============================================================== |
| |
| |
| The important basics |
| -------------------- |
| |
| |
| What is a "regression" and what is the "no regressions" rule? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| It's a regression if some application or practical use case running fine with |
| one Linux kernel works worse or not at all with a newer version compiled using a |
| similar configuration. The "no regressions" rule forbids this from happening; if |
| it happens by accident, developers that caused it are expected to quickly fix |
| the issue. |
| |
| It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with |
| 5.14 doesn't work at all, works significantly slower, or misbehaves somehow. |
| It's also a regression if a perfectly working application suddenly shows erratic |
| behavior with a newer kernel version; such issues can be caused by changes in |
| procfs, sysfs, or one of the many other interfaces Linux provides to userland |
| software. But keep in mind, as mentioned earlier: 5.14 in this example needs to |
| be built from a configuration similar to the one from 5.13. This can be achieved |
| using ``make olddefconfig``, as explained in more detail below. |
| |
| Note the "practical use case" in the first sentence of this section: developers |
| despite the "no regressions" rule are free to change any aspect of the kernel |
| and even APIs or ABIs to userland, as long as no existing application or use |
| case breaks. |
| |
| Also be aware the "no regressions" rule covers only interfaces the kernel |
| provides to the userland. It thus does not apply to kernel-internal interfaces |
| like the module API, which some externally developed drivers use to hook into |
| the kernel. |
| |
| How do I report a regression? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Just report the issue as outlined in |
| Documentation/admin-guide/reporting-issues.rst; it already describes the |
| important points. The following aspects outlined there are especially relevant |
| for regressions: |
| |
| * When checking for existing reports to join, also search the `archives of the |
| Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and |
| `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_. |
| |
| * Start your report's subject with "[REGRESSION]". |
| |
| * In your report, clearly mention the last kernel version that worked fine and |
| the first broken one. Ideally try to find the exact change causing the |
| regression using a bisection, as explained below in more detail. |
| |
| * Remember to let the Linux regressions mailing list |
| (regressions@lists.linux.dev) know about your report: |
| |
| * If you report the regression by mail, CC the regressions list. |
| |
| * If you report your regression to some bug tracker, forward the submitted |
| report by mail to the regressions list while CCing the maintainer and the |
| mailing list for the subsystem in question. |
| |
| If it's a regression within a stable or longterm series (e.g. |
| v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list |
| <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org). |
| |
| In case you performed a successful bisection, add to the CC everyone the |
| culprit's commit message mentions in lines starting with "Signed-off-by:". |
| |
| When CCing or forwarding your report to the list, consider directly telling the |
| aforementioned Linux kernel regression tracking bot about your report. To do |
| that, include a paragraph like this in your mail:: |
| |
| #regzbot introduced: v5.13..v5.14-rc1 |
| |
| Regzbot will then consider your mail a report for a regression introduced in the |
| specified version range. In the case above, Linux v5.13 still worked fine and Linux |
| v5.14-rc1 was the first version where you encountered the issue. If you |
| performed a bisection to find the commit that caused the regression, specify the |
| culprit's commit-id instead:: |
| |
| #regzbot introduced: 1f2e3d4c5d |
| |
| Placing such a "regzbot command" is in your interest, as it will ensure the |
| report won't fall through the cracks unnoticed. If you omit this, the Linux |
| kernel's regressions tracker will take care of telling regzbot about your |
| regression, as long as you send a copy to the regressions mailing list. But the |
| regression tracker is just one human who sometimes has to rest or occasionally |
| might even enjoy some time away from computers (as crazy as that might sound). |
| Relying on this person thus will result in an unnecessary delay before the |
| regression is mentioned `on the list of tracked and unresolved Linux |
| kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the |
| weekly regression reports sent by regzbot. Such delays can result in Linus |
| Torvalds being unaware of important regressions when deciding between "continue |
| development or call this finished and release the final?". |
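| |
| The reporting steps above can be summarized in a small sketch. It is only an |
| illustration: the function name ``build_report_header`` and its parameters are |
| hypothetical and not part of any kernel tool, and the maintainer address in the |
| usage note is a placeholder. It assembles the subject prefix, the CC list, and |
| the regzbot paragraph described in this section. |
| |

```python
REGRESSIONS_LIST = "regressions@lists.linux.dev"
STABLE_LIST = "stable@vger.kernel.org"


def build_report_header(summary, introduced, cc=(), stable_series=False):
    """Return (subject, cc_list, regzbot_paragraph) for a regression report.

    `introduced` is either a version range such as "v5.13..v5.14-rc1" or the
    commit-id of the culprit found by a bisection. All names here are
    hypothetical helpers for illustration only.
    """
    # The subject has to start with "[REGRESSION]".
    if summary.startswith("[REGRESSION]"):
        subject = summary
    else:
        subject = f"[REGRESSION] {summary}"

    # The regressions list always gets a copy; the stable list only for
    # regressions within a stable or longterm series.
    cc_list = list(cc)
    extra = [REGRESSIONS_LIST] + ([STABLE_LIST] if stable_series else [])
    for address in extra:
        if address not in cc_list:
            cc_list.append(address)

    # The regzbot command belongs in a paragraph of its own in the mail body.
    return subject, cc_list, f"#regzbot introduced: {introduced}"
```

| |
| Calling ``build_report_header("WiFi stops working", "v5.13..v5.14-rc1", |
| cc=["maintainer@example.org"])`` yields a subject starting with |
| "[REGRESSION]", a CC list that ends with the regressions list, and the |
| paragraph to paste into the mail. |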
| |
| Are really all regressions fixed? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Nearly all of them are, as long as the change causing the regression (the |
| "culprit commit") is reliably identified. Some regressions can be fixed without |
| this, but often it's required. |
| |
| Who needs to find the root cause of a regression? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Developers of the affected code area should try to locate the culprit on their |
| own. But for them that's often impossible to do with reasonable effort, as quite |
| a lot of issues only occur in a particular environment outside the developer's |
| reach -- for example, a specific hardware platform, firmware, Linux distro, |
| system's configuration, or application. That's why in the end it's often up to |
| the reporter to locate the culprit commit; sometimes users might even need to |
| run additional tests afterwards to pinpoint the exact root cause. Developers |
| should offer advice and reasonably help where they can, to make this process |
| relatively easy and achievable for typical users. |
| |
| How can I find the culprit? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Perform a bisection, as roughly outlined in |
| Documentation/admin-guide/reporting-issues.rst and described in more detail by |
| Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but |
| in many cases finds the culprit relatively quickly. If it's hard or |
| time-consuming to reliably reproduce the issue, consider teaming up with other |
| affected users to narrow down the search range together. |
| |
| Who can I ask for advice when it comes to regressions? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Send a mail to the regressions mailing list (regressions@lists.linux.dev) while |
| CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the |
| issue might better be dealt with in private, feel free to omit the list. |
| |
| |
| Additional details about regressions |
| ------------------------------------ |
| |
| |
| What is the goal of the "no regressions" rule? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Users should feel safe when updating kernel versions and not have to worry |
| something might break. This is in the interest of the kernel developers to make |
| updating attractive: they don't want users to stay on stable or longterm Linux |
| series that are either abandoned or more than one and a half years old. That's |
| in everybody's interest, as `those series might have known bugs, security |
| issues, or other problematic aspects already fixed in later versions |
| <http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_. |
| Additionally, the kernel developers want to make it simple and appealing for |
| users to test the latest pre-release or regular release. That's also in |
| everybody's interest, as it's a lot easier to track down and fix problems, if |
| they are reported shortly after being introduced. |
| |
| Is the "no regressions" rule really adhered in practice? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| It's taken really seriously, as can be seen by many mailing list posts from |
| Linux creator and lead developer Linus Torvalds, some of which are quoted in |
| Documentation/process/handling-regressions.rst. |
| |
| Exceptions to this rule are extremely rare; in the past developers almost always |
| turned out to be wrong when they assumed a particular situation was warranting |
| an exception. |
| |
| Who ensures the "no regressions" rule is actually followed? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The subsystem maintainers should take care of that; they are watched and |
| supported by the tree maintainers -- e.g. Linus Torvalds for mainline and |
| Greg Kroah-Hartman et al. for various stable/longterm series. |
| |
| All of them are helped by people trying to ensure no regression report falls |
| through the cracks. One of them is Thorsten Leemhuis, who's currently acting as |
| the Linux kernel's "regressions tracker"; to facilitate this work he relies on |
| regzbot, the Linux kernel regression tracking bot. That's why you want to bring |
| your report on the radar of these people by CCing or forwarding each report to |
| the regressions mailing list, ideally with a "regzbot command" in your mail to |
| get it tracked immediately. |
| |
| How quickly are regressions normally fixed? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Developers should fix any reported regression as quickly as possible, to provide |
| affected users with a solution in a timely manner and prevent more users from |
| running into the issue; nevertheless developers need to take enough time and |
| care to ensure regression fixes do not cause additional damage. |
| |
| The answer thus depends on various factors like the impact of a regression, its |
| age, or the Linux series in which it occurs. In the end though, most regressions |
| should be fixed within two weeks. |
| |
| Is it a regression, if the issue can be avoided by updating some software? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Almost always: yes. If a developer tells you otherwise, ask the regression |
| tracker for advice as outlined above. |
| |
| Is it a regression, if a newer kernel works slower or consumes more energy? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Yes, but the difference has to be significant. A five percent slow-down in a |
| micro-benchmark thus is unlikely to qualify as regression, unless it also |
| influences the results of a broad benchmark by more than one percent. If in |
| doubt, ask for advice. |
| |
| Is it a regression, if an external kernel module breaks when updating Linux? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| No, as the "no regression" rule is about interfaces and services the Linux |
| kernel provides to the userland. It thus does not cover building or running |
| externally developed kernel modules, as they run in kernel-space and hook into |
| the kernel using internal interfaces that are occasionally changed. |
| |
| How are regressions handled that are caused by security fixes? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| In extremely rare situations security issues can't be fixed without causing |
| regressions; in such cases the fixes take precedence, as they are the lesser |
| evil in the end. Luckily this dilemma can almost always be avoided, as key |
| developers for the |
| affected area and often Linus Torvalds himself try very hard to fix security |
| issues without causing regressions. |
| |
| If you nevertheless face such a case, check the mailing list archives to see |
| whether people tried their best to avoid the regression. If not, report it; if |
| in doubt, ask for advice as outlined above. |
| |
| What happens if fixing a regression is impossible without causing another? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Sadly these things happen, but luckily not very often; if they occur, expert |
| developers of the affected code area should look into the issue to find a fix |
| that avoids regressions or at least reduces their impact. If you run into such |
| a situation, do what was outlined already for regressions caused by security |
| fixes: check earlier discussions to see whether people already tried their best |
| and ask for advice if in doubt. |
| |
| A quick note while at it: these situations could be avoided, if people would |
| regularly give mainline pre-releases (say v5.15-rc1 or -rc3) from each |
| development cycle a test run. This is best explained by imagining a change |
| integrated between Linux v5.14 and v5.15-rc1 which causes a regression, but at |
| the same time is a hard requirement for some other improvement applied for |
| 5.15-rc1. All these changes often can simply be reverted and the regression thus |
| solved, if someone finds and reports it before 5.15 is released. A few days or |
| weeks later this solution can become impossible, as some software might have |
| started to rely on aspects introduced by one of the follow-up changes: reverting |
| all changes would then cause a regression for users of said software and thus is |
| out of the question. |
| |
| Is it a regression, if some feature I relied on was removed months ago? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| It is, but often it's hard to fix such regressions due to the aspects outlined |
| in the previous section. It hence needs to be dealt with on a case-by-case |
| basis. This is another reason why it's in everybody's interest to regularly test |
| mainline pre-releases. |
| |
| Does the "no regression" rule apply if I seem to be the only affected person? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| It does, but only for practical usage: the Linux developers want to be free to |
| remove support for hardware that can only be found in attics and museums these |
| days. |
| |
| Note, sometimes regressions can't be avoided to make progress -- and the latter |
| is needed to prevent Linux from stagnation. Hence, if only very few users seem |
| to be affected by a regression, it might for the greater good be in their and |
| everyone else's interest to let things pass. Especially if there is an |
| easy way to circumvent the regression somehow, for example by updating some |
| software or using a kernel parameter created just for this purpose. |
| |
| Does the regression rule apply for code in the staging tree as well? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Not according to the `help text for the configuration option covering all |
| staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_, |
| which since its early days states:: |
| |
| Please note that these drivers are under heavy development, may or |
| may not work, and may contain userspace interfaces that most likely |
| will be changed in the near future. |
| |
| The staging developers nevertheless often adhere to the "no regressions" rule, |
| but sometimes bend it to make progress. That's for example why some users had to |
| deal with (often negligible) regressions when a WiFi driver from the staging |
| tree was replaced by a totally different one written from scratch. |
| |
| Why do later versions have to be "compiled with a similar configuration"? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Because the Linux kernel developers sometimes integrate changes known to cause |
| regressions, but make them optional and disable them in the kernel's default |
| configuration. This trick allows progress, as the "no regressions" rule |
| otherwise would lead to stagnation. |
| |
| Consider for example a new security feature blocking access to some kernel |
| interfaces often abused by malware, which at the same time are required to run a |
| few rarely used applications. The outlined approach makes both camps happy: |
| people using these applications can leave the new security feature off, while |
| everyone else can enable it without running into trouble. |
| |
| How to create a configuration similar to the one of an older kernel? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Start your machine with a known-good kernel and configure the newer Linux |
| version with ``make olddefconfig``. This makes the kernel's build scripts pick |
| up the configuration file (the ".config" file) from the running kernel as base |
| for the new one you are about to compile; afterwards they set all new |
| configuration options to their default value, which should disable new features |
| that might cause regressions. |
| |
| Can I report a regression I found with pre-compiled vanilla kernels? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| You need to ensure the newer kernel was compiled with a configuration file |
| similar to that of the older one (see above), as those who built them might have |
| enabled |
| some known-to-be incompatible feature for the newer kernel. If in doubt, report |
| the matter to the kernel's provider and ask for advice. |
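| |
| One way to judge whether two kernels were built from similar configurations is |
| to compare their ".config" files. The following is a minimal sketch under |
| stated assumptions: the helper names ``parse_config`` and |
| ``config_differences`` are hypothetical, and an option missing from one file is |
| treated like one that is not set. It relies only on the two line forms found in |
| ".config" files, ``CONFIG_FOO=value`` and ``# CONFIG_FOO is not set``. |
| |

```python
import re

# The two line forms that carry option state in a kernel ".config" file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option name to its value; 'not set' options map to "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def config_differences(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ.

    Assumption of this sketch: an option absent from one file counts as "n",
    so a new option that is left disabled does not show up as a difference.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    differences = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            differences[name] = (before, after)
    return differences
```

| |
| Pass the contents of the older and the newer configuration file; any option |
| reported as switched from "n" to "y" or "m" is a candidate for a feature the |
| builder enabled only in the newer kernel. |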
| |
| |
| More about regression tracking with "regzbot" |
| --------------------------------------------- |
| |
| What is regression tracking and why should I care about it? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Rules like "no regressions" need someone to ensure they are followed, otherwise |
| they are broken either accidentally or on purpose. History has shown this to be |
| true for Linux kernel development as well. That's why Thorsten Leemhuis, the |
| Linux kernel's regression tracker, and some other people try to ensure all |
| regressions are fixed by keeping an eye on them until they are resolved. None of |
| them is paid for this, which is why the work is done on a best-effort basis. |
| |
| Why and how are Linux kernel regressions tracked using a bot? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Tracking regressions completely manually has proven to be quite hard due to the |
| distributed and loosely structured nature of the Linux kernel development process. |
| That's why the Linux kernel's regression tracker developed regzbot to facilitate |
| the work, with the long term goal to automate regression tracking as much as |
| possible for everyone involved. |
| |
| Regzbot works by watching for replies to reports of tracked regressions. |
| Additionally, it's looking out for posted or committed patches referencing such |
| reports with "Link:" tags; replies to such patch postings are tracked as well. |
| Combined, this data provides good insights into the current state of the fixing |
| process. |
| |
| How to see which regressions regzbot tracks currently? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_. |
| |
| What kind of issues are supposed to be tracked by regzbot? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The bot is meant to track regressions, hence please don't involve regzbot for |
| regular issues. But it's okay for the Linux kernel's regression tracker if you |
| involve regzbot to track severe issues, like reports about hangs, corrupted |
| data, or internal errors (Panic, Oops, BUG(), warning, ...). |
| |
| How to change aspects of a tracked regression? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| By using a 'regzbot command' in a direct or indirect reply to the mail with the |
| report. The easiest way to do that: find the report in your "Sent" folder or the |
| mailing list archive and reply to it using your mailer's "Reply-all" function. |
| In that mail, use one of the following commands in a stand-alone paragraph (IOW: |
| use blank lines to separate one or multiple of these commands from the rest of |
| the mail's text). |
| |
| * Update when the regression started to happen, for example after performing a |
| bisection:: |
| |
| #regzbot introduced: 1f2e3d4c5d |
| |
| * Set or update the title:: |
| |
| #regzbot title: foo |
| |
| * Monitor a discussion or bugzilla.kernel.org ticket where additional aspects of |
| the issue or a fix are discussed:: |
| |
| #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/ |
| #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 |
| |
| * Point to a place with further details of interest, like a mailing list post |
| or a ticket in a bug tracker that are slightly related, but about a different |
| topic:: |
| |
| #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 |
| |
| * Mark a regression as invalid:: |
| |
| #regzbot invalid: wasn't a regression, problem has always existed |
| |
| Regzbot supports a few other commands primarily used by developers or people |
| tracking regressions. They and more details about the aforementioned regzbot |
| commands can be found in the `getting started guide |
| <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and |
| the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_ |
| for regzbot. |
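| |
| The rule that regzbot commands belong in a stand-alone paragraph can be checked |
| before sending a mail. The sketch below is an assumption-laden illustration and |
| not regzbot's actual parser: the function name ``misplaced_regzbot_commands`` |
| is hypothetical, and paragraphs are simply taken to be runs of lines separated |
| by blank lines. |
| |

```python
import re


def misplaced_regzbot_commands(body):
    """Return '#regzbot' lines that share a paragraph with other text.

    This is a pre-send sanity check only; it does not reproduce how regzbot
    itself reads mails.
    """
    misplaced = []
    # Paragraphs are runs of lines separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", body.strip()):
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        commands = [line for line in lines if line.startswith("#regzbot ")]
        # A paragraph is fine if it holds no commands or nothing but commands.
        if commands and len(commands) != len(lines):
            misplaced.extend(commands)
    return misplaced
```

| |
| An empty result means every command sits in a paragraph of its own, possibly |
| together with other commands; any returned line should be separated from the |
| surrounding text with blank lines. |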
| |
| .. |
| end-of-content |
| .. |
| This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top |
| of the file. If you want to distribute this text under CC-BY-4.0 only, |
| please use "The Linux kernel developers" for author attribution and link |
| this as source: |
| https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst |
| .. |
| Note: Only the content of this RST file as found in the Linux kernel sources |
| is available under CC-BY-4.0, as versions of this text that were processed |
| (for example by the kernel's build system) might contain content taken from |
| files which use a more restrictive license. |