# Should You Use Upper Bound Version Constraints?

Bound version constraints (upper caps) are starting to show up in the Python ecosystem. This is causing real world problems with libraries following this recommendation, and is likely to continue to get worse; this practice does not scale to large numbers of libraries or large numbers of users. In this discussion I would like to explain why always providing an upper limit causes far more harm than good even for true SemVer libraries, why libraries that pin upper limits require more frequent updates rather than less, and why it is not scalable. After reading this, hopefully you will always consider every cap you add, you will know the (few) places where pinning an upper limit is reasonable, and will possibly even avoid using libraries that pin upper limits needlessly until the author updates them to remove these pins.

If this 10,000 word behemoth is a bit long for you, then skip around using the table of contents, or see the TL;DR section at the end, or read version numbers by Bernát Gábor, which is shorter but is a fantastic read with good examples and cute dog pictures. Or Hynek’s Semantic Versioning Will Not Save You. Be sure to check at least the JavaScript project analysis before you leave!

Also be warned, I pick on Poetry quite a bit. The rising popularity of Poetry is likely due to the simplicity of having one tool vs. many for packaging, but it happens to also have a special dependency solver, a new upper bound syntax, and a strong recommendation to always limit upper versions - in direct opposition to members of the Python core developer team and PyPA developers. Not all libraries with excessive version capping are Poetry projects (like TensorFlow), but many, many of them are. To be clear, Poetry doesn’t force version pinning on you, but it does push you really, really hard to always version cap, and it’s targeting new Python users that don’t know any better yet than to accept bad recommendations. And these affect the whole ecosystem, including users who do not use Poetry, but want to depend on libraries that do! I do really like other aspects of Poetry, and would like to eventually help it build binary packages with Scikit-build (CMake) via a plugin, and I use it on some of my projects happily. If I don’t pick on Poetry enough for you, don’t worry, I have a follow-up post that picks on it in much more detail. Also, check out pdm, which gives many of the benefits of Poetry while following PEP standards.

This turned out to be quite long (and even longer after reviews by PyPA and Python core developers), so I’ve included a table of contents. Feel free to jump to the thing that you care about. This was also split into three posts: the first is application vs. library, and the final one is Poetry versions.

# Intro

What is version capping? It’s when you have a dependency in project.dependencies / tool.poetry.dependencies / install_requires, and instead of this:

    click>=7


You write this:

    click>=7,<8
    # Equivalent, the tilde lets the final number be larger
    click~=7.0


Or, only in Poetry, this:

    [tool.poetry.dependencies]
    click = "^7"


This allows any newer version up to but not including the next “major” version. The syntax here is governed by PEP 440, except for Poetry’s addition, which comes from other languages like JavaScript’s npm.
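To make the shorthand concrete, here is a minimal sketch that expands both operators into explicit ranges. The function names `compatible_release` and `caret` are made up for this illustration, the code handles only plain numeric versions, and the `<` upper bound is an approximation of PEP 440's exact definition (`>=X.Y, ==X.*`), so treat it as a reading aid rather than a reference implementation.

```python
def compatible_release(version: str) -> str:
    """Approximate expansion of the PEP 440 '~=' operator."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("'~=' needs at least two release segments")
    upper = parts[:-1]  # drop the final segment...
    upper[-1] += 1      # ...and bump the one before it
    return f">={version},<{'.'.join(map(str, upper))}"


def caret(version: str) -> str:
    """Approximate expansion of Poetry's '^' operator."""
    parts = [int(p) for p in version.split(".")]
    # Bump the left-most non-zero segment; if all are zero, bump the last one.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]
    upper[-1] += 1
    return f">={version},<{'.'.join(map(str, upper))}"


print(compatible_release("7.0"))    # >=7.0,<8
print(compatible_release("1.0.0"))  # >=1.0.0,<1.1
print(caret("7"))                   # >=7,<8
print(caret("0.2.3"))               # >=0.2.3,<0.3
```

Note how adding a third segment to `~=` silently turns a major-version cap into a minor-version cap.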

## SemVer

Let’s briefly define SemVer - both true SemVer and realistic SemVer. One difference between this post and previous attempts by others is that I will even be addressing dependencies that use true SemVer (of which there are, admittedly, none; true minor and patch releases are impossible to achieve).

SemVer states there are three version digits in the form Major.Minor.Patch. The “rule” is that only fixes are allowed if the patch number is increased, only additions are allowed if the minor version is bumped, and if you do anything that could break downstream users, then the major version must be bumped. I recommend the excellent article here by the tox/virtualenv maintainer and fellow PyPA member, Bernát Gábor.

Whenever any downstream code breaks and it was not a major release, then you’ll have a smattering of people that immediately start complaining that the library “didn’t follow SemVer”, and that it was not a problem with SemVer, but your problem for not following it. A long discussion can be found here, but I’ll give a tiny taste of it. Which version do you bump when you update? It turns out one person’s bugfix is another’s breaking change. Basically, a “perfect” SemVer library would pretty much always bump the major version, since you almost always could possibly break a user with any change (and if you have enough users, by Hyrum’s law, you will do this). This makes “true” SemVer pointless. Minor releases are impossible, and patch releases are nearly impossible. If you fix a bug, someone could be depending on the buggy behaviour (distutils, looking at you). Of course, even a SemVer purist will admit users should not break if you add or fix something, but that does mean there is no such thing as “pure” SemVer.

Example: pyparsing 3.0.5

PyParsing recently removed a non-public attribute in 3.0.5, which broke all versions of packaging, since it happened to be using this private attribute. Was this packaging’s fault? Yes, but now pyparsing 3.0.5 is a breaking release for a lot of the Python ecosystem. packaging 21.2 was released with a cap (the wrong cap, <3 instead of !=3.0.5, so it breaks libraries expecting >=3 that previously worked). pyparsing 3.0.6 restored this private attribute, and packaging stopped using it, as well.

Does dropping Python 2 require a major release? Many (most) packages did this, but the general answer is ironically no, it is not an addition or a breaking change; the version solver will ensure the correct version is used (unless the Requires-Python metadata slot is empty or not updated, never forget this, never set lower than what you test!).

Now, don’t get me wrong, I love “realistic” or “almost” SemVer - I personally use it on all my libraries (I don’t maintain a single CalVer library like pip or only-live-at-head library, like googletest). Practical SemVer mostly follows the rule above, but acknowledges the fact that it’s not perfect. It also often adds a new rule to the mix: if you deprecate a feature (almost always in a minor release), you can remove that feature in a future minor release. You have to check the library to see what the deprecation period is - NumPy and Python use three minor releases. This is also used in CalVer (versioning based on dates) - you can set a deprecation period in time. Really large libraries hate making major releases - Python 2->3 was a disaster. SemVer purists argue that this makes minor releases into major releases, but it’s not that simple - the deprecation period ensures the “next” version works, which is really useful, and usually gives you time to adjust before the removal happens. It’s a great balance for projects that are well kept up using libraries that move forward at a reasonable pace. If you make sure you can see deprecations, you will almost always work with the next several versions.

The best description of realistic SemVer I’ve seen is that it’s an “abbreviated changelog”. I love this, because I love changelogs - I think it is the most important part of documentation you have, a well written changelog lets you see what was missing before so you know not to look for it in older versions, it lets you know what changed so you can update your code (both to support a version as well as again when you drop older versions), and is a great indicator of the health and stability of a project. With SemVer, you can look at the version, and that gives you a quick idea of how large and what sort of changes have occurred before checking the changelog.

## Solver

We need to briefly mention the solver, as there happen to be several, and one reason this is more relevant today than a few years ago is due to changes in the solver.

Pip’s solver changed in version 20.3 to become significantly smarter. The old solver would ignore incompatible transitive requirements much more often than the new solver does. This means that an upper cap in a library might have been ignored before, but is much more likely to break things or change the solve now.

It tries to find a working set of dependencies that all agree with each other. By looking back in time, it’s happy to solve very old versions of packages if newer ones are supposed to be incompatible. This can be helpful, but is slow, and also means you can easily get a very ancient set of packages when you thought you were getting the latest versions.

Poetry has a unique and very strict (and slower) solver that goes even farther hunting for solutions. It forces you to cap Python if a dependency does[^1]. One key difference is that Poetry has the original environment specification to work with every time, while pip does not know what the original environment constraints were. This enables Poetry to roll back a dependency on a subsequent solve, while pip does not know what the original requirements were and so does not know if an older package is valid when it encounters a new cap.

Conda’s solver is like Poetry’s, and due to the number of builds in conda channels, this should scare you - initial solves for an environment with a large number of dependencies (including a single package like ROOT) can take minutes, and updates to existing environments can take more than a day. I’ve never had Poetry take more than 70 seconds, but I’ve also not used it on anything large; it also always has the original specification, while conda only has it if you use a file based update (which is faster). Conda has gotten better by taking more shortcuts[^2] and guessing things (I haven’t had a 25+ hour solve in a while), and Mamba’s C implementation and better algorithms really help, but doing a “smart” solve is hard.

We’ll see it again, but just to point it out here: solver errors are pure evil, and can’t be fixed downstream. If a library requires pyparsing>=3 and another library requests pyparsing<3, that’s the end, you are out of business. “Smart” solvers may look for older versions of those libraries to see if one exists that does not have that cap - if the one with the high lower bound had a release with a lower upper bound, that’s what it will choose; regardless of what bugs have been fixed, etc. since that release. We’ll discuss the problems this causes later. However, an under-constrained build is completely trivial to fix for any user. It’s just a minor inconvenience to a large number of users.

# The problem: Relying on SemVer for capping versions

Now comes the problem: If you have a dependency, should you add an upper cap? Let’s look at the different aspects of this. Be sure you understand the difference between libraries and applications, as defined in a previous post.

We’ll cover the valid use cases for capping after this section. But, just to be clear, if you know you do not support a new release of a library, then absolutely, go ahead and cap it as soon as you know this to be true. If something does not work, you should cap (or maybe restrict a single version if the upstream library has a temporary bug rather than a design direction that’s causing the failure). You should also do as much as you can to quickly remove the cap, as all the downsides of capping in the next sections still apply.

The following will assume you are capping before knowing that something does not work, but just out of general principle, like Poetry recommends and defaults to with poetry add and the default template. In most cases, the answer will be “don’t”. For simplicity, I will also assume you are being tempted to cap to major releases (^1.0.0 in Poetry or ~=1.0 in all other tooling that follows Python standards via PEP 440). If you cap to minor versions (~=1.0.0), this is much worse, and the arguments below apply even more strongly.

## Version limits break code too

Library: ✅ (applies)
Application: ✳️ (partially applicable)

No one likes having an update break users. For example, IPython depends on a library called Jedi. Jedi 0.18 removed something that IPython used, so until IPython 7.20 was released, a “normal” solve (like pip install ipython) resulted in a broken install. It’s tempting to look back at that and say “well, if IPython capped its dependencies, that would have been avoided”. In fact, every time this happens, you will find well-meaning but misguided suggestions claiming this is proof you should have chosen reasonable upper bounds for all requirements.

However, capping dependencies also breaks things, and you can’t fix it downstream. If I write a library or application that depends on a library that has a broken dependency, I can limit it, and then my users are happy. In the case above, the interim solution was to just manually pin Jedi, such as pip install ipython "jedi<0.18" for a user (the quotes stop the shell from treating < as a redirect) or to cap it in dependencies for a library or application. Any user can easily do that - irritating, and a leaky abstraction, but fixable. But you can’t fix an over-constraint - and this is not just a pip issue; you’d have to throw out the entire dependency solver system to get things to install. Most other Jedi releases have been fine; capping on other versions would have been problematic for users who don’t care or know about IPython using Jedi.

If you put upper limits, this also then can’t easily be fixed by your dependencies - it usually forces the fix on the library that does the pinning. This means every single major release of every dependency you cap immediately requires you to make a new release or everyone using your library can no longer use the latest version of those libraries. If “make a new release quickly” from above bothered you; well, now you have to make it on every version bump of every pinned dependency.[^3]

It also means you must support a wide version range; ironically, this is the very opposite of the syntax Poetry adds for capping. For example, let’s say you support click^7. Click 8 comes out, and someone writes a library requiring click^8. Now your library can’t be installed at the same time as that other library, your requirements do not overlap. If you update to requiring click ^8, your update can’t be installed with another library still on click^7. So you have to support click>=7,<9 for a while until most libraries have similarly updated (and this makes the ^ syntax rather useless, IMO). This particular example is especially bad, because a) it’s common, b) click is used for the “application” part of the code, which is likely not even used by your usage if you are using it as a library, and c) the main breaking change in Click 8 was the removal of Python <3.6 support, which is already included in the solve.
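The overlap argument can be sketched by modelling each requirement as a half-open range of major versions. The `overlap` helper and the integers-only simplification are assumptions made for illustration; real solvers compare full PEP 440 versions.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # [low, high) expressed in major versions


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Return the intersection of two half-open ranges, or None if it is empty."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low < high else None


caret_7 = (7, 8)  # click ^7   ->  >=7,<8
caret_8 = (8, 9)  # click ^8   ->  >=8,<9
wide = (7, 9)     # click >=7,<9

print(overlap(caret_7, caret_8))  # None: the two libraries cannot be co-installed
print(overlap(wide, caret_7))     # (7, 8)
print(overlap(wide, caret_8))     # (8, 9)
```

Only the wide range can be installed alongside both neighbours, which is why the caret form stops being useful the moment a new major version appears.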

## Fixes are not always backported

Library: ✅ (applies)
Application: ✅ (applies)

I’ll defer to Bernát’s post to explain that many Python projects simply do not have the time or resources to continue to provide fixes and security patches for old major versions. I’ve had old patch update requests refused from projects like pip and pandas. Actually, I’ve even refused them for CLI11, my CI was broken on the old 1.x version so I couldn’t run tests.

Let’s look at a concrete example. Let’s say some library version 6.1.0 worked. You pin to <7. Then 6.2.0 comes out, and breaks your code. The problem is discovered and fixed, but the development has gone on too far to easily backport, or it’s too involved, so 7.0.1 works again. Your cap is now broken, and your code does not work forever (next section). I have seen fixes like this multiple times, and have been responsible for them, as well. Often the CI system breaks for old, unmaintained releases, and it’s not feasible to go back and fix the old version.

Newer versions of code are intended to be better than older versions of code; they fix bugs, they add new hardware/software compatibility, and they fix security issues. You should never be limiting your users’ ability to update things you happen to depend on (dependencies should not be a leaky abstraction; a user shouldn’t have to know or care about what you depend on). Upper limits dramatically limit the ability to update without the end user’s knowledge.

You want users to use the latest versions of your code. You want to be able to release fixes and have users pick up those fixes. And, in general, you[^4] don’t like having to release new patch versions for old major (or even minor) versions of your library, at least very far back. So why force your dependencies to support old versions with patch releases because you’ve capped them?

## Code does not work forever

Library: ✅ (applies)
Application: ✅ (applies)

One claim I’ve seen Poetry developers make is that capping your dependencies means your code will work in the future. Hopefully I don’t have to tell you this is wrong; it’s completely wrong for a library for the reason outlined above, and partially wrong for an application. There are lots of reasons code breaks without changing it, let’s look at a few.

One of the most recent ones was the macOS Apple Silicon transition. In order to get working code, you have to have the latest versions of packages. Python 3.9 (and later backported to 3.8) is required on Apple Silicon, so if you depend on a library capped to not support those versions, you are out of luck, that code does not work. This is also true with all the major libraries, which did not backport Apple Silicon support very far. You need a recent NumPy, pip, packaging, Poetry, etc. And just in case you think you have plenty of time, note that Apple no longer even sells an Intel based notebook, exactly one year after the transition started.

Similar support rollouts have happened (or are happening) for Linux and Windows architectures (like ARM and PowerPC), operating system updates (macOS 11’s new numbering system broke pip, Poetry, and lots of other things, might also happen to a lesser extent on Windows 11), new manylinux versions, PyPy support, Musllinux wheels, and even just adding wheel support in general, actually. Code simply will never work forever, and allowing the possibility of using newer libraries increases the chance it can be used. If you limit NumPy to 1.17, you will never support Apple Silicon. However, if you don’t limit it, and the final version you actually support happens to be 1.21, then your code will work with Apple Silicon, and future users may have to manually limit versions to <1.22 eventually, but it will work.

Artificially limiting versions will always reduce the chances of it working in the future. It just saves future users from having to add extra limits, but this is a problem that has a simple user workaround, and is not that likely to happen for many dependencies. If someone is using a multiple-year old version of your code, either you disappeared (and you therefore can’t fix broken upper limits), or they are being forced to use an old version, probably because someone else (artificially) pinned your library.

Sphinx and Docutils example

This happened recently with Sphinx & docutils. The problem was docutils 0.18 broke Sphinx. Below in valid reasons for capping, you’ll see that docutils in Sphinx actually probably crosses the threshold for pinning. The sweet irony here, though, was that it turned out Sphinx did start capping docutils in Sphinx 4, but the affected users had capped Sphinx to version 3!

## SemVer never promises to break your code

Library: ✅ (applies)
Application: ✅ (applies)

A really easy but incorrect generalization of the SemVer rules is “a major version will break my code”. It’s the basis for Poetry’s recommendation to always cap versions, but it’s a logical fallacy. Even if the library follows true SemVer perfectly, a major version bump does not promise to break downstream code. It promises that some downstream code may break. If you use pytest to test your code, for example, the next major version will be very unlikely to break. If you write a pytest extension, however, then the chances of something breaking are much higher (but not 100%, maybe not even 50%). Quite ironically, the better a package follows SemVer, the smaller the change will trigger a major version, and therefore the less likely a major version will break a particular downstream code.

As a general rule, if you have a reasonably stable dependency, and you only use the documented API, especially if your usage is pretty light/general, then a major update is extremely unlikely to break your code. It’s quite rare for light usage of a library to break on a major update. It can happen, of course, but is unlikely. If you are using something very heavily, if you are working on a framework extension, or if you use internals that are not publicly documented, then your chances of breaking on a major release are much higher. As mentioned before, Python has a culture of producing FutureWarnings, DeprecationWarnings, or PendingDeprecationWarnings (make sure they are on in your testing, and turned into errors); good libraries will use them.
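As a minimal sketch of what "turned into errors" means, the snippet below uses the standard library `warnings` module to escalate a `DeprecationWarning` into an exception. The function `old_api` is a hypothetical stand-in for a dependency's deprecated call, not a real library function.

```python
import warnings


def old_api() -> int:
    """Hypothetical library function that is scheduled for removal."""
    warnings.warn(
        "old_api() is deprecated; use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


def call_with_warnings_as_errors() -> str:
    """Call old_api() with DeprecationWarning escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            old_api()
        except DeprecationWarning as exc:
            return f"caught: {exc}"
    return "no warning raised"


print(call_with_warnings_as_errors())
# caught: old_api() is deprecated; use new_api()
```

In a real test suite you would usually set this globally rather than per call, for example with `python -W error` or pytest's `filterwarnings` configuration option set to `error`.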

It may sound ridiculous, but I should probably point out that CalVer libraries do not follow SemVer (usually). poetry add packaging will still do ^21 for the version it adds. You shouldn’t be capping versions, but you really shouldn’t be capping CalVer. Poetry itself depends on packaging = ^20.4 at the time of writing (though it actually vendors it). In fact, if I could find where Poetry (not poetry-core, which vendors it) uses packaging, I know how to trigger a bug due to the cap on packaging. But I can’t find where it’s imported other than in tests.

## It doesn’t scale

Library: ✅ (applies)
Application: ❌ (not applicable)

If you have a single library that doesn’t play well, then you probably will get a working solve easily - this is one reason that this practice doesn’t seem so bad at first. If more packages start following this tight capping, however, you end up with a situation where things simply cannot solve - a moderately sized application can have a hundred or more dependencies when expanded. The entire point of packaging is to allow you to get lots of packages that each do some job for you - we should be trying to make it easy to be able to add dependencies, not harder.

The implication of this is you should be very careful when you see tight requirements in packages and you have any upper bound caps anywhere in the dependency chain. If something caps dependencies, there’s a very good chance adding two such packages will break your solve, so you should pick just one - or just avoid them altogether, so you can add one in the future. This is a good rule, actually: Never add a library to your dependencies that has excessive upper bound capping. When I have failed to follow this rule for a larger package, I have usually come to regret it.

If you are doing the capping and are providing a library, you now have a commitment to quickly release an update, ideally right before any capped dependency comes out with a new version. Though if you cap, how do you install development versions or even know when a major version is released? This makes it harder for downstream packages to update, because they have to wait for all the caps to be moved for all upstream.

## It conflicts with tight lower bounds

Library: ✅ (applies)
Application: ❌ (not applicable)

A tight lower bound is only bad if packages cap upper bounds. If you can avoid upper-cap packages, you can accept tight lower bound packages, which are much better; better features, better security, better compatibility with new hardware and OS’s. A good packaging system should allow you to require modern packages; why develop for really old versions of things if the packaging system can upgrade them? But an upper bound cap breaks this. Hopefully anyone who is writing software and pushing versions will agree that tight lower limits are much better than tight upper limits, so if one has to go, it’s the upper limits.

It is also rather rare that packages solve for lower bounds in CI (I would love to see such a solver become an option, by the way!), so setting a tight lower bound is one way to avoid rare errors when old packages are cached that you don’t actually support. CI almost never has a cache of old packages, but users do.

Please test with a constraints.txt file that forces your lower bounds, by the way, at least if you have a reasonable number of users.
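One way to produce such a file is to turn each declared lower bound into an exact pin. The sketch below is an assumption-laden illustration: the helper name `lower_bound_constraints` is invented, and the regular expression only understands simple `name>=version` requirements (no extras, markers, or URLs).

```python
import re

# Matches the package name and the version that follows '>='.
_LOWER_BOUND = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*>=\s*([0-9][0-9A-Za-z.]*)")


def lower_bound_constraints(requirements):
    """Turn simple 'name>=version' requirements into 'name==version' pins."""
    pins = []
    for req in requirements:
        match = _LOWER_BOUND.match(req)
        if match:  # requirements without a '>=' bound are skipped
            name, version = match.groups()
            pins.append(f"{name}=={version}")
    return pins


deps = ["click>=7", "numpy>=1.17.3,<2", "requests"]
print("\n".join(lower_bound_constraints(deps)))
# click==7
# numpy==1.17.3
```

Writing that output to `constraints.txt` and running a CI job with `pip install -c constraints.txt .` exercises the oldest versions you claim to support.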

## Capping dependencies hides incompatibilities

Library: ✅ (applies)
Application: ✅ (applies)

Another serious side effect of capping dependencies is that you are not notified properly of incoming incompatibilities, and you have to be extra proactive in monitoring your dependencies for updates. If you don’t cap your dependencies, you are immediately notified when a dependency releases a new version, probably by your CI, the first time you build with that new version. If you are running your CI with the --pre flag on your pip install (uncommon, but probably a good idea), then you might even catch and fix the issue before a release is even made. If you do cap, however, then you don’t know about the incompatibility until (much) later.

If you are not following all of your dependencies, you might not notice that you are out of date until it’s both a serious problem for users and it’s really hard for you to tell what change broke your usage because several versions have been released. While I’m not a huge fan of Google’s live-at-head philosophy (primarily because it has heavy requirements not applicable for most open-source projects), I appreciate and love catching a dependency incompatibility as soon as you possibly can; the smaller the change set, the easier it is to identify and fix the issue.

## Capping all dependencies hides real incompatibilities

Library: ✅ (applies)
Application: ✅ (applies)

If I see X>=1.1, that tells me that you are using features from 1.1 and do not support 1.0. If I see X<1.2, this should tell me that there’s a problem with 1.2 and the current software (specifically something you know the dependency will not fix/revert). Not that you just capped all your dependencies and have no idea if that will or won’t work at all. A cap should be like a TODO; it’s a known issue that needs to be worked on soon. As in yesterday.

## Libraries that ask you to cap

Library: ✅ (applies)
Application: ❌ (not applicable)

There are some libraries that ask users to cap. If a huge change is coming, and most user code is going to be broken, it’s not great, but necessary. If that does happen, explain it in the readme, and follow it as a user if the explanation applies to you (or look for another library that doesn’t intentionally plan to break you). Unfortunately, some libraries are asking users to cap, “just in case” an API breaking change needs to be made. This is wrong for several reasons, especially for writing libraries.

First, SemVer is a battle between frequent and rare major bumps; “True” SemVer forces basically any change to be a breaking change; it depends more on how many users you have than on how large the change is. This direction makes capping to major versions meaningless, because you have too many of them. The other direction (which is what any library that thinks it has no major versions coming is following) is not true SemVer, it’s “practical” SemVer, and as I’ve shown, you should not cap to that. You don’t know a major version will break you, and you don’t know a minor / patch version will not introduce subtle (or major) bugs. If that matters, you still have to learn how to use an application lock file. There’s no free pass here. For the upstream library, they are now basically removing their ability to make smaller breaking changes, like removing a deprecated item, because that should be a major change, but a major change will cause a massive number of users to become unsupported until they bump their pin (and remember, libraries can depend on libraries which depend on libraries which depend… You get the point.)

The old version must be maintained. If users are asked to pin to a major version, or if you just decide to pin to a major version, that version must have at least critical security and version dependency updates (like new OSs, architectures, or Python versions) applied to it until all the (recursive) pins are updated. If it’s not a “LTS” release, you should not pin to it, unless you have to. There are a few of these LTS releases in Python, but it’s not the social norm - Python is OSS and most of us do not have the resources or intention to maintain multiple branches of packages. We expect you to use the latest releases. Would you depend on a package you know is going to be abandoned? I hope not. If you cap, then that’s what you are doing - the thing you are depending on is going to be abandoned when the next release is made. This forces you to make a new release, and has a trickle-down effect; if you cap, obviously you need to expect users may cap you!

Unfortunately, what some library authors are using this for is a free pass to break people’s code without deprecation periods. Not everyone will read the README, etc, and even if they do, they might dislike capping (for the next reason, for example, or any of the other reasons listed here) and not cap your library anyway. So a breaking release will break some number of users proportional to the total number of users anyway, regardless of what you put in your README. Especially if you don’t give a reason, and just have it there “just in case”; but regardless, many (most) users will not read anything anyway.

## Backsolving is usually wrong

Library: ✅ (applies)
Application: ✅ (applies)

Let’s assume you depend on a library, we’ll call it tree. That library depends on dirt>=2. You also depend on a library bush. That one depends on dirt>=7.0. When tree 2.1.3 is released, they noticed they were broken by dirt 7.0, so they cap dirt<7.0. Question: what do you think the solver will do? Most people who are on the pro-capping side will answer “produce an error, because the two libraries have incompatible dependencies and there’s no solution”. In fact, I’ve spent quite a bit of time talking about solver errors.

But this is not one of those, not for a “smart” solver like we are seeing today. Instead, the solver will backsolve to tree 2.1.2! This is surely not what the tree developers wanted; that’s probably why they released tree 2.1.3 in the first place! (This is not just hypothetical - IPython ran into this with Jedi 0.18).
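A toy exhaustive solver shows the effect. Everything here is hypothetical: the index contents, the version tuples, and the brute-force search are assumptions chosen to mirror the scenario, with tree requiring dirt>=2 (capped to <7.0 in 2.1.3) and bush requiring dirt>=7.0. Real resolvers use much smarter algorithms but reach the same answer.

```python
from itertools import product

# Hypothetical index: package -> version -> {dependency: (low, high)}.
# A high bound of None means "uncapped".
INDEX = {
    "tree": {
        (2, 1, 2): {"dirt": ((2, 0), None)},    # dirt>=2
        (2, 1, 3): {"dirt": ((2, 0), (7, 0))},  # dirt>=2,<7.0 (new cap)
    },
    "bush": {
        (1, 0, 0): {"dirt": ((7, 0), None)},    # dirt>=7.0
    },
    "dirt": {
        (6, 9): {},
        (7, 0): {},
    },
}


def allowed(version, bounds):
    """Check a version against an inclusive lower and exclusive upper bound."""
    low, high = bounds
    return version >= low and (high is None or version < high)


def solve(index):
    """Try every combination, newest versions first; return the first valid one."""
    names = list(index)
    candidates = [sorted(index[name], reverse=True) for name in names]
    for combo in product(*candidates):
        picked = dict(zip(names, combo))
        if all(
            allowed(picked[dep], bounds)
            for name, version in picked.items()
            for dep, bounds in index[name][version].items()
        ):
            return picked
    return None


print(solve(INDEX))
# {'tree': (2, 1, 2), 'bush': (1, 0, 0), 'dirt': (7, 0)}
```

No error is raised: the solver quietly pairs the older tree 2.1.2 with dirt 7.0, which is exactly the combination the tree 2.1.3 release was meant to prevent.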

Note that unlike the errors you are likely to see from getting the latest, incompatible versions (usually attribute errors and such), these caps can introduce old versions with subtle bugs, security holes, or exactly the same attribute errors and such, and this is completely hidden from the user! They don’t know that they are getting old versions, or why. If they are using a library that does the capping, they don’t see it, and they thought they were getting the latest versions of everything.

If you have even a single release with less strict caps than a newer release, this can happen. If Python had editable metadata, and every author could be trusted to edit every past release with the proper caps once one is known/discovered, then this system would be okay. In fact, conda-forge does exactly this (albeit, conda-forge admins can do the metadata overrides for past releases, which is an important feature that PyPI would never do). However, changing this for PyPI would be a massive undertaking; you’d need to have an extra patch file (wheels are hashed for security), and everything, from Pip, wheel, twine, warehouse, all installers, and locking package managers, would need to handle these extra files. Then there are security implications - what do you do if someone makes a good release then adds a malicious dependency through metadata editing?

The recommended solution is to keep capping to an absolute minimum. These issues only occur when you mix a low upper cap with a high lower cap, so fewer upper caps reduce the chances of this happening.

Now let’s make this even more fun as a segue into our next issue. Let’s say, before numba 0.55 was released, you ask for numba and you are on Python 3.10. If you use Numba, you probably have seen this - you get not 0.54, but 0.50, since that was the last uncapped version of Numba - and the error you see is not the nice setup.py error telling you that Python is too new, but instead a compilation failure for a Numba dependency, llvmlite. No package installer that I’m aware of handles caps on Python versions correctly - and there may not be a “correct” way, but this is the same problem as above, just for the Python version. But it gets worse.

Now let’s say you are on Python 3.9, and you are using a locking package manager (Poetry, PDM, Pipenv, …). You leave the default value for Python caps, either no cap or something like ^3.6. What Numba version are you going to get? You guessed it, 0.50. Okay, you probably didn’t guess it. Why? Because the lock file being generated is supposed to be valid for all versions of Python you requested - so it picked 0.50, since that’s valid on Python 3.99. Never, never put a cap into the Requires-Python metadata slot; and this leads us into our next point…
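A minimal sketch of why a locking solver lands on the old release: it needs a release whose supported Python range covers the whole range the project requested, not just the interpreter you happen to be running. The release table below is hypothetical metadata that only mirrors the version numbers in the text, and `covers` / `pick_for_lock` are illustrative helpers, not any tool's real API.

```python
# Hypothetical releases, newest first: version -> (min Python, exclusive max Python or None)
RELEASES = {
    "0.54": ((3, 7), (3, 10)),
    "0.53": ((3, 6), (3, 10)),
    "0.50": ((3, 6), None),  # last release without an upper cap
}


def covers(release_range, wanted_range):
    """True if the release claims support for every Python in wanted_range."""
    (rel_low, rel_high), (want_low, want_high) = release_range, wanted_range
    if rel_low > want_low:
        return False
    if rel_high is None:  # release is uncapped, so it covers any upper end
        return True
    return want_high is not None and want_high <= rel_high


def pick_for_lock(releases, wanted_range):
    """Return the newest release valid for the whole requested Python range."""
    for version, release_range in releases.items():
        if covers(release_range, wanted_range):
            return version
    return None


no_cap = pick_for_lock(RELEASES, ((3, 6), None))      # project says "any Python >= 3.6"
caret = pick_for_lock(RELEASES, ((3, 6), (4, 0)))     # project says "^3.6"
narrow = pick_for_lock(RELEASES, ((3, 7), (3, 10)))   # project copies the cap
```

Only the last request, which repeats the dependency's cap in the project's own metadata, gets the newest release; the other two fall back to the old uncapped one.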

## Pinning the Python version is special

Library: ✅ (applies)
Application: ✅ (applies)

Another practice pushed by Poetry is adding an upper cap to the Python version. This is misusing a feature designed to help with dropping old Python versions to instead stop new Python versions from being used. “Scrolling back” through older releases to find the newest version that does not restrict the version of Python being used is exactly the wrong behavior for an upper cap, yet scrolling back is precisely what this field was designed for. No current solver (Pip, Poetry, PDM) works correctly if this field is capped; they all implement the scroll back behavior.

To be clear, this is very different from a library: specifically, you can’t downgrade your Python version5 if this is capped to something below your current version. You can only fail. So this does not “fix” something by getting an older, working version, it only causes hard failures if it works the way you might hope it does. This means instead of seeing the real failure and possibly helping to fix it, users just see a “Python doesn’t match” error. And, most of the time, it’s not even a real error; if you support Python 3.x without warnings, you should support Python 3.x+1 (and 3.x+2, too).

Capping to <4 (something like ^3.6 in Poetry) is also directly in conflict with the Python developers’ own statements; they promise the 3->4 transition will be more like the 1->2 transition than the 2->3 transition. It’s not likely to happen soon, and if it does, it likely will be primarily affecting Stable ABI / Limited API builds and/or GIL usage; it likely will not affect normal Python packages more than normal updates will. When Python 4 does come out, it will be really hard to even run your CI on 4 until all your dependencies uncap. And you won’t actually see the real failures, you’ll just see incompatibility errors, so you won’t even know what to report to those libraries. And this practice makes it hard to test development versions of Python.

And, if you use Poetry, as soon as someone caps the Python version, every Poetry project that uses it must also cap, even if you believe it is a detestable practice and confusing to users. It is also wrong unless you fully pin the dependency that forced the cap - if the dependency drops it in a patch release or something else you support, you no longer would need the cap. Even worse, if someone adds a cap or tightens a cap, unless they yank every single older release, a locking solver like Poetry or PDM will backsolve to the last versions without the cap so that the lock file it creates will be “valid” on all the Python versions you are requesting! This is because these solvers are using the cap for the lock file - lock files cannot lock the Python version (another fundamental difference), so they are computing the range of Python versions the lock file is valid for. This is different from the Requires-Python metadata slot, but Poetry and PDM both do not have separate settings. Unless metadata was mutable (it is not) and you actually trusted library authors to go back and check every old release for the correct Python cap (not going to happen), upper capping here is worse than useless.

If you are developing a package like Numba, where internal Python details (bytecode) are relied on so there really is a 0% chance of it working, manually adding an error in your setup.py is fine, but still do not limit here! This metadata field was not designed to support upper caps, and an upper cap should always translate to an error; it should not change your solve. Never provide an upper cap to your Python version. I generally will not use a library that has an upper cap to the Python version; when I have missed this, I’ve been bitten by it, hard (the CI for cibuildwheel, pybind11, and several other packages went down). To be clear, in that case, Python 3.10 was perfectly fine, and you could install a venv with 3.9 and then upgrade to 3.10 and it would still work. It just broke installing with 3.10, and pre-commit.ci and brew were updating to 3.10, breaking CI. This took hours of my time to roll back across half a dozen repos, and it caused people trusting my style recommendations to also be affected, all for an untested version cap - Python 3.10 didn’t break the application at all.
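As a rough illustration of the “manual error in setup.py” approach, here is a minimal sketch of a version guard. The `MAX_SUPPORTED` value, the function name, and the message are all hypothetical; this is not Numba's actual check.

```python
import sys

# Hypothetical: the newest Python minor version this package has been ported to.
MAX_SUPPORTED = (3, 10)


def check_python(version_info=sys.version_info, max_supported=MAX_SUPPORTED):
    """Raise a readable error when building on a Python that cannot work yet."""
    current = tuple(version_info[:2])
    if current > max_supported:
        found = ".".join(map(str, current))
        wanted = ".".join(map(str, max_supported))
        raise RuntimeError(
            f"Cannot build on Python {found}; this package supports up to {wanted}. "
            "Check for a newer release of this package."
        )
```

The idea would be to call `check_python()` near the top of setup.py, before `setup(...)`, so the user sees the real reason instead of a solver silently choosing an older release. It only runs when building from an SDist, which is acceptable for a Python-version check.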

## Applications are slightly different

Now if you have a true application (that is, if you are not intending your package to be used as a library), upper version constraints are much less problematic. You may have noticed that not all the reasons above apply to applications. This is due to two reasons.

First, if you are writing a library, your “users” are specifying your package in their dependencies; if an update breaks them, they can always add the necessary exclusion or cap for you to help end users. It’s a leaky abstraction - they shouldn’t have to care about what your dependencies are - but when capping interferes with what they can use, that’s also a leaky abstraction, and an unfixable one. For an application, the “users” are more likely to be installing your package directly, whereas the users of a library are generally other developers adding it to their requirements.

Second, for an app that is installed from PyPI, you are less likely to have to worry about what else is installed (the other issues are still true). Many (most?) users will not be using pipx or a fresh virtual environment each time, so in practice, you’ll still run into problems with tight constraints, but there is a workaround (use pipx, for example). You are still affected by most of the arguments above, though, so personally I’d still not recommend adding untested caps.

You should never depend only on SemVer for a deployed application, like a website. I won’t repeat the SemVer article verbatim here, but in general, you are roughly as likely to get a breakage (usually unintentional) from a minor or patch release of a library as from a major version. Depending on the stability and quality of the library, often more likely. So applications only have one choice: they should supply a lock file that has every dependency explicitly listed. All systems have the ability to do this - you can use pip-tools for pip; Poetry, pdm, and pipenv make lock files automatically; PEP 665 even proposes a standard lock file format, etc. This gives users a way to install using exactly the known working dependencies. In production (say for a website), you must do this. Otherwise, you will randomly break. This is why patch releases exist: because a major, minor, or even other patch release broke something!

If you are not using Poetry or pip-tools, you can still make a simple lock file with:

pip freeze > requirements-lock.txt


Then you can install it with:

pip install --no-deps -r requirements-lock.txt


While this does not include hashes like Poetry, pipenv, or pip-tools will, it covers many low-risk use cases, like setting up a simple web application. By the way, since I’ve been harsh on Poetry, I should point out it really shines here for this use.
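If you want to confirm that an environment still matches such a file, a minimal sketch using only the standard library might look like the following. The helper names are my own, and the parser only understands plain `name==version` lines as written by `pip freeze`; editable installs, URLs, and hashes are skipped rather than handled.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_lock(text):
    """Collect 'name==version' pins; comments and other line types are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line and not line.startswith("-"):
            name, version = line.split("==", 1)
            pins[normalize(name)] = version.strip()
    return pins


def installed_versions():
    """Map normalized project names to the versions installed in this environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no name
            found[normalize(name)] = dist.version
    return found


def drift(pins, installed):
    """Return {name: (pinned, installed or None)} for every mismatch."""
    return {n: (v, installed.get(n)) for n, v in pins.items() if installed.get(n) != v}
```

Something like `drift(parse_lock(open("requirements-lock.txt").read()), installed_versions())` would then return an empty dict when the environment matches the lock file.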

What about your general requirements that control the locking process? With a lockfile, you’ll know when you try to update it that something breaks, and then you can add a (temporary) pin for that dependency. Adding arbitrary pins will reduce your ability to update your lock file with the latest dependencies, and obscure what actually is not supported with what is arbitrarily pinned.

# Python is not JavaScript

Poetry gets a lot of inspiration from npm for JavaScript, including the ^ operator syntax (meaning newer minor/patch releases are okay but not major ones). If you are coming from a language like JavaScript, you might be tempted to use upper pins since you are used to seeing them there. But there are two big differences between Python packaging and JavaScript.

## Technical difference

The technical difference is that npm (and JavaScript) has the idea of local dependencies. If you have several packages that request the same dependency, they each get a copy of that dependency. They are free to have conflicting version requirements; each gets a copy so the copies could be different versions. That invalidates several of the arguments above. It also is much harder to add a pin as a user, because you have to add a nested pin (you can use Yarn or you can manually edit your lock file). Poetry does not implement this model, by the way - it still is a traditional Python system with fully shared dependencies.

This does not solve all problems, by the way. It just keeps them from randomly conflicting by keeping them localized - JavaScript libraries act much more like applications under my definitions. It also has the idea of peer dependencies (for plugins), which have all the conflict issues listed so far. In fact, here’s a quote from nodejs.org about peer dependencies:

One piece of advice: peer dependency requirements, unlike those for regular dependencies, should be lenient.

In Python, all dependencies are “peer dependencies”!
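The difference between the two models can be sketched in a few lines. This is a toy illustration, not how npm or Pip work internally; the plugin names, the available releases, and the bounds are all hypothetical.

```python
def newest_matching(available, lower, upper=None):
    """Return the newest version with lower <= version < upper, or None."""
    matching = [v for v in available if v >= lower and (upper is None or v < upper)]
    return max(matching, default=None)


# Hypothetical releases of one shared dependency.
AVAILABLE = [(1, 4), (2, 0), (3, 1)]

# Hypothetical dependents: name -> (lower bound, exclusive upper cap or None)
WANTS = {
    "plugin-a": ((1, 0), (2, 0)),  # capped below 2
    "plugin-b": ((3, 0), None),    # needs a 3.x feature
}


def nested_install(available, wants):
    """npm-style: every dependent gets its own private copy."""
    return {name: newest_matching(available, low, high) for name, (low, high) in wants.items()}


def flat_install(available, wants):
    """Python-style: one shared copy must satisfy every dependent at once."""
    lower = max(low for low, _ in wants.values())
    caps = [high for _, high in wants.values() if high is not None]
    return newest_matching(available, lower, min(caps) if caps else None)


nested = nested_install(AVAILABLE, WANTS)
flat = flat_install(AVAILABLE, WANTS)
```

Here the nested install succeeds with two different copies, while the flat install has no answer at all (`flat` is `None`), because the cap from one dependent and the lower bound from the other cannot both hold.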

## Social difference

The social difference (which stems from the technical difference) is that Python libraries (and Python itself) do not like to do hard, backward incompatible changes without warnings. This is more accepted in languages with local packages, but a stable Python package that breaks backward compatibility without any sort of warning is likely to be avoided if it happens too often. Generally there is a set deprecation period. This was reinforced by the Python 3 transition; Python itself and major libraries have promised to never do that hard of a break again.

The other factor possibly responsible for the social difference is that Python libraries often have a small number of maintainers, often with split priorities. This means they cannot devote resources to keeping up multiple major versions of software - usually only the latest version is supported. This means you are expected to use the latest major and minor version to be supported with security and compatibility fixes; but you can’t do this if any of your dependencies force a cap. (Given the number of vulnerabilities reported in the analysis below, I don’t think JavaScript library maintainers are releasing many new patch releases for old major versions either.)

## Watch for warnings

So Python and its ecosystem do not have an assumption of strict SemVer, and have a tradition of providing deprecation warnings. If you have good CI, you should be able to catch warnings even before your users see them. Try the following pytest configuration:

[tool.pytest.ini_options]
filterwarnings = ["error"]


This will turn warnings into errors and allow your CI to break before users break. You can ignore specific warnings as well; see the pytest docs or scikit-hep’s developer pytest pages.
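Outside of pytest, roughly the same effect can be had with the standard library `warnings` module. The sketch below uses a made-up `old_api` function standing in for a dependency that has started a deprecation period.

```python
import warnings


def old_api():
    """Hypothetical dependency function that is going through a deprecation period."""
    warnings.warn("old_api is deprecated; use new_api", DeprecationWarning, stacklevel=2)
    return 42


def call_strict(func):
    """Run func with every warning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func()


try:
    call_strict(old_api)
    caught = None
except DeprecationWarning as exc:
    caught = str(exc)
```

The deprecation surfaces as a hard failure in your own test run, long before the dependency actually removes the function.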

## Analysis of a JavaScript project

I have been maintaining several open source gitbook projects and gitbook went closed-source years ago. I decided to do an analysis on the lock file for Modern CMake.6

For the project, there are either 2 or 7 user level packages (gitbook and svgexport, as well as five gitbook plugins). This installs 576 packages, which would be flattened to 315 unique packages if you ignored version pinning, or 426 packages if you include the version with the package - yes, that’s over 100 times that a package gets installed multiple times with different versions. Not all of those are conflicts (npm doesn’t seem to try very hard to get consistent versions), but I still counted at least 30 version conflicts that would be unsolvable in a flat system. This is exactly what we will run into if we try to replicate version capping in a flat dependency system like Python and more people start following version capping - it does not scale in a flat system.

Also, building this today reports 153 vulnerabilities (11 low, 47 moderate, 90 high, 5 critical). And many packages are stuck on versions up to 10 major versions old - and this is even counting just svgexport, since gitbook is a dead (in terms of open source) project.

Quick n' dirty analysis code

This is the (ugly) analysis code I used to process the package-lock.json file. This is not polished, pretty code, but rather my first iteration, with the original short names and such that I’d normally not let anyone see. The final conflicts were counted by hand based on the simplified constraints printed out, since the system is not close enough to Python to use packaging.version.

import itertools
import json

# Requires Python 3.10+ (structural pattern matching).


def versions(name, d):
    """Yield (package, installed version) for every nested package entry."""
    for it in d.items():
        match it:
            case "version", str(x):
                yield name, x
            case "dependencies", dict(x):
                for n, y in x.items():
                    yield from versions(n, y)


def requirements(name, d):
    """Yield (package, {dependency: constraint}) for every nested entry."""
    for it in d.items():
        match it:
            case "requires", dict(x):
                yield name, x
            case "dependencies", dict(x):
                for n, y in x.items():
                    yield from requirements(n, y)


def flatten(it):
    """Drop the requiring package; yield bare (dependency, constraint) pairs."""
    for _, req in it:
        yield from req.items()


def filter_const(cst):
    """Simplify npm constraints so they are easier to compare by eye."""
    cst_list = cst.replace("= ", "=").replace(" || ", "||").split(" ")
    for cs in cst_list:
        match cs[0], cs[1:]:
            case "*", "":
                pass  # "any version" adds no constraint
            case "^", c:
                yield c.split(".")[0]  # caret: keep the major version only
            case "~", c:
                yield ".".join(c.split(".")[0:2])  # tilde: keep major.minor
            case _:
                yield cs


with open("package-lock.json") as f:
    pl = json.load(f)

vers = sorted(versions("base", pl))
print(f"Total number of packages: {len(vers)}")

print(f"Unique packages: {len({a for a, _ in vers})}")

print(f"Unique versions: {len(set(vers))}")

# Packages that are installed in more than one version
multi_all = {
    n: {x[1] for x in vs}
    for n, vs in itertools.groupby(sorted(set(vers)), key=lambda x: x[0])
}
multi = {n: v for n, v in multi_all.items() if len(v) > 1}

# Simplified constraints requested for each dependency
sorted_listing = sorted(flatten(requirements("base", pl)))
results = {
    name: {x for g in group for x in filter_const(g[1])}
    for name, group in itertools.groupby(sorted_listing, key=lambda x: x[0])
}

for name in multi:
    nm = " ".join(multi[name])
    print(f"{name:22}", f"{nm:32}", *results[name])


# Upper limits are valid sometimes

## When is it okay to set an upper limit?

Valid reasons to add an upper limit are:

1. If a dependency is known to be broken, block out the broken version. Try very hard to fix this problem quickly, then remove the block if it’s fixable on your end. If the fix happens upstream, excluding just the broken version is fine (or they can “yank” the bad release to help everyone).
2. If you know upstream is about to make a major change that is very likely to break your usage, you can cap. But try to fix this as quickly as possible so you can remove the cap by the time they release. Possibly add development branch/release testing until this is resolved. TensorFlow 1-2, for example, was a really major change that moved things around. But fixing it was really as simple as importing from tensorflow.compat.v1.
3. If upstream asks users to cap, then I still don’t like it, but it is okay if you want to follow the upstream recommendation. You should ask yourself: do you want to use a library that may intentionally break you and require changes on your part without help via deprecation periods? A one-time major rewrite might be an acceptable reason. Also, if you are upstream, it is very un-Pythonic to break users without deprecation warnings first. Don’t do it if possible. A good upstream (like NumPy) may ask for a future cap (NumPy asks for +3 versions for large dependent packages).
4. If you are writing an extension for an ecosystem/framework (pytest extension, Sphinx extension, Jupyter extension, etc), then capping on the major version of that library is acceptable. Note this happens once - you have a single library that can be capped. You must release as soon as you possibly can after a new major release, and you should be closely following upstream - probably using development releases for testing, etc. But doing this for one library is probably manageable.
5. You are releasing two or more libraries in sync with each other. You control the release cadence for both libraries. This is likely the “best” reason to cap. Some of the above issues don’t apply in this case - since you control the release cadence and can keep them in sync.
6. You depend on private internal details of a library. You should also rethink your choices - this can be broken in a minor or patch release, and often is (pyparsing 3.0.5, for example).

If you cap in these situations, I wouldn’t complain, but I wouldn’t really recommend it either:

1. If you have a heavy dependency on a library, maybe cap. A really large API surface is more likely to be hit by the possible breakage.
2. If a library is very new, say on version 1 or a ZeroVer library, and has very few users, maybe cap if it seems rather unstable. See if the library authors recommend capping (reason 3 above) - they might plan to make a large change if it’s early in development. This is not blanket permission to cap ZeroVer libraries!
3. If a library looks really unstable, such as having a history of making big changes, then cap. Or use a different library. Even better, contact the authors, and make sure that your usage is safe for the near future.

All these are special cases, and are uncommon; no more than 1-2 of your dependencies should fall into the categories above. In every other case, do not cap your dependencies, especially if you are writing a library! You could probably summarize it like this: if there’s a high chance (say 75%+) that a dependency will break for you when it updates, you can add a cap. But if there’s no reason to believe it will break, do not add the cap; you will cause more severe (unfixable) pain than the breakage would.

If you have an app instead of a library, you can be cautiously slightly stricter, but not much. Apps do not have to live in shared environments, though they might.

Notice many of the above instances are due to very close/special interaction with a small number of libraries (either a plugin for a framework, synchronized releases, or very heavy usage). Most libraries you use do not fall into this category. Remember, library authors don’t want to break users who follow their public API and documentation. If they do, it’s for a special and good reason (or it is a bad library to depend on). They will probably have a deprecation period, produce warnings, etc.

If you do version cap anything, you are promising to closely follow that dependency, update the cap as soon as possible, follow beta or RC releases or the development branch, etc. When a new version of a library comes out, end users should be able to start trying it out. If they can’t, your library’s dependencies are a leaky abstraction (users shouldn’t have to care about what dependencies libraries use).

## Rapid updates can hide the problem

If a library author is very quick at updating their library when new releases come out (like rich), upper capping doesn’t cause immediate issues (though it does still interfere with testing development versions). However, as soon as those rapid updates stop, the library starts to decay much faster than a library without upper caps. The dependencies cannot “fix” this by releasing a new version with a backport of whatever they removed/changed because they didn’t realise someone was using it, either, because they are capped. This leads into the next reason to use caps.

## Planned obsolescence

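If you want old releases of your library to become unusable quickly once you stop updating it, upper caps will do that for you. This might happen eventually if you don’t limit, but it will happen much faster and with more assurance with hard limits. Remember, caps can’t be fixed by users.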

## Examples of acceptable caps

Numba pins llvmlite exactly, and puts a hard cap on Python (and recently, temporarily NumPy too). They control both Numba and llvmlite, so the pinning there is okay (reason 5 above).7 They use Python bytecode to decompile Python functions; this is an internal detail to Python, so every minor release is allowed to change bytecode (and does), so Numba must support each version manually (reason 6 above). They know the most recent version of NumPy is incompatible (reason 1 above). In both cases, they also put a check in setup.py, but remember, that only affects building Numba, so that works for Python, but may not work for normal dependencies like NumPy since normally users install wheels, not SDists, so setup.py does not run. Numba should (and will) release this pin as quickly as possible, because there are quite a few reasons to use NumPy 1.21, including it being the first NumPy to support Python 3.10.

I personally limit hist to the minor release of boost-histogram. I control both packages, and release them in sync; a hist release always follows a new minor release of boost-histogram. They are tightly coupled, but part of a family (reason 5 above). At this point, boost-histogram is likely stable enough even in internal details to ease up a bit, but this way I can also advertise the new boost-histogram features as new hist features. ;)

Many packages follow Flit’s recommendation and use requires = ["flit_core >=3.2,<4"] in the pyproject.toml build specification. This is reason 3 above; Flit asks you to do this. It’s also in the pyproject.toml, which by definition will never be “shared” with anything, it’s a new, disposable virtual environment that is created when building a wheel, and then thrown away, making it much more like an application requirement. However, if you only use the PEP 621 configuration for Flit, I see no reason to cap it; this is a published standard and isn’t going to change, so Flit 4 will not “break” usage unless a bug is introduced. And Flit actually now reflects this in the version limit recommendation!

### TensorFlow

Now let’s look at a bad upper limit and the mess it caused. TensorFlow used to put an upper cap on everything (note some of the comments there are wrong, for example, order does not matter to the solve). This was a complete mess. Several of the dependencies here are small little libraries that are not going to break anyone on updates, like wrapt and six. Probably the worst of all though is typing_extensions. This is a backport module for the standard library typing module, and it’s pinned to 3.7.x. First, new versions of typing_extensions are not going to remove anything at least for five years, and maybe not ever - this is a compatibility backport (the stdlib typing might be cleaned up after 5 years). Second, since this is a backport, setting a high lower bound on this is very, very common - if you want to use Python 3.10 features, you have to set a higher lower bound. Black, for example, sets 3.10.0 as the minimum. This is completely valid, IMO - if you have a backport package, and you want the backports from Python 3.10, you should be able to get them. Okay, so let’s say you run this:

python3 -m venv .venv
./.venv/bin/pip install black tensorflow


(or pretend that’s in a requirements.txt file for a project, etc - however you’d like to think of that). First, the resolver will download black 21.8b0. Then it will start downloading TensorFlow wheels, working its way back several versions - if you are on a limited bandwidth connection, be warned that each one is several hundred MB, so this is multiple GB to download. Eventually it will give up, and start trying older black versions. It will finally find a set that’s compatible, since older black versions don’t have the high pin, and will install that. Now try this:

python3 -m venv .venv
./.venv/bin/pip install black
./.venv/bin/pip install tensorflow


This will force typing-extensions to be rolled back, and then will be broken with:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
black 21.8b0 requires typing-extensions>=3.10.0.0, but you have typing-extensions 3.7.4.3 which is incompatible.


In other words, simply having black pre-installed will keep you from installing TensorFlow, even though they are completely unrelated. The reason? TensorFlow thinks it has to have typing_extensions 3.7 instead of 3.10, which is wrong. This is literally a standard library backport package (for typing)! With such strong pinning, TensorFlow was effectively an application, it could not play nicely with pretty much any other library.

Due to the problems this caused, TensorFlow has removed the upper caps on most dependencies, and you can now install it again with other libraries.
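If you are curious which packages in your own environment declare caps like these, a rough sketch using `importlib.metadata` follows. The regular expression is a deliberately simple heuristic of my own (it treats `<`, `<=`, `~=`, and `==` as caps and ignores environment markers after `;`), not a full requirement parser.

```python
import re
from importlib import metadata

# Heuristic: operators that put an upper limit on a dependency.
CAP = re.compile(r"(<=?|~=|===?)\s*[\w.*+!-]+")


def upper_caps(requirement):
    """Return the capping clauses found in one requirement string."""
    spec = requirement.split(";", 1)[0]  # drop environment markers
    return [m.group(0) for m in CAP.finditer(spec)]


def capped_dependencies():
    """Map each installed distribution to the requirements it caps."""
    report = {}
    for dist in metadata.distributions():
        capped = [req for req in (dist.requires or []) if upper_caps(req)]
        if capped:
            report[dist.metadata["Name"]] = capped
    return report
```

Calling `capped_dependencies()` in a virtual environment gives a quick, approximate inventory of where caps come from before you hit a confusing solve.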

### Packaging and PyParsing

Packaging is a foundational library for most of Python packaging. Everybody either depends on it (tox, cibuildwheel, pdm, etc) or vendors it (pip, Poetry, pipenv). Packaging has very few dependencies, but it does require pyparsing. In version 3, pyparsing changed the name of some tokens - but provided backward compatible names. Packaging worked just fine with version 3, but it affected the text of one error message that was being compared in the tests, so packaging capped pyparsing to <3, and then released packaging 21.2 with no other change (compared to 21.1) except this cap. This immediately started breaking things (like Google App Engine deployment, and other complaints stating “ton of dependency conflicts”). To be clear, it didn’t solve anything except one test inside packaging itself. Then pyparsing 3.0.5 was released with a change to an internal method name (starting with an underscore). This was used by packaging (bad), so the real limit was !=3.0.5 (pyparsing was nice and restored this for 3.0.6, though they could have said it was packaging’s fault - which it was). The correct fix is to not use a private implementation detail, which packaging fixed, but old versions still exist.
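To see how much more a cap blocks than an exclusion, here is a small sketch comparing the two. The version comparison is a simplified tuple comparison rather than real PEP 440 handling, and both the release list and the lower bound are illustrative assumptions.

```python
def parse(version):
    """Simplified version parsing: dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, lower=None, upper=None, excluded=()):
    """True if lower <= version < upper and version is not explicitly excluded."""
    v = parse(version)
    if lower is not None and v < parse(lower):
        return False
    if upper is not None and v >= parse(upper):
        return False
    return version not in excluded


# Illustrative pyparsing releases around the events described above.
RELEASES = ["2.4.7", "3.0.4", "3.0.5", "3.0.6"]

with_cap = [v for v in RELEASES if allowed(v, lower="2.0", upper="3")]
with_exclusion = [v for v in RELEASES if allowed(v, lower="2.0", excluded={"3.0.5"})]
```

The cap leaves only the 2.x release installable, while the exclusion blocks just the one release that actually broke and lets users move to 3.0.6 as soon as it appears.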

# TL;DR

Capping dependencies has long term negative effects, especially for libraries, and should never be taken lightly. A library is not installed in isolation; it has to live with other libraries in a shared environment. Only add a cap if a dependency is known to be incompatible or there is a high (>75%) chance of it being incompatible in its next release. Do not cap by default - capping dependencies makes your software incompatible with other libraries that also have strict lower limits on dependencies, and limits future fixes. Anyone can fix a missing cap, but users cannot fix an over restrictive cap causing solver errors. It also encourages hiding issues until they become harder to fix, it does not scale to larger systems, it limits your ability to access security and bugfix updates, and some tools (Poetry) force these bad decisions on your downstream users if you make them. Never cap Python, it is fundamentally broken at the moment. Also, even package capping has negative consequences that can produce unexpected solves.

Even perfect SemVer does not promise your usage will not be broken, and no library can actually perfectly follow SemVer anyway; minor versions and even patch versions are often more likely to break you than major versions for a well designed, stable library. You must learn to use a locking package system if you need application reliability - SemVer capping is not a substitute. Python has a culture of using deprecation warnings and slow transitions, unlike an ecosystem with a nested dependency system like npm. We saw that a realistic npm project would have 30+ version conflicts if it were flattened like Python - version capping does not scale when dependencies are shared. Provide an optional working set of fully pinned constraints if that’s important to you for applications - this is the only way to ensure a long term working set of dependencies (including for npm).

If you absolutely must set upper limits, you should release a new version as soon as possible with a higher cap when a dependency updates (ideally before the dependency releases the update). If you are committing to this, why not just quickly release a patch release with caps only after an actual conflict happens? It will be less common, and will help you quickly sort out and fix incompatibilities, rather than hiding your true compatibilities and delaying updates. You want users to use the latest versions of your libraries if there’s a problem, so why can’t you offer the same consideration to the libraries you depend on and use?

If you need a TL;DR for the TL;DR, I’ll just quote Python Steering Council Member and packaging expert Brett Cannon:

Libraries/packages should be setting a floor, and if necessary excluding known buggy versions, but otherwise don’t cap the maximum version as you can’t predict future compatibility

Also, this is not generalizable to systems that are able to provide unique versions to each package - like Node.js. These systems can avoid resolver conflicts by providing different versions locally for each package; and this creates different social expectations about acceptable changes and about LTS support for major versions. This is very, very different from a system that always solves for a shared single version. Those systems are also where the ^ syntax is much more useful. Some tools (like Poetry) seem to be trying to apply part of those systems (caret syntax, for example) without applying the local version feature, which is key to how they work. Having local (per dependency) copies of all dependencies solves many of the issues above and practically turns libraries into applications, though some of the arguments above still apply, like hiding incompatibilities until the changeset is very large.

# Acknowledgements

Thanks to Python steering council member Brett Cannon, Python core developer Paul Ganssle, fellow PyPA members Bernát Gábor, Pradyun Gedam and @layday, fellow RSE Troy Comi, and fellow IRIS-HEP member Alex Held for their comments on early drafts. Also I’d like to acknowledge the excellent article Why you shouldn’t invoke setup.py directly from Paul Ganssle for convincing me that a Proustian monstrosity of a post can be useful. All typos and mistakes are my own.

1. Poetry is prioritizing the truthfulness of the lock file here. If you make a lockfile (and Poetry always does) and a dependency pins python<3.10, then that lockfile will not load on Python 3.10. This is understandable, but there’s no way to set the Requires-Python metadata slot other than with this setting! If you are developing a library, you should not be forced to do this because of a lock file which is not even in the distribution. I’d rather a warning + a correct Python range only in the lockfile, or a way to set them separately, like with PEP 621 metadata support combined with the old specification. ↩︎

2. Fun fact: one shortcut includes checking to see if the latest version of everything is valid. This immediately is broken if there’s an upper cap that affects the solve. ↩︎

3. One common pushback here is that a smart dependency solver will get old versions, so updating is not pressing. But new libraries shouldn’t have to support really old versions of things just because they can’t live with libraries with old caps. Libraries shouldn’t have to keep pushing updates to old major/minor releases to support new hardware and Python versions, etc. So yes, you are “promising” to update rapidly if capped dependencies update. Otherwise, your library cannot be depended on. ↩︎

4. I’m obviously making assumptions about you, my reader, here. But I rather expect I am right. If not, I’d like you to join all my projects and start releasing old backports for all major versions for me. ;) ↩︎

5. In pip or Poetry. Conda can do this, because Python is more like a library there. But we aren’t discussing conda, and at least conda-forge has its own system, and it’s not tied to your normal packaging config at all, the package names may not even be the same, etc. ↩︎

6. This was done to provide this argument, not just to play with Python 3.10 pattern matching because I always work on libraries and don’t get to play with all the new toys… ↩︎

7. Though, as a maintainer for the conda-forge Numba package, I have to say it does make the update slower given there’s no wiggle room at all, and for some reason Numba seems to release before the matching llvmlite is available on PyPI. ↩︎