Along with a pip (and now packaging) maintainer, Damian Shaw, I have
been working on making packaging, the library behind almost all
packaging-related tools, faster at reading versions and specifiers, something
tools like pip have to do thousands of times during resolution. Using Python
3.15’s new statistical profiler and metadata from every package ever uploaded
to PyPI, I measured and improved core packaging constructs while keeping the
code readable and simple. Reading in Versions can be up to 2x faster and
SpecifierSets can be up to 3x faster in packaging 26.0rc1, now released! Other
operations have been optimized as well, up to 5x in some cases.
Introduction
packaging is the core library used by most tools for Python to deal with many
of the standardized packaging constructs, like versions, specifiers, markers,
and the like. It is the 11th most downloaded library, but if you also take into
account that it is vendored into pip, meaning you get a (hidden) copy with every
pip install, it’s actually the 2nd most downloaded library. Given that pip ships
with Python, everyone who has Python has packaging, unless their
distro strips it out into a separate package; so it is possibly the most
common third-party Python library in the world.
In packaging, a Version is something that follows PEP 440’s version
standard. A SpecifierSet is a set of conditions on versions; think >=2,<3 or
~=1.0, those are SpecifierSets. They are used for dependencies, for
requires-python, etc. They are also part of Markers; that is, something like
tomli; python_version < '3.11' (a Requirement) contains a Marker.
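As a quick illustration of these objects in action (this assumes the packaging library is installed; the names are all from its public API):

```python
from packaging.requirements import Requirement
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# A SpecifierSet is a set of conditions; a Version can be tested against it.
spec = SpecifierSet(">=2,<3")
print(Version("2.5") in spec)  # True: 2.5 satisfies >=2,<3

# A Requirement can carry a Marker after the semicolon.
req = Requirement("tomli; python_version < '3.11'")
print(req.name)  # tomli
print(req.marker.evaluate({"python_version": "3.10"}))  # True
```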
I’d like to start by showing you the progress we’ve made as a series of plots; if you’d like to see how we made some of these, I’ll follow with in-depth examples.
Performance plots with asv
After most of the performance PRs were made, I finally invested a little time into making a proper set of micro-benchmarks with asv; I’ll be showing plots from that. Code for this is currently in a branch in my fork; it might eventually be either contributed or moved to a separate repo. The benchmarks are an optimized (trimmed down) version of the original code.
Plots were made using code in the source directory of my blog repository;
values are scaled by the 25.0 performance numbers, with a green line showing the
current performance after the changes we’ve been working on. I ran them with
Python 3.14 from uv (which is a bit faster than the one from homebrew) on an
entry-level M1 Mac Mini. The plot xscale is expanded after 25.0 to show the
current work.
This is the Version constructor. You can see the series of PRs described below
lowering the time to 0.5. Now, one of those steps was generating the comparison
tuple on first use instead of in the constructor, so the sorting
benchmark has taken on that cost:
Sorting isn’t slower than before; we’ve just moved some of the construction time over to the first time you compare a version. Inside pip, only around 30% of the versions constructed actually get compared, so this is a net savings.
I did play around with the idea of computing __lt__ and friends directly,
instead of making a tuple, caching it, then comparing that. But it seems Python
optimizes tuple comparison, and these get compared a lot when sorting, so even
though the custom method could exit early and save a little calculation, it
still was something like 5x slower.
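The lazy-key idea can be sketched as follows; this is my illustration, not packaging’s actual internals, and the key here is just the numeric parts rather than the full PEP 440 comparison tuple:

```python
class LazyKeyVersion:
    __slots__ = ("parts", "_key")

    def __init__(self, text: str) -> None:
        self.parts = tuple(int(p) for p in text.split("."))
        self._key = None  # built on first comparison, not in the constructor

    @property
    def key(self) -> tuple[int, ...]:
        if self._key is None:
            # packaging builds the full comparison tuple here; this toy
            # just reuses the numeric parts.
            self._key = self.parts
        return self._key

    def __lt__(self, other: "LazyKeyVersion") -> bool:
        # Plain tuple comparison: CPython optimizes this heavily, which is
        # why a hand-written field-by-field comparison lost badly.
        return self.key < other.key
```

sorted() only needs __lt__, so sorting a list of these builds each key exactly once, and versions that are never compared never pay for a key at all.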
Here you can see optimizations for __str__; we’ve mostly avoided the
Version -> str -> Version round-trip we used to do, but this still helps
third-party packages that do it.
Here we can see SpecifierSet’s construction time. In the past, there were two
major regressions: the first bump, around 2020, was a bugfix; the added logic is
needed for correctness. The second was the introduction of the nested
NamedTuple and some other slowdowns we have now fixed.
One of the most important operations on SpecifierSet is asking if a version is
contained in it. Here you can see that we’ve managed to get this over 2x faster.
Another core operation is .filter, which we’ve made about 5x faster. Most of
this was from caching the Version, avoiding repeated Version constructors.
Another constructor is Marker. The big jump in version 22 was moving to a
handwritten parser instead of pyparsing (which also isolated us from breakages
due to pyparsing changing their API, and removed our only dependency, too!), but
we’ve further improved this since 25.0 by dropping the regular expression
construction inside the constructor.
Evaluating Markers (to see if the Requirement passes a particular
environment) has also gotten faster. Most of that final drop is from avoiding
trying to parse everything as a Version, and instead just apply Version to
things that might be versions.
For reconstructing Requirement, this is similar to Marker (since it contains
them).
Here’s a microbenchmark of the canonicalize_name function, which we made 2x
faster by removing a regular expression substitution, using str.translate
instead.
For our final benchmark, this is a quick attempt at making a toy resolver. The one bump up is from a fix for proper PEP 440 handling of prereleases.
How it started
Now that you’ve seen what we’ve done, let’s look at how we got there.
This optimization work started when Damian Shaw made a PR to reduce
the number of Versions being created during specifier comparison operations,
with a note about how pip needed to create thousands of these. That got me
interested, and I started looking into why Versions were slow to create in the
first place. During the work, Kevin Turcios also got involved, looking for potential
slow operations using an AI tool he works on. Also huge thanks to Brett Cannon
for reviewing many of these PRs.
Measuring Version and atomic/possessive regex (3.11+ only)
The core of the Version object is a regular expression; the rules specified in
PEP 440 can be expressed as a regular expression. While most versions look like
1.2.3, there are a lot of optional parts; 2!1.2.3.dev1.post1+extra is also a
valid version (don’t try to upload it to PyPI, but it is valid as a Version!).
A regular expression is a natural way to express something like that, and
probably will be faster than lots of string manipulations; but regular
expressions have a reputation for being slow. Since I teach students in my APC 524
class at Princeton to always profile before they start to optimize,
I started by profiling, of course. Okay… I actually first worked on the regex,
because I knew it had to be slow. I used Python 3.11’s new atomic grouping and
possessive quantifiers to reduce backtracking; once you’ve matched a part, you
don’t need to go back and try other matches on the same part of the version.
This did make it faster, by something like 5%.
To measure this, I started by just asking ChatGPT for some versions valid in Python, it gave me 10 or so, then I multiplied that by a large number and that gave me something I could run. A little later, I downloaded the metadata for PyPI (about 10GB sqlite file), and read in every version published, filtering out invalid versions (PyPI used to not validate versions; it predates PEP 440 anyway!), and started using that (final benchmarking code is at the end). This also gave me a way to ensure that the same versions were being read; if the number of versions changed, then the regex was doing something differently.
Here’s the quick script:
import timeit

from packaging.version import Version

TEST_VERSIONS = [
    "1.0.0",
    "2.7",
    "1.2.3rc1",
    "0.9.0.dev4",
    "10.5.1.post2",
    "1!2.3.4",
    "1.0+abc.1",
    "2025.11.24",
    "3.4.5-preview.8",
    "v1.0.0",
] * 10_000


def bench():
    for v in TEST_VERSIONS:
        Version(v)


if __name__ == "__main__":
    t = timeit.timeit("bench()", globals=globals(), number=5)
    print(f"Time: {t:.4f} seconds")
Profiling Version
This result didn’t make sense; the regex was faster, so it should have had a
bigger impact on Versions as a whole. I decided to do what I should have done
first: profile. This was a perfect opportunity to try CPython 3.15.0’s new
statistical profiler that I’d been hearing about on the Core.Py podcast.
Since uv python install can install the 3.15 alphas, it was easy to get; I
didn’t even have to build anything. Since packaging doesn’t have any compiled
dependencies, everything worked smoothly with the alpha version.
To use it, something like this works on macOS:
sudo -E uv run --python 3.15 python -m profiling.sampling tasks/benchmark_version.py
It might trigger an install while sudo is active, which means you’ll have to
clear uv’s cache and Python installs with sudo as well, but it got me going.
The textual output was nice, and the html output was great; for a zero-setup profile (well, once Python 3.15 is out), this is fantastic.
Here’s what it looked like:
Textual output (click to expand)
$ sudo -E uv run --python 3.15 python -m profiling.sampling tasks/benchmark_version.py
Time: 1.3528 seconds
Per version: 2.705616084 µs
Captured 13646 samples in 1.36 seconds
Sample rate: 10000.01 samples/sec
Error rate: 20.57%
Profile Stats:
nsamples sample% tottime (ms) cumul% cumtime (s) filename:lineno(function)
1/10703 0.0 0.100 99.4 1.070 _sync_coordinator.py:193(_execute_script)
0/10703 0.0 0.000 99.4 1.070 _sync_coordinator.py:234(main)
0/10703 0.0 0.000 99.4 1.070 _sync_coordinator.py:251(<module>)
0/10703 0.0 0.000 99.4 1.070 <frozen runpy>:88(_run_code)
0/10703 0.0 0.000 99.4 1.070 <frozen runpy>:198(_run_module_as_main)
0/10661 0.0 0.000 99.0 1.066 <timeit-src>:6(inner)
0/10661 0.0 0.000 99.0 1.066 timeit.py:183(Timer.timeit)
0/10661 0.0 0.000 99.0 1.066 timeit.py:240(timeit)
0/10661 0.0 0.000 99.0 1.066 benchmark_version.py:25(<module>)
670/10660 6.2 67.000 99.0 1.066 benchmark_version.py:21(bench)
82/9990 0.8 8.200 92.7 0.999 __init__:0(__init__)
2613/2623 24.3 261.300 24.4 0.262 version.py:201(Version.__init__)
951/2106 8.8 95.100 19.6 0.211 version.py:218(Version.__init__)
1660/1813 15.4 166.000 16.8 0.181 version.py:208(Version.__init__)
1068/1151 9.9 106.800 10.7 0.115 version.py:206(Version.__init__)
Legend:
nsamples: Direct/Cumulative samples (direct executing / on call stack)
sample%: Percentage of total samples this function was directly executing
tottime: Estimated total time spent directly in this function
cumul%: Percentage of total samples when this function was on the call stack
cumtime: Estimated cumulative time (including time in called functions)
filename:lineno(function): Function location and name
Summary of Interesting Functions:
Functions with Highest Direct/Cumulative Ratio (Hot Spots):
0.818 direct/cumulative ratio, 58.4% direct samples: version.py:(Version.__init__)
0.063 direct/cumulative ratio, 6.2% direct samples: benchmark_version.py:(bench)
0.008 direct/cumulative ratio, 0.8% direct samples: __init__:(__init__)
Functions with Highest Call Frequency (Indirect Calls):
10703 indirect calls, 99.4% total stack presence: _sync_coordinator.py:(main)
10703 indirect calls, 99.4% total stack presence: _sync_coordinator.py:(<module>)
10703 indirect calls, 99.4% total stack presence: <frozen runpy>:(_run_code)
Functions with Highest Call Magnification (Cumulative/Direct):
10703.0x call magnification, 10702 indirect calls from 1 direct: _sync_coordinator.py:(_execute_script)
121.8x call magnification, 9908 indirect calls from 82 direct: __init__:(__init__)
15.9x call magnification, 9990 indirect calls from 670 direct: benchmark_version.py:(bench)
(The HTML version has line numbers and more info.) That’s not what I expected at all. While you can see the regex (first blue section on the left), it’s not dominating; there’s a bunch of other stuff nearly as large as the regex.
Speedups
Stripping 0’s: 10% speedup
The first speedup I saw was this line:
_release = tuple(
    reversed(list(itertools.dropwhile(lambda x: x == 0, reversed(release))))
)
That’s terrible: it generates tons of small lists and immediately drops them. I went with a version that is very fast, making this line 20x faster and dropping it off the profile. You can do something in between that is more readable, but this was a few percent faster and readable enough.
def _strip_trailing_zeros(release: tuple[int, ...]) -> tuple[int, ...]:
    for i in range(len(release) - 1, -1, -1):
        if release[i] != 0:
            return release[: i + 1]
    return ()
This sped reading versions up by about 10% in my benchmark, and by about 40% in pip’s resolver.
Faster Regex (10-17% faster, 3.11+ only)
I did go ahead and make the regex PR. I dropped atomic groups; just using possessive quantifiers got the speedup I wanted, and it was easier to strip them out to support older versions of Python with the same single regex string. The 10-17% speedup might not seem like a lot, but I still planned to remove a lot of the other things that were keeping the regex from dominating.
To do this, * becomes *+, and ? becomes ?+. You just need to be careful
to only apply it where backtracking is not needed, like between each group.
Inside a group, there are cases where you might need to backtrack. To support
older Python versions, PATTERN.replace("*+", "*").replace("?+", "?") can be
used to strip this back out (atomic groups are harder to strip out).
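Here is a small self-contained illustration of the technique; the pattern is a made-up mini version matcher, not packaging’s real one:

```python
import re
import sys

# "*+" and "?+" are possessive quantifiers (Python 3.11+): once the
# quantified group has matched, the engine will not backtrack into it.
VERSION_PATTERN = r"v?+(?:\d+!)?+\d+(?:\.\d+)*+"
if sys.version_info < (3, 11):
    # Older interpreters reject possessive quantifiers, so strip them.
    VERSION_PATTERN = VERSION_PATTERN.replace("*+", "*").replace("?+", "?")

VERSION_RE = re.compile(VERSION_PATTERN)

print(bool(VERSION_RE.fullmatch("v2!1.0.3")))  # True
print(bool(VERSION_RE.fullmatch("1.2.3rc1")))  # False: trailing "rc1"
```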
I also cleaned up the regex code a bit, using fullmatch instead of search
with anchors, which also seemed a little (1%) faster, though that could be
within measurement uncertainty.
Note that if you are trying to speed up anything except packaging itself, you
can add the regex library from PyPI, which supports these features on
older Python versions too. The packaging library can’t have dependencies,
especially compiled ones.
SpecifierSet: Removing singledispatch (7% faster)
I noticed another slow part in the flamegraph was canonicalize_version, which
used a functools.singledispatch instead of an if statement; while I love
singledispatch for a very specific style of programming, this isn’t a good use
of it, and it’s slow. The function is now simpler, and faster.
This is basically what it was doing:
# Bad pattern
@functools.singledispatch
def f(x: Version | str) -> str:
    return str(_TrimmedRelease(str(x)))


@f.register
def _(x: str) -> str:
    return f(Version(x))
Notice how the dispatched function calls the generic function, and the types overlap. Those are signs that singledispatch shouldn’t be used here. A better version would be:
def f(x: Version | str) -> str:
    if isinstance(x, str):
        x = Version(x)
    return str(_TrimmedRelease(str(x)))
I don’t want to give singledispatch a bad reputation; see uproot-browser for
a good use, where I use it to register different data types that have a known
plotting mechanism. It’s just the wrong tool here, and also not great when
performance is critical.
However, that wasn’t the only problem with this function; it was creating
Versions more than once (including inside _TrimmedRelease). Remember,
creating a Version runs a regex!
SpecifierSet: remove duplicate Version creation (37% faster)
Inside canonicalize_version, there was another issue: the same version was
created twice, once via a subclass (_TrimmedRelease) that behaves differently
when turned into a string (dropping trailing zeros). I instead
reworked the classes so you could create the subclass directly, without
going through a string. _TrimmedRelease(version) now avoids the string
intermediate if version is a Version.
Now the function looks something like this:
def f(x: Version | str) -> str:
    if isinstance(x, str):
        x = Version(x)
    return str(_TrimmedRelease(x))
Removing NamedTuple (20% faster)
Version had an interesting design: it contained a _Version NamedTuple with
all of its fields. Now that we added caching, the outer Version had one more
field, but otherwise it was redundant. Creating and using a NamedTuple is
expensive, and accessing fields by name has a cost. This might have been done to
ensure the object was not writeable, but that can be done without the
NamedTuple using properties. Removing it gave a 20%
speedup to construction; accessing values and even turning the version into a
string also got faster.
I was not able to find anyone using the hidden ._version attribute using
GitHub’s code search; if that does break someone, we can always generate the
NamedTuple on demand, but we’ll only do that if we have to.
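The properties approach can be sketched like this; the class and field names are illustrative, not packaging’s real layout:

```python
class MiniVersion:
    # __slots__ means no instance __dict__; data lives in private slots.
    __slots__ = ("_epoch", "_release")

    def __init__(self, epoch: int, release: tuple[int, ...]) -> None:
        self._epoch = epoch
        self._release = release

    @property
    def epoch(self) -> int:  # read-only: no setter is defined
        return self._epoch

    @property
    def release(self) -> tuple[int, ...]:
        return self._release
```

Attempting to assign to one of the properties raises AttributeError, so the object stays effectively immutable without the nested NamedTuple.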
Map instead of generator (8% faster)
Other slow lines were the ones that look like this:
release = tuple(int(i) for i in match.group("release").split("."))
That generator is rather expensive. You can save a little time by using a list
comprehension instead (tuple([...]) instead of tuple(...)) for small tuples,
but I found that this:
release = tuple(map(int, match.group("release").split(".")))
was similar in speed to the list comprehension, and it’s both nicer than adding
the extra brackets and already used elsewhere in the code, so moving to it
saved about 8%. Note that tuple([ ... for ... in ... ]) is likely only faster
when the thing you are iterating over is small.
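All three variants produce identical tuples; the difference is purely in construction overhead:

```python
s = "1.2.0.0.5"
gen = tuple(int(i) for i in s.split("."))    # generator: slowest here
lst = tuple([int(i) for i in s.split(".")])  # list comp: faster for small inputs
mapped = tuple(map(int, s.split(".")))       # map: similar to the list comp
print(gen == lst == mapped)  # True
```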
Using replacement to get new versions
A couple of PRs that Damian started and we both worked on added __replace__
support, then used it to replace some
Version -> str -> Version sequences inside SpecifierSet. It would have been
nice if the API of Version returned Version instead of str for some
methods like .public, but that’s a breaking change. If you are using something
like Version(version.public) in a performance-critical path, you can use
__replace__ (copy.replace on Python 3.13+) instead, which will be much faster
than reparsing the Version. This mostly speeds up comparison, which I’m not
usually benchmarking, but it is critical for users like pip.
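The protocol itself is simple; here is a toy demonstration on a made-up class (copy.replace dispatches to __replace__ on Python 3.13+, and calling __replace__ directly works everywhere it is defined):

```python
class Spec:
    """Toy immutable record implementing the __replace__ protocol."""

    __slots__ = ("op", "version")

    def __init__(self, op: str, version: str) -> None:
        self.op = op
        self.version = version

    def __replace__(self, **changes: str) -> "Spec":
        # Copy the current fields, then override the requested ones.
        fields = {"op": self.op, "version": self.version, **changes}
        return Spec(**fields)


s = Spec(">=", "1.2")
t = s.__replace__(version="2.0")  # no reparsing, just field replacement
print(t.op, t.version)  # >= 2.0
```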
Using slots (2% faster)
This isn’t much of an improvement for Version or
SpecifierSet (maybe more on older Python versions), but using
__slots__ is a good idea: it can reduce memory, and it makes the class stricter
as well, since it disallows setting an unknown attribute. Key-sharing
dictionaries in newer versions reduce the savings, but it’s still nicer.
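The strictness is easy to see: with __slots__, instances have no __dict__, so a typo'd attribute assignment fails loudly instead of silently creating a new attribute:

```python
class Slotted:
    __slots__ = ("name", "value")

    def __init__(self, name: str, value: int) -> None:
        self.name = name
        self.value = value


obj = Slotted("epoch", 0)
try:
    obj.vlaue = 1  # misspelled on purpose
except AttributeError:
    print("caught the typo")
```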
Speedups inspired by Codeflash
Kevin Turcios used his tool, codeflash.ai, to look for possible speedups. I
reviewed the ones it found and implemented a version of three of them: I moved
set construction out of a function, I used .partition
instead of .split (probably not faster, but nicer), and I used a dict
to handle alternate spellings instead of a series of ifs. The tool reported the
speedup of the test function, but that’s not representative of real work; check
the PRs if you’d like to see the values. I also came up with different
solutions, so the tool’s numbers aren’t relevant enough to show here.
Here’s an example:
# Before
parts = [p.strip() for p in pair.split(",", 1)]
parts.extend([""] * (max(0, 2 - len(parts)))) # Ensure 2 items
label, url = parts
# After
label, _, url = (s.strip() for s in pair.partition(","))
Another one pulled set construction outside a function (making a set is expensive unless you use it inline; if it’s static, just make it once).
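A sketch of the set-hoisting idea, with made-up names; a set literal bound to a local name is rebuilt on every call, while a module-level frozenset is built once at import time:

```python
def is_pre_tag_slow(tag: str) -> bool:
    # This set is constructed anew on every call.
    pre_tags = {"a", "b", "c", "rc", "alpha", "beta", "pre", "preview"}
    return tag in pre_tags


# Built once when the module is imported, shared by all calls.
_PRE_TAGS = frozenset({"a", "b", "c", "rc", "alpha", "beta", "pre", "preview"})


def is_pre_tag_fast(tag: str) -> bool:
    return tag in _PRE_TAGS
```

(An in-line literal like `x in {"a", "b"}` is the exception: CPython folds that into a frozenset constant, which is the "use it inline" case mentioned above.)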
Other speedups
Damian also implemented a series of speedups related to reducing unnecessary object creation, such as making some computation lazy, caching related versions, avoiding redundant Version creation, and using the cache in more places. These aren’t less important than mine; it’s just that I’m writing the blog post and have more to say about my own. :) Also, since his work focused on making pip’s resolver faster, some of the speedups relate to comparisons and containment checks, which won’t show up in my simple profiling.
For his resolver benchmark, pip was originally creating Versions over 4.8
million times, and combined with changes he is also making to pip, it’s now
under 400 thousand.


There also was a speedup found by Shantanu Jain, which speeds up
Requirement parsing by 3x by moving regex construction out of the constructor.
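The shape of that fix can be illustrated like this; the classes and pattern are made up, not the real Requirement parser:

```python
import re


class SlowReq:
    def __init__(self, text: str) -> None:
        # Building the pattern inside the constructor: re.compile does
        # cache internally, but you still pay assembly and cache-lookup
        # overhead on every single construction.
        name_re = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
        self.name = name_re.match(text).group(0)


# Compiled once at import time, shared across all instances.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


class FastReq:
    def __init__(self, text: str) -> None:
        self.name = _NAME_RE.match(text).group(0)
```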
After implementing the asv-based benchmarks, I also worked on speedups for
Marker and Requirement. One of the more impactful changes was
replacing a regular expression substitution with str.translate, doubling
the performance of canonicalize_name. I also inlined the __str__ code for
Version, using f-strings instead of joining lists, which gave a 10% speedup.
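The translate idea can be sketched as follows; this is my hedged reconstruction of the technique, not packaging’s exact code. PEP 503 normalization lowercases a name and collapses runs of `-`, `_`, and `.` into a single `-`:

```python
import re

_RUNS_RE = re.compile(r"[-_.]+")
_TABLE = str.maketrans("_.", "--")


def canonicalize_name_sketch(name: str) -> str:
    # Fast path: per-character translation plus lowercasing.
    value = name.translate(_TABLE).lower()
    # translate can't collapse runs, so fall back to the regex only
    # when a run actually appears (rare for real project names).
    if "--" in value:
        value = _RUNS_RE.sub("-", value)
    return value


print(canonicalize_name_sketch("Foo.Bar_baz"))  # foo-bar-baz
print(canonicalize_name_sketch("typing__extensions"))  # typing-extensions
```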
The flamegraph now looks much better; the regex (in blue above) dominates, and
parts like splitting strings into . separated integers probably can’t get
faster outside of compilation. There might be a bit more to gain, but we’ve done
pretty well.
Final performance numbers
Comparing packaging 25.0 and the main branch on Python 3.14, reading every
version on PyPI went from 19.6 seconds to 9.9 seconds, a nearly 2x speedup.
Reading every requires-python and checking if the current version of Python
passes went from 105 seconds to 33.9 seconds, a 3x speedup. I actually do this
all the time when I’m running analysis on build backends to monitor adoption;
those run about two times faster on packaging main.
We have made an RC release, and hope to make a full release in about a week;
some other work on improving our handling of standards around markers could
cause a delay, but it should happen soon. A lot of other things are in the
release as well: support for pattern matching, support for pylock files,
support for import name metadata, support for writing metadata to a file, and
lots of expanded linting and type verification in our codebase.
The last change required a small fix to the standard: packaging has never
followed the marker specification exactly, but the standard itself was a bit
broken, requiring every value to be tried as a Version, even values that
were not version-like at all. Fixing this allows a speedup that uv
already implements. Waiting for that approval is all that’s left for a final
release of the new packaging (as well as waiting a bit for any bugs in the RC to
be reported by you)! Please test the RC and make sure it works for you.
I don’t know about you, but I’m very excited for the fastest release of
packaging yet! Please try 26.0rc1 and tell us if there are any
regressions!
Thanks to Kevin Turcios, Brett Cannon, and Damian Shaw for reviewing this post before publication.
Benchmark scripts (click to expand)
# benchmark_versions.py
import sqlite3
import timeit

from packaging.version import Version, InvalidVersion

# Get data with:
# curl -L https://github.com/pypi-data/pypi-json-data/releases/download/latest/pypi-data.sqlite.gz | gzip -d > pypi-data.sqlite


def valid_version(v: str) -> bool:
    try:
        Version(v)
    except InvalidVersion:
        return False
    return True


with sqlite3.connect("pypi-data.sqlite") as conn:
    TEST_ALL_VERSIONS = [
        row[0]
        for row in conn.execute("SELECT version FROM projects")
        if valid_version(row[0])
    ]


def bench():
    for v in TEST_ALL_VERSIONS:
        Version(v)


if __name__ == "__main__":
    print(f"Loaded {len(TEST_ALL_VERSIONS):,} versions")
    t = timeit.timeit("bench()", globals=globals(), number=1)
    print(f"Time: {t:.4f} seconds")
# benchmark_specifiers.py
import sqlite3
import timeit

from packaging.specifiers import SpecifierSet, InvalidSpecifier
from packaging.version import Version

# Get data with:
# curl -L https://github.com/pypi-data/pypi-json-data/releases/download/latest/pypi-data.sqlite.gz | gzip -d > pypi-data.sqlite


def valid_spec(v: str) -> bool:
    try:
        SpecifierSet(v)
    except InvalidSpecifier:
        return False
    return True


with sqlite3.connect("pypi-data.sqlite") as conn:
    TEST_ALL_SPECS = [
        row[0]
        for row in conn.execute("SELECT requires_python FROM projects")
        if row[0] and valid_spec(row[0])
    ]


def bench():
    ver = Version("3.14.2")
    for v in TEST_ALL_SPECS:
        SpecifierSet(v).contains(ver)


if __name__ == "__main__":
    print(f"Loaded {len(TEST_ALL_SPECS):,} specs")
    t = timeit.timeit("bench()", globals=globals(), number=1)
    print(f"Time: {t:.4f} seconds")