Working to make Python lazy

Python 3.15a7, which is now just a uv python install 3.15 away on all major platforms, has lazy imports! This exciting feature, proposed in PEP 810, promises to make CLI applications faster (especially when using flags like --help), and could speed up large codebases with lots of imports that don’t always get used. Unlike the earlier, failed attempt, this requires libraries to put in some work. I’ve developed a helper tool to make it easy; I’d like to cover what lazy imports are and how to use my tool. Since this is the first library I’ve written using AI heavily, the second half of the post will cover how that experience went.

What is a lazy import?

Imagine you have a file like this, with a standard Python argparse CLI:

import argparse
import numpy


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--foo", action="store_true")
    args = parser.parse_args()
    if args.foo:
        print(numpy.array([1, 2, 3]))

What happens if you run this with --help? The numpy library will be imported, even though it is never used. If you are using modern uv tooling, this can be even worse, since uv doesn’t pre-compile bytecode unless you ask it to; that makes the install faster, but imports are slower the first time.

The above is just one example; this can also happen when you have this common pattern:

# __init__.py
from . import a
from . import b

__all__ = ["a", "b"]

The idea behind this is that a user can write lib.a.stuff after just import lib, rather than needing import lib.a, but you pay the cost of importing every submodule even if the user never touches some of them. Some libraries, like rich, are careful to avoid this and ask users to import explicitly, but many older libraries did this.

And there are also libraries that can do multiple things (like CLI libraries with subcommands), but you don’t need the dependencies for every subcommand.

How to use Python 3.15’s lazy imports

Take the first example. In Python 3.15, you can now write:

lazy import argparse
lazy import numpy

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--foo", action="store_true")
    args = parser.parse_args()
    if args.foo:
        print(numpy.array([1, 2, 3]))

Now, both imports are “lazy”, meaning nothing happens at all when you import them. They might not even be installed. The first time you try to use the object, though, it becomes a real, imported object. So if you do --help, numpy is never accessed and never imported.
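The lazy keyword itself is new in 3.15, but the idea can be approximated on older Pythons with the standard library's importlib.util.LazyLoader (this is the documented stdlib recipe, not PEP 810's mechanism; the keyword is faster and better integrated):

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Import a module, deferring its execution until first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the deferral; the module body has not run
    return module


json = lazy_import("json")   # cheap: nothing has executed yet
data = json.dumps({"a": 1})  # first attribute access triggers the real import
```

Unlike PEP 810's lazy objects, this wrapper only defers one module at a time and still pays a find_spec cost up front, but the user-visible behavior is similar.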

There is also a backward-compatible syntax:

__lazy_modules__ = ["argparse", "numpy"]

import argparse
import numpy

This works on older Pythons (it’s just not lazy), and you can also dynamically generate or manipulate that list if you want. Linters like Ruff have already updated to allow this to be placed above your imports without triggering a lint violation.

I should mention there’s a flag and an environment variable to make Python treat all imports as lazy: -X lazy_imports=all and PYTHON_LAZY_IMPORTS=all (the other accepted values are normal and none). That’s mostly for testing.

Why not lazy?

Shouldn’t you just mark everything as lazy? Not quite. Some modules have side effects when you import them; if those side effects need to happen at the import site, those imports can’t be lazy. This pattern, for example, can’t be lazy:

try:
    import numpy
except ModuleNotFoundError:
    ...

If the import were lazy, the ModuleNotFoundError would no longer be raised here; it would move to the first usage of something from numpy, so the except clause would never run. There is a semi-lazy alternative:

import importlib.util

if importlib.util.find_spec("numpy") is None:
    ...  # whatever you wanted to do if numpy is missing

lazy import numpy

This is slightly more expensive than doing nothing at all (which is why lazy importing doesn’t do it for you). It also imports parent packages to get to subpackages (finding a.b imports a), and some kinds of import errors won’t trigger when just finding the spec (in the example above, numpy._core could be missing or broken if someone didn’t compile numpy correctly; that’s rare, though). Regardless, this is a pretty good way to check whether a package is installed.
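As a concrete sketch, here is a small helper (my own naming, not from any library) wrapping that check:

```python
import importlib.util


def is_installed(name: str) -> bool:
    """Report whether a module can be found, without importing it.

    Caveat: checking a dotted name like "a.b" still imports the parent "a".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in a dotted name is itself missing.
        return False


installed = is_installed("json")                      # True on any CPython
missing = is_installed("surely_not_a_real_module_x")  # False, nothing imported
```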

The other case you don’t need lazy is if you use something at top level. For example:

lazy import re

REGEX = re.compile(...) # not lazy here

Here, the lazy import is not needed, since you can’t process the file without importing this anyway. You can work around this by caching:

import functools
lazy import re

@functools.cache
def regex() -> re.Pattern[str]:
    return re.compile(...)
Or, with the backward-compatible syntax:

from __future__ import annotations

__lazy_modules__ = ["re"]

import functools
import re


@functools.cache
def regex() -> re.Pattern[str]:
    return re.compile(...)

Notice that the old version needs from __future__ import annotations; this is so the annotation doesn’t cause the re module to be loaded. Starting in Python 3.14, this is no longer necessary, as annotations became lazy by default in that version.
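To see the caching at work, here is a quick illustrative check (with a counter added purely for demonstration) that the pattern is compiled exactly once and reused:

```python
import functools
import re

compile_count = 0


@functools.cache
def regex() -> re.Pattern[str]:
    global compile_count
    compile_count += 1  # only for demonstration; counts actual compiles
    return re.compile(r"\d+")


first = regex()
second = regex()  # cache hit: returns the same compiled object
```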

You can make these sorts of imports lazy, but you are just moving the import errors for no good reason, so it’s a bit better not to.

If you want to make everything in a file lazy, you can do it like this:

class AllLazy:
    @staticmethod
    def __contains__(_: str) -> bool:
        return True


__lazy_modules__ = AllLazy()

This works because Python simply tests membership with in on full (absolute) module names, so you can supply your own object here. (The static tool below doesn’t look for this yet.)

A tool to help

So ideally, libraries should start adding these __lazy_modules__ declarations, but it’s a little more complex than just listing every module. So I wrote a tool, flake8-lazy, to help figure out exactly what to add, and to keep it tidy. This is the first library I’ve developed using AI tools heavily (I’ve started using them to help maintain plumbum, but that’s not from scratch), so I’ll end with a section about how that went (very well). I’ve developed flake8-errmsg in the past, so this isn’t my first plugin. Like that project, there’s also a built-in standalone runner; early in the 3.15 lifecycle, I rather expect that to be the main way to use it.

To use it:

uvx flake8-lazy <filename>
pipx run flake8-lazy <filename>

This will report the errors (noqa doesn’t work in the simple runner).

Here are the errors implemented in the first (0.1.0) version:

Code 1xx: Missing lazy declarations
LZY101 stdlib module should be listed in __lazy_modules__
LZY102 third-party or local module should be listed in __lazy_modules__

These try to find things that are not used at top level, and suggest they be added to your __lazy_modules__ (the lazy syntax works too). Currently, they assume annotations do not trigger an import (since flake8, unlike Ruff, doesn’t know the minimum Python version you are targeting, it can’t tell if it’s 3.14+ or not).

Code 2xx: __lazy_modules__ validation
LZY201 __lazy_modules__ is not sorted
LZY202 module listed in __lazy_modules__ is never imported
LZY203 module listed in __lazy_modules__ is duplicated
LZY204 __lazy_modules__ is assigned after importing modules it names
LZY205 module listed in __lazy_modules__ must be an absolute name

These look for general problems specifically with __lazy_modules__.

Code 3xx: Native lazy keyword (Python 3.15+)
LZY301 lazy import inside suppress(ImportError) is misleading
LZY302 module declared lazy by both lazy keyword and __lazy_modules__
LZY303 module imported both eagerly and lazily

These look for issues specific to Python 3.15+’s new syntax, and they require 3.15+ as the host Python as well. You can tell uv to use it already with --python=3.15.

Code 4xx: Lazy import safety and semantics
LZY401 module is declared lazy but accessed at the top level
LZY402 module is an enclosing package for this file and should not be lazy

LZY401 is the opposite of the LZY101/LZY102 checks, basically; if you access something at top level, you might as well not make it lazy. This might get moved to a 9xx check, as doing it isn’t actually harmful, and the check could be wrong.

Tips

Don’t apply this to test suites.

Look for opportunities to make things lazy even if they are not flagged here. The re example above is one. But also check what the imported libraries themselves pull in - one library may import another anyway (quite a few libraries import re, including typing, making that one really hard to avoid! re is pretty slow, too, sadly). You can investigate this with -X importtime. Anything that is lazy and never triggered will no longer show up in that output. You can force lazy imports off to see the difference, or force them on to see how much time you might save before starting.
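For reference, -X importtime prints a per-module timing tree to stderr; here is one way to capture it programmatically (just a sketch, timing a child interpreter importing json):

```python
import subprocess
import sys

# Run a child interpreter with import timing enabled; the report goes to stderr.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)

# Lines look like: "import time:  self [us] | cumulative | imported package"
timing_lines = [
    line for line in result.stderr.splitlines() if line.startswith("import time:")
]
```

A module made lazy (and never triggered) simply stops appearing in these lines.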

Type checkers always treat TYPE_CHECKING as True, so you can avoid importing typing with this trick:

TYPE_CHECKING = False
if TYPE_CHECKING:
    ...
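A fuller sketch of the trick: type checkers special-case a module-level name called TYPE_CHECKING and treat it as true, while at runtime the block is skipped, and the quoted annotation is never evaluated:

```python
TYPE_CHECKING = False  # type checkers treat this constant as True
if TYPE_CHECKING:
    from typing import Iterable  # seen only by the type checker, never at runtime


def total(numbers: "Iterable[int]") -> int:
    # The quoted annotation is not evaluated at runtime,
    # so nothing here forces typing to be imported.
    return sum(numbers)
```

On Python 3.14+, annotations are lazy by default, so the quoting is only needed for older versions.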

The __lazy_modules__ system is completely dynamic (it just needs a __contains__ method accepting absolute module names); the tool’s checks don’t handle anything dynamic here. The most common dynamic case, relative imports, can be handled like this:

__lazy_modules__ = [f"{__spec__.parent}.thing"]
from . import thing

Note that __package__ is the older form of __spec__.parent.

Results

I tried running this tool on its own source code, and managed to get the --help flag 2x faster on Python 3.15. On cibuildwheel, this managed a 3-4x speedup for things like --help and --print-build-identifiers.

There’s still a ways to go - there are lots of edge cases in trying to detect whether something forces an import to be resolved. For example, dataclasses resolve type hints to see if typing.ClassVar is used, which breaks laziness. It’s better to put too much into lazy than too little.

There’s also a big problem with this syntax:

from a import b

Is b a submodule (a.b) or just an attribute of a? Only a type checker knows (if the library is typed). This is the same problem again:

from . import b

I had to assume the right hand side is not a module, but if it is, it will be missed. You can use as to avoid this ambiguous syntax.
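One reading of that suggestion: a dotted import statement can only name a module, so import ... as ... is never ambiguous to a static tool. A stdlib example of the same ambiguity (using os.path purely for illustration):

```python
import inspect

# Ambiguous to a static tool: is `path` a submodule (os.path) or an attribute?
from os import path

# Unambiguous: a dotted `import ... as ...` statement can only name a module.
import os.path as os_path
```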


Developing the tool with AI

This was a really interesting project to try AI on, partially because this has never been done before. Lazy imports were added quite recently, were just released about a week ago for the first time in an alpha build of CPython, and have only been easily available in uv for three or so days. The AI can’t just be grabbing some existing code, because it doesn’t exist (I know that’s not how model training and validation works). It has to take my input, run tests, read the PEP, and “reason” from that. And it does. I used it on over 40 tasks, and it never failed to understand what I asked it to do. It didn’t “outsmart” me and do something smarter than I would have done, but it followed directions perfectly. Not only did I not hand-write more than about 5% of the code (mostly tweaks and configuration), but I haven’t followed through all the implementation details. It took less than a day for the initial version (I was doing other things too while the AI worked), and getting it into a usable form (by using it on libraries) happened over the next couple of days (again, off and on). This is probably 5-7x faster than I could have done it by hand.

I started with the Scientific Python Development Guide’s template, which has strong linting, formatting, and testing setup already - perfect for AI usage. I tried a few options I haven’t used before, like uv_build for the backend and the new Zensical documentation engine. I also ended up finding a few things that could be improved, and put them back into the template. I also increased the linting checks to ALL, then used uvx --from sp-repo-review[cli] sp-ruff-checks . to get a list of checks that are always good to ignore. I think a lot of the AI’s success came down to just how good this setup is.

For AI, I’m using GitHub Copilot, with auto model selection, primarily in VSCode (though I also later used the GitHub agent feature too to develop features in parallel). The model seemed to mostly be GPT-5.3-codex, though Claude Sonnet 4.6 was auto-selected sometimes too.

I didn’t add configuration at first, but once I filled up my first context window and wanted to start a new chat, I added a handwritten AGENTS.md and a copilot CI configuration (.github/workflows/copilot-setup-steps.yml). The focus of these was to make uv, prek, and nox available and instruct the tools to use them. This reduced my need to manually run these or tell the agent about them.

I ended up doing very little manual coding - most of my manual edits were setup or configuration. If I didn’t like the model output, I just would ask for it to make changes. I was quite explicit though in instructions; I’ve written a plugin for flake8 with a manual runner before, so I knew what I wanted. And I iterated a lot. For example, when adding better error messages for broken files, the model thought about adding Python 3.11 exception notes, but then did it a different way, due to Python 3.10 being the minimum. I asked it to instead do the notes, but gate it for 3.11.

The docs were initially written by the model too, though there I did quite a bit of editing as well. I don’t love the repetition between the docs and README, but the agent is pretty good at keeping them in sync: even if I edit one, it can fix the other to match. I don’t see a way to include one in the other with Zensical yet.

I even did things like ask it to rebase and solve the merge conflicts. It didn’t fail at anything, really. The worst it did was not always run the style checks, meaning I had to do one more thing once the CI caught the failing checks. But even that was pretty rare. I eventually asked it to refactor the really long __init__.py file, and it did that perfectly too; the only thing it didn’t do was re-apply __lazy_modules__.

I really couldn’t be much happier with the results. The agent was great at writing tests for everything it added, even without prompting. It would work through errors and warnings - I didn’t have to do any of that, which was fantastic. I ran it on the CPython source code and found an issue (the encoding wasn’t handled), so I just told the agent how to run it, and it found the issue and applied the correct fix (use the tokenizer rather than manually opening the file - a fix I would have taken much longer to find). I asked it to handle relative imports correctly (.), and it generalized to .., etc.

I also asked it to read PEP 810, and look for possible checks based on the text, which it did a great job with, and some of the checks are actually from those suggestions. I also asked it to come up with a better numbering scheme, which it also did.

Refactors were amazing. I could just make big changes, like reorganizing the numbering scheme, and it would just tinker for a while and then it was done. I tried making this multithreaded on free-threaded Python (the CLI runner, that is), but got pretty poor results; something is creating a single-threaded bottleneck, and I wasn’t able to find it quickly. But being able to try big things like that very easily was great. And this still wasn’t a “mistake”; it did exactly what I wanted.

Iteration on things that take time was a strong point. If you have a tool that reports problems rather than fixing them itself, the agent is really good at applying a fix (including complex things like typing) and rerunning. This makes linting tools even more valuable.

The code quality is not terrible, but handwritten would be better, I think. I’ve generally seen that - in the past, I’ve used AI for a quick first draft to see if something is performant, etc., but then done a handwritten implementation for the actual PR. AI can also do cleanup if you ask; telling it the minimum Python version, that you value modern readable code, and so on all helps. But that raises the question: if the tests and linting are strong, and if you use AI to edit it in the future, does human readability matter as much now? Also, does that lock you into using AI tools? (I made sure the quality wasn’t that bad, but these are interesting philosophical questions nonetheless.)

If you’d like to see what the work looked like, you can see the commit history and the GitHub Agent PRs. Overall, it’s quite incredible, comparing my attempts at AI early last year (just slop), last advent of code in TypeScript (great for learning a language, and actually pretty good at refactoring and helping), and now just 3 months later, where it’s really, really good. I’ve tried to get it to do pattern matching before; it was terrible, and now it nails it (still has to be asked, though). It’s still a tool that does what it’s told, but it’s gotten good at doing what it’s told. Combined with proper linting and testing setups, it’s a very good helper.

The skill set required to work with it, I believe, is the same. I am still doing the sort of high level things I’d do when designing a library. When I added generic typing to boost-histogram, I did one by hand, then told AI to do the rest following my example. I’m making the decisions, it’s just now a lot faster (as in, less of my time, I am doing other things while it’s working) to see the result of those decisions.

By the way, the 0x token models (I tried GPT 5 mini) work fine at taking the output of flake8-lazy --format=lazy-modules and applying it to a non-trivial codebase automatically. It’s a bit slow, but it works; I used that on cibuildwheel initially. So I added a --apply feature to the CLI in 0.4.0 to inject the lines.