Scripting in Bash is a pain. Bash can do almost anything, and is unbeatable for small scripts, but it struggles when scaling up to anything close to a real-world scripting problem. Python is a natural choice, especially for the scientist who is already using it for analysis. But it’s much harder to do basic tasks in Python. So you are left with scripts that start out as Bash scripts, become a mess, and are then (usually poorly) ported to Python or, even worse, run by a Python script. I’ve seen countless Python scripts that run Bash scripts that run real programs. I’ve even written one or two. It’s not pretty.
I recently came (back) across Plumbum, a really powerful library for writing efficient command line scripts in Python. It contains a set of tools that make the four (five with color) main tasks of command line scripts simple and powerful. I will also go over the one main drawback of the library (and a possible enhancement!).
Note: The colors module is new to Plumbum in 1.6.0.
Local commands
The first and foremost part of the library is a replacement for Python’s popen, subprocess, etc. I’ll compare the “correct, current” Python standard library method and Plumbum’s method.
Basic commands
Our first task will simply be to get our feet wet with a simple command. Let’s run echo to print a string. This is easy with subprocess.call:
import subprocess
subprocess.call(["echo", "I am a string"])
0
What just happened? The result, zero, was the return code of the call. The output of the call went to stdout, so if we were in a terminal, we would have seen it output (and in an IPython notebook, it will show up in the terminal that started the notebook). This might be what we want, but probably we wanted the value of the output. That would be subprocess.check_output:
subprocess.check_output(["echo", "I am a string"])
b'I am a string\n'
As you can already see, this not only requires different calls for different situations, but it even gave a bytes string (which is technically correct, but almost never what you want for a shell script). The different calls exist because they are shortcuts to the underlying subprocess Popen object. So we really need:
p = subprocess.Popen(
    ["echo", "I am a string"],
    shell=False,
    bufsize=512,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
outs, errs = p.communicate()
outs
b'I am a string\n'
As you can guess, this is only a small smattering of the options you can pass (not all were needed for this call), but it gives you an idea of what is needed to work with subprocess.
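For readers on newer Python versions (3.7 and later), subprocess.run covers both cases above in a single call. The following is a minimal sketch rather than a recommendation; it uses the running Python interpreter as a stand-in for echo, so that it does not depend on which commands are installed:

```python
import subprocess
import sys

# Use the current interpreter as a portable stand-in for "echo".
cmd = [sys.executable, "-c", "print('I am a string')"]

# capture_output=True collects stdout and stderr; text=True decodes bytes to str.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)

print(result.returncode)    # return code of the child process
print(repr(result.stdout))  # decoded text rather than a bytes string
```

This is still more to type than the Plumbum version shown next, but it does avoid the bytes string.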
Let’s look at Plumbum. First, let’s see the fastest method to get a command:
from plumbum import local, FG, BG, TF, RETCODE
echo = local["echo"]
echo("I am a string")
'I am a string\n'
Here, we have a local object, which represents the computer. It acts like a dictionary; if you put in a key, you get the command that would be called if you run that command in a terminal. Let’s look at the object we get:
echo
LocalCommand()
Now this is a working Python object and can be called like any Python function! In fact, it can access almost all of the details and power of the Popen object we saw earlier. If you don’t like to repeat yourself, there is a magic shortcut for getting commands:
from plumbum.cmd import echo
There is no echo command in a cmd.py file somewhere; this dynamically does exactly what we did, calling ['echo'] on the local object. This is quicker and simpler, but it is good to know what happens behind the scenes!
Plumbum also allows you to add arguments to a command without running the command; as you will soon see, this allows you to build complex commands just like bash. If you use square brackets instead of parentheses, the command doesn’t run yet (Haskell users: this is currying; Pythonistas will know it as partial).
echo["I am a string"]
BoundCommand(LocalCommand(), ['I am a string'])
When you are ready, you can call it:
echo["I am a string"]()
'I am a string\n'
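For Pythonistas who have not met partial before, here is the same idea using only the standard library. This is an analogy, not Plumbum code: the run helper is a hypothetical name made up for this sketch, and the current Python interpreter stands in for echo.

```python
import subprocess
import sys
from functools import partial


def run(*args):
    """Run the current Python interpreter with the given arguments; return its output as text."""
    return subprocess.check_output([sys.executable, *args], text=True)


# Bind the leading arguments now; nothing is executed yet.
say = partial(run, "-c", "import sys; print(sys.argv[1])")

# Supplying the remaining argument and calling it runs the command.
output = say("I am a string")
print(repr(output))
```

Binding with square brackets and then calling with parentheses in Plumbum follows the same two-step pattern.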
Or, you can run it in the foreground, so that the output is sent to the current terminal as it runs (this is the subprocess.call equivalent from the beginning, although non-zero return values are not handled in the same way):
from plumbum import FG
echo["I am a string"] & FG
Complex commands (piping)
Stdin
Now, how about feeding a Python text string to a command as input? As an example, let’s use the unix dc command. It is a desk calculator with reverse Polish notation syntax.
from plumbum.cmd import dc
We can call it using the -e flag followed by the calculation we want to perform, like 1 + 2. We already know how to do that:
dc("-e", "1 2 + p")
'3\n'
But it can also be run without this flag. If we do that, we can then type (or pipe) text in from the bash shell. With subprocess, we don’t have a shortcut, so we have to use Popen, manually setting stdin and stdout to subprocess.PIPE, and then communicate in bytes.
proc = subprocess.Popen(["dc"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
outs, errs = proc.communicate("1 2 + p".encode("ascii"))
outs
b'3\n'
(dc << "1 2 + p")()
'3\n'
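Newer versions of the standard library narrow this gap somewhat: subprocess.run accepts an input argument, and with text=True (Python 3.7 and later) it takes and returns str. A minimal sketch, again using the Python interpreter as a stand-in child process instead of dc so that it runs anywhere:

```python
import subprocess
import sys

# Stand-in child process that upper-cases whatever arrives on stdin.
child = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]

# input= feeds the child's stdin; text=True lets us pass and receive str, not bytes.
result = subprocess.run(child, input="1 2 + p", capture_output=True, text=True)
print(repr(result.stdout))
```

The << operator in Plumbum plays the role of input= here, while keeping the command composable.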
Piping
Of course, in bash we can pipe text from one command to another. Let’s compare that (not going to even try the subprocess call here).
!echo "1 2 + p" | dc
3
(echo["1 2 + p"] | dc)()
'3\n'
print(echo["1 2 + p"] | dc)
/bin/echo '1 2 + p' | /usr/bin/dc
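For reference, a subprocess version of a two-stage pipe might look roughly like the sketch below. The two stages are stand-ins written with the Python interpreter (one prints numbers, the other sums what it reads), so this is not the echo and dc pair from above, just the same plumbing:

```python
import subprocess
import sys

# Two stand-in commands: one prints numbers, the other sums what it reads on stdin.
producer = [sys.executable, "-c", "print('1 2')"]
consumer = [
    sys.executable,
    "-c",
    "import sys; print(sum(int(x) for x in sys.stdin.read().split()))",
]

p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
p1.stdout.close()  # drop our copy so p1 notices if p2 exits early
piped_output, _ = p2.communicate()
p1.wait()

print(repr(piped_output))
```

Every extra stage adds another Popen call and another handle to close, which is why the single | in Plumbum is so pleasant.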
Background execution
One of the great things about Bash is the ease of “simple” multithreading; you can start a command in the background using the & character. To test this, we need a long running command that returns a value. In bash, we can make this using the following function:
$ fun () { sleep 3; echo 'finished'; }
$ fun
finished
$ fun &
[1] 6210
$finished
[1]+ Done fun
Here, when we ran it in the foreground, it held up our terminal until it finished. The second time we ran it, it gave us back our terminal, but we were interrupted three seconds later with text from the process. If we wanted to interact with the process, or wait for it to finish, etc., we could use $! to get the pid of the last spawned process, and then use wait to wait on the pid (see git-all.bash for an example).
This simplicity is not usually something that is easy to emulate in a programming language. Let’s see it in plumbum. Here, I’m piping sleep (which doesn’t print anything) to echo, just so I can get a slow running command, and I’m using IPython’s time magic to measure the time taken:
%%time
sleep = local['sleep']
sleep_and_print = sleep['3'] | echo['hi']
print(sleep_and_print())
hi
CPU times: user 5.75 ms, sys: 9.25 ms, total: 15 ms
Wall time: 3.01 s
%%time
bg = sleep_and_print & BG
CPU times: user 3.94 ms, sys: 7.45 ms, total: 11.4 ms
Wall time: 20.4 ms
Now, bg is a Future object that is attached to the background process. We can call .poll() on it to see if it’s done or .wait() to wait until it returns. Then, we can access the stdout and stderr of the command. (stdout, etc. will automatically wait() for you, so you can use them directly.)
%%time
print(bg.stdout)
hi
CPU times: user 2.14 ms, sys: 1.76 ms, total: 3.9 ms
Wall time: 2.73 s
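The same start, poll, then collect pattern can be sketched with the standard library, which may help if you want to see what the Future is hiding. This is an illustration with a stand-in slow command (the Python interpreter sleeping for half a second), not a description of how Plumbum implements BG:

```python
import subprocess
import sys

# Stand-in for a slow command: sleep briefly, then print.
slow = [sys.executable, "-c", "import time; time.sleep(0.5); print('hi')"]

# Popen returns as soon as the child is started, much like `& BG`.
proc = subprocess.Popen(slow, stdout=subprocess.PIPE, text=True)

# poll() returns None while the child is still running.
still_running = proc.poll() is None

# communicate() blocks until the child finishes and returns its output.
out, _ = proc.communicate()
print(still_running, repr(out), proc.returncode)
```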
Remote commands
Besides local commands, Plumbum provides a remote class for working with remote machines via SSH in a platform-independent manner. It works much like the local object, and will use the best available system, including Paramiko, to run the processes. I haven’t moved my scripts from pure Paramiko to Plumbum yet, but only having to learn one procedure for both local and remote machines is a huge plus (and Paramiko is fairly ugly to program in, like subprocess).
Command Line Applications
Command line applications in Python already have one of the best toolkits available, argparse (C++’s Boost Program Options library is a close second). However, after seeing the highly pythonic Plumbum cli module, argparse feels repetitive and antiquated.
Let’s look at a command line application that takes a couple of options. In argparse, we would need to do the following:
%%writefile color_argparse.py
import argparse

def main():
    parser = argparse.ArgumentParser(description='Echo a command in color.')
    parser.add_argument('-c', '--color', type=str,
                        help='Color to print')
    parser.add_argument('echo',
                        help='The item to print in color')
    args = parser.parse_args()
    print('I should print', args.echo, 'in', args.color, "- but I'm lazy.")

if __name__ == '__main__':
    main()
Overwriting color_argparse.py
%run color_argparse.py -c red item
I should print item in red - but I'm lazy.
As you can tell from the documentation, the programs quickly grow as you try to do more advanced commands, grouping, or subcommands. Now compare to Plumbum:
%%writefile color_plumbum.py
from plumbum import cli

class ColorApp(cli.Application):
    color = cli.SwitchAttr(['-c', '--color'], help='Color to print')

    def main(self, echo):
        print('I should print', echo, 'in', self.color, "- but I'm lazy.")

if __name__ == '__main__':
    ColorApp.run()
Overwriting color_plumbum.py
%run color_plumbum.py -c red item
I should print item in red - but I'm lazy.
Here, we see a more natural mapping of class -> program, and we also have a lot more control over the items this way. For example, if we want to add a validator, say to check for existing files or to ensure a number is in a range or a word is in a set, we can do that on each switch. Switches can also be full-fledged functions that run when the switch is set. And we can easily extend this process to subcommands (see git-all.py); it remains readable and avoids duplication.
Path manipulations
Path manipulations using os.path functions are messy and can become involved quickly. Things that should be simple require several functions chained to get anywhere. The situation was bad enough to warrant adding an additional module to Python 3.4+, the provisional pathlib module. Now this is not a bad module, but you have to install a separate library on Python 2.7 or 3.3 to get it, and it has a couple of missing features. Plumbum provides a similar construct, and it is automatically available if you are already using Plumbum, and it corrects two of the three missing features. The features I’m mentioning are:
- No support for manipulation of multiple extensions, like .tar.gz
  - Plumbum supports an additional argument to .with_suffix(); the default matches pathlib
- No support for home directories
  - Plumbum provides the local.env.home path
- No support for using open(path) without wrapping in a str() call
  - Can’t be fixed unless path subclasses str (not likely for either library, see unipath), or pathlib support is added to the system open function (any Python devs reading? Please?)
I would love to see the pathlib module adopt the .with_suffix() addition that Plumbum has, and add some sort of home directory expansion or path, as well.
Plumbum also has the unique little trick that // automatically calls glob, making path composition even simpler. I doubt we’ll get this added to pathlib, but I can always hope (at least, until someone removes the provisional status).
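To make the multiple-extension complaint concrete, here is a small pathlib-only sketch. The replace_suffixes helper is a hypothetical name invented for this example; it is neither Plumbum’s API nor part of pathlib. (For what it is worth, later Python releases did add Path.home() and let open() accept path objects directly, so two of the three gaps listed above have since narrowed.)

```python
from pathlib import Path


def replace_suffixes(path, new_suffix, depth=2):
    """Replace up to `depth` trailing extensions of `path` with `new_suffix`."""
    for _ in range(depth):
        if not path.suffix:
            break
        path = path.with_suffix("")  # strip one extension per pass
    return path.with_name(path.name + new_suffix)


archive = Path("data/results.tar.gz")

# pathlib's default behaviour: only the last extension is swapped.
print(archive.with_suffix(".zip"))

# The helper swaps both extensions at once.
print(replace_suffixes(archive, ".zip"))
```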
Color support (NEW)
I’ve been working on a new color library for Plumbum. git-all.py has been converted to use it.
Colors are used through the Styles generated by the colors object. You can get colors and attributes like this:
from plumbum import colors
red = colors.fg.red # Red foreground color
other_color = colors.bg(2) # The second background color
bold = colors.bold
reset = colors.reset
You can directly access colors as if it were the fg object. Standard terminal colors can be accessed with (), and the 256 extended colors can be accessed with [] by number, name (camel case or underscore), or HTML code. All objects support with statements, which restore the normal font (for a single Style, it will reset only the necessary component if possible, like bold or fg color). You can manually take the inverse (~) to get the undoing action. Calling a Style without any parameters will send it to stdout. Using | will wrap a string with a style (as will [] notation). Styles otherwise act just like normal text, so they can be added to text, etc. (they are str subclasses, after all).
For the following demo, I’ll be using the htmlcolors object, and a with statement to capture output in IPython and display it as HTML. (See my upcoming post for a more elegant IPython display technique.) Also note redirect_stdout is new in Python 3.4, but is easy to implement in other versions if needed.
from plumbum.colorlib import htmlcolors as colors
from IPython.display import display_html
from contextlib import contextmanager, redirect_stdout
from io import StringIO  # Python3 name

@contextmanager
def show_html():
    out = StringIO()
    with redirect_stdout(out):
        yield
    display_html(out.getvalue(), raw=True)
Now, inside the capture context manager, we can use colors just like on a terminal (save for needing to use <br/> to break lines if we don’t take advantage of the built-in htmlstyle print command, and having to be careful not to use un-reset Styles).
with show_html():
    colors.red.print("This is in red!")
    (colors.bold & colors.blue).print("This is in bold blue!")
    colors.bg["LightYellow"].print("This is on the background!")
    colors["LightBlue"].print("This is also from the extended color set")
    print(
        "This is {colors.em}emphasized{colors.em.reset}! (reset was needed)".format(
            colors=colors
        ),
        end="<br/>",
    )
    print("This is normal")
This is in red!
This is in bold blue!
This is on the background!
This is also from the extended color set
This is emphasized! (reset was needed)
This is normal
Putting it together in an example: git-all
Now, let’s look at a real world example previously mentioned: git-all.bash. This is a script I wrote some time ago for checking a large number of repositories in a common folder. Due to the clever way git subcommands work, simply naming this git-all and putting it in your path gives you a git all command. It is written in very reasonable bash, IMO, and works well.
Directory manipulation
Let’s look at this piece by piece and see what would be required to convert it to Python. First, this script is in one of the repos, so we need the current directory, up one.
unset CDPATH
SOURCE="${BASH_SOURCE[0]}"
while [ -h "$SOURCE" ]; do
    DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
    SOURCE="$(readlink "$SOURCE")"
    [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE"
done
DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
REPOLOC=$DIR/..
(Sorry for the awful highlighting by IPython, it hates the $ in strings for Bash.)
REPOLOC = local.path(__file__) / ".."
for file in $(ls); do
    if [[ -d $REPOLOC/$file/.git ]]; then
        ...
    fi
done
And code goes here.
valid_repos = [d / "../.." for d in local.cwd // "*/.git/config"]

def git_on_all(bold=False):
    for n, repo in enumerate(valid_repos):
        with local.cwd(repo):
            with color_change(n):
                yield repo.basename
To use it, simply loop over git_on_all():
for repo_name in git_on_all():
    print("The current working directory is in the", repo_name, "repo!")
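If the generator-inside-a-context-manager trick looks like magic, here is a standard-library sketch of the same shape. It is not the Plumbum code: working_directory and each_repo are hypothetical names for this example, and the repos are empty throwaway folders created in a temporary directory.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def each_repo(root):
    """Yield the name of every git repo under root, with the cwd set inside it."""
    for config in sorted(Path(root).glob("*/.git/config")):
        repo = config.parent.parent
        with working_directory(repo):
            yield repo.name


# Build a throwaway folder with two fake repos and one plain directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("alpha", "beta"):
        (Path(root) / name / ".git").mkdir(parents=True)
        (Path(root) / name / ".git" / "config").touch()
    (Path(root) / "not_a_repo").mkdir()
    # Record the yielded name next to the directory we are actually in.
    visited = [(name, Path.cwd().name) for name in each_repo(root)]

print(visited)
```

The body of the for loop runs while the with block inside the generator is still open, which is why the working directory is correct for each repo and restored afterwards.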
Command line arguments
We don’t have a nice cli tool in Bash, so we have to build long if statements. We can separate each command in Python, and let the help file be built for us:
@GitAll.subcommand("pull")
class Pull(cli.Application):
    "Pulls all repos in the folder, not threaded."

    def main(self):
        for repo in git_on_all():
            git["pull"] & FG
This is git all pull, clean and separated from the ugly loops in Bash.
Multithreading
if [[ $1 == qfetch ]] ||
   [[ $1 == fetch ]] ||
   [[ $1 == status ]]; then
    for file in $(ls); do
        if [[ -d $REPOLOC/$file/.git ]]; then
            cd $REPOLOC/$file
            git fetch -q &
            spawned+=($!)
        fi
    done
    echo -n "Waiting for all repos to report: "
    for pid in ${spawned[@]}; do
        wait $pid
    done
    echo "done"
fi
def fetch():
    bg = [(git["fetch", "-q"] & BG) for repo in git_on_all()]
    print("Waiting for the repos to report: ", end="")
    for fut in bg:
        fut.wait()
    print("done")
This is just as readable, if not more so, and doesn’t need the if statement to check the input, since that’s now part of the cli interface. The actual version in the script can also report errors in the fetch, which the Bash version cannot.
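As a rough idea of what reporting errors from background jobs involves, here is a standard-library sketch. It does not reproduce the real script: the repo names are made up, and each git fetch is replaced by a stand-in child process that simply exits with a chosen return code.

```python
import subprocess
import sys

# Stand-in jobs: each made-up repo name maps to the exit code its child will return.
jobs = {"repo_a": 0, "repo_b": 3, "repo_c": 0}

# Start everything first so the children run concurrently.
running = {
    name: subprocess.Popen([sys.executable, "-c", f"import sys; sys.exit({code})"])
    for name, code in jobs.items()
}

# Then wait for each one and remember which of them failed.
failed = [name for name, proc in running.items() if proc.wait() != 0]
print("done; failures:", failed)
```

Starting all the processes before waiting on any of them is what gives the speedup; the return codes are then available for whatever error report you want to print.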
Colors (classic tput method)
We would like to toggle colors, so each repo is in a different cyclic color. My final Bash solution was elegant:
The color codes come from tput (use echo -n on these):
txtreset=$(tput sgr0)
txtbold=$(tput bold)
In Plumbum, the same commands can be bound ahead of time (run them with & FG):
txtreset = tput["sgr0"]
txtbold = tput["bold"]
Though with the plumbum.colors library, we don’t have to.
@contextmanager
def color_change(color):
    txtreset & FG
    txtbold & FG
    tput["setaf", color % 6 + 1] & FG
    try:
        yield
    finally:
        txtreset & FG
The try/finally block allows this to restore our color, even if it throws an exception! This is tremendously better than the Bash version, which leaves the color on the terminal if you make a mistake. A nice example of context managers can be found on Jeff Preshing’s blog.
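To see the restore-on-exception behavior without a terminal, here is a small sketch that writes raw ANSI escape codes to a StringIO buffer instead of calling tput. The ansi_color name is hypothetical, and the sketch assumes color numbers 0 to 7 for the basic ANSI foreground colors:

```python
import io
from contextlib import contextmanager

RESET = "\033[0m"  # ANSI "reset all attributes"


@contextmanager
def ansi_color(number, stream):
    """Write an ANSI foreground color code (0-7), and always write a reset on exit."""
    stream.write(f"\033[3{number}m")
    try:
        yield
    finally:
        stream.write(RESET)  # runs even if the body raises


buffer = io.StringIO()  # stands in for the terminal
try:
    with ansi_color(2, buffer):
        buffer.write("working...")
        raise RuntimeError("something broke")
except RuntimeError:
    pass

# The reset code is present even though the body raised.
print(repr(buffer.getvalue()))
```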
You can use it to wrap parts of the code that print in a color:
with color_change(1):  # 1 % 6 + 1 selects tput color 2
    print("This will be in color number 2")
Colors (new method)
Plumbum has a new colors tool, and this is how you would use it in this script.
from plumbum import colors
Colors can be generated cyclically by number, and combinations of color and attributes can be put in a with statement, too:
with(colors.fg[1:7][n%6] & colors.bold):
And, we can simply unbold:
colors.bold.reset.print(
git("diff-files", "--name-status", "-r", "--ignore-submodules", "--")
)
And that’s it! All the benefits we had from before are here.
Final Comparison
I’ll be using functions in the Python version to make it clear what each git call does, and making the Python version cleaner in a few ways that I could also apply to the Bash script. So this is not meant to be a 1:1 comparison. In my defense, Bash users tend to avoid functions or other clean programming practices.
Most of the extra lines are from the Python functions. Also, I’ve improved a couple of commands for git for current best practices. I’ve also avoided using FG for the print commands, so that I can control the color and the long-output paging (if you change print() for & FG, the output would match the Bash script). Here is the script: git-all.py.
Note: You might want to look at the history of that script, as I’ll probably update it occasionally as I start using it.
Notice that it is very clear what each part of the cli portion of the script does, and it’s easy to add a feature or extend it. The long for loops are nicely abstracted into iterators.
Also, there may be bugs for a few days while I start using this instead of my bash script. Also, it must be renamed to git-all with no extension for git all status etc. to work.
Bonus: Possible improvement: argcomplete support
One last thing: The one drawback to Plumbum over argparse is due to one enhancement package for argparse. I think a great addition to the Plumbum library would be argcomplete support. If you’ve never seen argcomplete, it’s a bash completion extension for argparse. Argcomplete allows you to add validators, and will suggest completions when pressing tab on the command line.
Adding support to Plumbum would not be that hard, and probably wouldn’t even require argcomplete to be installed. The Argcomplete API requires three things:
- A # PYTHON_ARGCOMPLETE_OK marker near the top of the script
- Special output piped to several channels (8 and 9, I believe) of the terminal when the _ARGCOMPLETE special environment variable is set, followed by an exit before calling .main()
- The ability to predict the next completion
The first one is easy, and wouldn’t require anything from Plumbum. The second would be a simple addition to a new method cli.Application.argcomplete(self) that could be overridden to remove or customize argcomplete support. The final one is the hard one, the prediction of the possible completions. If that can be done, support could be added.
Because support would be added into Plumbum itself, you wouldn’t have to use the monkey patching that argcomplete has to use to inject itself into argparse. You would still use the same bash hooks that argcomplete uses, so it would work alongside it, being called in the same way.
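To give a feel for the prediction step, here is a deliberately naive sketch that only matches switch names by prefix. Everything in it is hypothetical: predict_completions is not part of Plumbum or argcomplete, and a real implementation would also need to handle positional arguments, subcommands, and validators.

```python
def predict_completions(known_switches, partial_word):
    """Return the switches that could complete the word typed so far."""
    return sorted(s for s in known_switches if s.startswith(partial_word))


# Switch names an application might declare (made up for this example).
switches = ["--color", "--count", "--help", "-c", "-h"]

print(predict_completions(switches, "--co"))
print(predict_completions(switches, "--x"))
```

Since a cli.Application already declares its switches on the class, the list of candidates is at least available without any extra bookkeeping.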