Writing Python command-line tools with cliff

As part of the Keepsafe Europe project at EDINA, we needed to build a tool to help us understand and manipulate files sent to us by publishers. We didn’t know much about the data formats we’d get (other than, as always, they would be completely inconsistent), but we knew that we would have to manipulate them in various ways, and bring them into a roughly standard format to interact with an API we’d built. We also knew we would have to be able to script this so that we could run it in the future with different data, and enable others outside the university to do this sort of manipulation for themselves.

As a first step, we wanted to build a command-line application which would let us manipulate CSV-like data and submit requests to our API. Luckily, using Python for this turned out to be pretty simple, thanks to a framework called cliff. I couldn’t find much about cliff online, so I thought I’d write up a quick intro/advert for it here.

Python’s built-in argparse module already makes it straightforward to build command-line scripts that take in arguments and handle them sensibly. What cliff does is build on top of that, allowing you to make an entire application with: subcommands (think git add, git commit, git ...); an interactive shell; base classes for common types of commands such as listing data; and output formatted in a variety of fashions.
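For comparison, here is a rough sketch of what a single subcommand looks like with plain argparse. The program name myapp and the select subcommand are placeholders chosen for this illustration. cliff takes over this wiring and adds the output formatting and interactive shell on top.

```python
import argparse


def build_parser():
    # One top-level parser plus one subparser per subcommand.
    parser = argparse.ArgumentParser(prog='myapp')
    subparsers = parser.add_subparsers(dest='command')
    select = subparsers.add_parser('select', help='show one journal')
    select.add_argument('title', help='Title of journal to show')
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(['select', 'Journal of Software'])
print(args.command, args.title)
```

With argparse alone you would still have to dispatch on args.command and format the output yourself.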

I’m going to work through a very basic, stripped-down version of our app, to help you get a feel for what cliff can do.

First, set up an environment and install cliff in the standard fashion:

$ virtualenv -p /usr/bin/python3 env
$ source env/bin/activate
$ pip install cliff

Then, create a module, which I’ll just call app.py:

import os
import sys

from cliff.app import App
from cliff.command import Command
from cliff.commandmanager import CommandManager
from cliff.show import ShowOne

data = [
    [ 'Journal', 'Publisher', 'Print ISSN', 'Online ISSN' ],
    [ 'Journal of Software', 'Computer Publishings', '0000-0000', '0000-0001' ],
    [ 'Journal of Hardware', 'Computer Publishings', '1111-0000', '1111-0001' ],
    [ 'Software Development Monthly', 'Megacorp', '2222-0000', '2222-0001' ],
    [ 'Hardware Letters', 'XIT University Press', '3333-0000', '3333-0001' ],
]

class MyApp(App):

    def __init__(self):
        super().__init__(
            description='Does some awesome stuff',
            version='0.1',
            command_manager=CommandManager('myapp'),
            deferred_help=True,
        )

    def initialize_app(self, argv):
        commands = [ Select, ]
        for command in commands:
            self.command_manager.add_command(command.__name__.lower(), command)

class Select(ShowOne):
    'display details of journal with given title'

    def get_parser(self, prog_name):
        parser = super().get_parser(prog_name)
        parser.add_argument('title', help='Title of journal to show')
        return parser

    def take_action(self, parsed_args):
        headers = data[0]
        for d in data[1:]:
            if d[0] == parsed_args.title:
                return (headers, d)
        return (None, None)

def main(argv=sys.argv[1:]):
    myapp = MyApp()
    return myapp.run(argv)

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

You’ll also need to add a module called setup.py:

#!/usr/bin/env python

from setuptools import setup, find_packages

setup(
    name='myapp',
    version='0.1',
    description='Does awesome stuff',
    author='Steven Carlysle-Davies',
    author_email='steven.carlysle-davies@ed.ac.uk',
    entry_points={
        'console_scripts': [
            'myapp = app:main'
        ],
    },
)

Then, you just need to install your application into your environment:

$ pip install -e .

That’s it, you now have a full (if somewhat minimal) application ready to interact with:

$ myapp select "Journal of Software"
+-------------+----------------------+
| Field       | Value                |
+-------------+----------------------+
| Journal     | Journal of Software  |
| Publisher   | Computer Publishings |
| Print ISSN  | 0000-0000            |
| Online ISSN | 0000-0001            |
+-------------+----------------------+

You can also get the data in a different format:

$ myapp select -f json "Hardware Letters"
{
 "Online ISSN": "3333-0001",
 "Publisher": "XIT University Press",
 "Journal": "Hardware Letters",
 "Print ISSN": "3333-0000"
}

And use the interactive application:

$ myapp
(myapp) help select
usage: select [-h] [-f {json,shell,table,value,yaml}] [-c COLUMN]
 [--prefix PREFIX] [--max-width <integer>] [--print-empty]
 [--noindent]
 title

display details of journal with given title

positional arguments:
 title Title of journal to show

optional arguments:
 -h, --help show this help message and exit

...
(myapp) select -f value "Journal of Hardware"
Journal of Hardware
Computer Publishings
1111-0000
1111-0001
(myapp) quit
$

As you can see, that’s a fairly powerful range of functionality for not much code. Let’s take a closer look at what’s going on.

First, we override App’s constructor, passing parameters that tell cliff what our application is and what it does; these are used in the generated help.

    def __init__(self):
        super().__init__(
            description='Does some awesome stuff',
            version='0.1',
            command_manager=CommandManager('myapp'),
            deferred_help=True,
        )

Next, we need to tell cliff what commands are available.

    def initialize_app(self, argv):
        commands = [ Select, ]
        for command in commands:
            self.command_manager.add_command(command.__name__.lower(), command)

The recommended way of doing this is actually through specifying entry-points in your setup.py. That would complicate this example, and in practice we’ve actually found that it’s more readable to specify them directly in your application, instead of having to edit a list in two places.
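For reference, a minimal sketch of the entry-point approach, assuming the single-file layout used here. cliff’s CommandManager loads commands from the entry-point group whose name it is given ('myapp' in this example), so the dictionary below would be passed as the entry_points argument to setup() in setup.py, and initialize_app would no longer need to register anything. The variable name is only for illustration.

```python
# Sketch of the entry_points argument for setup() in setup.py.
entry_points = {
    'console_scripts': [
        'myapp = app:main',
    ],
    # The group name must match the namespace given to CommandManager('myapp').
    # Each entry maps a command name to the class that implements it.
    'myapp': [
        'select = app:Select',
    ],
}
```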

Next we define the command class. We’re inheriting from a built-in cliff command called ShowOne, which is useful for any commands where you want to show the details of one ‘thing’, e.g. a row, a file etc. The docstring is again used in the generated help text.

class Select(ShowOne):
    'display details of journal with given title'

Then, we specify what arguments our command takes. This uses the built-in argparse module, but cliff has already added some common options to it in the parent class. All we’re adding is that we take in one required argument, called ‘title’.

    def get_parser(self, prog_name):
        parser = super().get_parser(prog_name)
        parser.add_argument('title', help='Title of journal to show')
        return parser

Finally, we need a subroutine that actually does all the work once the command is called. This gets given the arguments that have been parsed according to the parser defined above, and ShowOne expects the subroutine to return a 2-tuple of (headers, values). In our case, we’re just looping through the data to find the matching row (and not handling errors very gracefully!).

    def take_action(self, parsed_args):
        headers = data[0]
        for d in data[1:]:
            if d[0] == parsed_args.title:
                return (headers, d)
        return (None, None)

In our actual application, this subroutine is usually just a line or two long, taking in the parsed_args variable and extracting the values to pass to another subroutine elsewhere in our application. That allows us to test the logic of the code more easily.
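As a sketch of that split (not our real code), the lookup could live in a plain function that knows nothing about cliff. The name find_journal and the choice of LookupError are assumptions made for this example.

```python
def find_journal(rows, title):
    """Return (headers, row) for the journal with the given title.

    rows[0] holds the column headers; the remaining rows hold the data.
    """
    headers = rows[0]
    for row in rows[1:]:
        if row[0] == title:
            return headers, row
    # Raising gives the caller a clear message to report, rather than
    # handing (None, None) to the output formatter.
    raise LookupError('No journal with title {!r}'.format(title))


# Inside the command class, take_action would then shrink to:
#     def take_action(self, parsed_args):
#         return find_journal(data, parsed_args.title)

rows = [
    ['Journal', 'Publisher'],
    ['Journal of Software', 'Computer Publishings'],
]
headers, row = find_journal(rows, 'Journal of Software')
print(dict(zip(headers, row)))
```

Because find_journal takes the rows as a parameter, it can be tested with a small list like the one above, without starting the application.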

Last, we need a bit of boilerplate that’ll be in almost all your cliff applications, but this is what actually instantiates your application and passes through the parameters from the command-line.

def main(argv=sys.argv[1:]):
    myapp = MyApp()
    return myapp.run(argv)

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

I’m going to gloss over setup.py, which is the standard setuptools packaging script; there’s more information about how cliff interacts with it in the cliff documentation.

As you can see, quite a lot of the above code is just needed to initialise the application, and only a small part is actually needed to define the select command. Let’s add another command, one that displays multiple rows of the data:

from cliff.lister import Lister
...
commands = [ Filter, Select, ]
...
class Filter(Lister):
    'display selected columns of the data'

    def get_parser(self, prog_name):
        parser = super().get_parser(prog_name)
        parser.add_argument('--index', action='store_true', help='Use column index numbers instead of names')
        parser.add_argument('column', nargs='+', help='Selected columns to display')
        return parser

    def take_action(self, parsed_args):
        if parsed_args.index:
            columns = [int(c) for c in parsed_args.column]
        else:
            columns = [data[0].index(c) for c in parsed_args.column]

        selected = [ [d[c] for c in columns] for d in data ]
        return (selected[0], selected[1:])

This command uses the other major built-in cliff command, Lister. This is similar to ShowOne, but used where you want to display a list of things to the user. This time, we’re letting the user give us a list of columns to display. If they specify --index, they can give us column indices instead of headers.

$ myapp filter "Journal" "Publisher"
+------------------------------+----------------------+
| Journal                      | Publisher            |
+------------------------------+----------------------+
| Journal of Software          | Computer Publishings |
| Journal of Hardware          | Computer Publishings |
| Software Development Monthly | Megacorp             |
| Hardware Letters             | XIT University Press |
+------------------------------+----------------------+

$ myapp filter --format json --index 1 2
[
 {
 "Publisher": "Computer Publishings",
 "Print ISSN": "0000-0000"
 },
 {
 "Publisher": "Computer Publishings",
 "Print ISSN": "1111-0000"
 },
 {
 "Publisher": "Megacorp",
 "Print ISSN": "2222-0000"
 },
 {
 "Publisher": "XIT University Press",
 "Print ISSN": "3333-0000"
 }
]

$ myapp help filter
usage: myapp filter [-h] [-f {csv,json,table,value,yaml}] [-c COLUMN]
 [--noindent] [--quote {all,minimal,none,nonnumeric}]
 [--max-width <integer>] [--print-empty] [--index]
 column [column ...]

display selected columns of the data

positional arguments:
 column Selected columns to display

optional arguments:
 -h, --help show this help message and exit
 --index Use column index numbers instead of names
...

There’s a lot more to cliff, but hopefully this has given you a taste for what it can do. We’ve found it trivially easy to implement commands, and let cliff handle all the complications of parsing arguments and formatting the output appropriately. It lets us easily build scripts that manipulate data (and let others who don’t know Python build the same scripts). For anyone looking to build quick and powerful command-line applications, I’d highly recommend it.

 

 

Python: How you import impacts how you mock

Recently, I had a problem with monkeypatching a function that calls an external service. In a nutshell, monkeypatch is a pytest fixture that allows you to replace an object or function with your mock version. Try as I might, I could not get it to work. This blog post is something I wish I had read when trying to get my mocks to work first time.

This is the module structure, with a get_user function that connects to LDAP. We want to mock get_user in our tests to avoid connecting to LDAP and to avoid having to use real users for tests.

base.py

def get_user(uun):
    ca_user = ...  # the real code connects to LDAP here (omitted)
    return ca_user

SCENARIO 1: importing get_user function in the module where it’s required

forms.py

from central_auth.base import get_user

def clean_user(uun):
   ca_user = get_user(uun)

Trying to monkeypatch it as below will not have the desired effect. The real get_user still gets called.

tests.py

from central_auth import base

def test_form_POST_OK(monkeypatch):
    monkeypatch.setattr(base, 'get_user', get_mock_user_function)

    get_view_with_form() # where get_user is called

# DOES NOT WORK

This is because a named import (importing an object or function rather than a module) binds a new name in the importing module’s namespace. What exists as central_auth.base.get_user is therefore also referred to as forms.get_user within forms.py, and replacing the attribute on base leaves the name in forms pointing at the original function.
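The effect can be reproduced without LDAP or pytest. The sketch below builds two stand-in modules in memory (they are not the real central_auth.base or forms) and mimics what the named import and the patch each do.

```python
import types

# Stand-ins for central_auth.base and forms, built in memory for this sketch.
base = types.ModuleType('base')
base.get_user = lambda uun: 'real user ' + uun

forms = types.ModuleType('forms')
# Equivalent of "from central_auth.base import get_user" inside forms.py:
# forms gets its own name bound to the same function object.
forms.get_user = base.get_user

# Replace the attribute on base, as patching base.get_user would.
base.get_user = lambda uun: 'mock user ' + uun

print(base.get_user('jbloggs'))   # mock user jbloggs
print(forms.get_user('jbloggs'))  # real user jbloggs
```

The name in forms still points at the original function, which is why the real get_user keeps being called.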

To make the monkeypatch work, base module should be imported like so:

SCENARIO 2: module import

forms.py

from central_auth import base

def clean_user(uun):
   ca_user = base.get_user(uun)

Then we end up having central_auth.base.get_user and forms.base.get_user, both referring to the same base module.

Alternatively, we can use unittest.mock.patch to the same effect which also allows a greater level of granularity:

SCENARIO 1:

forms.py

from central_auth.base import get_user

def clean_user(uun):
   ca_user = get_user(uun)

tests.py

from unittest import mock

@mock.patch('forms.get_user', side_effect=get_mock_user_function)
def test_form_OK(forms_get_user):
    get_view_with_form()

SCENARIO 2:

forms.py

from central_auth import base

def clean_user(uun):
   ca_user = base.get_user(uun)

tests.py

from unittest import mock

@mock.patch('forms.base.get_user', side_effect=get_mock_user_function)
def test_form_OK(forms_get_user):
    get_view_with_form()
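To see why the patch target matters for a named import, here is a self-contained sketch using in-memory stand-in modules. The demo_base and demo_forms names and fake_get_user are hypothetical and exist only for this example.

```python
import sys
import types
from unittest import mock

# In-memory stand-ins for the real modules, registered in sys.modules so
# that mock.patch can import them by name.
base = types.ModuleType('demo_base')
base.get_user = lambda uun: 'real user ' + uun
sys.modules['demo_base'] = base

forms = types.ModuleType('demo_forms')
# Run the "source" of forms.py inside the module's own namespace, so that
# clean_user looks up get_user there (the named-import scenario).
exec(
    "from demo_base import get_user\n"
    "def clean_user(uun):\n"
    "    return get_user(uun)\n",
    forms.__dict__,
)
sys.modules['demo_forms'] = forms


def fake_get_user(uun):
    return 'mock user ' + uun


# Patching the name in the source module does not reach the copy in demo_forms.
with mock.patch('demo_base.get_user', side_effect=fake_get_user):
    result_source_patch = forms.clean_user('jbloggs')

# Patching the name where it is looked up does.
with mock.patch('demo_forms.get_user', side_effect=fake_get_user):
    result_lookup_patch = forms.clean_user('jbloggs')

print(result_source_patch)  # real user jbloggs
print(result_lookup_patch)  # mock user jbloggs
```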

The main disadvantage of using mock.patch is that it only replaces a name in the namespace you point it at. If get_user is imported by name into several modules used by your tested function, then you need to mock each of them specifically, like so:

forms.py

from central_auth.base import get_user

def clean_user(uun):
    ca_user = get_user(uun)

views.py

from central_auth.base import get_user

def view_user(request, uun):
    ca_user = get_user(uun)

tests.py

from unittest import mock

@mock.patch('forms.get_user', side_effect=get_mock_user_function)
@mock.patch('views.get_user', side_effect=get_mock_user_function)
def test_form_OK(views_get_user, forms_get_user):
    get_view_with_form()

With module imports (Scenario 2), forms.base and views.base are the same module object, so a single patch of central_auth.base.get_user covers both.

 

Since I am a newbie to Python and pytest, give me a shout if I got something terribly wrong. For now, my tests work 🙂

Cheers

Django Training

After the Python course, the same group of developers met again with Toby for Django framework training. The Django course was a day shorter than the Python one and felt more focused and intensive. We were given a great opportunity to build upon our Python knowledge and learn how one does web development using the Django framework.


Python Training

We’re in that exciting period where we’re introducing a new language and new framework within IS Apps: we’re adopting Python with Django as our development platform of choice for web apps. Making such a significant change isn’t just about the technology or the underlying infrastructure but the people too, and means making a significant investment in developing our knowledge and skills.

To that end a group of intrepid developers and I have just undertaken the first of two training courses covering Python and Django.
