Creating CLI tools leveraging ZIO and Decline using scala-cli

2022-06-02

Here’s a quick overview and demonstration of kicking the tires on ZIO 1.x + decline (commandline argument parser) using scala-cli (version 0.1.6) to build commandline tools.

High Level Points

scala-cli is a new tool in active development that aims to replace the current scala tool.
scala-cli enables running scripts, loading REPL, compiling, testing, packaging (amongst other features) for “simple” (flat) projects.
scala-cli enables a very Python-ish workflow and integrates with your text editor well. A REPL driven approach can be used using scala-cli repl File.scala.
scala-cli package enables creating an executable from your main class. This is very useful. No need to deal with Python’s conda, pipenv, poetry, etc… for setting up an env. Rsync/scp your packaged tool to another server and your good to go (provided the java versions are compatible).
Building a commandline tool using ZIO was a useful exercise to kick the tires on ZIO and understand the ZIO 1.x effect system and learn how to compose computation in ZIO.
decline is a CLI parser library using cats. It has a very elegant mechanism of composing options/commands.
The default decline interface had some unexpected behaviors with how --help was handled and how errors were handled/mapped to exit codes.

Specifics

Using scala-cli setup-ide . will generate the necessary files for the LSP server for your text editor.

https://scala-cli.virtuslab.org/docs/commands/setup-ide

Using directives //> in the top of your scala file, you can define the scala version, library versions, etc…

https://scala-cli.virtuslab.org/docs/reference/directives

For example, Declined.scala

//> using platform "jvm"
//> using scala "2.13.8"
//> using lib "dev.zio::zio:1.0.14"
//> using lib "com.monovore::decline:2.2.0"
//> using mainClass "DeclinedApp"

Using zio.App we can define our main by overriding run.

1
2
3

object DeclinedApp extends zio.App {
   override def run(args: List[String]): ZIO[ZEnv, Nothing, ExitCode] = ???
}

Using decline, CLI options and arguments are defined using Opts.

For example:

import com.monovore.decline._
import cats.implicits._

val nameOpt = Opts.option[String]("user", help = "User name", "u")

val alphaOpt = Opts
    .option[Double]("alpha", help = "Alpha Filtering", "a")
    .withDefault(1.23)

These Opts compose in interesting ways, specifically with ZIO effects.

val versionOpt: Opts[RIO[Console, Unit]] = Opts
    .flag(
      "version",
      "Show version and Exit",
      "v",
      visibility = Visibility.Partial
    )
    .orFalse
    .map(_ => putStrLn(VERSION))

You can compose options together using mapN to define an “action” or “command” that would need multiple commandline options/args.

For example to define a mainOpt that uses name, alpha, force:

val nameOpt = Opts.option[String]("user", help = "User name", "u")

val alphaOpt = Opts
    .option[Double]("alpha", help = "Alpha Filtering", "a")
    .withDefault(1.23)

// Add for testing failure modes
val forceOpt = Opts.flag("fail", help = "Manually trigger a failure").orFalse

// Now define our core
val mainOpt: Opts[RIO[Console, Unit]] =
    (nameOpt, alphaOpt, forceOpt).mapN[RIO[Console, Unit]] {
      (name, alpha, force) =>
        if (force)
          ZIO.fail(
            new Exception(s"Manually FAIL triggered by $name! alpha=$alpha")
          )
        else putStrLn(s"Hello $name. Running with alpha=$alpha")
    }

In addition to composing using mapN, there’s orElse which enables composing “actions”. For example, enabling --version to run (if provided) or run the “main” application.

1	val runOpt = versionOpt orElse mainOpt

These composed Opts can be used in a Command that will handled --help and be central point where Command.parse can be called.

val command: Command[RIO[Console, Unit]] = Command[RIO[Console, Unit]](
    name = "declined",
    header = "Testing decline+zio"
  )(versionOpt orElse mainOpt)

Bridging the ouptut of Command.parse with ZIO requires a little glue and some special attention to deal with the error cases. Command.parse will return an Either[Help, T]. The left of Either being used as “help+errors” container is a bit of friction point because --help triggers the left of the Either. Help.errors will return a non-empty list of errors (if there are parse errors).

def effect(arg: List[String]): ZIO[Console, Throwable, Unit] = {
    command.parse(arg) match {
      case Left(help) => // a bit odd that --help returns here
        if (help.errors.isEmpty) putStrLn(help.show)
        else IO.fail(new Exception(s"${help.errors}"))
      case Right(value) => value.map(_ => ())
    }
}

And finally, wire a call to run and make sure errors are written to stderr and a non-zero exit code is returned during failure cases.

override def run(args: List[String]): ZIO[ZEnv, Nothing, ExitCode] =
    effect(args).map(_ => ExitCode.success).catchAll { ex =>
      for {
        _ <- putStrLnErr(s"Error ${ex}").orDie
      } yield ExitCode.failure
}

Running can be done using scala-cli run Declined.scala

1 2	$> scala-cli run Declined.scala -- --version 0.1.0

Or by packaging the app and running the generated exe.

1
2
3

scala-cli package --jvm 14 Declined.scala
Wrote /Users/mkocher/path/to/DeclinedApp, run it with
  ./DeclinedApp

Running

$> ./DeclinedApp --help
Usage: declined [--user <string> [--alpha <floating-point>] [--fail]]

Testing decline+zio

Options and flags:
    --help
        Display this help text.
    --version, -v
        Show version and Exit
    --user <string>, -u <string>
        User name
    --alpha <floating-point>, -a <floating-point>
        Alpha Filtering
    --fail
        Manually trigger a failure

And a few smoke tests.

$> ./DeclinedApp --user Dave --alpha 3.14
Hello Dave. Running with alpha=3.14
$> echo $?
0
$ ./DeclinedApp --user Dave --alpha dragon
Error java.lang.Exception: Invalid floating-point: dragon
$> ./DeclinedApp --user Dave --alpha 3.14 --dragon
Error java.lang.Exception: Unexpected option: --dragon
$>echo $?
1

Summary and Final Comments

scala-cli is a very promising addition for the Scala community.
scala-cli changed my workflow. This new workflow was closer to how I would work in Python.
I really like ZIO’s core composablity ethos, however, it does have a learning curve.
Decline‘s Command[T] design allows for intergrate with ZIO pretty seemlessly.
Misc “scrappy” CLI tools that I would typically write in Python, I could easily write in Scala.

Exploring TypedDict in Python 3.8

2019-09-19

This post will explore the new TypedDict feature in Python and explore leveraging TypedDict combined with the static analysis tool mypy to improve the robustness of your Python code.

PEP-589

TypedDict was proposed in PEP-589 and accepted in Python 3.8.

A few key quotes from PEP-589 can provide context and motivation for the problem that TypedDict is attempting to address.

This PEP proposes a type constructor typing.TypedDict to support the use case where a dictionary object has a specific set of string keys, each with a value of a specific type.

More generally, representing pure data objects using only Python primitive types such as dictionaries, strings and lists has had certain appeal. They are are easy to serialize and deserialize even when not using JSON. They trivially support various useful operations with no extra effort, including pretty-printing (through str() and the pprint module), iteration, and equality comparisons.

This particular section of the PEP is interesting and suggests that TypedDict can be particularly useful for retrofitting legacy code (with type annotations).

Dataclasses are a more recent alternative to solve this use case, but there is still a lot of existing code that was written before dataclasses became available, especially in large existing codebases where type hinting and checking has proven to be helpful. Unlike dictionary objects, dataclasses don’t directly support JSON serialization, though there is a third-party package that implements it

The reference implementation was defined in mypy_extensions and can be installed in Python 3.7 (e.g., pip install mypy_extensions), or using typing.TypedDict in Python 3.8.

These following examples are run with mypy 0.711 and examples shown below can be obtained from this gist.

Motivation: Dictionary-Mania

Here’s a common example where a type-checking tool (e.g., mypy) won’t be able to help you catch type errors in your code.

def example_0() -> int:
    """Simple Example of Using raw dict and how mypy won't catch
    these errors with the keys
    """

    m = dict(name='Star Wars', year=1234)

    # mypy will NOT catch this error
    t = m['name'] + 100

However, with TypedDict, you can define this a structural-typing-ish interface to dict for a specific data model.

Using Python < 3.8 will require from mypy_extensions import TypedDict whereas, Python >= 3.8 will require from typing import TypedDict.

Let’s create a simple Movie data model example and explore how mypy can be used to help catch type errors.

Example 1: Basic Usage of TypedDict

class Movie(TypedDict):
    name: str
    year: int


def example_01():
    m = Movie(name='Star Wars', year=1977)
    # or
    m2:Movie = dict(name='Star Wars', year=1977)

    # the type checker will catch this
    n = m['name'] + 100

To enable runnable code that purposely has errors that can be caught by mypy, let’s define a helper function to require a specific Exception type to be raised.

import logging
log = logging.getLogger(__name__)


def log_expected_error(ex, fx):
    raised_error = False
    try:
        return fx()
    except ex as e:
        raised_error = True
        log.info(f"Got Expected error `{e}` of type {ex} from {fx.__name__}")
    finally:
        if not raised_error:
            log.error(f"Expected {fx} to raise {ex}")

Example 2: Exploring Mutating and Assignment of TypedDicts

Let’s mutate the Movie TypedDict instance and explore how mypy can catch type errors during assignment.

def example_02() -> int:
    m = Movie(name='Star Wars', year=1977)
    log.info(m)

    # mypy will catch this
    m['name'] = 11111

    def f() -> int:
        m['year'] = m['year'] + 'asfdsasdf'
        return 0

    log_expected_error(TypeError, f)

    # Use dict methods to mutate
    # Note, current verison of mypy is confused
    # by this and generates `"Movie" has no attribute "clear"`
    m.clear()

    def f2() -> int:
        # mypy won't catch this KeyError from .clear()
        return m['year'] + 100

    log_expected_error(KeyError, f2)

    # Can we mix Movie and a raw dict?
    d2 = dict(extras=True, alpha=1234, name=12345, year='1978')

    # mypy will raise TypeError here
    m.update(d2)

    log.info(m)

    # Update a Movie with a Movie
    m2 = Movie(name='Star Wars', year=1977)
    new_m = Movie(name='Movie Title', year=1234)

    # Both of these are proper Movie TypedDict
    # hence, no mypy type error
    m.update(new_m)
    log.info(m2)

There’s a few interesting items to note.

mypy will catch assignment errors
The current version of mypy will get a bit confused with dict methods, such as .clear(). Moreover, .clear() will also yield KeyErrors (related, see total=False keyword of the TypedDict)
mypy will only allow merging dicts that are the same type. You can’t mix TypedDict and a raw dict without mypy raising an issue

Example #3: TypedDicts total Keyword Argument

There’s a total keyword to the TypedDict that communicates that the dict does not need to be completely well formed. This is particularly interesting in how the mypy interpets the types.

For example, X with alpha, beta and gamma as ints, will be

class X(TypedDict, total=False):
    alpha:int
    beta:int
    gamma:int

x:X = dict()
x['alpha'] = 1
x['beta'] = 2
x['gamma'] = 3

Lets dive deeper using a variation of the previously defined Movie example using total=False to explore how mypy interprets the ‘incomplete’ data model.

class Movie2(TypedDict, total=False):
    name:str
    year:int
    release_year: Optional[int]


def example_03() -> int:
    """
    Explore with defining an 'incomplete' Movie data model and how
    None/Null checking works with mypy
    """

    m = Movie2(name='Star Wars')
    log.info(m)

    def f() -> int:
        # mypy will catch this
        m['name'] = 1234
        return 0

    # Use dict methods to mutate
    # mypy is confused by this. The error is:
    # `"Movie" has no attribute "clear"`
    m.clear()

    def f2() -> int:
        # mypy doesn't catch this NPE
        # I don't think it treats the type
        # as Optional[int]
        m['year'] = m['year'] + 100
        return 0

    log_expected_error(KeyError, f2)

    # Explicit test with release_year which
    # is fundamentally Optional[int]

    def f3() -> int:
        # mypy WILL catch this NPE
        m['release_year'] = m['release_year'] + 100
        return 0

    log_expected_error(KeyError, f3)


    # This works as expected and
    m2 = Movie2(name='Star Wars', release_year=2049)

    # This works as expected and mypy won't raise an error
    if m2['release_year'] is not None:
        t = m2['release_year'] + 10

Finally, let’s explore how isinstance works with TypedDict

Example 4: TypedDict and isinstance

def example_04() -> int:
    """Testing isinstance"""

    m = Movie(name='Movie', year=1234)


    def f() -> int:
        is_true = isinstance(m, dict)
        return 0


    # This is a bit unexpected that this
    # will raise an exception at runtime
    # ` Cannot use isinstance() with a TypedDict type`
    def f2() -> int:
        is_true = isinstance(m, Movie)
        return 0

    log_expected_error(TypeError, f2)

The important item to note here is that you can NOT use isinstance with TypedDict. Python will raise a runtime error of TypeError. Specifically the error you’ll see is show below.

1	TypeError: TypedDict does not support instance and class checks

Summary

TypedDict + mypy can be valuable to help catch type errors in Python and can help with heavy dictionary-centric interfaces in your application or library
TypedDict can be used in Python 3.7 using mypy_extensions package
TypedDict can be used in Python 2.7 using mypy_extensions and the 2.7 ‘namedtuple-esque’ syntax style (e.g., Movie = TypedDict('Movie', {'title':str, year:int}))
Using the total=False keyword to TypedDict can introduce large wholes the static typechecking process yielding KeyErrors. The keyword total=False should be used judiciously (if at all)
isinstance should not be used with TypedDict as it will raise a runtime TypeError exception
Be mindful when using TypeDict methods such as clear()
TypeDict introduces a new (somewhat) competing data modeling alternative to dataclasses, typing.NamedTuple, “classic” classes and third-party libraries, such as pydantic and attrs. It’s not completely clear to me how all these different competing data model abstractions models are going to age gracefully

I believe TypedDict can be a value tool to help improve clarity of interfaces, specifically in legacy code that is a bit dictionary-mania heavy. However, for new code, I would suggest avoid using TypedDict in favor of the thin data models, such as pydantic and attrs.

Best to you and your Python-ing.

P.S. A runnable form of the code used in the post can be found in this gist.

Python Dashboards with Panel: Kicking the Tires

2019-06-21

PyViz recently release the first official release (0.60) of Panel. Overall, I’m digging the iterative development model of developing dashboard components within Juypter lab/notebook.

Here’s an example notebook that will demonstrate creating a few dashboard components in Panel. The raw notebook can be launched using mybinder.org)

Python 3.8.0b1 Positional-Only Arguments and the Walrus Operator

2019-06-06

On June 4th 2019, Python 3.8.0b1 was released. The official changelog is here.

There are two interesting syntactic changes/features that were added which I believe are useful to explore in some depth. Specifically, the new “walrus”‘ := operator and the new Positional-Only function parameter features.

Walrus

First, the “walrus” expression operator (:=) defined in PEP-572

…naming sub-parts of a large expression can assist an interactive debugger, providing useful display hooks and partial results. Without a way to capture sub-expressions inline, this would require refactoring of the original code; with assignment expressions, this merely requires the insertion of a few name := markers. Removing the need to refactor reduces the likelihood that the code be inadvertently changed as part of debugging (a common cause of Heisenbugs), and is easier to dictate to another programmer.

A (contrived) example using Python 3.8.0b1 built from source “3.8.0b1+ (heads/3.8:23f41a64ea)”

>>> xs = list(range(0, 10))
>>> xs
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [(x, y) for x in xs if (y := x * 2) < 5]
[(0, 0), (1, 2), (2, 4)]

The idea is to use an expression based approach to remove unnecessary chatter and potential bugs of storing local state.

Another simple example:

>>> # Python 3.8
>>> import re
>>> rx = re.compile(r'([A-z]*)\.([A-z]*)')
>>> def g(first, last): return f"First: {first} Last: {last}"
>>> names = ['ralph', 'steve.smith']
>>> for name in names:
...     if (match := rx.match(name)): print(g(*match.groups()))
First: steve Last: smith

As a side note, many of these “None-ish” based examples in the PEP (somewhat mechanically) look like a map, flatMap, 'foreach' on Option[T] cases in Scala.

Python doesn’t really do this well due to its inside-out nature of composing maps/filter/generators (versus a left to right model). Nevertheless, here’s the example to demonstrate the general idea using a functional centric approach.

>>> def processor(sx): return rx.match(sx)
>>> def not_none(x): return x is not None
>>> def printer(x): print(g(*x.groups()))
>>> _ = list(map(printer, filter(not_none, map(processor, names))))
First: steve Last: smith

The “Exceptional cases” described in the PEP are worth investigating in more detail. There’s several cases where “Valid, though probably confusing” is used.

For example:

1 2	y := f(x) # INVALID (y := f(x)) # Valid, though not recommended

Note that the “walrus” operator can also be used in function definitions.

1 2	def foo(answer = p := 42): return "" # INVALID def foo(answer=(p := 42)): return "" # Valid, though not great style

Positional Only Args

The other interesting feature added to Python 3.8 is Positional-Only arguments in function definitions.

For as long as I can recall, Python has had this fundamental feature (or bug) on how functions or methods are called.

For example,

def f(x, y=1234): return x + y
>>> f(1)
1235
>>> f(1, 2)
3
>>> f(x=1, y=2)
3
>>> f(y=1, x=2)
3

Often this fundamental ambiguity of function call “style” isn’t really a big deal. However, it can leak local variable names as part of the public interface.As a result, minor variable renaming can potentially break interfaces. It’s also not clear what should be a keyword only argument or a positional only argument with a default. For example, simply changing f(x, y=1234) to f(n, y=1234) can potentially break the interface depending on the call “style”.

I’ve worked with a few developers over the years that viewed this as a feature and thought that this style made the API calls more explicit. For example:

def compute(alpha, beta, gamma):
    return 0

compute(alpha=90, gamma=80, beta=120)

I never really liked this looseness of positional vs keyword and would (if possible) try to discourage its use. Regardless, it can be argued this is a feature of the language (at least in Python <= 3.7). I would guess that many Python developers are also leveraging the unpacking dict style as well to call functions.

1 2	d = dict(alpha=90, gamma=70, beta=120) compute(**d)

In Python 3.0, function definitions using Keyword-Only arguments was added (see PEP-3102 from 2006) using the * delimiter. All arguments to the right of the * are Keyword-Only arguments.

1 2	def f(a, b, *, c=1234): return a + b + c

Unfortunately, this still leaves a fundamental issue with clearly defining function arguments. There are three cases: Positional-Only, Positional or Keyword, and Keyword-Only. PEP-3102 only solves the Keyword-Only case and doesn’t address the other two cases.

Hence in Python < 3.8, there’s still a fundamental ambiguity when defining a function and how it can be called.

For example:


def f(a, b=1, *, c=1234):
    return a + b + c

f(1, 2, c=3)
f(a=1, b=2, c=3)

Starting with Python 3.8.0, a Positional-Only parameters mechanism was added. The details are captured in PEP-570

Similar to the * delimiter in Python 3.0.0 for Keyword-Only args), a / delimiter was added to clearly delineate Positional-Only (or conversely Keyword-Only args) in function or method definitions. This makes the three cases of function arguments unambigious in how they should be called.

Here’s a few examples:

def f0(a, b, c=1234, /):
    # a, b, c are only positional args. 
    return a + b + c

def f1(a, b, /, c=1234):
    # a, b must be a positional,
    # b can be positional or keyword
    return a + b + c

def f2(a, b, *, c=1234):
    # a, b can be keyword or positional
    # but c MUST be keyword
    return a + b + c

def f3(a, b, /, *, c=1234):
    # a, b only positional
    # c only keyword
    # e.g., # f3(1, 2, c=3)
    return a + b + c

Combining the / and * with the type annotations yields:

def f4(a:int, b:int, /, *, c:int=1234):
    # this can only be called as
    # f4(1, 2, c=3)
    return a + b + c

We can dive a bit deeper and inspect the function signature via inspect.

import inspect
def extractor(f):
    sx = inspect.signature(f)
    return [(v.name, v.kind, v.default) for v in sx.parameters.values()]

Let’s inspect each example:

1
2
3

def pf(f):
    print(f"Function {f.__name__} with type annotations {f.__annotations__}")
    print(extractor(f))

Running in the REPL yeilds:

>>> funcs = (f0, f1, f2, f2, f2, f4)
>>> _ = list(map(pf, funcs))
Function f0 with type annotations {}
[('a', <_ParameterKind.POSITIONAL_ONLY: 0>, <class 'inspect._empty'>), ('b', <_ParameterKind.POSITIONAL_ONLY: 0>, <class 'inspect._empty'>), ('c', <_ParameterKind.POSITIONAL_ONLY: 0>, 1234)]
Function f1 with type annotations {}
[('a', <_ParameterKind.POSITIONAL_ONLY: 0>, <class 'inspect._empty'>), ('b', <_ParameterKind.POSITIONAL_ONLY: 0>, <class 'inspect._empty'>), ('c', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, 1234)]
Function f2 with type annotations {}
[('a', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, <class 'inspect._empty'>), ('b', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, <class 'inspect._empty'>), ('c', <_ParameterKind.KEYWORD_ONLY: 3>, 1234)]
Function f2 with type annotations {}
[('a', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, <class 'inspect._empty'>), ('b', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, <class 'inspect._empty'>), ('c', <_ParameterKind.KEYWORD_ONLY: 3>, 1234)]
Function f2 with type annotations {}
[('a', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, <class 'inspect._empty'>), ('b', <_ParameterKind.POSITIONAL_OR_KEYWORD: 1>, <class 'inspect._empty'>), ('c', <_ParameterKind.KEYWORD_ONLY: 3>, 1234)]
Function f4 with type annotations {'a': <class 'int'>, 'b': <class 'int'>, 'c': <class 'int'>}
[('a', <_ParameterKind.POSITIONAL_ONLY: 0>, <class 'inspect._empty'>), ('b', <_ParameterKind.POSITIONAL_ONLY: 0>, <class 'inspect._empty'>), ('c', <_ParameterKind.KEYWORD_ONLY: 3>, 1234)]

Note, you can use bind to call the func (this must also adhere to the correct function definition of arg and keywords in the function of interest).

Also, it’s worth noting that both scipy and numpy have been using this / style in the docs for some time.

When Should I Start Adopting these Features?

If you’re a library developer that has packages on PyPi, it might not be clear when it’s “safe” to start leveraging these features. I was only able to find one source of Python 3 adoption and as a result, I’m only able outline a very crude model.

On December 23, 2016, Python 3.6 was officially released. In the Fall of 2018, JetBrains release the Python Developer Survey which contains the Python 2/3 breakdown, as well as the breakdown of different versions within Python 3. As of the Fall 2018, 54% of Python 3 developers were using Python 3.6.x. Therefore, using this very crude model, if you assume that the rate of adoption of 3.6 and 3.8 are the same and if the minimum threshold of adoption of 3.8 is 54%, then you’ll need to wait approximately 2 years before starting to leverage these 3.8 features.

When you do plan to leverage these 3.8 specific features and pushing a package to the Python Package Index, then I would suggest clearly defining the Python version in the setup.py. For more details, see the official packaging docs for more details.

1 2	# setup.py setup(python_requires='>=3.8')

Summary and Conclusion

Python 3.8 added the “walrus” operator := that enables results of expressions to be used
It’s recommended reading the Exceptional Cases for better understanding of where to (and to not) use the := operator.
Python 3.8 added Positional-Only function definitions using the / delimiter
Defining functions with Positional-Only arguments will require a trailing / in the definition. E.g., def adder(n, m, /): return 0
There are changes in the standard lib to communicate. It’s not clear how locked down or backward compatible the interfaces were changes. Here’s a random example of the function signature of functools.partial being updated to use /.
Positional-Only arguments should improve consistency of API calls across Python runtimes (e.g., cpython and pypi)
The Positional-Only PEP-570 outlines improvements in performance, however, I wasn’t able to find any performance studies on this topic.
Migrating to 3.8 might involve potentially breaking API changes based on the usage of / in the Python 3.8 standard lib
For core library authors of libs on pypi, I would recommend using the crude approximation (described above) of approximately 2 years away from being able to adopt the new 3.8 features
For mypy users, you might want to make sure you investigate the supported versions of Python 3.8 (More Details on the compatiblity matrix)

I understand the general motivation to solve core friction points or ambiguities at the language level, however, the new syntatic changes are a little too noisy for my tastes. Specifically the Positional-Only / combined with the * and type annotations. Regardless, the (Python 3.8) ship has sailed long ago. I would encourage the Python community to periodically track and provide feedback on the current PEPs to help guide the evolution of the Python programming language. And finally, Python 3.8.0 (beta and future 3.8 RCs) bugs should be filed to https://bugs.python.org .

Best to you and your Python-ing!

Third Party Libraries

Let’s take a quick look into two of the third-party libraries, attrs and pydantic that inspired dataclasses. I believe both of these libraries are supported by mypy.

Attrs

Similar to the dataclasses approach, the types in attrs' are only used as annotations (e.g., __annotations__) and are not enforced at runtime. However, the general validation mechanism trivially enables runtime type validation. For the purposes of this example, let’s also add custom validation on the name of the Person as well as adding type checking.

from typing import Optional

import attr
from attr.validators import instance_of as vof


def is_positive_nonzero(instance, attribute, value):
    if value < 1:
        raise ValueError(f"Value {value} must be >0")


def check_non_empty_str(inst, attribute, value):
    if not value:
        raise ValueError("name must be a non-empty string")


@attr.s(auto_attribs=True, frozen=True)
class Person:
    id:int = attr.ib(validator=[vof(int), is_positive_nonzero])
    name:str = attr.ib(validator=[vof(str), check_non_empty_str])
    favorite_color: Optional[str] = attr.ib(default=None, validator=[vof((str, type(None)))])

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.id == other.id
        return False

The abstract of Raymond Hettinger’s talk from Pycon has an interesting summary of dataclasses.

Dataclasses are shown to be the next step in a progression of data aggregation tools: tuple, dict, simple class, bunch recipe, named tuples, records, attrs, and then dataclasses. Each builds upon the one that came before, adding expressiveness at the expense of complexity.

I’m not sure I completely agree. The dataclasses implementation looks closer to attrs-lite, than the “next step of progression”.

Pydantic

Another alternative is pydantic. This is a bit more opinionated design. It also has a nice Schema abstraction to communicate core metadata on the fields as well as first class support for serialization hooks. The pydantic library also has a dataclasses wrapper layer that can be accessed via pydantic.dataclasses.

Here’s an example of defining our Person data model.

from typing import Optional
import pydantic
from pydantic import BaseModel, validator, PositiveInt


class Person(BaseModel):
    class Config:
        allow_mutation = False
        validate_all = True 

    id:PositiveInt
    name:str
    favorite_color: Optional[str] = None

    @validator('name')
    def validate_name(cls, v):
        if not v:
            raise ValueError("Name can't be an empty")
        return v

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.id == other.id
        return False

Overall, I like the general style and approach, however, it does have a few quarks. Specifically the keyword only usage as well as unexpected casting behavior of ints to strs.

The pydantic API also supports rich metadata that could be useful for generating commandline interfaces for a given schema data model and emitted JSONSchema.

from pydantic import BaseModel, validator, ValidationError, Schema

class Person(BaseModel):
    class Config:
        allow_mutation = False

    # '...' means the value is required.
    id: PositiveInt = Schema(..., title="User Id")
    name:str = Schema(..., title="User name")
    favorite_color:Optional[str] = Schema(None,
                                          title="User Favorite Color",
                                          description="Favorite Color. Provided as case sensitive english spelling")

     @validator('name')
    def validate_name(cls, v):
        if not v:
            raise ValueError("Name can't be an empty")
        return v

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.id == other.id
        return False

#Person.schema() will emit the JSONSchema as a dict.

Summary And Conclusion

dataclasses offers a terse syntax for defining a class or data container that has type annotations using a code generation approach
dataclasses field metadata can be used to define defaults, communicate which fields should be used in eq or hash, lte, etc…
dataclasses has a __post_init hook that can be used for validation
dataclasses By design does not do type validation. It only adds __annotation__ to the data container for static analyzers to consume, such as mypy
Since dataclasses is now in the standard lib, this means feature enhancement, bug fixes, and backwards compatibility are now coupled the official Python release process
Raymond’s Pycon talk mentions the end-to-end develop time on dataclasses was 200+ hrs

Initially, I was a intrigued by the addition of dataclasses to the standard library. However, after a deeper dive into the dataclasses, it’s not clear to me that these are particularly useful for Python developers. I believe third-party solutions such as attrs or pydantic might be a better fit due to their validation hooks and richer feature sets. It will be interesting to see the adoption of dataclasses by both the Python core as well as third-party developers.

For a deeper look and comparison of the 3 (or 4) models to define a class or data container in Python, please consult this Notebook in this Gist

Best on all your Python-ing!

Series: Functional Programming Techniques In Python

2019-03-01

This is a 4 Part Series that explores functional centric design style and patterns in Python.

Part 1 (notebook) starts with different mechanisms of how to define functions in Python and quickly moves to using closures and functools.partial. We then move on to adding functools.reduce and composition with compose to our toolbox. Finally, we conclude with adding lazy map and filter to our toolbox and create a data pipeline that takes a stream of records to compute common statics using a max heap as the reducer.

In Part 2 (notebook), we explore building a REST client using functional-centric design style. Using an iterative approach, we build up a REST client using small functions leveraging clsoures and passing functions as first class citizens to methods. To conclude, we wrap the API and expose the REST client via a simple Python class.

Part 3 (notebook) follows similarly to spirit to Part 2. We build a commandline interface leveraging argparse from the Python standard library. Sometimes libraries such as argparse can have rough edges or friction points in the API that introduce duplication or design issues. Part 3 focuses on iterative building up an expressive commandline interface to a subparser commandline tool using closures and functions to smooth out the rough edges of argparse.. There’s also examples of using a Strategy-ish pattern with type annotated functions to enable configuring logging as well as custom error handling.

Part 4 (notebook) concludes with some gotchas with regards to scope in closures, a brief example of decorators and a few suggestions for leverging function-centric designs in your APIs or applications.

If you’re a OO wizard, a Data Scientist/Analysist, or a backend dev, this series can be useful to add another design approach in your toolbelt to designing APIs or programs.

Best to you and your Python’ing!

12 Next

Creating CLI tools leveraging ZIO and Decline using scala-cli

High Level Points

Specifics

Summary and Final Comments

Exploring TypedDict in Python 3.8

PEP-589

Motivation: Dictionary-Mania

Example 1: Basic Usage of TypedDict

Example 2: Exploring Mutating and Assignment of TypedDicts

Example #3: TypedDicts total Keyword Argument

Example 4: TypedDict and isinstance

Summary

Python Dashboards with Panel: Kicking the Tires

Python 3.8.0b1 Positional-Only Arguments and the Walrus Operator

Walrus

Positional Only Args

When Should I Start Adopting these Features?

Summary and Conclusion

Further Reading

Dataclasses in Python 3.7

Third Party Libraries

Attrs

Pydantic

Summary And Conclusion

Series: Functional Programming Techniques In Python

Functional Programming Techniques in Python Part 4

Functional Programming Techniques in Python Part 3

Functional Programming Techniques in Python Part 2

Functional Programming Techniques in Python Part 1