# Subplot input language

Subplot reads four kinds of input files:

* The "subplot document" file, in YAML.
* The document file(s), in [GitHub Flavored Markdown](https://github.github.com/gfm/).
* The bindings file(s), in YAML.
* The functions file(s), in Python, Rust, or some other language.

Subplot interprets marked parts of the input markdown documents
specially. These are fenced code blocks tagged with the `scenario`,
`file`, or `example` classes.


## Scenario language

The scenarios are core to Subplot. They express the detailed
acceptance criteria and how they are verified. The scenarios are
meant to be understood both by all human stakeholders and by the
Subplot software. As such, they are expressed in a somewhat stilted
language that resembles English, but is just formal enough that it can
also be understood by a computer.

A scenario is a sequence of steps. A step can be setup to prepare for
an action, an action, or an examination of the effect an action had.
For example, a scenario to verify that a backup system works might
look like the following:

~~~~~~{.markdown .numberLines}
~~~scenario
given a backup server
when I make a backup
and I restore the backup
then the restored data is identical to the original data
~~~
~~~~~~

This is not magic. The three kinds of steps are each identified by the
first word in the step.

* `given` means it's a step to set up the environment for the scenario
* `when` means it's a step with the action that the scenario verifies
* `then` means it's a step to examine the results of the action

The `and` keyword is special in that it means the step is the same
kind as the previous step. In the example, on line 4, it means the
step is a `when` step.
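
The rule for `and` can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Subplot's own parser; the function name `resolve_step_kinds` is made up for this example, and rejecting a scenario that starts with `and` is a choice made by the sketch.

```python
KEYWORDS = ("given", "when", "then")


def resolve_step_kinds(steps):
    """Return (kind, text) pairs, replacing `and` with the previous step's kind."""
    resolved = []
    previous = None
    for step in steps:
        keyword, _, text = step.strip().partition(" ")
        if keyword == "and":
            if previous is None:
                # Assumption of this sketch: `and` needs a preceding step.
                raise ValueError("a scenario cannot start with 'and'")
            keyword = previous
        elif keyword not in KEYWORDS:
            raise ValueError(f"unknown step keyword: {keyword!r}")
        resolved.append((keyword, text))
        previous = keyword
    return resolved


scenario = [
    "given a backup server",
    "when I make a backup",
    "and I restore the backup",
    "then the restored data is identical to the original data",
]
for kind, text in resolve_step_kinds(scenario):
    print(kind, "|", text)
```

The third step is reported with the kind `when`, because the step before it is a `when` step.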

Each step is implemented by a bit of code, provided by the author of
the subplot document. The step is _bound_ to the code by a bindings
file, which matches on the text of the step: if the text is like this,
then call that function. Bindings files are described in detail below.

The three kinds of steps exist to make scenarios easier to understand
by humans. Subplot itself does not actually care if a step is setup,
action, or examination, but it's easier for humans reading the
scenario, or writing the corresponding code, if each step only does
the kind of work that is implied by the kind of step it's bound to.

### Using Subplot's language effectively

Your subplot scenarios will be best understood when they use the subplot
language in a consistent fashion, within and even across *different* projects.
As with programming languages, it's possible to place your own style on your
subplots.  Indeed, there is no inherent internal implementation difference between
how `given`, `when` and `then` steps are processed (other than that `given`
steps often also have cleanup functions associated with them).

Nonetheless we have some recommendations about using the Subplot language,
which reflect how we use it in Subplot and related projects.

When you are formulating your scenarios, it is common to try and use phraseology
along the lines of _if this happens then that is the case_, but this is not
language which works well with Subplot.  Scenarios describe what will happen in
the success case.  As such, we don't construct scenarios which say _if foo happens
then the case fails_; instead we say _when I do the thing then foo does not happen_.
This is a subtle but critical shift in the construction of your test cases which
will mean that they map more effectively to scenarios.

Scenarios work best when they describe how some entity (human or otherwise)
actually goes about successfully achieving their goal.  They start out by setting
the scene for the goal (`given`), go on to describe the actions undertaken
in order for the goal to be achieved (`when`), and finally describe how
the entity knows that the goal has been achieved (`then`).  By writing in this
active goal-oriented fashion, your scenarios will flow better and be easier for
all stakeholders to understand.

In general you should use `given` statements where you do not wish to go into
the detail of what it means for the statement to have been run; you simply wish
to inform the reader that some precondition is met.  These statements are often
best along the lines of `given a setup which works` or `given a development environment`
or somesuch.

The `when` statements are best used to denote **active** steps. These are
the steps which your putative actors or personae use to achieve their goals.
These often work best in the form `when I do the thing` or
`when the user does the thing`.

The `then` statements are the crux of the scenario: they are the **validation**
steps.  These are the steps which tell the reader of the scenario how the actor
knows that their action (the `when` steps) has had the desired outcome.  This
could be of the form `then some output is present` or `then it exits successfully`.

With all that in mind, a good scenario looks like

```
given the necessary starting conditions
when I do the required actions
then the desired outcome is achieved
```

Given all that, however, it's worth considering some pitfalls to avoid when
writing your scenarios.

It's best to avoid overly precise or overly technical details in your scenario
language, unless they are necessary to properly describe your goal.  So
it's best to say things like `then the output file is valid JSON` rather than
`then the output file contains {"foo": "bar", "baz": 7}`.  If the actual values
are important, prefer statements such as `then the output file
has a key "foo" which contains the value "bar"`.

Try not to change "person" or voice in your scenarios unless there are multiple
entities involved in telling your stories. For example, if you have a scenario
statement of `when I run fooprogram` do not also have statements in the passive
such as `when fooprogram is run`. It's reasonable to switch between `when` and
`then` statements (`then the output is good`) but try not to have multiple
`then` statements which switch it up, such as `then I have an output file`,
`and the output file is ok`.

If you're likely to copy-paste your scenario statements around, do not use `and`
as a scenario keyword, even though it's valid to do so.  Instead start all your
scenario statements with the correct `given`, `when`, or `then`.  The typesetter
will deal with formatting that nicely for you.

## Document markup

Subplot parses Markdown input files using GitHub-flavored Markdown.

[fenced code blocks]: https://github.github.com/gfm/#fenced-code-blocks

Subplot supports most of the major features of _gfm_ including tables,
task lists, ~sub-~ and ^super-^ script, ~~strike-through~~, and heading attributes.

Subplot extends Markdown by treating certain tags for [fenced
code blocks][] specially. A scenario, for example, would look like
this:

~~~~~~{.markdown .numberLines}
```scenario
given a standard setup
when peace happens
then everything is OK
```
~~~~~~

The `scenario` tag on the code block is recognized by Subplot, which
will typeset the scenario (in output documents) or generate code (for
the test program) accordingly. A scenario block does not need to be a
complete scenario. Subplot will collect all the snippets into one
block for the test program. Snippets under the same heading belong
together; the next heading of the same or a higher level ends the
scenario.
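
The grouping rule can be sketched in Python. This is only an illustration: the event tuples stand in for the output of a real Markdown parser, the function name `collect_scenarios` is hypothetical, and taking the scenario title from the nearest preceding heading is an assumption of the sketch.

```python
def collect_scenarios(events):
    """Group scenario snippets into scenarios, using headings as boundaries.

    `events` is a hypothetical, simplified view of a parsed document:
    ("heading", level, title) or ("scenario", text) tuples in document order.
    """
    scenarios = []
    current = None        # the scenario being collected, if any
    last_heading = None   # most recent (level, title) seen

    for event in events:
        if event[0] == "heading":
            _, level, title = event
            # A heading of the same or a higher level (smaller number)
            # ends the scenario that is being collected.
            if current is not None and level <= current["level"]:
                scenarios.append(current)
                current = None
            last_heading = (level, title)
        elif event[0] == "scenario":
            if current is None:
                if last_heading is None:
                    raise ValueError("scenario snippet before any heading")
                level, title = last_heading
                current = {"title": title, "level": level, "steps": []}
            current["steps"].extend(event[1].splitlines())

    if current is not None:
        scenarios.append(current)
    return scenarios


events = [
    ("heading", 1, "Backups"),
    ("heading", 2, "Restore works"),
    ("scenario", "given a backup server\nwhen I make a backup"),
    ("heading", 3, "Checking the result"),
    ("scenario", "then the restored data is identical to the original data"),
    ("heading", 2, "Another scenario"),
    ("scenario", "given a standard setup"),
]
for scenario in collect_scenarios(events):
    print(scenario["title"], len(scenario["steps"]))
```

In this input the snippet under the level 3 heading still belongs to "Restore works", because only a heading of level 2 or higher ends that scenario.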

For `scenario` blocks you may not use any attributes. All attributes
are reserved for Subplot. Subplot doesn't define any attributes yet,
but by reserving all of them, it can add them later without it being
a breaking change.

For embedding test data files in the Markdown document, Subplot
understands the `file` tag:

~~~~~~~~markdown
~~~{#filename .file}
This data is accessible to the test program as 'filename'.
~~~
~~~~~~~~

The `.file` attribute is necessary, as is the identifier, here
`#filename`. The generated test program can access the data using the
identifier (without the #).

Subplot also understands the `dot` and `roadmap` tags, and can use the
Graphviz dot program, or the [roadmap][] Rust crate, to produce
diagrams. These can be useful for describing things visually.

When typesetting files, Subplot will automatically number the lines in
the file so that documentation prose can refer to sections of embedded
files without needing convoluted expressions of positions.  However if
you do not want that, you can annotate the file with `.noNumberLines`.

For example…

~~~~~~~~markdown
~~~{#numbered-lines.txt .file}
This file has numbered lines.

This is line number three.
~~~

~~~{#not-numbered-lines.txt .file .noNumberLines}
This file does not have numbered lines.

This is still line number three, but would it be obvious?
~~~
~~~~~~~~

…renders as:

~~~{#numbered-lines.txt .file}
This file has numbered lines.

This is line number three.
~~~

~~~{#not-numbered-lines.txt .file .noNumberLines}
This file does not have numbered lines.

This is still line number three, but would it be obvious?
~~~

[roadmap]: https://crates.io/search?q=roadmap


### Use embedded file

This scenario makes sure the sample files are used in a scenario so
that they don't cause warnings.

~~~{.scenario label=embedded-file}
given file numbered-lines.txt
given file not-numbered-lines.txt
~~~

## Document metadata

Document metadata is read from the subplot document (the YAML file).
This can be used to set the document title, authors, date (version),
and more. Crucially for Subplot, the bindings and functions files are
named in the metadata, rather than Subplot deriving them from the
input file name.

~~~{.file .yaml .numberLines}
title: "Subplot"
authors:
- The Subplot project
date: work in progress
markdowns:
- subplot.md
bindings:
- subplot.yaml
impls:
  python:
    - subplot.py
~~~

There can be more than one bindings or functions file: use a YAML
list.


## Bindings file

The bindings file binds scenario steps to the functions that
implement them. The YAML file is a list of objects (also known as
dicts, hashmaps, or key/value pairs). Each object specifies a step kind
(`given`, `when`, or `then`) and a pattern that matches the text of the
step, optionally capturing interesting parts of the text. A binding may
contain a type map, which tells Subplot the types of the captures in the
pattern so that they can be validated to some extent. A binding also lists
some number of implementations. Each implementation is specified by the
name of the language (template) it is for and the name of a function that
implements the step, optionally with the name of a function to call to
clean up a scenario which includes that step.

Bindings are flexible in several ways; further details can be found below:

1. Patterns can be simple or full-blown Perl-compatible regular
   expressions ([PCRE][]).
2. Bindings _may_ have type maps.  Without a type map, all captures are
   considered to be short strings (words).
3. Bindings _may_ have as many or as few implementations as needed.  A zero
   `impl` binding will work for `docgen` but will fail to `codegen`.  This can
   permit document authors to prepare bindings without knowing how an engineer
   might implement it.

~~~{.yaml .numberLines}
- given: "a standard setup"
  impl:
    python:
      function: create_standard_setup
- when: "{thing} happens"
  impl:
    python:
      function: make_thing_happen
  types:
    thing: word
- when: "I say (?P<sentence>.+) with a smile"
  regex: true
  impl:
    python:
      function: speak
- then: "everything is OK"
  impl:
    python:
      function: check_everything_is_ok
~~~

In the example above, there are four bindings and they all provide Python
implementation functions:

* A binding for a "given a standard setup" step. The binding captures
  no part of the text, and causes the `create_standard_setup` function
  to be called.
* A binding for a "when" step consisting of one word followed by
  "happens". For example, "peace", as in "when peace happens". The
  word is captured as "thing", and given to the `make_thing_happen`
  function as an argument when it is called.
* A binding for a "when" followed by "I say", an arbitrary sentence,
  and then "with a smile", as in "when I say good morning to you with
  a smile". The function `speak` is then called with the capture named
  "sentence" set to "good morning to you".
* A binding for a "then everything is OK" step, which captures nothing,
  and calls the `check_everything_is_ok` function.
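
As a rough sketch, the Python functions named in these bindings might look like the following. The function bodies and the context keys (`setup`, `thing`, `said`) are invented for illustration. The calling convention, a `ctx` argument plus one keyword argument per capture, follows the description in the Python section of this document. A plain dict stands in for the context here.

```python
def create_standard_setup(ctx):
    # given a standard setup: no captures, so only ctx is passed.
    ctx["setup"] = "standard"


def make_thing_happen(ctx, thing=None):
    # when {thing} happens: the capture arrives as the keyword argument `thing`.
    ctx["thing"] = thing


def speak(ctx, sentence=None):
    # when I say ... with a smile: the regex capture is named `sentence`.
    ctx["said"] = sentence


def check_everything_is_ok(ctx):
    # then everything is OK: inspect what earlier steps recorded.
    assert ctx["setup"] == "standard"
    assert ctx["thing"] == "peace"


# Stand-in for the generated test program calling the bound functions.
ctx = {}
create_standard_setup(ctx)
make_thing_happen(ctx, thing="peace")
speak(ctx, sentence="good morning to you")
check_everything_is_ok(ctx)
```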

## Step functions and cleanup

A step function must be atomic: either it completes successfully, or
it cleans up any changes it made before returning an indication of
failure.

A cleanup function is only called for successfully executed step
functions.

For example, consider a step that creates and starts a virtual
machine. The step function creates the VM, then starts it, and if both
actions succeed, the step succeeds. A cleanup function for that step
will stop and delete the VM. The cleanup is only called if the step
succeeded. If the step function manages to create the VM, but not
start it, it's the step function's responsibility to delete the VM,
before it signals failure. The cleanup function won't be called in
that case.
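
The virtual machine example could be sketched as below. `FakeVM` is a hypothetical stand-in for a real VM API, and passing the VM in as a `vm` argument is a shortcut taken only to keep the sketch self-contained; neither is part of Subplot.

```python
class FakeVM:
    """Hypothetical stand-in for a real virtual machine API."""

    def __init__(self, fail_on_start=False):
        self.fail_on_start = fail_on_start
        self.exists = False
        self.running = False

    def create(self):
        self.exists = True

    def start(self):
        if self.fail_on_start:
            raise RuntimeError("VM failed to start")
        self.running = True

    def stop(self):
        self.running = False

    def delete(self):
        self.exists = False


def create_and_start_vm(ctx, vm=None):
    vm.create()
    try:
        vm.start()
    except Exception:
        # The step must be atomic: undo the creation before signalling failure.
        vm.delete()
        raise
    ctx["vm"] = vm


def cleanup_vm(ctx):
    # Only called when create_and_start_vm succeeded.
    vm = ctx["vm"]
    vm.stop()
    vm.delete()


# Successful step: the cleanup function is called afterwards.
good, good_ctx = FakeVM(), {}
create_and_start_vm(good_ctx, vm=good)
cleanup_vm(good_ctx)

# Failing step: no cleanup is called, so the step tidies up after itself.
bad, bad_ctx = FakeVM(fail_on_start=True), {}
try:
    create_and_start_vm(bad_ctx, vm=bad)
except RuntimeError:
    pass
```

In both cases no VM is left behind: in the first because the cleanup function ran, in the second because the step function deleted the VM itself.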

### Simple patterns

The simple patterns are of the form `{name}` and match a single word
consisting of printable characters. This can be varied by adding a
suffix, such as `{name:text}` which matches any text. The following
kinds of simple patterns are supported:

* `{name}` or `{name:word}` &ndash; a single word. As a special case, the
  form `{name:file}` is also supported.  It is also a single word, but has the
  added constraint that it must match an embedded file's name.
* `{name:text}` &ndash; any text
* `{name:int}` &ndash; any whole number, including negative
* `{name:uint}` &ndash; any unsigned whole number
* `{name:number}` &ndash; any number
* `{name:escapedword}` &ndash; a single word, in which backslash escapes are
  processed during the generation of the test suite.
* `{name:escapedtext}` &ndash; any text, in which backslash escapes are
  processed during the generation of the test suite.


A pattern uses simple patterns by default, or if the `regex` field is
set to false. To use regular expressions, `regex` must be set to true.
Subplot complains if typical regular expression characters are used,
when simple patterns are expected, unless `regex` is explicitly set to
false.
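
To make the idea concrete, here is a Python sketch that turns a simple pattern into a regular expression. It is not how Subplot implements simple patterns. The regular expression chosen for each kind is an assumption, only five of the kinds are covered, and anchoring the match to the whole step text is also an assumption.

```python
import re

# Assumed regular expressions for some simple pattern kinds; illustrative only.
KIND_REGEX = {
    "word": r"\S+",
    "text": r".*",
    "int": r"-?\d+",
    "uint": r"\d+",
    "number": r"-?\d+(?:\.\d+)?",
}

PLACEHOLDER = re.compile(
    r"\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<kind>[a-z]+))?\}"
)


def simple_pattern_to_regex(pattern):
    """Compile a simple pattern such as "{thing} happens" into a regex."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders must match exactly.
        parts.append(re.escape(pattern[pos:m.start()]))
        kind = m.group("kind") or "word"  # {name} means {name:word}
        if kind not in KIND_REGEX:
            raise ValueError(f"kind not covered by this sketch: {kind}")
        parts.append(f"(?P<{m.group('name')}>{KIND_REGEX[kind]})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


regex = simple_pattern_to_regex("{thing} happens")
print(regex.match("peace happens").groupdict())
print(regex.match("world peace happens"))
```

The first match captures `peace` as `thing`. The second step does not match, because a word capture cannot span the space in "world peace".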

### Regular expression patterns

Regular expression patterns are used only if the binding `regex` field
is set to `true`.

The regular expressions use [PCRE][] syntax as implemented by the Rust
[regex][] crate. The `(?P<name>pattern)` syntax is used to capture
parts of the step. The captured parts are given to the bound function
as arguments, when it's called.

[PCRE]: https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions
[regex]: https://crates.io/crates/regex
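
Named captures can be tried out with Python's `re` module, which accepts the same `(?P<name>pattern)` syntax. This is a sketch only: Python's `re` is not the Rust regex crate, the helper `captures_for` is hypothetical, and anchoring the pattern with `^` and `$` is an assumption.

```python
import re

# The pattern from the example bindings file, anchored for this sketch.
pattern = re.compile(r"^I say (?P<sentence>.+) with a smile$")


def captures_for(step_text):
    """Return the named captures as a dict, or None if the step does not match."""
    match = pattern.match(step_text)
    return match.groupdict() if match else None


def speak(ctx, sentence=None):
    ctx["said"] = sentence


ctx = {}
captures = captures_for("I say good morning to you with a smile")
# Each named capture becomes a keyword argument of the bound function.
speak(ctx, **captures)
print(ctx["said"])
```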

### The type map

Bindings may also contain a type map.  This is a dictionary called `types`
and contains a key-value mapping from capture name to the type of the capture.
Valid types are listed above in the simple patterns section.  In addition to
simple patterns, the type map can be used for regular expression bindings as
well.

When using simple patterns, if the capture is given a type in the type map, and
also in the pattern, then the types must match, otherwise subplot will refuse to
load the binding.

Typically the type map is used by the code generators to, for example, distinguish
between `"12"` and `12` (i.e. between a string and what should be a number). This
permits the generated test suites to use native language types directly.  The
`file` type, if used, must refer to an embedded file in the document; subplot docgen
will emit a warning if the file is not found, and subplot codegen will emit an error.
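
The effect of a type map on captured values might be sketched like this in Python. The mapping from Subplot types to Python types is an assumption made for illustration (for instance, `number` is mapped to `float`), and `convert_captures` is a hypothetical helper, not part of Subplot.

```python
# Assumed mapping from capture types to Python converters.
CONVERTERS = {
    "word": str,
    "text": str,
    "int": int,
    "uint": int,
    "number": float,
}


def convert_captures(captures, types):
    """Convert raw captured strings according to a type map."""
    converted = {}
    for name, raw in captures.items():
        kind = types.get(name, "word")  # untyped captures are treated as words
        value = CONVERTERS[kind](raw)
        if kind == "uint" and value < 0:
            raise ValueError(f"{name} must not be negative: {raw}")
        converted[name] = value
    return converted


print(convert_captures({"count": "12", "name": "12"}, {"count": "int"}))
# {'count': 12, 'name': '12'}
```

The same captured text `12` becomes the integer `12` for the typed capture and stays the string `"12"` for the untyped one.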

### The implementation map

Bindings can contain an `impl` map which connects the binding with zero or more
language templates.  If a binding has no `impl` entries then it can still be
used to `docgen` an HTML document from a subplot document.  This permits a
workflow where requirements owners / architects design the validations for a
project and then engineers implement the step functions to permit the
validations to work.

Shipped with subplot are a number of libraries such as `files` or `runcmd` and
these libraries are polyglot in that they provide bindings for all supported
templates provided by subplot.

Here is an example of a binding from one of those libraries:

```yaml
- given: file {embedded_file}
  impl:
    rust:
      function: subplotlib::steplibrary::files::create_from_embedded
    python:
      function: files_create_from_embedded
  types:
    embedded_file: file
```

### Embedded file name didn't match

```{.scenario label=codegen label=embedded-file}
given file badfilename.subplot
given file badfilename.md
and file b.yaml
and file f.py
and an installed subplot
when I try to run subplot codegen --run badfilename.md -o test.py
then command fails
```

~~~{#badfilename.subplot .file .yaml .numberLines}
title: Bad filenames in matched steps do not permit codegen
markdowns: [badfilename.md]
bindings: [b.yaml]
impls:
  python: [f.py]
~~~

~~~{#badfilename.md .file .markdown .numberLines}
# Bad filename

```scenario
given file missing.md
```

~~~

### Bindings file strictness - given when then

The bindings file is semi-strict.  For example you must have only one
of `given`, `when`, or `then` in your binding.


```{.scenario label=docgen label=failures label=bindings}
given file badbindingsgwt.subplot
and file badbindingsgwt.md
and file badbindingsgwt.yaml
and an installed subplot
when I try to run subplot docgen --output ignored.html badbindingsgwt.subplot
then command fails
and stderr contains "binding has more than one keyword"
```

~~~{#badbindingsgwt.subplot .file .yaml .numberLines}
title: Bad bindings cause everything to fail
markdowns: [badbindingsgwt.md]
bindings: [badbindingsgwt.yaml]
~~~

~~~{#badbindingsgwt.md .file .markdown .numberLines}
# Bad bindings
```scenario
given we won't reach here
```
~~~

~~~{#badbindingsgwt.yaml .file .yaml .numberLines}
- given: we won't reach here
  then: we won't reach here
~~~

### Bindings file strictness - unknown field

The bindings file is semi-strict.  For example, you must not have keys
in the bindings file which are not known to Subplot.


```{.scenario label=docgen label=failures label=bindings}
given file badbindingsuf.subplot
and file badbindingsuf.md
and file badbindingsuf.yaml
and an installed subplot
when I try to run subplot docgen --output ignored.html badbindingsuf.subplot
then command fails
and stderr contains "Unknown field `function`"
```

~~~{#badbindingsuf.subplot .file .yaml .numberLines}
title: Bad bindings cause everything to fail
markdowns: [badbindingsuf.md]
bindings: [badbindingsuf.yaml]
~~~

~~~{#badbindingsuf.md .file .markdown .numberLines}
# Bad bindings
```scenario
given we won't reach here
```
~~~

~~~{#badbindingsuf.yaml .file .yaml .numberLines}
- given: we won't reach here
  function: old_school_function
~~~
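
The two strictness rules from the sections above can be sketched as a small Python check over a binding loaded from YAML. This is not Subplot's implementation. The set of known fields contains only the keys that appear in bindings in this document and is assumed to be complete for the purposes of the sketch, and the "binding has no keyword" message is invented; the other two messages are taken from the scenarios above.

```python
KEYWORDS = {"given", "when", "then"}
# Keys that appear in bindings in this document; assumed complete here.
KNOWN_FIELDS = KEYWORDS | {"impl", "types", "regex"}


def check_binding(binding):
    """Raise ValueError if a binding (a dict) breaks either strictness rule."""
    for key in binding:
        if key not in KNOWN_FIELDS:
            raise ValueError(f"Unknown field `{key}`")
    used = [key for key in binding if key in KEYWORDS]
    if len(used) > 1:
        raise ValueError("binding has more than one keyword")
    if not used:
        # Hypothetical message, not taken from Subplot.
        raise ValueError("binding has no keyword")


def error_of(binding):
    """Return the error message for a binding, or None if it is accepted."""
    try:
        check_binding(binding)
    except ValueError as e:
        return str(e)
    return None


print(error_of({"given": "we won't reach here", "then": "we won't reach here"}))
print(error_of({"given": "we won't reach here", "function": "old_school_function"}))
```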

## Functions file

Functions implementing steps are supported in Python. The
language is chosen by setting the `template` field in the document
YAML metadata to `python`.

The functions files are not parsed by Subplot at all. Subplot merely
copies them to the output. All parsing and validation of the file is
done by the programming language being used.

The conventions for calling step functions vary by language. All
languages support a "dict" abstraction of some sort. This is most
importantly used to implement a "context" to store state in a
controlled manner between calls to step functions. A step function can
set a key to a value in the context, or retrieve the value for a key.

Typically, a "when" step does something, and records the results into
the context, and a "then" step checks the results by inspecting the
context. This decouples functions from each other, and avoids having
them use global variables for state.
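
A minimal Python sketch of this flow follows. The `Context` class is a hypothetical dict-like stand-in for whatever the generated test program provides, and the step functions `add_numbers` and `sum_is` are invented for illustration.

```python
class Context:
    """Hypothetical minimal dict-like context for this sketch."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def add_numbers(ctx, a=None, b=None):
    # A "when" step: do something and record the result in the context.
    ctx["sum"] = a + b


def sum_is(ctx, expected=None):
    # A "then" step: check the recorded result, without any global state.
    assert ctx["sum"] == expected, f"expected {expected}, got {ctx['sum']}"


ctx = Context()
add_numbers(ctx, a=2, b=3)
sum_is(ctx, expected=5)
```

The two functions never call each other and share nothing except the context, so either one can be reused in other scenarios.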


### Python

The context is implemented by a dict-like class.

The step functions are called with a `ctx` argument that has the
current state of the context, and each capture from a step as a
keyword argument. The keyword argument name is the same as the capture
name in the pattern in the bindings file.

The contents of embedded files are accessed using a function:

- `get_file(filename)`

Example:

~~~python
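import json

# assert_eq, assert_ne and assert_dict_eq are not defined here; they are
# expected to be available to step functions in the generated test program.

def exit_code_is(ctx, wanted=None):
    # Compare the exit code recorded in the context by an earlier step.
    assert_eq(ctx.get("exit"), wanted)

def json_output_matches_file(ctx, filename=None):
    # Compare captured stdout, parsed as JSON, with the JSON in the file.
    actual = json.loads(ctx["stdout"])
    with open(filename) as f:
        expected = json.load(f)
    assert_dict_eq(actual, expected)

def file_ends_in_zero_newlines(ctx, filename=None):
    # Slicing, unlike content[-1], is also safe for an empty file.
    with open(filename, "r") as f:
        content = f.read()
    assert_ne(content[-1:], "\n")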
~~~

## Comparing the scenario runners

Currently Subplot ships with two scenario runner templates: the
Python and Rust templates.  This comparison
is only considered correct against the version of Rust at the time of
publishing a Subplot release.  Newer versions of Rust may introduce
additional functionality which we do not list here.  Finally, we do
not list features here which are considered fundamental, such as
"runs all the scenarios" or "supports embedded files", since no template
would be considered for release if it did not do these things.  The
table lists only the points on which the templates differ.

| Feature                       | Python                                         | Rust                                                         |
| ----------------------------- | ---------------------------------------------- | ------------------------------------------------------------ |
| Isolation model               | Subprocess                                     | Threads                                                      |
| Parallelism                   | None                                           | Threading                                                    |
| Passing environment variables | CLI                                            | Prefixed env vars                                            |
| Execution order               | Randomised                                     | Fixed order plus threading perturbation                      |
| Run specific scenarios        | Simple substring check                         | Either exact _or_ simple substring check                     |
| Diagnostic logging            | Supports comprehensive log file                | Writes captured output to stdout/stderr on failure           |
| Stop-on-failure               | Stops on first failure unless told not to      | Runs all tests unless told not to                            |
| Data dir integration          | Cleans up each scenario unless told to save it | Cleans up each scenario with no option to save failure state |


