Source on GitHub

User manual

How to read this manual

The different sections of this manual are presented in the theoretical order of operations: you start by preparing your collection of programs, these programs are labeled, these labels are translated into taxa, the results are stored in a database; finally, you filter this database through a command pipeline to extract recommendations.

In practice, your input is only required for the last stage. That's why we suggest you observe the following order:

  1. Start by reading the pipeline tutorial: it explains, in a user-friendly style, what you can expect from Paroxython and how you may integrate it into your teaching.
  2. Once you are familiar with this aspect, see if you can improve or enrich the results through some preparation of the programs in your collection.
  3. If needed, adapt the default taxonomy to your own vision of how to teach programming.
  4. Harness the full power of the pipeline with its most advanced commands.

Of course, you can consult at any time the glossary of terms specific to Paroxython.

Preparing your program collection

tl;dr. There is nothing to prepare before launching Paroxython. Nevertheless, for best results:

  • don't name things like a first year student (follow PEP 8, good practices and common sense);
  • aim at the simplest style. Don't try to outsmart Paroxython: you're sure to win… a lot of false negatives;
  • the system is not capable of inference: be direct;
  • for the same reason, prefer from foo import bar; bar() to import foo; foo.bar();
  • do use:
    • early exits when alternative versions are less clear;
    • infinite loops when you deal with an unpredictable stream of inputs;
    • else after the return statement of then branches which are not guards;
  • give manual hints for:
    • correcting a false positive or negative;
    • adding metadata (it's more versatile than a nested directory structure);
    • resolving duck typing on built-in types, if needed.

Naming things

Capitalization

Paroxython takes advantage of the official PEP 8 naming conventions. In a nutshell:

lowercase_with_underscores
Variables, functions (in the broadest sense, including methods), modules, packages.
CapitalizedWords
Classes, exceptions, type variables.
ALL_CAPITAL_LETTERS_WITH_UNDERSCORES
Constants, defined on a module level.

For example, with a careless style like this:

def ComputeSquare(x): # bad: CamelCase for a function name
    return x * x

FOO = 42
FOO = ComputeSquare(FOO) # bad: redefinition of a constant

… Paroxython will wrongly detect two definitions of constants (var/assignment/explicit/constant) and the instanciation of a class (call/class/constructor). To make these false positives disappear, just write:

def compute_square(x):
    return x * x

foo = 42
foo = compute_square(foo)

Leading and trailing underscores

_single_leading_underscore
Weak β€œinternal use” indicator.
__double_leading_underscore
Mangled non-public class attributes.
single_trailing_underscore_
Conflict-averse objects (avoiding a name collision with a keyword).
__double_leading_and_trailing_underscore__
"Magic" methods or attributes (never invent such names).

Other conventions

When the name of a function returning a value starts with "is_", "are_", "has_", "have_", "can_", "must_" or "needs_", it is tagged as predicate ("def/subroutine/function/predicate"):

def is_odd(n):
    return n % 2

When the name of a function returning nothing starts or ends with "test" or "tests", it is tagged as test ("def/subroutine/procedure/test"):

def test_is_odd():
    assert not is_odd(42)
    assert is_odd(5)

Beware of renaming built-ins

It's rarely a good idea to reuse a built-in function name:

max = 42  # mad: max() is a built-in function

In the same way, if you write classes, consider the names of the methods of the built-in types as taboo. Paroxython relies on them to try guessing the type of objects to which they apply. For instance, in:

def change_separator(s, old, new):
    return new.join(s.split(old))

… nothing indicates that s is a type/sequence/string except from the call of its method split().

Style

Be boring

Paroxython is optimized for the most unremarkable style. In the following snippet:

a += 1
a = a + 1
a = 1 + a

… only the first two lines will be labelled as increment:a. Paroxython doesn't bother to decipher the so-called Yoda style. Other examples:

a = -a # labelled as `negate`
a = -1 * a # false negative
a[-i] # labelled as `negative_index`
a[len(a) - i] # false negative

Be direct

Some static analysis tools have inference capabilities. For instance, Astroid can parse:

a = 1
b = 2
c = a + b

… and infer the value of c.

For now, Paroxython makes no inferences. So, using an intermediate assignment will sometimes be enough to make it miss a feature:

s = "hello, %s" % world # labelled as `string_formatting_operator`
template = "hello, %s"
s = template % world # false negative

A string literal on the left hand side of % is needed to recognize the latter as a string formatting operator. Paroxython does not try to guess the type of template.

There are more interesting cases. For instance, the following code is correctly tagged tail_recursive_function:gcd: a function is tail recursive if and only if all its recursive calls return their results immediately, without any further calculation.

def gcd(a, b):
    if b == 0:
        return a
    return gcd(b, a % b)

Now, suppose you want only one exit point, and come up with the following functionally equivalent code:

def gcd(a, b):
    if b != 0:
        a = gcd(b, a % b)
    return a

The result of the recursive call is now assigned to variable a. There is no further calculation as such, but Paroxython expects you to return directly the result. Its no-inference heuristic is fragile: delaying a key treatment is enough to defeat recognition1.

Importations

So far, the preferred form is:

from a.b.c import d

For instance, writing:

from math import cos, sin
cos(1) - sin(1)

… makes Paroxython able to produce the following (correct) taxa:

  • import/standard/math/cos
  • import/standard/math/sin
  • call/subroutine

But writing:

import math
math.cos(1) - math.sin(1)

… yields subpar results:

  • import/standard/math (less specific)
  • call/subroutine/method (wrong)

Early exits

Paroxython will rightly identify a universal quantification (pattern/elements/satisfy/all) in:

def is_prime(n):
    for candidate in range(2, n):
        if n % candidate == 0:
            return False
    return n > 1

Not so if you try to force the predicate into the single-point-of-exit dogma of Pascal2:

# Once upon a time... in academia
def is_prime(n):
    candidate = 2
    divisor_found = False
    while candidate < n and not divisor_found:
        if n % candidate == 0:
            divisor_found = True
        candidate += 1
    return n > 1 and not divisor_found

Apart from being twice longer, more error-prone and harder to understand, this style defeats Paroxython, and will always do, sorry.

Tip

It's not the 80's anymore3. Don't think you're being rude if you keep things simple by exiting early. To put things in perspective, with a modern language like Python, you'd better think of control flags as code smells.

In any case, take a look at our specifications of iterative patterns. Should there be anything you don't agree with, no problem: just delete the corresponding translations from your copy of taxonomy.tsv.

Infinite loops

Likewise, consider solving the loop-and-a-half problem3 this way:

def input_number_between(prompt, lower_bound, upper_bound):
    while True:
        number = literal_eval(input(prompt))
        if lower_bound <= number <= upper_bound:
            return number
        print(f"Your number should be between {lower_bound} and {upper_bound}!")

… to make it tagged pattern/inputs/find_first_good.

Tip

As a general rule, opt for while True if (and only if) at least one of the three following conditions holds:

  1. it is possible to never reach a terminal state;
  2. it is impossible to infer the next state from the states already observed;
  3. it is possible to observe the same state more than once.

Guards

Numerous guidelines and linters (e.g., LLVM Coding Standards, Pylint, ESLint, TSLint) warn that an else branch following a return statement β€œis not necessary and increases indentation level”. Although this is technically true, our approach is more nuanced. Specifically, we recommend our students to use the β€œno else” style when the condition can be seen as a guard, i.e., a pre-condition to avoid failing, complicating or slowing down the main treatment of the subroutine:

def is_prime(n):
    if n < 2: # avoid calculating the square root of a negative number
        return False
    if n == 2: # avoid returning a wrong result for 2
        return True
    if n % 2 == 0: # avoid wasting time in the loop
        return False
    # this is where the real work begins
    for candidate in range(3, int(sqrt(n) + 1), 2):
        if n % candidate == 0:
            return False
    return True

However, as noted here or here, there are situations where applying the blanket rule would break an inherent symmetry in the treatment:

def max_(a, b):
    if a >= b:
        return a
    else: # removing this branch would hurt readability
        return b

So, both styles have their uses. Consider this as an opportunity to better communicate your intention4. Remember that when you write an if statement ended by a return and not followed by an else branch, Paroxython will automatically try5 to tag it as a guard (flow/conditional/guard).

Manual hints

On a given source code, the tagging algorithm may sometimes result in false positives or false negatives. Moreover, the semantics of some features may be subjective (e.g., meta/topic/fun) or beyond the capabilities of Paroxython (e.g., deciding the relevance of the short_circuit property of a boolean operator). In any case, the user has the possibility to manually tag certain lines of their source code to hint either the presence or absence of a given feature.

Addition

The addition of one or several tags is hinted by a comment starting with # paroxython:.

Hinting with labels

A tag can be a label (of the kind defined, but not necessarily included in spec.md):

if i < len(s) and s[i] == x: # paroxython: short_circuit:And

This is powerful: a label can be derived into several other labels, and then converted into any number of taxa (according to the mapping of taxonomy.tsv). But it asks you to check the taxonomy to see how it will be converted.

Hinting directly with the final taxa

A tag can be a taxon (starting with a non-empty sequence of word characters and a slash):

if i < len(s) and s[i] == x: # paroxython: operator/boolean/and/short_circuit

Hinting with a taxon is easier: you put it directly into the taxonomy. But sometimes you want more.

Deletion

To delete a label, prefix it with a minus symbol. For instance, the following hint requalifies an addition into a string concatenation:

print(a + b) # paroxython: -addition_operator +concatenation_operator

Warning

You cannot directly hint the deletion of a given taxon.

Multiple lines

If a label should span over several lines, it is hinted on the first line with a .​.​. suffix (meaning β€œto be continued”), and on the last line with a ..​.​ prefix (meaning β€œcontinuing”). In the following example, a new label is manually substituted to the calculated label loop:for:

for am in ifera: # paroxython:  -loop:for... +amoeboid_protist...
    catch()
    eat() # paroxython: ...loop:for ...amoeboid_protist

Formatting details

Some tolerances exist for the syntax:

  • + can be omitted.
  • ..​.​ can be written … (HORIZONTAL ELLIPSIS, U+2026).
  • # paroxython: is neither space- nor case-sensitive.

Warning

If you use a code formatter like Black, beware that it may reject a deletion hint after the line which features the label to be deleted. In this case, protect the line with # fmt:off and # fmt:on.

Why would I do that?

On simple programs such as those found in programming education, the results of Paroxython are generally quite satisfactory. But the better the feature labelling, the better the recommendations. You will sometimes, hopefully not too often, feel the need to remove (-) some false positives or correct (+) some false negatives. There are, however, more interesting possibilities.

Adding metadata

Consider the following naive suboptimal solution to the general problem of deduplicating a given sequence:

1   def unique_elements(sequence):
2       result = []
3       for x in sequence:
4           if sequence.count(x) == 1:
5               result.append(x)
6       return result

What if we take the trouble to give Paroxython some hints about what we meant by β€œnaive”, β€œsuboptimal” and β€œgeneral”?

1   def unique_elements(sequence):
2       result = []
3       for x in sequence: # paroxython: complexity:O(n) complexity:O(n^2)...
4           if sequence.count(x) == 1: # paroxython: complexity:O(n)
5               result.append(x) # paroxython: ...complexity:O(n^2)
6       return result
7   # paroxython: meta/level/naive complexity:suboptimal
8   # paroxython: topic:general
Taxon Lines
meta/complexity/O(n) 3, 4
meta/complexity/O(n^2) 3-5
meta/complexity/suboptimal 1-8
meta/level/naive 1-8
meta/topic/general 1-8

Wow, this is handy. Let's break these comments down:

  • We have pinpoint the lines featuring a linear operation: for and count(). The latter is not obvious for most students, and yes, that's the kind of thing you're going to have to repeat over and over again, unless you give up with high level languages like Python, and go back to C, assembler, Turing machines, Brainfuck, whatever.
  • Nesting these two operations results in a quadratic complexity, which we have hinted by opening and closing a tag on the lines involved (3-5).
  • As stated above, this approach is generally known as naive. No translation for a label level:(.+) is provided in the default taxonomy, so you hint directly at the desired taxon.
  • Tagging the program as suboptimal, or as a counter-example, can be useful: when you are setting up an exam, you don't need to be recommended bad programs, do you? With this kind of metadata, you can easily exclude all of them.
  • Tagging your programs by topic is an alternative to distributing them in different folders, and is arguably better: when a program belongs to several topics, you don't have to bother with symlinks; and you are not stuck when you change your mind about the categorization (when, suddenly, you decide to classify your programs by technique: like greedy, dynamic programming, divide and conquer, etc.). By the way, remember you can always redirect the recommendations to stdout, and get a simple list of files. See paroxython.cli_recommend for an example of how to copy them in another directory.

Hunting ducks

Python famously thinks that:

If it looks like a duck, swims like a duck, and quacks like a duck, then it is a duck.

This is called duck typing, and the reason why unique_elements() (above) would be happy with either a list, a tuple or a string: all of them sport the same methods, specifically count() (explicit) and __iter__() (implicit).

A duck typing.
A duck typing on a typewriter.
Unknown artist, courtesy of StackOverflow.

For us, this cute duck typing comes with a challenge. Although this may change in the future, for now Paroxython ignores your type hints. As already explained, it just relies on the names of the methods and attributes to guess the type of the built-in objects. For instance, the default taxonomy has these two lines:

type/sequence/list            member_call_method:(append|extend|insert|reverse|sort)
call/subroutine/method/sequence/list/\1  member_call_method:(append|extend|insert|reverse|sort)

This is why the expression foo.append(bar) will be tagged member_call_method:append (label), and then call/subroutine/method/sequence/list/append and type/sequence/list (taxa). Here, this technique increases our chances of finding programs that feature lists, but it does not achieve the same accuracy for foo.count(bar): indeed, count() is shared by all sequential types. In the taxonomy, this comes down to:

type/sequence                   member_call_method:(count|index)
call/subroutine/method/sequence_duck/\1 member_call_method:(count|index)

Did you spot the duck in the second row? It's there to tell you: β€œQuack! that's the best I can say, but in case this sequence can't actually be anything other than a list, just let me know.”

Well, ok. How to do this?

In fact, the taxonomy has two lines for that:

type/sequence/list              member_call_method:list:(count|index|pop|...)
call/subroutine/method/sequence/list/\1 member_call_method:list:(count|index|pop|...)

This means that you can hint at every ambiguous list method (count(), etc.) by deleting the original label and adding a new one with an inserted ":list":

a.count(b)  # paroxython: -member_call_method:count +member_call_method:list:count

Now, the generated taxa will be: type/sequence/list and call/subroutine/method/sequence/list/count6. As for the duck, it is no more. It has ceased to be. It is expired and gone to meet its maker. It has run down the curtain and joined the choir invisible. This is an ex-duck.

Admittedly, all of this may be overkill, since a program featuring a list would most of the time feature it in several places, most of them without ambiguity. After all, generally speaking, you just want Paroxython to help you find programs in your repository, but don't need it to find features in a given program (you know better).

But if you feel like it, just fire up this 1 weird pipeline:

{
    [
        "operation": "include",
        "data": [".+\bduck\b"],
    ]
}

… and happy duck hunting!

Taxonomy

Structure

In Paroxython, the default taxonomy is a forest: a handful of separate trees with flow, operator, meta, type, etc. as their roots. We call taxon7 a path from a root node to a leaf node. Thus:

type/number/integer/literal

… describes, you know… this kind of literal which is a kind of integer which is a kind of number which is a kind of type (as an example, think of the answer to the ultimate question of life, the universe, and everything).

In the visualization below, we have introduced the pseudo-root β€œπŸβ€ to gather all the trees into one:

This comes8 straight from the tagging of the public repository The Algorithms - Python. The size of a node is relative to the number of times the corresponding taxon's prefix appears in these programs.

Hover over the nodes to see these numbers, or click them to navigate the tree. By the way, you may click on type to better follow the subsequent explanations.

Mapping the labels onto taxa

Paroxython does not directly grow these forest of taxa. Remember, a program is first tagged with labels (as specified in spec.md). Those are then translated into taxa by a purely morphological operation (search and replace), without any more reference to the original source code.

1-1 mappings

Suppose a program has been tagged with the labels:

  • literal:True (the constant True).
  • literal:False (the constant False).
  • external_free_call:bool (a call to the built-in function bool()).

The following conversion table (extracted from taxonomy.tsv) will then give the taxa seen above in the subtree type/boolean:

Taxa (replacement patterns) Labels (search patterns)
type/boolean/literal/True literal:True
type/boolean/literal/False literal:False
type/boolean external_free_call:bool

Note that, somewhat counterintuitively, the conversion goes from right to left. Each line consists of a taxon tab-separated from a label. It is interpreted as β€œwhen a program features this label, create that taxon with the same spanning lines”.

All of these rows define 1-1 mappings: each one translates one label into one taxon.

N-1 mappings

The previous table extract represents the simplest case, where both taxon and label patterns are literal. Below, the second label pattern is a non-literal regular expression:

Taxa (replacement patterns) Labels (search patterns)
type/sequence/list external_free_call:list
type/sequence/list member_call_method:(append|extend|insert|reverse|sort)

It defines a 5-1 mapping, which converts five possible labels (member_call_method:append, member_call_method:extend, and so on) into a single taxon (the metacharacter "|" meaning β€œor”). These two rows thus constitute a 6-1 mapping.

1-N mapping

When a source code is, say, 42 lines long, it is tagged with the label "whole_span:42". This is an example of a single label which produces two taxa: "meta/program" and "meta/count/program/sloc/42". Both span the whole program and, although it is not obvious, both have their uses:

  • The first one is common to all programs, and provides an invariable access key to an all-encompassing span. In a command pipeline, it can be used to express the absence of a taxon.
  • The second one has a variable part, and can be used to filter programs by size (for example, in paroxython.recommend_programs, the pattern "meta/count/program/sloc/[1-4]?[0-9]" will be used to filter out the programs that have 50 lines or more).

This conversions are triggered by the following rows in the default taxonomy:

Taxa (replacement patterns) Labels (search patterns)
meta/program whole_span:.+
meta/count/program/sloc/\1 whole_span:(.+)

On the first row, the sequence of metacharacters ".+" means: β€œeat all you can up to the end of the line.” On the second row, the added parentheses also captures these characters: they are restituted in the replacement pattern by "\1", which denotes a backreference to the first captured group.

Another example could be:

Taxa (replacement patterns) Labels (search patterns)
type/sequence/list member_call_method:(append|extend|insert|reverse|sort)
call/subroutine/method/sequence/list/\1 member_call_method:(append|extend|insert|reverse|sort)

… which means: when you encounter a label member_call_method:sort (for instance), that means the program features a list and a call to the sort() method.

1-0 mapping

At first glance, some of the labels generated during the first tagging step seem redundant, e.g. "binary_operator:Add" and "addition_operator". In fact, the former was used to calculate the latter. In spec.md, the definition of "addition_operator" is introduced by:

An addition operator is a binary operator Add which has not be classified as a concatenation operator.

This is a good example of a label for internal use only. In the conversion step, it will be simply ignored (i.e., it has no entry in the taxonomy).

Back to the roots

Functional inspirations: var, def, call and type

What are the basic concepts on which computing can be built? As you know, Alonso Church answered this very question as early as 1936 (years before the computers were even invented!). The three building blocks of his lambda calculus, namely the concepts of variable, abstraction and application, provide us our first three taxonomic roots, which we will denote respectively by var, def and call. To these we can naturally add that of type, already shown above.

All of these could be enough to describe pure functional programs, at least theoretically. However:

  1. Python is no Haskell. It is a multi paradigm language, with a strong emphasis on imperative programming. And its zen famously holds that β€œpracticality beats purity”.
  2. Here, we are not interested in teaching generalized computability. Pedagogical considerations take precedence over a crippling respect for mathematical abstractions. For instance, we don't need to explain thoroughly the concepts of literal, string, type, built-in function and call for making our young students feel they understand:
print("hello, world")

For these reasons, choosing to root our taxonomy in the four basic notions of the typed lambda calculus should be seen more as a tribute than a formal commitment. More specifically:

  • var will essentially bring together everything relating to the concept of assignment and scoping;
  • def will accommodate not only lambda functions, but named ones, methods, generators and even classes. Note that this does not include variable definitions.
  • call will cover any call to anything with a __call__() method, which Python calls a callable (sorry);
  • finally, type will welcome all types, no purity required.

Imperative needs: flow

The imperative nature of Python demands that we introduce the concept of control flow, under which we put the loops, the conditionals, and some other animals10.

For your convenience: operator, condition, subscript

Again, those are mainly practical choices. After all, an operator is nothing more than an unassuming function call with a funny name and a funny syntax. And a condition11, a combination of function calls which happens to evaluate to a particular type. As for the creation of the root subscript, it comes from the observation that an lot of exciting programs can be written with sequences before venturing out to direct access (that is, as long as we treat Python lists as lists and not arrays). Not that in our default taxonomy, subscript includes slicing and dictionary access too.

Importations

Previously classified under def, since version 0.6.0, the import statement has now its own root in the taxonomy.

Zooming out: pattern

Now this is arguably the most interesting feature to tag in a beginner-level program. Under pattern, you will find numerous variants of the invaluable accumulation pattern (counting, summing, filtering, finding the β€œbest” element, etc.), but also some early-exit patterns (testing for an universal or existential property, finding the first β€œgood” element, etc.), whether by traversing a sequence or evolving a state. A (currently) small number of common expression patterns are also available. However, the common assignment idioms (such as incrementation, swap, slide, conditional assignment) are classified under var/assignment/explicit.

Although the relative size of pattern is the smallest of the taxonomy, note that it is almost that of meta/program (which has one occurrence per program): in other words, almost all programs feature such a pattern (which spans several lines).

The loop patterns constitute an aspect of programming which is not always taught in a conscious and systematic way, and to which Paroxython intends to draw your attention.

Warning

For priority reasons, Paroxython searches for these loop patterns in β€œstatement” loops only, not in β€œcomprehension” loops. This may change in a future version.

Tagging with style

Paradigms

Python allows you to mix freely multiple paradigms. However, for Paroxython, every program will fall into exactly one of the following four categories, always associated to the whole span:

  1. style/object_oriented if it features at least one class definition;
  2. style/functional if it is not object oriented, and features at least one function returning something, but no loop and no assignment;
  3. style/procedural if it is neither object oriented or functional, and features at least one subroutine definition;
  4. style/imperative if it features no subroutine or class definition.

Additionally, style/imperative/flat is the no-indent version of the latter (no loop, conditional, etc.).

Functional traits are also tagged. The corresponding taxa start with: style/functional_trait/, and are suffixed by lambda, higher_order, pure_function, map, filter, reduce, etc. Unlike style/functional, which spans the entire program, these taxa are located on certain lines. They can appear in a program of any (all-encompassing) style.

Independently of the 4-partition, style/one_liner denotes any program whose exactly one SLOC is neither a definition header, an importation or an assertion. For instance, the following program is considered a one-liner:

from sum_proper_divisors_1 import sum_proper_divisors

def abundant_numbers_below(bound):
    # An abundant number is a number for which the sum of its proper divisors
    # is greater than the number itself (Wikipedia).
    return [n for n in range(1, bound) if sum_proper_divisors(n) > n]

Flawed styles

Although Paroxython is no linter, it may tag some unpythonic (i.e., non-idiomatic) or naive patterns, frequent in the beginners' code. For instance:

aux = a
a = b
b = aux

… is tagged style/unpythonic/swap, and:

if condition:
    return False
else:
    return True

is tagged style/naive/return_condition. Search _unpythonic and _naive in spec.md for the current list (which could of course be extended ad libitum).

Going meta

Paroxython will store inside the meta tree some program metadata, such as the number of lines. Some children, such as topic, technique, complexity, are already provided, which you can fill in by adding a manual hint in the source code of the programs.

Modifying the taxonomy

The definition of a taxonomy is, at least partly, a matter of pedagogical choices. You are encouraged to duplicate the default taxonomy, and replace or delete those taxa that do not fit your course purpose or logic.

Copy the default taxonomy.tsv at the same level as the src folder that contains your Python programs:

programming_101/
β”œβ”€β”€ taxonomy.tsv
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ hello_world.py
β”‚   β”œβ”€β”€ gob's_program.py
|   β”œβ”€β”€ ...

This way, omitting the option --taxonomy like in the command below will use your copy instead of the original.

paroxython collect programming_101/src

You can now freely experiment on this copy.

Pipeline tutorial

How to get recommendations? This tutorial walks you through a series of little β€œteacher stories” to introduce the pipeline of commands used by Paroxython's recommendation system.

You β€œexecute” this pipeline on command line with:

paroxython recommend -p 'path/to/your/pipe.py' 'path/to/your/programs/db.json'

Covering your base(s)

The simplest pipeline has no command: it just lists the programs of your database, without any filter. This could produce a rather long document, since each source code is printed, along with a list of each and every taxon it implements. Here is the content of your pipe.py:

[]

Tip

The point is that the resulting list is sorted by increasing learning cost. This should give you a rough idea of the pedagogical angle of attack to adopt in your first classes.

Imparting knowledge

Tracking the progress of your course

Suppose that, in your first session, you have introduced your students to the hello_world.py, wheat_and_chessboard.py and euler_005_smallest_multiple.py programs. Not only Paroxython should no longer recommend these programs, but the cost of learning the associated concepts should be considered as zero when they are encountered again in the future.

[
  {
    "operation": "impart",
    "data": [
      "hello_world.py",
      "wheat_and_chessboard.py",
      "euler_005_smallest_multiple.py",
    ],
  },
]

Coming after another course

Suppose instead that you intervene after an introductory course given by a colleague. She gives you a folder (named "CS_101"), which contains the programs she has studied with her class. Since you are secretly in love, you assume, somewhat foolishly, that the concepts they implement are mastered by your new students.

[
  {
    "operation": "impart",
    "data": "find CS_101 -path '*.py'",
  },
]

Tip

As you can see, rather than maintaining a list of programs or concepts in the "data" field, you may provide a string. Paroxython will interpret it as a shell command, and expect it to print on stdout the required list of items (programs or taxa), one per line.

Blacklisting programs

You want to filter out all programs reserved for an exam, or too complex, or not pedagogically interesting enough, or that could get you kicked out of your college, etc.

[
  {
    "operation": "exclude",
    "data": [
      "fizzbuzz.py",
      "alpha_go.py",
      "hello_world_of_pain.py",
      "gob_s_program.py",
    ],
  },
]

Concepts to be introduced later (or never)

For the time being, you don't want to be recommended any program requiring the concepts of recursivity (def/subroutine/recursive), dictionary (type/non_sequence/dictionary) or set (type/non_sequence/set).

[
  {
    "operation": "exclude",
    "data": [
      "def/subroutine/recursive",
      "type/non_sequence",
    ],
  },
]

Note

Paroxython relies on the last three characters of a "data" item to decide whether it is a Python program (ending with ".py") or a taxon (like here).

Tip

Since the two latter taxa share a common prefix and Python doesn't provide other non-sequence type, it is enough to exclude all taxa starting with type/non_sequence.

Concepts to be introduced now

Your next class will be devoted to conditional constructs. Thus, you need to go through programs featuring taxa prefixed by flow/conditional or even, why not, some conditional expressions (operator/ternary). The suggested programs do not necessarily have to implement both concepts at the same time, but at least one. Any other program will be rejected.

[
  {
    "operation": "include",
    "data": [
      "flow/conditional",
      "operator/ternary",
    ],
  },
]

Preparing an assignment

Basically, you provide Paroxython with a list of programs already studied in class, and it suggests new programs that implement only old concepts. This requires the exclusion of all programs featuring at least one concept that has not been seen in any of the programs already seen.

It may sound a little complicated, because it is. But don't panic. Remember that Paroxython always orders its recommendations by increasing cost. Thus, all you have to do is ask it to list the programs that have not been introduced yet. The programs you want to choose among will appear at the top of the results, under the zero-learning cost section. Moreover, inside each section, they will be sorted by increasing size (SLOC), which can be a reasonable proxy for their difficulty.

But in the end, you, the teacher, are the judge. Paroxython is not going to set up a test for you, let alone grade the answers. It is just here to remind you of some possibilities you might not have thought of at the right time, and to give you some confidence that the exercises require no concept you have not pre-introduced in class, which later on might save you some awkward conversations with your dear students.

Since this is the last filter in our tutorial, let's summarize what we've seen by chaining several commands together:

  1. Impart the previous knowledge by extracting all the programs listed in the "timeline.txt" file that you update after each session. You can adapt the script parse_syllabus.py to your own needs. If you wish, you can use the base_path variable, which represents the parent of the directory containing the programs of your personal database.
  2. Exclude some irrelevant programs.
  3. Include the concepts that you want to test during your exam.
[
  {
    "operation": "impart",
    "data": "python helpers/parse_syllabus.py {base_path}/timeline.txt",
  },
  {
    "operation": "exclude",
    "data": [
      "foo.py",
      "bar.py",
      # "buzz.py",
    ],
  },
  {
    "operation": "include",
    "data": [
      "pattern/elements/accumulate",
      "topic/game",
      "type/sequence/list",
    ],
  },
]

Tip

This command pipeline is not a JSON file, but a Python program, or more restrictively a Python expression. As such, it offers several amenities you surely know and love: comments, trailing commas, r-strings (no need to double-escape backslashes in regexes!), etc. For security purposes, this file is not imported, but read as a text and evaluated by ast.literal_eval().

This is the end of our pipeline tutorial. If you dare, read on for more advanced features (regular expressions, span algebra, semantic triples, negations).

Pipeline documentation

This part goes deeper into the pipeline by describing in details how the recommendation system works. It assumes you've already read the pipeline tutorial. First of all, let us recall the example of pipeline we gave at the end.

>>> [
...   {
...     "operation": "impart",
...     "data": "python helpers/parse_syllabus.py {base_path}/timeline.txt",
...   },
...   {
...     "operation": "exclude",
...     "data": [
...       "foo.py",
...       "bar.py",
...       # "buzz.py",
...     ],
...   },
...   {
...     "operation": "include",
...     "data": [
...       "pattern/elements/accumulate",
...       "topic/game",
...       "type/sequence/list",
...     ],
...   },
... ]

Regular expression patterns

The first thing to clarify is that the strings which constitute the data lists are not simple names, but β€œdescriptions” of names. These descriptions, or patterns, are written in the language of regular expressions, and more precisely in the dialect implemented by the third-party module regex12.

Tip

If you don't know how to write regular expressions, the good news is that you don't have to worry. Unless they contain certain special characters, they are self-descriptive. That means, for instance, that the string "type/sequence/list" happens to be a regular expression which precisely describes (or ”matches”) the string "type/sequence/list". This is the case for all strings that are only made up of so-called literal characters.

As a matter of fact, all the characters that can appear in a taxon are literal. We advise you to avoid non-literal characters in your program paths as well. What are they? Essentially: .^$*+?{}\[]. Depending on your operating system, some are already illegal in filenames anyway. To tell the truth, the first one (the dot) is an integral part of most filenames, as an extension separator. However, it turns out that it basically means β€œany character”, which of course includes the dot itself. So, the regular expression "hello_world.py" matches the filename "hello_world.py", and the fact that it matches "hello_world/py" and "hello_worldapy" shouldn't be a real problem. In any case, you can always prefix (or escape) a non-literal character with a backslash "\" to make it literal (here, "hello_world\.py"): that's all you'll need to know about regular expressions.

The second thing to clarify is that a given pattern will match any string starting with it. Thus, "type/sequence/list" will match not only "type/sequence/list", but "type/sequence/list/empty" and "type/sequence/list/supercalifragilisticexpialidocious" too. The same applies to program paths. For example, in our personal database, all programs from Project Euler start with "euler_". The pattern "euler_" will then match all of them.

In terms of regular expressions, it means that, under the hood, Paroxython invokes match() instead of search(). In the rare cases where you want to restrict the whole matching to the pattern itself, simply suffix it with a dollar sign, i.e., "type/sequence/list$".

Execution of a pipeline

A pipeline is a set (or list, but the order is not important) of commands which apply sequentially to a database of program tags. It returns:

  • a list of recommended programs;
  • an estimation of the learning cost of each concept (i.e., taxon) they implement.

The set of commands constitutes a pipeline that evolves both:

  • a selection of programs (initially all those in the tag database);
  • a body of imparted knowledge (initially empty).

Note

During this process, the program selection can never increase, and the imparted knowledge can never decrease.

A command consists of:

  • an operation among "include", "exclude", "impart" and "hide", optionally suffixed by the quantifier (" all");
  • the data to which it applies. This data is (or can be interpreted to produce) a list of criteria, each criterion being:
    • either a program pattern,
    • or a taxon pattern,
    • or a relationship between two taxon patterns (more on this in the Semantic triples section).

Combining the operation with a criterion is done in four stages:

  1. An auxiliary set of programs is calculated.
  2. The set of recommended programs (the selection) is updated.
  3. A new set of taxa may be β€œlearned”, i.e., added to the body of imparted knowledge.
  4. The set of taxa and programs to hide in the final report may be updated.

Note that each stage depends on both the operation and the nature of the criterion to which it applies. Let us detail every possible case:

"include" command

Its purpose is to keep only those selected programs that satisfy at least one given criterion.

  1. The auxiliary set contains any program such that:
    • either its name matches the given pattern,
    • or its source code:
      • either features at least one taxon matching the given pattern,
      • or satisfies the given relationship.
  2. The selection is then restricted to its intersection with the auxiliary set.
  3. The imparted knowledge is unchanged.

"include all" variant

Its purpose is to keep only those selected programs that satisfy all given criteria.

{
  "operation": "include all",
  "data": [
    criterion_1,
    criterion_2,
    ...,
    criterion_n
  ]
}

… is equivalent to:

  {"operation": "include", "data": [criterion_1]},
  {"operation": "include", "data": [criterion_2]},
  ...,
  {"operation": "include", "data": [criterion_n]}

"exclude" command

Its purpose is to filter out of the selection the programs which satisfy at least one given criterion.

  1. The same auxiliary set of programs as above ("include") is calculated, but further extended to the programs which import them (either directly or indirectly). Indeed, when the teacher wants to exclude a certain program, any program that requires it should be excluded as well.
  2. The selection is then restricted to its difference with the auxiliary set.
  3. The imparted knowledge is unchanged.

"exclude all" variant

Unlike "include all", this variant actually expands the semantics of the system.

{
  "operation": "exclude all",
  "data": [
    criterion_1,
    criterion_2,
    ...,
    criterion_n
  ]
}

… excludes the programs satisfying all the criteria. This cannot be transformed into a sequence of "exclude" commands, which would exclude the programs satisfying any of the criteria.

"impart" command

Its main purpose is to track the acquisition of new knowledge. However, if this knowledge is provided by a given set of programs, there is no more need to keep them in the selection of recommended programs.

  1. The auxiliary set contains all programs whose name matches at least one of the given patterns (relationships are not supported), plus those which import (either directly or indirectly) at least one of them. In other words, it is the same as above ("exclude"), but only in the case where the criterion is a program pattern. In all other cases, this auxiliary set will be empty.
  2. The selection is then restricted to its difference with the auxiliary set (same as above for "exclude").
  3. There are three cases:
    1. If the criterion is a program pattern, all the taxa featured in a program whose name matches this pattern are added to the body of imparted knowledge.
    2. If the criterion is a taxon pattern, all the taxa matching this pattern are added to the body of imparted knowledge.
    3. Otherwise, the body of imparted knowledge is unchanged.

The "impart all" variant is not supported, and treated as "impart".

"hide" command

This is the simplest command of all. It merely accumulates the programs or the taxa matching the given patterns (relationships are not supported), without other effect on the filter. In the final stage, the accumulated programs or taxa will be filtered out of report.

Notes.

  • Excluding a program is equivalent to hiding it plus every program importing it.
  • The learning costs are not affected whatsoever by the hiding of a taxon: its individual cost continues to contribute towards the total cost of a program.

The "hide all" variant is not supported, and treated as "hide".

Semantic triples

Span algebra

Being given a program p of n lines, a couple (i_1, i_2) of line numbers such that 1 \leq i_1 \leq i_2 \leq n represents a span of p. When Paroxython finds a taxon t in p, it doesn't just say whether p features t or not: it returns the list of spans where t occurs.

It is sometimes interesting to know how the spans of two taxa of the same program are located relatively to each other.

  • For instance, a print() statement whose span is inside the span of a function indicates that this function has a side-effect.
  • A less trivial application stems from the fact that our taxonomy is multi-dimensional: each algorithmic feature can be associated with several taxa at the same time, characterizing it it along several dimensions. For instance, the taxa flow/loop/for and flow/loop/while describe the category of a loop; whereas the taxa flow/loop/exit/early and flow/loop/exit/late describe its exit behavior. These dimensions are independent and, although it is entirely possible to do so, our default taxonomy does not list the results of the cross-product (i.e., flow/loop/for/exit/early, flow/loop/for/exit/late, flow/loop/while/exit/early and flow/loop/while/exit/late). Querying an hypothetical taxon flow/loop/for/exit/early (or flow/loop/exit/early/for, by the way) comes down to querying the taxa flow/loop/for spanning the same lines as flow/loop/exit/early.

To express a relation \mathfrak{R} between the spans of two taxa, we start from the algebra devised in 1983 by James F. Allen in his seminal paper (PDF) about temporal intervals13. The so-called Allen's interval algebra defines 13 basic operators which capture all possible relations between two intervals X=(x_1, x_2) and Y=(y_1, y_2). The table below uses his terminology:

Examples \mathfrak{R} on endpoints X \mathbin{\mathfrak{R}} Y Y \mathbin{\mathfrak{R}} X Keys
....XXXXX........  
....YYYYY........   x_1 = y_1 \leq x_2 = y_2   X equals Y   Y equals X   x=y≀x=y
....YYYYYYYYY....   x_1 = y_1 \leq x_2 \leq y_2   X starts Y   Y started by X   x=y≀x≀y
..YYYYYYYYY......   y_1 \leq x_1 \leq x_2 \leq y_2   X during Y   Y contains X   y≀x≀x≀y
.YYYYYYYY........   y_1 \leq x_1 \leq x_2 = y_2   X finishes Y   Y finished by X   y≀x≀x=y
....|...|..YYYYY.   x_1 \leq x_2 \leq y_1 \leq y_2   X before Y   Y after X   x≀x≀y≀y
....|...|YYYYY...   x_1 \leq x_2 = y_1 \leq y_2   X meets Y   Y met by X   x≀x=y≀y
....|..YYYYY.....   x_1 \leq y_1 \leq x_2 \leq y_2   X overlaps Y   Y overlapped by X   x≀y≀x≀y

These well-known operators apply quite nicely to our spans, with the following adjustments:

Synonyms. The 13 operator names, whose some don't make much sense in a non-temporal context, are extended with a number of synonyms, such as "inside" for "during", and "equal" or "is" for "equals". The current list can be found here.

Non-strict inequalities. In the above table, contrary to the convention on time intervals, we consider the inequalities to be large. In a program, most features span exactly one (logical) line (i.e., x_1=x_2): this is simply not possible with the original relations on endpoints. Besides, when we look for a function which contains a loop, we are generally not interested in whether the loop finishes on the last line. Blind adherence to strictness convention would force us to express our query as a disjunction of these two relations for the following program to be selected:

def first_missing_non_negative(l):
    elements = set(l)
    for natural in range(len(elements) + 1):
        if natural not in elements:
            return natural

Strict inequalities. In addition to the existing relations, all combinations of strict and non-strict conditions are generated, for instance x_1 < x_2 < y_1 \leq y_2. That adds up to 162 in total14.

Keys. As we use many more operators than Allen, giving an English name to every single one is not an option. Paroxython actually identifies them by a unique key (see the last column of the table above) calculated like this:

  1. Start from the chain of equalities and inequalities that defines the relations between the two spans' endpoints: y_1 < x_1 = y_2 \leq x_2.
  2. Drop the indices (since, for a given x or y, indice 1 always comes before indice 2, no information is lost): y < x = y \leq x.
  3. Rewrite the formula using only the five characters "=", "<", "≀", "x" and "y": y<x=y≀x.

Key normalization. Paroxython tries (cf. normalize_predicate()) to be tolerant with the user input. The verb "is" is ignored, except when it constitutes the whole string (thus, "is after" comes down to "after"). When an operator is not an Allen's name or a predefined synonym, the following normalization process is applied:

  1. Replace any <= by ≀.
  2. Replace any == by =.
  3. Strip all characters distinct from "=", "<", "≀", "x" and "y".
  4. If the string is "x=y" or "y=x", replace it by "x=y≀x=y".
  5. If the string contains only one x (resp. y), expand it into x≀x (resp. y≀y).

This allows the user to be understood when entering formulas like x == y, x <= y or y1 < x1 == x2 <= y2.

Positive semantic triples

A pipeline command can apply not only to patterns of programs and taxa, but also to relationships between the spans of two taxon patterns. Being given an operator \mathfrak{R} and two taxon patterns t_1 and t_2, the statement t_1 \mathbin{\mathfrak{R}} t_2 will be codified in the form of the subject–predicate–object expression (t_1, \mathfrak{R}, t_2), which can be entered as a mere Python tuple (t1, R, t2).

The simplest semantic triple is:

("meta/program", "contains", taxon_pattern)

It matches all programs where the span of the taxon "meta/program" includes the span of a given taxon_pattern. Since, by definition, every program features one taxon "meta/program", whose span encompasses the whole program, this is strictly equivalent to:

taxon_pattern

And now for something completely different: the following pipeline command will only keep programs featuring a conditional inside a loop or (inclusive) ended by a return statement (the function first_missing_non_negative() given above satisfies these two criteria).

  {
    "operation": "include",
    "data": [
      ("flow/loop", "contains", "flow/conditional"),
      ("flow/conditional", "finished by", "def/subroutine/return"),
    ]
  },

It is currently not possible to chain several operators with shared operands, for example to keep only the programs that feature a conditional inside a loop and ended by a return statement: the quintuple ("flow/loop", "contains", "flow/conditional", "finished by", "def/subroutine/return") would raise an error. As it stands, the best we can do is to chain the commands themselves:

  {
    "operation": "include",
    "data": [
      ("flow/loop", "contains", "flow/conditional"),
    ]
  },
  {
    "operation": "include",
    "data": [
      ("flow/conditional", "finished by", "def/subroutine/return"),
    ]
  },
  {
    "operation": "include",
    "data": [
      ("def/subroutine/return", "contains", "flow/loop"),
    ]
  },

This keeps only the programs that feature a conditional inside a loop and a conditional ended by a return statement and a return statement inside a loop. The result is a subset of the previous one, but may still include programs where the two loops, conditionals or return statements are distinct.

Negative semantic triples

It is possible to negate the predicate of a semantic triple, either by prefixing it by an exclamation mark (e.g., "!x≀y≀y≀x", or "!contains"), or by adding the word not (e.g, "not x≀y≀y≀x", "not contains", or "contains not"), whether it's grammatically correct or not.

Expressing the absence of a taxon

Here again, the simplest negated semantic triple is:

("meta/program", "not contains", taxon_pattern)

On a set of independent programs (i.e., not importing each other), there is a strict equivalence between:

  {
    "operation": "include",
    "data": [
      ("meta/program", "not contains", taxon_pattern)
    ]
  },

and:

  {
    "operation": "exclude",
    "data": [
      taxon_pattern
    ]
  },

And, conversely, between:

  {
    "operation": "include",
    "data": [
      taxon_pattern
    ]
  },

and:

  {
    "operation": "exclude",
    "data": [
      ("meta/program", "not contains", taxon_pattern)
    ]
  },

This property may be used to select the programs that include any subset of a given set of taxa. This can be done in a systematic way by converting15 the corresponding logical formula in conjunctive normal form, and transforming each clause into a separate "include" command.

For instance, consider the two taxa "flow/conditional" and "flow/loop". To keep only the programs that feature either a conditional (denoted by the literal a) or a loop (denoted by the literal b), but not both of them, express first the formula in CNF: (a \lor b) \land (\neg a \lor \neg b), then translate it into:

  {
    "operation": "include",
    "data": [
      "flow/conditional",
      "flow/loop",
    ]
  }, # keep only the programs that feature either a conditional or a loop
  {
    "operation": "include",
    "data": [
      ("meta/program", "not contains", "flow/conditional"),
      ("meta/program", "not contains", "flow/loop"),
    ]
  }, # among them, keep only those that feature no conditional or no loop

For two taxa, there are a total of 16 combinations, listed and tested here.

However, on a set on interdependent programs, "exclude" is not the opposite of "include" anymore: as already explained, excluding a program excludes also the programs which import it. Consequently, including the programs that do not feature a given taxon will filter out those which feature it directly; but excluding the programs that do feature this taxon will filter out those which feature it either directly or indirectly (by importation).

General semantics of the negation

The semantic triple:

(taxon_pattern_1, not predicate, taxon_pattern_2)

… describes the set of programs which feature at least one taxon matching taxon_pattern_1, and such that, for any span s_1 of such a taxon, there exists no span s_2 of a taxon matching taxon_pattern_2 for which predicate(s_1, s_2) is verified.

Warning

Note that the result includes also all programs featuring at least one taxon matching taxon_pattern_1, but no taxon matching taxon_pattern_2. For instance, suppose the original (negative) triple is:

("def", "not contains", "var/assignment/explicit")

The function will return the (disjoint) union of these two sets:

  1. The programs featuring at least one subroutine, but no assignment at all.
  2. The programs featuring at least one subroutine, at least one assignment, but no assignment inside a subroutine.

For example, if the original (negative) predicate is "taxon_pattern_1 not inside taxon_pattern_2", any program consisting in:

  • "taxon_2{taxon_1}"16 is rejected;
  • "taxon_1" is accepted (although there is no taxon_2);
  • "taxon_2" is rejected (no taxon_1);
  • "taxon_1{taxon_2}" is accepted;
  • "taxon_1 taxon_2{taxon_1}" is accepted (there exists a couple (s_1, s_2) such that taxon_1 is not inside taxon_2);
  • "taxon_1 taxon_2{taxon_1} taxon_2{}" is accepted.

If the predicate is "taxon_pattern_2 not contains taxon_pattern_1" (sic), any program consisting in:

  • "taxon_2{taxon_1}" is rejected;
  • "taxon_1" is rejected (no taxon_2);
  • "taxon_2" is accepted (although there is no taxon_1);
  • "taxon_1{taxon_2}" is accepted;
  • "taxon_1 taxon_2{taxon_1}" is accepted (there exists a couple (s_1, s_2) such that taxon_2 does not contain taxon_1);
  • "taxon_1 taxon_2{taxon_1} taxon_2{}" is accepted.

Note that, due to the rule explained in the warning above, "taxon_pattern_1 not inside taxon_pattern_2" is not equivalent to "taxon_pattern_2 not contains taxon_pattern_1" (differences in bold). However, if taxon_pattern_2 is "meta/program", which is featured by all programs, these two forms become equivalent again.

A practical example

Suppose you want your course on the assignment statements to include the concept of parallel assignment, for example to introduce swapping without an auxiliary variable:

(a, b) = (b, a)

In the taxonomy, this concept is listed as a special case of both tuple ("type/sequence/tuple") and variable assignment ("var/assignment/explicit/parallel").

However, you want to introduce officially the abstract data type β€œtuple” only much later in your course. You therefore wish to exclude from the recommendations any program implementing a tuple, unless it is part of a parallel assignment. The following command will make the trick:

  {
    "operation": "exclude",
    "data": [
      ("type/sequence/tuple", "is not", "var/assignment/explicit/parallel")
    ]
  },

The triple describes the set of programs which feature at least one tuple, and such that there is no tuple's span coinciding with a parallel assignment's span. In other words, the set of programs which feature at least one β€œnon parallel” tuple. When we exclude these programs, those which remain either feature no tuple, or only the ones involved in a parallel assignment.

The other combinations are listed and tested here.

Glossary

Inside a definition, the terms which have their own entry in the glossary are italicized.

AST (flat β€”)
The dump of an Abstract Syntax Tree as a sequence of text lines, each one consisting of the whole path from the tree root to the current node, with some tweaks.
Command
A pipeline operation, along with its data.
Criterion (plur. criteria)
Either a taxon pattern, a program pattern or a semantic triple.
Data
In a restricted sense, the second member of a command.
  • If it is a string, it is interpreted as a shell command printing a list of program or taxon patterns.
  • If it is a list, each element is interpreted as a criterion.
Hint (manual β€”)
A comment that you may add in a source, either to fix a mislabelling problem, or to specify some metadata. It starts with "# paroxython: and can schedule either a label or a taxon for addition (by default) or deletion. Example:
print(a + b) # paroxython: -addition_operator +concatenation_operator
Label
Name of a feature, as defined in spec.md. Examples:
  • simple: swap, if_else_branch, verbose_conditional_assignment;
  • with one variable suffix: loop_with_return:for, import_module:foo, member_call_method:append;
  • with several variable suffixes: member_call:bar:append, for_range:start:stop:-1.

Comparison with taxa: the taxa aim to represent concepts; labels, not so much. For instance, labelling a = b by assignment:42 captures, in addition to the concept (assignment), an arbitrary piece of information (a name) which may or may not be internally used later to label more complex features. Moreover, taxa are structured into a taxonomy; labels are free form.

Operation
A string among "include", "exclude", "impart" and "hide", with an optional suffix "any" (implicit) or "all".
Path
A continuous sequence of distinct edges from the root to a given node of a tree. Can be used for the taxonomy or for the flat AST.
Pattern
A regular expression describing the name of a program, a label or a taxon.
Pipeline
A list of filtering commands, expressed as a Python value and evaluated by ast.literal_eval().
Predicate
Used in a restricted sense to denote the middle term of a semantic triple. It is a function taking two poor spans x and y and telling whether they satisfy a certain relationship (for instance, β€œx after y”, or β€œy not inside x”).
Source
The text of a Python program of your collection.
Span
The location of a given feature, as the couple of its first and last lines. By extension, a third member, the path, identifies the beginning of the feature in the AST.
Span (poor β€”)
A span deprived from its last member.
Tag
Generic term, covering both labels and taxa.
Taxon (plur. taxa)
Name of a feature, resulting from the conversion of a given label. Examples:
  • the label swap is converted to the taxon var/assignment/explicit/parallel/swap;
  • the label member_call_method:append is converted to the taxa call/subroutine/method/sequence/list/append and type/sequence/list.

A taxon is a path from a root to a node in the taxonomy. It represents a learning concept that should be of interest to the teacher.

Taxonomy
A forest structuring the learning concepts of the domain. The idea is that the concept represented by a node must be introduced after its parent concept (if any).
Triple (semantic β€”)
Consists of a taxon pattern (the subject), a predicate and an another taxon pattern (the object). Example: ("flow/conditional", "finished by", "def/subroutine/return").

  1. Note that, in this case, it is actually possible to write a single-point-of-exit version which would be correclty tagged as tail recursive:

    def gcd(a, b):
        return (gcd(b, a % b) if b else a)
    

  2. There are numerous Pascal dialects, and some of them support a C-like return equivalent (procedure exit) since at least 1992. This extension is not part of the Pascal standard, though. 

  3. Roberts, Eric. (1995). Loop exits and structured programming: reopening the debate. Proceedings of the 26th SIGCSE Technical Symposium on Computer Science Education, 1995, Nashville, Tennessee, USA, March 2-4, 1995: 268-272. doi:10.1145/199688.199815. 

  4. A modern language like Swift definitely clarifies this intention by making guard a keyword (since 2.0). 

  5. See here for the complete heuristic. 

  6. For the similar cases where no translation is provided in the default taxonomy, you can either add it yourself, or directly hint at the desired taxa, i.e., after # paroxython:, replace +member_call_method:list:count by type/sequence/list and call/subroutine/method/sequence/list/countπŸ¦†. 

  7. This contrasts with the convention in biological classification, where the term taxon refers to a taxonomic unit (a node of the tree). For instance, African Elephant is classified under the taxon Loxodonta. However, a taxon (implicitely) β€œencompasses all included taxa of lower rank”, here:

    • Kingdom: Animalia
    • Phylum: Chordata
    • Class: Mammalia
    • Order: Proboscidea
    • Family: Elephantidae
    • Subfamily: Elephantinae
    • Genus: Loxodonta

    Since our nodes are not uniquely named (e.g., there are several nodes literal), we make all the lower rank nodes explicit by concatenating them (i.e., type/number/integer/literal instead of simply literal, which would be ambiguous)🐘. 

  8. More precisely, it is the result of executing SELECT taxon, count(*) FROM taxon WHERE taxon not LIKE 'meta/count/%' GROUP BY taxon on the SQLite tag database. 

  9. On a side note: in the AST, an expression like a < b < c counts for one (chained) comparison. 

  10. The sequence control flow is an exception. As it characterizes the imperative paradigm more than this or that program, it will deliberately be excluded from the features searched by Paroxython. 

  11. A condition is a boolean expression, not to be confused with a conditional (control structure). 

  12. This module appears to be here to stay, since it is proeminently featured in the documentation of the more mundane standard library re. Thanks and kudos to the author, Matthew Barnett: without his awesome work, many parts of our code would have been much tougher to write, and much slower to execute. 

  13. Allen, James F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM. 26 (11): 832–843. doi:10.1145/182.358434 

  14. 162 = 6 \times 27, where 6 is the number of distinct permutations of the quadruple (x, x, y, y), and 27=3^3 the number of elements of the triple auto-product of 3 symbols (=, ≀ and <). 

  15. Automate the conversion with the function to_cnf() of SymPy

  16. In these examples, the braces denote the fact that the span of a certain taxon is included in that of another taxon.