User manual
How to read this manual
The different sections of this manual are presented in the theoretical order of operations: you start by preparing your collection of programs, these programs are labeled, these labels are translated into taxa, the results are stored in a database; finally, you filter this database through a command pipeline to extract recommendations.
In practice, your input is only required for the last stage. That's why we suggest you observe the following order:
- Start by reading the pipeline tutorial: it explains, in a user-friendly style, what you can expect from Paroxython and how you may integrate it into your teaching.
- Once you are familiar with this aspect, see if you can improve or enrich the results through some preparation of the programs in your collection.
- If needed, adapt the default taxonomy to your own vision of how to teach programming.
- Harness the full power of the pipeline with its most advanced commands.
Of course, you can consult at any time the glossary of terms specific to Paroxython.
Preparing your program collection
tl;dr. There is nothing to prepare before launching Paroxython. Nevertheless, for best results:
- don't name things like a first year student (follow PEP 8, good practices and common sense);
- aim at the simplest style. Don't try to outsmart Paroxython: you're sure to win… a lot of false negatives;
- the system is not capable of inference: be direct;
- for the same reason, prefer
from foo import bar; bar()
toimport foo; foo.bar()
; - do use:
- early exits when alternative versions are less clear;
- infinite loops when you deal with an unpredictable stream of inputs;
else
after thereturn
statement ofthen
branches which are not guards;
- give manual hints for:
- correcting a false positive or negative;
- adding metadata (it's more versatile than a nested directory structure);
- resolving duck typing on built-in types, if needed.
Naming things
Capitalization
Paroxython takes advantage of the official PEP 8 naming conventions. In a nutshell:
lowercase_with_underscores
- Variables, functions (in the broadest sense, including methods), modules, packages.
CapitalizedWords
- Classes, exceptions, type variables.
ALL_CAPITAL_LETTERS_WITH_UNDERSCORES
- Constants, defined on a module level.
For example, with a careless style like this:
def ComputeSquare(x): # bad: CamelCase for a function name
return x * x
FOO = 42
FOO = ComputeSquare(FOO) # bad: redefinition of a constant
… Paroxython will wrongly detect two definitions of constants (var/assignment/explicit/constant
) and the instanciation of a class (call/class/constructor
). To make these false positives disappear, just write:
def compute_square(x):
return x * x
foo = 42
foo = compute_square(foo)
Leading and trailing underscores
_single_leading_underscore
- Weak βinternal useβ indicator.
__double_leading_underscore
- Mangled non-public class attributes.
single_trailing_underscore_
- Conflict-averse objects (avoiding a name collision with a keyword).
__double_leading_and_trailing_underscore__
- "Magic" methods or attributes (never invent such names).
Other conventions
When the name of a function returning a value starts with "is_"
, "are_"
, "has_"
, "have_"
, "can_"
, "must_"
or "needs_"
, it is tagged as predicate ("def/subroutine/function/predicate"
):
def is_odd(n):
return n % 2
When the name of a function returning nothing starts or ends with "test"
or "tests"
, it is tagged as test ("def/subroutine/procedure/test"
):
def test_is_odd():
assert not is_odd(42)
assert is_odd(5)
Beware of renaming built-ins
It's rarely a good idea to reuse a built-in function name:
max = 42 # mad: max() is a built-in function
In the same way, if you write classes, consider the names of the methods of the built-in types as taboo. Paroxython relies on them to try guessing the type of objects to which they apply. For instance, in:
def change_separator(s, old, new):
return new.join(s.split(old))
… nothing indicates that s
is a type/sequence/string
except from the call of its method split()
.
Style
Be boring
Paroxython is optimized for the most unremarkable style. In the following snippet:
a += 1
a = a + 1
a = 1 + a
… only the first two lines will be labelled as increment:a
. Paroxython doesn't bother to decipher the so-called Yoda style. Other examples:
a = -a # labelled as `negate`
a = -1 * a # false negative
a[-i] # labelled as `negative_index`
a[len(a) - i] # false negative
Be direct
Some static analysis tools have inference capabilities. For instance, Astroid can parse:
a = 1
b = 2
c = a + b
… and infer the value of c
.
For now, Paroxython makes no inferences. So, using an intermediate assignment will sometimes be enough to make it miss a feature:
s = "hello, %s" % world # labelled as `string_formatting_operator`
template = "hello, %s"
s = template % world # false negative
A string literal on the left hand side of %
is needed to recognize the latter as a string formatting operator. Paroxython does not try to guess the type of template
.
There are more interesting cases. For instance, the following code is correctly tagged tail_recursive_function:gcd
: a function is tail recursive if and only if all its recursive calls return their results immediately, without any further calculation.
def gcd(a, b):
if b == 0:
return a
return gcd(b, a % b)
Now, suppose you want only one exit point, and come up with the following functionally equivalent code:
def gcd(a, b):
if b != 0:
a = gcd(b, a % b)
return a
The result of the recursive call is now assigned to variable a
. There is no further calculation as such, but Paroxython expects you to return directly the result. Its no-inference heuristic is fragile: delaying a key treatment is enough to defeat recognition1.
Importations
So far, the preferred form is:
from a.b.c import d
For instance, writing:
from math import cos, sin
cos(1) - sin(1)
… makes Paroxython able to produce the following (correct) taxa:
import/standard/math/cos
import/standard/math/sin
call/subroutine
But writing:
import math
math.cos(1) - math.sin(1)
… yields subpar results:
import/standard/math
(less specific)call/subroutine/method
(wrong)
Early exits
Paroxython will rightly identify a universal quantification (pattern/elements/satisfy/all
) in:
def is_prime(n):
for candidate in range(2, n):
if n % candidate == 0:
return False
return n > 1
Not so if you try to force the predicate into the single-point-of-exit dogma of Pascal2:
# Once upon a time... in academia
def is_prime(n):
candidate = 2
divisor_found = False
while candidate < n and not divisor_found:
if n % candidate == 0:
divisor_found = True
candidate += 1
return n > 1 and not divisor_found
Apart from being twice longer, more error-prone and harder to understand, this style defeats Paroxython, and will always do, sorry.
Tip
It's not the 80's anymore3. Don't think you're being rude if you keep things simple by exiting early. To put things in perspective, with a modern language like Python, you'd better think of control flags as code smells.
In any case, take a look at our specifications of iterative patterns. Should there be anything you don't agree with, no problem: just delete the corresponding translations from your copy of taxonomy.tsv.
Infinite loops
Likewise, consider solving the loop-and-a-half problem3 this way:
def input_number_between(prompt, lower_bound, upper_bound):
while True:
number = literal_eval(input(prompt))
if lower_bound <= number <= upper_bound:
return number
print(f"Your number should be between {lower_bound} and {upper_bound}!")
… to make it tagged pattern/inputs/find_first_good
.
Tip
As a general rule, opt for while True
if (and only if) at least one of the three following conditions holds:
- it is possible to never reach a terminal state;
- it is impossible to infer the next state from the states already observed;
- it is possible to observe the same state more than once.
Guards
Numerous guidelines and linters (e.g.,
LLVM Coding Standards,
Pylint,
ESLint,
TSLint)
warn that an else
branch following a return
statement βis not necessary and increases indentation levelβ.
Although this is technically true, our approach is more nuanced. Specifically, we recommend our students to use the βno elseβ style when the condition can be seen as a guard, i.e., a pre-condition to avoid failing, complicating or slowing down the main treatment of the subroutine:
def is_prime(n):
if n < 2: # avoid calculating the square root of a negative number
return False
if n == 2: # avoid returning a wrong result for 2
return True
if n % 2 == 0: # avoid wasting time in the loop
return False
# this is where the real work begins
for candidate in range(3, int(sqrt(n) + 1), 2):
if n % candidate == 0:
return False
return True
However, as noted here or here, there are situations where applying the blanket rule would break an inherent symmetry in the treatment:
def max_(a, b):
if a >= b:
return a
else: # removing this branch would hurt readability
return b
So, both styles have their uses. Consider this as an opportunity to better communicate your intention4. Remember that when you write an if
statement ended by a return
and not followed by an else
branch, Paroxython will automatically try5 to tag it as a guard (flow/conditional/guard
).
Manual hints
On a given source code, the tagging algorithm may sometimes result in false positives or false negatives. Moreover, the semantics of some features may be subjective (e.g., meta/topic/fun
) or beyond the capabilities of Paroxython (e.g., deciding the relevance of the short_circuit
property of a boolean operator). In any case, the user has the possibility to manually tag certain lines of their source code to hint either the presence or absence of a given feature.
Addition
The addition of one or several tags is hinted by a comment starting with # paroxython:
.
Hinting with labels
A tag can be a label (of the kind defined, but not necessarily included in spec.md):
if i < len(s) and s[i] == x: # paroxython: short_circuit:And
This is powerful: a label can be derived into several other labels, and then converted into any number of taxa (according to the mapping of taxonomy.tsv). But it asks you to check the taxonomy to see how it will be converted.
Hinting directly with the final taxa
A tag can be a taxon (starting with a non-empty sequence of word characters and a slash):
if i < len(s) and s[i] == x: # paroxython: operator/boolean/and/short_circuit
Hinting with a taxon is easier: you put it directly into the taxonomy. But sometimes you want more.
Deletion
To delete a label, prefix it with a minus symbol. For instance, the following hint requalifies an addition into a string concatenation:
print(a + b) # paroxython: -addition_operator +concatenation_operator
Warning
You cannot directly hint the deletion of a given taxon.
Multiple lines
If a label should span over several lines, it is hinted on the first line with a .β.β.
suffix (meaning βto be continuedβ), and on the last line with a ..β.β
prefix (meaning βcontinuingβ). In the following example, a new label is manually substituted to the calculated label loop:for
:
for am in ifera: # paroxython: -loop:for... +amoeboid_protist...
catch()
eat() # paroxython: ...loop:for ...amoeboid_protist
Formatting details
Some tolerances exist for the syntax:
+
can be omitted...β.β
can be writtenβ¦
(HORIZONTAL ELLIPSIS, U+2026).# paroxython:
is neither space- nor case-sensitive.
Warning
If you use a code formatter like Black, beware that it may reject a deletion hint after the line which features the label to be deleted. In this case, protect the line with # fmt:off
and # fmt:on
.
Why would I do that?
On simple programs such as those found in programming education, the results of Paroxython are generally quite satisfactory. But the better the feature labelling, the better the recommendations. You will sometimes, hopefully not too often, feel the need to remove (-
) some false positives or correct (+
) some false negatives. There are, however, more interesting possibilities.
Adding metadata
Consider the following naive suboptimal solution to the general problem of deduplicating a given sequence:
1 def unique_elements(sequence):
2 result = []
3 for x in sequence:
4 if sequence.count(x) == 1:
5 result.append(x)
6 return result
What if we take the trouble to give Paroxython some hints about what we meant by βnaiveβ, βsuboptimalβ and βgeneralβ?
1 def unique_elements(sequence):
2 result = []
3 for x in sequence: # paroxython: complexity:O(n) complexity:O(n^2)...
4 if sequence.count(x) == 1: # paroxython: complexity:O(n)
5 result.append(x) # paroxython: ...complexity:O(n^2)
6 return result
7 # paroxython: meta/level/naive complexity:suboptimal
8 # paroxython: topic:general
Taxon | Lines |
---|---|
meta/complexity/O(n) |
3, 4 |
meta/complexity/O(n^2) |
3-5 |
meta/complexity/suboptimal |
1-8 |
meta/level/naive |
1-8 |
meta/topic/general |
1-8 |
Wow, this is handy. Let's break these comments down:
- We have pinpoint the lines featuring a linear operation:
for
andcount()
. The latter is not obvious for most students, and yes, that's the kind of thing you're going to have to repeat over and over again, unless you give up with high level languages like Python, and go back to C, assembler, Turing machines, Brainfuck, whatever. - Nesting these two operations results in a quadratic complexity, which we have hinted by opening and closing a tag on the lines involved (3-5).
- As stated above, this approach is generally known as naive. No translation for a label
level:(.+)
is provided in the default taxonomy, so you hint directly at the desired taxon. - Tagging the program as suboptimal, or as a counter-example, can be useful: when you are setting up an exam, you don't need to be recommended bad programs, do you? With this kind of metadata, you can easily exclude all of them.
- Tagging your programs by topic is an alternative to distributing them in different folders, and is arguably better: when a program belongs to several topics, you don't have to bother with symlinks; and you are not stuck when you change your mind about the categorization (when, suddenly, you decide to classify your programs by technique: like greedy, dynamic programming, divide and conquer, etc.). By the way, remember you can always redirect the recommendations to
stdout
, and get a simple list of files. Seeparoxython.cli_recommend
for an example of how to copy them in another directory.
Hunting ducks
Python famously thinks that:
If it looks like a duck, swims like a duck, and quacks like a duck, then it is a duck.
This is called duck typing, and the reason why unique_elements()
(above) would be happy with either a list, a tuple or a string: all of them sport the same methods, specifically count()
(explicit) and __iter__()
(implicit).
For us, this cute duck typing comes with a challenge. Although this may change in the future, for now Paroxython ignores your type hints. As already explained, it just relies on the names of the methods and attributes to guess the type of the built-in objects. For instance, the default taxonomy has these two lines:
type/sequence/list member_call_method:(append|extend|insert|reverse|sort)
call/subroutine/method/sequence/list/\1 member_call_method:(append|extend|insert|reverse|sort)
This is why the expression foo.append(bar)
will be tagged member_call_method:append
(label), and then call/subroutine/method/sequence/list/append
and type/sequence/list
(taxa). Here, this technique increases our chances of finding programs that feature lists, but it does not achieve the same accuracy for foo.count(bar)
: indeed, count()
is shared by all sequential types. In the taxonomy, this comes down to:
type/sequence member_call_method:(count|index)
call/subroutine/method/sequence_duck/\1 member_call_method:(count|index)
Did you spot the duck in the second row? It's there to tell you: βQuack! that's the best I can say, but in case this sequence can't actually be anything other than a list, just let me know.β
Well, ok. How to do this?
In fact, the taxonomy has two lines for that:
type/sequence/list member_call_method:list:(count|index|pop|...)
call/subroutine/method/sequence/list/\1 member_call_method:list:(count|index|pop|...)
This means that you can hint at every ambiguous list method (count()
, etc.) by deleting the original label and adding a new one with an inserted ":list"
:
a.count(b) # paroxython: -member_call_method:count +member_call_method:list:count
Now, the generated taxa will be: type/sequence/list
and call/subroutine/method/sequence/list/count
6. As for the duck, it is no more. It has ceased to be. It is expired and gone to meet its maker. It has run down the curtain and joined the choir invisible. This is an ex-duck.
Admittedly, all of this may be overkill, since a program featuring a list would most of the time feature it in several places, most of them without ambiguity. After all, generally speaking, you just want Paroxython to help you find programs in your repository, but don't need it to find features in a given program (you know better).
But if you feel like it, just fire up this 1 weird pipeline:
{
[
"operation": "include",
"data": [".+\bduck\b"],
]
}
… and happy duck hunting!
Taxonomy
Structure
In Paroxython, the default taxonomy is a forest: a handful of separate trees with flow
, operator
, meta
, type
, etc. as their roots. We call taxon7 a path from a root node to a leaf node. Thus:
type/number/integer/literal
… describes, you know… this kind of literal which is a kind of integer which is a kind of number which is a kind of type (as an example, think of the answer to the ultimate question of life, the universe, and everything).
In the visualization below, we have introduced the pseudo-root βπβ to gather all the trees into one:
This comes8 straight from the tagging of the public repository The Algorithms - Python. The size of a node is relative to the number of times the corresponding taxon's prefix appears in these programs.
Hover over the nodes to see these numbers, or click them to navigate the tree. By the way, you may click on type
to better follow the subsequent explanations.
Mapping the labels onto taxa
Paroxython does not directly grow these forest of taxa. Remember, a program is first tagged with labels (as specified in spec.md
). Those are then translated into taxa by a purely morphological operation (search and replace), without any more reference to the original source code.
1-1 mappings
Suppose a program has been tagged with the labels:
literal:True
(the constantTrue
).literal:False
(the constantFalse
).external_free_call:bool
(a call to the built-in functionbool()
).
The following conversion table (extracted from taxonomy.tsv
) will then give the taxa seen above in the subtree type/boolean
:
Taxa (replacement patterns) | Labels (search patterns) |
---|---|
type/boolean/literal/True |
literal:True |
type/boolean/literal/False |
literal:False |
type/boolean |
external_free_call:bool |
Note that, somewhat counterintuitively, the conversion goes from right to left. Each line consists of a taxon tab-separated from a label. It is interpreted as βwhen a program features this label, create that taxon with the same spanning linesβ.
All of these rows define 1-1 mappings: each one translates one label into one taxon.
N-1 mappings
The previous table extract represents the simplest case, where both taxon and label patterns are literal. Below, the second label pattern is a non-literal regular expression:
Taxa (replacement patterns) | Labels (search patterns) |
---|---|
type/sequence/list |
external_free_call:list |
type/sequence/list |
member_call_method:(append|extend|insert|reverse|sort) |
It defines a 5-1 mapping, which converts five possible labels (member_call_method:append
, member_call_method:extend
, and so on) into a single taxon (the metacharacter "|"
meaning βorβ). These two rows thus constitute a 6-1 mapping.
1-N mapping
When a source code is, say, 42 lines long, it is tagged with the label "whole_span:42"
. This is an example of a single label which produces two taxa: "meta/program"
and "meta/count/program/sloc/42"
. Both span the whole program and, although it is not obvious, both have their uses:
- The first one is common to all programs, and provides an invariable access key to an all-encompassing span. In a command pipeline, it can be used to express the absence of a taxon.
- The second one has a variable part, and can be used to filter programs by size (for example, in
paroxython.recommend_programs
, the pattern"meta/count/program/sloc/[1-4]?[0-9]"
will be used to filter out the programs that have 50 lines or more).
This conversions are triggered by the following rows in the default taxonomy:
Taxa (replacement patterns) | Labels (search patterns) |
---|---|
meta/program |
whole_span:.+ |
meta/count/program/sloc/\1 |
whole_span:(.+) |
On the first row, the sequence of metacharacters ".+"
means: βeat all you can up to the end of the line.β On the second row, the added parentheses also captures these characters: they are restituted in the replacement pattern by "\1"
, which denotes a backreference to the first captured group.
Another example could be:
Taxa (replacement patterns) | Labels (search patterns) |
---|---|
type/sequence/list |
member_call_method:(append|extend|insert|reverse|sort) |
call/subroutine/method/sequence/list/\1 |
member_call_method:(append|extend|insert|reverse|sort) |
… which means: when you encounter a label member_call_method:sort
(for instance), that means the program features a list and a call to the sort()
method.
1-0 mapping
At first glance, some of the labels generated during the first tagging step seem redundant, e.g. "binary_operator:Add"
and "addition_operator"
. In fact, the former was used to calculate the latter. In spec.md
, the definition of "addition_operator"
is introduced by:
An addition operator is a binary operator
Add
which has not be classified as a concatenation operator.
This is a good example of a label for internal use only. In the conversion step, it will be simply ignored (i.e., it has no entry in the taxonomy).
Back to the roots
Functional inspirations: var
, def
, call
and type
What are the basic concepts on which computing can be built? As you know, Alonso Church answered this very question as early as 1936 (years before the computers were even invented!). The three building blocks of his lambda calculus, namely the concepts of variable, abstraction and application, provide us our first three taxonomic roots, which we will denote respectively by var
, def
and call
. To these we can naturally add that of type
, already shown above.
All of these could be enough to describe pure functional programs, at least theoretically. However:
- Python is no Haskell. It is a multi paradigm language, with a strong emphasis on imperative programming. And its zen famously holds that βpracticality beats purityβ.
- Here, we are not interested in teaching generalized computability. Pedagogical considerations take precedence over a crippling respect for mathematical abstractions. For instance, we don't need to explain thoroughly the concepts of literal, string, type, built-in function and call for making our young students feel they understand:
print("hello, world")
For these reasons, choosing to root our taxonomy in the four basic notions of the typed lambda calculus should be seen more as a tribute than a formal commitment. More specifically:
var
will essentially bring together everything relating to the concept of assignment and scoping;def
will accommodate not only lambda functions, but named ones, methods, generators and even classes. Note that this does not include variable definitions.call
will cover any call to anything with a__call__()
method, which Python calls a callable (sorry);- finally,
type
will welcome all types, no purity required.
Imperative needs: flow
The imperative nature of Python demands that we introduce the concept of control flow
, under which we put the loops, the conditionals, and some other animals10.
For your convenience: operator
, condition
, subscript
Again, those are mainly practical choices. After all, an operator
is nothing more than an unassuming function call with a funny name and a funny syntax. And a condition
11, a combination of function calls which happens to evaluate to a particular type
. As for the creation of the root subscript
, it comes from the observation that an lot of exciting programs can be written with sequences before venturing out to direct access (that is, as long as we treat Python lists as lists and not arrays). Not that in our default taxonomy, subscript
includes slicing and dictionary access too.
Importations
Previously classified under def
, since version 0.6.0, the import
statement has now its own root in the taxonomy.
Zooming out: pattern
Now this is arguably the most interesting feature to tag in a beginner-level program. Under pattern
, you will find numerous variants of the invaluable accumulation pattern (counting, summing, filtering, finding the βbestβ element, etc.), but also some early-exit patterns (testing for an universal or existential property, finding the first βgoodβ element, etc.), whether by traversing a sequence or evolving a state. A (currently) small number of common expression patterns are also available. However, the common assignment idioms (such as incrementation, swap, slide, conditional
assignment) are classified under var/assignment/explicit
.
Although the relative size of pattern
is the smallest of the taxonomy, note that it is almost that of meta/program
(which has one occurrence per program): in other words, almost all programs feature such a pattern (which spans several lines).
The loop patterns constitute an aspect of programming which is not always taught in a conscious and systematic way, and to which Paroxython intends to draw your attention.
Warning
For priority reasons, Paroxython searches for these loop patterns in βstatementβ loops only, not in βcomprehensionβ loops. This may change in a future version.
Tagging with style
Paradigms
Python allows you to mix freely multiple paradigms. However, for Paroxython, every program will fall into exactly one of the following four categories, always associated to the whole span:
style/object_oriented
if it features at least one class definition;style/functional
if it is not object oriented, and features at least one function returning something, but no loop and no assignment;style/procedural
if it is neither object oriented or functional, and features at least one subroutine definition;style/imperative
if it features no subroutine or class definition.
Additionally, style/imperative/flat
is the no-indent version of the latter (no loop, conditional, etc.).
Functional traits are also tagged. The corresponding taxa start with: style/functional_trait/
, and are suffixed by lambda
, higher_order
, pure_function
, map
, filter
, reduce
, etc. Unlike style/functional
, which spans the entire program, these taxa are located on certain lines. They can appear in a program of any (all-encompassing) style.
Independently of the 4-partition, style/one_liner
denotes any program whose exactly one SLOC is neither a definition header, an importation or an assertion. For instance, the following program is considered a one-liner:
from sum_proper_divisors_1 import sum_proper_divisors
def abundant_numbers_below(bound):
# An abundant number is a number for which the sum of its proper divisors
# is greater than the number itself (Wikipedia).
return [n for n in range(1, bound) if sum_proper_divisors(n) > n]
Flawed styles
Although Paroxython is no linter, it may tag some unpythonic (i.e., non-idiomatic) or naive patterns, frequent in the beginners' code. For instance:
aux = a
a = b
b = aux
… is tagged style/unpythonic/swap
, and:
if condition:
return False
else:
return True
is tagged style/naive/return_condition
. Search _unpythonic
and _naive
in spec.md
for the current list (which could of course be extended ad libitum).
Going meta
Paroxython will store inside the meta
tree some program metadata, such as the number of lines. Some children, such as topic
, technique
, complexity
, are already provided, which you can fill in by adding a manual hint in the source code of the programs.
Modifying the taxonomy
The definition of a taxonomy is, at least partly, a matter of pedagogical choices. You are encouraged to duplicate the default taxonomy, and replace or delete those taxa that do not fit your course purpose or logic.
Copy the default taxonomy.tsv
at the same level as the src
folder that contains your Python programs:
programming_101/
βββ taxonomy.tsv
βββ src/
β βββ hello_world.py
β βββ gob's_program.py
| βββ ...
This way, omitting the option --taxonomy
like in the command below will use your copy instead of the original.
paroxython collect programming_101/src
You can now freely experiment on this copy.
Pipeline tutorial
How to get recommendations? This tutorial walks you through a series of little βteacher storiesβ to introduce the pipeline of commands used by Paroxython's recommendation system.
You βexecuteβ this pipeline on command line with:
paroxython recommend -p 'path/to/your/pipe.py' 'path/to/your/programs/db.json'
Covering your base(s)
The simplest pipeline has no command: it just lists the programs of your database, without any filter. This could produce a rather long document, since each source code is printed, along with a list of each and every taxon it implements. Here is the content of your pipe.py
:
[]
Tip
The point is that the resulting list is sorted by increasing learning cost. This should give you a rough idea of the pedagogical angle of attack to adopt in your first classes.
Imparting knowledge
Tracking the progress of your course
Suppose that, in your first session, you have introduced your students to the hello_world.py
, wheat_and_chessboard.py
and euler_005_smallest_multiple.py
programs. Not only Paroxython should no longer recommend these programs, but the cost of learning the associated concepts should be considered as zero when they are encountered again in the future.
[
{
"operation": "impart",
"data": [
"hello_world.py",
"wheat_and_chessboard.py",
"euler_005_smallest_multiple.py",
],
},
]
Coming after another course
Suppose instead that you intervene after an introductory course given by a colleague. She gives you a folder (named "CS_101"
), which contains the programs she has studied with her class. Since you are secretly in love, you assume, somewhat foolishly, that the concepts they implement are mastered by your new students.
[
{
"operation": "impart",
"data": "find CS_101 -path '*.py'",
},
]
Tip
As you can see, rather than maintaining a list of programs or concepts in the "data"
field, you may provide a string. Paroxython will interpret it as a shell command, and expect it to print on stdout
the required list of items (programs or taxa), one per line.
Blacklisting programs
You want to filter out all programs reserved for an exam, or too complex, or not pedagogically interesting enough, or that could get you kicked out of your college, etc.
[
{
"operation": "exclude",
"data": [
"fizzbuzz.py",
"alpha_go.py",
"hello_world_of_pain.py",
"gob_s_program.py",
],
},
]
Concepts to be introduced later (or never)
For the time being, you don't want to be recommended any program requiring the concepts of recursivity (def/subroutine/recursive
), dictionary (type/non_sequence/dictionary
) or set (type/non_sequence/set
).
[
{
"operation": "exclude",
"data": [
"def/subroutine/recursive",
"type/non_sequence",
],
},
]
Note
Paroxython relies on the last three characters of a "data"
item to decide whether it is a
Python program (ending with ".py"
) or a taxon (like here).
Tip
Since the two latter taxa share a common prefix and Python doesn't provide other non-sequence
type, it is enough to exclude all taxa starting with type/non_sequence
.
Concepts to be introduced now
Your next class will be devoted to conditional constructs. Thus, you need to go through programs featuring taxa prefixed by flow/conditional
or even, why not, some conditional expressions (operator/ternary
). The suggested programs do not necessarily have to implement both concepts at the same time, but at least one. Any other program will be rejected.
[
{
"operation": "include",
"data": [
"flow/conditional",
"operator/ternary",
],
},
]
Preparing an assignment
Basically, you provide Paroxython with a list of programs already studied in class, and it suggests new programs that implement only old concepts. This requires the exclusion of all programs featuring at least one concept that has not been seen in any of the programs already seen.
It may sound a little complicated, because it is. But don't panic. Remember that Paroxython always orders its recommendations by increasing cost. Thus, all you have to do is ask it to list the programs that have not been introduced yet. The programs you want to choose among will appear at the top of the results, under the zero-learning cost section. Moreover, inside each section, they will be sorted by increasing size (SLOC), which can be a reasonable proxy for their difficulty.
But in the end, you, the teacher, are the judge. Paroxython is not going to set up a test for you, let alone grade the answers. It is just here to remind you of some possibilities you might not have thought of at the right time, and to give you some confidence that the exercises require no concept you have not pre-introduced in class, which later on might save you some awkward conversations with your dear students.
Since this is the last filter in our tutorial, let's summarize what we've seen by chaining several commands together:
- Impart the previous knowledge by extracting all the programs listed in the
"timeline.txt"
file that you update after each session. You can adapt the scriptparse_syllabus.py
to your own needs. If you wish, you can use thebase_path
variable, which represents the parent of the directory containing the programs of your personal database. - Exclude some irrelevant programs.
- Include the concepts that you want to test during your exam.
[
{
"operation": "impart",
"data": "python helpers/parse_syllabus.py {base_path}/timeline.txt",
},
{
"operation": "exclude",
"data": [
"foo.py",
"bar.py",
# "buzz.py",
],
},
{
"operation": "include",
"data": [
"pattern/elements/accumulate",
"topic/game",
"type/sequence/list",
],
},
]
Tip
This command pipeline is not a JSON file, but a Python program, or more restrictively a Python expression. As such, it offers several amenities you surely know and love: comments, trailing commas, r-strings (no need to double-escape backslashes in regexes!), etc. For security purposes, this file is not imported, but read as a text and evaluated by ast.literal_eval()
.
This is the end of our pipeline tutorial. If you dare, read on for more advanced features (regular expressions, span algebra, semantic triples, negations).
Pipeline documentation
This part goes deeper into the pipeline by describing in details how the recommendation system works. It assumes you've already read the pipeline tutorial. First of all, let us recall the example of pipeline we gave at the end.
>>> [
... {
... "operation": "impart",
... "data": "python helpers/parse_syllabus.py {base_path}/timeline.txt",
... },
... {
... "operation": "exclude",
... "data": [
... "foo.py",
... "bar.py",
... # "buzz.py",
... ],
... },
... {
... "operation": "include",
... "data": [
... "pattern/elements/accumulate",
... "topic/game",
... "type/sequence/list",
... ],
... },
... ]
Regular expression patterns
The first thing to clarify is that the strings which constitute the data
lists are not simple names, but βdescriptionsβ of names. These descriptions, or patterns, are written in the language of regular expressions, and more precisely in the dialect implemented by the third-party module regex12.
Tip
If you don't know how to write regular expressions, the good news is that you don't have to worry. Unless they contain certain special characters, they are self-descriptive. That means, for instance, that the string "type/sequence/list"
happens to be a regular expression which precisely describes (or βmatchesβ) the string "type/sequence/list"
. This is the case for all strings that are only made up of so-called literal characters.
As a matter of fact, all the characters that can appear in a taxon are literal. We advise you to avoid non-literal characters in your program paths as well. What are they? Essentially: .^$*+?{}\[]
. Depending on your operating system, some are already illegal in filenames anyway. To tell the truth, the first one (the dot) is an integral part of most filenames, as an extension separator. However, it turns out that it basically means βany characterβ, which of course includes the dot itself. So, the regular expression "hello_world.py"
matches the filename "hello_world.py"
, and the fact that it matches "hello_world/py"
and "hello_worldapy"
shouldn't be a real problem. In any case, you can always prefix (or escape) a non-literal character with a backslash "\"
to make it literal (here, "hello_world\.py"
): that's all you'll need to know about regular expressions.
The second thing to clarify is that a given pattern will match any string starting with it. Thus, "type/sequence/list"
will match not only "type/sequence/list"
, but "type/sequence/list/empty"
and "type/sequence/list/supercalifragilisticexpialidocious"
too. The same applies to program paths. For example, in our personal database, all programs from Project Euler start with "euler_"
. The pattern "euler_"
will then match all of them.
In terms of regular expressions, it means that, under the hood, Paroxython invokes match()
instead of search()
. In the rare cases where you want to restrict the whole matching to the pattern itself, simply suffix it with a dollar sign, i.e., "type/sequence/list$"
.
Execution of a pipeline
A pipeline is a set (or list, but the order is not important) of commands which apply sequentially to a database of program tags. It returns:
- a list of recommended programs;
- an estimation of the learning cost of each concept (i.e., taxon) they implement.
The set of commands constitutes a pipeline that evolves both:
- a selection of programs (initially all those in the tag database);
- a body of imparted knowledge (initially empty).
Note
During this process, the program selection can never increase, and the imparted knowledge can never decrease.
A command consists of:
- an operation among
"include"
,"exclude"
,"impart"
and"hide"
, optionally suffixed by the quantifier (" all"
); - the data to which it applies. This data is (or can be interpreted to produce) a list of criteria, each criterion being:
- either a program pattern,
- or a taxon pattern,
- or a relationship between two taxon patterns (more on this in the Semantic triples section).
Combining the operation with a criterion is done in four stages:
- An auxiliary set of programs is calculated.
- The set of recommended programs (the selection) is updated.
- A new set of taxa may be βlearnedβ, i.e., added to the body of imparted knowledge.
- The set of taxa and programs to hide in the final report may be updated.
Note that each stage depends on both the operation and the nature of the criterion to which it applies. Let us detail every possible case:
"include"
command
Its purpose is to keep only those selected programs that satisfy at least one given criterion.
- The auxiliary set contains any program such that:
- either its name matches the given pattern,
- or its source code:
- either features at least one taxon matching the given pattern,
- or satisfies the given relationship.
- The selection is then restricted to its intersection with the auxiliary set.
- The imparted knowledge is unchanged.
"include all"
variant
Its purpose is to keep only those selected programs that satisfy all given criteria.
{
"operation": "include all",
"data": [
criterion_1,
criterion_2,
...,
criterion_n
]
}
… is equivalent to:
{"operation": "include", "data": [criterion_1]},
{"operation": "include", "data": [criterion_2]},
...,
{"operation": "include", "data": [criterion_n]}
"exclude"
command
Its purpose is to filter out of the selection the programs which satisfy at least one given criterion.
- The same auxiliary set of programs as above (
"include"
) is calculated, but further extended to the programs which import them (either directly or indirectly). Indeed, when the teacher wants to exclude a certain program, any program that requires it should be excluded as well. - The selection is then restricted to its difference with the auxiliary set.
- The imparted knowledge is unchanged.
"exclude all"
variant
Unlike "include all"
, this variant actually expands the semantics of the system.
{
"operation": "exclude all",
"data": [
criterion_1,
criterion_2,
...,
criterion_n
]
}
… excludes the programs satisfying all the criteria. This cannot be transformed into a sequence of "exclude"
commands, which would exclude the programs satisfying any of the criteria.
"impart"
command
Its main purpose is to track the acquisition of new knowledge. However, if this knowledge is provided by a given set of programs, there is no more need to keep them in the selection of recommended programs.
- The auxiliary set contains all programs whose name matches at least one of the given patterns (relationships are not supported), plus those which import (either directly or indirectly) at least one of them. In other words, it is the same as above (
"exclude"
), but only in the case where the criterion is a program pattern. In all other cases, this auxiliary set will be empty. - The selection is then restricted to its difference with the auxiliary set (same as above for
"exclude"
). - There are three cases:
- If the criterion is a program pattern, all the taxa featured in a program whose name matches this pattern are added to the body of imparted knowledge.
- If the criterion is a taxon pattern, all the taxa matching this pattern are added to the body of imparted knowledge.
- Otherwise, the body of imparted knowledge is unchanged.
The "impart all"
variant is not supported, and treated as "impart"
.
"hide"
command
This is the simplest command of all. It merely accumulates the programs or the taxa matching the given patterns (relationships are not supported), without other effect on the filter. In the final stage, the accumulated programs or taxa will be filtered out of report.
Notes.
- Excluding a program is equivalent to hiding it plus every program importing it.
- The learning costs are not affected whatsoever by the hiding of a taxon: its individual cost continues to contribute towards the total cost of a program.
The "hide all"
variant is not supported, and treated as "hide"
.
Semantic triples
Span algebra
Being given a program p of n lines, a couple (i_1, i_2) of line numbers such that 1 \leq i_1 \leq i_2 \leq n represents a span of p. When Paroxython finds a taxon t in p, it doesn't just say whether p features t or not: it returns the list of spans where t occurs.
It is sometimes interesting to know how the spans of two taxa of the same program are located relatively to each other.
- For instance, a
print()
statement whose span is inside the span of a function indicates that this function has a side-effect. - A less trivial application stems from the fact that our taxonomy is multi-dimensional: each algorithmic feature can be associated with several taxa at the same time, characterizing it it along several dimensions. For instance, the taxa
flow/loop/for
andflow/loop/while
describe the category of a loop; whereas the taxaflow/loop/exit/early
andflow/loop/exit/late
describe its exit behavior. These dimensions are independent and, although it is entirely possible to do so, our default taxonomy does not list the results of the cross-product (i.e.,flow/loop/for/exit/early
,flow/loop/for/exit/late
,flow/loop/while/exit/early
andflow/loop/while/exit/late
). Querying an hypothetical taxonflow/loop/for/exit/early
(orflow/loop/exit/early/for
, by the way) comes down to querying the taxaflow/loop/for
spanning the same lines asflow/loop/exit/early
.
To express a relation \mathfrak{R} between the spans of two taxa, we start from the algebra devised in 1983 by James F. Allen in his seminal paper (PDF) about temporal intervals13. The so-called Allen's interval algebra defines 13 basic operators which capture all possible relations between two intervals X=(x_1, x_2) and Y=(y_1, y_2). The table below uses his terminology:
Examples | \mathfrak{R} on endpoints | X \mathbin{\mathfrak{R}} Y | Y \mathbin{\mathfrak{R}} X | Keys |
---|---|---|---|---|
....XXXXX........ |
||||
....YYYYY........ |
x_1 = y_1 \leq x_2 = y_2 | X equals Y
|
Y equals X
|
x=yβ€x=y |
....YYYYYYYYY.... |
x_1 = y_1 \leq x_2 \leq y_2 | X starts Y
|
Y started by X
|
x=yβ€xβ€y |
..YYYYYYYYY...... |
y_1 \leq x_1 \leq x_2 \leq y_2 | X during Y
|
Y contains X
|
yβ€xβ€xβ€y |
.YYYYYYYY........ |
y_1 \leq x_1 \leq x_2 = y_2 | X finishes Y
|
Y finished by X
|
yβ€xβ€x=y |
....|...|..YYYYY. |
x_1 \leq x_2 \leq y_1 \leq y_2 | X before Y
|
Y after X
|
xβ€xβ€yβ€y |
....|...|YYYYY... |
x_1 \leq x_2 = y_1 \leq y_2 | X meets Y
|
Y met by X
|
xβ€x=yβ€y |
....|..YYYYY..... |
x_1 \leq y_1 \leq x_2 \leq y_2 | X overlaps Y
|
Y overlapped by X |
xβ€yβ€xβ€y |
These well-known operators apply quite nicely to our spans, with the following adjustments:
Synonyms. The 13 operator names, whose some don't make much sense in a non-temporal context, are extended
with a number of synonyms, such as "inside"
for "during"
, and "equal"
or "is"
for
"equals"
. The current list can be found
here.
Non-strict inequalities. In the above table, contrary to the convention on time intervals, we consider the inequalities to be large. In a program, most features span exactly one (logical) line (i.e., x_1=x_2): this is simply not possible with the original relations on endpoints. Besides, when we look for a function which contains a loop, we are generally not interested in whether the loop finishes on the last line. Blind adherence to strictness convention would force us to express our query as a disjunction of these two relations for the following program to be selected:
def first_missing_non_negative(l):
elements = set(l)
for natural in range(len(elements) + 1):
if natural not in elements:
return natural
Strict inequalities. In addition to the existing relations, all combinations of strict and non-strict conditions are generated, for instance x_1 < x_2 < y_1 \leq y_2. That adds up to 162 in total14.
Keys. As we use many more operators than Allen, giving an English name to every single one is not an option. Paroxython actually identifies them by a unique key (see the last column of the table above) calculated like this:
- Start from the chain of equalities and inequalities that defines the relations between the two spans' endpoints: y_1 < x_1 = y_2 \leq x_2.
- Drop the indices (since, for a given x or y, indice 1 always comes before indice 2, no information is lost): y < x = y \leq x.
- Rewrite the formula using only the five characters
"="
,"<"
,"β€"
,"x"
and"y"
:y<x=yβ€x
.
Key normalization.
Paroxython tries (cf. normalize_predicate()
) to be tolerant with the user input. The verb "is"
is ignored, except when it constitutes the whole string (thus, "is after"
comes down to "after"
). When an operator is not an Allen's name or a predefined synonym, the following normalization process is applied:
- Replace any
<=
byβ€
. - Replace any
==
by=
. - Strip all characters distinct from
"="
,"<"
,"β€"
,"x"
and"y"
. - If the string is
"x=y"
or"y=x"
, replace it by"x=yβ€x=y"
. - If the string contains only one
x
(resp.y
), expand it intoxβ€x
(resp.yβ€y
).
This allows the user to be understood when entering formulas like
x == y
, x <= y
or y1 < x1 == x2 <= y2
.
Positive semantic triples
A pipeline command can apply not only to patterns of programs and taxa, but also to relationships between the spans of two taxon patterns. Being given an operator \mathfrak{R} and two taxon patterns t_1 and t_2, the statement t_1 \mathbin{\mathfrak{R}} t_2 will be codified in the form of the subjectβpredicateβobject expression (t_1, \mathfrak{R}, t_2), which can be entered as a mere Python tuple (t1, R, t2)
.
The simplest semantic triple is:
("meta/program", "contains", taxon_pattern)
It matches all programs where the span of the taxon "meta/program"
includes the span of a given taxon_pattern
. Since, by definition, every program features one taxon "meta/program"
, whose span encompasses the whole program, this is strictly equivalent to:
taxon_pattern
And now for something completely different: the following pipeline command will only keep programs featuring a conditional inside a loop or (inclusive) ended by a return
statement (the function first_missing_non_negative()
given above satisfies these two criteria).
{
"operation": "include",
"data": [
("flow/loop", "contains", "flow/conditional"),
("flow/conditional", "finished by", "def/subroutine/return"),
]
},
It is currently not possible to chain several operators with shared operands, for example to keep only the programs that feature a conditional inside a loop and ended by a return
statement: the quintuple ("flow/loop", "contains", "flow/conditional", "finished by", "def/subroutine/return")
would raise an error. As it stands, the best we can do is to chain the commands themselves:
{
"operation": "include",
"data": [
("flow/loop", "contains", "flow/conditional"),
]
},
{
"operation": "include",
"data": [
("flow/conditional", "finished by", "def/subroutine/return"),
]
},
{
"operation": "include",
"data": [
("def/subroutine/return", "contains", "flow/loop"),
]
},
This keeps only the programs that feature a conditional inside a loop and a conditional ended by a return
statement and a return
statement inside a loop. The result is a subset of the previous one, but may still include programs where the two loops, conditionals or return
statements are distinct.
Negative semantic triples
It is possible to negate the predicate of a semantic triple, either by prefixing it by an exclamation mark (e.g., "!xβ€yβ€yβ€x"
, or "!contains"
), or by adding the word not (e.g, "not xβ€yβ€yβ€x"
, "not contains"
,
or "contains not"
), whether it's grammatically correct or not.
Expressing the absence of a taxon
Here again, the simplest negated semantic triple is:
("meta/program", "not contains", taxon_pattern)
On a set of independent programs (i.e., not importing each other), there is a strict equivalence between:
{
"operation": "include",
"data": [
("meta/program", "not contains", taxon_pattern)
]
},
and:
{
"operation": "exclude",
"data": [
taxon_pattern
]
},
And, conversely, between:
{
"operation": "include",
"data": [
taxon_pattern
]
},
and:
{
"operation": "exclude",
"data": [
("meta/program", "not contains", taxon_pattern)
]
},
This property may be used to select the programs that include any subset of a given set of taxa. This can be done in a systematic way by converting15 the corresponding logical formula in conjunctive normal form, and transforming each clause into a separate "include"
command.
For instance, consider the two taxa "flow/conditional"
and "flow/loop"
. To keep only the programs that feature either a conditional (denoted by the literal a) or a loop (denoted by the literal b), but not both of them, express first the formula in CNF: (a \lor b) \land (\neg a \lor \neg b), then translate it into:
{
"operation": "include",
"data": [
"flow/conditional",
"flow/loop",
]
}, # keep only the programs that feature either a conditional or a loop
{
"operation": "include",
"data": [
("meta/program", "not contains", "flow/conditional"),
("meta/program", "not contains", "flow/loop"),
]
}, # among them, keep only those that feature no conditional or no loop
For two taxa, there are a total of 16 combinations, listed and tested here.
However, on a set on interdependent programs, "exclude"
is not the opposite of "include"
anymore: as already explained, excluding a program excludes also the programs which import it. Consequently, including the programs that do not feature a given taxon will filter out those which feature it directly; but excluding the programs that do feature this taxon will filter out those which feature it either directly or indirectly (by importation).
General semantics of the negation
The semantic triple:
(taxon_pattern_1, not predicate, taxon_pattern_2)
… describes the set of programs which feature at least one taxon matching taxon_pattern_1
, and such that, for any
span s_1
of such a taxon, there exists no span s_2
of a taxon matching taxon_pattern_2
for which predicate(s_1, s_2)
is verified.
Warning
Note that the result includes also all programs featuring at least one taxon matching taxon_pattern_1
, but no taxon matching taxon_pattern_2
. For instance, suppose the original (negative) triple is:
("def", "not contains", "var/assignment/explicit")
The function will return the (disjoint) union of these two sets:
- The programs featuring at least one subroutine, but no assignment at all.
- The programs featuring at least one subroutine, at least one assignment, but no assignment inside a subroutine.
For example, if the original (negative) predicate is "taxon_pattern_1 not inside taxon_pattern_2"
, any program consisting in:
"taxon_2{taxon_1}"
16 is rejected;"taxon_1"
is accepted (although there is notaxon_2
);"taxon_2"
is rejected (notaxon_1
);"taxon_1{taxon_2}"
is accepted;"taxon_1 taxon_2{taxon_1}"
is accepted (there exists a couple (s_1
,s_2
) such thattaxon_1
is not insidetaxon_2
);"taxon_1 taxon_2{taxon_1} taxon_2{}"
is accepted.
If the predicate is "taxon_pattern_2 not contains taxon_pattern_1"
(sic), any program consisting in:
"taxon_2{taxon_1}"
is rejected;"taxon_1"
is rejected (notaxon_2
);"taxon_2"
is accepted (although there is notaxon_1
);"taxon_1{taxon_2}"
is accepted;"taxon_1 taxon_2{taxon_1}"
is accepted (there exists a couple (s_1
,s_2
) such thattaxon_2
does not containtaxon_1
);"taxon_1 taxon_2{taxon_1} taxon_2{}"
is accepted.
Note that, due to the rule explained in the warning above, "taxon_pattern_1 not inside taxon_pattern_2"
is not equivalent to "taxon_pattern_2 not contains taxon_pattern_1"
(differences in bold). However, if taxon_pattern_2
is "meta/program"
, which is featured by all programs, these two forms become equivalent again.
A practical example
Suppose you want your course on the assignment statements to include the concept of parallel assignment, for example to introduce swapping without an auxiliary variable:
(a, b) = (b, a)
In the taxonomy, this concept is listed as a special case of both tuple ("type/sequence/tuple"
) and variable assignment ("var/assignment/explicit/parallel"
).
However, you want to introduce officially the abstract data type βtupleβ only much later in your course. You therefore wish to exclude from the recommendations any program implementing a tuple, unless it is part of a parallel assignment. The following command will make the trick:
{
"operation": "exclude",
"data": [
("type/sequence/tuple", "is not", "var/assignment/explicit/parallel")
]
},
The triple describes the set of programs which feature at least one tuple, and such that there is no tuple's span coinciding with a parallel assignment's span. In other words, the set of programs which feature at least one βnon parallelβ tuple. When we exclude these programs, those which remain either feature no tuple, or only the ones involved in a parallel assignment.
The other combinations are listed and tested here.
Glossary
Inside a definition, the terms which have their own entry in the glossary are italicized.
- AST (flat β)
- The dump of an Abstract Syntax Tree as a sequence of text lines, each one consisting of the whole path from the tree root to the current node, with some tweaks.
- Command
- A pipeline operation, along with its data.
- Criterion (plur. criteria)
- Either a taxon pattern, a program pattern or a semantic triple.
- Data
- In a restricted sense, the second member of a command.
-
- If it is a string, it is interpreted as a shell command printing a list of program or taxon patterns.
-
- If it is a list, each element is interpreted as a criterion.
- Hint (manual β)
- A comment that you may add in a source, either to fix a mislabelling problem, or to specify some metadata. It starts with
"# paroxython:
and can schedule either a label or a taxon for addition (by default) or deletion. Example:print(a + b) # paroxython: -addition_operator +concatenation_operator
- Label
- Name of a feature, as defined in
spec.md
. Examples: -
- simple:
swap
,if_else_branch
,verbose_conditional_assignment
;
- simple:
-
- with one variable suffix:
loop_with_return:for
,import_module:foo
,member_call_method:append
;
- with one variable suffix:
-
- with several variable suffixes:
member_call:bar:append
,for_range:start:stop:-1
.
- with several variable suffixes:
-
Comparison with taxa: the taxa aim to represent concepts; labels, not so much. For instance, labelling
a = b
byassignment:42
captures, in addition to the concept (assignment), an arbitrary piece of information (a name) which may or may not be internally used later to label more complex features. Moreover, taxa are structured into a taxonomy; labels are free form. - Operation
- A string among
"include"
,"exclude"
,"impart"
and"hide"
, with an optional suffix"any"
(implicit) or"all"
. - Path
- A continuous sequence of distinct edges from the root to a given node of a tree. Can be used for the taxonomy or for the flat AST.
- Pattern
- A regular expression describing the name of a program, a label or a taxon.
- Pipeline
- A list of filtering commands, expressed as a Python value and evaluated by
ast.literal_eval()
. - Predicate
- Used in a restricted sense to denote the middle term of a semantic triple. It is a function taking two poor spans
x
andy
and telling whether they satisfy a certain relationship (for instance, βx
aftery
β, or βy
not insidex
β). - Source
- The text of a Python program of your collection.
- Span
- The location of a given feature, as the couple of its first and last lines. By extension, a third member, the path, identifies the beginning of the feature in the AST.
- Span (poor β)
- A span deprived from its last member.
- Tag
- Generic term, covering both labels and taxa.
- Taxon (plur. taxa)
- Name of a feature, resulting from the conversion of a given label. Examples:
-
- the label
swap
is converted to the taxonvar/assignment/explicit/parallel/swap
;
- the label
-
- the label
member_call_method:append
is converted to the taxacall/subroutine/method/sequence/list/append
andtype/sequence/list
.
- the label
-
A taxon is a path from a root to a node in the taxonomy. It represents a learning concept that should be of interest to the teacher.
- Taxonomy
- A forest structuring the learning concepts of the domain. The idea is that the concept represented by a node must be introduced after its parent concept (if any).
- Triple (semantic β)
- Consists of a taxon pattern (the subject), a predicate and an another taxon pattern (the object). Example:
("flow/conditional", "finished by", "def/subroutine/return")
.
-
Note that, in this case, it is actually possible to write a single-point-of-exit version which would be correclty tagged as tail recursive:
def gcd(a, b): return (gcd(b, a % b) if b else a)
-
There are numerous Pascal dialects, and some of them support a C-like
return
equivalent (procedureexit
) since at least 1992. This extension is not part of the Pascal standard, though. ↩ -
Roberts, Eric. (1995). Loop exits and structured programming: reopening the debate. Proceedings of the 26th SIGCSE Technical Symposium on Computer Science Education, 1995, Nashville, Tennessee, USA, March 2-4, 1995: 268-272. doi:10.1145/199688.199815. ↩↩
-
A modern language like Swift definitely clarifies this intention by making
guard
a keyword (since 2.0). ↩ -
For the similar cases where no translation is provided in the default taxonomy, you can either add it yourself, or directly hint at the desired taxa, i.e., after
# paroxython:
, replace+member_call_method:list:count
bytype/sequence/list
andcall/subroutine/method/sequence/list/count
π¦. ↩ -
This contrasts with the convention in biological classification, where the term taxon refers to a taxonomic unit (a node of the tree). For instance, African Elephant is classified under the taxon Loxodonta. However, a taxon (implicitely) βencompasses all included taxa of lower rankβ, here:
- Kingdom: Animalia
- Phylum: Chordata
- Class: Mammalia
- Order: Proboscidea
- Family: Elephantidae
- Subfamily: Elephantinae
- Genus: Loxodonta
Since our nodes are not uniquely named (e.g., there are several nodes
literal
), we make all the lower rank nodes explicit by concatenating them (i.e.,type/number/integer/literal
instead of simplyliteral
, which would be ambiguous)π. ↩ -
More precisely, it is the result of executing
SELECT taxon, count(*) FROM taxon WHERE taxon not LIKE 'meta/count/%' GROUP BY taxon
on the SQLite tag database. ↩ -
On a side note: in the AST, an expression like
a < b < c
counts for one (chained) comparison. ↩ -
The sequence control flow is an exception. As it characterizes the imperative paradigm more than this or that program, it will deliberately be excluded from the features searched by Paroxython. ↩
-
A condition is a boolean expression, not to be confused with a conditional (control structure). ↩
-
This module appears to be here to stay, since it is proeminently featured in the documentation of the more mundane standard library re. Thanks and kudos to the author, Matthew Barnett: without his awesome work, many parts of our code would have been much tougher to write, and much slower to execute. ↩
-
Allen, James F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM. 26 (11): 832β843. doi:10.1145/182.358434 ↩
-
162 = 6 \times 27, where 6 is the number of distinct permutations of the quadruple (x, x, y, y), and 27=3^3 the number of elements of the triple auto-product of 3 symbols (
=
,β€
and<
). ↩ -
Automate the conversion with the function
to_cnf()
of SymPy. ↩ -
In these examples, the braces denote the fact that the span of a certain taxon is included in that of another taxon. ↩