Patterns

Some definitions

range
is an Excel range, delimited by its top, left, right and bottom edges. A sheet is an example of a range.
line
is a row or a column. This is decided by the chosen layout: horizontal layouts will yield rows, vertical will yield columns.
pattern
is an object that matches the given range or line(s). If the match fails, the pattern raises a DoesntMatchException. If it succeeds, it fills in the context passed as a parameter.

Note that patterns can be passed as arguments to the upper-level pattern either as objects or as classes. Classes will be instantiated.
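A minimal sketch of how accepting either an instance or a class might work. The helper as_pattern and the toy Pattern and Empty classes below are hypothetical, written only for this illustration; they are not sheetparser's own code, and the default name is an assumption.

```python
import inspect


class Pattern:
    """Toy stand-in for a pattern base class (hypothetical)."""

    def __init__(self, name=None):
        # Assumption: fall back to the lowercased class name.
        self.name = name or type(self).__name__.lower()


class Empty(Pattern):
    """Toy stand-in; only the class-versus-instance handling matters here."""


def as_pattern(obj):
    """Return a pattern instance, instantiating obj if it is a class."""
    if inspect.isclass(obj):
        return obj()  # classes are instantiated with default arguments
    return obj


from_class = as_pattern(Empty)              # a class: instantiated for us
from_instance = as_pattern(Empty("blank"))  # an instance: returned unchanged
print(from_class.name, from_instance.name)  # empty blank
```

Passing a class is convenient when the defaults are enough; passing an instance is needed as soon as the pattern takes arguments.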

There are 3 types of patterns:

Workbook

This pattern will be called to match a workbook:

class sheetparser.patterns.Workbook(names_dct=None, re_dct=None, *args, **options)

A top-level pattern to match a workbook. Call match_workbook on an opened workbook document (as provided by a backend).

Parameters:
  • names_dct (map) – a dictionary that associates a sheet name to the sheet pattern
  • re_dct (map) – a dictionary or a tuple of pairs that associate a regular expression to the sheet pattern
match_workbook(workbook, context)

Iterates through the sheets in the workbook. If names_dct contains the sheet name, the method tries to match the associated pattern. If not, it checks whether any of the regular expressions in re_dct matches the name. Finally, if any other patterns are provided, they are tried in sequence.

The context will contain the matching sheets in the same order as in the workbook.
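The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: candidate_patterns is a hypothetical helper, plain strings stand in for sheet patterns, and the use of re.match (rather than re.search) is a guess.

```python
import re


def candidate_patterns(sheet_name, names_dct=None, re_dct=None, others=()):
    """Return the patterns to try for one sheet, in lookup order."""
    names_dct = names_dct or {}
    # re_dct may be a dictionary or a tuple of (regex, pattern) pairs.
    pairs = re_dct.items() if isinstance(re_dct, dict) else (re_dct or ())

    # 1. Exact sheet name.
    if sheet_name in names_dct:
        return [names_dct[sheet_name]]
    # 2. First regular expression that matches the name.
    for regex, pattern in pairs:
        if re.match(regex, sheet_name):
            return [pattern]
    # 3. Remaining patterns, to be tried in sequence.
    return list(others)


names = {"Summary": "summary_pattern"}
regexes = ((r"^\d{4}$", "year_pattern"),)
for sheet in ("Summary", "2021", "Notes"):
    print(sheet, candidate_patterns(sheet, names, regexes, ["fallback"]))
```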

Ranges

The following patterns match either the whole sheet or a range:

class sheetparser.patterns.Sheet(name, layout, *patterns)
class sheetparser.patterns.Range(name, layout, *patterns, top=None, left=None, bottom=None, right=None)

A range of cells delimited by top, left, bottom, right. RangePatterns are to be used directly under Workbook.

Layout is Rows or Columns, and will be used to know if the range should be read horizontally or vertically.

Iterators of lines

These patterns are called on an iterator of lines, and will be passed as parameters to Range patterns or other patterns matching iterators of lines.

These patterns can be combined with the operator +, which returns a Sequence. a+b is equivalent to Sequence(a,b). Similarly, a|b is equivalent to OrPattern(a,b).
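To illustrate how + and | can build composite patterns, here is a self-contained toy model using Python operator overloading. Every class in it is hypothetical: Literal does not exist in sheetparser, DoesntMatch stands in for DoesntMatchException, and the toy Sequence and OrPattern omit the name argument and the context that the real classes use.

```python
class DoesntMatch(Exception):
    """Toy stand-in for the library's DoesntMatchException."""


class Pattern:
    def __add__(self, other):
        return Sequence(self, other)   # a + b

    def __or__(self, other):
        return OrPattern(self, other)  # a | b


class Literal(Pattern):
    """Hypothetical leaf pattern: matches one line equal to `text`."""

    def __init__(self, text):
        self.text = text

    def match(self, lines, pos):
        if pos < len(lines) and lines[pos] == self.text:
            return pos + 1  # position after the consumed line
        raise DoesntMatch(self.text)


class Sequence(Pattern):
    def __init__(self, *patterns):
        self.patterns = patterns

    def match(self, lines, pos):
        # All or nothing: a failure propagates and the caller's
        # position is left unchanged.
        for pattern in self.patterns:
            pos = pattern.match(lines, pos)
        return pos


class OrPattern(Pattern):
    def __init__(self, pattern1, pattern2):
        self.pattern1, self.pattern2 = pattern1, pattern2

    def match(self, lines, pos):
        try:
            return self.pattern1.match(lines, pos)
        except DoesntMatch:
            return self.pattern2.match(lines, pos)  # fallback


combined = Literal("title") + (Literal("header") | Literal("heading"))
print(type(combined).__name__)                  # Sequence
print(combined.match(["title", "heading"], 0))  # 2
```

The parentheses matter: in Python, + binds more tightly than |, so a + b | c would be read as (a + b) | c.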

The name of the pattern is used by the ResultContext to store the matched element. The existing patterns that operate on a line iterator are:

class sheetparser.patterns.Empty(name)

Matches an empty line. Doesn’t match if there are no more lines in the line_iterator.

class sheetparser.patterns.Sequence(name='sequence', *patterns)

Matches the subpatterns in sequence: either all of them match or none does. Name is an optional parameter. If omitted, the name will be ‘sequence’.

class sheetparser.patterns.Many(name='many', pattern)

Matches the subpattern several times. The number of repetitions is limited by the parameters min and max. Name defaults to ‘many’.

class sheetparser.patterns.Maybe(name=None, pattern)

Matches the subpattern or nothing. Equivalent to ? in regexes.
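The repetition rules of Many and Maybe can be pictured with the toy model below. It is a sketch under assumptions, not the library's code: the functions match_line, many and maybe are hypothetical, lines are plain strings, and the defaults (no minimum, no maximum) are guesses.

```python
class DoesntMatch(Exception):
    """Toy stand-in for the library's DoesntMatchException."""


def match_line(expected):
    """Build a toy matcher that consumes one line equal to `expected`."""
    def matcher(lines, pos):
        if pos < len(lines) and lines[pos] == expected:
            return pos + 1
        raise DoesntMatch(expected)
    return matcher


def many(matcher, min_count=0, max_count=None):
    """Repeat `matcher` between min_count and max_count times."""
    def repeated(lines, pos):
        count = 0
        while max_count is None or count < max_count:
            try:
                pos = matcher(lines, pos)
            except DoesntMatch:
                break  # stop at the first line that does not match
            count += 1
        if count < min_count:
            raise DoesntMatch("only %d repetitions" % count)
        return pos
    return repeated


def maybe(matcher):
    """Zero or one occurrence, like ? in a regular expression."""
    return many(matcher, min_count=0, max_count=1)


data = match_line("data")
print(many(data)(["data", "data", "end"], 0))   # 2
print(maybe(data)(["data", "data", "end"], 0))  # 1
print(maybe(data)(["end"], 0))                  # 0
```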

class sheetparser.patterns.OrPattern(pattern1, pattern2)

Matches the first pattern and, if it fails, tries the second.

Parameters:
  • pattern1 (Pattern) – first pattern to try
  • pattern2 (Pattern) – fallback pattern
class sheetparser.patterns.FlexibleRange(name='flexible', layout, *patterns, stop=None, min=None, max=None)

Finds a range by iterating through the lines until the stop test returns true. That range is then used as a new range with the given layout and patterns.

Parameters:
  • name (str) – pattern name
  • layout (Layout) – layout used to iterate over the resulting range
  • patterns (Pattern) – patterns to be used with the new layout
  • stop (function(line_count,line)) – stop test, by default empty line
  • min (int) – minimum length of the range
  • max (int) – maximum length of the range (None for unbounded)
class sheetparser.patterns.Table(name='table', table_args=DEFAULT_TRANSFORMS, stop=None)

A range of cells read from a line iterator. The table transforms are applied in sequence at two points: when new lines are appended and when the table is complete.

Parameters:
  • name (str) – optional name of the table, “table” by default.
  • table_args (list) – the arguments that are sent to the ResultContext that will store the result. For ResultTable, the default, that will be the list of transforms.
  • stop (function) – function called on the next line. The end of the table is reached when it returns True. It takes 2 parameters: the number of lines read so far and the line itself. By default, the table stops on empty lines.
class sheetparser.patterns.Line(name='line', line_args=None)

Matches a line: there must be one more row/column in the line_iterator and it must be non empty.

Parameters:
  • line_args (list) – list of transforms to apply to the result (strip, raise if empty...)

Stop tests

Stop tests are functions that are passed to FlexibleRange and Table to detect the end of a block. You can write your own and pass them as the stop parameter of the pattern.

sheetparser.patterns.empty_line(cells, line_count)

Returns True if all cells are empty.

sheetparser.patterns.no_horizontal(cells, line_count)

Returns True if no cell has a horizontal border.

sheetparser.patterns.no_vertical(cells, line_count)

Checks that no cell has a vertical border.
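As an illustration of a user-defined stop test, the sketch below ends a block at a "Total" line. The function stop_on_total is hypothetical. Its argument order (cells, line_count) follows the signatures listed in this section, whereas the parameter descriptions of FlexibleRange and Table give the opposite order, so check which one your installed version uses. How a cell exposes its content is also an assumption: the code accepts either plain values or objects with a value attribute.

```python
def stop_on_total(cells, line_count):
    """Stop at a line whose first non-empty cell reads 'Total'."""
    for cell in cells:
        # Assumption: a cell is a plain value or has a .value attribute.
        value = getattr(cell, "value", cell)
        if value not in (None, ""):
            return str(value).strip().lower() == "total"
    return True  # an entirely empty line also ends the block


print(stop_on_total(["", "Total", 42], 3))  # True
print(stop_on_total(["Widget", 7], 1))      # False
print(stop_on_total([None, ""], 2))         # True
```

Going by the signatures documented above, such a function would be supplied through the stop keyword, for instance Table('data', stop=stop_on_total).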