Line and table transformations

The content matched by the Line and Table patterns is stored in the context result. A further level of processing is provided by a list of transformations.

Line transformations

Line transformations are passed to the Line pattern as the line_args parameter: a list of functions that each take a list and return a list. The functions are called in sequence, and the result of each one is passed to the next.

The first function in the list must accept a list of Cell objects. The function get_value transforms it into a list of values.
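The chaining described above can be sketched in plain Python. This is an illustration only, not the library's code: Cell, DoesntMatch, get_value, non_empty and apply_line_args are hypothetical stand-ins defined here, and the only assumption made about a cell is that it exposes a value attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's Cell (assumed to expose `value`).
Cell = namedtuple("Cell", ["value"])


class DoesntMatch(Exception):
    """Stand-in for the library's DoesntMatchException."""


def get_value(cells):
    # First step of the chain: list of cells -> list of raw values.
    return [cell.value for cell in cells]


def non_empty(line):
    # Reject lines in which every value is None or an empty string.
    if all(value in (None, "") for value in line):
        raise DoesntMatch("empty line")
    return line


def apply_line_args(cells, line_args):
    # Call each transformation in order, feeding it the previous result.
    result = cells
    for transform in line_args:
        result = transform(result)
    return result


row = [Cell("Name"), Cell(""), Cell(42)]
values = apply_line_args(row, [get_value, non_empty])
```

Because every step takes a list and returns a list, steps can be reordered or dropped freely, as long as the step that expects cells comes first.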

These are the included line transformations:

sheetparser.results.non_empty(line)

A transformer that matches only non-empty lines. Any other line raises a DoesntMatchException.

Parameterized functions (objects with a __call__ method):

class sheetparser.results.StripLine(left=True, right=True)

class sheetparser.results.Match(regex, position=None, combine=None)

A transformer that matches lines that contain the given regex. Use combine to decide whether all items or any item must match.

Parameters:
  • regex (regex) – a regular expression
  • position (list) – a list of positions or a slice
  • combine (function) – function that decides if the whole line matches
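To make the roles of the three parameters concrete, here is a minimal sketch of a parameterized transformer written from scratch. MatchSketch is a hypothetical class, not the library's Match: the default of any for combine, the use of re.search, and the ValueError raised on failure are all assumptions made for the illustration.

```python
import re


class MatchSketch:
    """Hypothetical look-alike of a regex line matcher (not the library class)."""

    def __init__(self, regex, position=None, combine=None):
        self.regex = re.compile(regex)
        self.position = position
        # Assumption: when combine is omitted, one matching item is enough.
        self.combine = combine or any

    def _select(self, line):
        # position may be None (all items), a slice, or a list of indexes.
        if self.position is None:
            return list(line)
        if isinstance(self.position, slice):
            return list(line[self.position])
        return [line[i] for i in self.position]

    def __call__(self, line):
        hits = [bool(self.regex.search(str(item))) for item in self._select(line)]
        if not self.combine(hits):
            # Stand-in for the library's DoesntMatchException.
            raise ValueError("line does not match")
        return line


starts_with_total = MatchSketch(r"^Total", position=[0])
all_numeric = MatchSketch(r"^\d+$", position=slice(1, None), combine=all)

total_line = starts_with_total(["Total", 10, 20])
numeric_line = all_numeric(["row", "1", "2"])
```

Passing the built-in all or any as combine covers the two cases mentioned above; any other function that takes a list of booleans and returns a boolean would work in this sketch as well.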

Table transformations

Similarly, the lines matched by the Table pattern are passed through a series of processing steps. These are subclasses of TableTransform that implement wrap or process_line (or both). process_line is called each time a new line is added, and wrap is called at the end, once all lines have been added.
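The two hooks can be illustrated with a small self-contained sketch. Everything here is hypothetical: TableTransformSketch, its method signatures, the three subclasses and run_transforms are invented for the example and should not be read as the signatures of the library's TableTransform.

```python
class TableTransformSketch:
    """Hypothetical base class with the two hooks described above."""

    def process_line(self, table, line):
        # Called once per incoming line; returns the (possibly changed) line.
        return line

    def wrap(self, table):
        # Called once at the end; returns the (possibly changed) table.
        return table


class StripValues(TableTransformSketch):
    def process_line(self, table, line):
        return [str(value).strip() for value in line]


class Collect(TableTransformSketch):
    def process_line(self, table, line):
        # Store the line in the table data as it arrives.
        table.append(line)
        return line


class DropEmptyRows(TableTransformSketch):
    def wrap(self, table):
        # Needs the whole table, so it runs in wrap rather than per line.
        return [row for row in table if any(value != "" for value in row)]


def run_transforms(lines, transforms):
    table = []
    for line in lines:
        for transform in transforms:
            line = transform.process_line(table, line)
    for transform in transforms:
        table = transform.wrap(table)
    return table


raw_lines = [[" a ", "1"], ["", ""], ["b ", "2"]]
result = run_transforms(raw_lines, [StripValues(), Collect(), DropEmptyRows()])
```

Per-line work (cleaning values, storing the line) fits in process_line, while anything that needs to see the complete table is deferred to wrap.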

class sheetparser.results.GetValue

Transforms a list of cells into a list of strings. All built-in processors expect GetValue to be included as the first transformation.

class sheetparser.results.FillData

Adds the line to the table data.

class sheetparser.results.HeaderTableTransform(top_header=1, left_column=1)

Extracts the first lines and first columns as the top and left headers.

Parameters:
  • top_header (int) – number of lines, 1 by default
  • left_column (int) – number of columns, 1 by default

sheetparser.results.RepeatExisting

alias of <lambda>

class sheetparser.results.RemoveEmptyLines(line_type=u'rows')

Removes empty lines or empty columns in the table. Note: this could be greatly simplified with numpy.

class sheetparser.results.ToMap

Transforms the data from a list of lists to a map. The keys are the combinations of terms in the headers (top and left), and the values are the table data.
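As an illustration of header extraction followed by the list-of-lists-to-map step, the sketch below uses two hypothetical helpers, split_headers and to_map. They are not the library's classes, and the key layout (left header terms followed by top header terms, as one tuple) is an assumption chosen for readability.

```python
def split_headers(rows, top_header=1, left_column=1):
    # Hypothetical helper: separate the header lines and columns from the data.
    top = [row[left_column:] for row in rows[:top_header]]
    left = [row[:left_column] for row in rows[top_header:]]
    data = [row[left_column:] for row in rows[top_header:]]
    return top, left, data


def to_map(top, left, data):
    # Hypothetical helper: key each value by its left and top header terms.
    mapping = {}
    for left_terms, data_row in zip(left, data):
        for col, value in enumerate(data_row):
            top_terms = tuple(header_line[col] for header_line in top)
            mapping[tuple(left_terms) + top_terms] = value
    return mapping


rows = [
    ["", "Q1", "Q2"],
    ["north", 10, 20],
    ["south", 30, 40],
]
top, left, data = split_headers(rows)
mapping = to_map(top, left, data)
```

With one header line and one header column, each key is a (left, top) pair such as ("north", "Q1"); with more header lines or columns the tuple simply grows.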

class sheetparser.results.MergeHeader(join_top=(), join_left=(), ch=u'.')

Merges several lines in the header into one.
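One plausible reading of this merge is a column-wise join of the header lines using the separator ch. The sketch below shows that reading only; merge_header_lines is a hypothetical function, and the meaning of join_top and join_left is not covered here.

```python
def merge_header_lines(header_lines, ch="."):
    # Hypothetical helper: join the terms of each column with the separator.
    return [ch.join(str(term) for term in column) for column in zip(*header_lines)]


top = [
    ["2019", "2019", "2020"],
    ["Q1", "Q2", "Q1"],
]
merged = merge_header_lines(top)
```

Here the two header lines collapse into a single line whose entries look like "2019.Q1".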

class sheetparser.results.Transpose

Transforms lines into columns and columns into lines.
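For a table held as a list of lists, the operation can be sketched with the built-in zip. The transpose function below is a hypothetical illustration, not the library's implementation, and it assumes all rows have the same length (zip would silently truncate to the shortest row otherwise).

```python
def transpose(rows):
    # Hypothetical helper: each column of the input becomes a row of the output.
    return [list(column) for column in zip(*rows)]


table = [["a", 1], ["b", 2], ["c", 3]]
flipped = transpose(table)
```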