1. Commie Mutant Traitor
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Jun 2004
    Alpharetta, GA
    Rep Power

    Opinions wanted: is this worth making a SF project out of?

    Mods: I wasn't certain if it should be in the language-specific forum or one of the discussion forums. If this would be better off in another message board, feel free to move it.

    Earlier this year, I took a course an undergraduate course in compiler design, and opted to use Python as my implementation language - the class was to focus on the design aspects, not performance, so I concluded that I would rather use a language which wouldn't fight back so much as Java or C++ would when it came to string handling. While this may have been a mistake in the end - the time I saved not writing a lot of string handling code was lost again by some rather overly ambitious design approaches - in the process of it, I implemented a more or less general string-driven finite automata library.

    I am now wondering if this would be something which would be of use to others, so I am considering creating a project on SourceForge for it. I would like to know if others think this would be worth the trouble to share or not, and if so, what changes or refinements should be made to make it a viable open-source library. I am concerned that it is too small and basic to be worth making a library out of. Conversely, it may be too heavyweight to be practical, since FSMs are usually implemented in much simpler manners than the one I used. Also, the design is not especially Pythonic, being rather clumsy and Java-esque in it's initialization specifically. Finally, it may not be suited to a lot of things people might want an FSM for - while the design is almost general, it is still designed primarily for tokenizing a stream of characters, which may limit its applicability.

    The library consists of two classes, a State class and a StateMachine class. A State object consists of a key, a table of input/state transition pairs, a flag whether the state can be a final state, a type for the type of token it returns (if any), and a pushback number (most likely, I will replace these with some more general sort of completion value, though just how I haven't decided). The class has four class constants: three representing the common transition states (accept, error, and other), one serving a common start state key (used by the StateMachine class), and one used to indicate a default transition state for input not otherwise specified (if none is given, it assumes an error state). In addition to a constructor and a string-representation method, t has one instance method, next(), which takes an input string and returns a tuple with state to which it would transition, the final-state flag, a token, (if it is an accept state), and a pushback number (if it is at a final state that ends one or more characters back).

    The StateMachine object consists of a table of state keys (right now, it only uses numbers, but I think that allowing named states may prove valuable later) and states, State object representing the current state of the FSM, and a flag indicating if the table is 'locked' or not (that is, the table has been completely populated). The StateMachine is constructed with a required start state, and new states are added to the table using the add() method, and new states can be added until lock() is called. There is a locked() method to test whtehr it is locked or not. Once the table is locked, the transition() method can be used to step through the input stream, taking one character at a time as input. It returns the accept flag, the token, and the pushback number returned by the call to current_state.next(); on a error state, it returns State.ERROR for the token.

    Here is the code as it currently exists:
    Python Code:
    from typecheck import *
    __title__ = 'AutonomousCollective'
    __author__ = 'Joseph Osako'
    __version__ = '0.1'
    __date__ = '2008:08:21'
    __credits__ += """Uses Dimitri Dvoinikov's typecheck decorator library"""
    __credits__ += """<http://www.targeted.org/python/recipes/typecheck.py>"""
    __credits__ += """for method typechecking.\n"""
    class State (object):
        """ Class representing the states of a Finite State Automata.
        Instance Variables:
            transitions: dictionary (string:integer)
            final: boolean
            toktype: Object
            pushback: integer
        State object consist of four properties: an a-list of accepted input
        strings and numeric identifiers for the succeeding states, a flag
        indicating if the state can lead to a final (accept or reject) state,
        the condition returned when accepted, and a number of characters which
        the recognizer should push back by. A State can take an input character
        and return an identifying number for the corresponding successor State.
        The State class has five class constants. Four of these represent common
        states for all state machines (START, ACCEPT, ERROR, and DEFAULT). The
        last represents a default input.
        START = 0
        ACCEPT = -1
        ERROR = -2
        DEFAULT = -3
        OTHER = "__OTHER__"
        def __init__(self, transitions, final = False, \
                    toktype = ERROR, pushback = 0):
            """ Constructor - create a State object.
            transitions - lookup table of inputs and states
            final - final/non-final state flag (default false)
            toktype - type of token recognized on accept (default None)
            pushback - number of chars to push back after accept (default 0)
            Initializes the instance variables for the new object.
            self.__final, self.__toktype = final, toktype
            self.__transitions = transitions
            self.__pushback = pushback
        @takes("State", str)
        def next(self, input):
            """ Return the successive state based on the input character.
             The next() method is the primary behavior of the State object.
            It returns a tuple containing the successor state, whether it
            is a final state or not, the recognized result (if any) and the
            amount of pushback (if any).
            next() takes a single character input: it will reject any string not
            of size one, returning the number of extraneous characters in
            'pushback'. The method first tries to match the character to a
            state through a simple lookup, on the assumption that most transitions
            will be based on a single possible input. If this does not succeed,
            it will try to match the characters against keys with multiple
            characters; this allows a given state transition to give a set of
            possible matches (e.g., all alphanumeric characters) rather than
            requiring a separate key for each character. If this does not succeed,
            it then checkes to see if there is a default state set. If none of these
            find a match, the method returns an error state.
            If a successor is found, the method then checks to see if the successor
            is either ACCEPT or ERROR. If it is ACCEPT, it returns the ACCEPT
            constant, a true final value, the type of the recongized token, and
            a pushback amount. If it is ERROR, it returns the ERROR constant, a
            true final value, ERROR as the token type, and the pushback amount.
            Otherwise, it returns the successor and false/None/0 for the remainder.
            if len(input) != 1:         # check validity of the input
                return self.ERROR, False, self.ERROR, len(input) - 1
            # try to match the input to a transition
            successor = None
            if input in self.__transitions:     # check single-character key match
                successor = self.__transitions.get(input)
                for s in self.__transitions:    # check multi-char keys
                    if input in s and s != self.OTHER:
                        successor = self.__transitions.get(s)
                if successor is None:           # use default transition, if any
                    successor = self.__transitions.get(self.OTHER)
                    if successor is None:
                        return self.ERROR, False, self.ERROR, 0
            # successor found, test if it is either an ACCEPT or ERROR state.
            if successor == self.ACCEPT:    # ACCEPT
                return self.ACCEPT, True, self.__toktype, self.__pushback
            elif successor == self.ERROR:   # ERROR
                return self.ERROR, True, self.ERROR, self.__pushback
                # not a final state, return the successor state
                return successor, False, None, 0
            def __str__(self):
                """ Basic string representation of a State."""
                return str(self.__transitions)
    class StateMachine(object):
        """ Generalized Finite State Automata class.
        Instance Variables:
            states: a lookup table of state numbers and states
            current_state: the current state of the FSM
            locked: flag whether all of the states are set.
        A FSM consists of a set of States, the current State, and a flag
        indicating if all States have been added to the set of states. Until
        the FSM is locked, new states may be added. The State machine begins
        in the Start state, and accepts input until it either accepts or rejects
        the input string. When it reaches one of these final states, it returns
        the type of the recongized string and resets itself to the Start state.
        def __init__(self, start_state):
            """ Constructor - create a FSM object.
            Adds the initial (Start) state, sets the current state to Start,
            and clears the locked flag.
            self.__states = {State.START: start_state}
            self.__locked = False
            self.__current_state = self.__states[State.START]
        @takes("StateMachine", int, State)
        def add(self, key, state):
            """ Add a new state to the state table.
            The add() method takes a state identifier key and a new State
            object. If the table is not locked, and there is no State
            by that number already, it adds state to the state table
            and returns true. Otherwise, it returns false.
            if self.__locked:
                return False   # do not allow any states to be added
            elif key in self.__states:  # state with that ID already in table
                return False
                self.__states[key] = state
                return True
        def lock(self):
            """ Locks state table so no new states can be added."""
            lock = True
        def transition(self, input):
            """ Perform a state transition, and if it is a final
            state, return success or failure and the recognized token type.
            # get the successor state of the current state, if any
            successor, final, token, pb = self.__current_state.next(input)
            # see if the state has finalized, reset the current state and
            # determine if it accepted or rejected the input string
            if final:
                self.__current_state = self.__states[State.START]
                if successor == State.ACCEPT:   # string accepted
                    return True, token, pb
                else:                           # string rejected
                    return True, State.ERROR, pb
            # if the state has not finalized, see if it is a 'non-final' error
            # state and return an error value if it is. This will only occur if
            # the input is not valid.
            elif successor == State.ERROR:
                self.__current_state = self.__states[State.START]
                return True, State.ERROR, pb
            # if the state is not final and not an error, advance to the next state
                self.__current_state = self.__states[successor]
                return False, 0, 0
        def __str__(self):
            """Return a simple string representation of the FSM for debugging. """
            return str(self.__states)
    if __name__ == "__main__":

    I have attached a copy of the actual project it had been part of for anyone who wants a reference point in how I intended it to be used.

    Any and all (constructive) comments and critiques would be helpful. At the moment, I have tentatively given the projec the name 'AutonomousCollective', but any other suggestions would be appreciated (though I'd prefer something with either a MP or H2G2 theme - the latter to match th name of the original project). TIA.
    Attached Files
    Last edited by Schol-R-LEA; August 23rd, 2008 at 04:13 PM.
    Rev First Speaker Schol-R-LEA;2 JAM LCF ELF KoR KCO BiWM TGIF
    #define KINSEY (rand() % 7) λ Scheme is the Red Pill
    Scheme in Short Understanding the C/C++ Preprocessor
    Taming Python A Highly Opinionated Review of Programming Languages for the Novice, v1.1

    FOR SALE: One ShapeSystem 2300 CMD, extensively modified for human use. Includes s/w for anthro, transgender, sex-appeal enhance, & Gillian Anderson and Jason D. Poit clone forms. Some wear. $4500 obo. tverres@et.ins.gov
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2005
    Bay Area, California
    Rep Power
    I think this implementation is pretty good for what its trying to do, but to be a full-fledged library you need to generalize for 3+ different use-cases. I think you'll find faults and either fix them or decide to limit the scope of the library to target a narrow nitch. Both are excellent choices, but force drastically different design choices.

    As per your private message regarding my assertion that states should not perform work or provide the transition logic, let me explain why. A very common use-case is for workfow engines: travel booking, ticket tracking (support, bugs, etc), data processing, and so on. If your task knows where to go next, then this would leak across different workflows. If you specify it on a per flow basis, you're still making the execution brittle. You'll find developers making implicit data dependancies across tasks or creating bizaar flows that loop back through states to arrive at the desired final one. You'll even find workflows being used where they shouldn't be (e.g. routing of an ESB). When you make states manage the tranitioning, you put them in control of the framework. In effect, you are inverting the framework's implicit inversion of control.

    I think your project works elegantly for a small, discrete set of states. But what if you have dozens? Will there be duplicate state logic but with different transitions, or the same state with many transitions (most of which are not acceptable given the flow)? Overall, it gets messier and messier the bigger the state machine grows.

    Your usage makes me think of Turing machines and push-down automata. Clients of those tend not to have too many states either, and it would fit well with your usage as well. So it might be useful as a pre-rolled version to allow others to plug into. Or at least be a fun excersize to simulate different types of FSA designs. Even a package of those could be useful, given the very special-purpose nature of them.

    I think the first thing to do is determine what class of problems you want to solve and then rework the design as needed.

IMN logo majestic logo threadwatch logo seochat tools logo