talon/talon/signature/__init__.py

"""The package uses machine learning to parse message signatures.

The public interface consists of a single `extract` function:

>>> (body, signature) = extract(body, sender)

where `body` is the original message body and `sender` identifies the person
who sent the message.

Classifier instances are loaded when the package is imported, so each
process holds its own classifiers in memory.

Both the package import and the call to `extract` should be wrapped in a
try/except block in case they fail.

.. warning:: When changing the features or the emails the classifier is
   trained against, don't forget to regenerate:

   * signature/data/train.data and
   * signature/data/classifier
"""
import os
from . import extraction
from .extraction import extract  # noqa
from .learning import classifier
DATA_DIR = os.path.join(os.path.dirname(__file__), 'data')
EXTRACTOR_FILENAME = os.path.join(DATA_DIR, 'classifier')
EXTRACTOR_DATA = os.path.join(DATA_DIR, 'train.data')
def initialize():
    """Load the trained classifier into `extraction.EXTRACTOR`."""
    extraction.EXTRACTOR = classifier.load(EXTRACTOR_FILENAME,
                                           EXTRACTOR_DATA)
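
The docstring's advice to guard both the import and the `extract` call could be followed along the lines of the sketch below. The helper names `load_extractor` and `safe_extract` are hypothetical, the `talon.signature` import path is inferred from this file's location, and calling `initialize()` before `extract` is an assumption based on the code above rather than documented behavior.

```python
def load_extractor():
    """Return the package's extract function, or None if it cannot be loaded."""
    try:
        # Assumed import path, inferred from talon/talon/signature/__init__.py.
        from talon import signature
        # Assumption: the classifier must be loaded before extract is usable.
        signature.initialize()
        return signature.extract
    except Exception:
        # Import or classifier loading failed; the caller falls back.
        return None


def safe_extract(body, sender, extractor):
    """Return (text, signature), or (body, None) if extraction is unavailable."""
    if extractor is None:
        return body, None
    try:
        return extractor(body, sender)
    except Exception:
        # Extraction failed; hand back the unmodified body.
        return body, None


if __name__ == "__main__":
    extractor = load_extractor()  # load once per process, then reuse
    text, sig = safe_extract("Thanks!\n\n--\nJohn Doe", "john@example.com",
                             extractor)
    print(repr(text), repr(sig))
```

Loading the extractor once and passing it in keeps the expensive classifier load out of the per-message path, and lets a caller substitute a stub when the package is not installed.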