diff --git a/README.rst b/README.rst
index 2b5966f..2517450 100644
--- a/README.rst
+++ b/README.rst
@@ -89,7 +89,7 @@ the power of machine learning algorithms:
     # text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
     # signature == "John Doe\nvia mobile"
 
-For machine learning talon currently uses `PyML`_ library to build SVM
+For machine learning talon currently uses the `scikit-learn`_ library to build SVM
 classifiers. The core of machine learning algorithm lays in
 ``talon.signature.learning package``. It defines a set of features to
 apply to a message (``featurespace.py``), how data sets are built
@@ -102,7 +102,21 @@ of features to the dataset we provide files ``classifier`` and
 used to load trained classifier. Those files should be regenerated every
 time the feature/data set is changed.
 
-.. _PyML: http://pyml.sourceforge.net/
+To regenerate the model files, you can run
+
+.. code:: sh
+
+    python train.py
+
+or
+
+.. code:: python
+
+    from talon.signature import EXTRACTOR_FILENAME, EXTRACTOR_DATA
+    from talon.signature.learning.classifier import train, init
+    train(init(), EXTRACTOR_DATA, EXTRACTOR_FILENAME)
+
+.. _scikit-learn: http://scikit-learn.org
 .. _ENRON: https://www.cs.cmu.edu/~enron/
 
 Research
diff --git a/train.py b/train.py
new file mode 100644
index 0000000..54d04b5
--- /dev/null
+++ b/train.py
@@ -0,0 +1,10 @@
+from talon.signature import EXTRACTOR_FILENAME, EXTRACTOR_DATA
+from talon.signature.learning.classifier import train, init
+
+
+def train_model():
+    """ retrain model and persist """
+    train(init(), EXTRACTOR_DATA, EXTRACTOR_FILENAME)
+
+if __name__ == "__main__":
+    train_model()