add script to regenerate ml model

Alex Riina
2015-07-02 21:46:27 -04:00
parent 7ea773e6a9
commit 85c7ee980c
2 changed files with 26 additions and 2 deletions


@@ -89,7 +89,7 @@ the power of machine learning algorithms:
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# signature == "John Doe\nvia mobile"
For machine learning talon currently uses `PyML`_ library to build SVM
For machine learning talon currently uses the `scikit-learn`_ library to build SVM
classifiers. The core of the machine learning algorithm lies in the
``talon.signature.learning`` package. It defines a set of features to
apply to a message (``featurespace.py``), how data sets are built
@@ -102,7 +102,21 @@ of features to the dataset we provide files ``classifier`` and
used to load trained classifier. Those files should be regenerated every
time the feature/data set is changed.
.. _PyML: http://pyml.sourceforge.net/
To regenerate the model files, you can run

.. code:: sh

    python train.py

or

.. code:: python

    from talon.signature import EXTRACTOR_FILENAME, EXTRACTOR_DATA
    from talon.signature.learning.classifier import train, init
    train(init(), EXTRACTOR_DATA, EXTRACTOR_FILENAME)

.. _scikit-learn: http://scikit-learn.org
.. _ENRON: https://www.cs.cmu.edu/~enron/
Research

train.py (new file)

@@ -0,0 +1,10 @@
from talon.signature import EXTRACTOR_FILENAME, EXTRACTOR_DATA
from talon.signature.learning.classifier import train, init


def train_model():
    """ retrain model and persist """
    train(init(), EXTRACTOR_DATA, EXTRACTOR_FILENAME)

if __name__ == "__main__":
    train_model()
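
The README text in this commit says the model files should be regenerated every time the feature or data set changes. As a rough sketch of how that condition could be detected before running `train.py`, the helper below compares file modification times. The name `model_is_stale` and the idea of passing a list of source paths are assumptions for illustration only; nothing like it exists in talon.

```python
import os


def model_is_stale(model_path, source_paths):
    """Return True if the model file is missing or older than any source file.

    Hypothetical helper: `model_path` would be the persisted classifier file and
    `source_paths` the feature definitions and training data it was built from.
    """
    if not os.path.exists(model_path):
        return True
    model_mtime = os.path.getmtime(model_path)
    # The model is stale if any input was modified after it was written.
    return any(os.path.getmtime(path) > model_mtime for path in source_paths)
```

A caller could run `train_model()` only when this check returns True. Modification times are a coarse signal (a fresh checkout resets them), so this is a convenience check rather than a guarantee.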