Matt Dietz
d37c4fd551
Drops Python 2 support
...
REP-1030
In addition to some python 2 => 3 fixes, this change bumps the scikit-learn
version to the latest. The previously pinned version of scikit-learn failed
to compile all necessary C modules under python 3.7+ due to included header files
that weren't compatible with the C API implemented in python 3.7+.
Given the restrictive compatibility of the pinned scikit-learn version,
it also seemed prudent to drop python 2 support altogether. Otherwise, we'd be stuck
with python 3.4 as the newest possible version we could support.
With this change, tests are currently passing under 3.9.2.
Lastly, this change re-imports the original training data. At some point, a new
version of the training data was committed to the repo but no classifier was
trained from it. Using a classifier trained from this new data resulted
in most of the tests failing.
2021-06-10 14:03:25 -05:00
Derrick J. Wippler
1018e88ec1
Now removing namespaces from parsed HTML
2019-05-10 11:16:12 -05:00
Sergey Obukhov
8138ea9a60
fix text with Date: misclassified as quotations splitter
2019-01-18 16:49:39 +03:00
Sergey Obukhov
0e6d5f993c
fix appointments in text
2017-10-23 16:32:42 -07:00
Sergey Obukhov
df8259e3fe
bump version
2017-08-24 15:58:53 -07:00
Sergey Obukhov
678517dd89
reshape data as suggested by sklearn
2017-08-24 12:03:47 -07:00
Sergey Obukhov
d998beaff3
bump version after adding support for Vietnamese format
2017-07-10 11:42:52 -07:00
Sergey Obukhov
b38562c7cc
bump version after merging outlook 2013 support PR
2017-06-18 22:55:15 -07:00
Sergey Obukhov
743c76f159
bump version after merging python 3 support PR
2017-06-18 22:48:12 -07:00
Sergey Obukhov
ab5cbe5ec3
bumped talon version
2017-04-25 11:43:55 -07:00
Sergey Obukhov
6f159e8959
loosen the encoding requirement for detect_encoding
2017-04-25 11:19:01 -07:00
Sergey Obukhov
0f5e72623b
add android quotation pattern
2017-04-10 16:33:21 -07:00
Sergey Obukhov
49d1a5d248
bump version
2017-02-14 11:05:50 -08:00
Sergey Obukhov
5af846c13d
bump talon version
2016-11-30 12:56:06 -08:00
Sergey Obukhov
ea82a9730e
restrict html processing to a certain number of tags
2016-09-14 09:33:30 -07:00
Sergey Obukhov
e61894e425
bump version
2016-08-22 17:34:18 -07:00
Sergey Obukhov
ec8e09b34e
fix
2016-08-15 20:31:04 -07:00
Sergey Obukhov
bcf97eccfa
use html5lib to parse html
2016-08-15 19:36:21 -07:00
Sergey Obukhov
27adde7aa7
bump version
2016-08-15 13:21:10 -07:00
Sergey Obukhov
44fcef7123
bump version
2016-08-11 23:59:18 -07:00
Sergey Obukhov
a0d7236d0b
bump version and add a comment
2016-08-11 15:49:09 -07:00
Sergey Obukhov
a21ccdb21b
consider word capitalized only if it is camel case - not all upper case
2016-07-19 17:37:36 -07:00
Sergey Obukhov
01e03a47e0
version bump
2016-07-19 15:51:46 -07:00
Umair Khan
e61f0a68c4
Add six library to setup.py
2016-07-19 09:40:03 +05:00
Umair Khan
da998ddb60
Run modernizer on the code.
2016-07-12 17:25:46 +05:00
Umair Khan
07f68815df
Allow installation of ML free version.
...
Add an option to the install script, `--no-ml`, that when given will
install Talon without ML support.
Fixes #96
2016-07-12 15:08:53 +05:00
Sergey Obukhov
7c3d91301c
open-sourcing email dataset
2016-06-10 14:10:53 -07:00
Sergey Obukhov
2d6c092b65
bump version
2016-05-31 18:42:47 -07:00
Sergey Obukhov
1b18abab1d
bump
2016-05-31 16:53:41 -07:00
Sergey Obukhov
44e70939d6
fixes mailgun/talon#89
2016-05-17 15:31:01 -07:00
Sergey Obukhov
42258cdd36
bump up version
2016-04-07 17:51:48 -07:00
Sergey Obukhov
02adf53ab9
fixes mailgun/talon#12
2016-03-04 13:14:50 -08:00
Sergey Obukhov
9c17dca17c
bump version
2016-02-29 14:50:52 -08:00
Sergey Obukhov
31803d41bc
fixes mailgun/talon#18
2016-02-19 19:07:10 -08:00
Sergey Obukhov
2ecd9779fc
bump up version
2016-02-19 18:32:07 -08:00
Sergey Obukhov
f6940fe878
bump up version
2015-12-18 19:15:58 -08:00
Sergey Obukhov
3d9ae356ea
add more tests, make standard reply tests more relaxed
2015-12-18 18:56:41 -08:00
Sergey Obukhov
41457d8fbd
fixes mailgun/talon#38 mailgun/talon#20
2015-12-05 00:37:02 -08:00
Sergey Obukhov
8db05f4950
add cssselect to dependencies
2015-10-14 20:31:26 -07:00
Sergey Obukhov
d62d633215
bump up version
2015-09-21 09:55:51 -07:00
Sergey Obukhov
2cb9b5399c
bump up version
2015-09-18 05:23:29 -07:00
Sergey Obukhov
b5af9c03a5
bump up version
2015-09-11 10:42:26 -07:00
Sergey Obukhov
15976888a0
use precise encoding when converting to unicode
2015-09-11 10:38:28 -07:00
Sergey Obukhov
9bee502903
bump up version
2015-09-11 06:27:12 -07:00
Sergey Obukhov
127771dac9
bump up version
2015-09-11 04:51:39 -07:00
Sergey Obukhov
567549cba4
bump up talon version
2015-09-10 10:47:16 -07:00
Sergey Obukhov
d9d89dc250
unpin lxml version
2015-09-10 10:44:05 -07:00
Sergey Obukhov
9358db6cee
bump up talon version
2015-09-03 11:03:01 -07:00
Scott MacVicar
e3c4ff38fe
move test stuff out to its own section
2015-07-02 21:49:09 -04:00
Scott MacVicar
8b1f87b1c0
Get this building and passing tests
...
Changes:
* add .DS_Store to .gitignore
* Decode base64 encoded emails for tests
* Pick a version of scikit since the pickled classifiers are based on that
* Add missing numpy and scipy dependencies
2015-07-02 21:49:09 -04:00