Maxim Vladimirskiy
b30c375c5b
Expose extract_from_html_tree
2022-01-06 15:16:43 +03:00
Maxim Vladimirskiy
cec5acf58f
Remove max tags limit
2022-01-06 14:18:11 +03:00
Maxim Vladimirskiy
24d0f2d00a
Merge pull request #223 from mailgun/maxim/develop
...
PIP-1509: Optimise sender name check [python3]
v1.4.10
2021-11-19 13:11:29 +03:00
Maxim Vladimirskiy
94007b0b92
Optimise sender name check
2021-11-19 11:12:26 +03:00
Maxim Vladimirskiy
1a5548f171
Merge pull request #222 from mailgun/maxim/develop
...
PIP-1409: Remove version pins from setup.py [python3]
v1.4.9
2021-11-11 16:29:30 +03:00
Maxim Vladimirskiy
53c49b9121
Remove version pins from setup.py
2021-11-11 15:36:50 +03:00
Matt Dietz
bd50872043
Merge pull request #217 from mailgun/dietz/REP-1030
...
Drops Python 2 support [python3]
2021-06-15 09:46:29 -05:00
Matt Dietz
d37c4fd551
Drops Python 2 support
...
REP-1030
In addition to some python 2 => 3 fixes, this change bumps the scikit-learn
version to latest. The previously pinned version of scikit-learn failed trying
to compile all necessary C modules under python 3.7+ due to included header files
that weren't compatible with the C API implemented in python 3.7+.
Simultaneously, with the restrictive compatibility supported by scikit-learn,
it seemed prudent to drop python 2 support altogether. Otherwise, we'd be stuck
with python 3.4 as the newest possible version we could support.
With this change, tests are currently passing under 3.9.2.
Lastly, imports the original training data. At some point, a new version
of the training data was committed to the repo but no classifier was
trained from it. Using a classifier trained from this new data resulted
in most of the tests failing.
2021-06-10 14:03:25 -05:00
Sergey Obukhov
d9ed7cc6d1
Merge pull request #190 from yoks/master
...
Add __init__.py into data folder, add data files into MANIFEST.in
2019-07-02 18:56:47 +03:00
Sergey Obukhov
0a0808c0a8
Merge branch 'master' into master
2019-07-01 20:48:46 +03:00
Sergey Obukhov
16354e3528
Merge pull request #191 from mailgun/thrawn/develop
...
PIP-423: Now removing namespaces from parsed HTML
v1.4.8
2019-05-12 11:54:17 +03:00
Derrick J. Wippler
1018e88ec1
Now removing namespaces from parsed HTML
2019-05-10 11:16:12 -05:00
Ivan Anisimov
2916351517
Update setup.py
2019-03-16 22:17:26 +03:00
Ivan Anisimov
46d4b02c81
Update setup.py
2019-03-16 22:15:43 +03:00
Ivan Anisimov
58eac88a10
Update MANIFEST.in
2019-03-16 22:03:40 +03:00
Ivan Anisimov
2ef3d8dfbe
Update MANIFEST.in
2019-03-16 22:01:00 +03:00
Ivan Anisimov
7cf4c29340
Create __init__.py
2019-03-16 21:54:09 +03:00
Sergey Obukhov
cdd84563dd
Merge pull request #183 from mailgun/sergey/date
...
fix text with Date: misclassified as quotations splitter
v1.4.7
2019-01-18 17:32:10 +03:00
Sergey Obukhov
8138ea9a60
fix text with Date: misclassified as quotations splitter
2019-01-18 16:49:39 +03:00
Sergey Obukhov
c171f9a875
Merge pull request #169 from Savageman/patch-2
...
Use regex match to detect outlook 2007, 2010, 2013
2018-11-05 10:43:20 +03:00
Sergey Obukhov
3f97a8b8ff
Merge branch 'master' into patch-2
2018-11-05 10:42:00 +03:00
Esperat Julian
1147767ff3
Fix regression: windows mail format had been left out
...
Missing a | at the end of the regex, so next lines are part of the global search.
2018-11-04 19:42:12 +01:00
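The kind of regression described in the commit above can be reproduced in a minimal sketch. It assumes the regex is assembled from adjacent string literals, one alternative per line; the alternatives `From:`, `Sent:` and `Subject:` are placeholders chosen for illustration, not the patterns actually used in the repository.

```python
import re

# Adjacent string literals are concatenated, so a missing "|" silently
# fuses two alternatives into one.
broken = re.compile(
    r"^(?:"
    r"From:|"
    r"Sent:"        # trailing "|" is missing here
    r"Subject:"
    r")"
)

# With the separator restored, every line is its own alternative again.
fixed = re.compile(
    r"^(?:"
    r"From:|"
    r"Sent:|"
    r"Subject:"
    r")"
)

# The broken pattern is really ^(?:From:|Sent:Subject:)
print(broken.pattern)
print(bool(broken.match("Sent: Monday")))   # fused alternative does not match
print(bool(fixed.match("Sent: Monday")))    # restored alternative matches
```

Because the concatenated pattern is still a valid regex, nothing fails at compile time; only a test that exercises the affected alternative reveals the problem.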
Sergey Obukhov
6a304215c3
Merge pull request #177 from mailgun/obukhov-sergey-patch-1
...
Update Readme with how to retrain on your own data
v1.4.6
2018-11-02 15:22:18 +03:00
Sergey Obukhov
31714506bd
Update Readme with how to retrain on your own data
2018-11-02 15:21:36 +03:00
Sergey Obukhov
403d80cf3b
Merge pull request #161 from glaand/master
...
Fix: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
2018-11-02 15:03:02 +03:00
Sergey Obukhov
7cf20f2877
Merge branch 'master' into master
2018-11-02 14:52:38 +03:00
Sergey Obukhov
afff08b017
Merge branch 'master' into patch-2
2018-11-02 09:13:42 +03:00
Sergey Obukhov
685abb1905
Merge pull request #171 from gabriellima95/Add-Portuguese-Language
...
Add Portuguese language to quotations
2018-11-02 09:12:43 +03:00
Sergey Obukhov
41990727a3
Merge branch 'master' into Add-Portuguese-Language
2018-11-02 09:11:07 +03:00
Sergey Obukhov
b113d8ab33
Merge pull request #172 from ad-m/patch-1
...
Fix catastrophic backtracking in regexp
2018-11-02 09:09:49 +03:00
Adam Dobrawy
7bd0e9cc2f
Fix catastrophic backtracking in regexp
...
Co-Author: @Nipsuli
2018-09-21 22:00:10 +02:00
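The commit above does not say which expression was affected, so the following is only a generic sketch of what catastrophic backtracking looks like and how it is usually removed. Both patterns are invented for illustration: a nested quantifier lets the engine split the same run of characters in exponentially many ways, while the rewritten form accepts the same strings with only one way to match each of them.

```python
import re

# Hypothetical risky pattern: "\w+" inside a repeated group means a run of
# word characters can be divided between iterations in many ways. On a long
# input that ultimately fails to match, the engine tries them all.
risky = re.compile(r"^(?:\w+\s?)+$")

# Equivalent pattern without the ambiguity: each word is consumed once,
# and a single separator is required between words.
safe = re.compile(r"^\w+(?:\s\w+)*\s?$")

# Short inputs only: both patterns agree on what matches.
for text in ("hello world", "hello world ", "hello  world", "hello world!"):
    print(repr(text), bool(risky.match(text)), bool(safe.match(text)))
```

The inputs here are deliberately short. Running `risky` against a long string of word characters followed by a non-matching character is what triggers the slowdown.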
gabriellima95
1e030a51d4
Add Portuguese language to quotations
2018-09-11 15:27:39 -03:00
Esperat Julian
238a5de5cc
Use regex match to detect outlook 2007, 2010, 2013
...
I encountered a variant of the outlook quotations with a space after the semicolon.
To prevent multiplying the number of rules, I implemented a regex match instead (I found how to here: https://stackoverflow.com/a/34093801/211204 ).
I documented all the different variants as cleanly as I could.
2018-08-31 12:39:52 +02:00
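The idea in the commit above, one regex instead of one rule per variant, can be sketched with the standard `re` module. The style strings below are assumptions modelled loosely on Outlook reply separators and are not copied from the repository, which per the commit message applies the match through a different mechanism (the linked Stack Overflow answer).

```python
import re

# Hypothetical pattern: optional whitespace after each semicolon, any
# six-digit hex colour, and either "cm" or "in" as the padding unit.
OUTLOOK_STYLE = re.compile(
    r"border:none;\s*"
    r"border-top:solid #[0-9A-Fa-f]{6} 1\.0pt;\s*"
    r"padding:3\.0pt 0(?:cm|in) 0(?:cm|in) 0(?:cm|in)"
)

def looks_like_outlook_separator(style):
    """Return True if the style attribute matches one of the variants."""
    return OUTLOOK_STYLE.search(style) is not None

variants = [
    "border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm",
    "border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in",
]
for style in variants:
    print(looks_like_outlook_separator(style))
```

A single pattern like this covers the variant with a space after the semicolon without adding another exact-match rule.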
André Glatzl
53b24ffb3d
Strip encoding-related html tags such as xml and doctype first, to avoid conflicts with unicode decoding
2017-12-19 15:15:10 +01:00
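A minimal sketch of the approach in the commit above, assuming the goal is to drop a leading XML declaration and doctype from an already-decoded string before it reaches the HTML parser. The function name `strip_declarations` and both patterns are hypothetical; the repository's actual implementation may differ.

```python
import re

# Hypothetical patterns: a leading <?xml ... ?> declaration and a simple
# leading <!DOCTYPE ...> (without an internal subset).
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>", re.IGNORECASE)
_DOCTYPE = re.compile(r"^\s*<!DOCTYPE[^>]*>", re.IGNORECASE)

def strip_declarations(html):
    """Remove a leading XML declaration and doctype from a str."""
    html = _XML_DECL.sub("", html, count=1)
    html = _DOCTYPE.sub("", html, count=1)
    return html.lstrip()

doc = '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html>\n<html><body>Hi</body></html>'
print(strip_declarations(doc))
```

With the declaration gone, the parser no longer sees an encoding claim that could conflict with text that has already been decoded.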
Sergey Obukhov
a7404afbcb
Merge pull request #155 from mailgun/sergey/appointment
...
fix appointments in text
v1.4.5
2017-10-23 16:34:08 -07:00
Sergey Obukhov
0e6d5f993c
fix appointments in text
2017-10-23 16:32:42 -07:00
Sergey Obukhov
60637ff13a
Merge pull request #152 from mailgun/sergey/v1.4.4
...
bump version
v1.4.4
2017-08-24 16:00:05 -07:00
Sergey Obukhov
df8259e3fe
bump version
2017-08-24 15:58:53 -07:00
Sergey Obukhov
aab3b1cc75
Merge pull request #150 from ezrapagel/fix_greedy_dash_regex
...
android_wrote regex incorrectly matching
2017-08-24 15:52:29 -07:00
Sergey Obukhov
9492b39f2d
Merge branch 'master' into fix_greedy_dash_regex
2017-08-24 15:39:28 -07:00
Sergey Obukhov
b9ac866ea7
Merge pull request #151 from mailgun/sergey/reshape
...
reshape data as suggested by sklearn
v1.4.3
2017-08-24 12:04:58 -07:00
Sergey Obukhov
678517dd89
reshape data as suggested by sklearn
2017-08-24 12:03:47 -07:00
Ezra Pagel
221774c6f8
android_wrote regex was incorrectly iterating characters in 'wrote', resulting in greedy regex that
...
matched many strings with dashes
2017-08-21 12:47:06 -05:00
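The bug described in the commit above is a common Python pitfall and can be sketched as follows. The helper `build_alternation` and the surrounding dash pattern are assumptions made for illustration, not the repository's code: parentheses around a single string do not make a tuple, so iterating over `("wrote")` yields individual characters.

```python
import re

def build_alternation(words):
    """Join the given words into a regex alternation."""
    return "|".join(re.escape(w) for w in words)

# ("wrote") is just the string "wrote"; iterating it yields characters.
buggy_alt = build_alternation(("wrote"))
# ("wrote",) is a one-element tuple; iterating it yields the whole word.
fixed_alt = build_alternation(("wrote",))

print(buggy_alt)
print(fixed_alt)

# Hypothetical separator pattern: dashes followed by one of the words.
buggy = re.compile(r"-{3,}.*(?:%s)" % buggy_alt)
fixed = re.compile(r"-{3,}.*(?:%s)" % fixed_alt)

line = "--- see attached notes ---"
print(bool(buggy.search(line)))   # any of w, r, o, t, e after the dashes matches
print(bool(fixed.search(line)))   # the word "wrote" is absent
```

The trailing comma is the whole fix: without it the alternation accepts any single letter of the word, which is why so many dashed lines matched.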
Sergey Obukhov
a2aa345712
Merge pull request #148 from mailgun/sergey/v1.4.2
...
bump version after adding support for Vietnamese format
v1.4.2
2017-07-10 11:44:46 -07:00
Sergey Obukhov
d998beaff3
bump version after adding support for Vietnamese format
2017-07-10 11:42:52 -07:00
Sergey Obukhov
a379bc4e7c
Merge pull request #147 from hnx116/master
...
add support for Vietnamese reply format
2017-07-10 11:40:04 -07:00
Hung Nguyen
b8e1894f3b
add test case
2017-07-10 13:28:33 +07:00
Hung Nguyen
0b5a44090f
add support for Vietnamese reply format
2017-07-10 11:18:57 +07:00
Sergey Obukhov
b40835eca2
Merge pull request #145 from mailgun/sergey/outlook-2013-version-bump
...
bump version after merging outlook 2013 support PR
v1.4.1
2017-06-18 22:56:16 -07:00
Sergey Obukhov
b38562c7cc
bump version after merging outlook 2013 support PR
2017-06-18 22:55:15 -07:00