Commit Graph

247 Commits

Author SHA1 Message Date
Maxim Vladimirskiy a8c7e6a972 Merge pull request #226 from mailgun/maxim/develop
PIP-1562: Remove max tags limit [python3]
v1.5.0
2022-01-06 15:24:57 +03:00
Maxim Vladimirskiy b30c375c5b Expose extract_from_html_tree 2022-01-06 15:16:43 +03:00
Maxim Vladimirskiy cec5acf58f Remove max tags limit 2022-01-06 14:18:11 +03:00
Maxim Vladimirskiy 24d0f2d00a Merge pull request #223 from mailgun/maxim/develop
PIP-1509: Optimise sender name check [python3]
v1.4.10
2021-11-19 13:11:29 +03:00
Maxim Vladimirskiy 94007b0b92 Optimise sender name check 2021-11-19 11:12:26 +03:00
Maxim Vladimirskiy 1a5548f171 Merge pull request #222 from mailgun/maxim/develop
PIP-1409: Remove version pins from setup.py [python3]
v1.4.9
2021-11-11 16:29:30 +03:00
Maxim Vladimirskiy 53c49b9121 Remove version pins from setup.py 2021-11-11 15:36:50 +03:00
Matt Dietz bd50872043 Merge pull request #217 from mailgun/dietz/REP-1030
Drops Python 2 support [python3]
2021-06-15 09:46:29 -05:00
Matt Dietz d37c4fd551 Drops Python 2 support
REP-1030

In addition to some python 2 => 3 fixes, this change bumps the scikit-learn
version to latest. The previously pinned version of scikit-learn failed trying
to compile all necessary C modules under python 3.7+ due to included header files
that weren't compatible with the C API implemented in python 3.7+.

Simultaneously, with the restrictive compatibility supported by scikit-learn,
it seemed prudent to drop python 2 support altogether. Otherwise, we'd be stuck
with python 3.4 as the newest possible version we could support.

With this change, tests are currently passing under 3.9.2.

Lastly, imports the original training data. At some point, a new version
of the training data was committed to the repo but no classifier was
trained from it. Using a classifier trained from this new data resulted
in most of the tests failing.
2021-06-10 14:03:25 -05:00
Sergey Obukhov d9ed7cc6d1 Merge pull request #190 from yoks/master
Add __init__.py into data folder, add data files into MANIFEST.in
2019-07-02 18:56:47 +03:00
Sergey Obukhov 0a0808c0a8 Merge branch 'master' into master 2019-07-01 20:48:46 +03:00
Sergey Obukhov 16354e3528 Merge pull request #191 from mailgun/thrawn/develop
PIP-423: Now removing namespaces from parsed HTML
v1.4.8
2019-05-12 11:54:17 +03:00
Derrick J. Wippler 1018e88ec1 Now removing namespaces from parsed HTML 2019-05-10 11:16:12 -05:00
Ivan Anisimov 2916351517 Update setup.py 2019-03-16 22:17:26 +03:00
Ivan Anisimov 46d4b02c81 Update setup.py 2019-03-16 22:15:43 +03:00
Ivan Anisimov 58eac88a10 Update MANIFEST.in 2019-03-16 22:03:40 +03:00
Ivan Anisimov 2ef3d8dfbe Update MANIFEST.in 2019-03-16 22:01:00 +03:00
Ivan Anisimov 7cf4c29340 Create __init__.py 2019-03-16 21:54:09 +03:00
Sergey Obukhov cdd84563dd Merge pull request #183 from mailgun/sergey/date
fix text with Date: misclassified as quotations splitter
v1.4.7
2019-01-18 17:32:10 +03:00
Sergey Obukhov 8138ea9a60 fix text with Date: misclassified as quotations splitter 2019-01-18 16:49:39 +03:00
Sergey Obukhov c171f9a875 Merge pull request #169 from Savageman/patch-2
Use regex match to detect outlook 2007, 2010, 2013
2018-11-05 10:43:20 +03:00
Sergey Obukhov 3f97a8b8ff Merge branch 'master' into patch-2 2018-11-05 10:42:00 +03:00
Esperat Julian 1147767ff3 Fix regression: windows mail format was left forgotten
Missing a | at the end of the regex, so next lines are part of the global search.
2018-11-04 19:42:12 +01:00
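
The regression described in commit 1147767ff3 above can be sketched as follows. This is a minimal illustration, not the project's actual pattern: the alternative names (`outlook-2007`, `outlook-2013`, `windows-mail`) are hypothetical placeholders. It assumes the regex is built from adjacent string literals, so a missing trailing `|` silently glues two alternatives into one.

```python
import re

# Hypothetical alternatives, for illustration only.
broken = re.compile(
    r'^(?:'
    r'outlook-2007|'
    r'outlook-2013'   # trailing | is missing here ...
    r'windows-mail'   # ... so this literal is appended to the previous one
    r')$'
)

fixed = re.compile(
    r'^(?:'
    r'outlook-2007|'
    r'outlook-2013|'  # the | restores the separate alternative
    r'windows-mail'
    r')$'
)

# The broken pattern only accepts the accidental concatenation.
print(broken.match('windows-mail'))               # no match
print(broken.match('outlook-2013windows-mail'))   # matches
print(fixed.match('windows-mail'))                # matches
```

Python raises no error for the broken version because adjacent string literals are concatenated at compile time, which is why this kind of slip tends to surface only through a failing test.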
Sergey Obukhov 6a304215c3 Merge pull request #177 from mailgun/obukhov-sergey-patch-1
Update Readme with how to retrain on your own data
v1.4.6
2018-11-02 15:22:18 +03:00
Sergey Obukhov 31714506bd Update Readme with how to retrain on your own data 2018-11-02 15:21:36 +03:00
Sergey Obukhov 403d80cf3b Merge pull request #161 from glaand/master
Fix: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
2018-11-02 15:03:02 +03:00
Sergey Obukhov 7cf20f2877 Merge branch 'master' into master 2018-11-02 14:52:38 +03:00
Sergey Obukhov afff08b017 Merge branch 'master' into patch-2 2018-11-02 09:13:42 +03:00
Sergey Obukhov 685abb1905 Merge pull request #171 from gabriellima95/Add-Portuguese-Language
Add Portuguese language to quotations
2018-11-02 09:12:43 +03:00
Sergey Obukhov 41990727a3 Merge branch 'master' into Add-Portuguese-Language 2018-11-02 09:11:07 +03:00
Sergey Obukhov b113d8ab33 Merge pull request #172 from ad-m/patch-1
Fix catastrophic backtracking in regexp
2018-11-02 09:09:49 +03:00
Adam Dobrawy 7bd0e9cc2f Fix catastrophic backtracking in regexp
Co-Author: @Nipsuli
2018-09-21 22:00:10 +02:00
gabriellima95 1e030a51d4 Add Portuguese language to quotations 2018-09-11 15:27:39 -03:00
Esperat Julian 238a5de5cc Use regex match to detect outlook 2007, 2010, 2013
I encountered a variant of the outlook quotations with a space after the semicolon.

To prevent multiplying the number of rules, I implemented a regex match instead (I found how to here: https://stackoverflow.com/a/34093801/211204).

I documented all the different variants as cleanly as I could.
2018-08-31 12:39:52 +02:00
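
The idea in commit 238a5de5cc above, replacing several exact-match rules with one regex, can be sketched like this. The style strings below are hypothetical stand-ins, not the values used in the repository; the only detail taken from the commit message is that one variant has a space after the semicolon.

```python
import re

# Hypothetical style values; the real rules in the repository may differ.
variants = [
    'border:none;border-top:solid #B5C4DF 1.0pt',
    'border:none; border-top:solid #B5C4DF 1.0pt',  # space after the semicolon
]

# One pattern with an optional whitespace character covers both variants,
# so no extra exact-match rule is needed per variant.
style_re = re.compile(r'^border:none;\s?border-top:solid #B5C4DF 1\.0pt$')

for style in variants:
    print(bool(style_re.match(style)))
```

Each further variant then becomes a small change to one pattern rather than another entry in a list of literal strings.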
André Glatzl 53b24ffb3d Cut out first some encoding html tags such as xml and doctype for avoiding conflict with unicode decoding 2017-12-19 15:15:10 +01:00
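
A minimal sketch of the approach named in commit 53b24ffb3d above, assuming the parser rejects `str` input that carries an encoding declaration, as the error message in pull request #161 states. The helper name `strip_declarations` and both patterns are hypothetical, not the project's code.

```python
import re

# Hypothetical patterns: a leading XML declaration and a leading DOCTYPE.
RE_XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>', re.I)
RE_DOCTYPE = re.compile(r'^\s*<!DOCTYPE[^>]*>', re.I)


def strip_declarations(html):
    """Remove a leading XML declaration and DOCTYPE from a str document."""
    html = RE_XML_DECL.sub('', html, count=1)
    html = RE_DOCTYPE.sub('', html, count=1)
    return html.lstrip()


sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html>\n'
    '<html><body>Hi</body></html>'
)
cleaned = strip_declarations(sample)
print(cleaned)
```

With the declarations removed, the remaining markup no longer advertises an encoding that could conflict with an already decoded string.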
Sergey Obukhov a7404afbcb Merge pull request #155 from mailgun/sergey/appointment
fix appointments in text
v1.4.5
2017-10-23 16:34:08 -07:00
Sergey Obukhov 0e6d5f993c fix appointments in text 2017-10-23 16:32:42 -07:00
Sergey Obukhov 60637ff13a Merge pull request #152 from mailgun/sergey/v1.4.4
bump version
v1.4.4
2017-08-24 16:00:05 -07:00
Sergey Obukhov df8259e3fe bump version 2017-08-24 15:58:53 -07:00
Sergey Obukhov aab3b1cc75 Merge pull request #150 from ezrapagel/fix_greedy_dash_regex
android_wrote regex incorrectly matching
2017-08-24 15:52:29 -07:00
Sergey Obukhov 9492b39f2d Merge branch 'master' into fix_greedy_dash_regex 2017-08-24 15:39:28 -07:00
Sergey Obukhov b9ac866ea7 Merge pull request #151 from mailgun/sergey/reshape
reshape data as suggested by sklearn
v1.4.3
2017-08-24 12:04:58 -07:00
Sergey Obukhov 678517dd89 reshape data as suggested by sklearn 2017-08-24 12:03:47 -07:00
Ezra Pagel 221774c6f8 android_wrote regex was incorrectly iterating characters in 'wrote', resulting in greedy regex that
matched many strings with dashes
2017-08-21 12:47:06 -05:00
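
The bug described in commit 221774c6f8 above can be reproduced in a few lines. This is a sketch under assumptions: the `TEMPLATE` pattern is a hypothetical simplification, and the cause shown (a parenthesised string mistaken for a one-element tuple) is one common way to end up iterating over the characters of 'wrote'; the commit message does not say exactly how it happened.

```python
import re

# Hypothetical template, for illustration only.
TEMPLATE = r'\s*-+.*({})\s*-+'

# ('wrote') is a plain string, so join() walks its characters.
buggy_alternation = '|'.join(('wrote'))
# The trailing comma makes a one-element tuple, so join() sees one word.
fixed_alternation = '|'.join(('wrote',))

buggy = re.compile(TEMPLATE.format(buggy_alternation), re.I)
fixed = re.compile(TEMPLATE.format(fixed_alternation), re.I)

print(buggy_alternation)   # w|r|o|t|e
print(fixed_alternation)   # wrote

# Any dashed line with w, r, o, t or e before the closing dashes matches
# the buggy pattern; only a line containing "wrote" matches the fixed one.
print(bool(buggy.match('--- see note ---')))      # True
print(bool(fixed.match('--- see note ---')))      # False
print(bool(fixed.match('---- John wrote ----')))  # True
```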
Sergey Obukhov a2aa345712 Merge pull request #148 from mailgun/sergey/v1.4.2
bump version after adding support for Vietnamese format
v1.4.2
2017-07-10 11:44:46 -07:00
Sergey Obukhov d998beaff3 bump version after adding support for Vietnamese format 2017-07-10 11:42:52 -07:00
Sergey Obukhov a379bc4e7c Merge pull request #147 from hnx116/master
add support for Vietnamese reply format
2017-07-10 11:40:04 -07:00
Hung Nguyen b8e1894f3b add test case 2017-07-10 13:28:33 +07:00
Hung Nguyen 0b5a44090f add support for Vietnamese reply format 2017-07-10 11:18:57 +07:00
Sergey Obukhov b40835eca2 Merge pull request #145 from mailgun/sergey/outlook-2013-version-bump
bump version after merging outlook 2013 support PR
v1.4.1
2017-06-18 22:56:16 -07:00