Commit Graph

146 Commits

Author SHA1 Message Date
Sergey Obukhov
2444ba87c0 Merge pull request #111 from mailgun/sergey/tagscount
restrict html processing to a certain number of tags
v1.3.2
2016-09-14 11:06:29 -07:00
Sergey Obukhov
534457e713 protect html_to_text as well 2016-09-14 09:58:41 -07:00
Sergey Obukhov
ea82a9730e restrict html processing to a certain number of tags 2016-09-14 09:33:30 -07:00
Sergey Obukhov
f04b872e14 Merge pull request #108 from mailgun/sergey/html5lib-fix
use new parser each time we parse a document
v1.3.1
2016-08-22 18:10:35 -07:00
Sergey Obukhov
e61894e425 bump version 2016-08-22 17:34:18 -07:00
Sergey Obukhov
35fbdaadac use new parser each time we parse a document 2016-08-22 16:25:04 -07:00
Sergey Obukhov
8441bc7328 Merge pull request #106 from mailgun/sergey/html5lib
use html5lib to parse html
v1.3.0
2016-08-19 15:58:07 -07:00
Sergey Obukhov
37c95ff97b fallback untouched html if we can not parse html tree 2016-08-19 11:38:12 -07:00
Sergey Obukhov
5b1ca33c57 fix cssselect 2016-08-16 17:11:41 -07:00
Sergey Obukhov
ec8e09b34e fix 2016-08-15 20:31:04 -07:00
Sergey Obukhov
bcf97eccfa use html5lib to parse html 2016-08-15 19:36:21 -07:00
Sergey Obukhov
f53b5cc7a6 Merge pull request #105 from mailgun/sergey/fromstring
html with comment that has no parent crashes html_tree_to_text
v1.2.16
2016-08-15 13:40:37 -07:00
Sergey Obukhov
27adde7aa7 bump version 2016-08-15 13:21:10 -07:00
Sergey Obukhov
a9719833e0 html with comment that has no parent crashes html_tree_to_text 2016-08-12 17:40:12 -07:00
Sergey Obukhov
7bf37090ca Merge pull request #101 from mailgun/sergey/empty-html
if html stripped off quotations does not have readable text fallback …
v1.2.15
2016-08-12 12:18:50 -07:00
Sergey Obukhov
44fcef7123 bump version 2016-08-11 23:59:18 -07:00
Sergey Obukhov
69a44b10a1 Merge branch 'master' into sergey/empty-html 2016-08-11 23:58:11 -07:00
Sergey Obukhov
b085e3d049 Merge pull request #104 from mailgun/sergey/spaces
fixes mailgun/talon#103 keep newlines when parsing html quotations
2016-08-11 23:56:26 -07:00
Sergey Obukhov
4b953bcddc fixes mailgun/talon#103 keep newlines when parsing html quotations 2016-08-11 20:17:37 -07:00
Sergey Obukhov
315eaa7080 if html stripped off quotations does not have readable text fallback to unparsed html 2016-08-11 19:55:23 -07:00
Sergey Obukhov
5a9bc967f1 Merge pull request #100 from mailgun/sergey/restrict
do not parse html quotations if html is longer than certain threshold
v1.2.14
2016-08-11 16:08:03 -07:00
Sergey Obukhov
a0d7236d0b bump version and add a comment 2016-08-11 15:49:09 -07:00
Sergey Obukhov
21e9a31ffe add test 2016-08-09 17:15:49 -07:00
Sergey Obukhov
4ee46c0a97 do not parse html quotations if html is longer than certain threshold 2016-08-09 17:08:58 -07:00
Sergey Obukhov
10d9a930f9 Merge pull request #99 from mailgun/sergey/capitalized
consider word capitalized only if it is camel case - not all upper case
v1.2.12
2016-07-20 16:47:12 -07:00
Sergey Obukhov
a21ccdb21b consider word capitalized only if it is camel case - not all upper case 2016-07-19 17:37:36 -07:00
Sergey Obukhov
7cdd7a8f35 Merge pull request #98 from mailgun/sergey/1.2.11
version bump
v1.2.11
2016-07-19 16:22:24 -07:00
Sergey Obukhov
01e03a47e0 version bump 2016-07-19 15:51:46 -07:00
Sergey Obukhov
1b9a71551a Merge pull request #97 from umairwaheed/strip-talon
Strip down Talon
2016-07-19 15:46:56 -07:00
Umair Khan
911efd1db4 Move encoding detection inside if condition. 2016-07-19 09:44:40 +05:00
Umair Khan
e61f0a68c4 Add six library to setup.py 2016-07-19 09:40:03 +05:00
Umair Khan
cefbcffd59 Make tests/text_quotations_test.py compatible with Python 3. 2016-07-13 14:45:26 +05:00
Umair Khan
622a98d6d5 Make utils compatible with Python 3. 2016-07-13 13:00:24 +05:00
Umair Khan
7901f5d1dc Convert msg_body into unicode in preprocess. 2016-07-13 11:18:10 +05:00
Umair Khan
555c34d7a8 Make sure html_to_text processes bytes 2016-07-13 11:18:10 +05:00
Umair Khan
dcc0d1de20 Convert msg_body to bytes in extract_from_html 2016-07-13 11:18:06 +05:00
Umair Khan
7bdf4d622b Only encode if str 2016-07-13 08:01:47 +05:00
Umair Khan
4a7207b0d0 Only convert to unicode if str 2016-07-13 08:01:47 +05:00
Umair Khan
ad9c2ca0e8 Upgrade quotations.py 2016-07-13 08:01:44 +05:00
Umair Khan
da998ddb60 Run modernizer on the code. 2016-07-12 17:25:46 +05:00
Umair Khan
07f68815df Allow installation of ML free version.
Add an option to the install script, `--no-ml`, that when given will
install Talon without ML support.

Fixes #96
2016-07-12 15:08:53 +05:00
Sergey Obukhov
35645f9ade Merge pull request #95 from mailgun/sergey/forge
open-sourcing email dataset
v1.2.10
2016-06-10 15:45:29 -07:00
Sergey Obukhov
7c3d91301c open-sourcing email dataset 2016-06-10 14:10:53 -07:00
Sergey Obukhov
5bcf7403ad Merge pull request #94 from mailgun/obukhov-sergey-patch-1
Update README.rst
v1.2.9
2016-05-31 20:16:13 -07:00
Sergey Obukhov
2d6c092b65 bump version 2016-05-31 18:42:47 -07:00
Sergey Obukhov
6d0689cad6 Update README.rst 2016-05-31 18:39:07 -07:00
Sergey Obukhov
3f80e93ee0 Merge pull request #93 from mailgun/sergey/version-bump
bump
v1.2.8
2016-05-31 18:15:28 -07:00
Sergey Obukhov
1b18abab1d bump 2016-05-31 16:53:41 -07:00
Sergey Obukhov
03dd5af5ab Merge pull request #91 from KevinCathcart/patch-1
Support outlook 2007/2010 running in en-us locale
2016-05-31 16:50:35 -07:00
Sergey Obukhov
dfba82b07c Merge pull request #92 from mailgun/obukhov-sergey-kuntzcamera
Update README.rst
2016-05-31 15:42:34 -07:00