Sergey Obukhov
|
2444ba87c0
|
Merge pull request #111 from mailgun/sergey/tagscount
restrict html processing to a certain number of tags
v1.3.2
|
2016-09-14 11:06:29 -07:00 |
|
Sergey Obukhov
|
534457e713
|
protect html_to_text as well
|
2016-09-14 09:58:41 -07:00 |
|
Sergey Obukhov
|
ea82a9730e
|
restrict html processing to a certain number of tags
|
2016-09-14 09:33:30 -07:00 |
|
Sergey Obukhov
|
f04b872e14
|
Merge pull request #108 from mailgun/sergey/html5lib-fix
use new parser each time we parse a document
v1.3.1
|
2016-08-22 18:10:35 -07:00 |
|
Sergey Obukhov
|
e61894e425
|
bump version
|
2016-08-22 17:34:18 -07:00 |
|
Sergey Obukhov
|
35fbdaadac
|
use new parser each time we parse a document
|
2016-08-22 16:25:04 -07:00 |
|
Sergey Obukhov
|
8441bc7328
|
Merge pull request #106 from mailgun/sergey/html5lib
use html5lib to parse html
v1.3.0
|
2016-08-19 15:58:07 -07:00 |
|
Sergey Obukhov
|
37c95ff97b
|
fallback untouched html if we can not parse html tree
|
2016-08-19 11:38:12 -07:00 |
|
Sergey Obukhov
|
5b1ca33c57
|
fix cssselect
|
2016-08-16 17:11:41 -07:00 |
|
Sergey Obukhov
|
ec8e09b34e
|
fix
|
2016-08-15 20:31:04 -07:00 |
|
Sergey Obukhov
|
bcf97eccfa
|
use html5lib to parse html
|
2016-08-15 19:36:21 -07:00 |
|
Sergey Obukhov
|
f53b5cc7a6
|
Merge pull request #105 from mailgun/sergey/fromstring
html with comment that has no parent crashes html_tree_to_text
v1.2.16
|
2016-08-15 13:40:37 -07:00 |
|
Sergey Obukhov
|
27adde7aa7
|
bump version
|
2016-08-15 13:21:10 -07:00 |
|
Sergey Obukhov
|
a9719833e0
|
html with comment that has no parent crashes html_tree_to_text
|
2016-08-12 17:40:12 -07:00 |
|
Sergey Obukhov
|
7bf37090ca
|
Merge pull request #101 from mailgun/sergey/empty-html
if html stripped off quotations does not have readable text fallback …
v1.2.15
|
2016-08-12 12:18:50 -07:00 |
|
Sergey Obukhov
|
44fcef7123
|
bump version
|
2016-08-11 23:59:18 -07:00 |
|
Sergey Obukhov
|
69a44b10a1
|
Merge branch 'master' into sergey/empty-html
|
2016-08-11 23:58:11 -07:00 |
|
Sergey Obukhov
|
b085e3d049
|
Merge pull request #104 from mailgun/sergey/spaces
fixes mailgun/talon#103 keep newlines when parsing html quotations
|
2016-08-11 23:56:26 -07:00 |
|
Sergey Obukhov
|
4b953bcddc
|
fixes mailgun/talon#103 keep newlines when parsing html quotations
|
2016-08-11 20:17:37 -07:00 |
|
Sergey Obukhov
|
315eaa7080
|
if html stripped off quotations does not have readable text fallback to unparsed html
|
2016-08-11 19:55:23 -07:00 |
|
Sergey Obukhov
|
5a9bc967f1
|
Merge pull request #100 from mailgun/sergey/restrict
do not parse html quotations if html is longer than certain threshold
v1.2.14
|
2016-08-11 16:08:03 -07:00 |
|
Sergey Obukhov
|
a0d7236d0b
|
bump version and add a comment
|
2016-08-11 15:49:09 -07:00 |
|
Sergey Obukhov
|
21e9a31ffe
|
add test
|
2016-08-09 17:15:49 -07:00 |
|
Sergey Obukhov
|
4ee46c0a97
|
do not parse html quotations if html is longer than certain threshold
|
2016-08-09 17:08:58 -07:00 |
|
Sergey Obukhov
|
10d9a930f9
|
Merge pull request #99 from mailgun/sergey/capitalized
consider word capitalized only if it is camel case - not all upper case
v1.2.12
|
2016-07-20 16:47:12 -07:00 |
|
Sergey Obukhov
|
a21ccdb21b
|
consider word capitalized only if it is camel case - not all upper case
|
2016-07-19 17:37:36 -07:00 |
|
Sergey Obukhov
|
7cdd7a8f35
|
Merge pull request #98 from mailgun/sergey/1.2.11
version bump
v1.2.11
|
2016-07-19 16:22:24 -07:00 |
|
Sergey Obukhov
|
01e03a47e0
|
version bump
|
2016-07-19 15:51:46 -07:00 |
|
Sergey Obukhov
|
1b9a71551a
|
Merge pull request #97 from umairwaheed/strip-talon
Strip down Talon
|
2016-07-19 15:46:56 -07:00 |
|
Umair Khan
|
911efd1db4
|
Move encoding detection inside if condition.
|
2016-07-19 09:44:40 +05:00 |
|
Umair Khan
|
e61f0a68c4
|
Add six library to setup.py
|
2016-07-19 09:40:03 +05:00 |
|
Umair Khan
|
cefbcffd59
|
Make tests/text_quotations_test.py compatible with Python 3.
|
2016-07-13 14:45:26 +05:00 |
|
Umair Khan
|
622a98d6d5
|
Make utils compatible with Python 3.
|
2016-07-13 13:00:24 +05:00 |
|
Umair Khan
|
7901f5d1dc
|
Convert msg_body into unicode in preprocess.
|
2016-07-13 11:18:10 +05:00 |
|
Umair Khan
|
555c34d7a8
|
Make sure html_to_text processes bytes
|
2016-07-13 11:18:10 +05:00 |
|
Umair Khan
|
dcc0d1de20
|
Convert msg_body to bytes in extract_from_html
|
2016-07-13 11:18:06 +05:00 |
|
Umair Khan
|
7bdf4d622b
|
Only encode if str
|
2016-07-13 08:01:47 +05:00 |
|
Umair Khan
|
4a7207b0d0
|
Only convert to unicode if str
|
2016-07-13 08:01:47 +05:00 |
|
Umair Khan
|
ad9c2ca0e8
|
Upgrade quotations.py
|
2016-07-13 08:01:44 +05:00 |
|
Umair Khan
|
da998ddb60
|
Run modernizer on the code.
|
2016-07-12 17:25:46 +05:00 |
|
Umair Khan
|
07f68815df
|
Allow installation of ML free version.
Add an option to the install script, `--no-ml`, that when given will
install Talon without ML support.
Fixes #96
|
2016-07-12 15:08:53 +05:00 |
|
Sergey Obukhov
|
35645f9ade
|
Merge pull request #95 from mailgun/sergey/forge
open-sourcing email dataset
v1.2.10
|
2016-06-10 15:45:29 -07:00 |
|
Sergey Obukhov
|
7c3d91301c
|
open-sourcing email dataset
|
2016-06-10 14:10:53 -07:00 |
|
Sergey Obukhov
|
5bcf7403ad
|
Merge pull request #94 from mailgun/obukhov-sergey-patch-1
Update README.rst
v1.2.9
|
2016-05-31 20:16:13 -07:00 |
|
Sergey Obukhov
|
2d6c092b65
|
bump version
|
2016-05-31 18:42:47 -07:00 |
|
Sergey Obukhov
|
6d0689cad6
|
Update README.rst
|
2016-05-31 18:39:07 -07:00 |
|
Sergey Obukhov
|
3f80e93ee0
|
Merge pull request #93 from mailgun/sergey/version-bump
bump
v1.2.8
|
2016-05-31 18:15:28 -07:00 |
|
Sergey Obukhov
|
1b18abab1d
|
bump
|
2016-05-31 16:53:41 -07:00 |
|
Sergey Obukhov
|
03dd5af5ab
|
Merge pull request #91 from KevinCathcart/patch-1
Support outlook 2007/2010 running in en-us locale
|
2016-05-31 16:50:35 -07:00 |
|
Sergey Obukhov
|
dfba82b07c
|
Merge pull request #92 from mailgun/obukhov-sergey-kuntzcamera
Update README.rst
|
2016-05-31 15:42:34 -07:00 |
|