Sergey Obukhov
37c95ff97b
fall back to untouched html if we cannot parse html tree
2016-08-19 11:38:12 -07:00
Sergey Obukhov
5b1ca33c57
fix cssselect
2016-08-16 17:11:41 -07:00
Sergey Obukhov
ec8e09b34e
fix
2016-08-15 20:31:04 -07:00
Sergey Obukhov
bcf97eccfa
use html5lib to parse html
2016-08-15 19:36:21 -07:00
Sergey Obukhov
f53b5cc7a6
Merge pull request #105 from mailgun/sergey/fromstring
...
html with comment that has no parent crashes html_tree_to_text
v1.2.16
2016-08-15 13:40:37 -07:00
Sergey Obukhov
27adde7aa7
bump version
2016-08-15 13:21:10 -07:00
Sergey Obukhov
a9719833e0
html with comment that has no parent crashes html_tree_to_text
2016-08-12 17:40:12 -07:00
Sergey Obukhov
7bf37090ca
Merge pull request #101 from mailgun/sergey/empty-html
...
if html stripped off quotations does not have readable text fallback …
v1.2.15
2016-08-12 12:18:50 -07:00
Sergey Obukhov
44fcef7123
bump version
2016-08-11 23:59:18 -07:00
Sergey Obukhov
69a44b10a1
Merge branch 'master' into sergey/empty-html
2016-08-11 23:58:11 -07:00
Sergey Obukhov
b085e3d049
Merge pull request #104 from mailgun/sergey/spaces
...
fixes mailgun/talon#103 keep newlines when parsing html quotations
2016-08-11 23:56:26 -07:00
Sergey Obukhov
4b953bcddc
fixes mailgun/talon#103 keep newlines when parsing html quotations
2016-08-11 20:17:37 -07:00
Sergey Obukhov
315eaa7080
if html stripped off quotations does not have readable text fallback to unparsed html
2016-08-11 19:55:23 -07:00
Sergey Obukhov
5a9bc967f1
Merge pull request #100 from mailgun/sergey/restrict
...
do not parse html quotations if html is longer than certain threshold
v1.2.14
2016-08-11 16:08:03 -07:00
Sergey Obukhov
a0d7236d0b
bump version and add a comment
2016-08-11 15:49:09 -07:00
Sergey Obukhov
21e9a31ffe
add test
2016-08-09 17:15:49 -07:00
Sergey Obukhov
4ee46c0a97
do not parse html quotations if html is longer than certain threshold
2016-08-09 17:08:58 -07:00
Sergey Obukhov
10d9a930f9
Merge pull request #99 from mailgun/sergey/capitalized
...
consider word capitalized only if it is camel case - not all upper case
v1.2.12
2016-07-20 16:47:12 -07:00
Sergey Obukhov
a21ccdb21b
consider word capitalized only if it is camel case - not all upper case
2016-07-19 17:37:36 -07:00
Sergey Obukhov
7cdd7a8f35
Merge pull request #98 from mailgun/sergey/1.2.11
...
version bump
v1.2.11
2016-07-19 16:22:24 -07:00
Sergey Obukhov
01e03a47e0
version bump
2016-07-19 15:51:46 -07:00
Sergey Obukhov
1b9a71551a
Merge pull request #97 from umairwaheed/strip-talon
...
Strip down Talon
2016-07-19 15:46:56 -07:00
Umair Khan
911efd1db4
Move encoding detection inside if condition.
2016-07-19 09:44:40 +05:00
Umair Khan
e61f0a68c4
Add six library to setup.py
2016-07-19 09:40:03 +05:00
Umair Khan
cefbcffd59
Make tests/text_quotations_test.py compatible with Python 3.
2016-07-13 14:45:26 +05:00
Umair Khan
622a98d6d5
Make utils compatible with Python 3.
2016-07-13 13:00:24 +05:00
Umair Khan
7901f5d1dc
Convert msg_body into unicode in preprocess.
2016-07-13 11:18:10 +05:00
Umair Khan
555c34d7a8
Make sure html_to_text processes bytes
2016-07-13 11:18:10 +05:00
Umair Khan
dcc0d1de20
Convert msg_body to bytes in extract_from_html
2016-07-13 11:18:06 +05:00
Umair Khan
7bdf4d622b
Only encode if str
2016-07-13 08:01:47 +05:00
Umair Khan
4a7207b0d0
Only convert to unicode if str
2016-07-13 08:01:47 +05:00
Umair Khan
ad9c2ca0e8
Upgrade quotations.py
2016-07-13 08:01:44 +05:00
Umair Khan
da998ddb60
Run modernizer on the code.
2016-07-12 17:25:46 +05:00
Umair Khan
07f68815df
Allow installation of ML free version.
...
Add an option to the install script, `--no-ml`, that when given will
install Talon without ML support.
Fixes #96
2016-07-12 15:08:53 +05:00
Sergey Obukhov
35645f9ade
Merge pull request #95 from mailgun/sergey/forge
...
open-sourcing email dataset
v1.2.10
2016-06-10 15:45:29 -07:00
Sergey Obukhov
7c3d91301c
open-sourcing email dataset
2016-06-10 14:10:53 -07:00
Sergey Obukhov
5bcf7403ad
Merge pull request #94 from mailgun/obukhov-sergey-patch-1
...
Update README.rst
v1.2.9
2016-05-31 20:16:13 -07:00
Sergey Obukhov
2d6c092b65
bump version
2016-05-31 18:42:47 -07:00
Sergey Obukhov
6d0689cad6
Update README.rst
2016-05-31 18:39:07 -07:00
Sergey Obukhov
3f80e93ee0
Merge pull request #93 from mailgun/sergey/version-bump
...
bump
v1.2.8
2016-05-31 18:15:28 -07:00
Sergey Obukhov
1b18abab1d
bump
2016-05-31 16:53:41 -07:00
Sergey Obukhov
03dd5af5ab
Merge pull request #91 from KevinCathcart/patch-1
...
Support outlook 2007/2010 running in en-us locale
2016-05-31 16:50:35 -07:00
Sergey Obukhov
dfba82b07c
Merge pull request #92 from mailgun/obukhov-sergey-kuntzcamera
...
Update README.rst
2016-05-31 15:42:34 -07:00
Sergey Obukhov
08ca02c87f
Update README.rst
2016-05-31 15:14:32 -07:00
Kevin Cathcart
b61f4ec095
Support outlook 2007/2010 running in en-us locale
...
My American English copy of Outlook 2007 uses inches in the reply separator rather than centimeters. The separator is otherwise identical. What a strange thing to localize. I'm guessing it uses whatever it thinks the preferred units for page margins are.
2016-05-23 17:23:53 -04:00
Sergey Obukhov
9dbe6a494b
Merge pull request #90 from mailgun/sergey/89
...
fixes mailgun/talon#89
v1.2.7
2016-05-17 16:01:56 -07:00
Sergey Obukhov
44e70939d6
fixes mailgun/talon#89
2016-05-17 15:31:01 -07:00
Sergey Obukhov
ab6066eafa
Merge pull request #87 from mailgun/sergey/1.2.6
...
bump up version
v1.2.6
2016-04-07 17:54:12 -07:00
Sergey Obukhov
42258cdd36
bump up version
2016-04-07 17:51:48 -07:00
Sergey Obukhov
d3de9e6893
Merge pull request #86 from dougkeen/master
...
Fix #85 (exception when stripping gmail quotes)
2016-04-07 17:47:38 -07:00