Sergey Obukhov
0f5e72623b
add android quotation pattern
2017-04-10 16:33:21 -07:00
smitcona
a2eb0f7201
Creating new method which removes initial spaces and marks the message lines. Removing ambiguity introduced to mark_message_lines
2017-02-14 18:19:45 +00:00
smitcona
5c71a0ca07
Split the comment lines so that they are not over 80 characters
2017-02-13 16:45:26 +00:00
Sergey Obukhov
489d16fad9
Merge branch 'master' into mark-splitlines-in-email-quotation-indents
2017-02-09 21:10:16 -08:00
smitcona
a1d0a86305
Pass ignore_initial_spaces=True as this has better clarity than separate boolean variable
2017-02-07 12:47:33 +00:00
smitcona
34c5b526c3
Remove the whitespace before the line if the flag is set
2017-02-03 12:57:26 +00:00
smitcona
3edb6578ba
Dividing preprocess method into two methods, split_emails() now calls one without email content being altered.
2017-02-03 11:49:23 +00:00
smitcona
984c036b6e
Set the marker back to 'm' rather than 't' if it matches the QUOT_PATTERN. Updated test case.
2017-02-01 18:28:19 +00:00
smitcona
567467b8ed
Update comment
2017-02-01 17:29:05 +00:00
smitcona
139edd6104
Add new method which marks as splitlines, lines which are splitlines but start with email quotation indents ("> ")
2017-02-01 17:16:30 +00:00
Phanindra Ramesh Challa
e756d55abf
Fixes issue #123
2016-12-27 13:53:40 +05:30
smitcona
5685a4055a
Improved algorithm
2016-11-22 19:56:57 +00:00
smitcona
97b72ef767
Adding in_header_block variable for reliability
2016-11-22 19:06:34 +00:00
smitcona
31489848be
Remove print lines
2016-11-21 17:36:06 +00:00
smitcona
e5988d447b
Add space
2016-11-21 12:48:29 +00:00
smitcona
adfed748ce
split_emails function added, test added
2016-11-21 12:35:36 +00:00
Sergey Obukhov
534457e713
protect html_to_text as well
2016-09-14 09:58:41 -07:00
Sergey Obukhov
ea82a9730e
restrict html processing to a certain number of tags
2016-09-14 09:33:30 -07:00
Sergey Obukhov
35fbdaadac
use new parser each time we parse a document
2016-08-22 16:25:04 -07:00
Sergey Obukhov
37c95ff97b
fallback untouched html if we can not parse html tree
2016-08-19 11:38:12 -07:00
Sergey Obukhov
5b1ca33c57
fix cssselect
2016-08-16 17:11:41 -07:00
Sergey Obukhov
bcf97eccfa
use html5lib to parse html
2016-08-15 19:36:21 -07:00
Sergey Obukhov
a9719833e0
html with comment that has no parent crashes html_tree_to_text
2016-08-12 17:40:12 -07:00
Sergey Obukhov
69a44b10a1
Merge branch 'master' into sergey/empty-html
2016-08-11 23:58:11 -07:00
Sergey Obukhov
4b953bcddc
fixes mailgun/talon#103 keep newlines when parsing html quotations
2016-08-11 20:17:37 -07:00
Sergey Obukhov
315eaa7080
if html stripped off quotations does not have readable text fallback to unparsed html
2016-08-11 19:55:23 -07:00
Sergey Obukhov
a0d7236d0b
bump version and add a comment
2016-08-11 15:49:09 -07:00
Sergey Obukhov
4ee46c0a97
do not parse html quotations if html is longer than a certain threshold
2016-08-09 17:08:58 -07:00
Sergey Obukhov
a21ccdb21b
consider word capitalized only if it is camel case - not all upper case
2016-07-19 17:37:36 -07:00
Umair Khan
911efd1db4
Move encoding detection inside if condition.
2016-07-19 09:44:40 +05:00
Umair Khan
622a98d6d5
Make utils compatible with Python 3.
2016-07-13 13:00:24 +05:00
Umair Khan
7901f5d1dc
Convert msg_body into unicode in preprocess.
2016-07-13 11:18:10 +05:00
Umair Khan
555c34d7a8
Make sure html_to_text processes bytes
2016-07-13 11:18:10 +05:00
Umair Khan
dcc0d1de20
Convert msg_body to bytes in extract_from_html
2016-07-13 11:18:06 +05:00
Umair Khan
7bdf4d622b
Only encode if str
2016-07-13 08:01:47 +05:00
Umair Khan
4a7207b0d0
Only convert to unicode if str
2016-07-13 08:01:47 +05:00
Umair Khan
ad9c2ca0e8
Upgrade quotations.py
2016-07-13 08:01:44 +05:00
Umair Khan
da998ddb60
Run modernizer on the code.
2016-07-12 17:25:46 +05:00
Umair Khan
07f68815df
Allow installation of ML free version.
Add an option to the install script, `--no-ml`, that when given will
install Talon without ML support.
Fixes #96
2016-07-12 15:08:53 +05:00
Kevin Cathcart
b61f4ec095
Support outlook 2007/2010 running in en-us locale
My American English copy of Outlook 2007 uses inches in the reply separator rather than centimeters. The separator is otherwise identical. What a strange thing to localize. I'm guessing it uses whatever it thinks the preferred units for page margins are.
2016-05-23 17:23:53 -04:00
Sergey Obukhov
44e70939d6
fixes mailgun/talon#89
2016-05-17 15:31:01 -07:00
Doug Keen
333beb94af
Fix #85 (exception when stripping gmail quotes)
2016-04-04 14:22:50 -07:00
Sergey Obukhov
02adf53ab9
fixes mailgun/talon#12
2016-03-04 13:14:50 -08:00
defkev
743b452daf
Added Zimbra HTML quotation extraction
2016-02-21 16:56:52 +01:00
Sergey Obukhov
31803d41bc
fixes mailgun/talon#18
2016-02-19 19:07:10 -08:00
Sergey Obukhov
999e9c3725
fixes mailgun/talon#19
2016-02-19 17:53:52 -08:00
Sergey Obukhov
ce65ff8fc8
Merge pull request #71 from clara-labs/ms-2010-issue
First pass at handling issue with ms outlook 2010 with unenclosed quo…
2015-12-18 19:14:13 -08:00
Sergey Obukhov
3d9ae356ea
add more tests, make standard reply tests more relaxed
2015-12-18 18:56:41 -08:00
Carlos Correa
f688d074b5
First pass at handling issue with ms outlook 2010 with unenclosed quoted text.
2015-12-10 19:16:13 -08:00
Sergey Obukhov
41457d8fbd
fixes mailgun/talon#38 mailgun/talon#20
2015-12-05 00:37:02 -08:00