Commit Graph

79 Commits

Author SHA1 Message Date
Sergey Obukhov
0f5e72623b add android quotation pattern 2017-04-10 16:33:21 -07:00
smitcona
a2eb0f7201 Creating new method which removes initial spaces and marks the message lines. Removing ambiguity introduced to mark_message_lines 2017-02-14 18:19:45 +00:00
smitcona
5c71a0ca07 Split the comment lines so that they are not over 80 characters 2017-02-13 16:45:26 +00:00
Sergey Obukhov
489d16fad9 Merge branch 'master' into mark-splitlines-in-email-quotation-indents 2017-02-09 21:10:16 -08:00
smitcona
a1d0a86305 Pass ignore_initial_spaces=True as this has better clarity than separate boolean variable 2017-02-07 12:47:33 +00:00
smitcona
34c5b526c3 Remove the whitespace before the line if the flag is set 2017-02-03 12:57:26 +00:00
smitcona
3edb6578ba Dividing preprocess method into two methods, split_emails() now calls one without email content being altered. 2017-02-03 11:49:23 +00:00
smitcona
984c036b6e Set the marker back to 'm' rather than 't' if it matches the QUOT_PATTERN. Updated test case. 2017-02-01 18:28:19 +00:00
smitcona
567467b8ed Update comment 2017-02-01 17:29:05 +00:00
smitcona
139edd6104 Add new method which marks as splitlines, lines which are splitlines but start with email quotation indents ("> ") 2017-02-01 17:16:30 +00:00
Phanindra Ramesh Challa
e756d55abf Fixes issue #123 2016-12-27 13:53:40 +05:30
smitcona
5685a4055a Improved algorithm 2016-11-22 19:56:57 +00:00
smitcona
97b72ef767 Adding in_header_block variable for reliability 2016-11-22 19:06:34 +00:00
smitcona
31489848be Remove print lines 2016-11-21 17:36:06 +00:00
smitcona
e5988d447b Add space 2016-11-21 12:48:29 +00:00
smitcona
adfed748ce split_emails function added, test added 2016-11-21 12:35:36 +00:00
Sergey Obukhov
534457e713 protect html_to_text as well 2016-09-14 09:58:41 -07:00
Sergey Obukhov
ea82a9730e restrict html processing to a certain number of tags 2016-09-14 09:33:30 -07:00
Sergey Obukhov
35fbdaadac use new parser each time we parse a document 2016-08-22 16:25:04 -07:00
Sergey Obukhov
37c95ff97b fallback untouched html if we can not parse html tree 2016-08-19 11:38:12 -07:00
Sergey Obukhov
5b1ca33c57 fix cssselect 2016-08-16 17:11:41 -07:00
Sergey Obukhov
bcf97eccfa use html5lib to parse html 2016-08-15 19:36:21 -07:00
Sergey Obukhov
a9719833e0 html with comment that has no parent crashes html_tree_to_text 2016-08-12 17:40:12 -07:00
Sergey Obukhov
69a44b10a1 Merge branch 'master' into sergey/empty-html 2016-08-11 23:58:11 -07:00
Sergey Obukhov
4b953bcddc fixes mailgun/talon#103 keep newlines when parsing html quotations 2016-08-11 20:17:37 -07:00
Sergey Obukhov
315eaa7080 if html stripped off quotations does not have readable text fallback to unparsed html 2016-08-11 19:55:23 -07:00
Sergey Obukhov
a0d7236d0b bump version and add a comment 2016-08-11 15:49:09 -07:00
Sergey Obukhov
4ee46c0a97 do not parse html quotations if html is longer then certain threshold 2016-08-09 17:08:58 -07:00
Sergey Obukhov
a21ccdb21b consider word capitilized only if it is camel case - not all upper case 2016-07-19 17:37:36 -07:00
Umair Khan
911efd1db4 Move encoding detection inside if condition. 2016-07-19 09:44:40 +05:00
Umair Khan
622a98d6d5 Make utils compatible with Python 3. 2016-07-13 13:00:24 +05:00
Umair Khan
7901f5d1dc Convert msg_body into unicode in preprocess. 2016-07-13 11:18:10 +05:00
Umair Khan
555c34d7a8 Make sure html_to_text processes bytes 2016-07-13 11:18:10 +05:00
Umair Khan
dcc0d1de20 Convert msg_body to bytes in extract_from_html 2016-07-13 11:18:06 +05:00
Umair Khan
7bdf4d622b Only encode if str 2016-07-13 08:01:47 +05:00
Umair Khan
4a7207b0d0 Only convert to unicode if str 2016-07-13 08:01:47 +05:00
Umair Khan
ad9c2ca0e8 Upgrade quotations.py 2016-07-13 08:01:44 +05:00
Umair Khan
da998ddb60 Run modernizer on the code. 2016-07-12 17:25:46 +05:00
Umair Khan
07f68815df Allow installation of ML free version.
Add an option to the install script, `--no-ml`, that when given will
install Talon without ML support.

Fixes #96
2016-07-12 15:08:53 +05:00
Kevin Cathcart
b61f4ec095 Support outlook 2007/2010 running in en-us locale
My American English copy of outlook 2007 is using inches in the reply separator rather than centimeters. The separator is otherwise Identical. What a strange thing to localize. I'm guessing it uses whatever it thinks the preferred units for page margins are.
2016-05-23 17:23:53 -04:00
Sergey Obukhov
44e70939d6 fixes mailgun/talon#89 2016-05-17 15:31:01 -07:00
Doug Keen
333beb94af Fix #85 (exception when stripping gmail quotes) 2016-04-04 14:22:50 -07:00
Sergey Obukhov
02adf53ab9 fixes mailgun/talon#12 2016-03-04 13:14:50 -08:00
defkev
743b452daf Added Zimbra HTML quotation extraction 2016-02-21 16:56:52 +01:00
Sergey Obukhov
31803d41bc fixes mailgun/talon#18 2016-02-19 19:07:10 -08:00
Sergey Obukhov
999e9c3725 fixes mailgun/talon#19 2016-02-19 17:53:52 -08:00
Sergey Obukhov
ce65ff8fc8 Merge pull request #71 from clara-labs/ms-2010-issue
First pass at handling issue with ms outlook 2010 with unenclosed quo…
2015-12-18 19:14:13 -08:00
Sergey Obukhov
3d9ae356ea add more tests, make standard reply tests more relaxed 2015-12-18 18:56:41 -08:00
Carlos Correa
f688d074b5 First pass at handling issue with ms outlook 2010 with unenclosed quoted text. 2015-12-10 19:16:13 -08:00
Sergey Obukhov
41457d8fbd fixes mailgun/talon#38 mailgun/talon#20 2015-12-05 00:37:02 -08:00