Sergey Obukhov
a9719833e0
html with comment that has no parent crashes html_tree_to_text
2016-08-12 17:40:12 -07:00
Sergey Obukhov
69a44b10a1
Merge branch 'master' into sergey/empty-html
2016-08-11 23:58:11 -07:00
Sergey Obukhov
4b953bcddc
fixes mailgun/talon#103 keep newlines when parsing html quotations
2016-08-11 20:17:37 -07:00
Sergey Obukhov
315eaa7080
if html stripped of quotations does not have readable text, fall back to unparsed html
2016-08-11 19:55:23 -07:00
Sergey Obukhov
21e9a31ffe
add test
2016-08-09 17:15:49 -07:00
Sergey Obukhov
a21ccdb21b
consider word capitalized only if it is camel case - not all upper case
2016-07-19 17:37:36 -07:00
Umair Khan
cefbcffd59
Make tests/text_quotations_test.py compatible with Python 3.
2016-07-13 14:45:26 +05:00
Umair Khan
622a98d6d5
Make utils compatible with Python 3.
2016-07-13 13:00:24 +05:00
Umair Khan
555c34d7a8
Make sure html_to_text processes bytes
2016-07-13 11:18:10 +05:00
Umair Khan
da998ddb60
Run modernizer on the code.
2016-07-12 17:25:46 +05:00
Sergey Obukhov
44e70939d6
fixes mailgun/talon#89
2016-05-17 15:31:01 -07:00
Doug Keen
333beb94af
Fix #85 (exception when stripping gmail quotes)
2016-04-04 14:22:50 -07:00
Sergey Obukhov
02adf53ab9
fixes mailgun/talon#12
2016-03-04 13:14:50 -08:00
Sergey Obukhov
31803d41bc
fixes mailgun/talon#18
2016-02-19 19:07:10 -08:00
Sergey Obukhov
999e9c3725
fixes mailgun/talon#19
2016-02-19 17:53:52 -08:00
Sergey Obukhov
ce65ff8fc8
Merge pull request #71 from clara-labs/ms-2010-issue
First pass at handling issue with ms outlook 2010 with unenclosed quo…
2015-12-18 19:14:13 -08:00
Sergey Obukhov
3d9ae356ea
add more tests, make standard reply tests more relaxed
2015-12-18 18:56:41 -08:00
Carlos Correa
f688d074b5
First pass at handling issue with ms outlook 2010 with unenclosed quoted text.
2015-12-10 19:16:13 -08:00
Sergey Obukhov
41457d8fbd
fixes mailgun/talon#38 mailgun/talon#20
2015-12-05 00:37:02 -08:00
Sergey Obukhov
2c416ecc0e
Merge pull request #62 from tgwizard/better-support-for-scandinavian-languages
Add better support for Scandinavian languages
2015-10-14 21:48:10 -07:00
Adam Renberg
14e3a0d80b
Add better support for Scandinavian languages
This is a port of https://github.com/tictail/claw/pull/6 by @simonflore.
2015-09-21 21:42:01 +02:00
Adam Renberg
fcd9e2716a
Add fix for Apple Mail email format
Where they have an initial > on the "date line".
2015-09-21 21:33:57 +02:00
Sergey Obukhov
ae508fe0e5
fixes mailgun/talon#26
2015-09-21 09:51:26 -07:00
Sergey Obukhov
d328c9d128
fixes mailgun/talon#43
2015-09-18 05:19:59 -07:00
Sergey Obukhov
ad09b18f3f
fixes mailgun/talon#52
2015-09-18 04:47:23 -07:00
Sergey Obukhov
15976888a0
use precise encoding when converting to unicode
2015-09-11 10:38:28 -07:00
Sergey Obukhov
385285e5de
process first 1000 lines for long messages, support for German and Dutch
2015-09-11 06:17:14 -07:00
Sergey Obukhov
cc98befba5
Merge pull request #50 from Easy-D/preserve-regular-blockquotes
Preserve regular blockquotes
2015-09-11 04:49:36 -07:00
Easy-D
ed6b861a47
add failing test that shows how regular blockquotes are removed
2015-07-16 21:24:49 +02:00
Oliver Song
7ea773e6a9
Fix iphone test
2015-07-02 21:49:09 -04:00
Scott MacVicar
8b1f87b1c0
Get this building and passing tests
Changes:
* add .DS_Store to .gitignore
* Decode base64 encoded emails for tests
* Pick a version of scikit since the pickled classifiers are based on that
* Add missing numpy and scipy dependencies
2015-07-02 21:49:09 -04:00
Alex Riina
215e36e9ed
allow higher version of regex library
2015-07-02 21:49:09 -04:00
Alex Riina
e3ef622031
remove unused regex
2015-07-02 21:49:09 -04:00
Alex Riina
f16760c466
Remove flanker and replace PyML with scikit-learn
I never was actually able to successfully install PyML but the source-forge
distribution and lack of python3 support convinced me that scikit-learn would
be a fine substitute. Flanker was also difficult for me to install and seemed
only to be used in the tests, so I removed it as well to get into a position
where I could run the tests. As of this commit, only one is not passing
(test_standard_replies with android.eml) though I'm not familiar with the `email`
library yet.
2015-07-02 21:49:09 -04:00
Alex Riina
b36287e573
clean up style and extra imports
2015-07-02 21:49:09 -04:00
Alex Riina
4df7aa284b
remove extra imports
2015-07-02 21:49:09 -04:00
Simon
072a440837
Test cases for new patterns
2015-04-15 13:55:17 +02:00
szymonsobczak
3c9ef4653f
some more french formats
2015-02-24 12:18:54 +01:00
szymonsobczak
b16060261a
support some polish and french formats
2015-02-24 11:39:12 +01:00
Jeremy Schlatter
3768d7ba31
make a separate test function for each language
2014-12-30 14:41:20 -08:00
Jeremy Schlatter
613d1fc815
Add extra splitter expressions and tests for German and Danish.
Also some refactoring to make it a bit easier to add more languages.
2014-12-23 15:44:04 -08:00
Sergey Obukhov
170f11038b
initial commit
2014-07-23 21:12:54 -07:00