Use regex match to detect Outlook 2007, 2010, 2013
I encountered a variant of the Outlook quotation markup with a space after the semicolon. To avoid multiplying the number of rules, I replaced the exact-match rules with a single regex match (I found how to do this here: https://stackoverflow.com/a/34093801/211204). I documented all the different variants as cleanly as I could.
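As a rough illustration of what the single pattern covers, the sketch below checks the regular expression in isolation using Python's standard `re` module. It does not reproduce the lxml XPath call from the commit. The helper name `is_outlook_splitter_style` and the sample style strings are hypothetical, and the sketch assumes the pattern is applied as an unanchored search.

```python
import re

# Pattern text copied from the XPath expression in this commit.
# The dots in "1.0pt" and "3.0pt" are left unescaped, as in the commit,
# so they match any single character.
SPLITTER_STYLE = re.compile(
    r"border:none; ?border-top:solid #(E1E1E1|B5C4DF) 1.0pt; ?"
    r"padding:3.0pt 0(in|cm) 0(in|cm) 0(in|cm)"
)


def is_outlook_splitter_style(style):
    """Return True if a style attribute value matches the splitter pattern.

    Hypothetical helper for illustration only; it is not part of the commit.
    """
    return SPLITTER_STYLE.search(style) is not None


# Hypothetical sample values: with and without the space after the
# semicolon, plus two values that should not match.
samples = [
    "border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm",
    "border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in",
    "border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0pt 0pt 0pt",
    "padding-top: 5px; border-top-color: rgb(229, 229, 229);",
]

for style in samples:
    print(is_outlook_splitter_style(style), style)
```

The optional `" ?"` after each semicolon is what lets one rule cover both spacing variants, and the two alternations cover the color and unit differences between Outlook versions and locales.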
@@ -87,23 +87,24 @@ def cut_gmail_quote(html_message):
 
 def cut_microsoft_quote(html_message):
     ''' Cuts splitter block and all following blocks. '''
+    #use EXSLT extensions to have a regex match() function with lxml
+    ns = {"re": "http://exslt.org/regular-expressions"}
+
+    #general pattern: @style='border:none;border-top:solid <color> 1.0pt;padding:3.0pt 0<unit> 0<unit> 0<unit>'
+    #outlook 2007, 2010 (international) <color=#B5C4DF> <unit=cm>
+    #outlook 2007, 2010 (american) <color=#B5C4DF> <unit=in>
+    #outlook 2013 (international) <color=#E1E1E1> <unit=cm>
+    #outlook 2013 (american) <color=#E1E1E1> <unit=in>
+    #also handles a variant with a space after the semicolon
     splitter = html_message.xpath(
-        #outlook 2007, 2010 (international)
-        "//div[@style='border:none;border-top:solid #B5C4DF 1.0pt;"
-        "padding:3.0pt 0cm 0cm 0cm']|"
-        #outlook 2007, 2010 (american)
-        "//div[@style='border:none;border-top:solid #B5C4DF 1.0pt;"
-        "padding:3.0pt 0in 0in 0in']|"
-        #outlook 2013 (international)
-        "//div[@style='border:none;border-top:solid #E1E1E1 1.0pt;"
-        "padding:3.0pt 0cm 0cm 0cm']|"
-        #outlook 2013 (american)
-        "//div[@style='border:none;border-top:solid #E1E1E1 1.0pt;"
-        "padding:3.0pt 0in 0in 0in']|"
+        #outlook 2007, 2010, 2013 (international, american)
+        "//div[@style[re:match(., 'border:none; ?border-top:solid #(E1E1E1|B5C4DF) 1.0pt; ?"
+        "padding:3.0pt 0(in|cm) 0(in|cm) 0(in|cm)')]]|"
         #windows mail
         "//div[@style='padding-top: 5px; "
         "border-top-color: rgb(229, 229, 229); "
         "border-top-width: 1px; border-top-style: solid;']"
+        , namespaces=ns
     )
 
     if splitter: