(*** This bug was imported into bugs.kde.org ***)

Package: kmail
Version: 1.4.1 (using KDE 3.0.0 )
Severity: wishlist
Installed from: Red Hat Linux 7.2.93
Compiler: gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
OS: Linux (i686) release 2.4.18-3
OS/Compiler notes:

I set KMail to display all messages as plaintext but this makes html messages quite unreadable. Would it be possible to strip all html tags from a message when I view it as plaintext? It would also be very convenient if html tags could be stripped when I reply to an html message.

(Submitted via bugs.kde.org)
(Called from KBugReport dialog)
*** Bug 47507 has been marked as a duplicate of this bug. ***
*** Bug 57776 has been marked as a duplicate of this bug. ***
*** Bug 107802 has been marked as a duplicate of this bug. ***
*** Bug 130132 has been marked as a duplicate of this bug. ***
I agree with this feature request; it is much needed. I think it would be great to have three viewing options: text, strip HTML (simply ignoring the tags), and HTML. Regards
How often do you get mails with only HTML? I walked through my mail and found only mails containing both plain and HTML parts.
Just often enough to be annoying.
For example, subscribe to www.chistes.com and you will receive a mail with a joke in Spanish every day. It is HTML only. Below is a paste of a complete email; it contains no important data. It is not hard to strip (just erase everything within "<??>"), but I don't know how or where to hook in a Perl parser. ---------------------------------------------------------------------- Return-Path: <noreply@chistes.com> X-Original-To: my@email Delivered-To: my@email Received: from emls3.wwz.com (emls3.wwz.com [208.237.254.71]) by schweb.com.ar (Postfix) with ESMTP id A0B2B47054 for <my@email>; Sun, 7 Dec 2008 03:34:14 -0300 (ART) From: "Chistes.com" <noreply@chistes.com> To: <my@email> Subject: Su Chiste Del Dia - Christian Content-Type: text/html Date: Sun, 7 Dec 2008 00:27:52 -0600 Message-Id: <20081207063416.A0B2B47054@schweb.com.ar> Status: R X-Status: NC X-KMail-EncryptionState: X-KMail-SignatureState: X-KMail-MDN-Sent: <html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><meta http-equiv="Content-Language" content="es"><title>Chistes.com te regala una Sonrisa</title><base href="http://www.chistes.com/" target="_blank"> <STYLE type="text/css"><!-- BODY{BORDER-RIGHT:medium none;BORDER-TOP:medium none;FONT-SIZE:x-small;MARGIN: 5px 5px 10px 0px;BORDER-LEFT:medium none;COLOR:black; BORDER-BOTTOM:medium none;FONT-FAMILY:Arial,'Times New Roman',Verdana,Helvetica; BACKGROUND-COLOR:#FFFF94;TEXT-DECORATION:none;Font-color:Black;Bgcolor:#FFFF94} .colorTable{bgcolor:#FFFF94;FONT-SIZE:x-small;BACKGROUND-COLOR:#FFFF94} P.chiste:first-letter{font-size:220%;float:left} P.consejo:first-letter{font-size:160%;float:left} A{COLOR:royalblue;TEXT-DECORATION:none} A:link{COLOR:royalblue;TEXT-DECORATION:none} A:visited{COLOR:royalblue;TEXT-DECORATION:none} A:hover{COLOR:royalblue;TEXT-DECORATION:underline} A:active{COLOR:#6699cc;TEXT-DECORATION:none} P{COLOR:black;FONT-SIZE:x-small} .footer{FONT-SIZE:xx-small;COLOR:black;FONT-STYLE:italic} 
.header{FONT-SIZE:xx-small;TEXT-ALIGN:center;Font-color:darkblue} .header A{COLOR:darkblue;TEXT-DECORATION:none} .header A:link{COLOR:darkblue;TEXT-DECORATION:none} .header A:visited{COLOR:darkblue;TEXT-DECORATION:none} .header A:hover{BORDER-RIGHT:navy thin;BORDER-TOP:navy thin;BORDER-LEFT:navy thin; COLOR:white;BORDER-BOTTOM:navy thin;BACKGROUND-COLOR:steelblue;TEXT-DECORATION:none} .header A:active{COLOR:darkblue;BACKGROUND-COLOR:#ffcc66;TEXT-DECORATION:none} ---> </STYLE></head><body topmargin="4" leftmargin="0" text="#333333"><table class="colorTable" border="0" align="center" cellspacing="0" width="410"><tr><td height="65" nowrap align="center" valign="top" width="518"> <table border="0" cellpadding="2" cellspacing="4" width="518" bgcolor="#CCCC00"><tr><td><a href="http://monitoreointernet.com/?xcmpx=2249"> <FONT SIZE="6" COLOR="#003399">Monitoreo Internet</FONT><FONT COLOR="#0000FF"><br><code><b>http://MonitoreoInternet.com</b></code></FONT></a></td><td width="227"><b><center><a href="http://monitoreointernet.com/?xcmpx=2249"><font size="2" color="#0000FF">Herramienta de Monitoreo GRATIS de disponibilidad, desempeño de tu sitio Web </font></a></center></b></td></tr></table> </td></tr><tr><td nowrap height="10" valign="middle" align="right" width="518" class="header"><a href="scripts/subscription/chistes_subscribe.asp" target="_blank">Chistes en tu email Gratis </a>| <a href="chistedeldia.asp"> Chiste del día</a> | <a href="LosMejores.asp">Los Mejores Chistes</a> | <a href="publicidad/publicidadchistescorreo.htm">Comprar Publicidad<br><br></a>Le sugerimos que ingrese nuestra dirección email, desde la que le enviamos diariamente<br>nuestra carta, a su lista de contacto, libreta de direcciones, lista segura u otra semejante.<br>De esta forma su proveedor de correo considerará nuestros email como deseados, seguros,<br>evitando que sean tratados como spaming y dejados en una bandeja distinta<br>de la bandeja entrada de su correo</td></tr></table><table 
class="colorTable" align="center" border="2" width="520" bordercolor="#CCCC00" bordercolorlight="#CCCC00" bordercolordark="#CCCC00" cellspacing="0" cellpadding="5"><tr><td valign="top" width="122" align="left" height="182" > <table border="1" cellpadding="2" cellspacing="2" width="153" bgcolor="#F0F0F0" bordercolor="#F0F0F0"><tr><td align="center" width="141"><font color="blue" size="1"><a href="publicidad/publicidadchistescorreo.htm"><b>Auspiciado Por</b></a></font></td></tr> <tr><td bordercolor="#B4D0DC" width="141"><a href="http://www.ConsejosSabios.com" target="_blank"><font size="1" color="blue"><U>ConsejosSabios.com</U></font><br><font color="#6F6F6F" size="1"> Le enviamos gratuitamente ConsejosSabios via CorreoE, los que pueden estimular su crecimiento personal diariamente.</font></a></td></tr><tr><td bordercolor="#B4D0DC" width="141"> <p align="center"><a href="publicidad/publicidadchistescorreo.htm"><font size="1">Sea un Auspiciador.<br>Para más detalles</a></p></font></td></tr></table> <br><div align="center"><a href="http://add.my.yahoo.com/rss?url=http://www.chistes.com/XML/ChisteDelDia.xml.asp" target="_blank"><img src="http://eur.i1.yimg.com/eur.yimg.com/i/es/yg/addto.gif" width="91" height="17" border="0"><br><font size="1">Agrega<br>El Chistes Del Día<br>en MI Yahoo!</font></a> </div></td><td valign="top" align="left" height="186" width="374"><p align="center">Si usted desea leer este mensaje con más claridad y tamaño desde su Browser<a href="ChisteDelDia.asp"> apriete aquí</a> <p><b>Chistes.com Nº 343279</b><br><i>Por Francisca - Chipre<br><br></i>Clasificación: <b>Hombres </b></p> <p class="chiste" align="left">¿Cómo es que sales con Juana, con lo fea que es?<BR> Es que tiene algo distinto que no había notado en ninguna mujer.<BR> -¿Y que es?<BR> Que quiere salir conmigo.<BR> <br><br><b><a href="Correo/SendJoke.asp?ID_Chiste=343279&Email=my@email" target="_blank">Envía este chiste a tus amigos GRATIS.</a></b></td></tr></table><table 
class="colorTable" align="center" border="2" width="520" bordercolor="#CCCC00" bordercolorlight="#CCCC00" bordercolordark="#CCCC00" cellspacing="0" cellpadding="5"><tr><td width="506"><p align="center"><b>El Consejo de Día de <a href="http://www.consejossabios.com/Clasificaciones.asp" target="_blank">ConsejosSabios.com</a></b> # 7303 <p><b>Clasificación: </b>Mascotas </p><p><b><u>Cuidado con el chocolate y tus mascotas</u></b></p><p class="consejo">El chocolate es venenoso para los perros, gatos y hurones.</p> <b>Gracias a: </b>Hugo - Yugoslavia<p><font size="2"><center><a href="http://www.consejossabios.com/Clasificaciones.asp">Para ver más consejos</a></center><br style="font-size:8px"><center><a href="http://www.consejossabios.com/scripts/subscription/consejo_subscribe.asp">Para suscribir a un amigo para que reciba Consejos Sabios</a></center></font></p></td></tr><tr><td align="center" colspan="2" height="341" width="506"> <p>Haga feliz a un amigo(a) con el regalo para toda ocasión. Para regalar un Chiste gratis todos los días <a href="http://www.chistes.com/scripts/subscription/chistes_subscribe.asp" target="_top">Presione Aquí</a>.<br><form method="POST" action="http://www.chistes.com/scripts/subscription/chistes_unsubscribe.asp?" target="_blank"><p align="center"><font face="Verdana, Arial, Helvetica" size="2">Si Ud. 
desea cancelar su suscripción:</font><br><input type="text" name="email" size="31" value="my@email" readonly><input type="submit" value="Cancelar suscripción" name="B3" style="font-family: Verdana, Arial, Helvetica; font-size: 8pt"></p></form><p><a href="http://www.chistes.com/scripts/subscription/chistes_unsubscribe.asp">si tienes problemas con la cancelación haga click Aquí<br>y siga las instrucciones.</a><p><a href="/scripts/subscription/chistes_subscribe.asp">Para Suscribirse Gratis a esta publicación.<br></a><p class=footer> © 1995-2008 Publicación diaria de Chistes.com un servicio <a href="http://www.emergency24.com" target="_top">Emergency24,Inc.</a></p><p class=header><a href="PrivacyPolicy.asp">Política de Privacidad </a>| <a href="Publicidad/PublicidadChistesCorreo.htm">Comprar Publicidad</a></p></td></tr></table><center><img height="1" width="30" src="http://read.ajokeaday.com/rj.aspx?s=8&d=20081207&e=my@email" border="0"></center></body></html>
Well, I'm not a KMail developer, and I'd like to know their opinion. Qt has a widget which supports a limited subset of HTML. What is the reason for having a 'plain text' option rather than replacing it with 'simple HTML'? Qt's renderer is very lightweight and surely secure (well, maybe we can just warn the user when he clicks a link). http://doc.trolltech.com/4.4/richtext-html-subset.html Your HTML rendered readably, without forms and background.
KMail could use the DOM feature element.innerText, this works like innerHTML except it strips all HTML/XML tags. So after KHTML has parsed a page/email, you can access the plain-text version by reading body.innerText.
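To illustrate the idea of parsing first and then reading only the text nodes, here is a minimal sketch in Python using the standard `html.parser` module. It is not KHTML's `innerText` and not KMail code; the class name `TextExtractor` and the function `html_to_text` are made up for this example, and the set of tags treated as line breaks is an assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores the contents of <style> and <script>."""

    SKIP = {"style", "script"}                 # contents are never shown
    BLOCK = {"br", "p", "div", "tr", "li"}     # assumed to start a new line

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML document with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Because a real parser is used, entities such as `&amp;` are decoded and style sheets do not leak into the output, which a plain "erase everything between < and >" approach would not guarantee.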
I'm not a developer either. Perhaps there could be a hierarchy; don't interpret anything and show the code, don't interpret anything and don't show the code, high security simple interpretation, moderately secure complex interpretation.
Great idea! I like it. Maybe it is possible to add those 3 or 4 buttons in the header (together with "load external images", instead of the red warning text). It would make it much easier for me to read mails and to switch between text and HTML modes. Cheers, Thomas
A feature such as with lynx -dump would be nice. Lynx converts everything to plain text, but considers HTML formatting, e.g. for list environments. BTW, what is KDE's or Kontact's policy on using external programs like lynx? I.e. it would be easy to check during run-time to see if lynx is available and use it to "render" an HTML text, but fall back to the current behavior if it was not available.
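The run-time check with fallback could look roughly like the following Python sketch. It is only an illustration, not the attached patch: the function name `render_with_lynx` and the timeout value are assumptions, and the fallback here simply returns the HTML source unchanged to stand in for "the current behavior".

```python
import shutil
import subprocess


def render_with_lynx(html, program="lynx", timeout=10):
    """Render HTML to plain text with lynx, or return it unchanged.

    The input is returned as-is when the program is not installed,
    fails, or takes longer than `timeout` seconds.
    """
    path = shutil.which(program)
    if path is None:
        return html  # lynx not available: keep the current behavior

    try:
        result = subprocess.run(
            # -dump: write the formatted page to stdout and exit
            # -stdin: read the document from standard input
            # -force_html: treat the input as HTML
            # -nolist: omit the list of links at the end of the dump
            [path, "-dump", "-stdin", "-force_html", "-nolist"],
            input=html,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return html

    return result.stdout if result.returncode == 0 else html
```

Character-set handling is left to the locale in this sketch; a real integration would have to pass the mail's charset through explicitly.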
Created attachment 62637 [details] Prototype patch to integrate lynx in KMail This is a patch which uses lynx to "render" HTML code to plain text. I had no recent kdepim libraries installed (hacked at DS 2011), but the concept should be clear. If there is interest in such a solution, I will look into refining it.
Is there any way this could be done using QtWebKit? If not, then the program used should be configurable, as some might have another text-mode browser installed. And what is the real use case for this? HTML display only uses HTML elements present in the actual mail and executes no scripts, nor does it access the internet without being told to by the user.
(In reply to comment #15)
> Is there any way this could be done using QtWebKit?

To my knowledge no, as WebKit is made for graphical presentation of HTML code and you would still need some translation step. Using lynx would avoid re-inventing the wheel.

> If not, then the program used should be configurable, as some might have another
> text-mode browser installed.

That would be possible. My patch so far is just a proof of concept and can be expanded. My idea would be to keep it automagic if possible, i.e. automatically falling back to e.g. links2 if lynx is not available.

> And what is the real use case for this?
> HTML display only uses HTML elements present in the actual mail and executes no
> scripts, nor does it access the internet without being told to by the user.

It is not only about malicious HTML code, but also about really badly formatted HTML code such as spam, mails from people with bad taste regarding fonts and colors, purists who despise any formatting at all but like to keep a basic text structure, etc.
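A sketch of how a configured program could be combined with an automatic fallback list, written in Python for illustration. The function name `choose_renderer`, the `configured` setting and the order of the default candidates are assumptions; the command-line options each program needs to dump text are deliberately left out.

```python
import shutil

# Assumed default order: try lynx first, then links2.
DEFAULT_CANDIDATES = ("lynx", "links2")


def choose_renderer(configured=None, candidates=DEFAULT_CANDIDATES,
                    which=shutil.which):
    """Return the path of the first available text-mode browser, or None.

    A program configured by the user is tried before the built-in
    candidates. `which` is injectable so the lookup can be tested
    without any browser being installed.
    """
    names = ([configured] if configured else []) + list(candidates)
    for name in names:
        path = which(name)
        if path:
            return path
    return None
```

Returning `None` lets the caller keep the existing plain-text display when no suitable program is installed.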
If I understand correctly, this is for those who get HTML-only mail (against the RFCs) and who wish to overrule the formatting. If this is true, then I won't accept this patch, as it adds more than it solves. If not, could you please clarify the real usage for me?
I wrote this script. It still needs refining for the case where an "<img" tag is split across two lines. It removes every kind of HTML tag, keeping image links and URL links as text. I hope it is useful to the community.

#!/usr/bin/perl
# Script that converts HTML emails to plain text.
# Intended for use with KMail.
#
#
# $block=0 // nothing is carried over from the previous line
# $block=1 // looking for a URL (e.g. <img src=http:xxxx >) whose tag was opened on an earlier line
# $block=2 // looking for the END of an HTML tag that was opened on an earlier line
#
$filtrado  =0;   # whether any line has been filtered
$activo    =1;   # the mail should be parsed
$special   =0;   #
$block     =0;   # a block of lines is being deleted
$block_url ="";  # URL marker carried over to the next line
$block_fin ="";  # end marker carried over to the next line
$str_fin   =">";

# Tags to remove; the end marker for each one is at the same index in @htmltags2.
@htmltags=(
    "<html", "</html", "<body", "</body", "<table", "</table",
    "<tr", "</tr", "<td", "</td", "<hr", "<pre", "</hr",
    "<b", "</b", "<p", "</p", "<!--", "</a",
    "<span", "</span", "<font", "</font",
    "<style", "<script",
);
@htmltags2=(
    ">", ">", ">", ">", ">", ">",
    ">", ">", ">", ">", ">", ">", ">",
    ">", ">", ">", ">", ">", ">",
    ">", ">", ">", ">",
    "</style>", "</script>",
);
@htmlspecial1=( "<a", "<img", );
@htmlspecial2=( " href=", " src=", );
@htmlspecial3=( ">", ">", );

open(IN,"/dev/stdin");
#open(IN,"kmail-samle-mail.txt");
while(<IN>){
    $reng=$_;
    chomp $reng;
    print "\nIN :$reng\n";
    # turn every <br> into a line break
    $reng=~s/<br>/\n/gi;
    if( $block==1 ){
        # a tag was opened earlier and its URL has not been seen yet
        $reng=strip_url($reng,$block_url,$block_fin);
    }
    if( $block==2 ){
        # a tag was opened earlier and has not been closed yet
        $reng=strip_fin($reng,$block_fin);
    }
    if( $block==0 ){
        $tag_n=0;
        foreach $tag (@htmltags){
            $str_fin=$htmltags2[$tag_n];
            while($reng=~/$tag/){
                $reng=strip($tag,$reng,$str_fin);
            }
            $tag_n++;
        }
        $tag_n=0;
        while($reng=~/\<img/){
            my $tag     ="<img";
            my $str_url ="src=";
            my $str_fin =">";
            $reng=strip_special($reng,$tag,$str_url,$str_fin);
        }
    }
#   if( length($reng)>0){
    print "b=$block : $reng\n";
    print "$reng\n";
#   };
#   if( $filtrado>20){
#       exit 0;
#   }
    print " :";
}
close IN;
if( $filtrado!=0){
    # "filtered by kmail-html-strip"
    print "filtrado por kmail-html-strip\n";
}

#######################################################################################
#
# Remove the first occurrence of $tag, up to and including $str_fin.
#
sub strip{
    my $tag     =$_[0];
    my $reng    =$_[1];
    my $str_fin =$_[2];
    my $r_len   =length($reng);
    my $t_len   =length($tag);
    my $s_len   =length($str_fin);
    my $inicio  =0;
    while(substr($reng,$inicio,$t_len) ne $tag && $inicio<= $r_len){$inicio++;}
    $fin=$inicio+$s_len;
    while(substr($reng,$fin,$s_len) ne $str_fin && $fin<= $r_len){$fin++;}
    $reng=substr($reng,0,$inicio).substr($reng,$fin+$s_len);
    if($fin>$r_len && substr($reng,$fin,$s_len) ne $str_fin){
        # the tag is not closed on this line
        $block_fin=$str_fin;
        $block=2;
    }
    $filtrado++;
    return $reng
}

################
#
# For a tag that was opened on an earlier line
#
sub strip_fin{
    my $reng    =$_[0];
    my $str_fin =$_[1];
    my $r_len   =length($reng);
    my $fin     =0;
    my $s_len   =length($str_fin);
#   while(substr($reng,$fin,1) ne ">" && $fin<= $r_len){$fin++;}
    while(substr($reng,$fin,$s_len) ne $str_fin && $fin<= $r_len){$fin++;}
    $reng=substr($reng,$fin+$s_len);
    if($fin>$r_len && substr($reng,$fin,1) ne ">"){
        $block=2;
    }else{
        $block=0;
    }
    return $reng;
}

# Continue a tag opened on an earlier line: keep its URL, drop the rest of the tag.
sub strip_url{
    my $reng    =$_[0];
    my $str_url =$_[1];
    my $str_fin =$_[2];
    my $r_len   =length($reng);
    my $u_len   =length($str_url);
    my $f_len   =length($str_fin);
    my $url_ini =0;
    my $url_fin =0;
    my $tag_fin =0;
    while(substr($reng,$url_ini,$u_len) ne $str_url && $url_ini<= $r_len){$url_ini++;}
    print "tag_ini=--- url_ini=$url_ini url_fin=---- tag_fin=----- r_len=$r_len\n";
    if($url_ini>$r_len && substr($reng,$url_ini,$u_len) ne $str_url){
        # the tag was opened but no URL has been found yet
        $block_url=$str_url;
        $block_fin=$str_fin;
        $block=1;
        $reng="";
#       $reng= substr($reng,0,$tag_ini);
        return $reng;
    }
    $block=0;
    $url_ini=$url_ini+$u_len;
    $url_fin=$url_ini+$u_len+1;
    while($url_fin<=$r_len){
        $tp=substr($reng,$url_fin,1);
        if ( $tp eq "\"" || $tp eq " " || $tp eq ">"){
            last;
        }
        $url_fin++;
    }
    $tag_fin=$url_fin;
    while(substr($reng,$tag_fin,$f_len) ne $str_fin && $tag_fin<= $r_len){$tag_fin++;}
    if($tag_fin>$r_len && substr($reng,$tag_fin,$f_len) ne $str_fin){
        $block_fin=$str_fin;
        $block=2;
    }
    print "tag_ini=--- url_ini=$url_ini url_fin=$url_fin tag_fin=$tag_fin r_len=$r_len\n";
    $reng =substr($reng,$url_ini,$url_fin-$url_ini).substr($reng,$tag_fin);
    return $reng;
}

#########################################
#
# Replace a tag such as <img src=...> with the URL it points to.
#
sub strip_special{
    print "special strip\n";
    my $reng    =$_[0];
    my $tag     =$_[1];
    my $str_url =$_[2];
    my $str_fin =$_[3];
#   my $tag     ="<img";
#   my $str_url ="src=";
#   my $str_fin =">";
    my $delimiter ="";
    my $r_len   =length($reng);
    my $t_len   =length($tag);
    my $u_len   =length($str_url);
    my $f_len   =length($str_fin);
    my $tag_ini =0;
    my $url_ini =0;
    my $url_fin =0;
    my $tag_fin =0;
    while(substr($reng,$tag_ini,$t_len) ne $tag && $tag_ini<= $r_len){$tag_ini++;}
    $url_ini=$tag_ini+$t_len;
    while(substr($reng,$url_ini,$u_len) ne $str_url && $url_ini<= $r_len){$url_ini++;}
    if($url_ini>$r_len && substr($reng,$url_ini,$u_len) ne $str_url){
        # the tag was opened but no URL has been found yet
        $block_url=$str_url;
        $block_fin=$str_fin;
        $block=1;
        $reng= substr($reng,0,$tag_ini);
    } else {
        $url_ini=$url_ini+$u_len;
        $url_fin=$url_ini+$u_len+1;
        while($url_fin<=$r_len){
            $tp=substr($reng,$url_fin,1);
            if ( $tp eq "\"" || $tp eq " " || $tp eq ">"){
                last;
            }
            $url_fin++;
        }
        $tag_fin=$url_fin;
        while(substr($reng,$tag_fin,$f_len) ne $str_fin && $tag_fin<= $r_len){$tag_fin++;}
        if($tag_fin>$r_len && substr($reng,$tag_fin,$f_len) ne $str_fin){
            $block_fin=$str_fin;
            $block=2;
        }
        $reng =substr($reng,0,$tag_ini).substr($reng,$url_ini,$url_fin-$url_ini).substr($reng,$tag_fin+$f_len);
    }
    print "tag_ini=$tag_ini url_ini=$url_ini url_fin=$url_fin tag_fin=$tag_fin r_len=$r_len\n";
#   print "$reng\n";
    return $reng;
}
Torgny in #17, I am going blind. I want a screen reader. It will be easier to get a screen reader working properly if the intermediate step proposed above is already working. Sadly other people _send_ emails that do not conform to RFCs, or my taste. :-( There is nothing that we can do to stop them. I would strongly oppose generating non-standard messages. Do you really object to displaying messages any way that the recipient likes, or even needs? Christian in #18, How is your script supposed to be installed and used? Should it just be copied to somewhere on $PATH?
(In reply to comment #19)
> I want a screen reader. It will be easier to get a screen reader working
> properly if the intermediate step proposed above is already working.
>
> Sadly other people _send_ emails that do not conform to RFCs, or my taste. :-(

Use case for helping screen readers: Yes that is a valid argument in my book. Fixing mails that do not conform to personal taste is not.

> There is nothing that we can do to stop them. I would strongly oppose
> generating non-standard messages. Do you really object to displaying messages
> any way that the recipient likes

- yes, the formatting of a mail is up to the sender to decide, also any reformatting means more code for us to maintain.

>, or even needs?

- no, due to that I'll see what I can do.
Torgny, my script is located in /home/user/bin. I configured a filter rule: all email from a certain sender is passed through a pipe. So the email comes in on STDIN and goes out stripped on STDOUT. Warning: it is experimental but functional. I welcome any help and suggestions about this.
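For comparison, the same pipe-through-filter idea can be sketched in Python with the standard `email` package, so that headers are left alone and only an HTML-only body is rewritten. This is an illustration under stated assumptions, not a replacement for the script above: the names `strip_tags`, `filter_message` and `main` are made up, the tag removal is the naive "erase everything within <...>" approach, and a mail that declares no charset in its Content-Type header will have non-ASCII characters replaced.

```python
import re
import sys
from email import message_from_bytes, policy


def strip_tags(html):
    """Naive tag removal: drop comments, style/script blocks, then every <...>."""
    html = re.sub(r"(?s)<!--.*?-->", "", html)
    html = re.sub(r"(?is)<(style|script)\b.*?</\1\s*>", "", html)
    html = re.sub(r"(?i)<br\s*/?>", "\n", html)
    return re.sub(r"(?s)<[^>]*>", "", html)


def filter_message(raw):
    """Turn an HTML-only mail (bytes) into a plain-text mail (bytes).

    Multipart mails and mails that are not text/html are returned unchanged.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    if msg.is_multipart() or msg.get_content_type() != "text/html":
        return raw
    text = strip_tags(msg.get_content())
    # set_content() replaces the body and the Content-* headers only.
    msg.set_content(text, subtype="plain")
    return msg.as_bytes()


def main():
    """Read one mail from STDIN and write the filtered mail to STDOUT."""
    sys.stdout.buffer.write(filter_message(sys.stdin.buffer.read()))
```

A script used as the filter command would end with a call to `main()`. Working on bytes rather than text avoids guessing the encoding of the incoming mail before its headers have been parsed.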
Git commit a291bd83397eb74d3a2dde3a5d04ba00b6190d7a by Montel Laurent.
Committed on 06/01/2015 at 07:03.
Pushed by mlaurent into branch 'KDE/4.14'.

Fix Bug 44880 - would like to be able to strip tags from html messages

FIXED-IN: 14.12.1

M +15 -6 messageviewer/viewer/objecttreeparser.cpp

http://commits.kde.org/kdepim/a291bd83397eb74d3a2dde3a5d04ba00b6190d7a