bug#31665: `libxml-parse-html-region' doesn't extract text in tables

From: 積丹尼 Dan Jacobson
Subject: bug#31665: `libxml-parse-html-region' doesn't extract text in tables
Date: Mon, 30 Sep 2019 00:52:40 +0800

>>>>> "LI" == Lars Ingebrigtsen <larsi@gnus.org> writes:
LI> 積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

>>>>>>> "LI" == Lars Ingebrigtsen <larsi@gnus.org> writes:
LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?
>> OK, here is a mail from which I scrubbed my personal phone bill details:

LI> What was it you think is missing from that table?  I don't read Chinese,
LI> but there didn't seem to be any text in that table, just a bunch of
LI> images.

It should look like:

|       |親愛的客戶,您好:                   |       |
|       |-------------------------------------|       |
|       |為保障您資料的安全,請輸入密碼開啟附 |       |
|       |加檔案瀏覽您本期的帳單,密碼為『身分 |       |
| [IS1] |證號碼』(英文字母須大寫),營業人客戶 | [IS2] |
|       |不需輸入密碼即可瀏覽。               |       |
|       |若無法開啟附加檔案,請先確認是否已下 |       |
|       |載Acrobat Reader軟體。               |       |
|       |-------------------------------------|       |

|          | [enf201] |
| [end101] | [enl301] |
|          | [enl401] |

|電子帳單Q&A |    費率說明    |  客戶消費資訊  |    線上繳費    |
|  服務專線  |    貼心提醒    |不可不知行動優惠| HiNet好康優惠  |

                         [cht]

But instead all we get is:

From: Phone Co. <p@cht.com.tw>
Subject: Phone Bill
To: "jidanni@jidanni.org" <jidanni@jidanni.org>
Date: Thu, 17 May 2018 12:12:06 +0800
Reply-To: x@cht.com.tw

[1. text/html]

