bug-unrtf

[bug-unrtf] rtf with subindex to html conversion


From: Jordi Miguel
Subject: [bug-unrtf] rtf with subindex to html conversion
Date: Wed, 13 Oct 2010 10:50:47 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100913 Icedove/3.0.7

Hello,

We've been using unrtf 0.21.1 and have found what we suppose is a bug. When this RTF string is passed in for conversion to HTML:

{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Univers;}{\f1\fnil\fcharset128 Arial Unicode MS;}} {\colortbl ;\red0\green0\blue0;} \viewkind4\uc1\pard\tx0\lang3082\f0\fs20 regi\'f3 \cf1\f1\fs18 C\sub\'83\'c8\cf0\f0\fs20 \par }

the output generated is:

<!DOCTYPE html PUBLIC -//W3C//DTD HTML 4.01 Transitional//EN>
<html>
<head>
<meta http-equiv=content-type content=text/html charset=utf-8>
<!-- Translation from RTF performed by UnRTF, version 0.21.1 -->
<!--font table contains 2 fonts total-->
</head>
<body><font face=Univers><font size=2>regi&#243; <font color=#000000><font face=Arial Unicode MS><span style=font-size:9pt>C<sub>&#402;&#200;<font color=#000000><font face=Univers><br>
</font></font></sub></span></font></font></font></font></body>
</html>

The subscript is parsed incorrectly and shows the wrong characters. OpenOffice, used for the same conversion, generates the correct output:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
<TITLE></TITLE>
<META NAME="GENERATOR" CONTENT="OpenOffice.org 2.4  (Unix)">
<META NAME="CREATED" CONTENT="0;0">
<META NAME="CHANGED" CONTENT="0;0">
<STYLE TYPE="text/css">
<!--
@page { size: 8.5in 11in; margin-right: 1.25in; margin-top: 1in; margin-bottom: 1in }
                P { margin-bottom: 0.08in }
        -->
</STYLE>
</HEAD>
<BODY LANG="en-US" DIR="LTR" STYLE="border: none; padding: 0in">
<P STYLE="margin-bottom: 0in"> <FONT SIZE=2><SPAN LANG="es-ES">regi&oacute;
</SPAN></FONT><FONT COLOR="#000000"><FONT FACE="Arial Unicode MS, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><SPAN LANG="es-ES">C</SPAN></FONT></FONT></FONT><FONT COLOR="#000000"><SUB><FONT FACE="Arial Unicode MS, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><SPAN LANG="es-ES">&kappa;</SPAN></FONT></FONT></SUB></FONT></P>
</BODY>
</HTML>


Why does this problem appear?
Does the unusual encoding of the original RTF string matter?
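
One possible explanation, offered as a guess rather than a confirmed diagnosis of unrtf's code: font f1 is declared with \fcharset128, which the RTF specification assigns to Shift-JIS, while the document default is \ansicpg1252. The short Python sketch below decodes the two escaped bytes \'83\'c8 both ways. Read one byte at a time as Windows-1252 they give the code points 402 and 200 seen in the unrtf output; read together as a single Shift-JIS character they give the kappa that OpenOffice produces. The variable names are only illustrative.

```python
# Bytes from the RTF escapes \'83\'c8 that follow \sub in the sample string.
raw = bytes([0x83, 0xC8])

# Interpretation 1: two single-byte characters in Windows-1252 (\ansicpg1252).
as_cp1252 = raw.decode("cp1252")

# Interpretation 2: one double-byte Shift-JIS character
# (\fcharset128 on font f1, which is active at that point in the text).
as_sjis = raw.decode("shift_jis")

print([ord(c) for c in as_cp1252])  # [402, 200], i.e. &#402;&#200;
print(as_sjis)                      # Greek small letter kappa (U+03BA)
```

If this reading is right, the bytes themselves are fine and the difference lies in whether the converter honours the charset of the current font when decoding \'xx escapes.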

And most importantly, how can we solve it? We are developers, so if you can point out where the problem is in the code and what fix to apply, we're happy to help.


Thanks,
Jordi


