xmlformat bugs

Ruby version is slower than the Perl version, a difference that shows up
particularly for larger documents. I haven't done any profiling to determine
which parts of the program account for the differences.

----------
Within normalized elements, the line-wrapping algorithm preserves inline
tags, but it doesn't take any internal line-breaks within those tags into
account in its line-length calculations. Consider this element:

<para>
This is text with an inline <inline-element attr="This is
an attribute
value">element</inline-element> in the middle.
</para>

The opening <inline-element> tag is considered to have a length
equal to its total number of characters, even though it spans three
lines. The line-wrapping algorithm could be made more complex to take
the line-breaks into account. I haven't bothered, and may never bother.
If such a change is made, no indenting should be applied that would
change the attribute value.
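To make the difference concrete, here is a minimal Ruby sketch that
measures the opening tag from the example both ways. The helper names
total_length and line_lengths are hypothetical and are not taken from
xmlformat; the second one only illustrates what a line-aware
calculation might look at.

```ruby
# Hypothetical helpers for illustration; neither is part of xmlformat.

# Length as described above: every character of the tag counts,
# including the embedded newlines.
def total_length(tag)
  tag.length
end

# One possible line-aware alternative: measure each physical line of
# the tag separately.
def line_lengths(tag)
  tag.split("\n", -1).map(&:length)
end

# The opening tag from the example, with its two internal line-breaks.
tag = %(<inline-element attr="This is\nan attribute\nvalue">)

puts total_length(tag)
p line_lengths(tag)
```

The total is much larger than any single physical line of the tag,
which is why the wrapped output can look shorter than expected.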

----------
Line-wrapping length calculations don't take into account the possibility
that text from a different element may occur on the same line if break
values are set to zero. For example, with a wrap-length of 15, you could
end up with output like this:

<listitem><para>This is a line
of text.</para></listitem><listitem><para>This is a line
of text.</para></listitem>

The middle line has 22 characters of text, more than the wrap-length
of 15.

This also shows that wrap-length doesn't take into account the length of
tags of enclosing blocks on the same line, which can also be considered a
bug.

Fix: Set all your break values > 0.
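The amount of text on each line of the example output above can be
checked with a short Ruby sketch. The helper text_length is
hypothetical, and its tag-stripping regex is only adequate for this
sample; it is not a general XML parser.

```ruby
# The example output, produced with a wrap-length of 15.
output = <<~XML
  <listitem><para>This is a line
  of text.</para></listitem><listitem><para>This is a line
  of text.</para></listitem>
XML

# Hypothetical helper: strip anything that looks like a tag and count
# the characters that remain.
def text_length(line)
  line.gsub(/<[^>]*>/, "").length
end

lengths = output.lines.map { |line| text_length(line.chomp) }
p lengths
```

Only the middle line goes over the limit, because it carries the tail
of one paragraph and the head of the next.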

----------
Normalization converts runs of spaces to single spaces. This means that
if you write two spaces after periods in text, normalization will
convert them to single spaces. Even if normalization didn't do that,
the two spaces would be lost if line-wrapping is enabled and they occur
at a line break.

I don't know if this is really a bug, but it's something to be aware of.
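The effect on sentence spacing can be illustrated with a small Ruby
sketch. The function collapse_spaces is a hypothetical stand-in for the
space-collapsing step; xmlformat's own normalization code may differ in
detail.

```ruby
# Hypothetical stand-in for the space-collapsing step described above.
def collapse_spaces(text)
  text.gsub(/ {2,}/, " ")
end

before = "First sentence.  Second sentence.  Third."
after = collapse_spaces(before)

puts before
puts after
```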

----------
Doesn't recognize multi-byte files. In some cases, you can work around this.
For example, an editor might save a file as Unicode even when the document
contains only ASCII characters. Re-save the file as an ASCII file (or
in some single-byte encoding such as ISO-8859-1) and it should work.
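If re-saving from the editor is inconvenient, the conversion can be
scripted. The following Ruby sketch assumes that "Unicode" means UTF-16
with a byte-order mark, which is only a guess about what the editor
wrote. The helper utf16_to_ascii is hypothetical and is not part of
xmlformat.

```ruby
# Hypothetical helper, not part of xmlformat. Assumes the input is
# UTF-16 with a byte-order mark; other encodings need other handling.
def utf16_to_ascii(bytes)
  bytes = bytes.b  # work on a binary copy of the input

  encoding =
    if bytes.start_with?("\xFF\xFE".b)
      Encoding::UTF_16LE
    elsif bytes.start_with?("\xFE\xFF".b)
      Encoding::UTF_16BE
    else
      raise ArgumentError, "no UTF-16 byte-order mark found"
    end

  # Drop the 2-byte BOM, then decode the remainder.
  text = bytes.byteslice(2..-1).force_encoding(encoding).encode(Encoding::UTF_8)

  # The workaround only applies when the document is pure ASCII.
  raise ArgumentError, "document contains non-ASCII characters" unless text.ascii_only?

  text.encode(Encoding::US_ASCII)
end

# Build a small UTF-16LE sample in memory, BOM included.
sample = "\uFEFF<a>hi</a>".encode(Encoding::UTF_16LE)
converted = utf16_to_ascii(sample)
puts converted
```

For a real file, something like
File.binwrite("out.xml", utf16_to_ascii(File.binread("in.xml")))
would do the same thing; both file names are placeholders.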