[reportlab-users] Incorrect character composition
Glenn Linderman
v+python at g.nevcal.com
Tue Apr 21 06:50:26 EDT 2015
On 4/21/2015 2:51 AM, Robin Becker wrote:
> Glenn,
>
> my reading of the control sequence(s) is that these glyphs are being
> individually positioned in PDF; I see 12 separate Tm operators.
I agree.
> Ideally we should see a single BT with a string containing 14 bytes
> which would imply that acrobat handles all the glyph positioning.
I think we are on the same wavelength here, but I think you meant to say
"Adobe Reader (or other PDF display tool)" where you said "Acrobat". As
things stand, "Acrobat" (or another PDF generation tool) is doing all
the positioning and encoding it into the PDF file.
The below seems to refer to the Nuance-generated file; the Acrobat file
used hex codes.
"Ideally", of course, refers to the way it should work if the PDF
viewer's renderer were responsible for positioning combined glyphs. Of
course, if it were, it should also be responsible for the kerning, and
then you wouldn't be able to do right justification very well: the text
width would have to be predicted in one place and matched in the other.
So I think the PDF technique, in which the viewer only converts curves
to pixels and follows the PDF creator's instructions as to where those
curves should be placed, actually produces more consistent results
across platforms and devices, as much as it hurts to have to do the
calculations for the Td or Tm parameters when generating the PDF.
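To illustrate the kind of calculation I mean, here is a rough Python
sketch of working out a Tm operator for right-justified text. The
advance widths, the function name and the coordinates are all made up
for the example; a real generator would read the widths from the font's
metrics and would also account for kerning.

```python
# Hypothetical glyph advance widths in 1/1000 em units.
# These numbers are invented for illustration, not taken from a real font.
ADVANCES = {"m": 778, "a": 500, "n": 556}


def right_justified_tm(text, font_size, right_edge, baseline_y):
    """Return a PDF Tm operator line so that `text` ends at `right_edge`.

    The creator has to sum the glyph advances itself, because the viewer
    only draws glyphs where the content stream tells it to.
    """
    width = sum(ADVANCES[ch] for ch in text) * font_size / 1000.0
    x = right_edge - width
    # Tm takes a full matrix "a b c d e f"; 1 0 0 1 means no scale or skew.
    return f"1 0 0 1 {x:.2f} {baseline_y:.2f} Tm"


if __name__ == "__main__":
    print(right_justified_tm("man", 10, 500, 700))
```

The same sum has to be repeated for every line of text, which is why the
creator, not the viewer, ends up owning the layout.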
>
> I believe that the text strings are actually using two bytes per
> glyph; the map looks like
>
> 6 beginbfchar
> <006d> <00e3>
> <047a> <0303>
> <0690> <0186>
> <0699> <0190>
> <0727> <0254>
> <072d> <025b>
> endbfchar
Ah, yes, I missed looking at the map, so I was unaware that it was
legal to use the character codes themselves in the <>; I thought <> was
only for hex codes. Then again, that was just from observing various
PDF files, not from the spec, and I've not tried to understand very
many.
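For anyone following along, the map above can be read mechanically. The
following is only a sketch in Python, with function names of my own
choosing, that parses the quoted bfchar entries and decodes a
two-bytes-per-glyph string. It assumes the right-hand values are UTF-16BE
code units, as in a ToUnicode CMap, and it ignores bfrange sections
entirely.

```python
import re

# The bfchar section exactly as quoted in this thread.
BFCHAR = """\
6 beginbfchar
<006d> <00e3>
<047a> <0303>
<0690> <0186>
<0699> <0190>
<0727> <0254>
<072d> <025b>
endbfchar
"""


def parse_bfchar(text):
    """Map each source code (as an int) to its Unicode string."""
    mapping = {}
    for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", text):
        # Destination values are assumed to be UTF-16BE code units.
        mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode(raw, mapping):
    """Decode a byte string that uses two bytes per glyph."""
    codes = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    # Unknown codes become U+FFFD so that gaps stay visible.
    return "".join(mapping.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    cmap = parse_bfchar(BFCHAR)
    # 006d then 047a: a-tilde followed by a combining tilde.
    print(ascii(decode(b"\x00m\x04z", cmap)))
```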
>
> so the byte strings required correspond to the first of each pair.
>
> 006d = 00 m = \000m
> 047a = 04 z = ^Dz the tilde
> 06?? = 06 ?? = ^f?
> 0727 = 07 ' = ^G'
> 072d = 07 - = ^G-
>
> etc etc. My mailer can't actually cope with the odd characters in the
> 06 lines.
Understood... my mailer seemed to drop those control characters, also.
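Since neither mailer shows the 06 lines properly, a small Python sketch
can print the byte pairs without relying on the mail client. The list of
codes is copied from the map quoted above; the helper name is just
something I picked for the example.

```python
# Source codes from the bfchar map quoted earlier in this message.
CODES = [0x006D, 0x047A, 0x0690, 0x0699, 0x0727, 0x072D]


def as_bytes(code):
    """Return the two-byte, big-endian form of a glyph code."""
    return code.to_bytes(2, "big")


if __name__ == "__main__":
    for code in CODES:
        # repr() shows control and high bytes as escapes, so nothing is lost.
        print(f"{code:04x} -> {as_bytes(code)!r}")
```

The second byte of 0690 and 0699 is above 0x7f, which would explain why
a mailer expecting plain ASCII mangles those lines.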