Community Discussions and Support
Non ANSI in mail merge data

I have used Pegasus for many years and am now on 4.63. Once a year I make use of the mail merge facility to send Christmas cards to our clients. I went to a lot of trouble to keep the graphic small so as not to irritate anyone and feedback has been very positive.


We also personalise them with a brief message, stored with the address and name in the merge data file. In past years the messages have all been in English, but this year a colleague thought it would be extra friendly to provide Chinese and Korean script.


I know I can send Chinese script successfully; I have chosen to use UTF-8 as the default for all my mail.


I also know that the data file successfully contains the Chinese characters, encoded UTF-8.


Sadly the merge fails: the Chinese script is garbled. So I suspect that the code which reads the merge data file assumes ANSI and cannot cope with anything else. I cannot find any guidance on the subject in the help file.


Two questions:


  1. Is it possible to use a non-ANSI data file, i.e. is further experimentation futile?

  2. What encoding will work, or how do I flag the presence of wide character data to the process which reads the merge data?


I guess an answer would require someone to know the OS command used to read strings in from the data file. To keep the answer short and not too onerous, just knowing the function used might allow me to figure it out.


Many thanks to anyone who can throw light on it.



Without knowing anything about how mail merge works, and assuming it doesn't provide a way to submit a Content-Type header indicating the charset you're using, you may want to add a leading UTF-8 BOM to your already UTF-8-encoded text; see here for details: https://en.wikipedia.org/wiki/UTF-8#Byte_order_mark. While I know that Pegasus Mail can handle UTF-8 BOMs when extracting text from MIME messages, I can't tell about its capabilities regarding mail merge. But it definitely cannot read Unicode proper, i.e. 16-bit characters, unless they are UTF-8 encoded. Whether it can deal with UTF-16 BOMs internally I don't know; the HTML renderer I'm providing for Pegasus Mail is capable of processing them, but it only comes into play for displaying messages.
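For reference, a minimal Python sketch of what adding a UTF-8 BOM means at the byte level. Python is used purely for illustration, the helper name and the sample text are made up, and nothing here is specific to Pegasus Mail or known to change how its merge reads the file.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 bytes with the BOM (EF BB BF) unless it is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Hypothetical greeting text, already encoded as UTF-8.
sample = "来自英国".encode("utf-8")
with_bom = add_utf8_bom(sample)

# The file now starts with the three BOM bytes, followed by the text.
print(with_bom[:3].hex(" "))
# The "utf-8-sig" codec strips the BOM again when reading.
print(with_bom.decode("utf-8-sig"))
```

Editors such as Notepad++ do the same thing when a file is saved as "UTF-8-BOM".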


HTH


PS: I would be interested in a "broken" merge message to see what's going wrong with processing it. I guess that Pegasus Mail just provides a mismatching Content-Type header charset like US-ASCII, since UTF-8 actually looks like US-ASCII, so resetting US-ASCII to UTF-8 might fix it. If you can share a sample message with me/us, please send it attached to a new message, using this thread's name as the subject, to <beta-reports [at] pmail.gen.nz>


			Michael
--
IERenderer's Homepage
PGP Key ID (RSA 2048): 0xC45D831B
S/MIME Fingerprint: 94C6B471 0C623088 A5B27701 742B8666 3B7E657C
edited Dec 21 '23 at 4:22 pm

Thanks for the response and sorry for the long silence. I needed to be back at work to make the trials.


I use Notepad++ which makes it very easy to switch around encodings. Using UTF-8-BOM broke the merge completely; no jobs were generated by Pegasus.


I also tried UTF-16 LE BOM because in the past I have found this encoding to work successfully when wanting special characters in an ini file used for configuration. (That may just be a property of the scripting language I was using.) No jobs generated.


I have uploaded two files: one is the mail merge data file, encoded UTF-8; the other is an email I received as a result of using this data file. I have just attached the mail with Chinese (not the Korean).


I have no difficulty sending emails with Chinese characters manually from Pegasus. It is just the merge that fails, which I suspect means that the code which reads the merge data assumes ANSI.


edited Jan 2 at 10:50 am

I have uploaded two files


I'm sorry, but I didn't get any files. Where did you upload them to?


			Michael

I definitely uploaded them, but I should then have checked. I'll try again.


The interface definitely showed an upload taking place this time, but I see no indication that they are here. Maybe only inline images are allowed.


I've just tried a jpg and the image appeared. (I removed it again)


I'll try a zip: post.zip


Ah that looks hopeful. You should find the files I tried to send inside.



You should find the files I tried to send inside.


Thanks, got them and tested using command-line message sending, so I think I understand what's happening: since the UTF-8 body submitted to Pegasus Mail still contains 8-bit characters, it applies an additional UTF-8 encoding instead of just applying quoted-printable enveloping and HTML entity encoding. The result is double UTF-8 encoding, which makes the text unreadable for the recipients. IOW: the only way around this would be to manually edit such messages or to send the customized greetings as attachments, I'm afraid. I can provide a "fixed" version if you want me to, just so you can see what it would look like; please let me know.


			Michael

Thanks again. I had to have a bit of a think and mess with bit fields and Wikipedia descriptions, but I think I understand what you mean.


The essential problem seems to be that the mail merge assumes 1 byte is one character in the data file. If it's an 8 bit byte, it will correctly encode it UTF-8 (in this case) and then make it quoted printable, but the damage is already done by the 1 byte 1 character assumption.


I can send the messages manually no problem, but it kind of defeats the object. For now I may just ask my colleagues to refrain from being smart-arse and make their customised greetings in English, as they have done in previous years.


I'm not sure what is meant by a fixed version. Do you mean a patched version of Pegasus?


If I could be bothered I guess I could write a script to put correct output files into the send queue bypassing the Pegasus merge. However, I don't need any of this now until next Christmas!
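A rough sketch of what the message-building half of such a script could look like, using Python's standard `email` package. The column names, addresses and sample row are hypothetical, and it deliberately stops at producing the message text: how to hand the result to Pegasus Mail's send queue is not covered here, because the queue format is not described in this thread.

```python
import csv
import io
from email import policy
from email.message import EmailMessage

# Hypothetical data layout: one header row, then one client per line.
SAMPLE_DATA = "email,name,greeting\nclient@example.com,Li Wei,来自英国的问候\n"


def build_messages(data_text: str, sender: str, subject: str) -> list:
    """Build one UTF-8, quoted-printable message per row of merge data."""
    messages = []
    for row in csv.DictReader(io.StringIO(data_text)):
        msg = EmailMessage(policy=policy.SMTP)
        msg["From"] = sender
        msg["To"] = row["email"]
        msg["Subject"] = subject
        body = f"Dear {row['name']},\n\n{row['greeting']}\n"
        # The text is a real Unicode string here, so it is encoded to
        # UTF-8 exactly once before the quoted-printable step.
        msg.set_content(body, charset="utf-8", cte="quoted-printable")
        messages.append(msg)
    return messages


messages = build_messages(SAMPLE_DATA, "sender@example.com", "Season's greetings")
print(messages[0].as_bytes().decode("ascii"))
```

For a real data file, the text would be read with `open(path, encoding="utf-8", newline="")` instead of the inline sample, so that the multi-byte characters are decoded before anything else touches them.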



The essential problem seems to be that the mail merge assumes 1 byte is one character in the data file. If it's an 8 bit byte, it will correctly encode it UTF-8 (in this case) and then make it quoted printable, but the damage is already done by the 1 byte 1 character assumption.


Here's what I tried: if an online converter does the UTF-8 conversion of the Chinese, the result starts with the following ten bytes (shown in quoted-printable encoding):


=E6=9D=A5=E8=87=AA=E8=8B=B1=E5

If you let your text editor write the Chinese to a file in UTF-8 format, the result of converting this UTF-8 from the file into quoted-printable starts with these ten bytes:


=C3=A6=C2=9D=C2=A5=C3=A8=E2=80

If the first one is sent via email, it gets properly decoded. The second one is what Pegasus Mail currently creates by misreading the file contents as 8-bit text for conversion to UTF-8 (æ¥è‡ªè‹±å, copied using a hex editor): it just doesn't get a chance to do proper encoding because it never gets to see the original Chinese for recognizing multibyte characters.
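The two byte sequences above can be reproduced with a short Python sketch. It assumes that the misreading treats each byte as one Windows-1252 ("ANSI") character, which is what the `=E2=80` in the second sequence suggests; the function names are made up for illustration, and this is not Pegasus Mail's actual code.

```python
def decode_as_windows_ansi(raw: bytes) -> str:
    """Treat every byte as one character, as an ANSI (cp1252) reader would.

    Python's cp1252 codec rejects a few undefined bytes such as 0x9D, so
    those are mapped to the control character with the same number (an
    assumption chosen to match the bytes quoted above).
    """
    chars = []
    for b in raw:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return "".join(chars)


def double_encode(utf8_bytes: bytes) -> bytes:
    """Misread UTF-8 bytes as ANSI text, then encode that text as UTF-8."""
    return decode_as_windows_ansi(utf8_bytes).encode("utf-8")


def undo_double_encoding(damaged: bytes) -> bytes:
    """Reverse double_encode: map each character back to its single byte."""
    out = bytearray()
    for ch in damaged.decode("utf-8"):
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out.append(ord(ch))  # control characters such as U+009D
    return bytes(out)


def as_qp_escapes(data: bytes) -> str:
    """Show every byte as =XX (simplified; real QP leaves ASCII unescaped)."""
    return "".join(f"={b:02X}" for b in data)


correct = "来自英".encode("utf-8")
damaged = double_encode(correct)

print(as_qp_escapes(correct))        # =E6=9D=A5=E8=87=AA=E8=8B=B1
print(as_qp_escapes(damaged)[:30])   # =C3=A6=C2=9D=C2=A5=C3=A8=E2=80
print(undo_double_encoding(damaged).decode("utf-8"))  # 来自英
```

Each original byte above 0x7F turns into two or three bytes in the damaged version, which is why a recipient's mail client decodes it to Latin-looking characters instead of the Chinese.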


The only way to get around this would be by providing the UTF-8 to Pegasus Mail without having it try another encoding or providing Unicode for converting it to UTF-8 - none of these is currently implemented (and I can't patch it).


If I could be bothered I guess I could write a script to put correct output files into the send queue bypassing the Pegasus merge. However, I don't need any of this now until next Christmas!


But then you might want to think about cross-cultural implications: How many Chinese people would care about celebrating Christmas at all ...? Wikipedia says they make up just 5.2% in the PRC ...


			Michael

But then you might want to think about cross-cultural implications: How many Chinese people would care about celebrating Christmas at all ...? Wikipedia says they make up just 5.2% in the PRC ...


These are people we know and it's the thought that counts! The Chinese are also not the only ones whose characters require more than 8 bits.


I think we both understand what's happening and I don't think it can be fixed without access to the source code for the mail merge.


If I do write my own merge, maybe I'll post then, in case anyone else needs it. However, this is not likely until the end of the year, so thanks for the help and I guess this thread is done for now.



These are people we know and it's the thought that counts!


OK, that's a different thing, but AFAIK officially they don't have an easy status in the PRC.


The Chinese are also not the only ones whose characters require more than 8 bits.


You're at least the first one ever asking for support regarding this issue, AFAIK.


			Michael