Non-English dictionaries for Pegasus Mail

I just tested TinySpell 1.7, and it works fine!

http://www.tinyspell.m6.net/

Hello,

On the one hand, I like Pegasus Mail's speed and its low resource usage; on the other, I can't find an Italian (or French, or Spanish) dictionary for its spell checker. Is there one?

And if not, how could one be created? I'm one of the authors of the OpenOffice.org/Mozilla/Aspell Italian dictionary, and I'd like to know whether porting its word list and grammar rules (elision, etc.) to the Pegasus Mail spell checker format is technically possible.

Regards,

Gianluca

David Harris once wrote about creating an external dictionary: 

[quote]

In order to create a dictionary, I need the following:

1: A word list. Ideally, one word per line, but I can massage just about
anything into the form I need. The word list should have between
60,000 and 200,000 words: fewer than 60,000 makes it nearly useless,
while more than 200,000 makes it too slow. The word list can contain
accented characters from the ISO-8859-1 (WinANSI) or OEM (IBM
437) character sets. At present, no other character sets are supported
(this is for sorting reasons).

2: A list of the most commonly-used words in the language. This part
is critical for adequate performance. The list should contain between
800 and 2000 words, and those words should be the ones most likely
to be encountered in the e-mail environment (in English, this would be
words like "the", "and", "but", "at" and so on).

[/quote] 

But the most important point is that the current spell-checking part of the program canNOT handle any non-ASCII characters, which makes it nearly impossible to use for French and similar languages.

-- Han van den Bogaerde - support@vandenbogaerde.net Member of Pegasus Mail Support Group. My own Pegasus Mail related web information: http://www.vandenbogaerde.net/pegasusmail/
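David's two requirements quoted above are concrete enough to check before sending lists off. The following Python sketch is a hypothetical pre-flight check, not part of Pegasus Mail. The function names are invented, and it assumes "ISO-8859-1" maps to Python's strict `latin-1` codec and "OEM (IBM 437)" maps to `cp437`.

```python
# Hypothetical pre-flight check for a Pegasus Mail dictionary submission.
# The limits come from David Harris's requirements quoted above.
MIN_WORDS, MAX_WORDS = 60_000, 200_000
MIN_COMMON, MAX_COMMON = 800, 2_000
# Assumed mapping to Python codec names: ISO-8859-1 -> "latin-1", OEM -> "cp437".
SUPPORTED_CHARSETS = ("latin-1", "cp437")


def encodable(word: str, charset: str) -> bool:
    """Return True if every character of `word` exists in `charset`."""
    try:
        word.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def check_lists(words: list[str], common: list[str], charset: str = "latin-1") -> list[str]:
    """Return a list of problems; an empty list means both lists pass."""
    if charset not in SUPPORTED_CHARSETS:
        return [f"unsupported character set: {charset}"]
    problems = []
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        problems.append(f"word list has {len(words)} words, expected {MIN_WORDS}-{MAX_WORDS}")
    if not MIN_COMMON <= len(common) <= MAX_COMMON:
        problems.append(f"common list has {len(common)} words, expected {MIN_COMMON}-{MAX_COMMON}")
    # Collect every word that cannot be written in the chosen 8-bit character set.
    bad = [w for w in words + common if not encodable(w, charset)]
    if bad:
        problems.append(f"{len(bad)} word(s) not representable in {charset}, e.g. {bad[0]!r}")
    return problems
```

For example, the French word "cœur" is rejected under strict ISO-8859-1, because œ is not in that character set. Windows-1252, which is often loosely called WinANSI, does include it.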

[quote user="Han v.d. Bogaerde"] 

But the most important point is that the current spell-checking part of the program canNOT handle any non-ASCII characters, which makes it nearly impossible to use for French and similar languages.

[/quote]

Maybe I'm misunderstanding your reply... [:)]

You quoted David Harris, who wrote that the word list can contain ISO-8859-1 extended characters, but you wrote that the program cannot handle any non-ASCII characters, so I'm a bit puzzled by these statements.

Please, can you clarify?

Thanks,

Gianluca

[quote user="Han v.d. Bogaerde"]

David Harris once wrote about creating an external dictionary: 

[quote]

In order to create a dictionary, I need the following:

1: A word list. Ideally, one word per line, but I can massage just about
anything into the form I need. The word list should have between
60,000 and 200,000 words: fewer than 60,000 makes it nearly useless,
while more than 200,000 makes it too slow. The word list can contain
accented characters from the ISO-8859-1 (WinANSI) or OEM (IBM
437) character sets. At present, no other character sets are supported
(this is for sorting reasons).

2: A list of the most commonly-used words in the language. This part
is critical for adequate performance. The list should contain between
800 and 2000 words, and those words should be the ones most likely
to be encountered in the e-mail environment (in English, this would be
words like "the", "and", "but", "at" and so on).

[/quote] 

But the most important point is that the current spell-checking part of the program canNOT handle any non-ASCII characters, which makes it nearly impossible to use for French and similar languages.

[/quote]

Han

I can say that Pegasus Mail 4.41 *can* handle characters beyond 127. We were able to create a French dictionary (it has been included in our translation module since v4.41), and it works quite well.

It detects *all* misspellings (that is, words not included in our word lists), but in a few cases the list of suggested replacements is empty or incomplete.

The trick was to change the word order in the lists. Windows does not sort words by their character codes (for example, it places é after e and before f). If the two lists are sorted by character code instead, the resulting dictionary works: not perfectly, but it works.
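The reordering described above amounts to sorting by encoded byte value instead of by a locale's collation rules. The Python sketch below contrasts the two approaches. `accent_folded_key` is only a rough stand-in for dictionary-style ordering (it simply strips accents); it is not how Windows actually collates.

```python
import unicodedata


def byte_order_sort(words, charset="latin-1"):
    """Sort words by their encoded byte values, ignoring locale collation."""
    return sorted(words, key=lambda w: w.encode(charset))


def accent_folded_key(word):
    """Rough stand-in for dictionary order: compare words with accents removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["été", "fin", "etat", "école"]
by_bytes = byte_order_sort(words)                  # ['etat', 'fin', 'école', 'été']
by_folding = sorted(words, key=accent_folded_key)  # ['école', 'etat', 'été', 'fin']
```

The byte-order sort puts "école" and "été" after "fin", because é is 0xE9 in ISO-8859-1. The accent-folded sort keeps them among the e-words. Encoding first, rather than sorting the strings directly, also gives the right order for the OEM (cp437) set, where byte values differ from Unicode code points.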
A year ago we exchanged a few e-mails with David to try to fix this problem. He said that making a dictionary containing accented characters work perfectly would require changing the way PM handles accented characters, and he planned to do so. While waiting, we decided to offer a French dictionary anyway (it's better than nothing). The hardest part was finding a list of the most-used words in French: we found one of 128,000 words (with their frequency of occurrence) and kept the top 2,000.
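Reducing a large frequency list to its top 2,000 entries is mechanical once the counts are parsed. The post doesn't say how the 128,000-word list was laid out, so the Python sketch below assumes a hypothetical "word<TAB>count" line format.

```python
from collections import Counter


def parse_frequency_list(lines):
    """Parse 'word<TAB>count' lines (an assumed format) into a Counter."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        counts[word] += int(count)  # repeated words accumulate their counts
    return counts


def top_common_words(counts, n=2000):
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in counts.most_common(n)]
```

The French team found that both lists had to be in character-code order. The selected words would therefore still need re-sorting by byte value before use, for example with `sorted(top, key=lambda w: w.encode("latin-1"))`.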

Best regards

Hi Gianluca,

Obviously, we know each other [:D]

I'm the main Italian translator for the program, and I also started preparing the Italian dictionary, but right now the work has stalled a bit [:(]

If you want to collaborate with me, I've already got a list that needs to be checked and "cut" for the purpose. We could also prepare another word list from a newly generated file, if you want.

I'm here and I need help. You can also contact me by direct mail [Y], if you wish.

[quote user="luctur"]And if not, how could one be created? I'm one of the authors of the OpenOffice.org/Mozilla/Aspell Italian dictionary, and I'd like to know whether porting its word list and grammar rules (elision, etc.) to the Pegasus Mail spell checker format is technically possible.[/quote]

I would like nothing more than to replace the ageing spelling checker code in Pegasus Mail with something more internationalization-friendly.

In a perfect world, I would define a simple interface that could be met by any developer in an external DLL. Thinking off the top of my head, the DLL would only have to export about three functions:

  • One to look up a word and return a TRUE or FALSE value for whether it could be found in the dictionary
  • One to return a list of suggested alternatives for a word not found in the search above
  • One to get information about language and settings
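To make the proposed interface concrete, here is a Python sketch of its shape. The method names `lookup`, `suggest` and `info` are hypothetical, since the post names no functions. A real plug-in would export C-callable functions from a DLL rather than Python methods. The toy `SetSpellChecker` only illustrates the contract, using a hash-set membership test for fast lookup and naive one-edit suggestions.

```python
from typing import Protocol


class SpellChecker(Protocol):
    """Hypothetical shape of the three-function plug-in interface."""

    def lookup(self, word: str) -> bool: ...  # is the word in the dictionary?
    def suggest(self, word: str, limit: int = 10) -> list[str]: ...  # alternatives for a miss
    def info(self) -> dict[str, str]: ...  # language and settings


class SetSpellChecker:
    """Toy implementation: set membership for lookup, one-edit candidates for suggestions."""

    def __init__(self, words, language):
        self._words = {w.lower() for w in words}
        self._alphabet = sorted(set("".join(self._words)))  # includes accented letters
        self._language = language

    def lookup(self, word: str) -> bool:
        return word.lower() in self._words

    def suggest(self, word: str, limit: int = 10) -> list[str]:
        w = word.lower()
        splits = [(w[:i], w[i:]) for i in range(len(w) + 1)]
        candidates = (
            [a + b[1:] for a, b in splits if b]                                # deletions
            + [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]       # swaps
            + [a + c + b[1:] for a, b in splits if b for c in self._alphabet]  # replacements
            + [a + c + b for a, b in splits for c in self._alphabet]           # insertions
        )
        found = []
        for cand in candidates:
            if cand != w and cand in self._words and cand not in found:
                found.append(cand)
        return found[:limit]

    def info(self) -> dict[str, str]:
        return {"language": self._language, "case_sensitive": "no"}


checker: SpellChecker = SetSpellChecker(["casa", "cosa", "caso"], "it")
```

Set membership is a constant-time operation on average, which is the property real-time checking would need. Real spell checkers use much smarter suggestion strategies than single-edit candidates.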

The key priorities would be that the lookup should be extremely fast, the dictionary should be as compact as possible, and there should be a wide range of languages supported. If the lookup could be done fast enough, I can easily enable real-time checking (with the squiggly red underline, like Word) with only a couple of lines of code (my own checking code is too slow to be able to do this effectively).

Note that this is really only a spelling checker - grammar extensions might be possible later, but initially, getting the spelling working nicely is the first priority.

If anyone would be willing to collaborate with me in developing something like this, please either follow up in this thread, or mail me directly offline.

Cheers!

-- David --

[quote user="Valter Mura"]

I'm here and I need help. You can also contact me by direct mail [Y], if you wish.

[/quote]

I've just written to you at your Yahoo account. [:)]

Regards,

Gianluca

[quote user="David Harris"]

The key priorities would be that the lookup should be extremely fast, the dictionary should be as compact as possible, and there should be a wide range of languages supported. If the lookup could be done fast enough, I can easily enable real-time checking (with the squiggly red underline, like Word) with only a couple of lines of code (my own checking code is too slow to be able to do this effectively).

Note that this is really only a spelling checker - grammar extensions might be possible later, but initially, getting the spelling working nicely is the first priority.

If anyone would be willing to collaborate with me in developing something like this, please either follow up in this thread, or mail me directly offline.

[/quote]

I'm not a developer, but I have rather extensive experience in creating dictionaries and using different spell-checking tools.

AFAIK, the MySpell/Hunspell spell checker is *the* tool you're looking for, because:

a) it's also released under the GNU *Lesser* General Public License (see http://sourceforge.net/project/shownotes.php?release_id=383043&group_id=143754), so you get the source code and can use it in proprietary software under that license's terms;
b) it can be compiled into a simple .dll and used for real-time checking. The si.Mail client (another client I'm testing) implements it this way; see http://194.165.104.66/~mvrhov/downloads/siMail_2007-06-10.exe;
c) there are *really* a lot of dictionaries available (88 at the moment; see http://wiki.services.openoffice.org/wiki/Dictionaries);
d) these dictionaries are usually smaller than those of other spell checkers for the same language, because they are "compressed" using a prefix/suffix system that can be changed by simply editing a plain-text file that is part of the dictionary itself;
e) spell checking (real-time included) is usually fast and light on RAM, though this depends on the size of the dictionary you use.
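Point d) refers to affix compression. A Hunspell dictionary is a pair of plain-text files. The `.dic` file is a word list in which each entry can carry flags (e.g. `gatto/A`). The `.aff` file holds `PFX`/`SFX` rules that say how those flags expand into prefixed or suffixed forms. The Python sketch below expands entries for a small subset of that format only: single-character flags, no prefix/suffix cross-products, and conditions matched with Python's `re`. The example rules are invented for illustration and are not taken from the real Italian dictionary.

```python
import re


def parse_affixes(aff_text):
    """Collect PFX/SFX rule lines from a (simplified) .aff file, keyed by flag."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have 5+ fields; headers such as "SFX A Y 2" have 4 and are skipped.
        if len(parts) < 5 or parts[0] not in ("PFX", "SFX"):
            continue
        kind, flag, strip, add, cond = parts[:5]
        strip = "" if strip == "0" else strip  # "0" means "nothing"
        add = "" if add == "0" else add
        rules.setdefault(flag, []).append((kind, strip, add, cond))
    return rules


def expand(dic_entry, rules):
    """Return the stem plus every form its flags generate (single-character flags only)."""
    stem, _, flags = dic_entry.partition("/")
    forms = [stem]
    for flag in flags:
        for kind, strip, add, cond in rules.get(flag, []):
            if kind == "SFX" and stem.endswith(strip) and re.search(cond + "$", stem):
                forms.append(stem[: len(stem) - len(strip)] + add)
            elif kind == "PFX" and stem.startswith(strip) and re.match(cond, stem):
                forms.append(add + stem[len(strip):])
    return forms


# Invented example rules: -o -> -i and -a -> -e plurals, plus a "ri" prefix.
EXAMPLE_AFF = """\
SFX A Y 2
SFX A o i o
SFX A a e a
PFX R Y 1
PFX R 0 ri .
"""

rules = parse_affixes(EXAMPLE_AFF)
```

With these rules, `expand("gatto/A", rules)` yields `["gatto", "gatti"]` and `expand("fare/R", rules)` yields `["fare", "rifare"]`. Each flagged stem stands for several inflected forms, and that is where the size savings come from.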

If you want to replace Pegasus Mail's spell-checking tool, you should really have a look at Hunspell. It would save a lot of time, both yours and the dictionary makers'.
As a friend of mine once told me, it's easier to build a house when you already have the bricks... [:D]

Regards,

Gianluca

[quote user="luctur"]

Hello,

On the one hand, I like Pegasus Mail's speed and its low resource usage; on the other, I can't find an Italian (or French, or Spanish) dictionary for its spell checker. Is there one?

And if not, how could one be created? I'm one of the authors of the OpenOffice.org/Mozilla/Aspell Italian dictionary, and I'd like to know whether porting its word list and grammar rules (elision, etc.) to the Pegasus Mail spell checker format is technically possible.

Regards,

Gianluca

[/quote]

You've gotten a lot of answers, but since I'm the world's worst two-finger keyboard smasher, I find that the program "As-U-Type" (http://www.asutype.com/) works for me. It's not free, but it's cheap. So far it has a number of different dictionaries: 4 flavors of English, plus French, Spanish and Medical. Corrections are made on the fly.

This raises the more general question of Pegasus's ability to reuse tools from open-source projects with large developer communities.

IMHO, benefiting from ongoing updates to tools from Mozilla and OpenOffice should be a priority. Apart from the quality of such tools, the fact that they have become standards is also a plus. From that point of view, I would not recommend tools from little-known niche software, even if they offer some slight advantage.

I see nothing wrong with not reinventing the wheel, and with appearing more standard as a bonus.

Are there any non-English dictionaries at the moment and if so where can I find them? I'm looking for Dutch specifically.

AFAIK, the only one is French.

Regards