Hello,
On one hand, I like Pegasus Mail's speed and its low resource usage; on the other, I can't find an Italian (or French or Spanish) dictionary for its spell checker. Is there one?
And if the answer is no, how can one be created? I'm one of the authors of the OpenOffice.org/Mozilla/Aspell Italian dictionary, and I'd like to know whether porting its word list and grammar rules (elision, etc.) to the Pegasus Mail spell checker format is technically possible or not.
Regards,
Gianluca
David Harris once wrote about creating an external dictionary:
[quote]
In order to create a dictionary, I need the following:
1: A word list. Ideally, one word per line, but I can massage just about
anything into the form I need. The word list should have between
60,000 and 200,000 words: fewer than 60,000 makes it nearly useless,
while more than 200,000 makes it too slow. The word list can contain
accented characters from the ISO-8859-1 (WinANSI) or OEM (IBM
437) character sets. At present, no other character sets are supported
(this is for sorting reasons).
2: A list of the most commonly-used words in the language. This part
is critical for adequate performance. The list should contain between
800 and 2000 words, and those words should be the ones most likely
to be encountered in the e-mail environment (in English, this would be
words like "the", "and", "but", "at" and so on).
[/quote]
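As an illustration of the word-list requirements David lists, here is a minimal Python sketch that checks a candidate list before sending it in. The function name and the choice of ISO-8859-1 ("latin-1") as the target character set are assumptions for illustration; this is not part of any Pegasus Mail tooling.

```python
# Hypothetical pre-flight check for a Pegasus Mail word list, based on the
# limits quoted above. ISO-8859-1 ("latin-1") is assumed as the target
# character set; an OEM/IBM 437 list would use the "cp437" codec instead.
MIN_WORDS = 60_000
MAX_WORDS = 200_000


def check_word_list(lines):
    """Return (unique_words, problems) for an iterable of text lines."""
    words = []
    seen = set()
    problems = []
    for number, line in enumerate(lines, start=1):
        word = line.strip()
        if not word or word in seen:
            continue  # skip blank lines and duplicates
        try:
            word.encode("latin-1")
        except UnicodeEncodeError:
            problems.append(f"line {number}: {word!r} is not ISO-8859-1")
            continue
        seen.add(word)
        words.append(word)
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        problems.append(f"{len(words)} words; expected {MIN_WORDS}-{MAX_WORDS}")
    return words, problems
```

For example, `check_word_list(open("italiano.txt", encoding="utf-8"))` (the file name is hypothetical) would flag entries that cannot be stored in ISO-8859-1, such as words containing "œ", and warn if the remaining list falls outside the 60,000 to 200,000 range.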
But the most important point is that the current program code canNOT handle any non-ASCII characters, which makes it nearly impossible to use for French and similar languages.
-- 
Han van den Bogaerde - support@vandenbogaerde.net
Member of Pegasus Mail Support Group.
My own Pegasus Mail related web information: http://www.vandenbogaerde.net/pegasusmail/
[quote user="Han v.d. Bogaerde"]
But the most important point is that the current program code canNOT handle any non-ASCII characters, which makes it nearly impossible to use for French and similar languages.
[/quote]
Maybe I'm misunderstanding your reply... [:)]
You quoted David Harris, who wrote that the word list can contain ISO-8859-1 extended characters, but you wrote that the program cannot handle any non-ASCII characters, so I'm a bit puzzled by these two statements.
Could you please clarify?
Thanks,
Gianluca
[quote user="Han v.d. Bogaerde"]
But the most important point is that the current program code canNOT handle any non-ASCII characters, which makes it nearly impossible to use for French and similar languages.
[/quote]
Han
I can say that Pegasus Mail 4.41 can handle characters above code 127. We were able to create a French dictionary (included in our translation module since v4.41), and it works quite well.
It detects *all* the misspellings (that is, words not included in our word lists), but in a few cases the list of suggested replacements is empty or incomplete.
The trick was to change the word order in the lists. Windows does not sort words by their character codes (for example, it places é after e and before f); if the two lists are instead sorted by character code, the resulting dictionary works. Not perfectly, but it works.
A year ago we exchanged a few e-mails with David to try to fix this problem. He said that making a dictionary with accented characters work perfectly would require changing the way PM handles accented characters, which he planned to do. While waiting for that, we decided to offer a French dictionary anyway (it's better than nothing). The hardest part was finding a list of the most frequently used French words: we found one of 128,000 words (with their frequency of occurrence) and kept the first 2,000.
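To make the two orderings concrete, here is a small Python sketch comparing them. Sorting by ISO-8859-1 code value puts "é" (0xE9) after every unaccented lowercase letter, while a dictionary-style order keeps it next to "e". The accent-stripping key is only a rough stand-in for Windows' linguistic sort, not the actual Windows collation algorithm.

```python
import unicodedata


def byte_order_key(word):
    # Order by raw ISO-8859-1 code values (the order that worked for PM).
    return word.encode("latin-1")


def folded_key(word):
    # Rough approximation of a "linguistic" sort: compare with accents
    # stripped first, then break ties on the original word.
    base = "".join(
        c for c in unicodedata.normalize("NFD", word)
        if not unicodedata.combining(c)
    )
    return (base, word)


words = ["fin", "été", "enfant", "élan", "zèbre"]
print(sorted(words, key=byte_order_key))  # ['enfant', 'fin', 'zèbre', 'élan', 'été']
print(sorted(words, key=folded_key))      # ['élan', 'enfant', 'été', 'fin', 'zèbre']
```

Note how the accented words move to the very end in code-value order; that is the order the lists had to be written in for the dictionary to work.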
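Cutting a frequency list down to its most common words, as was done for the French dictionary, amounts to a sort by count. This Python sketch assumes a hypothetical input format of one `word<TAB>count` pair per line; the format of the actual 128,000-word French list isn't described here.

```python
def most_common_words(lines, limit=2000):
    """Return the `limit` most frequent words from 'word<TAB>count' lines."""
    counts = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        word, count = parts[0].strip(), parts[1].strip()
        if word and count.isdigit():
            counts[word] = counts.get(word, 0) + int(count)
    # Highest count first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


sample = ["de\t5000\n", "la\t4200\n", "chat\t12\n", "le\t4800\n"]
print(most_common_words(sample, limit=3))  # ['de', 'le', 'la']
```

The default `limit=2000` matches the upper end of the 800 to 2000 words David asks for in the common-words list.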
Best regards
Hi Gianluca,
obviously we know each other [:D]
I'm the main Italian translator for the program, and I also started preparing the Italian dictionary, but right now the work is a bit stalled [:(]
If you want to collaborate with me, I already have a list that needs to be checked and trimmed for the purpose. We could also prepare another word list from a newly generated file, if you want.
I'm here and I need help. You can also contact me by direct mail [Y], if you wish.
[quote user="luctur"]And if the answer is no, how can one be created? I'm one of the authors of the OpenOffice.org/Mozilla/Aspell Italian dictionary, and I'd like to know whether porting its word list and grammar rules (elision, etc.) to the Pegasus Mail spell checker format is technically possible or not.[/quote]
I would like nothing more than to replace the ageing spelling checker code in Pegasus Mail with something more internationalization-friendly.
In a perfect world, I would define a simple interface that any developer could implement in an external DLL. Thinking off the top of my head, the DLL would only have to export about three functions
[quote user="Valter Mura"]
I'm here and I need help. You can also contact me by direct mail [Y], if you wish.
[/quote][quote user="David Harris"]
The key priorities would be that the lookup should be extremely fast, the dictionary should be as compact as possible, and a wide range of languages should be supported. If the lookup is fast enough, I can easily enable real-time checking (with the squiggly red underline, as in Word) with only a couple of lines of code (my own checking code is too slow to do this effectively).
Note that this is really only a spelling checker; grammar extensions might be possible later, but initially, getting the spelling working nicely is the first priority.
If anyone would be willing to collaborate with me in developing something like this, please either follow up in this thread or mail me directly offline.
[/quote]
I'm not a developer, but I have rather extensive experience in creating dictionaries and in using different spell-checking tools.
AFAIK, the MySpell/Hunspell spell checker is the tool you're looking for, because:
a) it's released under the GNU Lesser General Public License, among other licenses (see http://sourceforge.net/project/shownotes.php?release_id=383043&group_id=143754), so the source code is available and it can be used in proprietary software;
b) it can be compiled into a simple .dll and used for real-time checking. The si.Mail client (another client I'm testing) implements it this way; see http://194.165.104.66/~mvrhov/downloads/siMail_2007-06-10.exe
c) a great many dictionaries are available (88 at the moment; see: http://wiki.services.openoffice.org/wiki/Dictionaries);
d) these dictionaries are usually smaller than those of other spell checkers for the same language, because they are "compressed" with a prefix/suffix system that can be changed by simply editing a text file (part of the dictionary itself);
e) spell checking (including real-time checking) is usually fast and light on RAM, though this depends on the size of the dictionary you use.
If you want to replace the Pegasus Mail spell-checking tool, you should really have a look at Hunspell. It would save a lot of time, both yours and the dictionary makers'.
As a friend of mine once told me, it's easier to build a house when you already have the bricks... [:D]
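To illustrate the prefix/suffix "compression" in point d), here is a deliberately simplified Python model of Hunspell-style suffix expansion. It only handles `SFX` rule lines of the form `SFX flag strip add condition` with single-character flags; real `.aff` files also have prefixes, cross-product settings, alternative flag formats and many other options, so this is a toy, not Hunspell's parser. The affix rules shown are invented for the example and are not taken from the real Italian dictionary.

```python
import re


def parse_suffix_rules(aff_lines):
    """Collect SFX rules as {flag: [(strip, add, condition_regex), ...]}."""
    rules = {}
    for line in aff_lines:
        fields = line.split()
        # Rule lines have at least 5 fields; 4-field SFX lines are headers.
        if len(fields) < 5 or fields[0] != "SFX":
            continue
        flag, strip, add, condition = fields[1:5]
        strip = "" if strip == "0" else strip
        add = add.split("/")[0]  # drop continuation flags such as "s/X"
        add = "" if add == "0" else add
        # The condition must match the end of the base word.
        rules.setdefault(flag, []).append((strip, add, re.compile(condition + "$")))
    return rules


def expand(dic_entry, rules):
    """Expand a 'word/FLAGS' entry into the base word plus suffixed forms."""
    word, _, flags = dic_entry.partition("/")
    forms = [word]
    for flag in flags:  # assumes single-character flags
        for strip, add, condition in rules.get(flag, []):
            if condition.search(word) and word.endswith(strip):
                forms.append(word[: len(word) - len(strip)] + add)
    return forms


# Toy affix rules (invented; not from the real Italian dictionary).
TOY_AFF = """\
SFX S Y 2
SFX S o i o
SFX S a e a
"""
rules = parse_suffix_rules(TOY_AFF.splitlines())
print(expand("gatto/S", rules))  # ['gatto', 'gatti']
print(expand("casa/S", rules))   # ['casa', 'case']
```

The dictionary stores "gatto/S" once instead of listing both forms, which is where the size savings come from when a rule applies to thousands of words.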
Regards,
Gianluca
You've gotten a lot of answers, but since I'm the world's worst two-finger keyboard smasher, I find that the program "As-U-Type" (http://www.asutype.com/) works for me. It's not free, but it's cheap. It has a number of different dictionaries so far: four flavors of English, plus French, Spanish and Medical. Corrections are made on the fly.
This raises the more general question of Pegasus's ability to reuse tools from open-source projects with large developer communities.
IMHO, benefiting from updates to the Mozilla and OpenOffice tools should be a priority. Apart from the quality of such tools, the fact that they have become standard is also a plus. From that point of view, I would not recommend tools from obscure niche software, even if they offer some slight advantage.
I see nothing wrong with not reinventing the wheel, and with appearing more standard in the process.