tilleke
20th March 2009, 07:10 PM
WMSpellChecker
I have been programming for Windows Mobile for a few months now, mainly using Basic4ppc, an excellent low-cost development tool for producing Windows Mobile applications (and desktop applications as well). Basic4ppc is based on the .NET Framework.
One of the strengths of Basic4ppc is its great support, not only from the developer but also from other users who supply Basic4ppc with many extra features through external libraries. I got interested in writing a library myself, and this is how the WMSpellChecker library was born. From the very beginning, however, my idea was that the library should be compatible not only with Basic4ppc but also with Windows Mobile applications developed in Visual Studio and SharpDevelop using VB.NET and C#.
I have seen commercial solutions for spellchecking, but since I wanted to learn how to write a library, I thought this would be a nice thing to give away for free to fellow developers.
Well, let me get back to the WMSpellChecker library:
Basically, a spell checker customarily consists of two parts:
1) A set of routines for scanning text and extracting words, and
2) An algorithm for comparing the extracted words against a known list of correctly spelled words (i.e., the dictionary).
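To make the two parts concrete, here is a minimal Python sketch. It is not taken from WMSpellChecker: the function names, the regular expression and the tiny word set are assumptions made up for this illustration.

```python
import re

# Assumption: a "word" is a run of letters, optionally with inner apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def extract_words(text):
    """Part 1: scan the text and return (offset, word) pairs."""
    return [(m.start(), m.group()) for m in WORD_RE.finditer(text)]

def find_unknown_words(text, dictionary):
    """Part 2: keep only the words that are not in the dictionary."""
    return [(pos, word) for pos, word in extract_words(text)
            if word.lower() not in dictionary]

# Hypothetical five-word dictionary, just for the demonstration.
dictionary = {"the", "weather", "is", "nice", "today"}
print(find_unknown_words("The wheather is nice today", dictionary))
# prints [(4, 'wheather')]
```

Keeping the offset of each word makes it possible to highlight or replace the word in the original text later on.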
However, what is described above is only "half" a spell checker, since these days spell checkers also suggest replacements/corrections for misspelled words (among other things such as synonyms and grammar hints). These suggestions can be produced by a spellchecking engine using various techniques:
- phonetic algorithms such as "Soundex", among others
- word lists containing commonly misspelled words and commonly transposed letters
- the "Near Miss Strategy", introduced by one of the first spell checkers on the market, namely Ispell for UNIX, whose roots date back to 1971
- "edit distance" algorithms, which measure the amount of difference between two sequences; a famous one is the "Levenshtein distance"
- and other techniques
The "techniques" mentioned above have all been implemented in the library.
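As an illustration of the edit-distance technique, the following Python sketch computes the Levenshtein distance and uses it to rank candidate words. It is a textbook version, not the library's implementation; the `suggest` helper, its `max_distance` and `limit` parameters and the word list are assumptions for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute (or match)
        previous = current
    return previous[-1]

def suggest(word, candidates, max_distance=2, limit=5):
    """Hypothetical helper: closest candidates first, ties in alphabetical order."""
    scored = sorted((levenshtein(word, c), c) for c in candidates)
    return [c for distance, c in scored if distance <= max_distance][:limit]

words = ["weather", "whether", "wether", "leather", "winter"]
print(suggest("wheather", words))
# prints ['weather', 'whether', 'leather', 'wether']
```

Comparing a misspelled word against every entry of a large dictionary is slow, which is one reason to narrow down the candidates first, for example with a phonetic key.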
I am aware that (at least) WM6 already offers spelling suggestions and a spell checker if PocketWord (Office) has been installed, but I still liked the idea, so I decided to make a library. In any case, as far as I know, only the dictionary corresponding to the language of WM6 is installed, so you cannot spell check words in other languages.
The way it works....
First of all, apart from referencing the library itself, you need to add two objects to your application, namely DICTIONARY and COMPUTEDETECTION.
Then you need to load the dictionary files by using "LoadDict". Currently they consist of four separate files, although I may change this in a future release. The dictionary files must be located in the application directory, although you can create sub-folders. This first release supports only English, and the dictionaries distributed with the library must not be tampered with. The next release will bring support for other languages and will also include a separate program for handling dictionaries.
Once the dictionaries have been loaded, you can start spellchecking by calling "ComputeDetection", which passes your textbox control to the library. If a word is not present in the dictionary, a set of suggestions is returned to the calling application and, at the same time, the word that was not found is shown in the textbox in capital letters. The suggestions produced by the library can be obtained using "ReturnSuggestions", which returns a string array.
Once you have shown the suggestions returned by the library, you can let the user of your application decide what to do, i.e.:
-"IgnoreWord" - ignoring the wrong word
-"AddWord" - adding the user's own word to replace the wrong word
-"ReplaceWord" - replacing the wrong word with a word from the suggestions
At this point, you tell the library to continue spellchecking by using "ContinueDetection". You should also check whether spellchecking has finished by using "IsSpellingFinished".
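The detect / decide / continue cycle described above can be pictured with a small Python stand-in. Please note that this is only a toy model of the control flow: the class `ToySpellSession` and its methods are invented for this sketch and are not the WMSpellChecker API, whose real signatures are documented in the help file.

```python
class ToySpellSession:
    """Toy stand-in for the control flow only; NOT the WMSpellChecker API."""

    def __init__(self, words, dictionary, suggester):
        self.words = list(words)
        self.dictionary = set(dictionary)
        self.suggester = suggester   # function: word -> list of suggestions
        self.index = -1
        self.suggestions = []

    def advance(self):
        """Stop at the next unknown word (the detect / continue step)."""
        self.index += 1
        while self.index < len(self.words):
            word = self.words[self.index]
            if word.lower() not in self.dictionary:
                self.suggestions = self.suggester(word)
                return
            self.index += 1
        self.suggestions = []

    def finished(self):
        """True once every word has been checked."""
        return self.index >= len(self.words)

    def replace(self, new_word):
        """Replace the word that is currently flagged."""
        self.words[self.index] = new_word

# Hypothetical data: a fixed suggestion table instead of a real engine.
session = ToySpellSession("teh cat sat on teh mat".split(),
                          {"the", "cat", "sat", "on", "mat"},
                          lambda w: ["the"] if w == "teh" else [])
session.advance()                     # start checking
while not session.finished():         # check for the end on every round
    if session.suggestions:           # here the user would normally choose
        session.replace(session.suggestions[0])
    session.advance()                 # continue with the next word
print(" ".join(session.words))
# prints the cat sat on the mat
```

In a real application the body of the loop would be driven by the user interface (a dialog or a context menu) rather than always taking the first suggestion.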
At any time, you can interrupt spellchecking by using "UnloadDict". This will be useful in a future release of the library, so that you can unload an English dictionary and replace it with, for instance, a French dictionary without exiting your own application. However, before unloading the dictionary, you should check whether a dictionary has been loaded by using "IsDictionaryLoaded".
In the help file, you can find more information about the available methods and properties. Please also check out the two sample projects in the attachment, whose source code is fully commented. One uses a classic spellchecking interface and the other uses context menus.
Other comments....
This first release has some limitations, such as support only for English and the need for a textbox-control. However, I will add other features in the future, for instance:
-support for other languages
-dictionary-tools (for creating dictionaries) - will be an external program
-possibility to add a user-dictionary
-possibility to limit the number of suggestions produced by the library (by using a "ranking-system")
-no further need for a textbox-control in your application. Your application will be able to pass on to the library only the word(s) you wish to spellcheck and the library will only return the suggestion(s). In this way, the spellchecker-library will not "interfere" with your application and you can use whichever control you prefer, although you as a developer have to take care of the words to be passed on to the library for verification.
-spellchecking "on the fly"
-extended error-handling
A few notes regarding dictionaries....
The English dictionary supplied with the library contains nearly 70,000 words. Dictionaries to be used with the library must be sorted, and each line must end with LF = chr(10). In addition, the dictionary should be saved as UTF-8.
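A word list in that shape (sorted, LF line endings, UTF-8) could be produced with a short Python script like the one below. This is only a sketch: the file name `english.dic` is made up, one word per line is assumed, and so is plain code-point sort order, since the exact ordering the library expects is not stated here.

```python
import os
import tempfile

def write_dictionary(words, path):
    """Write unique, sorted words to path as UTF-8, one per line, LF endings."""
    unique = sorted({w.strip() for w in words if w.strip()})
    # newline="\n" stops Python from translating LF into CR+LF on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in unique:
            f.write(word + "\n")
    return len(unique)

# Hypothetical file name and word list, written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "english.dic")
count = write_dictionary(["whether", "weather", "weather", " cat "], path)
with open(path, "rb") as f:
    print(count, f.read())
# prints 3 b'cat\nweather\nwhether\n'
```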
From the dictionary, a KeyMap is created using either a Soundex or a DoubleMetaphone algorithm. At the moment, the KeyMap is supplied with the library and loaded as an external file, but future releases might create it on the fly (or at least offer an option to do so). With the next release, I will add a utility, to be run on the desktop, which will let you create your own dictionary and corresponding KeyMap compatible with WMSpellChecker.
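To give an idea of what a phonetic key map can look like, the Python sketch below groups words by their classic American Soundex code. The actual KeyMap file format used by WMSpellChecker is not described here, so the dictionary-of-lists structure and the function names are assumptions for illustration only.

```python
from collections import defaultdict

# Classic Soundex digit groups; vowels, h, w and y have no digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Return the four-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code               # a vowel resets the previous code
    return (result + "000")[:4]

def build_keymap(words):
    """Hypothetical key map: Soundex code -> list of words sharing that code."""
    keymap = defaultdict(list)
    for word in words:
        keymap[soundex(word)].append(word)
    return dict(keymap)

keymap = build_keymap(["weather", "whether", "winter", "leather"])
print(keymap.get(soundex("wheather"), []))
# prints ['weather', 'whether']
```

Looking up the code of a misspelled word returns a short list of similar-sounding candidates, which can then be ranked with a more expensive measure.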
Unlike English and Scandinavian ones, dictionaries for German and for Romance languages such as Spanish, Italian and French will probably be rather large. This is because German, Italian and similar languages use a lot of suffixes, for instance when conjugating verbs. To overcome this, well-known spellcheckers such as ASpell, ISpell and HunSpell (used by OpenOffice) use dictionaries that mostly contain only the base form of each word, together with a supplementary "affix" file containing many grammar rules; the affix file plus the simplified dictionary avoids the problem of large dictionaries. I believe, however, that this system is probably rather memory- and performance-hungry and might not be the best solution for Windows Mobile and PPC. Maybe I will look into this in the future.
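The idea behind an affix file can be shown with a very simplified Python sketch: store only the base form and generate the inflected forms from suffix rules. The (strip, add) rule format below is invented for this example and is much simpler than the real affix file formats of ISpell, ASpell or HunSpell.

```python
def expand(base, rules):
    """Apply each (strip, add) suffix rule to the base form and
    return all resulting word forms, including the base itself."""
    forms = {base}
    for strip, add in rules:
        if strip and not base.endswith(strip):
            continue                      # rule does not apply to this word
        stem = base[:len(base) - len(strip)] if strip else base
        forms.add(stem + add)
    return sorted(forms)

# Hypothetical rule set: present tense of regular Italian verbs ending in -are.
ARE_RULES = [("are", "o"), ("are", "i"), ("are", "a"),
             ("are", "iamo"), ("are", "ate"), ("are", "ano")]

print(expand("parlare", ARE_RULES))
# prints ['parla', 'parlano', 'parlare', 'parlate', 'parli', 'parliamo', 'parlo']
```

One dictionary entry plus a shared rule set replaces seven stored words here, at the price of extra work whenever a word has to be looked up or expanded.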
Another negative side effect of using too large a dictionary is that it may include more obscure words, which increases the risk that the spelling engine will "miss" real-word errors. The word "wether" illustrates this. It is, arguably, so obscure that any occurrence of "wether" in a passage is more likely to be a misspelling of "weather" or "whether" than a genuine use, so a spellchecker that did not have the word in its dictionary would do better than one that did.
Conclusion....
The library can be used with projects developed with Basic4ppc (PPC and Desktop) but should also work with projects created in Visual Studio and SharpDevelop (using VB.NET and C#). The library has been compiled targeting Framework Version 2.0.
Library-version: 1.0
Helpfile-version: 1.0
As mentioned before, this is my first serious library. Please check it out and let me know how well it integrates into your applications.
Please also give me feedback, suggestions for improvements, missing features, bug-reports etc.
The idea is to add spelling-support for other languages as well and here I might need some help from end-users. I will let you know.
UPDATE - 17/08/2009: In the next few days I will release an updated version with support for other languages as well (starting with French, German, Swedish and Spanish).
Enjoy!
Rgds,
Tilleke