[HOWTO] Create Gingerbread Keyboard Dictionary for your Language (Croatian example)

navdra

Senior Member
Jun 24, 2010
157
39
I really liked the new Gingerbread keyboard but was bugged by the fact that it lacked a Croatian dictionary, so I tried to figure out how to create one.
I made it, and it's working great, so I decided to share the procedure here so others can make use of it.

Here it goes...


What you need:

1. A good source for word frequencies
A good prediction dictionary relies on word frequencies, as defined by the AOSP:
http://android.git.kernel.org/?p=pl...33b63a8b8a1043fceae592b567b93ee275504;hb=HEAD
So you need a source from which you can extract how often different words appear. After some thinking, googling, and trial and error I came to the conclusion that for smartphone usage there is no better source than a big national forum. That's what I used, anyway.

2. OpenOffice (and MS Office) dictionary for your language
You can find it here:
http://extensions.services.openoffice.org/en/dictionaries
You don't want misspelled words in the dictionary, right? So, after creating the word list from your source, you'll want to throw out the words that are not in this dictionary.
Just to be sure that I kept all the 'good' words in the list, I also ran the MS Office spelling check over it. I'll explain that later on.

3. Tools - GNU Utilities, MS Office, UltraEdit, Wget (HTTrack)...
There are no more powerful tools for stream editing than the Unix tools. Period.
At first I tried to manage without them, and once I learned a bit about them I realized how great they are for a task like this. Get them here:
http://sourceforge.net/projects/unxutils
Windows comes with its own 'sort' command, but you'll want to use the one from the GNU utilities, so put it in the directory you run your commands from.
You'll also need some way to download the forum I mentioned earlier. I used wget:
http://www.gnu.org/software/wget/
It was pretty slow (it took about two days to mirror the part of the forum with posts). When I was near the end of the download I learned about HTTrack:
http://www.httrack.com
I tried it out briefly and it seems a lot faster (it can use multiple connections!)

4. Makedict
Get it here:
http://softkeyboard.googlecode.com/svn/trunk/DictionaryTools/
For Windows, you need makedict_Windows.bat and makedict.jar


PROCEDURE:
I don't have experience with HTML, so at first I had to study how the vBulletin forum I was aiming at is structured. I wanted to download just the pages that contain posts, and not member lists etc. In the end I came up with this syntax for wget:
Code:
wget -k -m -E -p -np -R member.php*,memberlist.php*,calendar.php*,faq.php*,printthread.php*,newreply.php*,search.php*,*sendtofriend*,sendmessage.php*,*goto=nextnewest*,misc.php*,forumdisplay.php*,showpost.php*,announcement.php*,image.php*,viewonline.php*,showthread.php*mode*,showthread.php*s=*,showthread.php*page* -o log.txt http://xxxxxxx.hr/
I'm not sure if the syntax is entirely correct, but it worked for me, so I never looked back. Wget downloaded only what I wanted: thread pages from the forum. It took a long time to collect 9 GB of data. Take a look at HTTrack; I think it can do it much faster.
Now you want to extract only the message text from the HTML:
Code:
cat showthread* | sed -n "/<!-- message -->/,/<!-- \/ message -->/p" > forum0.txt
Check what you got. You don't want quotes included, because they would inflate the count for the words that appear in them, so strip those out too:
Code:
sed "s/<[^:]*said://g" forum0.txt > forum1.txt
Finally, strip out the rest of the HTML code:
Code:
sed -e "s/<[^>]*>//g" forum1.txt > forum2.txt
I noticed some leftover Croatian characters represented by their numeric HTML codes (for example &#353; for š), so I replaced those too:
Code:
cat forum2.txt | sed "s/&#353;/š/g" | sed "s/&#273;/đ/g" | sed "s/&#269;/č/g" | sed "s/&#263;/ć/g" | sed "s/&#382;/ž/g" | sed "s/&#352;/Š/g" | sed "s/&#272;/Đ/g" | sed "s/&#268;/Č/g" | sed "s/&#262;/Ć/g" | sed "s/&#381;/Ž/g" > forum.txt
Found the codes here:
http://yorktown.cbe.wwu.edu/sandvig/docs/unicode.aspx
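For anyone who would rather script this than chain sed calls, here is a rough Python sketch of the extraction steps above. It assumes the same `<!-- message -->` / `<!-- / message -->` markers as the sed command; the function name `extract_messages` and the sample page are made up for illustration. It replaces tags with a space instead of nothing, and it does not try to remove quoted posts.

```python
import html
import re

# Markers as they appear in the sed command above; other forum software will differ.
MESSAGE_RE = re.compile(r"<!-- message -->(.*?)<!-- / message -->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def extract_messages(page: str) -> list[str]:
    """Return the plain text of every post body found in one saved thread page."""
    messages = []
    for body in MESSAGE_RE.findall(page):
        text = TAG_RE.sub(" ", body)   # drop the remaining HTML tags
        text = html.unescape(text)     # turn &#353; and similar codes into real characters
        messages.append(" ".join(text.split()))  # collapse runs of whitespace
    return messages


# Hypothetical sample page, only to show the call.
sample = (
    "<html><!-- message --><div>Dobar dan, <b>&#353;to</b> ima?</div>"
    "<!-- / message --><p>footer</p></html>"
)
print(extract_messages(sample))
```

`html.unescape` handles all numeric and named character references at once, so no per-letter replacement list is needed.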
Now you can start to make your word list by throwing out everything but words:
Code:
cat forum.txt | tr "[:punct:][:blank:][:digit:]" "\n" | grep "^." > unsortedallwordslist.txt
and counting how often each word appears:
Code:
cat unsortedallwordslist.txt | tr "A-Z" "a-z" | tr "ŠĐČĆŽ" "šđčćž" | sort | uniq -c | sort -nr  > words.txt
I got around 205,000 counted words after this.
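The counting step can also be sketched in Python, which sidesteps the code page problem because `str.lower()` already knows about Š, Đ, Č, Ć and Ž. This is only an illustration under that assumption; `count_words` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Runs of letters only: \w minus digits and underscore, so š, đ, č, ć, ž stay intact.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text: str) -> Counter:
    """Lower-case the text and count how often each word appears."""
    return Counter(WORD_RE.findall(text.lower()))


counts = count_words("Što je to? To je ŠTO je.")
for word, n in counts.most_common():
    print(n, word)
```

`most_common()` returns the words sorted by descending count, which corresponds to the `sort | uniq -c | sort -nr` part of the pipeline.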
Now that you have it all nicely counted and sorted, you want to throw out misspelled and incorrect words. I used Excel for that. But first, I took the OpenOffice word list (you can simply unzip the oxt file) and cleaned it up a bit.
First, you need it in the correct Windows encoding. UltraEdit can do it. In my case I had to convert from iso-8859-2 to win-1250. Open the iso-8859-2 document, go to "view/set code page" and choose "iso-8859-2", then go to "file/conversions" and choose ASCII to UNICODE; now you will see all characters correctly. When you want to save the edited text you must convert it back, so choose UNICODE to ASCII and save it. That's it.
Also, it had suffixes such as "/AE" here and there, so I removed those too
Code:
sed "s/\/[A-Z]*//g" hr_HR.dic > hr_HR.txt
and made it all lowercase
Code:
cat hr_big.dic | tr "[A-Z]" "[a-z]" | tr "[ŠĐČĆŽ]" "[šđčćž]"
Now I imported both lists into Excel and checked whether each word from my forum word list can be found in the OpenOffice dictionary:
Code:
=COUNTIF('openofficedic'!A1:A375541;B1)
After that's finished, copy just the values into a new column and delete the column with formulas, so Excel doesn't recalculate it all again. Sort by the new values and keep the words that passed this 'spell check' (I got around 90,000 words in this step).
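The same lookup can be sketched outside Excel with a set, which avoids the slow COUNTIF over 375,000 rows. This is a minimal sketch, assuming Hunspell-style `.dic` lines where affix flags follow a slash; `load_dictionary`, `split_by_dictionary` and the sample words are hypothetical.

```python
def load_dictionary(lines):
    """Build a set of lower-cased words from Hunspell-style .dic lines."""
    words = set()
    for line in lines:
        word = line.split("/", 1)[0].strip().lower()  # drop affix flags such as "/AE"
        if word and not word.isdigit():  # the first line of a .dic file is usually an entry count
            words.add(word)
    return words


def split_by_dictionary(counts, dictionary):
    """Split a word -> count mapping into words found in the dictionary and the rest."""
    known = {w: n for w, n in counts.items() if w in dictionary}
    unknown = {w: n for w, n in counts.items() if w not in dictionary}
    return known, unknown


# Made-up sample data, only to show the call.
dic_lines = ["3", "kuća/AE", "Pas", "mačka/B"]
dictionary = load_dictionary(dic_lines)
known, unknown = split_by_dictionary({"kuća": 40, "pas": 12, "kuca": 7}, dictionary)
print(known)
print(unknown)
```

The `unknown` part corresponds to the rows with zeros, which are the ones that still need a second opinion from another spell checker.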
Cut and paste the rows that have zeros into a new worksheet. I wanted to check those against the MS Office dictionary so that I don't throw out valid words just because they are missing from the OpenOffice dictionary. Here is the function I used:
Code:
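Public Function SpellCheck(rng As Excel.Range) As Boolean()
   ' Returns one Boolean per cell in rng: True if Excel's spell checker accepts the cell text.
   Dim i As Long, size As Long
   Dim objExcel As New Excel.Application   ' separate Excel instance used only for spell checking
   Dim result() As Boolean

   size = rng.Cells.Count
   ReDim result(1 To size)

   For i = 1 To size
      result(i) = objExcel.CheckSpelling(rng.Cells(i).Text)
   Next i

   SpellCheck = result()
   objExcel.Quit   ' close the helper instance again
End Function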

The function I found was originally written as an array function, but I never managed to get it working that way. It worked as a normal function, though, and I just invoked it with
Code:
=SpellCheck(B1)
This took a reaaaally loooong time. Again, copy the values to a new column and delete the column with formulas so it won't recalculate. Delete the rows with 'FALSE' in them. Look through the rest and clean it up a bit; the MS spell check can act funny sometimes. I got another 20,000 words from this list that weren't recognized by the OpenOffice spell check, merged the two lists, and the final word count for the dictionary was around 110,000. I believe that's about optimal, maybe a little on the big side, but the final main.dict is just under 900 kB, which is more than acceptable.
Now you have to distribute the frequencies into 255 classes for the Gingerbread prediction engine. You could do it simply by dividing every count by a factor obtained by dividing the top word count by 255. But look at a scatter plot of the result and you'll notice that you use up the top classes very quickly that way. So I optimized the distribution a bit in a separate calculation. I set the word count in the top class to 1 and calculated the rest using the formula "nextclasswordcount=previousclasswordcount*factor^4". I used Excel Solver to find the factor. The total word count had to match the original word count (in my case 110,000), obviously. I even adjusted it a bit so that the sum in the new distribution is only 70,000 (1 in the first class and 2000 in the last), which smooths out the distribution nicely for the more frequent words and lets the remaining 40,000 fall into class "1". It took some tweaking, and a better formula may exist, but this worked much better for me than dividing everything by the same factor.
I kept this calculation in two separate rows and looked the classes up again next to the words:
Code:
=IF(ISNA(HLOOKUP(A6;$J$4:$JD$5;2;FALSE));D5;HLOOKUP(A6;$J$4:$JD$5;2;FALSE))
It's probably easiest to download the xlsm from here so I don't have to explain it all...
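The class distribution described above can be approximated with a short script. This is a simplified sketch, not the spreadsheet itself: it assumes class sizes that grow by one constant ratio (found by bisection instead of Excel Solver), starting with 1 word in the top class, and it puts every word that does not fit into class 1. The names `class_sizes` and `assign_classes` are hypothetical.

```python
def class_sizes(n_classes: int, total: int) -> list[int]:
    """Sizes of n_classes buckets that grow geometrically from 1 and sum to about `total`."""
    lo, hi = 1.0, 2.0
    for _ in range(100):  # bisection on the growth ratio
        ratio = (lo + hi) / 2
        if sum(ratio ** k for k in range(n_classes)) < total:
            lo = ratio
        else:
            hi = ratio
    return [max(1, round(ratio ** k)) for k in range(n_classes)]


def assign_classes(words_by_count: list[str], sizes: list[int], top: int = 255) -> dict[str, int]:
    """Walk the words from most to least frequent, filling classes top, top-1, ...; leftovers get 1."""
    classes = {}
    pos = 0
    for offset, size in enumerate(sizes):
        for word in words_by_count[pos:pos + size]:
            classes[word] = top - offset
        pos += size
    for word in words_by_count[pos:]:
        classes[word] = 1
    return classes


# Tiny demo: 3 classes holding about 7 words, 10 words in total.
demo_sizes = class_sizes(3, 7)
demo_words = [f"w{i}" for i in range(10)]  # already sorted, most frequent first
demo_classes = assign_classes(demo_words, demo_sizes)
print(demo_sizes)
print(demo_classes)

# For numbers like those in the post: class_sizes(254, 70000) for classes 255 down to 2.
```

Because of rounding, the sizes only add up to roughly the requested total, so the exact boundary between the last computed class and class 1 shifts by a few words.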
The rest is easy. Create the string needed for the correct XML format,
Code:
=CONCATENATE("<w f=";CHAR(34);D2;CHAR(34);">";F2;"</w>")
wrap the word list with "<wordlist>" in the first and "</wordlist>" in the last row, add "<?xml version="1.0" encoding="UTF-8" ?>" at the top, and finally compile the .dict file:
Code:
makedict_Windows.bat from.xml > main.dict
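If the words and classes are already in a script rather than a spreadsheet, the XML file can be written directly. A minimal sketch, assuming the `<wordlist>` / `<w f="...">` layout described above; `build_wordlist_xml` and the two sample words are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_wordlist_xml(classes: dict[str, int]) -> str:
    """Render a word -> frequency class mapping as <wordlist> XML, highest class first."""
    lines = ['<?xml version="1.0" encoding="UTF-8" ?>', "<wordlist>"]
    for word, freq in sorted(classes.items(), key=lambda item: -item[1]):
        # quoteattr adds the surrounding quotes; escape protects &, < and > in words
        lines.append(f"<w f={quoteattr(str(freq))}>{escape(word)}</w>")
    lines.append("</wordlist>")
    return "\n".join(lines) + "\n"


xml_text = build_wordlist_xml({"što": 254, "je": 255})
print(xml_text)

# To save it for makedict, write it out as UTF-8, for example:
# with open("from.xml", "w", encoding="utf-8") as fh:
#     fh.write(xml_text)
```

Writing the file explicitly as UTF-8 matters because the XML declaration promises that encoding.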

Phew! A lot of typing.

And here is the LatinIME.apk with the Croatian layout and the dictionary that I built this way:
HR_Gingerbread_keyboard-1.0.apk
I used mobilix's layout (thank you!) with just a few corrections of my own (fixed the "&amp;" glitches in the symbols keyboard).

And here are the resources I got my inspiration from (thanks, Gert Schepens):
http://www.gertschepens.be/android-dictionary-files
http://blog.cone.be/2010/08/19/android-keyboard-dictionaries/

So much for now. Enjoy.

The next step would be to try to get my work included in the official AOSP or maybe the Cyanogen source. I could use a few pointers on how to do that. I would prefer to keep it simple. I registered on GitHub, but that's as far as I've got for now. I have to do some more reading about it...
 

anders4431

Member
Mar 13, 2010
13
1
Thanks for your guide!
I successfully created a Danish dictionary, but how do I get the main.dict file I just created into the LatinIME.apk?
In the LatinIME.apk file, I tried creating the folder /res/raw-da and putting the main.dict file there, but it didn't work.
 

navdra

Senior Member
Jun 24, 2010
157
39
I'm glad you made it!
Be sure to create a nice, clean main.dict, which we will hopefully add to AOSP.
I used APK Manager to decompile, add 'raw-hr' with my main.dict, recompile and sign the .apk that already had the Croatian layout. The .apk I used had a bug that prevented language switching, and I noticed that the bug is widespread in many LatinIME.apk versions floating around. I don't know where it comes from, but the problem was the default main.dict file in the 'raw' folder, which had to be replaced with a proper one (I took mine from the CyanogenMod version of LatinIME, but you can use my .apk).
 

Stile35

Senior Member
Dec 9, 2010
217
8
Can you please send me the finished Croatian dictionary file so I can incorporate it into my Gingerbread keyboard?

Thanks.
 

navdra

Senior Member
Jun 24, 2010
157
39
I presume you're after main.dict...
Use APK Manager, decompile my apk and you'll find it in 'raw-hr' folder.

Sent from my HTC Desire
 

lockzackary

Senior Member
Jan 24, 2011
147
10
San Pedro
Hi there, I was so grateful to find this thread after googling for almost 7 hours for a Tagalog dictionary.

Your method of mirroring a forum won't work for me, though, because I don't have a fast internet connection.

So I would like to verify: with continuous usage, will the Gingerbread keyboard modify the word frequencies over time?
I mean, I could modify a script to just assign 0 as the frequency value for all words and use makedict (or, to avoid problems, assign a random value from 0 to 255).
As I use my keyboard, will it eventually adjust those frequency scores?

Anyway, I'm trying it out right now and will post my results here too.

Again, thank you very much for your insight :)
 

ytsejam_

Senior Member
Jun 29, 2006
309
2
Manila
I'm very much looking forward to your results.
It drives me crazy that I need to rebuild my Tagalog user dictionary every time I reflash my ROM.
 

lockzackary

Senior Member
Jan 24, 2011
147
10
San Pedro
@ytsejam_
Hey there, I'm done with the dictionary, although I have yet to test it with a compatible ROM. To complicate things further, it's too tedious to populate the dictionary with Tagalog text-speak (e.g. cnu, sno),
as there are so many variations of a single word, hehe. I'm still building it up, so I hope other Pinoys can wait for it.
As far as I know, this approach to creating dictionaries only works on Samsung devices and not necessarily on Android in general, so I hope they still own their Galaxy by then, hehe.
Sent from my GT-I9000 using XDA App
 

behdude

Member
May 16, 2010
15
0
Tehran
Need a keyboard not just dictionary

Hi,

I just want to add a new language (in this case, Persian) to the keyboard.

I use Cyanogen 7 and it supports Persian and Arabic very well, but it has no layout for Persian (it does have one for Arabic). Can you please help me with this?

Just a clue would do.

thanks
 

navdra

Senior Member
Jun 24, 2010
157
39
Copy kbd_qwerty.xml and kbd_qwerty_black.xml to the appropriate xml-xx subfolder and edit them as Unicode. You should be able to figure it out; it's pretty easy.
 

dcos

Senior Member
Nov 24, 2010
317
427
Koprivnica
Hi, I tried installing it on a CM7 Wildfire, but I keep getting "App not installed". Do you know what might cause this?
 

dcos

Senior Member
Nov 24, 2010
317
427
Koprivnica
Hi,

Did you need to change anything else besides adding the dictionary to the raw-hr folder? I'm trying to add the Croatian dictionary to the ICS keyboard, but I can't get the system to recognise that a dictionary is available.
Maybe some XML files need to be updated?
 

navdra

Senior Member
Jun 24, 2010
157
39
Not sure why it doesn't work. Try compiling it from source. If you succeed, share with the rest of us what you did to make it work.

Sent from my Galaxy S II
 

spartanpg

Member
May 26, 2010
26
0
Has anyone managed to find a solution for this? I'm also looking to add a Portuguese dictionary to the keyboard, but after I added the dictionary to latinime.apk it wasn't recognized.
 

apofview

Member
Feb 3, 2012
25
0
Copy kbd_qwerty.xml and kbd_qwerty_black.xml to appropriate xml-xx subfolder and edit it in unicode. You should figure it out. It's pretty easy.

Hi, I'm trying to make a Montenegrin input language. It is similar to Croatian, with just two more letters (Ś/ś and Ź/ź).

I can't edit kbd_qwerty.xml and kbd_qwerty_black.xml when I open them in Notepad++ as Unicode.


cm7.2
 

Top Liked Posts

    @navdra
    I added a post about your work on the new Oxygen forum:
    http://forum.oxygen.im/viewtopic.php?id=464
    Hope that's OK with you?!
    1
    Thanks for the help, navdra, but I failed again. I've recompiled the whole of CM again and edited spellchecker.xml to include Hungarian, but it still doesn't work. I've attached my xml and dict files; could you perhaps take a look at them? I'm completely clueless about what I'm doing wrong... Thanks in advance!

    Just to clarify, here's what I did:
    1. Downloaded the Hungarian Webcorpus from here: http://mokk.bme.hu/en/eszkozok/ (it's an open-licensed word list with frequencies included)
    2. Edited the corpus (kept only the most relevant frequency, removed HTML and other special characters and words with a frequency lower than 500, converted to UTF-8, etc.)
    3. Created an XML from the result
    4. Used reweigh.pl from here to set frequencies between 0-254 (it seems this step is no longer necessary: if I compile without it I get the same .dict file)
    5. Compiled the XML to a dict
    6. Put the dict into CyanogenMod's overlay directory under raw-hu, where the other main.dict files are also present
    7. Edited spellchecker.xml to include hu as a supported language
    8. Compiled CM9
    9. Failed, because I'm not getting suggestions :(

    Update: I tried compiling the same XML with the new makedict (from the ICS source), and it complains about duplicate words (words that are also in the dictionary capitalized). Perhaps this is the problem...

    Update 2: It works! It seems you have to use the ICS makedict in order to get a working dictionary on ICS. As soon as I managed to compile it with the new makedict (I had to convert the frequencies to the 0-254 range), it worked. Thanks for the help.
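
    Since the update above reports makedict complaining about words that also appear capitalized, it may help to list such pairs before compiling. This is only a sketch: `case_duplicates` and the sample words are hypothetical, and the thread does not say which variant should be kept.

```python
from collections import defaultdict


def case_duplicates(words):
    """Group words that differ only by letter case, e.g. 'Nagy' and 'nagy'."""
    groups = defaultdict(list)
    for word in words:
        groups[word.lower()].append(word)
    # keep only the groups that really contain more than one spelling
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Made-up sample list, only to show the call.
dupes = case_duplicates(["nagy", "Nagy", "ház", "Budapest"])
print(dupes)
```

    Each key in the result is the lower-cased form, and the value lists every spelling found in the word list, so the duplicates can be merged or dropped by hand.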