WuBi 98 Input Method Table for ibus
02 Dec 2019
I have written a series of posts on how to extend WuBi 98 support to various platforms. The company that maintains the official WuBi 98 distribution (i.e. WangMa) has put little effort into updating its software for modern operating systems. I purchased their WuBi IME for Windows, and it kept crashing my other applications due to poor compatibility with Win10 APIs. Worse, they only allow you to install the IME on three distinct computers. Since I switch hardware rather regularly, it is no longer feasible for me to use their software.
In two previous posts (1, 2), I tried to use WuBi 98 with a free WuBi IME produced by Baidu. These two solutions work smoothly on Windows. However, because Baidu WuBi does not have an Ubuntu version, I was still unable to type with my favorite IME on Ubuntu.
Today, I came across this article that teaches you how to write a custom table for Ubuntu’s IME system, ibus. I realized that it is fairly easy to create a custom WuBi 98 table on Ubuntu. While testing my previous WuBi 98 tables, I also found some issues with level-three codes (three-letter codes for common characters). The problem is fixed in this update.
The code that generates the table is given below. The two input files, 2019-06-06-wb98perfection.txt and 2018-09-02-wubi-database.txt, can be found in my previous posts.
import re

entryDict = dict()  # (key, char) -> frequency
keyTable = dict()   # key -> highest frequency handed out so far

# Load the main table; the n-th entry seen for a key gets frequency n.
with open('2019-06-06-wb98perfection.txt', 'r', encoding='utf16') as infile:
    for line in infile:
        # Skip blank lines (a bare newline would break the unpacking below).
        if line.strip():
            key, val = line.strip().split('\t')
            if key in keyTable:
                freq = keyTable[key] + 1
                keyTable[key] = freq
            else:
                freq = 1
                keyTable[key] = freq
            entryDict[(key, val)] = freq

# fix level-three common chars
# Collect every non-whitespace character listed in common_chars.txt.
commomChars = set()
with open('common_chars.txt', 'r') as infile:
    data = infile.read()
    for char in data:
        match = re.match('\\s', char)
        if match is None:
            commomChars.add(char)

# Add three-letter codes of common characters that the main table lacks.
with open('2018-09-02-wubi-database.txt', 'r') as infile:
    for line in infile:
        key, val = line.strip().split('\t')
        if len(key) == 3 and val in commomChars:
            if (key, val) not in entryDict:
                entryDict[(key, val)] = 0
            if key not in keyTable:
                keyTable[key] = 0

# Flatten to [key, char, freq] rows.
entry = []
for key, val in entryDict.items():
    entry.append([key[0], key[1], val])

for item in entry:
    if len(item[0]) == 3:
        if item[1] in commomChars:
            # raise freq above every other entry for this key
            freq = keyTable[item[0]] + 1
            keyTable[item[0]] = freq
            item[2] = freq

# Sort by key, then by descending frequency.
entry.sort(key = lambda x : (x[0], -x[2]))
for item in entry:
    item[2] = repr(item[2])

# Write tab-separated rows wrapped in BEGIN_TABLE / END_TABLE.
outputs = ['\t'.join(item) for item in entry]
outputStr = 'BEGIN_TABLE\n' + '\n'.join(outputs) + '\nEND_TABLE'
with open('output_table.txt', 'w', encoding='utf8') as outfile:
    outfile.write(outputStr)
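To make the frequency logic easier to follow, here is a minimal in-memory sketch of the same ranking steps on toy data. The helper name rank_entries and the sample codes and characters are made up for illustration and do not come from the real tables. The sketch leaves out the step that merges missing entries from the second file, and it assumes, as the script does, that the entry with the highest frequency should be listed first for each key.

```python
def rank_entries(pairs, common_chars):
    """Mirror the script's ranking on in-memory data (hypothetical helper).

    pairs: list of (key, char) tuples in source-file order.
    common_chars: set of characters whose three-letter codes get promoted.
    Returns a list of (key, char, freq) sorted by key, then descending freq.
    """
    key_count = {}
    freq_of = {}
    # Step 1: the n-th entry seen for a key gets frequency n.
    for key, char in pairs:
        key_count[key] = key_count.get(key, 0) + 1
        freq_of[(key, char)] = key_count[key]
    # Step 2: promote common characters on three-letter codes above
    # every frequency handed out so far for that key.
    for (key, char) in freq_of:
        if len(key) == 3 and char in common_chars:
            key_count[key] += 1
            freq_of[(key, char)] = key_count[key]
    rows = [(key, char, freq) for (key, char), freq in freq_of.items()]
    rows.sort(key=lambda r: (r[0], -r[2]))
    return rows


# Toy data: two characters share the code "abc"; only "甲" is "common".
toy_pairs = [("abc", "甲"), ("abc", "乙"), ("ab", "丙")]
ranked = rank_entries(toy_pairs, common_chars={"甲"})
for key, char, freq in ranked:
    print(f"{key}\t{char}\t{freq}")
```

In this toy run the common character on the three-letter code ends up ahead of the other candidate for the same code, which is the effect the level-three fix is after.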
Downloads:
- 2019-12-02-wubi98ibus.txt (use ibus-table-createdb to convert it to an ibus database)
- common_chars.txt
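The conversion step can also be scripted. The sketch below wraps ibus-table-createdb using its -n (output database) and -s (source table) options. The helper names build_createdb_command and convert_table, and the output file name wubi98.db, are placeholders chosen for illustration.

```python
import shutil
import subprocess


def build_createdb_command(source_path, db_path):
    """Return the argv list for ibus-table-createdb (hypothetical helper)."""
    return ["ibus-table-createdb", "-n", db_path, "-s", source_path]


def convert_table(source_path, db_path):
    """Run the conversion if the tool is on PATH; return True on success."""
    if shutil.which("ibus-table-createdb") is None:
        print("ibus-table-createdb not found on PATH")
        return False
    result = subprocess.run(build_createdb_command(source_path, db_path))
    return result.returncode == 0


# Only build and show the command here; call convert_table() to run it.
cmd = build_createdb_command("2019-12-02-wubi98ibus.txt", "wubi98.db")
print(" ".join(cmd))
```

The resulting database then has to be placed where ibus-table looks for its tables (commonly /usr/share/ibus-table/tables/, though this may differ between distributions) before ibus is restarted.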