
jmoiron.net

裏切者 (Traitor)

posted February 3rd, 2007 @ 17:40:42

- tags: development


Unfortunately for my family, my recent updates aren't about me but about things I'm doing. I've been pretty prolific at coding so far this year, both at work and on my own. Because I spend so much time at my girlfriend's place, and she has a Windows machine, I've had to impose a few (hopefully non-annoying) programs on it, but I finally found the last missing piece: PieTTY. Now that Unicode works in my console, I can develop on her computer and actually verify what I'm doing.

I've spent quite a lot of time in the past few weeks investigating Japanese character encodings and writing code that deals directly with the Unicode spec and EUC-JP: converting kana to romaji, and converting full-width (wide) characters to their normal-width equivalents for fuzzy string comparisons. Character encoding is actually pretty exciting when you "get it right". One of my dreams is an operating system that correctly displays any textual information. That's impossible without giving it some hints, but at least having all Unicode glyphs and a prioritized list of encoding guesses would be nice.
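The author's own width-folding code isn't shown here. As a rough sketch of the idea in Python 3, assuming Unicode NFKC normalization is an acceptable stand-in for a hand-written full-width table, a fuzzy comparison key could look like this (`fuzzy_key` and `fuzzy_equal` are hypothetical names):

```python
import unicodedata


def fuzzy_key(text):
    """Fold text into a loose comparison key.

    NFKC maps full-width Latin letters and digits ("Ａ", "１") to ASCII,
    the ideographic space U+3000 to an ordinary space, and half-width
    katakana ("ｱ") to standard katakana ("ア"); casefold() then drops
    case differences.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def fuzzy_equal(a, b):
    """Hypothetical helper: compare two strings after folding width and case."""
    return fuzzy_key(a) == fuzzy_key(b)


print(fuzzy_equal("ＰｉｅＴＴＹ", "pietty"))  # True
print(fuzzy_key("ｱｲｽ　１２３"))  # アイス 123
```

NFKC folds more than width: ligatures such as "ﬁ" and superscript digits are rewritten too, which usually helps fuzzy matching but is not something you'd want to apply to stored text.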

Here's a rundown of some stuff I've been hacking on:

  • a Google Maps mashup at work with sensor information and map overlays (for all intents and purposes this is private, so no links)
  • a patch to python-romkan that makes it work with UTF-8 instead of EUC-JP
  • a ton of updates to pyexif in a code sprint earlier today
  • a lot of changes to the nds libraries, including finishing ndstool compatibility for the header
  • created the efteep project page
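The python-romkan patch is about accepting UTF-8 rather than EUC-JP. A related, hedged sketch of the "prioritized list of encoding guesses" mentioned earlier: try each candidate codec in order and keep the first clean decode. The helper name and the candidate list are assumptions, not part of python-romkan:

```python
# Hypothetical priority order; adjust for the data you actually expect.
DEFAULT_GUESSES = ("utf-8", "euc_jp", "shift_jis")


def guess_decode(data, encodings=DEFAULT_GUESSES):
    """Return (text, encoding) for the first encoding that decodes data cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next guess
    raise ValueError("none of %r could decode the input" % (encodings,))


raw = "うらぎりもの".encode("euc_jp")
text, used = guess_decode(raw)
print(used)  # euc_jp
```

A clean decode is evidence, not proof: short byte strings can be valid in more than one of these encodings, so the order of the list matters. A catch-all such as latin-1, which accepts any byte sequence, would only make sense as the last entry.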

uromkan

A few notes on 'uromkan' (utf8-romkan):

romkan is a seemingly popular Perl module that converts between romaji and kana (output either entirely in hiragana or entirely in katakana), but coming from the Perl (and Ruby) world, it was targeted at a non-Unicode text encoding and was riddled with unnecessary regular expressions. Some of them were so dense my mind couldn't actually penetrate them, but they seem to work.
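Purely as an illustration of a regex-free alternative (this is not how romkan or uromkan is implemented), romaji can be converted with a greedy longest-match lookup in a table, plus one rule for doubled consonants. The table below is a deliberately tiny, hypothetical subset, and input is assumed to be lowercase:

```python
# Deliberately tiny, hypothetical table; a real one covers every syllable.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ga": "が", "gi": "ぎ", "gu": "ぐ", "ge": "げ", "go": "ご",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "fi": "ふぃ",  # small-vowel combination
    "n": "ん", "-": "ー",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_HIRAGANA)
DOUBLING_CONSONANTS = set("bcdfghjklmpqrstvwxyz")  # every consonant except n


def romaji_to_hiragana(text):
    """Greedy longest-match conversion of lowercase romaji to hiragana."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # A doubled consonant ("kk", "cc", ...) becomes a small tsu; the
        # second letter is then read as the start of the next syllable.
        if ch in DOUBLING_CONSONANTS and text[i + 1:i + 2] == ch:
            out.append("っ")
            i += 1
            continue
        # Try the longest key first, so "na" is read as な rather than ん + "a".
        for length in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            kana = ROMAJI_TO_HIRAGANA.get(text[i:i + length])
            if kana is not None:
                out.append(kana)
                i += length
                break
        else:
            # No table entry: copy the character through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


print(romaji_to_hiragana("ficchi"))  # ふぃっち
```

Characters with no table entry (including spaces) are copied through unchanged, which keeps the sketch short but means a real table has to cover every syllable (ji, sha, kya, and so on).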

In [5]: print uromkan.hirakata(uromkan.romkan('aisu kuri-mu'))
アイス クリーム

In [6]: print uromkan.romkan('uragirimono')
うらぎりもの

In [7]: print uromkan.kanrom('そうして')
soushite

In [8]: print uromkan.romkan('ficchi')
ふぃっち

(That last one is nonsense, but I just wanted to show that small vowels and small-tsu consonant doubling work.)
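The katakana in the first example comes from `hirakata`. How uromkan does it isn't shown, but one simple approach relies on the Unicode layout: hiragana U+3041 to U+3096 and katakana U+30A1 to U+30F6 appear in the same order, 0x60 code points apart. A minimal Python 3 sketch with hypothetical function names:

```python
# Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 share one layout,
# offset by 0x60 code points, so str.translate can map between them.
HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}
KATA_TO_HIRA = {kata: hira for hira, kata in HIRA_TO_KATA.items()}


def hira_to_kata(text):
    """Hypothetical stand-in for a hirakata-style conversion."""
    return text.translate(HIRA_TO_KATA)


def kata_to_hira(text):
    """The reverse mapping; characters outside the range are unchanged."""
    return text.translate(KATA_TO_HIRA)


print(hira_to_kata("あいす くりーむ"))  # アイス クリーム
```

Characters outside the shifted range, like the long-vowel mark ー (U+30FC) and spaces, pass through untouched, which is why クリーム keeps its bar.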
