Saturday, January 30, 2010

Conversion of Chinese Text from Lines to Columns

OK, I admit it: anyone seriously involved in composing Chinese texts in the 'old system' of top-to-bottom, right-to-left columns will almost certainly have a UI that supports that kind of input. But if we need to include such a layout alongside conventional widgets that work left to right, top to bottom, it is not easy to compose.

Building on the Tcl included in my last posting, I've modified the proc so that a set of strings containing lines of Chinese text is converted into a tab-separated table, which can then be inserted into an ordinary text file. Each column of the table holds a fixed number of characters read top to bottom, and the columns run from right to left. Needless to say, the result cannot be sensibly edited, but that is perhaps the next little programming task! Anyway, here's the modified script. (Sorry, but Blogger doesn't seem to like the embedded Unicode Chinese script.)

#!/bin/sh
# the next line restarts using tclsh \
exec tclsh "$0" "$@"
#---------------

package require Gnocl

# args:
#   data  = list of strings (lines of Chinese text) to be formatted into columns
#   nrows = maximum number of rows to produce, i.e. characters per column
#   pad   = character used to fill gaps at the bottom of the last column
# returns:
#   list of tab-separated row strings, with the columns ordered right to left
# note:
#   pure Tcl (8.5 or later, for string reverse), no Gnocl needed
proc tabulate_Chinese_Columns {data nrows {pad -}} {

    set r 0     ;# row counter
    set str {}  ;# final, formatted list returned by the proc
    set m 0     ;# length of the longest row string, used for padding

    # Chinese has no spaces, so split everything into single characters first.
    # Joining the lines with one space leaves a blank cell between input lines.
    set data [split [join $data " "] {}]

    # initialise an array to hold the output strings
    for {set i 0} {$i < $nrows} {incr i} { set rows($i) {} }

    # build up the output strings: character i goes to row (i mod nrows)
    for {set i 0} {$i < [llength $data]} {incr i} {
        if {$rows($r) eq {}} {
            set rows($r) [lindex $data $i]
        } else {
            set rows($r) "$rows($r)\t[lindex $data $i]"
        }
        incr r
        if {$r == $nrows} {set r 0}
    }

    # find the length of the longest row
    for {set i 0} {$i < $nrows} {incr i} {
        set l [string length $rows($i)]
        if {$l >= $m} {set m $l}
    }

    # pad shorter rows with one extra cell (a row that got no characters at
    # all, because the text is shorter than nrows, gets just the pad), then
    # reverse each row so the columns read right to left
    for {set i 0} {$i < $nrows} {incr i} {
        if {$rows($i) eq {}} {
            set rows($i) $pad
        } elseif {[string length $rows($i)] < $m} {
            set rows($i) "$rows($i)\t$pad"
        }
        set rows($i) [string reverse $rows($i)]
    }

    # build the list to return
    for {set i 0} {$i < $nrows} {incr i} {
        lappend str $rows($i)
    }

    return $str
}


# the ubiquitous demo script
set txt1 [gnocl::text ]
gnocl::window -child $txt1 -defaultWidth 200 -defaultHeight 300 -title "The Analects"

set str(1) "子曰：「學而時習之，不亦說乎？有朋自遠方來，不亦樂乎？人不知而不慍，不亦君子乎？」"
# Xue Er: The Master said, "Is it not pleasant to learn with a constant perseverance
# and application? Is it not delightful to have friends coming from distant
# quarters? Is he not a man of complete virtue, who feels no discomposure
# though men may take no note of him?"

set str(2) "有子曰：「其為人也孝弟，而好犯上者，鮮矣；不好犯上，而好作亂者，未之有也。君子務本，本立而道生。孝弟也者，其為仁之本與！」"
# Xue Er: The philosopher You said, "They are few who, being filial and fraternal,
# are fond of offending against their superiors. There have been none, who,
# not liking to offend against their superiors, have been fond of stirring
# up confusion. The superior man bends his attention to what is radical.
# That being established, all practical courses naturally grow up.
# Filial piety and fraternal submission! - are they not the root
# of all benevolent actions?"

set str(3) "子曰：「巧言令色，鮮矣仁！」"
# Xue Er: The Master said, "Fine words and an insinuating appearance are
# seldom associated with true virtue."

lappend data3 $str(1) $str(2) $str(3)

foreach row [tabulate_Chinese_Columns $data3 20 ""] {
    $txt1 insert end \t${row}\t\n
}

# enter the GTK event loop so the window stays open under tclsh
gnocl::mainLoop
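For comparison, here is a more compact way to do the same transposition in plain Tcl (8.5 or later, for `lreverse`). It is a sketch, not code from the original post. The proc name `columnize` is hypothetical. Instead of dealing characters out with a row counter, it computes each cell's index directly: character `c*nrows+r` sits in column `c`, row `r`. Cells are kept in a list and reversed with `lreverse`, so a multi-character pad is not turned back to front the way `string reverse` would turn it.

```tcl
# Hypothetical compact variant of tabulate_Chinese_Columns (Tcl 8.5+).
#   data  : list of strings; joined with one space, then one character per cell
#   nrows : characters per vertical column
#   pad   : filler for the empty cells at the bottom of the last column
proc columnize {data nrows {pad -}} {
    set chars [split [join $data " "] {}]
    set n [llength $chars]
    # number of columns needed, rounding up
    set ncols [expr {($n + $nrows - 1) / $nrows}]
    set result {}
    for {set r 0} {$r < $nrows} {incr r} {
        set cells {}
        for {set c 0} {$c < $ncols} {incr c} {
            # column c holds characters c*nrows .. c*nrows+nrows-1, top to bottom
            set i [expr {$c * $nrows + $r}]
            if {$i < $n} {
                lappend cells [lindex $chars $i]
            } else {
                lappend cells $pad
            }
        }
        # reverse the cell order so the first column ends up on the right
        lappend result [join [lreverse $cells] \t]
    }
    return $result
}

# quick check without Gnocl
foreach row [columnize {ABCDE} 2 -] { puts $row }
# prints E<tab>C<tab>A, then -<tab>D<tab>B
```

To use it in the demo, call `columnize $data3 20 ""` in place of `tabulate_Chinese_Columns`.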
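Because the table cannot be edited sensibly, one possible first step toward editing is to read it back into ordinary text, change that text, and then rebuild the table. The sketch below makes several assumptions:

- The rows use the format produced above: tab-separated, single-character cells, with the columns running right to left.
- Every row has the same number of cells.
- `pad` matches the value used to build the table.

The name `uncolumnize` is made up for illustration. The blank cell that separates input lines comes back as a space, so `split $text " "` recovers the individual lines as long as they contain no spaces themselves. One limitation: a genuine character equal to `pad` at the bottom of the last column would be dropped along with the padding.

```tcl
# Hypothetical inverse of the column layout: rebuild the character stream
# from tab-separated rows whose columns run right to left (Tcl 8.5+).
#   rows : list of row strings, all with the same number of cells
#   pad  : the pad value that was used when the table was built
proc uncolumnize {rows {pad -}} {
    # split each row into cells and restore left-to-right column order
    set grid {}
    foreach row $rows {
        lappend grid [lreverse [split $row \t]]
    }
    set ncols [llength [lindex $grid 0]]
    set text ""
    # read column by column, each column from top to bottom
    for {set c 0} {$c < $ncols} {incr c} {
        foreach cells $grid {
            set cell [lindex $cells $c]
            # padding can only appear at the bottom of the last column
            if {$c == $ncols - 1 && $cell eq $pad} continue
            append text $cell
        }
    }
    return $text
}

# round trip with a table whose short row was padded with "-"
puts [uncolumnize [list "E\tC\tA" "-\tD\tB"] -]   ;# prints ABCDE
```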
