World Library  
Flag as Inappropriate
Email this Article

Japanese language and computers

Article Id: WHEBN0000333528
Reproduction Date:

Title: Japanese language and computers  
Author: World Heritage Encyclopedia
Language: English
Subject: Keyboard layout, Genkō yōshi, Input method, Chinese input methods for computers, Horizontal and vertical writing in East Asian scripts
Collection:
Publisher: World Heritage Encyclopedia
Publication
Date:
 

Japanese language and computers

A Japanese kana keyboard

In relation to the Japanese language and computers many adaptation issues arise, some unique to Japanese and others common to languages which have a very large number of characters. The number of characters needed in order to write English is very small, and thus it is possible to use only one byte to encode one English character. However, the number of characters in Japanese is much more than 256, and hence Japanese cannot be encoded using only one byte, and Japanese is thus encoded using two or more bytes, in a so-called "double byte" or "multi-byte" encoding. Some problems relate to transliteration and romanization, some to character encoding, and some to the input of Japanese text.

Character encodings

There are several standard methods to encode Japanese characters for use on a computer, including JIS, Shift-JIS, EUC, and Unicode. While mapping the set of kana is a simple matter, kanji has proven more difficult. Despite efforts, none of the encoding schemes have become the de facto standard, and multiple encoding standards are still in use today.

For example, most Japanese emails are in JIS encoding and web pages in Shift-JIS and yet mobile phones in Japan usually use some form of Extended Unix Code. If a program fails to determine the encoding scheme employed, it can cause mojibake (文字化け, "misconverted garbled/garbage characters", literally "transformed characters") and thus unreadable text on computers.

The first encoding to become widely used was JIS X 0201, which is a single-byte encoding that only covers standard 7-bit ASCII characters with half-width katakana extensions. This was widely used in systems that were neither powerful enough nor had the storage to handle kanji (including old embedded equipment such as cash registers). This means that only katakana, not kanji, was supported using this technique. Some embedded displays still have this limitation.

The development of kanji encodings was the beginning of the split. Shift JIS supports kanji and was developed to be completely backward compatible with JIS X 0201, and thus is in much embedded electronic equipment.

However, Shift JIS has the unfortunate property that it often breaks any parser (software that reads the coded text) that is not specifically designed to handle it. For example, a text search method can get false hits if it is not designed for Shift JIS. EUC, on the other hand, is handled much better by parsers that have been written for 7-bit ASCII (and thus EUC encodings are used on UNIX, where much of the file-handling code was historically only written for English encodings). But EUC is not backwards compatible with JIS X 0201, the first main Japanese encoding. Further complications arise because the original Internet e-mail standards only support 7-bit transfer protocols. Thus JIS encoding was developed for sending and receiving e-mails.

In character set standards such as JIS, not all required characters are included, so gaiji (外字 "external characters") are sometimes used to supplement the character set. Gaiji may come in the form of external font packs, where normal characters have been replaced with new characters, or the new characters have been added to unused character positions. However, gaiji are not practical in Internet environments since the font set must be transferred with text to use the gaiji. As a result, such characters are written with similar or simpler characters in place, or the text may need to be written using a larger character set (such as Unicode) that supports the required character.

Unicode was intended to solve all encoding problems over all languages. The UTF-8 encoding used to encode Unicode in web pages does not have the disadvantages that Shift-JIS has. Unicode is supported by international software, and it eliminates the need for gaiji. There are still controversies, however. For Japanese, the kanji characters have been unified with Chinese; that is, a character considered to be the same in both Japanese and Chinese is given a single number, even if the appearance is actually somewhat different. This process, called Han unification, has caused controversy. The previous encodings in Japan, Taiwan Area, Mainland China and Korea have only handled one language and Unicode should handle all. The handling of Kanji/Chinese have however been designed by a committee composed of representatives from all four countries/areas. Unicode is slowly growing because it is better supported by software from outside Japan, but still (as of 2011) most web pages in Japanese use Shift-JIS. The Japanese WorldHeritage uses Unicode.

Text input

Written Japanese uses several different scripts: kanji (Chinese characters), 2 sets of kana (phonetic syllabaries) and roman letters. While kana and roman letters can be typed directly into a computer, entering kanji is a more complicated process as there are far more kanji than there are keys on most keyboards. To input kanji on modern computers, the reading of kanji is usually entered first, then an input method editor (IME), also sometimes known as a front-end processor, shows a list of candidate kanji that are a phonetic match, and allows the user to choose the correct kanji. More-advanced IMEs work not by word but by phrase, thus increasing the likelihood of getting the desired characters as the first option presented. Kanji readings inputs can be either via romanization (rōmaji nyūryoku, ローマ字入力) or direct kana input (kana nyūryoku, かな入力). Romaji input is more common on PCs and other full-size keyboards (although direct input is also widely supported), whereas direct kana input is typically used on mobile phones and similar devices – each of the 10 digits (1–9,0) corresponds to one of the 10 columns in the gojūon table of kana, and multiple presses select the row.

There are two main systems for the romanization of Japanese, known as Kunrei-shiki and Hepburn; in practice, "keyboard romaji" (also known as wāpuro rōmaji or "word processor romaji") generally allows a loose combination of both. IME implementations may even handle keys for letters unused in any romanization scheme, such as L, converting them to the most appropriate equivalent. With kana input, each key on the keyboard directly corresponds to one kana. The JIS keyboard system is the national standard, but there are alternatives, like the thumb-shift keyboard, commonly used among professional typists.

Direction of text

Japanese can be written in two directions. Yokogaki style writes left-to-right, top-to-bottom, as with English. Tategaki style writes first top-to-bottom, and then moves right-to-left.

At present, handling of downward text is incomplete. For example, HTML has no support for tategaki and Japanese users must use HTML tables to simulate it. However, CSS level 3 includes a property "writing-mode" which can render tategaki when given the value "vertical-rl" (i.e. top to bottom, right to left). Word processors and DTP software have more complete support for it.

See also

References

External links

  • Japanese Owned computer companies in United States
  • A complete introduction to Japanese character encodings
  • Chinese, Japanese, and Korean character set standards and encoding systems
  • Japanese text encoding
  • A collection of free Japanese typefaces
  • How to install japanese font
  • Online Japanese Dictionary of Linguistics
  • Online Japanese Dictionary
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.
 
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
 
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a non-profit organization.
 


Copyright © World Library Foundation. All rights reserved. eBooks from Project Gutenberg are sponsored by the World Library Foundation,
a 501c(4) Member's Support Non-Profit Organization, and is NOT affiliated with any governmental agency or department.