Phonetic Writing Systems

[Vertical writing demonstration: "It looks something like this." set in three columns, each read top to bottom, starting with the rightmost column.]

Since this page is about Japanese characters, you're going to need a Japanese font, such as MS Mincho, and a compatible browser to get much out of it beyond the Romaji section. Recent versions of Mozilla, Chrome, Opera, and IE all display this page properly (at least for me) but I can't say anything with certainty about other browsers or earlier versions.

Before getting into the character sets used in Japanese, note that Japanese may be written horizontally or vertically. Horizontal writing is borrowed from the West and, as such, is read in rows, each row read left to right, starting with the topmost row and moving down (like this text). Vertical writing, the traditional Japanese form, is read in columns, each column read top to bottom, starting with the rightmost column and moving left, as shown in the demonstration above. Occasionally, these columns are only one row deep, which results in text that reads from right to left (siht ekil), but this is rare outside of decorative uses.
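To make the column order concrete, here is a minimal Python sketch that lays a string out vertically. The function name `to_vertical` and the fixed column height are assumptions made for illustration; real vertical typesetting involves much more (punctuation placement, rotated glyphs, and so on).

```python
def to_vertical(text, height):
    """Lay out text in columns of `height` characters, rightmost column first."""
    # Cut the text into columns; each column is read top to bottom.
    columns = [text[i:i + height] for i in range(0, len(text), height)]
    # Pad the last column so that every column has the same height.
    columns = [col.ljust(height) for col in columns]
    rows = []
    for r in range(height):
        # The first column belongs at the far right, so reverse the column order.
        rows.append(" ".join(col[r] for col in reversed(columns)).rstrip())
    return "\n".join(rows)


print(to_vertical("ABCDEFGH", 3))
# G D A
# H E B
#   F C
```

Reading the output from the top of the rightmost column downward, then moving one column left, gives back ABCDEFGH.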

In any case, Japanese uses four different character sets. Here are three of them, in order of what is likely to be increasing foreignness from the perspective of the average Westerner. The fourth, kanji, is on its own page.


ローマ字 (ローマじ) Romaji

This one should be nothing new. It's just the Roman alphabet (the one English uses). It's rarely used in written Japanese, though it does show up occasionally. Though sometimes it appears for impact, or because ASCII tends to be less trouble for computers, the main use seems to be in providing sort of an intermediate level between Western languages like English and standard Japanese. This can make it useful for those beginning to learn the language, though if you're serious about learning Japanese, you're probably better off avoiding it and jumping straight into kana.

It's also ironic that, despite being the standardized spelling, "Romaji" is not a correct romanization of ローマ字 under any system. It should be (spaces optional) "Rouma ji", "Rooma ji", "Rōma ji", or "Rôma ji", all of which indicate the long vowel. Numerous place names, like Tokyo (properly "Toukyou" or the equivalent), and other words that have assimilated into English, like dojo (properly "doujou" or the equivalent), suffer from the same problem. Possibly it's a result of lazy copying dropping the macron from the ō used for a long o in the Hepburn romanization.
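As a rough illustration of how a dropped macron produces these spellings, the Python sketch below strips combining marks from romanized text. The helper name `strip_diacritics` is made up for this example, and this is only a guess at the mechanism, not a claim about how any particular spelling actually came about.

```python
import unicodedata


def strip_diacritics(text):
    """Remove macrons, circumflexes, and other combining marks from text."""
    # NFD splits a letter like ō into a plain o plus a combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Rōmaji"))  # Romaji
print(strip_diacritics("Tōkyō"))   # Tokyo
print(strip_diacritics("dôjô"))    # dojo
```

Once the mark is gone, nothing in the spelling shows that the vowel was long.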

There are a few things to watch out for when dealing with romanized Japanese. Just because it looks English doesn't mean it's pronounced like English (the vowels, at least, are closer to Spanish), and there are other quirks that vary depending on which of the several romanization systems you're dealing with. Here are all the important pronunciation points that I can think of for now:

Syllables

Japanese is based on syllables, though linguists insist that they're morae, not syllables, because of some obscure difference between the two terms. Regardless, the point is that each syllable, or mora if you prefer, is pronounced for (roughly) the same amount of time when said correctly (at least officially; there are of course variations in actual usage, such as when someone elongates part of a word for emphasis).
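If you want to count morae in a word written in kana, a minimal Python sketch follows. It assumes the input is pure kana and that a small ゃ, ゅ, ょ (or small vowel) merges with the kana before it, while the long vowel mark, the gemination mark, and ん each count as a mora of their own. The name `count_morae` is hypothetical.

```python
# Small kana that combine with the preceding kana into a single mora.
SMALL_COMBINING = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def count_morae(kana):
    """Count morae in a kana-only string (kanji, spaces, etc. are not handled)."""
    # Every character starts a new mora unless it is a small combining kana.
    return sum(1 for ch in kana if ch not in SMALL_COMBINING)


print(count_morae("アイスクリーム"))  # 7: ア イ ス ク リ ー ム
print(count_morae("ローマじ"))        # 4: ロ ー マ じ
print(count_morae("きゃ"))            # 1
```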

You can check a kana chart (like those below) to see what the morae are, but it's usually fairly simple to pick them out if you know what you're looking for. Each is normally one of the following:

Vowel Pronunciation

These are typically more like Spanish than English.

As you have likely noticed, a long vowel in Japanese (and in most non-English languages, for that matter) has the same sound as the short vowel but is held for a longer period of time. English oddities aside, there's a reason long vowels are called that.

Consonant Pronunciation

Miscellany

Because different people think differently, there are several different romanization schemes. Several official ones, even. I cover those differences and my personal preferences in the section on hiragana.


片仮名 (かたかな) Katakana

This character set is primarily used to write words borrowed from other languages. The top two languages borrowed from are English and Portuguese (not counting Chinese, since borrowed Chinese words are typically assimilated more completely into Japanese and written in kanji). However, just because you know an English word that Japanese borrowed doesn't mean you'll be able to pick it out. Since the sounds don't match exactly, words usually have to be adapted to fit the kana available. For example, ice cream becomes アイスクリーム (AISU KURIIMU); try saying it out loud, keeping in mind the way Romaji is pronounced. And since there are hardly any redundant sounds in Japanese, homonyms and near-homonyms from other languages typically end up with identical kana (like "race" and "lace", both written レース).

Katakana is additionally used for emphasis, scientific names, sound effects, and possibly other purposes that I haven't come across yet or can't think of at the moment, so don't assume that all words in katakana must automatically be borrowed. It's sort of like the italics of Japanese.

Here's the standard katakana chart and some extended characters (actually variations of the standard in most cases), with my preferred romanization (more on that a bit later). The kana invented to better accommodate foreign words are relatively recent and therefore less common, and often not completely standardized, but I have seen many of them at least occasionally in actual usage.

Standard chart

ア a | イ i | ウ u | エ e | オ o
カ ka | キ ki | ク ku | ケ ke | コ ko
サ sa | シ shi | ス su | セ se | ソ so
タ ta | チ chi | ツ tsu | テ te | ト to
ナ na | ニ ni | ヌ nu | ネ ne | ノ no
ハ ha | ヒ hi | フ fu | ヘ he | ホ ho
マ ma | ミ mi | ム mu | メ me | モ mo
ヤ ya | ユ yu | ヨ yo
ラ ra | リ ri | ル ru | レ re | ロ ro
ワ wa | ヰ wi | ヱ we | ヲ wo
ン n or n'

Other morae

ガ ga | ギ gi | グ gu | ゲ ge | ゴ go
ザ za | ジ ji | ズ zu | ゼ ze | ゾ zo
ダ da | ヂ dji | ヅ dzu | デ de | ド do
バ ba | ビ bi | ブ bu | ベ be | ボ bo
パ pa | ピ pi | プ pu | ペ pe | ポ po
ー (long vowel mark)
ッ (gemination mark)

2-character morae

キャ kya | キュ kyu | キョ kyo
ギャ gya | ギュ gyu | ギョ gyo
シャ sha | シュ shu | ショ sho
ジャ ja | ジュ ju | ジョ jo
チャ cha | チュ chu | チョ cho
ヂャ dja | ヂュ dju | ヂョ djo
ニャ nya | ニュ nyu | ニョ nyo
ヒャ hya | ヒュ hyu | ヒョ hyo
ビャ bya | ビュ byu | ビョ byo
ピャ pya | ピュ pyu | ピョ pyo
ミャ mya | ミュ myu | ミョ myo
リャ rya | リュ ryu | リョ ryo

Invented morae

ヴァ va | ヴィ vi | ヴ vu | ヴェ ve | ヴォ vo
クァ kwa | グァ gwa | クィ kwi | グィ gwi | クェ kwe | グェ gwe | クォ kwo | グォ gwo
キェ kye | ギェ gye | スィ si | ズィ zi | シェ she | ジェ je
ツァ tsa | ツィ tsi | ドゥ, デュ du | トゥ, テュ tu | ツェ tse | ツォ tso
ティ ti | ディ di | チェ che | ニェ nye
ファ fa | フャ fya | フィ fi | フュ fyu | フェ fe | ヒェ hye | フォ fo | フョ fyo
ビェ bye | ピェ pye | ミェ mye | リェ rye
ウィ wi | ウェ we | ウォ wo

Converting from other languages

What makes katakana so interesting and useful even if you don't know a word of Japanese is that, as explained above, it's most often used to write words that aren't Japanese in origin. Especially in recent years, more katakana words are borrowed from English than from any other language, and video games (just to give an example) frequently give English, or at least pseudo-English, names to items, skills, and so on. If you know katakana and understand how words tend to be adapted, you stand a good chance of being able to figure out the original word. Here are some of the conventions generally used to convert English (specifically, though much of this applies to other languages as well) words to katakana.

Reverting to other languages

Since some tweaking goes on, it's understandable that it can be difficult to decipher a borrowed word, particularly with unusual borrowings such as those often found in fiction. Here are some common points of confusion.


平仮名 (ひらがな) Hiragana

This is the most commonly used phonetic character set in Japanese writing. Any Japanese word can be written using only hiragana. Hiragana represent the same sounds as katakana, but the sounds added to better fit borrowed words don't normally apply to hiragana, which is not typically used for borrowed words. It can happen, such as when the word needs special emphasis, but it's uncommon. So here's the hiragana chart.

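Standard chart

あ a | い i | う u | え e | お o
か ka | き ki | く ku | け ke | こ ko
さ sa | し shi | す su | せ se | そ so
た ta | ち chi | つ tsu | て te | と to
な na | に ni | ぬ nu | ね ne | の no
は ha | ひ hi | ふ fu | へ he | ほ ho
ま ma | み mi | む mu | め me | も mo
や ya | ゆ yu | よ yo
ら ra | り ri | る ru | れ re | ろ ro
わ wa | ゐ wi | ゑ we | を wo
ん n or n'

Other morae

が ga | ぎ gi | ぐ gu | げ ge | ご go
ざ za | じ ji | ず zu | ぜ ze | ぞ zo
だ da | ぢ dji | づ dzu | で de | ど do
ば ba | び bi | ぶ bu | べ be | ぼ bo
ぱ pa | ぴ pi | ぷ pu | ぺ pe | ぽ po
っ (gemination mark)

2-character morae

きゃ kya | きゅ kyu | きょ kyo
ぎゃ gya | ぎゅ gyu | ぎょ gyo
しゃ sha | しゅ shu | しょ sho
じゃ ja | じゅ ju | じょ jo
ちゃ cha | ちゅ chu | ちょ cho
ぢゃ dja | ぢゅ dju | ぢょ djo
にゃ nya | にゅ nyu | にょ nyo
ひゃ hya | ひゅ hyu | ひょ hyo
びゃ bya | びゅ byu | びょ byo
ぴゃ pya | ぴゅ pyu | ぴょ pyo
みゃ mya | みゅ myu | みょ myo
りゃ rya | りゅ ryu | りょ ryo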

Voiced, Unvoiced, and Semi-Voiced

Those funny little marks:

By now you've probably noticed that many of the basic kana have other kana that look the same except for a few little marks in the corner. There's a reason for that. The consonants k, s, t, and h are what linguists call "unvoiced" or "voiceless" consonants, which means that they are pronounced without the use of the vocal cords. Adding the mark ゛, called the 濁点 (dakuten, "voiced mark") or informally the てんてん (ten ten, "dot dot"), to kana with these consonants produces the equivalent "voiced" consonants g, z, d, and b. As you may have guessed, voiced consonants are those that require use of the vocal cords to pronounce. Additionally, kana with the h consonant may also take the mark ゜, called the 半濁点 (handakuten, "half-voiced mark") or informally the まる (maru, "circle"), to produce p, a "semivoiced" consonant.

There are also several uses of the dakuten that don't quite fit the normal usage. The katakana ウ (u) may appear with a dakuten as ヴ to represent a 'vu' sound, though the b consonant is used for 'v' just as often. In addition, kana that cannot normally have a dakuten may be written with one when indicating abnormal or distorted noises similar to the base kana. For instance, あ゛ seems to be fairly popular for rendering strangled shouts, though I'm not sure how you'd romanize it.

It seems that linguists also use the handakuten on k kana to represent an 'ng' sound, but I've never seen it personally. Anyway, 'ngu' would look like く゜, for example.
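For what it's worth, Unicode models these marks much the same way, which makes the pattern easy to poke at in Python. The sketch below uses the combining forms of the two marks (U+3099 and U+309A, as opposed to the standalone ゛ and ゜ shown above); the helper name `add_mark` is made up for the example.

```python
import unicodedata

DAKUTEN = "\u3099"     # combining form of ゛
HANDAKUTEN = "\u309a"  # combining form of ゜


def add_mark(kana, mark):
    """Attach a (han)dakuten to a kana, composing into one character if possible."""
    # NFC merges the pair into a single precomposed character when one exists.
    return unicodedata.normalize("NFC", kana + mark)


print(add_mark("か", DAKUTEN))     # が
print(add_mark("は", HANDAKUTEN))  # ぱ
print(add_mark("ウ", DAKUTEN))     # ヴ

# The unusual combinations have no single precomposed character,
# so they stay as a base kana followed by a combining mark.
print(len(add_mark("あ", DAKUTEN)))     # 2
print(len(add_mark("く", HANDAKUTEN)))  # 2
```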


Sorting

The basics:

The usual ordering is called 五十音順 (gojuu on jun, "50-sound order") after the kana table (which originally contained 50 sounds rather than the modern 45), or あいうえお順 (a i u e o jun, "a i u e o order") after the first row of kana, much as English alphabetical order is also called ABC order.

Plain hiragana follow the order of the standard kana chart: あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑを. This much is fully standardized. ん doesn't exactly fit into the standard chart, but typically comes after を.

The kana は (ha) and へ (he) are considered the same for sorting purposes regardless of whether they're used as particles and pronounced as wa and e (respectively) or used as parts of words and pronounced ha and he.

Except for tiebreaking purposes, all variants of a kana are treated as the same character. Specifically, a hiragana character and the equivalent katakana character are considered the same, unvoiced (は) and voiced (ば) and semivoiced (ぱ) kana are considered the same, and normal-sized (つ) and reduced-sized (っ) kana are considered the same. This is somewhat similar to upper-case and lower-case English letters being considered the same except for tiebreaking purposes, if more complicated.

The ヴ character invented to handle 'v' sounds in foreign words is typically handled as a "voiced" ウ, if only because that's what it looks like. Some instead treat ヴァ as a variant of バ (ba), etc., but while this has the advantage of placing very similar sounds together, it breaks with the usual method of handling each individual kana separately.

As in English, [end of term] comes before any character. In other words, shorter terms come before longer ones that start out the same, and 'same' in this case means the same base kana, ignoring any variants. To give concrete examples, くろ (kuro) comes before ぐろう (gurou) or クロウ (KUROU), each of which comes before クロウチ (KUROUCHI). This is much like in English sorting, where "an" comes before "ant", which comes before "antihero".

Kanji have no effect on ordering, in the sense that the kanji themselves do not matter, except when the kanji themselves are being sorted, rather than terms. Kanji terms are sorted by their reading, the way they would appear if written in kana.

Tiebreakers and other tricky stuff:

As noted previously, hiragana and katakana, unvoiced, voiced, and semivoiced kana, and full-sized and small kana are all considered equivalent when not directly competing, and the ー complicates things further. So what happens if two items are identical except for one of these equivalent characters? This is where the tiebreaking comes into play. Unfortunately, the system for doing so appears to be somewhat less than universal.

As if all that weren't a big enough mess already, there's the question of what to do if the rules you're using conflict. For example, if unvoiced comes before voiced and hiragana comes before katakana, which comes first, が (ga, hiragana, but voiced) or カ (KA, unvoiced, but katakana)? Again, there don't seem to be any standardized rules here. Fortunately, this sort of conflict is relatively uncommon, especially in indices and informal lists that aren't likely to spell out their rules. Dictionaries will typically describe what conventions they use.

While I'm no dictionary, I do think it makes sense to define an ordering system, even if I never need to use the full details of it. The examples given in the following steps are invented for convenience and unlikely to correspond to actual words.

  1. Sort first by the base kana, putting shorter terms before longer terms that begin with the same base kana. Regard each kana as an individual unit, regardless of whether or not it's part of a compound sound (きゃ (kya), ヴィ (VI), etc.). For now, regard all variants as the same kana, ignoring voicing, size, and character set. For now, also regard the long vowel marker ー as identical to the preceding vowel sound, including e and o, even though those could be romanized as ei and ou.
    • かあき ⇒ カーキク ⇒ かーきくけ ⇒ カアキクケコ
    • ちゃつ ⇒ ちやつて ⇒ ちゃってと ⇒ ちやってとた
    • はひ ⇒ ばひふ ⇒ はぴぶへ ⇒ ぱひふへほ
  2. If any two (or more) terms are regarded as identical so far but are not written identically, then within these terms, sort unvoiced before voiced and voiced before semi-voiced. If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones. Regard ヴ (VU) as a voiced ウ (U).
    • さしす ⇒ さしず ⇒ さじす ⇒ ざしす ⇒ ざしず
    • かきく ⇒ カキグ ⇒ がきく ⇒ ガキグ ⇒ ガギグ
    • ちゃふ ⇒ ちやぶ ⇒ ちゃぷ ⇒ ぢゃぶ ⇒ ぢやぷ
  3. If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, sort normal-sized kana before small ones. If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones.
    • キヤフオテイ ⇒ キヤフオティ ⇒ キヤフォテイ ⇒ キャフオティ ⇒ キャフォティ
    • きやつえ ⇒ キヤツェ ⇒ きゃつえ ⇒ キャツェ
  4. If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, sort hiragana before katakana and both before kanji (the long vowel marker counts as whatever the preceding vowel is). If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones.
    • あいうえお ⇒ あいうエお ⇒ あいウえオ ⇒ あイうえお ⇒ アイウえお ⇒ アイウエオ
    • えーのー ⇒ ええのオ ⇒ えーノー ⇒ えエのー ⇒ エエノオ
  5. If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, sort actual kana before the long vowel marker. If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones.
    • パアトナア ⇒ パアトナー ⇒ パートナア ⇒ パートナー
  6. If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, I give up and sort them at random. This could occur when they have identical kana, but different kanji. While there are several kanji-sorting schemes, I'm not familiar enough with any to attempt to use them. Of more immediate concern to me is that several items in my topic index link to more than one topic due to multiple usages, but these all have brief supplemental notes in English that I use as tiebreakers.
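Steps 1 through 5 above can be sketched as a Python sort key. This is one possible reading of the rules, not a standard collation: the name `kana_sort_key` and the lookup tables are invented for the example, the input is assumed to be kana only (no kanji, so the kanji part of step 4 and all of step 6 are left out), and it leans on the fact that the plain hiragana happen to sit in chart order in Unicode.

```python
import unicodedata

# Small hiragana and their full-sized equivalents (katakana is folded first).
SMALL_TO_FULL = dict(zip("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ"))

# The vowel each base kana ends in, used to expand the long vowel mark.
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわ",
    "い": "いきしちにひみりゐ",
    "う": "うくすつぬふむゆる",
    "え": "えけせてねへめれゑ",
    "お": "おこそとのほもよろを",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def kana_sort_key(term):
    """Build a key: base kana first, then one tiebreaker tuple per step."""
    base, voicing, size, script, mark = [], [], [], [], []
    for ch in term:
        if ch == "ー" and base:
            base.append(VOWEL_OF.get(base[-1], ch))  # step 1: acts as the vowel
            voicing.append(0)
            size.append(0)
            script.append(script[-1])  # step 4: same script as the kana before
            mark.append(1)             # step 5: real kana sort before the mark
            continue
        is_katakana = "ァ" <= ch <= "ヶ"
        if is_katakana:
            ch = chr(ord(ch) - 0x60)   # fold katakana onto hiragana
        # NFD splits が into か plus a combining dakuten, ぱ into は plus handakuten.
        decomposed = unicodedata.normalize("NFD", ch)
        plain = decomposed[0]
        # Step 2: unvoiced (0) before voiced (1) before semi-voiced (2).
        voicing.append(1 if "\u3099" in decomposed else 2 if "\u309a" in decomposed else 0)
        size.append(1 if plain in SMALL_TO_FULL else 0)     # step 3
        base.append(SMALL_TO_FULL.get(plain, plain))        # step 1
        script.append(1 if is_katakana else 0)              # step 4
        mark.append(0)
    return ("".join(base), tuple(voicing), tuple(size), tuple(script), tuple(mark))


print(sorted(["クロウチ", "ぐろう", "クロウ", "くろ"], key=kana_sort_key))
# ['くろ', 'クロウ', 'ぐろう', 'クロウチ']
```

Because Python compares tuples element by element, each later tuple only matters when everything before it is equal, which is exactly the "only if still regarded as identical" wording of the steps.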

いろは order:

An alternate order exists but is rarely used for sorting. Actually a poem known as the いろは (Iroha) after its first three kana, it is remarkable primarily for using each of the 47 kana in use at the time exactly once. The poem is traditionally divided into lines as follows, though this results in breaking up several words:

いろはにほへと
ちりぬるをわか
よたれそつねな
らむうゐのおく
やまけふこえて
あさきゆめみし
ゑひもせす

Though this order is uncommon for sorting, the kana sometimes appear in this order as labels for an ordered list, for example.
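The "each kana exactly once" claim is easy to check mechanically. Here is a small Python sketch that compares the poem, as divided above, against the 47 plain hiragana of the standard order; the names `IROHA_LINES`, `GOJUON`, and `uses_each_once` are just labels chosen for this example.

```python
IROHA_LINES = [
    "いろはにほへと",
    "ちりぬるをわか",
    "よたれそつねな",
    "らむうゐのおく",
    "やまけふこえて",
    "あさきゆめみし",
    "ゑひもせす",
]

# The 47 plain hiragana in standard chart order (no ん).
GOJUON = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑを"


def uses_each_once(text, alphabet):
    """True if text contains every character of alphabet exactly once."""
    # Sorting both sides removes the effect of order, leaving only the counts.
    return sorted(text) == sorted(alphabet)


poem = "".join(IROHA_LINES)
print(len(poem))                     # 47
print(uses_each_once(poem, GOJUON))  # True
```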

For the curious, there is an online classical Japanese database with translations of the いろは.


Romanization Conventions

There are at least three different major romanization schemes in use, and that's not counting all the variants from people (like me) who don't care much what's official. Here's a quick guide to certain variants that I'm aware of and which ones I normally use.

Kana | Variants | My preference
しゃ/シャ | sya, sha, shya | sha
し/シ | si, shi | shi
しゅ/シュ | syu, shu, shyu | shu
しょ/ショ | syo, sho, shyo | sho
じゃ/ジャ | zya, jya, ja | ja
じ/ジ | zi, ji | ji
じゅ/ジュ | zyu, jyu, ju | ju
じょ/ジョ | zyo, jyo, jo | jo
ちゃ/チャ | tya, cha, chya | cha
ち/チ | ti, chi | chi
ちゅ/チュ | tyu, chu, chyu | chu
ちょ/チョ | tyo, cho, chyo | cho
ぢゃ/ヂャ | dya, dja, djya, ja, jya | dja
ぢ/ヂ | di, dji, ji | dji
ぢゅ/ヂュ | dyu, dju, djyu, ju, jyu | dju
ぢょ/ヂョ | dyo, djo, djyo, jo, jyo | djo
つ/ツ | tu, tsu | tsu
づ/ヅ | du, dzu, zu | dzu
ふ/フ | hu, fu | fu
を/ヲ | wo, o | wo
ん/ン | n' always; n always; n' when ambiguous but n otherwise; nn (thanks to typing conversions) | n' when ambiguous but n otherwise
ら/ラ | ra, la | ra
り/リ | ri, li | ri
る/ル | ru, lu | ru
れ/レ | re, le | re
ろ/ロ | ro, lo | ro
A + ー | AA, A-, Â, Ā | AA
a + あ | aa, â, ā | aa
I + ー | II, I-, Î, Ī | II
U + ー | UU, U-, Û, Ū | UU
u + う | uu, û, ū | uu
E + ー | EE, EI, E-, Ê, Ē | EE
O + ー | OO, OU, OH, O-, Ô, Ō | OU
o + お | oo, oh, ô, ō | oo
o + う | oo, ou, oh, ô, ō | ou
っち/ッチ | cchi, tchi | tchi

Occasionally I'll come across something outlandish that's not listed here... and that's when winging it comes into play.

None of this matters when a term has an official romanization. 東京 is "Tokyo" even though it should be Toukyou, ローマ字 is "romaji" instead of ROUMA ji, etc.

All others use the renderings given on the kana charts above. The only exceptions are that I typically romanize the particles は and へ as wa and e, respectively, since that's how they're pronounced, regardless of the kana. Some insist on using ha and he due to the kana, and while that arguably has some merit, it confuses the pronunciation rather than indicating it.

As I see it, my combination of choices has the advantage of approximating the English sounds while assigning a different romanization to every common mora, with the exception of を/ヲ and ウォ, which doesn't matter much because ウォ is only used for borrowed words, while を/ヲ is virtually never used for borrowed words.

What I mean by n being ambiguous at times is with such kana as に, んい, and んに. They all clearly need an i and an n or two, but all three are different and even have different pronunciations. If you make ん always n, then they're ni, ni, and nni, which ignores the difference between に and んい. On the other hand, if it's always n', you get ni, n'i, and n'ni, which, for んに, is redundant and funny-looking, not to mention that it leaves a lot of words with an apostrophe on the end. I prefer ni, n'i, and nni for these reasons. Similarly, I prefer to romanize にゃ, んや, and んにゃ as nya, n'ya, and nnya. This is probably my biggest gripe with the Microsoft Japanese IME—if I type "s o n n a", I expect to see そんな, not the そんあ that it actually gives me. The stupid thing converts "n n" to ん instantly and automatically without any regard to context, when I expect it to have the sense to interpret "n n a" as ん (n) + な (na). If I wanted んあ (n'a), I'd type "n ' a".
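The "n' when ambiguous but n otherwise" rule boils down to: write ん as n' only when the next mora starts with a vowel or y. A minimal Python sketch of that rule follows. The `ROMAJI` table is deliberately tiny (just enough kana for the examples above) and the function name `romanize` is hypothetical; a real converter would need the full charts.

```python
# Partial table: only the kana needed for the examples in this section.
ROMAJI = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "そ": "so", "な": "na", "に": "ni", "や": "ya",
    "にゃ": "nya",
}

AMBIGUOUS_STARTS = set("aiueoy")


def romanize(kana):
    """Romanize hiragana, writing ん as n' only before a vowel or y."""
    pieces = []
    i = 0
    while i < len(kana):
        # Prefer a 2-character mora such as にゃ over its first kana alone.
        step = 2 if kana[i:i + 2] in ROMAJI and len(kana) - i >= 2 else 1
        chunk = kana[i:i + step]
        i += step
        # Leave ん as a placeholder until the following mora is known.
        pieces.append(chunk if chunk == "ん" else ROMAJI[chunk])
    result = []
    for idx, piece in enumerate(pieces):
        if piece == "ん":
            following = pieces[idx + 1] if idx + 1 < len(pieces) else ""
            # Before a vowel or y, plain n would be misread as part of the next mora.
            piece = "n'" if following[:1] in AMBIGUOUS_STARTS else "n"
        result.append(piece)
    return "".join(result)


for word in ["に", "んい", "んに", "にゃ", "んや", "んにゃ", "そんな", "そんあ"]:
    print(word, romanize(word))
```

This prints ni, n'i, nni, nya, n'ya, nnya, sonna, and son'a, so every one of the kana spellings gets a distinct romanization.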

It might make more sense to write the r row with ls, considering that I've always thought the consonant sounds more like an l anyway. The r writing is so prevalent, though, that it's essentially uncontestable. Kind of like how モーグリ is a lot closer to "moagly", but "moogle" is too widely known to bother arguing about.

My preference of OU for O + ー is purely because I hate seeing OO for words that use it. This partly stems from seeing some people romanize o + う as oo, which goes entirely against the kana. ありがとう (arigatou) will never be arigatoo to me.

I also can't agree with writing を (wo) as just o. It's not necessarily (depending partially on dialect) the same sound as お (o), even if it is very close.
