Arabic Transliteration
Arbc_Enthusiast
28-02-08, 05:37 PM
Hi All,
I have been looking at various schemes to transliterate Arabic words into English. I want to write a computer program to automatically do this. I have been mostly following Wiki to get information (http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style_%28Arabic%29).
I have discovered that vowels, or the lack of them, are a particular problem.
I read that 'و' can be interpreted as a consonant and transliterated as 'w', or it may be interpreted as a long vowel and transliterated as 'u'. How can you tell which is the correct interpretation? Likewise, I have a similar problem understanding when 'ي' is a vowel or a consonant.
One last question, suppose I have a name such as تقي - Taqi - (ت ق ي) . On the above wiki page this name is transliterated as Taqi, while on Google it is transliterated as Taki.
I read that:
ت is T
ق is Q
ي is Y (consonant) or I (vowel)
So assuming I know that 'ي' is a vowel, I get Tqi. My question is: where does the 'a' come from in the T'a'qi transliteration? It appears to be added from nowhere and I'm confused.
Thanks for any pointers you might have,
Best Regards,
Frank.
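The "Tqi" result above can be reproduced with a minimal sketch of a letter-by-letter mapping. The table below is a hypothetical three-letter excerpt, not a full scheme; it only illustrates that the written letters carry no short vowels, so the 'a' in Taqi cannot come from this step.

```python
# Hypothetical excerpt of a consonant table; a real scheme covers the whole alphabet.
LETTERS = {
    "ت": "t",
    "ق": "q",
    "ي": "y",  # could also be the long vowel 'i'; this step cannot tell
}

def naive_transliterate(word: str) -> str:
    """Map each written letter to one Latin letter; unknown characters pass through."""
    return "".join(LETTERS.get(ch, ch) for ch in word)

print(naive_transliterate("تقي"))  # tqy
```

The short vowel after the first letter is simply not present in the input, which is why it has to be supplied from somewhere else.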
.: Anna :.
28-02-08, 05:53 PM
Hi Frank
The issue here is that you are making the transliterations from unvocalised text. In Arabic we use harakat to give the vowels, as you mentioned: damma, fatha, kasra or sukoon, or any of the tanween vowels. These are written on top of the letters (or under them for kasra) and are not incorporated into the main body/shape of the word. So with this name Taqi, there will be a fatha on the ta, but it may not be written down.
These harakat are also your key to understanding whether waw and ya are vowels or consonants. A waw preceded by a damma will be a vowel, and a ya preceded by a kasra will be a vowel, making oo and ee respectively, but in other cases they can be consonants, e.g. when preceded by a sukoon and carrying a vowel themselves.
So your problem is that a computer programme dealing with unvowelled text cannot automatically know what the harakat should be; it might need a human brain to work that out. Unless you want to load it with some root patterns to give it a chance. You could do this for the verb forms and what is derived from them, and that should work easily: if it sees mf33l, it can register it as form 2 and fill in the gaps for the vowels. But you still have a problem, because it can be in the shape mufa33il or mufa33al depending on whether it is the active or passive participle, and other problems like that would show up elsewhere too.
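For text that is already vocalised, the waw/ya rule described above can be sketched roughly as follows. This is a simplified illustration, not a complete orthographic analysis: the function names are made up for the example, and the extra condition that the waw or ya carries no vowel of its own is an assumption added here.

```python
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
TANWEEN = {"\u064B", "\u064C", "\u064D"}  # fathatan, dammatan, kasratan
SHADDA, SUKUN = "\u0651", "\u0652"
MARKS = {FATHA, DAMMA, KASRA, SHADDA, SUKUN} | TANWEEN
WAW, YA = "\u0648", "\u064A"

def split_letters(word):
    """Group each base letter with the harakat stored after it: [(letter, marks), ...]."""
    letters = []
    for ch in word:
        if ch in MARKS and letters:
            letters[-1][1].add(ch)
        else:
            letters.append((ch, set()))
    return letters

def classify_waw_ya(word):
    """Label every waw/ya as 'vowel' or 'consonant' from the preceding letter's haraka."""
    letters = split_letters(word)
    labels = []
    for i, (letter, marks) in enumerate(letters):
        if letter not in (WAW, YA):
            continue
        prev_marks = letters[i - 1][1] if i > 0 else set()
        own_vowel = marks & ({FATHA, DAMMA, KASRA} | TANWEEN)
        needed = DAMMA if letter == WAW else KASRA
        # Assumption: a long vowel needs the matching haraka before it and no vowel on itself.
        is_long = needed in prev_marks and not own_vowel
        labels.append((letter, "vowel" if is_long else "consonant"))
    return labels

taqi = "\u062A\u064E\u0642\u0650\u064A"   # ta+fatha, qaf+kasra, ya
walad = "\u0648\u064E\u0644\u064E\u062F"  # waw+fatha, lam+fatha, dal
print(classify_waw_ya(taqi))   # the ya is labelled 'vowel'
print(classify_waw_ya(walad))  # the waw is labelled 'consonant'
```

On unvocalised input there are no marks to read, so every waw and ya falls through to 'consonant'; that is exactly the gap the rest of the thread is about.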
Arbc_Enthusiast
28-02-08, 07:37 PM
Hi Anna, Thanks for your response.
I had read a little about the harakat but I couldn't find example uses in real text. Now I understand that most texts are unvocalised and that I need to infer the vowels.
On initial thought, maybe I could get a list of Arabic words together with their English transliterations, perhaps by analysing the entire Arabic Wikipedia.
Then, from these, I could learn where vowels have been added and build a set of rules that I could use in the future.
I understand that each Arabic word has a root but I don't know how to identify that root. However, I'm sure I will find material to help me with this.
I'm not clear on what you mean by root patterns. Say the root is 'xyz': are you suggesting that I could learn that this pattern is usually followed by a particular vowel, or learn some information about vowels for this pattern? Also, are mf33l and 'form 2' just examples or do they actually mean something specific (sorry if that is a silly question)?
Cheers,
Frank.
.: Anna :.
29-02-08, 06:27 PM
I don't think you can do it from Wikipedia. It's better if you learn the basics of the grammar, and then you will build up your understanding so that you can correctly vocalise a text by yourself. That would be the first step if you want your computer program to do it: you must at least be able to do it manually first.
The root is triliteral, so when you see a word, the majority have got 3 root consonants. Some examples:
Kitaab... root is k-t-b
Jazeerah... root is j-z-r
actually i think i have a lesson link for this
root system (http://ummah.com/forum/showthread.php?t=63192)
But something to note: a common prefix is the letter m (meem), so when you see it at the beginning and there are 3 consonants after it, the meem wouldn't be part of the root.
And when you see long vowels, they're normally not in the root although they are written in the body of the text. Ta marbuta won't be part of a root either, and there are other things like this you will learn.
mf33l doesn't mean something in itself; I was just using f-3-l to stand in for the root, which is the convention for demonstrations. Here I have a link about forms 1 and 2 as well:
form 1 and 2 (http://ummah.com/forum/showthread.php?t=64091)
it explains the patterns
If you go into that forum you can maybe browse some other relevant things, like initial waw and hollow verbs, because those affect root identification.
forum is here (http://ummah.com/forum/forumdisplay.php?f=130)
In addition to the verb-based patterns, of which there are 10 main ones, there are some other patterns that show up in nouns or adjectives, but these are less fixed and are more like trends.
Just one example is a pattern of intensification: you take the basic root f-3-l, then apply a shadda on the 3 and an alif before the lam (fa33aal), and the meaning becomes more intense. There are many like this.
Hi all,
This thread caught my attention because I worked on an Arabic transliteration software that I believe members on this forum might find useful.
The software is called eiktub. It is a simple Notepad style text editor that replaces Latin script by its Arabic phonetic equivalent in a smart manner such that the Arabic script is deducible from the English transliteration unambiguously and without necessarily understanding the meaning of the Arabic text. The software is free and can be downloaded from http://www.eiktub.com
This software is simple to use yet sophisticated enough that you can write Qur'anic verses using the Uthmani script, as shown in the example here http://www.eiktub.com/screen_shot.html
There is also a Lite version of the software (does not support vowelization i.e. تشكيل) that can be tested at http://www.eiktub.com/online.html with no need for any download or installation.
In addition the same concept can be used to search the Arabic web using the Google search engine as demonstrated at http://search.eiktub.com
I hope you find these three tools useful. I am really interested in your feedback regarding ease of use, the concept, or any problems that you encounter.
.: Anna :.
04-03-08, 07:51 AM
Not bad :up: It can be useful for those who don't have the Arabic keyboard installed, to use as an online keyboard. It's definitely good that you have those tips, otherwise it's sometimes difficult to know which letter you assigned to each key, e.g. when you write "al" it normally only puts the lam, until you type "Al-". But I think if people use it, they will get used to the system's requirements and make sure they enter things the correct way. The problem is people who cannot see spelling errors for themselves and fix them, e.g. they don't know Arabic letters and think they can just type the sound and get the correct word out. I think they will produce a lot of mistakes if they just type and copy whatever comes out immediately.
OBlachko
04-03-08, 10:14 AM
hi all!
i hope to learn it but i think its too difficult;)
Thanks Anna. I am always interested in feedback.
Having a very accurate transliteration scheme is not easy since many Arabic sounds have no Latin equivalent. Our approach was to use capital letters for heavy sounds: d for daal vs. D for Daad. There are still many other challenges, like the handling of the hamza.
But you are right. A person who does not know Arabic cannot simply type based on phonetics and expect to get the right output. I have an American friend who used the online pad to write me an email. The message had many mistakes but was comprehensible on a basic level.
I personally like to use the search site http://search.eiktub.com to find Arabic resources on the web. It's very handy.
.: Anna :.
04-03-08, 02:25 PM
Yeah, that is the problem. Capital letters are a good solution. You can also use the numbers system (like 3 for ayn, 2 for hamza, etc.), but there are still some letters not covered; daad, I don't think, has a number.
Another one: what are you using for dhal and kha? People commonly write them as dh and kh, and because the computer is a machine and converts automatically, it tends to put dal + ha or kaf + ha when it sees those.
But overall I think what you produced is good, masha allah :up:
Arbc_Enthusiast
05-03-08, 03:19 PM
Thanks both for responses.
I have taken your advice, Anna, and I am trying to learn the underlying rules of the grammar. As I understand it, most Arabic words are formed from 3 base letters, the root, and several patterns can be applied to this root to produce new related words.
I have had a look at the link you posted for The Root System (http://ummah.com/forum/showthread.php?t=63192). Can you please clarify something for me:
"If we wanted to symbolise the pattern of these following words:
شارِب .1
صاحِبٌ .2
ٌصالِح .3
then we would write : فاعِلٌ "
When I paste these words into different editors I see different things, so I have looked at each character individually using a C application:
Example 1 has a shín with a dammatan, an alif, a rá with a kasra, and a bá.
Example 2 has a sád, an alif, a há with a kasra, and a bá with a dammatan.
Example 3 has a sád with a dammatan, an alif, a lám with a kasra, and a há.
And the root has: a fá, an alif, an ain with a kasra and a lam with a dammatan.
There are a few things I don't understand:
1. If the last character in the root, the lam (ل), has a dammatan, then why doesn't the last character in each example have a dammatan? Only the 2nd example does.
2. How can the first character in example 1 and example 3 have a dammatan when the first character in the root pattern, fa (ف), does not?
Have I misunderstood something?
On a wider point, I should have said that my ultimate goal is to transliterate Arabic person names. If I have a name such as Muhammad (محمد) then I think a sample root pattern would look something like this:
ف ع ل د
To vocalise this name, I would need to try several different 'noun' patterns; for example, adding fathas, damma, and kasras. But how does one know which one is the correct or the most likely pattern when there will be only one or two words, as with a person name? I'm beginning to think that knowledge of the grammar rules will not be enough to transliterate a name alone, and that I will require some statistical knowledge about vowel usage to complement the grammar rules.
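The "statistical knowledge" idea can be sketched as a frequency lexicon built from pairs of unvocalised names and their Latin spellings. Everything below is an assumption for illustration: the pairs are made-up placeholders rather than real counts, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Placeholder training pairs (unvocalised Arabic, Latin spelling).
# A real list would have to come from a large, trusted source of name pairs.
PAIRS = [
    ("محمد", "Muhammad"),
    ("محمد", "Muhammad"),
    ("محمد", "Mohamed"),
    ("تقي", "Taqi"),
]

def build_lexicon(pairs):
    """Count how often each Latin spelling was seen for each Arabic spelling."""
    lexicon = defaultdict(Counter)
    for arabic, latin in pairs:
        lexicon[arabic][latin] += 1
    return lexicon

def most_likely(lexicon, arabic):
    """Return the most frequent Latin spelling, or None if the name was never seen."""
    if arabic not in lexicon:
        return None
    return lexicon[arabic].most_common(1)[0][0]

lexicon = build_lexicon(PAIRS)
print(most_likely(lexicon, "محمد"))  # Muhammad
print(most_likely(lexicon, "سليم"))  # None (not in the placeholder list)
```

A whole-word lookup like this only covers names it has already seen; unseen names would still need the pattern-based reasoning discussed in the thread.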
.: Anna :.
05-03-08, 06:29 PM
Oh, all of them can take the dammatayn actually. It might not have been written, sorry, but they all take it. Apologies for the confusion; their pattern is identical.
The root for Muhammad is h-m-d and it is in form 2, so the pattern is mufa33al. This is the passive participle of the form 2 verb hammada (form fa33ala).
The meem is not part of the root; it is a common prefix.
the_middle_road
05-03-08, 07:22 PM
Also, in the examples that you gave, the first letter would not take a damma but a fatha. It must be a mistake on the part of whoever or whatever gave it a damma.
.: Anna :.
06-03-08, 07:53 AM
Actually, they are not written with a damma on the first letter? (in what he pasted)
Sorry Arabic Enthusiast, I missed that you said that. There is no damma in the body of the word; it's at the end only. I don't know why you have got one in the middle.
You can never have a damma or a kasra right before an alif like that, only a fatha, because together they make the long vowel aa.
They all just follow the pattern faa3ilun, exactly that pattern.
But really :confused: why did you put a damma on the 1st letter, because it is not written there?
Arbc_Enthusiast
06-03-08, 10:15 AM
Thank you both for taking the time to help me.
I think my main problem is with copying and pasting.
>but really why u put damma on the 1st letter bc it is not written there?
When I copy example 1 (شارِب) into google translator and insert a new line between each character I get:
ٌش
ا
ر ِ
ب
You can see that the first character has a dammatayn added. I checked the Unicode value and it is 064c, i.e. dammatayn. But I figure now that this is wrong and that it should be on the last character.
> they all just follow the pattern: faa3ilun.. exactly that pattern
When I copy an example of that pattern into here, I get
ف ا عِ ل ٌ
You can see that the dammatayn is added to the first character rather than the last. Again, I assume this is some sort of copy and paste error. Based on this assumption, I now think I understand what the pattern means:
faa - a fa with an alif
3i - a 3ayn with a kasra
lun - a lam with a dammatayn
>the root for muhammad is h-m-d and it is in form 2, so the pattern is
mufa33al, this is the passive participle of form 2 verb hammada (form fa33ala).
Using form fa33ala, I would get 'hAmMAdA'
fa = fa fatha => add the vowel 'a'
33a = 3ayn shadda fatha => double the consonant and add vowel 'a'
la = lam fatha => add vowel 'a'
and for pattern mufa33al, I would get muhammad
mu = mim damma
How can you tell which is the correct form to use, i.e. form 2 and not one of the other nine? And did you use the passive participle because this was a person name?
Thanks again for your help.
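One way to take the editor and the clipboard out of the picture is to list the code points in the order they are stored. The sketch below builds the word from escapes so the order is unambiguous, and assumes the dammatan is stored after the last letter, as the replies say it should be. It uses only Python's standard unicodedata module; the helper names are made up for the example.

```python
import unicodedata

def describe(word):
    """List (code point, Unicode name) for each stored character, in logical order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in word]

def clusters(word):
    """Attach each combining mark to the base letter stored before it."""
    out = []
    for ch in word:
        if unicodedata.combining(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out

# sheen, alef, reh + kasra, beh + dammatan
word = "\u0634\u0627\u0631\u0650\u0628\u064C"

for code, name in describe(word):
    print(code, name)

print(len(clusters(word)))  # 4 letters, two of them carrying a mark
```

If the listing for a pasted string shows U+064C before the first letter, the string itself was stored in the wrong order; if it shows it after the last letter, only the display was misleading.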
.: Anna :.
06-03-08, 10:35 AM
Yeah, it's a copy-and-paste error, so it's better if you don't copy and paste but just look and maybe do it manually. Tanween (2 dammas, or 2 of any vowel) can only come at the end of a word, not on beginning or middle letters.
the verb patterns they go like this:
fa3ala
fa33ala
faa3ala
af3ala
tafa33ala
tafaa3ala
infa3ala
ifta3ala
if3alla
istaf3ala
So you look at which one applies to the shape you have, or to any of the derivatives (you see, they keep features similar to the verb pattern, so you can tell; like Muhammad, it has the middle shadda).
The trend is like this:
mu- at the start is often for active and passive participles. The active takes a kasra and the passive a fatha, although for form 1 it's slightly different; it applies more to forms 2-10. E.g. in form 4 as well: muslim is an active participle, as you see from the kasra.
It's passive not just because it's a name, but because of the meaning, and you can see it because it has the fatha. If it was muhammid, it would mean someone who praises, and muhammad is a praised one. Some names are in the active, like Muhsin, one who does good, which is a form 4 active with root h-s-n. Saalih is a form 1 active (I said the pattern is different there; see, no meem), a name meaning upright/doing good, with root s-l-h.
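The f-3-l convention used in this post maps directly onto a small substitution routine: f, 3 and l stand for the first, second and third root consonants, and every other letter in the pattern is copied as-is. This is a minimal sketch in Latin transliteration only; the function name and the dictionary are illustrative, and the participle patterns are the ones named in the thread.

```python
# The ten verb patterns as listed above, keyed by form number.
VERB_FORMS = {
    1: "fa3ala", 2: "fa33ala", 3: "faa3ala", 4: "af3ala", 5: "tafa33ala",
    6: "tafaa3ala", 7: "infa3ala", 8: "ifta3ala", 9: "if3alla", 10: "istaf3ala",
}

def apply_pattern(root, pattern):
    """Fill a pattern written with f/3/l placeholders using a three-consonant root."""
    r1, r2, r3 = root  # accepts "hmd" or a tuple such as ("sh", "r", "b")
    return pattern.translate(str.maketrans({"f": r1, "3": r2, "l": r3}))

print(apply_pattern("hmd", VERB_FORMS[2]))  # hammada
print(apply_pattern("hmd", "mufa33al"))     # muhammad (form 2 passive participle)
print(apply_pattern("slm", "muf3il"))       # muslim   (form 4 active participle)
print(apply_pattern("Slh", "faa3il"))       # Saalih   (form 1 active participle)
```

Note that this only goes from root plus pattern to word. Choosing the right pattern for an unvocalised word is the hard part and is not addressed here.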
Arbc_Enthusiast
06-03-08, 11:36 AM
>so u look which one applies to the shape which u have, or if any of the derivitives.. (you see they keep same features similar to the verb pattern so can tell.. like muhammad, it has the middle shadda)
I don't understand. To begin with I only have the name (محمد) or the root h-m-d. I do not know that this will eventually transliterate to muhammad, and thus I don't know about the middle shadda. So could not any of the forms be applied?
>mu on start is often for active and passive participles. active takes kasra and passive fatha, altho not for form 1 its slight dif.. from 2-10 it applies more. eg in form 4 aswel.. muslim, is a active participle, u see with the kasra..
So the root for muslim is s-l-m
Initially using form 4 af-3a-la, I would get as-la-ma
Then using the active participle mu-f-3i-lun I would get mu-s-li-mun. Is that correct?
the_middle_road
06-03-08, 12:56 PM
>so u look which one applies to the shape which u have, or if any of the derivitives.. (you see they keep same features similar to the verb pattern so can tell.. like muhammad, it has the middle shadda)
>I don't understand. To begin with I only have the name (محمد) or the root h-m-d. I do not know that this will eventually transliterate to muhammad, and thus I don't know about the middle shadda. So could not any of the forms be applied?
Yes, if you don't have the harakat and the shadda written then it would be hard to tell which form it would be. Written the way you have it, it's possible for it to be in different forms.
>mu on start is often for active and passive participles. active takes kasra and passive fatha, altho not for form 1 its slight dif.. from 2-10 it applies more. eg in form 4 aswel.. muslim, is a active participle, u see with the kasra..
>So the root for muslim is s-l-m
>Initially using form 4 af-3a-la, I would get as-la-ma
>Then using the active participle mu-f-3i-lun I would get mu-s-li-mun. Is that correct?
It's correct.
Arbc_Enthusiast
06-03-08, 01:16 PM
Thanks a lot for clarifying that. I now have a much better understanding of the root system.
In terms of the original problem of transliterating unvocalised text, I don't see how a knowledge of the root system will help. The only other way I can think of transliterating a name such as (محمد) is to first transliterate it to m-h-m-d and then use some statistics to infer the most likely vowels.
The problem you are getting into goes beyond transliteration. Because of the missing vowels you will be getting into the analysis of whole sentences.
The root system can help you identify some of the missing vowels. To identify all of them you have to examine the usage of the word, e.g. if it is a verb, is it past or present? If it is a noun, is it a subject or an object? But it's much more complicated than that, because Arabic grammar is more complex than English grammar.
Example: Drb can be interpreted as
- Darab (verb to beat)
- Durib (passive form of verb to beat)
- Darb (noun beating)
So you have to analyze the whole sentence, which makes it tricky. And even in this case the solution might not be unique.
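The Drb example can be turned into a small candidate generator: one consonant skeleton goes in, several vocalised readings come out, and nothing in the word itself says which one is right. The three patterns and their glosses are taken from the list above; the f/3/l placeholder notation and the function name are assumptions borrowed from earlier in the thread.

```python
# (pattern, gloss) pairs matching the three readings listed above.
# f, 3 and l stand for the first, second and third root consonants.
READINGS = [
    ("fa3al", "verb, to beat"),
    ("fu3il", "passive form of the verb"),
    ("fa3l", "noun, beating"),
]

def candidate_readings(skeleton):
    """Return every (word, gloss) the known patterns allow for a 3-consonant skeleton."""
    r1, r2, r3 = skeleton
    table = str.maketrans({"f": r1, "3": r2, "l": r3})
    return [(pattern.translate(table), gloss) for pattern, gloss in READINGS]

for word, gloss in candidate_readings("Drb"):
    print(word, "-", gloss)
# Darab - verb, to beat
# Durib - passive form of the verb
# Darb - noun, beating
```

Picking among the candidates would need the sentence-level context described above, or some frequency information; this sketch deliberately stops at listing them.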
Arbc_Enthusiast
06-03-08, 05:53 PM
Thanks eiktub.
I have a few ideas that I am going to try out. I'll post how successful/unsuccessful I am.
If you don't mind me asking, are you doing this on your own or for a company?
Arbc_Enthusiast
07-03-08, 06:33 PM
Hey. It's for a company; it's a side project that I am working on alone. I find it a pretty interesting topic so I'm going to try and spend as much time on this as possible.
eiktub has been updated to support the common Arabic instant messaging representations:
3 for عين
7 for حاء
7' for خاء
9 for صاد
9' for ضاد
2 for همزة
I hope this makes typing on eiktub much easier.
.: Anna :.
24-03-08, 12:55 PM
Masha allah, that's good :up:
How about also 5 for kha or 6 for Ta, because people use those too?
Thanks Anna, I wondered about that.
In the wikipedia Arabizi page they show many conflicting schemes. I am not sure which one is more popular... but I will take your word that 5 and 6 should be implemented for Kh and T.
5 is khaae
6 is Taae
6' is Zhaae
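The digit conventions listed in the last few posts can be collected into one table. The sketch below is not eiktub's actual implementation, only an illustration of why two-character codes such as 7' have to be tried before the single digit 7; the function name is made up for the example.

```python
# Digit codes mentioned in the thread, written as escapes to avoid display-order issues.
ARABIZI = {
    "3": "\u0639",   # ain
    "7": "\u062D",   # hah
    "7'": "\u062E",  # khah
    "9": "\u0635",   # sad
    "9'": "\u0636",  # dad
    "2": "\u0621",   # hamza
    "5": "\u062E",   # khah (alternative code)
    "6": "\u0637",   # tah
    "6'": "\u0638",  # zah
}

def replace_digit_codes(text):
    """Replace digit codes with Arabic letters, preferring the longest match."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ARABIZI:            # two-character codes such as 7' come first
            out.append(ARABIZI[pair])
            i += 2
        elif text[i] in ARABIZI:
            out.append(ARABIZI[text[i]])
            i += 1
        else:
            out.append(text[i])        # everything else is left for a later stage
            i += 1
    return "".join(out)

print(replace_digit_codes("7'") == "\u062E")  # True: khah, not hah plus apostrophe
```

The same longest-match idea applies to letter digraphs such as dh and kh, which were raised earlier as a source of wrong output.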
Great news for Firefox users. eiktub has released a Firefox add-on that allows writing Arabic in any web form, as long as you use Firefox as your browser. This will be a handy tool for sending emails, blogging, or writing on forums such as this one.
You can download the add-on from this link:
https://addons.mozilla.org/en-US/firefox/addon/7240
To use it:
1 - Simply right-click a web form and select eiktub - اكتب عربي
2 - A text pad will show up; use it to write Arabic
3 - When done, hit the "Insert" button to transfer the text to the web page you were browsing.
Enjoy!
mastralvarado
11-05-08, 11:50 PM
:salams,
Thank you very much indeed!:up:
I already installed it and it works flawlessly.:coolbro: