Machine translation: Difference between revisions

From Meta, a Wikimedia project coordination wiki
{|
|-
|valign=top|
|width=2|
|valign=top rowspan=2 width=12%|

<small>
<!-- translations of page purpose -->
<center>[[Meta:Interlanguage links|''Translate this page'']]!</center>

Help us develop tools for translating Wikipedia.

|-
|valign=top|
The purpose of the '''Wikipedia Machine Translation Project''' is to develop ideas, methods and tools that can help translate Wikipedia articles from one language to another (particularly out of English and into languages with small numbers of fluent speakers).

==Motivation==
<br>Wikipedias in small languages cannot produce articles as fast as the English Wikipedia because they have too few contributors. Translating articles from the English Wikipedia is one solution, but some languages will not have enough translators. Machine translation can improve the productivity of these communities.

:But manual translation can be added later to produce a more accurate text.

==TradWiki/WikiTran==
<br>TradWiki/WikiTran (working names: WikipediaTranslator, WikiTranslator, BabelWiki) is a planned wiki whose software will be written to help Wikipedians translate articles from English into other languages.
*I rather like '''WikiTran''' myself. --[[user:Stephen Gilbert|Stephen Gilbert]]
::I prefer Wikibabel, by analogy with WIKIpedia, WIKIspecies and so on.
::How about Wikitongues? - [[User:FrancisTyers|FrancisTyers]] 21:21, 16 October 2005 (UTC)

===License===
All code and data should be released under a free licence ([[GPL]] for code, [[GFDL]] for text).

'''Advantages'''
*faster translation of Wikipedia
*generation of large amounts of useful data (corpora).
*creation of a useful tool

===TradWiki/WikiTran - Translation memory approach===
<br>A translation memory is a database of earlier translations (pairs of source and translated segments) that a computer program uses to help a human translator reuse previous work. If this approach is followed, WikipediaTranslator will need the following features:
*display of both the original and the translated versions
*splitting of the original text into several parts (segments) that can be translated individually
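A minimal sketch of the two features listed above, assuming sentence-level segments and an in-memory list in place of a real database. The names `split_segments`, `lookup` and `store`, the 0.75 similarity threshold and the sample pair are hypothetical. Similarity comes from the standard library's `difflib.SequenceMatcher`, which is only one of many possible fuzzy-match measures.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Hypothetical translation memory: (source segment, translated segment) pairs.
# A real system would keep these in a database shared by all translators.
memory: list[tuple[str, str]] = [
    ("The dog eats the apple.", "La hundo mangxas la pomon."),
]


def split_segments(text: str) -> list[str]:
    """Split an article into sentence-sized segments for individual translation.

    Naive rule (an assumption): a segment ends at '.', '!' or '?' followed by whitespace.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lookup(segment: str, threshold: float = 0.75) -> tuple[float, str, str] | None:
    """Return (similarity, stored source, stored translation) for the closest match.

    1.0 means an exact match; lower scores are fuzzy matches that the human
    translator still has to edit. Returns None if nothing reaches the threshold.
    """
    best = None
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


def store(source: str, target: str) -> None:
    """Save a finished human translation so later segments can reuse it."""
    memory.append((source, target))
```

For example, `lookup("The dog eats an apple.")` returns the stored pair with a similarity of about 0.89, so the translator sees it as a fuzzy match to correct rather than a segment to translate from scratch.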

===Lexical, syntactic and semantic analysis of wikipedia content===
<br>The first step in translating Wikipedia is to analyze its content. This analysis will determine:
*Number of words and sentences
*Word distribution (how often each word occurs)
*Frequency of the most popular sentences and expressions
*Semantic relations between words and between sentences
*Syntactic analysis of all sentences
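The first three measurements in the list can be sketched with Python's standard library. This assumes plain text has already been extracted from the wiki markup and uses a naive sentence splitter; the function name `analyse` and the choice of word pairs (bigrams) as "expressions" are hypothetical. Syntactic and semantic analysis would need a real parser and are not attempted here.

```python
import re
from collections import Counter


def analyse(text: str, n: int = 2, top: int = 5) -> dict:
    """Compute simple corpus statistics for one article (a sketch, not a full analyser).

    Covers word and sentence counts, word distribution, and the most frequent
    n-word expressions.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words: list[str] = []
    expressions: Counter = Counter()
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())  # \w is Unicode-aware in Python 3
        words.extend(tokens)
        # Count n-grams within one sentence so expressions never span sentence breaks.
        expressions.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {
        "sentences": len(sentences),
        "words": len(words),
        "word_distribution": Counter(words).most_common(top),
        "frequent_expressions": expressions.most_common(top),
    }
```

Counting expressions per sentence keeps pairs such as the last word of one sentence and the first word of the next out of the statistics.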

:It would be useful if the user could click on any word in an article to go to its Wiktionary definition when there is no Wikipedia article for it, and could tell the software to translate the word into another language (using a right mouse click).

Information about the most popular sentences and expressions can be used to create a translation database of such expressions so translators don't need to repeat a translation.

:Yes, a database of [[idiom]]s
::You mean like a [[w:translation memory]] system?
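A sketch of how such a database of expressions could pre-fill a translation, assuming whitespace tokenisation and a hypothetical English to Spanish table. At each position the longest stored expression wins; every other word is left for the human translator. The function name `pretranslate` and the bracket notation are illustrative only.

```python
from __future__ import annotations


def pretranslate(sentence: str, phrases: dict[str, str]) -> str:
    """Replace known expressions with their stored translations, longest match first.

    Stored translations are shown in [brackets]; remaining words are left for the
    human translator. Punctuation handling is deliberately omitted.
    """
    entries = {tuple(k.lower().split()): v for k, v in phrases.items()}
    longest = max((len(k) for k in entries), default=0)
    tokens = sentence.split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible expression starting at position i, then shorter ones.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + size])
            if key in entries:
                out.append(f"[{entries[key]}]")
                i += size
                break
        else:  # no stored expression starts here; keep the word as-is
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical English -> Spanish expression database.
phrases = {"as soon as possible": "lo antes posible", "by the way": "por cierto"}
```

Preferring the longest match stops a short entry from splitting a longer idiom that contains it.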

==Resources==
*General
**[http://www.faganfinder.com/translate/ Fagan Finder Translation Wizard] - single interface to many free online translators
*Dictionaries
**[http://www.ibiblio.org/dbarberi/dict/ Dutch to English Translation Tools] (source available)
**English dictionary
***[http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html#English-dicts Ispell]
***[[w:Wiktionary|Wiktionary]]
***http://dict.leo.org
**Portuguese dictionary
***[http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html#Portuguese-dicts Ispell]
**English-Portuguese dictionary
**Ergane (free dictionary, several languages)
***http://download.travlang.com/Ergane/frames-en.html
**WWWJDIC (English-Japanese/Japanese-English dictionary)
***http://www.csse.monash.edu.au/~jwb/wwwjdic.html
**Papillon project (free multilingual dictionary built by computational linguists)
***http://www.papillon-dictionary.org/Home.po
***Chinese, English, Estonian, French, German, Japanese, Lao, Malay, Thai
**All Free Dictionaries Project - Dicts.info
***http://www.dicts.info/
***37 interconnected languages - 417,521 entries - 1000+ multilingual dictionaries

*Translation rules
*Code
:Unfortunately none of these projects seem to have been updated since around 2003.
**[[w:Interlingua Translator|Interlingua Translator]] (Translator under LGPL)
***http://intertrans.sourceforge.net/
****Translates any text into a single abstract, language-independent Interlingua representation (parser)
****Translates the Interlingua representation into the target-language text (generator)
****Written in Java
**[[w:GPLTran|GPLTran]] (Translator under GPL)
***http://www.translator.cx
****Supposed to translate paragraphs or entire webpages
****Paragraph translation is spotty and buggy
****Web translation doesn't seem to work at all.
***''Actually'' this isn't true machine translation; it is a literal word-for-word translation
***Download code at http://www.translator.cx/dist/
**[[w:Linguaphile|Linguaphile]] (Translator under GPL)
***http://linguaphile.sourceforge.net/
****open source, platform independent, and programmed in Perl
****simplistic, easy-to-use command-line translator
****56 languages
***Download code at http://www.translator.cx/dist/
**[[w:Traduki|Traduki]]
***C/Lua-based project; uses the metalanguage approach, with Esperanto for lexical content (to some extent)
***Project restarted in 2003, currently being developed
***http://traduki.sourceforge.net (version 0.2 released, and translates "The dog eats the apple" to Esperanto: "La hundo mangxas la pomon")
:I like the idea of using Traduki. Traduki keys can be used to establish relations between words in different languages; e.g. hundo is the key for en:dog, es:perro and so on. So, by going to hundo, you can add a translation into another language without adding language: links to the es:perro article, for example.
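The key idea in the comment above, together with the parser/generator split described for Interlingua Translator, can be illustrated with a toy sketch. It is not Traduki's or Interlingua Translator's actual code or data format: the lexicon contents, the function names and the one-pattern English grammar are all hypothetical. Esperanto words serve as keys (language code `eo`), so a new language is added by attaching a word to an existing key, and the generator reproduces Traduki's sample output for "The dog eats the apple".

```python
from __future__ import annotations

# Hypothetical key-based lexicon: each Esperanto word is the key that links
# the words for one concept in other languages.
lexicon: dict[str, dict[str, str]] = {
    "hundo": {"en": "dog", "es": "perro"},
    "pomo": {"en": "apple", "es": "manzana"},
    "mangxi": {"en": "eat", "es": "comer"},
}


def add_translation(key: str, lang: str, word: str) -> None:
    """Attach a word in a new language to a key; other entries are untouched."""
    lexicon.setdefault(key, {})[lang] = word


def to_key(word: str, lang: str) -> str | None:
    """Find the key (Esperanto word) for a word in the given language."""
    for key, words in lexicon.items():
        if words.get(lang) == word:
            return key
    return None


def translate_word(word: str, source: str, target: str) -> str | None:
    """Translate one word by going through its key."""
    key = to_key(word, source)
    if key is None:
        return None
    return key if target == "eo" else lexicon[key].get(target)


def english_svo_to_esperanto(sentence: str) -> str:
    """Toy parser and generator for sentences of the form 'The X Ys the Z'."""
    the1, subj, verb, the2, obj = sentence.rstrip(".").lower().split()
    # Parse: map each content word to its key (a tiny interlingua triple).
    agent = to_key(subj, "en")
    action = to_key(verb.removesuffix("s"), "en")
    patient = to_key(obj, "en")
    if None in (agent, action, patient) or (the1, the2) != ("the", "the"):
        raise ValueError("sentence outside the toy grammar")
    # Generate: infinitive -i becomes present -as; the direct object takes accusative -n.
    return f"La {agent} {action[:-1]}as la {patient}n"
```

A plain word-for-word substitution would typically map "apple" to *pomo* regardless of its role and so miss the accusative ending of *pomon*; generating from the parsed roles is what lets the sketch add it.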

**http://www.link.cs.cmu.edu/link/ -- Link Grammar
*Databases
**http://www.cogsci.princeton.edu/~wn/links/ -- <nowiki>WordNet</nowiki>, a lexical database for the English language.

===Links===
*general
**Links on [[Machine translation]] (MT): http://www.ife.dk/url-mt.htm
**Machine translation (MT), and the future of the translation industry http://accurapid.com/journal/15mt.htm
**Machine Translation: an Introductory Guide: http://clwww.essex.ac.uk/MTbook/
*Visual Interactive Syntax Learning: http://visl.sdu.dk/visl/
*wikipedia articles
**[http://www.wikipedia.com/wiki/Machine_translation Machine translation]
**[http://www.wikipedia.com/wiki/Language_translation Language translation]
**[http://www.wikipedia.com/wiki/Translation Translation]
*Free translations on the web
**http://anglahindi.iitk.ac.in/ (An English to Hindi Machine Aided Translation System: an ongoing project at IIT Kanpur, India)
**http://www.google.com/language_tools (uses Systran software)
**http://www.freetranslation.com/
**http://www.systransoft.com/
**http://babelfish.altavista.com/ (uses Systran software)
**http://www.babylon.com/
**http://www.translator.cx (GPLTran)
**http://www.reverso.net/textonly/default_ie.asp
**http://www.worldlingo.com/en/microsoft/computer_translation.html Works well, many languages (at least partly based on Systran software)
**http://www.nifty.com/globalgate/ Japanese
*[[en:Neural network|Neural nets]]
*Machine translation
*Translation memories
*Wired magazine
**[http://www.wired.com/wired/archive/8.05/timeline.html Wired 8.05: Machine Translation's Past and Future]
**[http://www.wired.com/wired/archive/8.05/tpmap.html Wired 8.05: Universal Translators]
**[http://www.wired.com/wired/archive/8.05/translation.html Wired 8.05: Talking to Strangers]
**[http://www.wired.com/wired/archive/4.10/geek.html Wired 4.10: Geek Page - Using interlinguas to map between languages]
*Portuguese
**Processamento Computacional do Português http://www.linguateca.pt/index.html
*Meta-language
**http://www.undl.org A United Nations project based on an artificial, machine-readable language (UNL). The idea is to semi-automatically create a UNL text from, say, English, then have it fully automatically translated into up to 150 languages on the fly. The project is now an independent organization.
*The World Wide Translator (The Tragedy of the Anticommons of translation memories)
**http://www.technologyreview.com/articles/01/09/wo_leo092101.asp

References:
* [http://xxx.lanl.gov/abs/cmp-lg/9706026 A Word-to-Word Model of Translational Equivalence]

== Discussion ==

See the [[talk:Wikipedia Machine Translation Project]] page.

Revision as of 13:28, 31 October 2005

1. Remember the human.
Never forget that the person reading your mail or posting is, indeed, a person, with feelings that can be hurt.
Corollary 1 to Rule #1: It's not nice to hurt other people's feelings.
Corollary 2: Never mail or post anything you wouldn't say to your reader's face.
Corollary 3: Notify your readers when flaming.

2. Adhere to the same standards of behavior online that you follow in real life.
Corollary 1: Be ethical.
Corollary 2: Breaking the law is bad Netiquette.

3. Know where you are in cyberspace.
Corollary 1: Netiquette varies from domain to domain.
Corollary 2: Lurk before you leap.

4. Respect other people's time and bandwidth.
Corollary 1: It's OK to think that what you're doing at the moment is the most important thing in the universe, but don't expect anyone else to agree with you.
Corollary 2: Post messages to the appropriate discussion group.
Corollary 3: Try not to ask stupid questions on discussion groups.
Corollary 4: Read the FAQ (Frequently Asked Questions) document.
Corollary 5: When appropriate, use private email instead of posting to the group.
Corollary 6: Don't post subscribe, unsubscribe, or FAQ requests.
Corollary 7: Don't waste expert readers' time by posting basic information.