Wiki(pedia) Machine Translation Project

The purpose of the Wiki(pedia) Machine Translation Project is to develop ideas, methods and tools that can help translate Wikipedia articles (and Wikimedia pages) from one language to another, particularly out of English and into languages with small numbers of fluent speakers.

Remember to read the current talk page and particularly what is stated on the Wikipedia Translation page:

Wikipedia is a multilingual project. Articles on the same subject in different languages can be edited independently; they do not have to be translations of one another or correspond closely in form, style or content. Still, translation is often useful to spread information between articles in different languages.

Translation takes work. Machine translation, especially between unrelated languages (e.g. English and Japanese), produces very low-quality results. Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing (see for example here). The translation templates have links to machine translations built in automatically, so all readers should be able to access machine translations easily.

Remember that if the idea were simply to run a Wikipedia article through a fully automatic machine translation system (such as Google Translate), there would be no point in adding the results to the "foreign" Wikipedia: a user could just feed the system the desired URL.


Motivation

Wikipedias in small languages can't produce articles as fast as Wikimedia projects in languages such as English, Japanese, German or Spanish, because the number of Wikipedians is too low and some prefer to contribute to the bigger projects. One potential solution to this problem, in discussion since 2002, is the translation of Wikimedia projects. As some languages will not have enough translators, machine translation can improve the productivity of the community. This sort of automatic translation would be a first step, with manual translations added and corrected later as the local communities develop.

A second, but very important, motivation is the development of free tools for Computational Linguistics and Natural Language Processing. These fields are very important, but resources for small languages are usually nonexistent, low-quality, expensive and/or restricted in their usage. Even for "big" languages such as English, free resources are still scarce. We could develop...



Approaches


Interlingua approach

A different, but related, approach would be to translate articles into a machine-translation interlingua such as UNL, then write software modules to translate automatically from that interlingua into each target language. The initial translation could be created fully by hand, or machine-translated with humans verifying the accuracy of the translation and choosing between multiple alternatives. This only saves work with respect to direct translation if there are several target languages whose modules are well enough developed, but those modules are much easier to write, and expected to be more accurate, than full language-to-language automatic translation systems.
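As a rough illustration of the idea only (this is not real UNL; the predicate names and generator functions below are purely hypothetical), each sentence would be stored once in a language-neutral form, and each target language would only need a small generation module:

```python
# A minimal sketch of the interlingua idea, using a made-up predicate-argument
# representation rather than actual UNL. Only the shape of the pipeline matters:
# one hand-verified interlingua record, one small generator per target language.

from dataclasses import dataclass

@dataclass
class Fact:
    predicate: str   # e.g. "capital_of"
    subject: str     # e.g. "Paris"
    obj: str         # e.g. "France"

def generate_en(f: Fact) -> str:
    # Each target-language module only has to realise a small, closed
    # set of predicates in its own grammar.
    if f.predicate == "capital_of":
        return f"{f.subject} is the capital of {f.obj}."
    raise ValueError(f"no English template for {f.predicate}")

def generate_es(f: Fact) -> str:
    if f.predicate == "capital_of":
        return f"{f.subject} es la capital de {f.obj}."
    raise ValueError(f"no Spanish template for {f.predicate}")

fact = Fact("capital_of", "Paris", "France")   # created and verified by hand once
for generate in (generate_en, generate_es):
    print(generate(fact))
```

Adding a new target language then means writing one more generator, not a full translation system for every language pair.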


Translating between closely related languages

I would imagine it would be easier to translate between similar languages than between dissimilar ones. For example, we have Wikipedias in Catalan and Spanish, and in Macedonian and Bulgarian, perhaps even Dutch and Afrikaans (more study would be needed to evaluate which pairs would be most appropriate). There is some free software being produced in Spain, called en:Apertium, that might be useful here.
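As a hedged sketch of how such a system could be driven from a script (assuming Apertium and a related-language pair such as Spanish–Catalan, "es-ca", are installed locally; substitute whatever pair you actually have), the command-line tool reads text on standard input:

```python
# Sketch: pipe text through a locally installed Apertium language pair.
# Assumes the `apertium` binary and the "es-ca" pair are installed;
# the pair name here is only an example.

import subprocess

def apertium_translate(text: str, pair: str = "es-ca") -> str:
    """Run `text` through the Apertium command-line tool for `pair`."""
    result = subprocess.run(
        ["apertium", pair],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(apertium_translate("La traducción automática ayuda a las wikis pequeñas."))
```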


Suggested statistical approach


(For more information on these data, see my talk page at the English Wikipedia. Tresoldi 16:22, 13 March 2010 (UTC))


Evaluating with Wikipedias

Originally found in the Apertium Wiki.

One way of improving an MT system, and at the same time improving and adding content to Wikipedias, is to use the Wikipedias as a test bed. You can translate text from one Wikipedia to another, then either post-edit it yourself, or wait for, or ask, other people to post-edit the text. One of the nice things is that MediaWiki (the software Wikipedia is based on) lets you view diffs between versions (see the 'history' tab).
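The same diffs can also be retrieved programmatically, which is handy when collecting post-editing data in bulk. A minimal sketch using the standard MediaWiki action=compare API (the revision IDs below are placeholders to be replaced with the raw-MT and post-edited revisions of a real article):

```python
# Sketch: fetch the diff between a raw-MT revision and its post-edited
# revision through the MediaWiki API (action=compare).

import requests

API = "https://en.wikipedia.org/w/api.php"   # any MediaWiki wiki works

def revision_diff(from_rev: int, to_rev: int) -> str:
    """Return the HTML diff table between two revisions of a page."""
    params = {
        "action": "compare",
        "fromrev": from_rev,
        "torev": to_rev,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    return data["compare"]["*"]   # the diff body, as HTML

# print(revision_diff(123456789, 123456999))  # substitute real revision IDs
```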

This strategy is beneficial both to Wikipedia and to any machine translation system, whether Apertium or a statistical one based on Moses. Wikipedia gets new articles in languages which might not otherwise have them, and the machine translation system gets information on how the software can be improved. It is important to note that Wikipedia is a community effort, and that people can rightly be concerned about machine translation. To get an idea of this, put yourself in the place of people having to fix a lot of "hit and run" SYSTRAN (a.k.a. BabelFish) or Google Translate translations, with little time and not much patience.

Guidelines

An example of the kind of conversation you might have is found here.

How to translate

To make the exercise more useful, when you create the page, first paste in the unedited machine translation output. Save the page with an edit summary saying that you're still working on it. Then post-edit the output and, after you've finished, save the page again. If you go to the history tab at the top of the page and click "Compare selected versions", you will see the differences (diff) between the machine translation and the post-edited output. This gives a good indication of how good the original Apertium output was.
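Beyond eyeballing the diff, the amount of post-editing can be estimated roughly in code. A minimal sketch, using difflib's similarity ratio purely for convenience (it is not a standard MT metric; proper evaluation would use something like WER or TER on the same pair of texts):

```python
# Sketch: a crude estimate of post-editing effort, comparing the raw
# machine-translation output with the post-edited text word by word.

import difflib

def post_edit_similarity(mt_output: str, post_edited: str) -> float:
    """Return a 0..1 similarity score between MT output and the post-edit."""
    matcher = difflib.SequenceMatcher(None, mt_output.split(), post_edited.split())
    return matcher.ratio()

mt = "The city have many inhabitants and is capital of the region."
fixed = "The city has many inhabitants and is the capital of the region."
print(f"similarity: {post_edit_similarity(mt, fixed):.2f}")  # closer to 1 = less editing needed
```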

It's also helpful to paste the input first. Then you can compare 1. the input, 2. the MT output, and 3. the post-edit (keeping the input text in the article history may be useful if you want to compare old MT output with a newer version of the machine translator).


Existing free software

Attempts

Several projects have been started to use computer-assisted translation on Wikimedia projects. An incomplete list of projects conducted on the Wikimedia projects themselves follows.

In other cases such tools were used outside the projects, or not programmatically:

Resources

General

Dictionaries

Corpora

Bibliography

See also

Generic English Wikipedia articles
