— tomauger.com

Crowdsourcing WordPress i18n

During one of the wrap-up talks at WordPress Community Summit 2012, @nacin dropped the hint that a crack team was working on modifying the i18n/l10n system within WordPress to allow incremental / on-demand deployment of “language packs” with new .mo files, independently of the core update cycle.

This is a huge step toward enabling complete localization of WordPress, but the scary-cool part is for plugin developers – opening the door for third-party translations to be easily pushed out without necessitating a full dot revision.

[Skip all the verbiage and take me to the pretty UI pictures]

Much (okay, all) of the details were clouded in much mystery, but it seems that there would be some kind of server / repo component (I envision something like Trac, only not like Trac) where designated community translators can log in, create translations for core, plugins, themes, whatever, and then push them out into language packs, which would then pop up in your Updates panel whenever your wp-cron did an update check.

Cool. But There’s a Bottleneck.

The bottleneck, of course, is the community translators themselves. They would carry the responsibility of translating all those strings from all those plugins, which is particularly hard when a plugin addresses functionality from some knowledge domain outside the translator’s expertise. This could be a case of building a state-of-the-art sports arena that never gets filled to capacity.

But if we crowdsourced the translations?

“Crowdsourcing” is such a buzzword these days, but in the case of a community-driven project like WordPress, especially in the domain of i18n, it couldn’t be more appropriate. All that remains is setting up the technology in such a way as to make it really easy for any user to submit their translations, improvements, even new locales, without even knowing what a .pot, .po or .mo file is, and without having to visit GlotPress (which I’m not even going to link to here, because trust me, you’ll just find the site confusing).

So here’s a draft UI. Beyond proposing a home for i18n features within the WordPress Admin, I’m proposing a very user-centric way of submitting translations / improvements / spelling fixes for core, plugins and even themes. When a user enters “Translation Mode” for a specific component (i.e., core, a plugin or a theme), all localizable text – __() and its derivatives – becomes “hot” and can be translated / fixed with a click of a button.
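One way the “hot” text could be prototyped is through WordPress’s `gettext` filter, which receives the translated string, the original string and the text domain for strings looked up via `__()`. The sketch below is only an illustration under stated assumptions: the function name `tm_mark_hot_string`, the `tm-hot` class and the `data-*` attributes are hypothetical, and derivatives such as `_x()` and `_n()` pass through their own filters, which are not covered here. Wrapping every string in markup would also break strings that end up inside HTML attributes, so a real implementation would need more care.

```php
<?php
/**
 * Wrap a translated string in a marker element so that front-end script
 * could make it clickable in "Translation Mode".
 *
 * Hypothetical helper: the name, CSS class and data attributes are
 * illustrative assumptions, not part of WordPress.
 */
function tm_mark_hot_string( $translation, $text, $domain ) {
    // Escape the values before placing them inside HTML attributes.
    $original = htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' );
    $domain   = htmlspecialchars( $domain, ENT_QUOTES, 'UTF-8' );

    return '<span class="tm-hot" data-original="' . $original
        . '" data-domain="' . $domain . '">' . $translation . '</span>';
}

// Inside WordPress, register the helper on the 'gettext' filter with
// three accepted arguments. The guard keeps this file runnable on its
// own, outside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'gettext', 'tm_mark_hot_string', 10, 3 );
}

// Standalone demonstration with a French translation of "Publish".
echo tm_mark_hot_string( 'Publier', 'Publish', 'default' ), "\n";
```

Keeping the original string and text domain on the element means a click handler would know which entry the user is correcting without having to guess from the rendered text.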

Where do we go from here?

I’d like to see whether this approach has merit. I am soliciting comments from the community. Please don’t be afraid of Disqus – feel free to comment anonymously.

  • Cátia Kitahara

    It looks awesome, however I have my doubts about how the translations are approved. I don’t think it is a good idea to base it on votes, because a translation which seems more adequate and has the highest score may still break things. Example: in the Brazilian Portuguese community, when we first translated WordPress we decided not to translate the word “post”. There was a huge discussion at the time; some people (including me) preferred to translate it as “artigo”. Many people today still think it should be translated this way, and if there’s a vote based only on the best translation, it would probably win. But with custom post types, many people may have created a post type named “artigo” (I did), which makes it impossible, or at least a bad idea, to change our translation of “post” to “artigo”. Can you imagine the confusion it would cause in a user’s mind when they update their site and there are two items on the menu named “artigo”? What we must keep in mind is that translating isn’t a mechanical job; many factors, including technical ones, influence why things are translated one way and not another. I agree with you that we must find a way of diminishing the burden of translating that many plugins, however I believe that in the end there must still be a group of translators with enough technical and linguistic knowledge. I don’t see a way of removing this necessity.

  • I fully agree with Cátia in that a) voting for (very subjective) translations is a flawed system at best and b) looking at string translation as a simple mechanical process is even more flawed. Nothing against voting *within* a community, but separating the translations themselves from the people behind it, i.e. the local communities, feels like a danger to that community; no more incentive to join and discuss, not only translations, but inevitably the whole of WordPress in that region or language group (think developers, designers, meetups, WordCamps, et al). By seeing translations in my dashboard, I needn’t be aware of the whole context of debate around it, which is not a good thing.

    That said, I absolutely love the idea of being able to report mistranslated strings right from their specific context. This would help immensely; the number of strings is growing and anything that can help us translators spot errors and inconsistencies is most welcome. I, for one, am available to help make that happen.

  • Pingback: WordPress › Your thoughts on Crowdsourcing WordPress i18n Please… « WordPress Polyglots

  • Cátia Kitahara

    “By seeing translations in my dashboard, I needn’t be aware of the whole context of debate around it, which is not a good thing.”

    Unless there’s a link to the community, then it would be a good way of bringing more people to participate in it.
    I also love the idea of reporting the errors.

  • Gabriel Reguly

    It would be fantastic to have it bound to some glossary, where one could see the community approved way of doing the translations.

    Also I agree with Cátia and Zé, it should not work as an automated tool, but as a way to bring new translators to the communities.

    Regarding localizable text becoming ‘hot’, I can foresee an issue of a
    localizable text already being inside a link, as in localizable text where it would end up
    as localizable text.

  • Gabriel Reguly

    Poor Disqus, it does not know how to filter HTML :-(

    Follows my last sentence:

    Regarding localizable text becoming ‘hot’, I can foresee an issue of a localizable text already being inside a link, as in <a href="original_link">localizable text</a> where it would end up as <a href="original_link"><a href="hot_translation_thing">localizable text</a></a>.

  • Tom Auger

    Thanks Ze and Catia (forgive my lack of diacritics) for your comments. I think I understand. I believe what you are saying is that translation is one of the main factors that can galvanize a community that might not get the same mainstream attention that the English-speaking community enjoys; I understand that one of the concerns is that if you remove this common and important task from the community, you lose a strong driver that foments discussion and solidarity.

    I’m not willing to totally part with the crowdsourcing idea yet – I think there’s room to explore. For example, the voting system does not have to be normative or authoritative. Just because a particular translation has the highest votes does not mean that this translation will be automatically adopted or even recommended. The voting system can be just another way for other members of the linguistic community, who might not otherwise participate in the discussion, to provide their opinions and be involved.

    I still envision a “back end” site for the actual administration and committing of translations, very much in the way that Trac caters to the development effort. This site could be where the community gathers, discusses, and makes final decisions about translations. The “crowdsourced” translations could be exposed on that site, for example in a “Newest Submissions” widget. Think of the possibilities around having such a system in place – discussions could still take place, the community could still be engaged, but the reach has just widened to casual users, who might otherwise just be “users” of WordPress rather than members of the community. Suddenly we have found a way to engage them.

    But the real impetus, for me, that drives my enthusiasm for this project, is opening all those 1000’s of plugins to additional locales. Only a handful are translated, because, let’s face it, there’s an extremely high barrier to entry for translation as it stands now. I haven’t even tried it, even though I deal with bilingual (English + French) issues daily. We, as a global community, need to make it much much easier for these plugins to be translated, and right now the linguistic communities rally mostly behind Core and a small nut of the top percentile of plugins that are so ubiquitous.

    Wouldn’t making translation accessible to a much wider audience, particularly the users of those plugins, help increase the global reach of plugins (and themes)?

  • Tom Auger

    Yes, this is a very good point. But could that be solved by technology?

    1. If we capture the discussions around particular translations, categorize them, and then allow them to be exposed within the “Translate” dialog box when you are making a translation, would that be enough to draw a casual user into the debate?
    2. Could a moderator “protect” certain content from casual translation, instead referring the user to the discussion as a link within the “Translate” dialog box?
    3. Could the “Translate” dialog box be, instead, a “window” (think: iframe) to the actual discussion site, where these debates are happening?

  • Tom Auger

    Gabriel, I’m not sure that the link inside a link is really a problem, since the JavaScript would sit on top of that, and intercept mouse clicks before they went through to the link. However, I’m more concerned about some of the more complex strings, for example:

    Hi, ! Click [here] to go to page of your profile.

    There are a couple of ways a developer might implement this, and some are a lot easier for the translation than others.

  • Tom Auger

    I see what you mean. That would be up to the developer to write it like this:

    printf( __( '%1$slocalizable text%2$s', 'textdomain' ), '', '' );

    which is arguably NOT the correct way to do this – better might be printf( '' . '%s' . '', __( 'localizable text', 'textdomain' ) ); – because then the translator doesn’t have to see the %1$s crap. But there might be a workaround by rolling our own __printf() function that does the wrapping…
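To make the two options above concrete, here is a minimal, self-contained sketch rather than anyone’s actual plugin code. It assumes the stripped markup was an anchor tag; the URL is hypothetical, and the `__()` stand-in exists only so the file runs outside WordPress (it returns the string untranslated).

```php
<?php
// Stand-in for WordPress's __() so the sketch runs outside WordPress.
// Assumption for illustration only: it returns the text untranslated.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

$open  = '<a href="https://example.com/profile">'; // hypothetical URL
$close = '</a>';

// Option 1: the markup is passed in through numbered placeholders, so
// the translator sees "%1$slocalizable text%2$s".
// Single quotes matter: inside double quotes PHP would try to expand $s.
$with_placeholders = sprintf(
    __( '%1$slocalizable text%2$s', 'textdomain' ),
    $open,
    $close
);

// Option 2: only the words are translatable; the markup is wrapped
// around the result afterwards, so the translator sees plain text.
$wrapped_after = $open . __( 'localizable text', 'textdomain' ) . $close;

echo $with_placeholders, "\n";
echo $wrapped_after, "\n";
```

Both lines print the same markup. The difference is only in what the translator has to handle: option 2 keeps the string clean, while option 1 lets a translator move the link to a different part of the sentence when the target language needs it.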

  • http://twitter.com/Eyesx Mattias Tengblad

    “Also I agree with Cátia and Zé, it should not work as an automated
    tool, but as a way to bring new translators to the communities.”

    +1

  • Kenan Dervišević

    Facebook uses this method for their translation system. It’s easy to use and you are able to see the place on the site where your translation is used. The only thing that I find lacking is that everyone can translate everything and, as a result, the quality of the translation suffers. There needs to be a system of administration that allows more experienced translators to grant new users permission to translate on a per-project basis. Another good thing about Facebook’s translation system is its glossary of basic words that are widely used across their pages; it automatically shows you the appropriate word for the phrase that you are currently translating.

  • Cátia Kitahara

    I don’t think technology solves things that depend upon people’s will, but I believe technology can help and induce people to do things. Your suggestions seem good, and they made me think that maybe this interface should be integrated with GlotPress (translate.wordpress.org) instead of being something apart from it, as in your third suggestion. In that case, GlotPress would be the place where these debates happen. It also reminds me of another improvement to GlotPress: comments on strings. The “protection” you pointed out is actually something I’ve asked for as an improvement to GlotPress. I mean, there should be a way of locking strings which are “definitely” translated or on which there’s a final consensus, like my example, so that they aren’t available for new translation suggestions unless somebody with validation super powers decides to open them again. It’s also a way of diminishing translators’ work.

  • Cátia Kitahara

    I also asked for a glossary, or at least the possibility to link to an existing glossary, at GlotPress. I’m not talking about a translation memory system that automatically suggests translations; as Nacin pointed out, that’s out of the question for now. Tom, I’m sorry, I don’t remember if I met you there at the summit, but did you participate in the GlotPress working groups?

  • http://www.tomauger.com Tom Auger

    I need to look into what Facebook and other sites are doing for this. Thanks for the heads up!

  • http://www.tomauger.com Tom Auger

    I agree, GlotPress is almost certainly the correct domain for this stuff to happen under. I’m not sure that GlotPress as a P2 site really works right now, but I only visited a few times. It might be good to cross-post this discussion there – I don’t have the ability to create a new thread (or do I?)

  • http://www.tomauger.com Tom Auger

    Sadly we didn’t meet at the summit, though I was there! Interestingly, i18n wasn’t even on my radar until Nacin started talking about the Language Packs and someone else mentioned the importance of getting all those Plugins translated…

  • http://www.tomauger.com Tom Auger

    Ooops. Disqus stripping away my HTML too! Those are anchor “a” tags within the empty quotes in both examples…

  • http://twitter.com/simonwheatley Simon Wheatley

    This is cool, but quite a task to implement, I would guess. At the point where you can start selecting and submitting strings in “translation mode” in the WordPress admin area, it might be an easier first step to link out to the WordPress.org GlotPress instance(s) (when they exist). I think this fits with the various comments supporting bringing new translators into a (mentored) community. Once we have this easier step, then allowing people to report or correct strings in their WordPress install would definitely be a good thing to explore.

  • Cátia Kitahara

    I’m not referring to the blog, but the tool itself at translate.wordpress.org :)

  • Gabriel Reguly

    Tom, what is the correct way for you might be just ignored by 50% of other developers.
    But that is not my point, anyone is free to develop as desired.

    Regarding votes, I have just seen this from Facebook (follows an image from screenshot, hopefully Disqus will behave this time)

  • Gabriel Reguly

    Not yet. Disqus did not behave properly again ;-(

    Follows link to image: https://dl.dropbox.com/u/355512/1352420665895.png

  • Gabriel Reguly

    Another image, different voting situation: https://dl.dropbox.com/u/355512/1352421475814.png

    Those are just for you to be aware of. I don’t think voting is in the best interest of any community. People should participate with more effort than just a brief vote, which goes for any democracy ;-)

  • http://www.facebook.com/carledug Carlos Eduardo Gonzales Barbos

    I think ‘crowdsourcing’ is a nice idea, but a slightly dangerous one. We should not think of translation production as being different from, say, the WP core development process. A good translation is not the piece of work that gets the highest number of votes in a voting system. Good PHP code is not necessarily the best-voted code either. A Member of Parliament is not a good person, or a competent one, just because he won a great share of the vote.

    Crowdsourcing is indeed a good idea when we want the user to have the right to choose among a bunch of possibilities. The translation team of Portugal came up with the nice idea of a choice between the formal Portuguese set and the informal Portuguese set for WordPress core. How nice it would be if users could build their own translation sets from a cloud of diffuse contributions, or opt to pick an officially deployed one, set up by a responsible translation team, if that is their preference. Even the translation teams could benefit from that cloud of contributions coming directly from users, rather than waiting for them to contribute via GlotPress.

    Let us keep on doing our careful translation jobs, however, and let it be clear to the user that there are at least these two possibilities: the translation teams’ deployments and the brainstorming cloud of crowdsourced contributions.

  • http://www.tomauger.com Tom Auger

    I appreciate these different viewpoints!

    It seems like a lot of comments are coming from community members who themselves have a great stake in the translation process.

    But what about linguistic communities where there doesn’t exist the same high level of involvement? And how do we scale this?

    With the current process that is in place today, do you, as an active member of the community feel that you are moving fast enough to translate the 22,000+ plugins in the repo? At the rate you’re going, when will we achieve 100% translation in all languages? How about 5%?

  • Cátia Kitahara

    Tom, I read your post again, especially this part:

    “Much (okay, all) of the details were clouded in much mystery, but it seems that there would be some kind of server / repo component (I envision something like Trac, only not like Trac) where designated community translators can log in, create translations for core, plugins, themes, whatever, and then push them out into language packs, which would then pop up in your Updates panel whenever your wp-cron did an update check.”

    And it occurred to me that maybe you don’t know exactly what GlotPress is. Do you? :)

  • http://www.tomauger.com Tom Auger

    Guilty as charged, but your comment inspired me to take a closer look, and it does seem like GlotPress is the mystery platform. I don’t believe that GlotPress currently pushes out language packs that can be installed separately from a full repository update though, at least not yet. Or am I wrong?

  • Cátia Kitahara

    Yes, exactly :). GlotPress is the mystery platform, however it only generates the po/mo files. The locale pack (WordPress + mo files) is generated by the admins of locale.wordpress.org through its WordPress admin interface. There’s a system there where we can build a pack by pulling the po/mo files from the i18n svn repository or from translate.wordpress.org (GlotPress). What I understood is going to happen is that we’ll no longer need to build the locale packs this way. GlotPress will push the po/mo files directly to the i18n svn repo and WordPress will pull them, more or less how you described. The bottleneck now is that GlotPress needs improvements; it was a bit abandoned. During the days after the summit, a group of people had a discussion about it. I was among them, with Nacin, Zé Fontainhas, Simon Wheatley and others (I don’t remember everyone), which is why I asked if we met there. We concluded that the priority was usability improvements and some other features involving roles and capabilities. They closed many tickets there; I don’t know yet how it is now. I think Zé Fontainhas may give you a more accurate report.

  • Pingback: Can WordPress i18n be Crowdsourced? - WP Realm

  • http://www.facebook.com/morgadin Nuno Morgadinho

    OT: What software did you use to draw the draft UI?

  • http://www.tomauger.com tomauger

    Notability on the iPad. I use it for all sorts of things. Great app.

  • Andrew Bukowski

    Here is a cool plugin for WordPress localization: http://wordpress.org/plugins/poeditor/ It works with a localization tool that is based on the same crowdsourcing principles. I recommend it to everybody who has dilemmas about crowdsourcing translations.