Issues to consider when localizing your website

In this article, I will give an overview of some things to consider when translating or localizing your website. Drawing from my experience as a translator and IT specialist, I will try to highlight not only various linguistic considerations, but also some subtle technical and practical issues to keep in mind.

Why is a website different from a “normal” translation project?

In the simplest case, the translation of a website may not be significantly different from the translation of normal documents. You may be able to provide a static copy to the translator in a Word file and then extract and upload the text when you receive it in the same format.

However, many websites do not consist of a few pages of static text, which means that a website translation project may require special consideration and additional skills on the part of the translator:

  • You may have pages built “on the fly” from a database instead of existing in static files;
  • You might have a server application, for example, to process form input, which in turn outputs text visible to the user;
  • From a linguistic point of view, it is rare that the content of a website refers to only one field: it is almost certain that some IT terminology will slip in somewhere.

For the first two of these reasons, it is not uncommon for your website to involve text in different formats saved in different files. You may have some raw text or HTML files that you can easily extract to a text file or Word document from your content management system, plus some data in a database that you may need to extract to a CSV file or an SQL dump, plus some .properties files used by your back-end server. In the initial stages of obtaining a quote for the project, tell the translator which file format is most convenient for you to work with (and send a sample), and ask if they can work with that format. (In my case, for example, I’ve seen clients spend time trying to convert CSV files to Word documents, altering the text in the process, when I would have preferred to work with the original CSV files.)

Language problems

Although most websites will include some IT terminology at some point, this probably shouldn’t be the main linguistic issue involved in website localization. My reason for saying this is that, given the technical issues that we will cover below, I strongly recommend hiring a translator who is IT-savvy in the first place to translate the website.

An initial linguistic decision, though one that the translator will probably be able to make for you, concerns the form of address. As you may know, various languages use different verb forms to address the reader/listener either “informally” or “formally” (for example, the tu versus vous distinction in French), with some languages even having a three-way distinction. The appropriate form of address will depend on your target audience and the customs of the countries you are addressing; the translator may therefore need to consult with you about who your main target audience is and what impression you want to give (do you want your text to sound “serious” or more “hip and trendy”?).

Other linguistic problems arise when translating short elements of a database or properties file, where context is sometimes missing. Do you mean a “check” as in a “cheque” or as in a “verification”? Do you mean “top” as in “highest price” or as in “go to the top of the page”? And in the case of strings that can have parameters (indicated by the sequence {0}, {1}, etc. in properties files in Java and various other languages), what are the various values that these parameters can have, given that they can affect the translation?
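To illustrate why the parameter values matter, here is a minimal sketch using Java's java.text.MessageFormat. The two patterns and the class name ParameterisedMessages are invented for this example; they stand in for entries from an English and a French properties file. The French pattern puts the placeholders in a different order, which a translator can only do safely when they know what {0} and {1} stand for.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ParameterisedMessages {
    // Hypothetical patterns, as they might appear in two properties files.
    // {0} is assumed to be a user name and {1} a message count.
    static final String EN = "{0} has {1} new messages";
    static final String FR = "{1} nouveaux messages pour {0}";

    // Fills the placeholders of a pattern using the conventions of the given locale.
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(render(EN, Locale.ENGLISH, "Anna", 3));
        System.out.println(render(FR, Locale.FRENCH, "Anna", 3));
    }
}
```

A comment next to each pattern saying what the parameters can contain (a name, a number, a date) is often enough to spare the translator a question.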

Sometimes solving these problems will require you to answer direct questions from the translator about the interpretation of your text. But as a simple measure that can save some time and questions, I recommend using multiple properties files. Let each main area of your site/application have its own properties file. In particular, let sections of your server/site that address different people have their own properties files. Crucially, if you can avoid it, don’t mix in the same file strings that target the website visitor and strings that are part of your back-end management system.
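As a rough sketch of what this separation buys you, the example below loads two small sets of properties. The file names visitor_checkout.properties and admin_orders.properties, the key label.check and the class name SplitBundles are all hypothetical, and the file contents are held in strings only to keep the example self-contained. The same key carries a different meaning in each file, and the file itself tells the translator who will read the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SplitBundles {
    // Stand-in for a hypothetical visitor_checkout.properties file.
    static final String VISITOR_CHECKOUT =
        "# Shown to shoppers on the payment page\n"
        + "label.check=Pay by check\n";

    // Stand-in for a hypothetical admin_orders.properties file.
    static final String ADMIN_ORDERS =
        "# Shown to staff in the back-end order screen\n"
        + "label.check=Check order status\n";

    // Parses properties-file content held in a string.
    static Properties load(String content) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(content));
        return props;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(load(VISITOR_CHECKOUT).getProperty("label.check"));
        System.out.println(load(ADMIN_ORDERS).getProperty("label.check"));
    }
}
```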

Practical and technical problems

When you receive your translated material from the translator (or, indeed, ideally sooner), there are one or two practical issues to keep in mind. You may have already noticed the differences in text length that can arise between languages (typically, text in Latin-derived languages like French and Spanish is about 20-30% longer than its English counterpart). This could have an effect not only on the layout of the page, but also on the size of the database fields. More subtly, the character count in another language might be similar, but the word count could vary drastically if that language uses compounding more extensively than English (for example, you may find that a Finnish translation has a similar number of characters to the English text, but half the number of words). A narrow column layout that works on your English page can suddenly look disastrous when applied to the German or Finnish translation.

If your site is interactive, then you have the additional problem of accepting the input that users will expect to be able to provide in your web forms, etc. This will include, for example, the ability to enter accented characters or a greater variety of characters, plus some more subtle changes to your site’s validation. On an English-language site, you may not allow spaces in the Last Name field. But speakers of various other languages often have multiple last names and would expect to be able to enter a space in this field.
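The kind of validation change involved might look like the sketch below. The class name NameValidation and both patterns are assumptions made for illustration; real-world names are more varied than any single regular expression can capture, so treat this as a starting point and not a complete rule.

```java
import java.util.regex.Pattern;

public class NameValidation {
    // An English-only rule of the kind described above: ASCII letters, no spaces.
    static final Pattern ASCII_ONLY = Pattern.compile("[A-Za-z]+");

    // A more permissive sketch: Unicode letters (\p{L}) and combining marks (\p{M}),
    // with a single space, apostrophe or hyphen allowed between the parts.
    static final Pattern INTERNATIONAL =
        Pattern.compile("[\\p{L}\\p{M}]+(?:[ '\\-][\\p{L}\\p{M}]+)*");

    // Trims surrounding whitespace, then applies the permissive rule.
    static boolean isValidLastName(String input) {
        return INTERNATIONAL.matcher(input.trim()).matches();
    }

    public static void main(String[] args) {
        String name = "Garc\u00eda M\u00e1rquez"; // accented letters plus a space
        System.out.println(ASCII_ONLY.matcher(name).matches()); // false
        System.out.println(isValidLastName(name));              // true
    }
}
```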

Two other, sometimes related, problems are character encoding and collation. The first essentially refers to the way the computer stores/represents characters (how characters are translated into bytes). The second concerns how characters and strings are matched and ordered: for example, whether an e with an acute accent is considered equal to one without an accent for search purposes, and in what order they appear when sorting. These issues usually don’t arise when dealing solely with English, but should usually be considered when dealing with text in another language.
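To make “how characters are translated into bytes” concrete, this small sketch counts the bytes that one accented and one unaccented letter occupy in two common encodings. The class name EncodingSizes and the helper byteLength are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes the text occupies in the given encoding.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String plain = "e";
        String accented = "\u00e9"; // e with an acute accent

        System.out.println(byteLength(plain, StandardCharsets.UTF_8));         // 1
        System.out.println(byteLength(plain, StandardCharsets.ISO_8859_1));    // 1
        System.out.println(byteLength(accented, StandardCharsets.UTF_8));      // 2
        System.out.println(byteLength(accented, StandardCharsets.ISO_8859_1)); // 1
    }
}
```

The two encodings agree on plain English letters and disagree on the accented one, which is why encoding problems tend to stay hidden until translated text arrives.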

Character encoding differs from system to system, with some common standards including ISO-8859-1, UTF-8, and other encodings like Mac OS Roman. Depending on your website/app, you may need to make sure you have the correct character encoding set up in multiple layers:

  • when reading in the translated file;
  • when reading/writing to your database via JDBC or other application layer framework;
  • when reading user input via Servlet API, etc.;
  • in the database field definitions themselves, to ensure that they can store the required range of characters.
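For the first of these layers, a minimal Java sketch might look as follows. It assumes the translated file is a properties file saved as UTF-8; the class name LoadTranslations, the key greeting and its value are made up for the example. The point it shows is that java.util.Properties treats a raw InputStream as ISO-8859-1, so a UTF-8 file needs to be read through a Reader with an explicit charset. The database and servlet layers have their own settings, which are not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class LoadTranslations {
    // Reads the stream as UTF-8 by wrapping it in a Reader with an explicit charset.
    static Properties loadUtf8(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    // Passes the raw stream to Properties, which decodes it as ISO-8859-1.
    static Properties loadAsLatin1(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a translated file saved as UTF-8 (\u00f4 is o with a circumflex).
        byte[] file = "greeting=Bient\u00f4t disponible\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(loadUtf8(new ByteArrayInputStream(file)).getProperty("greeting"));
        System.out.println(loadAsLatin1(new ByteArrayInputStream(file)).getProperty("greeting"));
    }
}
```

The first line printed is the intended text; the second shows the same bytes garbled by the wrong decoding.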

How do you know if you have the correct character encoding? A tell-tale sign of incorrect character encoding in various Latin-based languages, such as French and Spanish, is if you frequently see sequences of two accented characters next to each other, possibly including a capital letter in the middle of words. (This happens when a file encoded in UTF-8 is incorrectly interpreted as being in ISO-8859-1 or a Mac OS encoding.)
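The symptom is easy to reproduce. The sketch below (the class name Mojibake and its two helpers are invented for the example) encodes a word as UTF-8, decodes the bytes as ISO-8859-1, and then reverses the mistake. The repair step only works while the garbled text has not been altered or re-saved in yet another encoding.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Simulates the mistake: UTF-8 bytes decoded as if they were ISO-8859-1.
    static String misreadAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake by recovering the original bytes and decoding them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";              // "cafe" with an acute accent on the e
        String garbled = misreadAsLatin1(original); // the e-acute becomes two characters
        System.out.println(garbled.length());       // 5, one more than the original 4
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```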

The collation (sorting/matching) problem can be dealt with at the database layer (most database systems allow you to configure collation modes for a particular column/table/database). Or it can be dealt with at the application layer (in Java, look at the Collator class as an alternative or extension to the raw Collections.sort() and String.equals() methods).
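At the application layer, a minimal sketch with java.text.Collator could look like this. The class name CollationDemo, the helper names and the sample words are assumptions for illustration. Setting the strength to PRIMARY makes the comparison ignore accents (and also case), which may or may not be what your search feature needs.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // A collator that treats accented and unaccented letters as equal.
    // PRIMARY strength also ignores differences of case.
    static Collator accentInsensitive(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns a copy of the list sorted by the locale's collation rules.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String plain = "e";
        String accented = "\u00e9"; // e with an acute accent

        System.out.println(plain.equals(accented));                                       // false
        System.out.println(accentInsensitive(Locale.FRENCH).compare(plain, accented) == 0); // true

        // "\u00e9t\u00e9" is the French word for summer.
        List<String> words = Arrays.asList("zebra", "\u00e9t\u00e9", "eau");
        List<String> raw = new ArrayList<>(words);
        raw.sort(null); // natural String order: the accented word sorts after "zebra"
        System.out.println(raw);
        System.out.println(sorted(words, Locale.FRENCH)); // accented word sorts between the other two
    }
}
```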

Conclusion

I hope in this article I have highlighted some of the main areas of concern when localizing a website and shown that such problems can go well beyond the translation itself. Working with a translator who is aware of these issues could save you time and effort making your business available in the different countries you want to target.
