Purpose: POSIX cultural localization API
Python Version: 1.5, with extensions through 2.5 (this discussion assumes 2.5)
Description:
The locale module is part of Python's internationalization and localization support library. It provides a standard way to handle operations that may depend on the language or location of your users, such as formatting numbers as currency, comparing strings for sorting, and working with dates. It does not cover translation (see the gettext module) or Unicode encoding.
Changing the locale can have application-wide ramifications, so the recommended practice is to avoid changing the value in a library and to let the application set it one time. In the examples below, I will change the locale several times for illustration purposes. It is far more likely that your application will set the locale once at startup and not change it.
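If a piece of code really does need a different locale for a moment, one way to limit the side effects is to restore the previous setting afterwards. The following is only a sketch: temporary_locale is a hypothetical helper name, not part of the locale module, and because setlocale() changes process-wide state it should not be assumed safe when other threads are running.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Switch to the locale `name` for the duration of a with-block.

    Hypothetical helper for illustration; not part of the locale module.
    """
    # Calling setlocale() with only a category queries the current setting.
    saved = locale.setlocale(category)
    try:
        yield locale.setlocale(category, name)
    finally:
        # Restore whatever was active before, even if the block raised.
        locale.setlocale(category, saved)


if __name__ == '__main__':
    before = locale.setlocale(locale.LC_NUMERIC)
    with temporary_locale('C', locale.LC_NUMERIC) as active:
        print('inside the block: %s' % active)
    print('restored: %s' % (locale.setlocale(locale.LC_NUMERIC) == before))
```

The 'C' locale is used here only because it exists everywhere; an application would normally still call setlocale() once at startup and leave it alone.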
Example:
The most common way to let the user change the locale settings for an application is through an environment variable (LC_ALL, LC_CTYPE, LANG, or LANGUAGE, depending on your platform). The application then calls locale.setlocale() without a hard-coded value, and the environment value is used.
import locale
import os
import pprint
print 'Environment settings:'
for env_name in [ 'LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE' ]:
    print '\t%s = %s' % (env_name, os.environ.get(env_name, ''))
# What is the default locale?
print 'Default locale:', locale.getdefaultlocale()
# Default settings based on the user's environment.
locale.setlocale(locale.LC_ALL, '')
# Show the locale that was picked up from the environment.
print 'From environment:', locale.getlocale()
pprint.pprint(locale.localeconv())
On my Mac, this produces output like:
$ python locale_env_example.py
Environment settings:
LC_ALL =
LC_CTYPE =
LANG =
LANGUAGE =
Default locale: (None, 'mac-roman')
From environment: (None, None)
{'currency_symbol': '',
'decimal_point': '.',
'frac_digits': 127,
'grouping': [127],
'int_curr_symbol': '',
'int_frac_digits': 127,
'mon_decimal_point': '',
'mon_grouping': [127],
'mon_thousands_sep': '',
'n_cs_precedes': 127,
'n_sep_by_space': 127,
'n_sign_posn': 127,
'negative_sign': '',
'p_cs_precedes': 127,
'p_sep_by_space': 127,
'p_sign_posn': 127,
'positive_sign': '',
'thousands_sep': ''}
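The value 127 that appears throughout the output above is locale.CHAR_MAX, which the library documentation describes as meaning that nothing is specified for that field in the current locale. A small sketch of how a program might flag those fields instead of printing the raw number (describe is a hypothetical helper name of my own):

```python
import locale


def describe(value):
    """Render one numeric localeconv() field, flagging unset ones.

    Hypothetical helper for illustration; not part of the locale module.
    """
    # locale.CHAR_MAX is the placeholder for "not specified in this locale".
    if value == locale.CHAR_MAX:
        return 'unspecified'
    return str(value)


if __name__ == '__main__':
    # The 'C' locale is available everywhere and leaves most fields unset.
    locale.setlocale(locale.LC_ALL, 'C')
    conv = locale.localeconv()
    for key in ('frac_digits', 'int_frac_digits', 'p_sign_posn'):
        print('%-16s %s' % (key, describe(conv[key])))
```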
Running the same script with the LANG variable set shows how the locale and encoding reported by getlocale() change accordingly:
France:
$ LANG=fr_FR python locale_env_example.py
Environment settings:
LC_ALL =
LC_CTYPE =
LANG = fr_FR
LANGUAGE =
Default locale: (None, 'mac-roman')
From environment: ('fr_FR', 'ISO8859-1')
{'currency_symbol': 'Eu',
'decimal_point': ',',
'frac_digits': 2,
'grouping': [127],
'int_curr_symbol': 'EUR ',
'int_frac_digits': 2,
'mon_decimal_point': ',',
'mon_grouping': [3, 3, 0],
'mon_thousands_sep': ' ',
'n_cs_precedes': 0,
'n_sep_by_space': 1,
'n_sign_posn': 2,
'negative_sign': '-',
'p_cs_precedes': 0,
'p_sep_by_space': 1,
'p_sign_posn': 1,
'positive_sign': '',
'thousands_sep': ''}
Spain:
$ LANG=es_ES python locale_env_example.py
Environment settings:
LC_ALL =
LC_CTYPE =
LANG = es_ES
LANGUAGE =
Default locale: (None, 'mac-roman')
From environment: ('es_ES', 'ISO8859-1')
{'currency_symbol': 'Eu',
'decimal_point': ',',
'frac_digits': 2,
'grouping': [127],
'int_curr_symbol': 'EUR ',
'int_frac_digits': 2,
'mon_decimal_point': ',',
'mon_grouping': [3, 3, 0],
'mon_thousands_sep': '.',
'n_cs_precedes': 1,
'n_sep_by_space': 1,
'n_sign_posn': 1,
'negative_sign': '-',
'p_cs_precedes': 1,
'p_sep_by_space': 1,
'p_sign_posn': 1,
'positive_sign': '',
'thousands_sep': ''}
Portugal:
$ LANG=pt_PT python locale_env_example.py
Environment settings:
LC_ALL =
LC_CTYPE =
LANG = pt_PT
LANGUAGE =
Default locale: (None, 'mac-roman')
From environment: ('pt_PT', 'ISO8859-1')
{'currency_symbol': 'Eu',
'decimal_point': ',',
'frac_digits': 2,
'grouping': [127],
'int_curr_symbol': 'EUR ',
'int_frac_digits': 2,
'mon_decimal_point': '.',
'mon_grouping': [3, 3, 0],
'mon_thousands_sep': '.',
'n_cs_precedes': 0,
'n_sep_by_space': 1,
'n_sign_posn': 1,
'negative_sign': '-',
'p_cs_precedes': 0,
'p_sep_by_space': 1,
'p_sign_posn': 1,
'positive_sign': '',
'thousands_sep': ' '}
Poland:
$ LANG=pl_PL python locale_env_example.py
Environment settings:
LC_ALL =
LC_CTYPE =
LANG = pl_PL
LANGUAGE =
Default locale: (None, 'mac-roman')
From environment: ('pl_PL', 'ISO8859-2')
{'currency_symbol': 'z\xc5\x82',
'decimal_point': ',',
'frac_digits': 2,
'grouping': [3, 3, 0],
'int_curr_symbol': 'PLN ',
'int_frac_digits': 2,
'mon_decimal_point': ',',
'mon_grouping': [3, 3, 0],
'mon_thousands_sep': ' ',
'n_cs_precedes': 1,
'n_sep_by_space': 2,
'n_sign_posn': 4,
'negative_sign': '-',
'p_cs_precedes': 1,
'p_sep_by_space': 2,
'p_sign_posn': 4,
'positive_sign': '',
'thousands_sep': ' '}
As you can see, the currency symbol, the character that separates whole numbers from decimal fractions, and several other settings change with the locale. Now let's print the same amount formatted for each of these locales (US dollars, euros, and Polish złoty):
import locale

sample_locales = [ ('USA', 'en_US'),
                   ('France', 'fr_FR'),
                   ('Spain', 'es_ES'),
                   ('Portugal', 'pt_PT'),
                   ('Poland', 'pl_PL'),
                   ]

for name, loc in sample_locales:
    locale.setlocale(locale.LC_ALL, loc)
    print '%20s: %s' % (name, locale.currency(1234.56))
The output is this small table:
$ python locale_currency_example.py
USA: $1234.56
France: 1234,56 Eu
Spain: Eu 1234,56
Portugal: 1234.56 Eu
Poland: zł 1234,56
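The script above assumes every one of those locales is installed. On a system where one is missing, setlocale() raises locale.Error, and currency() raises ValueError for a locale such as 'C' that defines no currency format. Here is a sketch of a more defensive version; format_currency is a hypothetical helper name, and returning None on failure is just one possible design choice.

```python
import locale


def format_currency(amount, loc):
    """Return `amount` as currency for `loc`, or None if that is not possible.

    Hypothetical helper for illustration; not part of the locale module.
    """
    try:
        locale.setlocale(locale.LC_ALL, loc)
    except locale.Error:
        # The requested locale is not available on this system.
        return None
    try:
        # grouping=True also asks for the locale's thousands separators.
        return locale.currency(amount, grouping=True)
    except ValueError:
        # Raised for locales, such as 'C', that define no currency format.
        return None


if __name__ == '__main__':
    for loc in ('en_US', 'fr_FR', 'pl_PL', 'C'):
        print('%8s: %s' % (loc, format_currency(1234.56, loc)))
    locale.setlocale(locale.LC_ALL, 'C')  # leave a known state behind
```

Like the examples in this post, the helper switches the process-wide locale on every call, which is fine for a demonstration but not something a library should do.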
Besides generating output in different formats, the locale module helps with parsing input. Different cultures use different conventions for formatting numbers (as illustrated above). The locale module provides atoi() and atof() functions for converting the strings to integer and floating point values respectively.
import locale

sample_data = [ ('USA', 'en_US', '1234.56'),
                ('France', 'fr_FR', '1234,56'),
                ('Spain', 'es_ES', '1234,56'),
                ('Portugal', 'pt_PT', '1234.56'),
                ('Poland', 'pl_PL', '1234,56'),
                ]

for name, loc, a in sample_data:
    # Parse the string using the conventions of its own locale.
    locale.setlocale(locale.LC_ALL, loc)
    f = locale.atof(a)
    # Switch back to a single locale before printing the result.
    locale.setlocale(locale.LC_ALL, 'en_US')
    print '%20s: %7s => %f' % (name, a, f)
$ python locale_atof_example.py
USA: 1234.56 => 1234.560000
France: 1234,56 => 1234.560000
Spain: 1234,56 => 1234.560000
Portugal: 1234.56 => 1234.560000
Poland: 1234,56 => 1234.560000
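Roughly speaking, atof() removes the locale's thousands separator and swaps its decimal point for '.' before handing the string to float(). The sketch below imitates that idea with the separators passed in explicitly, so it runs without any particular locale installed. parse_localized is a hypothetical name of my own, and the real function reads the separators from localeconv() instead.

```python
def parse_localized(text, decimal_point, thousands_sep=''):
    """Convert a number written with the given separators to a float.

    Hypothetical helper for illustration; locale.atof() takes its
    separators from the active locale rather than from arguments.
    """
    if thousands_sep:
        # Drop the grouping characters, e.g. '1.234,56' -> '1234,56'.
        text = text.replace(thousands_sep, '')
    if decimal_point:
        # Normalize the decimal separator to the '.' that float() expects.
        text = text.replace(decimal_point, '.')
    return float(text)


if __name__ == '__main__':
    print(parse_localized('1234,56', ','))
    print(parse_localized('1.234,56', ',', '.'))
    print(parse_localized('1,234.56', '.', ','))
```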
Another important aspect of localization is date and time formatting:
import locale
import time
sample_locales = [ ('USA', 'en_US'),
('France', 'fr_FR'),
('Spain', 'es_ES'),
('Portugal', 'pt_PT'),
('Poland', 'pl_PL'),
]
for name, loc in sample_locales:
    locale.setlocale(locale.LC_ALL, loc)
    print '%20s: %s' % (name, time.strftime(locale.nl_langinfo(locale.D_T_FMT)))
$ python locale_date_example.py
USA: Sun May 20 10:19:54 2007
France: Dim 20 mai 10:19:54 2007
Spain: dom 20 may 10:19:54 2007
Portugal: Dom 20 Mai 10:19:54 2007
Poland: ndz 20 maj 10:19:54 2007
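nl_langinfo() is not available on every platform. An alternative, sketched here, is the '%c' directive of time.strftime(), which the time module documents as the locale's appropriate date and time representation. format_timestamp is a hypothetical helper name, and only the LC_TIME category is changed.

```python
import locale
import time


def format_timestamp(seconds, loc):
    """Format a UTC timestamp using the date/time layout of `loc`.

    Hypothetical helper for illustration; not part of the locale module.
    """
    # Only the time-formatting category needs to change for strftime().
    locale.setlocale(locale.LC_TIME, loc)
    # '%c' asks strftime() for the locale's preferred date and time format.
    return time.strftime('%c', time.gmtime(seconds))


if __name__ == '__main__':
    # Timestamp 0 is midnight on 1 January 1970, UTC.
    print(format_timestamp(0, 'C'))
```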
This week I have only covered some of the high-level functions in the locale module. There are others which are lower level (format_string) or which relate to managing the locale for your application (resetlocale). As usual, you will want to refer to the Python library documentation for more details.
I am still learning about internationalization and localization myself, so if you have feedback on this summary (or if you spot a mistake), please post a comment on the blog to let me know.
Updated 9/5/2007 with minor formatting changes.
5 comments:
First link is broken.
Module: locale
Purpose: POSIX cultural localization API
---
Nice to hear about Poland (I am from Poland).
There is one mistake: 'zl' in the currency should come AFTER the amount, e.g.
1234567 zl
---
Very good post :)
Thanks for the heads-up Soltys, I've updated the post to fix the link. I'm still working out how to use MarsEdit most efficiently, so I'm going to blame the mistake on the editor for now. :-)
As far as the Polish currency goes, I copied the example directly from the output of the locale module. I'm running under Mac OS X, so I would be curious to know if you see different output if you run the same code on another OS. I can try Linux later today. I don't have access to Windows, though, so if you do I would appreciate it if you could add a comment with the output.
Thanks!
Doug
AFAIK "Eu" isn't correct. That should be '€' or 'euro' depending on whether your charset supports the euro-symbol or not (and IMNSHO every modern operating system should use unicode by now).
Nice job with this series. I am always surprised by the functionality that I continue to discover in the stdlib. The PyMOTW is accelerating that discovery process!
I've tried the currency example code under Fedora Core 3 with Python 2.5, and see the same results. Can anyone post output from Windows?