Monday, November 16, 2009

ISO Language Codes

Every so often, I need a full list of ISO language codes for use in an application. I thought I'd write this post as a reference for my future self, and for anyone else who might find this kind of thing useful.

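Languages are given a two-character code in ISO 639-1 and a three-character code in ISO 639-2. The Library of Congress, the registration authority for ISO 639-2, is nice enough to host a combined list of both for us here.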
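If the two-letter codes are only needed inside a Java application, the JDK carries its own list, which can be handy for a quick look. The sketch below is only an illustration: the class and method names are made up for this post, and the JDK's list and display names come from its bundled locale data, so they may not match the Library of Congress file exactly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class JdkLanguageCodes {
    // Two-letter language codes known to the running JDK.
    static List<String> twoLetterCodes() {
        return Arrays.asList(Locale.getISOLanguages());
    }

    // English display name for a code, as reported by the JDK's locale data.
    static String englishName(String code) {
        return Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Print one "code<TAB>name" pair per line.
        for (String code : twoLetterCodes()) {
            System.out.println(code + "\t" + englishName(code));
        }
    }
}
```

For anything that has to agree with the official registry, the downloaded list is still the safer source.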

I wrote a simple bit of code to generate a 'Language' table for MySQL, and posted it here:
http://www.fiestacabin.com/files/iso-lang-codes.sql.txt

Feel free to use this snippet; the table I created looks like:

CREATE TABLE IF NOT EXISTS Language (
  id int unsigned not null auto_increment,
  code char(2) not null,
  description varchar(64),
  primary key (id),
  unique index iLanguage_code (code)
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;


Just for fun, I am including the Java snippet that generated the SQL; critique at will!

package sandbox;

import java.io.File;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.text.StrTokenizer;

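public class ISOLangCodes {
    public static void main(String[] args) throws Exception {
        // Expected line format (as in the Library of Congress file):
        // alpha-3 (bibliographic)|alpha-3 (terminologic)|alpha-2|English name|French name
        List<String> lines = (List<String>) FileUtils.readLines(new File("c:/temp/iso-lang-codes.txt"), "UTF-8");

        for (String line : lines) {
            // Tokenize once; the CSV tokenizer keeps empty fields, so the indexes stay stable.
            String[] tokens = StrTokenizer.getCSVInstance(line).setDelimiterChar('|').getTokenArray();
            String langCode = tokens[2];
            String langDesc = tokens[3];

            // Skip languages that have no two-letter code.
            if (StringUtils.isNotBlank(langCode)) {
                // The table name must match the CREATE TABLE statement (Language, not Languages).
                System.out.println(
                    "INSERT INTO Language (code, description) VALUES ('" + langCode +
                    "', '" + StringEscapeUtils.escapeSql(langDesc) + "');");
            }
        }
    }
}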
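For anyone who would rather not pull in the Commons libraries, here is a minimal sketch of the same conversion using only the JDK. It assumes the pipe-delimited layout described above (two-letter code in the third field, English name in the fourth). The class name, the helper names, and the second sample line are made up for illustration, and the quote-doubling helper is a simple stand-in, not a general SQL escaping routine.

```java
public class IsoLangSql {
    // Doubles single quotes so a name containing an apostrophe
    // stays inside its SQL string literal.
    static String escapeSql(String s) {
        return s.replace("'", "''");
    }

    // Turns one pipe-delimited line into an INSERT statement, or returns
    // null when the line has no two-letter code.
    static String toInsert(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\\|", -1);
        if (fields.length < 4 || fields[2].trim().isEmpty()) {
            return null;
        }
        return "INSERT INTO Language (code, description) VALUES ('"
                + fields[2].trim() + "', '" + escapeSql(fields[3].trim()) + "');";
    }

    public static void main(String[] args) {
        String[] sample = {
            "ger|deu|de|German|allemand", // has a two-letter code
            "zzz||zz|O'Example|exemple",  // hypothetical entry, only here to show escaping
            "ace|||Achinese|aceh"         // no two-letter code, so it is skipped
        };
        for (String line : sample) {
            String sql = toInsert(line);
            if (sql != null) {
                System.out.println(sql);
            }
        }
    }
}
```

For the first sample line this prints INSERT INTO Language (code, description) VALUES ('de', 'German'); and the third line produces nothing. If the statements were being run directly against a database instead of written to a file, a PreparedStatement would be a better fit than building strings by hand.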
