Languages are given a two-letter code in ISO 639-1 (the three-letter codes are ISO 639-2). The Library of Congress is nice enough to host a combined list of both for us here.
I wrote a simple bit of code to generate a 'Language' table for MySQL, and posted it here:
http://www.fiestacabin.com/files/iso-lang-codes.sql.txt
Feel free to use this snippet; the table I created looks like:
CREATE TABLE IF NOT EXISTS Language (
    id int unsigned not null auto_increment,
    code char(2) not null,
    description varchar(64),
    primary key (id),
    unique index iLanguage_code (code)
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
Just for fun, I am including my Java snippet which generated the SQL; critique at will!
package sandbox;

import java.io.File;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.text.StrTokenizer;

public class ISOLangCodes {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        List<String> lines = (List<String>) FileUtils.readLines(new File("c:/temp/iso-lang-codes.txt"), "UTF-8");
        for (String line : lines) {
            // Fields are pipe-delimited; tokenize once and reuse the array.
            String[] tokens = StrTokenizer.getCSVInstance(line).setDelimiterChar('|').getTokenArray();
            if (tokens.length < 4) {
                continue; // skip blank or malformed lines
            }
            String langCode = tokens[2]; // two-letter code
            String langDesc = tokens[3]; // language name
            // Skip languages that have no two-letter code.
            if (StringUtils.isNotBlank(langCode)) {
                System.out.println(
                    "INSERT INTO Language (code, description) VALUES ('" + StringEscapeUtils.escapeSql(langCode) +
                    "', '" + StringEscapeUtils.escapeSql(langDesc) + "');");
            }
        }
    }
}
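If you would rather not pull in the Commons libraries, here is a rough JDK-only sketch of the same per-line logic. It assumes each line is pipe-delimited with the two-letter code in the third field and the name in the fourth, as in the snippet above. The class name LangInsertSketch, the helper names, and the sample lines in main are made up for illustration. The escaping only doubles single quotes, so treat it as a convenience for generating a one-off script and not as general protection against SQL injection.

```java
public class LangInsertSketch {

    // Doubles single quotes so the value can sit inside a SQL string literal.
    static String escapeSql(String s) {
        return s.replace("'", "''");
    }

    // Returns an INSERT statement for one line, or null if the line is
    // malformed or has no two-letter code.
    static String toInsert(String line) {
        // The -1 limit keeps empty fields, so positions stay stable.
        String[] fields = line.split("\\|", -1);
        if (fields.length < 4) {
            return null;
        }
        String code = fields[2].trim();
        String desc = fields[3].trim();
        if (code.isEmpty()) {
            return null;
        }
        return "INSERT INTO Language (code, description) VALUES ('"
                + escapeSql(code) + "', '" + escapeSql(desc) + "');";
    }

    public static void main(String[] args) {
        // Illustrative sample lines in the assumed layout, not read from the real file.
        String[] samples = {
            "fre|fra|fr|French|français",
            "ace|||Achinese|aceh"
        };
        for (String line : samples) {
            String sql = toInsert(line);
            if (sql != null) {
                System.out.println(sql);
            }
        }
    }
}
```

Splitting with a limit of -1 matters here: without it, String.split drops trailing empty fields, and a line with blank columns at the end could come back shorter than expected.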