By: Rachel Kaufman, TechNewsDaily Contributor
Published: 02/12/2013 09:13 AM EST on TechNewsDaily
A computer algorithm works almost as well as a trained linguist in reconstructing how dead "protolanguages" would have sounded, says a new study.
"Our [computer] system is doing a basic job right now," says Alex Bouchard-Côté, an assistant professor in the department of statistics at the University of British Columbia and lead author of the paper describing the algorithm. But the program does a good enough job that it may be able to give linguists a head start, the statistician added.
For centuries, scholars have reconstructed languages by hand: looking at the same word in two or more languages and making educated guesses about what that word's "ancestor" may have sounded like. For example, the Spanish word for man ("hombre") and the French word for man ("homme") both descend from the Latin word for man, "homo" (strictly, from its accusative form, "hominem"). This technique of comparing words in descendant languages to reconstruct the parent language is called, appropriately, the comparative method.
The early 19th-century linguist Franz Bopp was the first to compare Greek, Latin and Sanskrit using this method. Jacob Grimm, one of the Brothers Grimm of fairy tale fame, used the comparative method to show how Germanic languages developed from a common ancestor.
The difference between that painstaking manual work and his program, Bouchard-Côté says, "is we do it on a larger scale."
As a proof of concept, Bouchard-Côté fed words from 637 Austronesian languages (spoken in Indonesia, Madagascar, the Philippines, Papua New Guinea, Malaysia and more) into the new algorithm, and the system came up with a list of what the ancestor words of all those languages would have sounded like. In more than 85 percent of cases, the automated reconstruction came within one character of the ancestor word commonly accepted as true by linguists.
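"Within one character" is most naturally read as an edit distance of at most 1: one inserted, deleted or substituted symbol separates the reconstruction from the accepted form. The sketch below assumes exactly that reading and uses the standard Levenshtein distance. The function names and the four string pairs are invented for illustration. They are not data from the study, and the study's own scoring may differ in detail (for example, by comparing phoneme sequences rather than letters).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def share_within(pairs, max_dist=1):
    """Fraction of (reconstructed, accepted) pairs whose edit distance
    is at most max_dist. Hypothetical helper for illustration."""
    if not pairs:
        return 0.0
    hits = sum(1 for guess, gold in pairs if levenshtein(guess, gold) <= max_dist)
    return hits / len(pairs)


# Invented toy pairs (reconstruction, accepted form); not from the study.
pairs = [("abcd", "abcd"),   # exact match, distance 0
         ("abcd", "abce"),   # one substitution, distance 1
         ("abcd", "abd"),    # one deletion, distance 1
         ("abcd", "xbcy")]   # two substitutions, distance 2
```

Here `share_within(pairs)` returns 0.75, because three of the four toy reconstructions fall within one character of their targets. The article's "more than 85 percent" figure is this kind of share, computed over the real reconstructions.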
The algorithm won't replace trained human linguists, but it could speed up language analysis.
Using a computer to do large-scale reconstruction offers another advantage, Bouchard-Côté says: with big data sets, "you can really start finding regularities … You might find that certain sounds are more likely to change than others."
So Bouchard-Côté's team tested the "functional load hypothesis," which holds that sounds doing more work to keep words distinct from one another are less likely to change over time. A formal test of this hypothesis in 1967 looked at four languages; Bouchard-Côté's algorithm looked at 637.
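One simple way to gauge how much "work" a contrast between two sounds does is to count minimal pairs: words that differ only in that one sound, such as "pat" and "bat." The sketch below uses that proxy. It is a simplification, not the measure used in the study, and it assumes one letter stands for one sound. The word list and function names are hypothetical.

```python
from itertools import combinations


def is_minimal_pair(w1: str, w2: str, x: str, y: str) -> bool:
    """True if w1 and w2 have the same length and differ at exactly one
    position, where one word has sound x and the other has sound y."""
    if len(w1) != len(w2):
        return False
    diffs = [(c1, c2) for c1, c2 in zip(w1, w2) if c1 != c2]
    return len(diffs) == 1 and set(diffs[0]) == {x, y}


def minimal_pair_count(lexicon, x: str, y: str) -> int:
    """Number of word pairs in the lexicon kept apart only by the x/y
    contrast: a rough, hypothetical proxy for that contrast's functional load."""
    words = sorted(set(lexicon))
    return sum(1 for w1, w2 in combinations(words, 2)
               if is_minimal_pair(w1, w2, x, y))


# Toy English-like word list; letters stand in for sounds.
lexicon = ["pat", "bat", "pit", "bit", "pin", "tin", "sip"]
```

In this toy list, `minimal_pair_count(lexicon, "p", "b")` is 2 (pat/bat, pit/bit), while `minimal_pair_count(lexicon, "p", "t")` is 1 (pin/tin). Under the hypothesis, the p/b contrast would therefore be the less likely of the two to be lost through a sound merger.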
"The revealed pattern would not be apparent had we not been able to reconstruct large numbers of protolanguages," Bouchard-Côté and his coauthors write in the new study.
Beyond helping linguists understand how people spoke in the past, studying ancient languages may also help answer historical questions.
For example, Bouchard-Côté says, "Say people are interested in finding out when Europe was settled. If you can figure out if the language of the settling population had a word for wheel, then you can get some idea of the order in which things occurred, because you would have some records that show you when the wheel was invented."