Using computational criteria to extract large Swadesh lists for lexicostatistics

Citable link (URI): http://hdl.handle.net/10900/68640
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-686406
http://dx.doi.org/10.15496/publikation-10058
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-686408
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-686404
Document type: Conference paper
Date of publication: 2016-03-02
Language: English
Faculty: 5 Faculty of Humanities
Department: General and Comparative Linguistics
DDC classification: 400 - Language, Linguistics
Keywords: Linguistic statistics, Phylogenetics
Free keywords:
Lexicostatistics
Swadesh lists
phylogenetic linguistics
License: http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en

Abstract:

We propose a new method for empirically determining lists of basic concepts for the purpose of compiling extensive lexicostatistical databases. The idea is to approximate a notion of “swadeshness” formally and reproducibly, without relying on expert knowledge or introducing bias, and to be able to rank any number of concepts given enough data. Unlike previous approaches, our procedure indirectly measures both the stability of concepts against lexical replacement and their proneness to phenomena such as onomatopoeia and extensive borrowing. The method provides a fully automated way to generate customized Swadesh lists of any desired length, possibly adapted to a given geographical region. We apply the method to a large lexical database of Northern Eurasia, deriving a swadeshness ranking for more than 5,000 concepts expressed by German lemmas. To validate the method, we evaluate this ranking against existing shorter lists of basic concepts, and we give an English version of the 300 top-ranked concepts.
