Tatoeba Parser for Kotlin

A parser for Tatoeba example sentence files written in Kotlin.

Build

To build the project locally, run the following command in a terminal from the project root:

./gradlew build

Installation

Tatoeba Parser for Kotlin is available from my self-hosted Gitea instance.

First, add the repository to your build.gradle.kts file:

repositories {
    maven {
        url = uri("https://gitea.marvinelsen.com/api/packages/marvinelsen/maven")
    }
}

Afterwards, add the package dependency to your build.gradle.kts file:

dependencies {
    implementation("com.marvinelsen:tatoeba-parser:1.0.0")
}

Usage

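The following example reads a gzip-compressed sentence file from the classpath and prints the simplified property of each parsed sentence:

import java.util.zip.GZIPInputStream
// TatoebaParser comes from the tatoeba-parser dependency; import it from the library's package.

fun main() {
    // Open the gzip-compressed sentence file bundled as a classpath resource
    val tatoebaInputStream =
        GZIPInputStream(object {}.javaClass.getResourceAsStream("/cmn_sentences.tsv.gz")!!)

    // use {} closes the stream once parsing and printing are done
    tatoebaInputStream.use {
        val tatoebaParser = TatoebaParser.instance
        val tatoebaSentences = tatoebaParser.parse(tatoebaInputStream)

        tatoebaSentences.forEach { sentence ->
            println(sentence.simplified)
        }
    }
}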
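
To make the input format more concrete, here is a minimal sketch that reads the same kind of gzip-compressed file by hand. It assumes each row holds three tab-separated columns (sentence id, language code, text), which is how Tatoeba's per-language sentence exports are commonly laid out. RawSentence, readRawSentences, and gzip are hypothetical names for illustration only; they are not part of this library's API, and the library's own parser may do considerably more.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Hypothetical row type for illustration; not the library's sentence class.
data class RawSentence(val id: Long, val language: String, val text: String)

// Reads gzip-compressed rows, assuming three tab-separated columns: id, language code, text.
fun readRawSentences(gzipped: InputStream): List<RawSentence> =
    GZIPInputStream(gzipped).bufferedReader(Charsets.UTF_8).useLines { lines ->
        lines
            .filter { it.isNotBlank() }
            .map { line ->
                // limit = 3 keeps any tab characters inside the sentence text intact
                val (id, language, text) = line.split('\t', limit = 3)
                RawSentence(id.toLong(), language, text)
            }
            .toList() // materialize before useLines closes the reader
    }

// Builds a small gzip payload in memory so the sketch runs without a real export file.
fun gzip(content: String): ByteArray {
    val buffer = ByteArrayOutputStream()
    GZIPOutputStream(buffer).use { it.write(content.toByteArray(Charsets.UTF_8)) }
    return buffer.toByteArray()
}

fun main() {
    // Sample rows for illustration only
    val sample = "1\tcmn\t我們試試看！\n2\tcmn\t我该去睡觉了。\n"
    readRawSentences(ByteArrayInputStream(gzip(sample))).forEach { println(it.text) }
}
```

The sketch throws on rows with fewer than three columns; a real parser would likely want to skip or report such rows instead.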

License

All source code in this repository is licensed under the MIT License, unless otherwise noted.

Different licenses apply to the following third-party code, data, and files in this repository:

Tatoeba Example Sentences

Tatoeba example sentences are licensed under CC BY 2.0 FR.