The Kusunoki Treebank – a parsed corpus of contemporary Japanese

Front Page

The Kusunoki Treebank is a corpus of contemporary Japanese with hand-worked tree analyses covering nearly half a million words. Highlights include:

Further results derived from the analysis, notably dependency graphs, can be viewed through the search interface.

A companion resource

The Kusunoki Treebank is a companion resource to The Kainoki Treebank. It is created by exchanging the (Japanese script) morphological base of The Kainoki Treebank for a different (romanised) morphological base while retaining the syntactic-level annotation. Currently, just under a third of the data of The Kainoki Treebank is used.

Search Interface

The Kusunoki Treebank comes with a powerful user interface that enables searches on virtually any aspect of the annotation. Results of specific searches can be downloaded in the form of annotated data. The source data to which the search interface links is updated continually to reflect improvements in the analysis.

Attribution

Presentations of research results using The Kusunoki Treebank should include a citation taking the general form of the example below (with appropriate modifications depending on the date of access):

Kainoki, Ed (2022) “The Kusunoki Treebank – a parsed corpus of contemporary Japanese” https://jptrees.github.io (accessed 9 January 2022).

Terms of use

This work is licensed under a Creative Commons Attribution 4.0 International License.
