Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the CMCL project. If you use the part-of-speech tagging models or the tokenization script and its lexicon, please credit the CMCL project.

Natural Language Processing API

New: You can now get unified access to the latest NLP tools online through a web interface or a machine-actionable REST API:

Coptic NLP Service

The NLP service currently covers segmentation, normalization, part-of-speech tagging, lemmatization, and language-of-origin tagging. For individual command-line tools, see below.

Part-of-Speech Tagging

Coptic Universal Dependency Treebank

A treebank is a collection of texts in which sentences have been exhaustively annotated with syntactic analyses. Our Coptic Treebank project uses the Universal Dependencies standards, which apply the same annotation scheme to multiple languages.
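Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, `#` comment lines, and a blank line between sentences. As a rough illustration of what "exhaustively annotated" means in practice, the following minimal sketch reads such data with plain Python. The names `Token`, `parse_conllu` and `find_root` are hypothetical helpers written for this example, and the sample sentence is an English toy sentence, not data from the Coptic Treebank.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """A subset of the ten CoNLL-U columns (hypothetical helper type)."""
    id: str       # column 1: token index within the sentence
    form: str     # column 2: word form
    lemma: str    # column 3: lemma
    upos: str     # column 4: universal part-of-speech tag
    head: str     # column 7: index of the syntactic head ("0" for the root)
    deprel: str   # column 8: dependency relation to the head


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences, each a list of Token tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(Token(cols[0], cols[1], cols[2], cols[3], cols[6], cols[7]))
    if current:
        # Handle input that does not end with a blank line.
        sentences.append(current)
    return sentences


def find_root(sentence: List[Token]) -> Token:
    """Return the token whose head is 0, i.e. the root of the dependency tree."""
    for tok in sentence:
        if tok.head == "0":
            return tok
    raise ValueError("sentence has no root token")


# Toy sample (English, for illustration only), built with explicit tabs.
_ROWS = [
    ["1", "The", "the", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "dog", "dog", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "barks", "bark", "VERB", "_", "_", "0", "root", "_", "_"],
]
SAMPLE = "# text = The dog barks\n" + "\n".join("\t".join(r) for r in _ROWS) + "\n\n"

if __name__ == "__main__":
    for sent in parse_conllu(SAMPLE):
        root = find_root(sent)
        print(root.form, root.upos, root.deprel)
```

The sketch keeps only six of the ten columns and ignores multiword-token ranges and empty nodes, which real treebank files may contain; a dedicated CoNLL-U library is likely a better choice for anything beyond quick inspection.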

Additional Annotation Tools