Tagger and lemmatizer HOWTO

Installation

> git clone https://github.com/ufal/morphodita
> cd morphodita/src/
> vim Makefile.builtem
-  C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE
+  C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE
> make

Models

Download and unzip a model:

Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1

English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0

(The download link is at the bottom of each page.)

(Beware: the models may have a non-free license, so check it before use.)

Run tagger

echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \
| ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy

Run lemmatizer

echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
| ./run_tagger --input=untokenized --output=vertical \
czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \
| cut -f 2 | tr "\n" " "
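The post-processing at the end of the pipeline above can be tried without a model. The sketch below assumes that the vertical output is one token per line with tab-separated form, lemma and tag columns. The function sample_vertical is a hypothetical stand-in for run_tagger, and its three lines (including the tag letters) are hand-written for illustration, not real tagger output.

```shell
# Hypothetical stand-in for `run_tagger --output=vertical`:
# one token per line as form<TAB>lemma<TAB>tag.
# The values are hand-written for illustration, not real tagger output.
sample_vertical() {
  printf 'Červený\tčervený\tA\n'
  printf 'střízlíček\tstřízlíček\tN\n'
  printf 'ďobali\tďobat\tV\n'
}

# Keep only the second (lemma) column and turn newlines into spaces.
lemmas=$(sample_vertical | cut -f 2 | tr "\n" " ")
echo "$lemmas"
```

Note that tr also converts the final newline, so the resulting line ends with a trailing space.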

Problems

Loading big models takes several seconds, but the tagging itself is very fast. The new version contains a REST server, so it can be started once and then handle multiple requests.
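Without the REST server, one way to limit the loading cost is to send all input through a single tagger process instead of starting one process per sentence. The following is only a sketch under assumptions: TAGGER_CMD and tag_all are hypothetical names introduced here, and TAGGER_CMD defaults to cat so the example runs even without MorphoDiTa installed.

```shell
# Hypothetical variable: set it to the real command line, for example
#   TAGGER_CMD="./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy"
# It defaults to `cat` only so that this sketch runs without MorphoDiTa.
TAGGER_CMD=${TAGGER_CMD:-cat}

# Hypothetical helper: takes one sentence per argument and pipes all of them
# through a single process, so the model would be loaded once, not per sentence.
tag_all() {
  printf '%s\n' "$@" | $TAGGER_CMD
}

result=$(tag_all "První věta." "Druhá věta.")
echo "$result"
```

With the cat stand-in this just echoes the two sentences back; with a real tagger command the same call would pay the model loading time once for the whole batch.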