====== Tagger and lemmatizer HOWTO ======

===== Installation =====

Clone the repository and enter the source directory:

  > git clone https://github.com/ufal/morphodita
  > cd morphodita/src/

In ''Makefile.builtem'', replace the generic SSE tuning flags with ''-march=native'' so that the build is optimized for the local CPU:

  > vim Makefile.builtem
  - C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE
  + C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE

Then build:

  > make

===== Models =====

Download and unzip the models:

  * Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1
  * English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0

The download link is at the bottom of each page. Beware: the models may have a non-free license.

===== Run tagger =====

  echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \
    | ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy

===== Run lemmatizer =====

The tagger is run with vertical output (one token per line) and only the second column, the lemma, is kept:

  echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
    | ./run_tagger --input=untokenized --output=vertical \
        czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \
    | cut -f 2 | tr "\n" " "

===== Problems =====

Loading a big model takes several seconds, but the tagging itself is very fast. Newer versions include a REST server, so the tagger can be started once and then handle multiple requests.
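
Because the model load dominates the run time, it may be worth sending many sentences through a single ''run_tagger'' call. The ''cut -f 2 | tr "\n" " "'' step from the lemmatizer pipeline then glues all sentences into one line. The sketch below is one possible replacement that prints one line of lemmas per sentence. It assumes that the vertical output is tab-separated (form, lemma, tag) and that an empty line ends each sentence. The function name ''lemmas_from_vertical'' is made up for this example, and the sample input is hand-written with a placeholder tag, not real tagger output.

```shell
#!/bin/sh
# Hypothetical helper: read vertical output on stdin and print the
# lemmas (column 2) of each sentence on a line of its own.
# Assumption: columns are tab-separated and an empty line ends a sentence.
lemmas_from_vertical() {
  awk -F '\t' '
    # Empty line: sentence boundary, flush the lemmas collected so far.
    NF == 0 { if (line != "") print line; line = ""; next }
    # Token line: append the lemma to the current sentence.
    { line = (line == "" ? $2 : line " " $2) }
    # Flush the last sentence if the input lacks a trailing empty line.
    END { if (line != "") print line }
  '
}

# Hand-written sample in the assumed format (TAG is a placeholder).
printf 'psi\tpes\tTAG\nštěkají\tštěkat\tTAG\n\nkočky\tkočka\tTAG\nspí\tspát\tTAG\n\n' \
  | lemmas_from_vertical
```

In the lemmatizer pipeline, this function would take the place of the trailing ''cut -f 2 | tr "\n" " "'', provided the actual output matches the assumed format.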