tagger [2019-06-21 13:32:06] (current) – created - external edit 127.0.0.1
====== Tagger and lemmatizer HOWTO ======

===== Installation =====

Clone MorphoDiTa, replace the fixed SSE flags in Makefile.builtem with -march=native (the - and + lines below show the change), and build:

<code>
> git clone https://github.com/ufal/morphodita
> cd morphodita/src/
> vim Makefile.builtem
-  C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE
+  C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE
> make
</code>

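The flag change can also be applied without opening an editor. The following is a minimal sketch, assuming the C_FLAGS line in Makefile.builtem still contains exactly the flag sequence shown in the installation steps; the function name use_native_flags is hypothetical and not part of MorphoDiTa.

```shell
# Hypothetical helper: swap the fixed SSE flags for -march=native.
# Assumes the makefile contains the exact flag sequence from the HOWTO.
use_native_flags() {
  # $1: path to the makefile to patch (e.g. Makefile.builtem)
  # -i.bak edits in place and keeps the original as <file>.bak
  sed -i.bak 's/-mtune=generic -msse -msse2 -mfpmath=sse/-march=native/' "$1"
}
```

Run it from the src/ directory as use_native_flags Makefile.builtem, then run make. A binary built with -march=native is tuned for the build machine and may not run on older CPUs.
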
===== Models =====

Download and unzip the model for your language (the download link is at the bottom of each page):

Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1

English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0

Beware: the models may have a non-free license.

===== Run tagger =====

Pipe the text to run_tagger and pass the model file as its argument:

<code>echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \
| ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy</code>

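===== Run lemmatizer =====

Lemmatization uses the same run_tagger binary with vertical output (one token per line). The cut command keeps the second column, which holds the lemma, and tr joins the lemmas onto a single line:

<code>echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
| ./run_tagger --input=untokenized --output=vertical \
czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \
| cut -f 2 | tr "\n" " "
</code>
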
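With more than one input sentence, the tr step above merges everything into one line. A small sketch of an alternative post-processing step follows, assuming the vertical output has tab-separated columns with the lemma in the second one and an empty line after each sentence; the function name lemmas_per_sentence is hypothetical.

```shell
# Hypothetical helper: read vertical output on stdin and
# print the lemmas of each sentence on a separate line.
lemmas_per_sentence() {
  awk -F '\t' '
    # An empty line closes the current sentence.
    NF == 0 { if (line != "") print line; line = ""; next }
    # Otherwise append the second column (the lemma) to the sentence.
    { line = (line == "" ? $2 : line " " $2) }
    # Flush the last sentence if the input did not end with an empty line.
    END { if (line != "") print line }
  '
}
```

It would replace the last line of the pipeline: ... 2>/dev/null | lemmas_per_sentence
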
===== Problems =====

Loading big models takes several seconds, but the tagging itself is very fast. Newer versions include a REST server, so the model can be loaded once and then serve multiple requests.
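
Once such a server is running, each request avoids the model loading time. The following is only a sketch of a client call: the base URL (localhost, port 8000), the tag endpoint, and the data and output parameters are assumptions modelled on the public MorphoDiTa web service, so check the documentation of the server you actually start. The function name tag_via_rest is hypothetical.

```shell
# Assumed location of a locally started server; adjust as needed.
MORPHODITA_URL=${MORPHODITA_URL:-http://localhost:8000}
# The HTTP client can be overridden, e.g. for a dry run.
CURL=${CURL:-curl}

# Hypothetical helper: send one text to the (assumed) tag endpoint.
tag_via_rest() {
  # $1: text to tag; curl URL-encodes it into the query string
  "$CURL" --silent --get \
    --data-urlencode "data=$1" \
    --data-urlencode "output=vertical" \
    "$MORPHODITA_URL/tag"
}
```

Usage: tag_via_rest "Červený střízlíček a střapatá žluva."
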
Except where otherwise noted, content on this wiki is licensed under the following license: Public Domain