19 packages found. Page 1 of 1.

Name | Version | Votes | Popularity | Description | Maintainer
frog | 0.18.3-1 | 1 | 0.00 | Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It includes a tokenizer, part-of-speech tagger, lemmatizer, morphological analyser, named-entity recognizer, shallow parser and dependency parser. | proycon
frog-git | 1-4 | 1 | 0.00 | Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It includes a tokenizer, part-of-speech tagger, lemmatizer, morphological analyser, named-entity recognizer, shallow parser and dependency parser. | proycon
gposttl-git | r34.4d19dda-1 | 0 | 0.00 | Brill's part-of-speech tagger, with built-in tokenizer and lemmatizer | m3thodic
perl-string-tokenizer | 0.05-1 | 0 | 0.00 | A simple string tokenizer. | jnbek
python-html5lib-git | 0.999999.r4.g935783d-1 | 0 | 0.00 | A Python HTML parser/tokenizer based on the WHATWG HTML5 spec | robertfoster
python-sacremoses | 0.0.19-1 | 0 | 0.00 | Python port of the Moses tokenizer, truecaser and normalizer | STommydx
python-ucto-git | 10-1 | 1 | 0.00 | Python binding for Ucto, an advanced tokenizer (for NLP) | proycon
python2-html5lib-git | 0.999999.r4.g935783d-1 | 0 | 0.00 | A Python 2 HTML parser/tokenizer based on the WHATWG HTML5 spec | robertfoster
python2-ucto-git | 10-1 | 1 | 0.00 | Python binding for Ucto, an advanced tokenizer (for NLP) | proycon
ruby-buftok | 0.2.0-3 | 0 | 0.00 | BufferedTokenizer extracts token-delimited entities from a sequence of arbitrary inputs. | supermario
sentencepiece-git | r492.ffa2c82-1 | 0 | 0.00 | Unsupervised text tokenizer for neural-network-based text generation | panosk
tokenizer-git | r75.0602585-1 | 0 | 0.00 | Converts source code into numerical tokens | aksr
ucto | 0.18-1 | 1 | 0.00 | An advanced rule-based (regular-expression) and Unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. | proycon
ucto-git | 1-3 | 2 | 0.00 | An advanced rule-based (regular-expression) and Unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. | proycon
uctodata | 0.8-1 | 0 | 0.00 | An advanced rule-based (regular-expression) and Unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. This package contains the necessary data. | proycon
uctodata-git | 1-1 | 0 | 0.00 | An advanced rule-based (regular-expression) and Unicode-aware tokenizer for various languages. Tokenization is an essential first step in any NLP pipeline. These are the data files. | proycon
perl-perl-tokenizer | 0.10-2 | 1 | 0.00 | Perl::Tokenizer, a tiny Perl code tokenizer | trizen
python-mail-parser | 3.9.3-1 | 1 | 0.00 | Tokenizer for raw emails | flacks
mailparser | 3.9.3-3 | 1 | 0.97 | Tokenizer for raw emails | flacks
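As an aside, the idea behind tokenizer-git above (turning source code into a stream of tokens) can be illustrated with Python's standard-library tokenize module. This is only a sketch of the general technique, not the implementation that the tokenizer-git package uses; the helper name token_stream is hypothetical.

```python
import io
import tokenize

def token_stream(source: str):
    """Yield (token_name, token_text) pairs for a piece of Python source."""
    # generate_tokens() reads the source line by line via the readline callable
    # and emits TokenInfo tuples; tok_name maps the numeric type to its name.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        yield tokenize.tok_name[tok.type], tok.string

# Example: tokenize a one-line assignment.
tokens = list(token_stream("x = 1 + 2\n"))
```

Here `tokens` contains pairs such as `("NAME", "x")`, `("OP", "=")` and `("NUMBER", "1")`, which a downstream tool could map to purely numerical token IDs.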
