A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The Tibetan part-of-speech tagset is used in Tibetan corpora annotated with the Rule-based Part-of-speech Tagger for Classical Tibetan, developed by the research project ‘Tibetan in Digital Communication’ hosted at SOAS, University of London.
Example of a tag in the CQL concordance search box: [tag="n.prop"] finds all proper nouns, e.g. བོད་, རབ་འབྱོར་ (note: make sure that you use straight double quotation marks).
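Queries pasted from a word processor often carry typographic quotation marks, which is the usual way the straight-quote requirement gets broken. The following is a minimal Python sketch, not part of the corpus tooling: `straighten_quotes` and `tag_query` are hypothetical helper names that only build and clean the query text before it is pasted into the search box.

```python
# Hypothetical helpers for preparing CQL tag queries as plain strings.

# Map typographic double quotes to the straight quote that CQL expects.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})


def straighten_quotes(query: str) -> str:
    """Return the query with typographic double quotes replaced by straight ones."""
    return query.translate(CURLY_TO_STRAIGHT)


def tag_query(tag: str) -> str:
    """Build a CQL expression that selects tokens by their POS tag."""
    return f'[tag="{tag}"]'


print(tag_query("n.prop"))                           # [tag="n.prop"]
print(straighten_quotes("[tag=\u201cn.prop\u201d]"))  # [tag="n.prop"]
```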
Basic part-of-speech tagset
| POS categories | POS tag |
| Adjectives | adj |
| Adverbs | adv..* |
| Case markers | case..* |
| Clitics | cl..* |
| Converbs | cv..* |
| Demonstratives, determiners, etc. | d..* |
| Nouns | n..* |
| Negation | neg |
| Numbers | num..* |
| Pronouns | p..* |
| Verbs (and verbal nouns) | v..* (n.v..*) |
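The tag patterns in this table are regular expressions. The sketch below uses Python's `re` module to approximate how such a pattern selects tags, assuming the pattern must match the whole tag (hence `re.fullmatch`); the regular expression engine used by the concordancer may differ in detail. `TAGS` is a small sample taken from this tagset and `matching_tags` is a hypothetical helper. An unescaped dot matches any character, so `n..*` also picks up tags such as `neg`, while escaping the first dot restricts the match to tags that begin with `n.`.

```python
import re

# A small sample of tags from this tagset, for illustration only.
TAGS = ["adj", "n.count", "n.prop", "n.v.past", "neg", "num.card", "numeral", "v.past"]


def matching_tags(pattern: str, tags=TAGS) -> list[str]:
    """Return the tags that the pattern matches in full."""
    # Assumption: the pattern has to cover the entire tag, not just a prefix.
    return [tag for tag in tags if re.fullmatch(pattern, tag)]


# Unescaped dot: any single character may follow the initial "n".
print(matching_tags(r"n..*"))
# ['n.count', 'n.prop', 'n.v.past', 'neg', 'num.card', 'numeral']

# Escaped dot: only tags that literally start with "n." (including verbal nouns).
print(matching_tags(r"n\..*"))
# ['n.count', 'n.prop', 'n.v.past']
```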
Detailed POS tagset
| POS tag | Description |
| adj | adjective |
| adv.dir | directional adverb |
| adv.intense | intensive adverb |
| adv.mim | mimetic adverb |
| adv.proclausal | proclausal adverb |
| adv.temp | temporal adverb |
| case.abl | ablative (affix -las after a noun phrase) |
| case.agn | agentive (affixes -kyis, -gyis, -gis, -yis, -s) |
| case.all | allative (affix -la after a noun phrase) |
| case.ass | associative (affix -daṅ after a noun phrase) |
| case.comp | comparative (affixes -bas and -pas after a noun phrase) |
| case.ela | elative (affix -nas after a noun phrase) |
| case.gen | genitive (affixes -kyi, -gyi, -gi, -yi, -ḥi) |
| case.loc | locative (affix -na after a noun phrase) |
| case.nare | quotative (affixes -na, -re) |
| case.term | terminative (affixes -du, -tu, -su, -ru, -r) |
| cl.lta | clitic lta in the combinations lta ste and na lta |
| cl.tsam | the clitic -tsam |
| cl.focus | the focus clitic ni |
| cl.quot | the quotative clitics ces |
| cv.abl | affix -las after a verb stem |
| cv.agn | affixes -gis |
| cv.all | affix -la after a verb stem |
| cv.are | affix -ta-re and its allomorphs after a verb stem |
| cv.ass | affix -daṅ after a verb stem |
| cv.ela | affix -nas after a verb stem |
| cv.fin | affixes -to |
| cv.gen | affixes -gi |
| cv.imp | affixes -cig |
| cv.impf | affixes -ciṅ |
| cv.loc | affix -na after a verb stem |
| cv.ques | affix -tam and its allomorphs |
| cv.sem | affixes -te |
| cv.term | affixes -tu |
| d.dem | demonstratives |
| d.det | determiners |
| d.emph | emphatics |
| d.indef | indefinites |
| d.plural | plurals |
| d.tsam | tsam |
| dunno | a word that we have not been able to analyze |
| interj | interjection |
| n..* | noun |
| n.count | lexical nouns |
| n.mass | mass nouns |
| n.prop | proper nouns |
| n.rel | relator nouns |
| n.v.aux | auxiliary verbal noun |
| n.v.cop | copula verbal noun |
| n.v.fut | future verbal noun |
| n.v.fut.n.v.past | future/past verbal noun |
| n.v.fut.n.v.pres | future/present verbal noun |
| n.v.imp | imperative verbal noun |
| n.v.invar | invariable verbal noun |
| n.v.neg | negative verbal noun |
| n.v.past | past verbal noun |
| n.v.past.n.v.pres | past/present verbal noun |
| n.v.pres | present verbal noun |
| neg | two negation prefixes ma and mi |
| num..* | numeral |
| num.card | cardinal number |
| num.ord | ordinal number |
| numeral | numeral |
| p.indef | indefinite pronouns |
| p.interrog | interrogative pronouns |
| p.pers | personal pronouns |
| p.refl | personal reflexive |
| punc | punctuation mark |
| sent | end of sentence punctuation |
| skt | |
| v.aux | auxiliary verbs |
| v.cop | copula verbs |
| v.cop.neg | negative copula verb |
| v.fut | future verb stem |
| v.fut.v.past | future/past verb stem |
| v.fut.v.pres | future/present verb stem |
| v.imp | imperative verb stem |
| v.invar | invariable verb stem |
| v.neg | the inherently negative verb med |
| v.past | past verb stem |
| v.past.v.pres | past/present verb stem |
| v.pres | present verb stem |
Note: word forms with and without a tsheg (e.g. ཐོག་ and ཐོག) are separate lexical entries, but both are normalized to the same form in the attribute “notsheg”.
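The idea behind this normalization can be illustrated with a short Python sketch. It assumes that the normalized form is simply the word with any trailing tsheg (U+0F0B) removed; the corpus's actual normalization procedure is not documented here, and `strip_final_tsheg` is a hypothetical name.

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def strip_final_tsheg(word: str) -> str:
    """Remove trailing tsheg marks; tshegs between syllables are kept."""
    return word.rstrip(TSHEG)


with_tsheg = "\u0f50\u0f7c\u0f42\u0f0b"  # ཐོག་
without_tsheg = "\u0f50\u0f7c\u0f42"     # ཐོག

# Both spellings reduce to the same normalized form.
print(strip_final_tsheg(with_tsheg) == strip_final_tsheg(without_tsheg))  # True
```

A search on the normalized attribute would then find both spellings with a single query, whereas a search on the word form itself has to list each variant.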
Source
http://larkpie.net/tibetancorpus/
https://soas-repository.worktribe.com/output/420898
Reference
Garrett, Edward and Hill, Nathan W. and Zadoks, Abel (2014) ‘A Rule-based Part-of-speech Tagger for Classical Tibetan.’ Himalayan Linguistics, 13 (1). pp. 9-57. (CC BY-NC-ND 4.0)




