DAIS - Digital Archive of the Serbian Academy of Sciences and Arts

SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian

2019
stankovic.et.al.sasa.2019.pdf (981.1Kb)
Authors
Stanković, Ranka
Šandrih, Branislava
Stijović, Rada
Krstev, Cvetana
Vitas, Duško
Marković, Aleksandra
Article (Published version)
Abstract
In this paper we present a model for selection of good dictionary examples for Serbian and the development of initial model components. The method used is based on a thorough analysis of various lexical and syntactic features in a corpus of examples compiled from the five digitized volumes of the Serbian Academy of Sciences and Arts (SASA) dictionary. The initial set of features was inspired by a similar approach for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples extracted from corpora comprising various texts. The analysis showed that there is a group of features which are strong indicators that a sentence should not be used as an example. The remaining features, including detection of non-standard and other marked lexis from the SASA dictionary, are used for ranking. The selected candidate examples, represented as feature vectors, are used with the GDEX ranking tool for Serbian candidate examples and a supervised machine learning model for classification of standard and non-standard Serbian sentences, for further integration into a solution for present and future dictionary production projects.
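The abstract describes a two-stage procedure: features that reliably disqualify a sentence act as hard filters, and the remaining features are combined into a ranking score. The Python sketch below illustrates only that general filter-then-rank idea. Every concrete detail in it is an assumption for illustration: the thresholds, the weights, the function names, and the `MARKED_LEXIS` placeholder set (standing in for marked lexis drawn from the SASA dictionary). It does not reproduce the paper's feature set, GDEX's actual scoring formula, or the authors' supervised classifier.

```python
from dataclasses import dataclass

# Hypothetical placeholder for marked (non-standard, archaic, dialectal) lexis;
# in the paper this information comes from the SASA dictionary itself.
MARKED_LEXIS = {"oćeš", "bijaše"}


@dataclass
class Candidate:
    text: str
    tokens: list[str]


def tokenize(sentence: str) -> Candidate:
    # Naive whitespace tokenizer that strips surrounding punctuation and
    # lowercases; a real pipeline would use a Serbian tokenizer/lemmatizer.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in sentence.split()]
    return Candidate(sentence, [t for t in tokens if t])


def is_disqualified(c: Candidate, min_len: int = 5, max_len: int = 25) -> bool:
    """Stage 1 (assumed filters): any hit removes the sentence outright."""
    if not (min_len <= len(c.tokens) <= max_len):
        return True
    if any(ch.isdigit() for ch in c.text):  # dates, quantities, numbering
        return True
    if "http" in c.text or "@" in c.text:  # URLs, e-mail addresses
        return True
    return False


def score(c: Candidate, ideal_len: int = 12) -> float:
    """Stage 2 (assumed weights): soft features summed into a score."""
    s = 1.0
    s -= 0.02 * abs(len(c.tokens) - ideal_len)  # prefer mid-length sentences
    s -= 0.3 * sum(t in MARKED_LEXIS for t in c.tokens)  # penalize marked lexis
    if not c.text[:1].isupper():  # should start with a capital letter
        s -= 0.2
    if not c.text.rstrip().endswith((".", "!", "?")):  # should be complete
        s -= 0.2
    return s


def rank(sentences: list[str]) -> list[tuple[float, str]]:
    """Filter out disqualified candidates, then sort the rest by score."""
    candidates = [tokenize(s) for s in sentences]
    kept = [c for c in candidates if not is_disqualified(c)]
    return sorted(((round(score(c), 3), c.text) for c in kept), reverse=True)


if __name__ == "__main__":
    sample = [
        "Dete se igra u dvorištu sa svojim drugovima.",
        "oćeš li doći sutra kod nas na ručak",
        "Rođen je 1950. godine.",
        "Da.",
    ]
    for sc, text in rank(sample):
        print(sc, text)
    # 0.92 Dete se igra u dvorištu sa svojim drugovima.
    # 0.22 oćeš li doći sutra kod nas na ručak
```

Keeping the hard filters separate from the score mirrors the abstract's distinction between features that rule a sentence out and features that only affect its rank; a learned classifier could replace or complement the hand-set weights.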

Keywords:
Serbian / good dictionary examples / automatization of dictionary-making / feature extraction / machine learning
Source:
Electronic lexicography in the 21st century : Smart lexicography, 2019, 248-269
Publisher:
  • Brno : Lexical Computing CZ s.r.o.
Funding / projects:
  • Literature and visual arts: Russian-Serbian dialogue (RS-178003)
  • Infrastructure for Technology Enhanced Learning in Serbia (RS-47003)
  • Linguistic research of contemporary Serbian literary language and the development of the SASA Dictionary of Serbo-Croatian literary and national language (RS-178009)

ISSN: 2533-5626

Handle
https://hdl.handle.net/21.15107/rcub_dais_7162
URI
https://dais.sanu.ac.rs/123456789/7162
Collections
  • ИСЈ САНУ - Општа колекција / General collection
Institution/Community
Институт за српски језик САНУ / Institute for the Serbian Language of SASA
TY  - JOUR
AU  - Stanković, Ranka
AU  - Šandrih, Branislava
AU  - Stijović, Rada
AU  - Krstev, Cvetana
AU  - Vitas, Duško
AU  - Marković, Aleksandra
PY  - 2019
UR  - https://dais.sanu.ac.rs/123456789/7162
AB  - In this paper we present a model for selection of good dictionary examples for Serbian and the
development of initial model components. The method used is based on a thorough analysis of
various lexical and syntactic features in a corpus of examples compiled from the five digitized
volumes of the Serbian Academy of Sciences and Arts (SASA) dictionary. The initial set of
features was inspired by a similar approach for other languages. The feature distribution of
examples from this corpus is compared with the feature distribution of sentence samples
extracted from corpora comprising various texts. The analysis showed that there is a group of
features which are strong indicators that a sentence should not be used as an example. The
remaining features, including detection of non-standard and other marked lexis from the SASA
dictionary, are used for ranking. The selected candidate examples, represented as feature vectors,
are used with the GDEX ranking tool for Serbian candidate examples and a supervised
machine learning model for classification of standard and non-standard Serbian sentences, for
further integration into a solution for present and future dictionary production projects.
PB  - Brno : Lexical Computing CZ s.r.o.
T2  - Electronic lexicography in the 21st century : Smart lexicography
T1  - SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
SP  - 248
EP  - 269
UR  - https://hdl.handle.net/21.15107/rcub_dais_7162
ER  - 
@article{stankovic2019sasa,
author = "Stanković, Ranka and Šandrih, Branislava and Stijović, Rada and Krstev, Cvetana and Vitas, Duško and Marković, Aleksandra",
year = "2019",
abstract = "In this paper we present a model for selection of good dictionary examples for Serbian and the
development of initial model components. The method used is based on a thorough analysis of
various lexical and syntactic features in a corpus of examples compiled from the five digitized
volumes of the Serbian Academy of Sciences and Arts (SASA) dictionary. The initial set of
features was inspired by a similar approach for other languages. The feature distribution of
examples from this corpus is compared with the feature distribution of sentence samples
extracted from corpora comprising various texts. The analysis showed that there is a group of
features which are strong indicators that a sentence should not be used as an example. The
remaining features, including detection of non-standard and other marked lexis from the SASA
dictionary, are used for ranking. The selected candidate examples, represented as feature vectors,
are used with the GDEX ranking tool for Serbian candidate examples and a supervised
machine learning model for classification of standard and non-standard Serbian sentences, for
further integration into a solution for present and future dictionary production projects.",
publisher = "Brno : Lexical Computing CZ s.r.o.",
journal = "Electronic lexicography in the 21st century : Smart lexicography",
title = "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian",
pages = "248-269",
url = "https://hdl.handle.net/21.15107/rcub_dais_7162"
}
Stanković, R., Šandrih, B., Stijović, R., Krstev, C., Vitas, D., & Marković, A. (2019). SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian. In Electronic lexicography in the 21st century : Smart lexicography (pp. 248-269). Brno: Lexical Computing CZ s.r.o.
https://hdl.handle.net/21.15107/rcub_dais_7162
Stanković R, Šandrih B, Stijović R, Krstev C, Vitas D, Marković A. SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian. In: Electronic lexicography in the 21st century : Smart lexicography. Brno: Lexical Computing CZ s.r.o.; 2019. p. 248-269.
https://hdl.handle.net/21.15107/rcub_dais_7162
Stanković, Ranka, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, and Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian." In Electronic lexicography in the 21st century : Smart lexicography, 248-269. Brno: Lexical Computing CZ s.r.o., 2019.
https://hdl.handle.net/21.15107/rcub_dais_7162

DSpace software copyright © 2002-2015  DuraSpace
About DAIS - Digital Archive of the Serbian Academy of Sciences and Arts | Send Feedback