DAIS - Digital Archive of the Serbian Academy of Sciences and Arts

The Dictionary of the Serbian Academy: from the Text to the Lexical Database

2018
stankovic.stijovic.vitas.krstev.sabo.dictionary.pdf (1.004Mb)
Authors
Stanković, Ranka
Stijović, Rada
Vitas, Duško
Krstev, Cvetana
Sabo, Olga
Article (Published version)
Metadata
Abstract
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year-long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach is compatible with several standard structured forms and ontologies (TEI, LMF, OntoLex, LexInfo). A lexical database model was designed in compliance with these structured forms, mostly following the lemon model. Mapping of the lexical entry markers to LexInfo and TEI enabled export of the lexical data to these formats. A software solution for dictionary text analysis, parsing and lexical database population was developed and tested on the first and the last published volumes of the dictionary (which contain 27,141 articles in total). An evaluation of the results shows that the developed model and software solution can be successfully used for the other volumes as well.
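The pipeline summarized in the abstract (parse an article, populate a relational lexical database, map entry markers to LexInfo, export to TEI) can be illustrated with a minimal sketch. It assumes a hypothetical one-line entry format ("lemma marker definition"), a hypothetical three-marker table (`m`, `ž`, `s`) and a hypothetical single-table schema; the dictionary's actual micro-structure, marker inventory and lemon-based database model are described in the paper and are not reproduced here. Only the LexInfo namespace and the TEI element names (`entry`, `form`, `orth`, `gramGrp`, `pos`, `gen`, `sense`, `def`) are taken from the real vocabularies.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# LexInfo 2.0 namespace; "partOfSpeech", "gender", "noun", "masculine", etc. are LexInfo terms.
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"

# Hypothetical marker table (abbreviation -> LexInfo part of speech, LexInfo gender).
# The real dictionary uses a much larger inventory of markers.
MARKERS = {
    "m": ("noun", "masculine"),
    "ž": ("noun", "feminine"),
    "s": ("noun", "neuter"),
}

# Hypothetical one-line micro-structure: "<lemma> <marker> <definition>".
ENTRY_RE = re.compile(r"^(?P<lemma>\S+)\s+(?P<marker>\S+)\s+(?P<definition>.+)$")


def parse_entry(line: str) -> dict:
    """Split one simplified article into its structural parts."""
    match = ENTRY_RE.match(line.strip())
    if match is None or match.group("marker") not in MARKERS:
        raise ValueError(f"cannot parse entry: {line!r}")
    pos, gender = MARKERS[match.group("marker")]
    return {
        "lemma": match.group("lemma"),
        "pos": pos,
        "gender": gender,
        "definition": match.group("definition"),
    }


def create_schema(conn: sqlite3.Connection) -> None:
    """One illustrative table; a lemon-style model would separate entries, forms and senses."""
    conn.execute(
        "CREATE TABLE lexical_entry ("
        "id INTEGER PRIMARY KEY, lemma TEXT NOT NULL, pos TEXT, gender TEXT, definition TEXT)"
    )


def store(conn: sqlite3.Connection, entry: dict) -> None:
    """Populate the database with one parsed entry."""
    conn.execute(
        "INSERT INTO lexical_entry (lemma, pos, gender, definition) VALUES (?, ?, ?, ?)",
        (entry["lemma"], entry["pos"], entry["gender"], entry["definition"]),
    )


def to_lexinfo(entry: dict) -> dict:
    """Map the parsed grammatical values to LexInfo property/value URIs."""
    return {
        LEXINFO + "partOfSpeech": LEXINFO + entry["pos"],
        LEXINFO + "gender": LEXINFO + entry["gender"],
    }


def to_tei(entry: dict) -> str:
    """Serialize the entry with TEI dictionary elements."""
    root = ET.Element("entry")
    form = ET.SubElement(root, "form", type="lemma")
    ET.SubElement(form, "orth").text = entry["lemma"]
    gram = ET.SubElement(root, "gramGrp")
    ET.SubElement(gram, "pos").text = entry["pos"]
    ET.SubElement(gram, "gen").text = entry["gender"]
    sense = ET.SubElement(root, "sense")
    ET.SubElement(sense, "def").text = entry["definition"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    parsed = parse_entry("kuća ž house")
    db = sqlite3.connect(":memory:")
    create_schema(db)
    store(db, parsed)
    print(db.execute("SELECT lemma, gender FROM lexical_entry").fetchall())
    print(to_lexinfo(parsed))
    print(to_tei(parsed))
```

Keeping the marker table separate from the parser means the LexInfo and TEI mappings can grow without changing the parsing logic; a production system would also need to handle multi-sense articles, examples and the encoding problems mentioned above.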

Keywords:
computer lexicography / lexical database / language resources / dictionary / Serbian language
Source:
Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, 2018, 941-949
Publisher:
  • Ljubljana : Ljubljana University Press, Faculty of Arts
Funding / projects:
  • Linguistic research of contemporary Serbian literary language and the development of the SASA Dictionary of Serbo-Croatian literary and national language (RS-178009)

ISBN: 978-961-06-0097-8

Handle
https://hdl.handle.net/21.15107/rcub_dais_4927
URI
https://dais.sanu.ac.rs/123456789/4927
Collections
  • ИСЈ САНУ - Општа колекција / General collection
Institution/Community
Институт за српски језик САНУ / Institute for the Serbian Language of SASA
TY  - JOUR
AU  - Stanković, Ranka
AU  - Stijović, Rada
AU  - Vitas, Duško
AU  - Krstev, Cvetana
AU  - Sabo, Olga
PY  - 2018
UR  - https://dais.sanu.ac.rs/123456789/4927
PB  - Ljubljana : Ljubljana University Press, Faculty of Arts
T2  - Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts
T1  - The Dictionary of the Serbian Academy: from the Text to the Lexical Database
SP  - 941
EP  - 949
UR  - https://hdl.handle.net/21.15107/rcub_dais_4927
ER  - 
@article{Stankovic2018,
author = "Stanković, Ranka and Stijović, Rada and Vitas, Duško and Krstev, Cvetana and Sabo, Olga",
year = "2018",
publisher = "Ljubljana : Ljubljana University Press, Faculty of Arts",
journal = "Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts",
title = "The Dictionary of the Serbian Academy: from the Text to the Lexical Database",
pages = "941-949",
url = "https://hdl.handle.net/21.15107/rcub_dais_4927"
}
Stanković, R., Stijović, R., Vitas, D., Krstev, C., & Sabo, O. (2018). The Dictionary of the Serbian Academy: from the Text to the Lexical Database. In Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts (pp. 941-949). Ljubljana: Ljubljana University Press, Faculty of Arts.
https://hdl.handle.net/21.15107/rcub_dais_4927
Stanković R, Stijović R, Vitas D, Krstev C, Sabo O. The Dictionary of the Serbian Academy: from the Text to the Lexical Database. In: Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts. Ljubljana: Ljubljana University Press, Faculty of Arts; 2018. p. 941-949.
https://hdl.handle.net/21.15107/rcub_dais_4927
Stanković, Ranka, Rada Stijović, Duško Vitas, Cvetana Krstev, and Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database." In Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, 941-949. Ljubljana: Ljubljana University Press, Faculty of Arts, 2018.
https://hdl.handle.net/21.15107/rcub_dais_4927.

DSpace software copyright © 2002-2015  DuraSpace
About DAIS - Digital Archive of the Serbian Academy of Sciences and Arts | Send Feedback
