Bajčetić, Lenka

Authority Key: b50de337-9414-4981-b238-aa3bec81d25f
Name Variants: Bajčetić, Lenka (2)
Projects

Author's Bibliography

SCyDia – OCR For Serbian Cyrillic with Diacritics

Ilić, Velibor; Bajčetić, Lenka; Petrović, Snežana; Španović, Ana

(Mannheim : IDS-Verlag, 2022)

TY  - CONF
AU  - Ilić, Velibor
AU  - Bajčetić, Lenka
AU  - Petrović, Snežana
AU  - Španović, Ana
PY  - 2022
UR  - https://dais.sanu.ac.rs/123456789/14197
AB  - In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine-readable versions of paper editions. Therefore, one essential step is needed before venturing into the dictionary-making process in the digital environment – OCRing the pages with the highest possible accuracy. Successful retro-digitization of Serbian dialectal dictionaries, currently in progress, has shown a dire need for one basic yet necessary step, lacking until now – OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many open-source and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. In cases where it is necessary to process documents that contain accented text and precisely extract each character with diacritics, such software solutions are not efficient enough. This paper presents the OCR software called “SCyDia”, developed to overcome this issue. We demonstrate the organizational structure of the OCR software “SCyDia” and the first results. “SCyDia” is a web-based software solution that relies on the open-source software “Tesseract” in the background. “SCyDia” also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of “SCyDia” by processing 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, spanning from 97.19% to 99.87%.
PB  - Mannheim : IDS-Verlag
T2  - Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany
T1  - SCyDia – OCR For Serbian Cyrillic with Diacritics
SP  - 387
EP  - 400
UR  - https://hdl.handle.net/21.15107/rcub_dais_14197
ER  - 
@inproceedings{ilic2022scydia,
author = "Ilić, Velibor and Bajčetić, Lenka and Petrović, Snežana and Španović, Ana",
year = "2022",
abstract = "In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine-readable versions of paper editions. Therefore, one essential step is needed before venturing into the dictionary-making process in the digital environment – OCRing the pages with the highest possible accuracy. Successful retro-digitization of Serbian dialectal dictionaries, currently in progress, has shown a dire need for one basic yet necessary step, lacking until now – OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many open-source and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. In cases where it is necessary to process documents that contain accented text and precisely extract each character with diacritics, such software solutions are not efficient enough. This paper presents the OCR software called “SCyDia”, developed to overcome this issue. We demonstrate the organizational structure of the OCR software “SCyDia” and the first results. “SCyDia” is a web-based software solution that relies on the open-source software “Tesseract” in the background. “SCyDia” also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of “SCyDia” by processing 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, spanning from 97.19\% to 99.87\%.",
address = "Mannheim",
publisher = "IDS-Verlag",
booktitle = "Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany",
title = "SCyDia – OCR For Serbian Cyrillic with Diacritics",
pages = "387--400",
url = "https://hdl.handle.net/21.15107/rcub_dais_14197"
}
Ilić, V., Bajčetić, L., Petrović, S., & Španović, A. (2022). SCyDia – OCR For Serbian Cyrillic with Diacritics. In Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany. Mannheim: IDS-Verlag, 387-400.
https://hdl.handle.net/21.15107/rcub_dais_14197
Ilić V, Bajčetić L, Petrović S, Španović A. SCyDia – OCR For Serbian Cyrillic with Diacritics. In: Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany. 2022:387-400.
https://hdl.handle.net/21.15107/rcub_dais_14197
Ilić, Velibor, Bajčetić, Lenka, Petrović, Snežana, Španović, Ana. "SCyDia – OCR For Serbian Cyrillic with Diacritics." In Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany (2022): 387-400.
https://hdl.handle.net/21.15107/rcub_dais_14197

Digitization of the Serbian folk proverbs compiled by Vuk S. Karadžić

Bajčetić, Lenka; Gmitrović, Marija; Španović, Ana; Petrović, Snežana

(New York, NY : Association for Computing Machinery, 2022)

TY  - CONF
AU  - Bajčetić, Lenka
AU  - Gmitrović, Marija
AU  - Španović, Ana
AU  - Petrović, Snežana
PY  - 2022
UR  - https://dais.sanu.ac.rs/123456789/13364
AB  - This paper aims to present the digitization process of a very important piece of Serbian intangible cultural heritage, Српске народне пословице и друге различне као оне у обичај узете ријечи (Engl. Serbian folk proverbs), compiled by Vuk Stefanović Karadžić during the first half of the 19th century. In the paper, we discuss the necessary steps in the digitization process, the challenges we had to deal with, and the solutions we came up with. The goal of this process is to have a fully digitized, user-friendly version of Serbian folk proverbs that will also easily integrate and be compatible with other digitized resources and/or multi-dictionary portals.
PB  - New York, NY : Association for Computing Machinery
T2  - Digital Humanities Workshop
T1  - Digitization of the Serbian folk proverbs compiled by Vuk S. Karadžić
SP  - 89
EP  - 95
DO  - 10.1145/3526242.3526265
UR  - https://hdl.handle.net/21.15107/rcub_dais_13364
ER  - 
@inproceedings{bajcetic2022digitization,
author = "Bajčetić, Lenka and Gmitrović, Marija and Španović, Ana and Petrović, Snežana",
year = "2022",
abstract = "This paper aims to present the digitization process of a very important piece of Serbian intangible cultural heritage, Српске народне пословице и друге различне као оне у обичај узете ријечи (Engl. Serbian folk proverbs), compiled by Vuk Stefanović Karadžić during the first half of the 19th century. In the paper, we discuss the necessary steps in the digitization process, the challenges we had to deal with, and the solutions we came up with. The goal of this process is to have a fully digitized, user-friendly version of Serbian folk proverbs that will also easily integrate and be compatible with other digitized resources and/or multi-dictionary portals.",
address = "New York, NY",
publisher = "Association for Computing Machinery",
booktitle = "Digital Humanities Workshop",
title = "Digitization of the Serbian folk proverbs compiled by Vuk S. Karadžić",
pages = "89--95",
doi = "10.1145/3526242.3526265",
url = "https://hdl.handle.net/21.15107/rcub_dais_13364"
}
Bajčetić, L., Gmitrović, M., Španović, A., & Petrović, S. (2022). Digitization of the Serbian folk proverbs compiled by Vuk S. Karadžić. In Digital Humanities Workshop. New York, NY: Association for Computing Machinery, 89-95.
https://doi.org/10.1145/3526242.3526265
https://hdl.handle.net/21.15107/rcub_dais_13364
Bajčetić L, Gmitrović M, Španović A, Petrović S. Digitization of the Serbian folk proverbs compiled by Vuk S. Karadžić. In: Digital Humanities Workshop. 2022:89-95.
doi:10.1145/3526242.3526265
https://hdl.handle.net/21.15107/rcub_dais_13364
Bajčetić, Lenka, Gmitrović, Marija, Španović, Ana, Petrović, Snežana. "Digitization of the Serbian folk proverbs compiled by Vuk S. Karadžić." In Digital Humanities Workshop (2022): 89-95.
https://doi.org/10.1145/3526242.3526265
https://hdl.handle.net/21.15107/rcub_dais_13364