Daniel Stökl Ben Ezra
École pratique des hautes études
Université Paris Sciences-Lettres

Computational Document Analysis: New and Open Questions from a Pragmatic Perspective

Following the recent successes in the application of deep learning to written human artefacts, the impact of computational document analysis on research in the humanities and social sciences is growing rapidly. Yet many challenges persist, such as the scarcity of tagged data and the sheer heterogeneity of handwritten artefacts from many different hands, periods and cultures, which use a panoply of writing surfaces and techniques for a variety of purposes. Already, questions from the humanities and social sciences are largely guiding computational research on specific tasks such as layout analysis, transcription/recognition, dating, provenancing or scribe identification. I will try to make the case that increasing interaction between the computational document analysis community and scholars from the humanities can open up new ventures and further enhance the mutual benefit beyond the application to larger quantities of data. After briefly addressing communication issues as well as structural and institutional bottlenecks that can only be overcome jointly, I shall present new challenges to old document analysis problems that some consider solved. Furthermore, in addition to the script, the somewhat neglected writing materials have the potential to provide extremely interesting insights for answering some of the above-mentioned questions. Finally, I shall try to propose entirely new questions. Particular attention will be devoted to non-Latin scripts and to fragmentary material as well as to our open-source manuscript annotation platform eScriptorium (https://escripta.hypotheses.org).

Daniel Stökl Ben Ezra (Ph.D. Hebrew University of Jerusalem, 2001, Kennedy-Leigh award) is currently directeur d’études (research professor) for Ancient Hebrew and Aramaic Language, Literature, Epigraphy and Paleography at the École Pratique des Hautes Études, PSL in Paris. He was previously a permanent research fellow at the CNRS. He has held invited positions at the Universities of Princeton, Bern and Zurich, the Hebrew University and the Berlin Wissenschaftskolleg. From 2013 to 2018 he was the founding director of the digital humanities programme at the EPHE. He now co-directs the open-source manuscript annotation platform eScriptorium, leads the Sofer Mahir project on the automatic transcription of medieval Hebrew manuscripts, and has been co-PI of the digital edition platform erabbinica and the Tikkoun Sofrim crowdsourcing project. He has also co-organized three ‘manuSciences’ summer schools on the investigation of manuscripts with methods from philology, material sciences, digital humanities and computer sciences (2015, 2017, 2019). In his philological work he is particularly interested in the Dead Sea Scrolls and early Rabbinic literature, festivals and libraries. He likes coding.