Welcome to our repository for SSDA 2023! This project focuses on recognizing German historical documents using document image recognition techniques. Our goal is to provide an accessible resource for training models on small historical datasets, helping researchers and developers improve recognition accuracy on under-resourced historical collections.
This repository is part of an ongoing effort by the Historical Image Explorers Team at the University of Fribourg, Switzerland, for SSDA 2023. As specialists in historical document analysis, we’ve designed this project to address the unique challenges of working with historical documents, such as varying scripts, degraded text, and limited annotated data. Here, you’ll find:
Sample Code: Step-by-step examples to guide you through SSDA 2023 recognition tasks on German historical documents.

Specialized Datasets: Carefully curated small datasets tailored to train models efficiently in historical document recognition.

Additional Resources: Tutorials and tips for training with limited data, tailored specifically to support historical image analysis.

Key Features

Small Dataset Training: We provide methods and best practices for training with small datasets, which is often necessary in historical research due to limited resources.

Customizable Pipelines: Our code offers flexibility for adapting and fine-tuning models for various historical scripts and layouts.

Community Contributions: We encourage community members to share their own datasets, models, or code improvements to help enhance the collective knowledge and resources available for historical document recognition.

Getting Started

We welcome contributions from researchers and developers interested in historical document analysis. If you have additional datasets, model improvements, or innovative techniques, please feel free to contribute or reach out to our team!