Arabic Corpus

The Arabic corpus was developed as part of a research project named "A New Approach of Semi-Indexing of Text Documents". It consists of more than 460 Arabic books. The corpus can be used to develop language-engineering applications, as well as for information retrieval and information extraction. The total corpus size is 137 MB. It contains 23,264,785 words and 128,584,458 letters.
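The project does not say how its word and letter counts were produced. The following is a minimal sketch, assuming that a "word" is any whitespace-separated token and a "letter" is a character in the basic Arabic Unicode block (U+0600 to U+06FF) with general category `Lo`. Under that assumption, diacritics (harakat), the tatweel character, and Arabic-Indic digits are not counted as letters. The function names `corpus_stats` and `file_stats` are hypothetical.

```python
import unicodedata

# Assumption: letters are "Lo" (other letter) characters in the basic Arabic block.
# This excludes harakat (category Mn), tatweel (Lm) and Arabic-Indic digits (Nd).
ARABIC_BLOCK = range(0x0600, 0x0700)


def corpus_stats(text: str) -> tuple[int, int]:
    """Return (word_count, arabic_letter_count) for a piece of text."""
    # Assumption: a word is any maximal run of non-whitespace characters.
    words = text.split()
    letters = sum(
        1
        for ch in text
        if ord(ch) in ARABIC_BLOCK and unicodedata.category(ch) == "Lo"
    )
    return len(words), letters


def file_stats(path: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Hypothetical helper: compute the same counts for one corpus file."""
    with open(path, encoding=encoding) as f:
        return corpus_stats(f.read())


if __name__ == "__main__":
    print(corpus_stats("مرحبا بالعالم"))  # two words, 5 + 7 letters
```

Counts produced this way may not match the published figures, because the tokenization and letter definition used by the project authors are not documented.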

User Ratings

1.0 out of 5 stars

User Reviews

mzeid, posted 2017-11-21:

The Arabic text in the files in 'AA3-utf8.zip' is corrupted and garbled. Could you please fix and re-upload them?

Additional Project Details

Registered: 2013-12-30