Search Results for "document classification"

Showing 25 open source projects for "document classification"

1

Cleanlab

The standard data-centric AI package for data quality and ML

cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...

Downloads: 0 This Week

Last Update: 13 hours ago
See Project
2

Apache OpenNLP

Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.

Downloads: 1 This Week

Last Update: 2025-12-06
See Project
3

flair

A very simple framework for state-of-the-art NLP

...Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. A PyTorch NLP framework. Our framework builds directly on PyTorch, making it easy to train your own models and experiment with new approaches using Flair embeddings and classes.

Downloads: 1 This Week

Last Update: 2025-02-05
See Project
4

NeMo Curator

Scalable data preprocessing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...

Downloads: 0 This Week

Last Update: 2025-10-01
See Project
5

DeepSeek AIO

Access and use all DeepSeek AI models in one program.

DeepSeek AIO is a simple program that allows you to interact with all DeepSeek large language models in one place. It supports text-based chats, data analysis, code generation, language translation, and more. The program is designed to make it easy for users to use DeepSeek's AI tools for different purposes without switching between multiple platforms.

Downloads: 6 This Week

Last Update: 2025-11-26
See Project
6

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 1 This Week

Last Update: 2025-11-01
See Project
7

e-Dokyumento

e-Dokyumento is a web-based Document Management System (DMS)

e-Dokyumento is an open-source, web-based Document Management System (DMS) that automates the basic office document workflow, such as receiving, filing, routing, and approving, through capturing (scanning), digitizing (OCR reading), storing, tagging, and electronically routing and approving (e-signature) electronic documents. Demo: https://e-dokyumento.herokuapp.com/ https://edokyu.seillig.com/ (refer to Readme.md for the...

2 Reviews

Downloads: 1 This Week

Last Update: 2022-05-14
See Project
8

Hugging Face Transformer

CPU/GPU inference server for Hugging Face transformer models

Optimize and deploy Hugging Face Transformer models in production with a single command line. At Lefebvre Dalloz we run semantic search engines in production in the legal domain (in non-marketing language, a re-ranker), and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over PyTorch and FastAPI....

Downloads: 0 This Week

Last Update: 2022-08-22
See Project
9

Interpret-Text

State-of-the-art explainers for text-based machine learning models

A library that incorporates state-of-the-art explainers for text-based machine learning models and visualizes the result with a built-in dashboard. Interpret-Text builds on Interpret, an open source Python package for training interpretable models and helping to explain black-box machine learning systems. We have added extensions to support text models. Interpret-Text incorporates community-developed interpretability techniques for NLP models and a visualization dashboard to view the results....

Downloads: 0 This Week

Last Update: 2023-12-19
See Project
10

DynaQ

Innovative text document search. See http://dynaq.opendfki.de for details.

The goal of DynaQ is to develop an inquiry system to explore the personal information space, supporting you with the searching paradigm 'orienteering'. DynaQ is a desktop search engine with enhanced functionality for file, email and blog search. Look at our GitLab homepage for source code and documentation: http://dynaq.opendfki.de

Downloads: 0 This Week

Last Update: 2021-08-05
See Project
11

fastText

Library for fast text classification and representation

...Such categories can be review scores, spam vs. non-spam, or the language in which the document was typed. Nowadays, the dominant approach to building such classifiers is machine learning, that is, learning classification rules from examples. To build such classifiers, we need labeled data, which consists of documents and their corresponding categories (or tags, or labels).

Downloads: 0 This Week

Last Update: 2023-10-10
See Project
12

Document Classification

Document/Text Classification using Naive Bayes model.

Downloads: 0 This Week

Last Update: 2015-08-20
See Project
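
The Naive Bayes approach named in the Document Classification entry above can be illustrated with a minimal sketch. This is not that project's code: the function names, the whitespace tokenizer, the add-one smoothing, and the four training sentences are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()   # assumption: whitespace tokenizer
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior for `text`."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
TRAINING_DATA = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status report attached", "ham"),
]
MODEL = train_naive_bayes(TRAINING_DATA)
```

With this toy data, `classify("buy cheap pills", MODEL)` returns `"spam"` and `classify("monday meeting agenda", MODEL)` returns `"ham"`. A real system would need a proper tokenizer and far more labeled documents.
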
13

20newsgroupClassify in NaiveBayes Matlab

...Mitchell's book (Machine Learning, Tom Mitchell). 2. Please download the data from http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html. Report: a comprehensive report below, "20 newsgroup Classification problem". Accuracy: 83.625%. Number of Training data: 50% of Total document chosen sequentially. Number of Training data: 50% of remaining document. Download the code and other necessary files in the Files Tab.

Downloads: 0 This Week

Last Update: 2015-10-29
See Project
14

GTkNN

GPU-based Textual kNN (GT-kNN)

The following code is a parallel kNN implementation that uses GPUs for the high-dimensional data in text classification. You can use it to classify documents using kNN or to generate meta-features based on the distances between a query document and its k nearest neighbors.

Downloads: 0 This Week

Last Update: 2015-07-09
See Project
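
The two uses described in the GTkNN entry above, kNN classification and distance-based meta-features, can be sketched on the CPU in plain Python. This is not GT-kNN's GPU code: the cosine measure over raw term counts, the function names, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumes whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, labeled_docs, k):
    """Return the k (similarity, label) pairs most similar to the query."""
    q = vectorize(query)
    scored = [(cosine_similarity(q, vectorize(text)), label)
              for text, label in labeled_docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def knn_classify(query, labeled_docs, k=3):
    """Majority vote over the labels of the k nearest neighbors."""
    neighbors = nearest_neighbors(query, labeled_docs, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def distance_meta_features(query, labeled_docs, k=3):
    """Cosine distances (1 - similarity) to the k nearest neighbors."""
    return [1.0 - sim for sim, _ in nearest_neighbors(query, labeled_docs, k)]


# Hypothetical toy corpus, for illustration only.
CORPUS = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("football fans cheered the goal", "sports"),
    ("the parliament passed the budget", "politics"),
    ("the minister announced new elections", "politics"),
    ("voters backed the budget in parliament", "politics"),
]
```

Here `knn_classify("football goal", CORPUS)` returns `"sports"`, and `distance_meta_features("football goal", CORPUS)` yields three distances in ascending order that could be fed to a downstream classifier as extra features.
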
15

Arabic Wikipedia into Named Entity

“Arabic Wikipedia into Named Entity Taxonomy” is a dataset consisting of 4000 Arabic Wikipedia articles classified into a coarse-grained NE taxonomy. This dataset can be used in document classification tasks in relation to NER. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Mapping Arabic Wikipedia into the Named Entities Taxonomy", In Proceedings of COLING 2012: Posters, pp. 43-52, IIT, Mumbai, India, December 8-15, 2012. Author URL: http://www.cs.bham.ac.uk/~fsa081/index.html http://fsalotaibi.kau.edu.sa Email: fsalotaibi {AT} kau.edu.sa fsa081 {AT} cs.bham.ac.uk

Downloads: 0 This Week

Last Update: 2014-08-24
See Project
16

MyNook

A machine learning system for supervised document classification

An open source system for supervised document classification based on statistical machine learning techniques. Unlike state-of-the-art classification techniques, MyNook requires only the title of the document, not the content itself.

Downloads: 0 This Week

Last Update: 2016-10-31
See Project
17

Content based File Organizer

This is a document organizer that learns from user behavior. It uses classification algorithms to prepare label-suggestions for files. It also has a search feature that extends user queries with WordNet dictionary.

Downloads: 0 This Week

Last Update: 2014-12-16
See Project
18

NLP4J Natural Language Processing 4 Java

NLP4J library is a toolset written in Java for Natural Language Processing. This version is oriented to Document Classification and uses Naive Bayes, TF-IDF, etc. There are also pre-processing tools.

Downloads: 0 This Week

Last Update: 2014-07-01
See Project
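
The TF-IDF weighting mentioned in the NLP4J entry above can be sketched briefly. The example is in Python rather than Java and is not NLP4J's implementation: the formula variant (tf as count divided by document length, idf as log(N / df)), the function name, and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Smoothed idf and log-scaled tf are equally common alternatives.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, for illustration only.
DOCS = [
    "invoice payment due invoice",
    "contract payment terms",
    "meeting notes",
]
WEIGHTS = tf_idf(DOCS)
```

In the first document, "invoice" outweighs "payment" because it occurs twice there and in no other document, while "payment" is shared with the second document. Weights like these are a common input to classifiers such as Naive Bayes or kNN.
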
19

Trainable Relation Extraction framework

T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
20

Young Researchers' Induction Foundation

Collection of Statistical Language Processing Tools and Modules for Information Retrieval, Document Classification, Vectorization, Pattern Matching, Knowledge/Text Mining related problems.

Downloads: 0 This Week

Last Update: 2013-04-17
See Project
21

Qualiweb

Qualiweb aims to provide semantic web metrics for modeling a website's visitors' needs according to a given taxonomy or document classification. The web metrics provided by Qualiweb indicate how successful each of the website's topics has been.

Downloads: 0 This Week

Last Update: 2013-03-19
See Project
22

Judge

JUDGE (Java Utility for Document Genre Eduction) features automatic classification and clustering of documents, optionally as a webservice. The program is written entirely in Java and makes use of the Weka machine learning toolkit.

Downloads: 0 This Week

Last Update: 2015-12-01
See Project
23

WSDC-Web Service Document Classificator

The project aims to build a web service for automatic document classification (as POPFile does for mail).

Downloads: 0 This Week

Last Update: 2013-02-25
See Project
24

layoutlm-base-uncased

Multimodal Transformer for document image understanding and layout

...The model uses a standard BERT-like architecture but enriches input with 2D positional embeddings. It achieves state-of-the-art results in form understanding and information extraction benchmarks. This model is particularly useful for document AI applications like document classification, question answering, and named entity recognition.

Downloads: 0 This Week

Last Update: 2025-07-02
See Project
25

Ministral 3 3B Base 2512

Small 3B-base multimodal model ideal for custom AI on edge hardware

Ministral 3 3B Base 2512 is the smallest model in the Ministral 3 family, offering a compact yet capable multimodal architecture suited for lightweight AI applications. It combines a 3.4B-parameter language model with a 0.4B vision encoder, enabling both text and image understanding in a tiny footprint. As the base pretrained model, it is not fine-tuned for instructions or reasoning, making it the ideal foundation for custom post-training, domain adaptation, or specialized downstream tasks....

Downloads: 0 This Week

Last Update: 2025-12-03
See Project

