tika-python is a Python port of the Apache Tika library that makes Tika available through the Tika REST Server. It is distributed as a Python library that can be installed with Setuptools or Pip. The library requires Java 7+ on your system, because tika-python starts the Tika REST server in the background.

To use tika-python in a disconnected environment, download a Tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set the TIKA_SERVER_JAR environment variable to TIKA_SERVER_JAR="file:////tika-server.jar". This tells tika-python to "download" the file from the local path, move it to /tmp/tika-server.jar, and run it as a background process. This is the only way to run tika-python without internet access. If the variable is not set, the default behavior is to check the Tika version and pull the latest release from Apache every time.
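The offline setup described above can be sketched in Python. This is a minimal sketch, not the library's own tooling: the helper name `configure_offline_tika` and the example path `/opt/tika/tika-server.jar` are hypothetical, and it assumes a POSIX-style absolute path with the matching `.md5` file next to the jar. It builds the URL in the same four-slash form shown in the description (`file:///` followed by an absolute path). It also assumes the variable is read when `tika` is imported, so it is set before any import of the library.

```python
import os


def configure_offline_tika(jar_path):
    """Set TIKA_SERVER_JAR to a file URL for a local Tika server jar.

    `jar_path` is assumed to be an absolute POSIX path; the matching
    .md5 file is assumed to sit next to it (jar_path + ".md5").
    """
    if not jar_path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % jar_path)
    # "file:///" + "/abs/path" yields the four-slash form used in the description.
    url = "file:///" + jar_path
    os.environ["TIKA_SERVER_JAR"] = url
    return url


# Hypothetical location; call this before importing tika.
configure_offline_tika("/opt/tika/tika-server.jar")
```

The same effect can be had by exporting the variable in the shell before starting Python; doing it in code keeps the configuration next to the script that depends on it.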

Features

  • Parser interface (backwards compatible with the interface that predates REST)
  • The parser interface extracts text and metadata using the /rmeta interface
  • Optionally, you can pass a Tika server URL along with the call, which is useful for multi-instance execution
  • The output format can be set to XHTML
  • The unpack interface handles both metadata and text extraction in a single call
  • Internally, the server returns a tarball of metadata and text entries, which the library then unpacks
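
The parser and unpack interfaces listed above might be used roughly as follows. This is a hedged sketch: the wrapper names `extract` and `extract_unpacked` are hypothetical, and it assumes that `tika.parser.from_file` accepts an optional server URL as its second argument and an `xmlContent` keyword, and that the result is a dictionary with `"metadata"` and `"content"` keys. The imports are done inside the functions so that nothing contacts or starts a Tika server until a call is actually made.

```python
def extract(path, server_url=None, xhtml=False):
    """Return (metadata, content) for a file via the parser interface.

    server_url: optional Tika server endpoint, for multi-instance setups.
    xhtml: request XHTML output instead of plain text.
    """
    from tika import parser  # lazy import; requires tika-python and Java

    args = [path]
    if server_url is not None:
        args.append(server_url)
    parsed = parser.from_file(*args, xmlContent=xhtml)
    return parsed["metadata"], parsed["content"]


def extract_unpacked(path):
    """Return the unpack interface's result (metadata and text in one call)."""
    from tika import unpack  # lazy import, as above

    return unpack.from_file(path)
```

A call such as `extract("/path/to/file", xhtml=True)` would then return the metadata together with XHTML content, assuming a Tika server can be started or reached.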

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Text Processing Software, Python Healthcare Software, Python Machine Learning Software

Registered

2022-05-26