
[Question]: parsing file failed #2222

Open
lilu6301 opened this issue Sep 3, 2024 · 7 comments
Labels
question Further information is requested

Comments

lilu6301 commented Sep 3, 2024

Describe your problem

[ERROR]Insert chunk error, detail info please check ragflow-logs/api/cron_logger.log. Please also check ES

mjtechguy commented

Same here. I am trying to ingest a .csv file using the latest docker-compose and Ollama (nomic-embed-text).

Any ideas?

lilu6301 (Author) commented Sep 5, 2024

My OS was Ubuntu 20.04; fixed by upgrading to 22.04.

PuppyMeng commented

> My OS was Ubuntu 20.04; fixed by upgrading to 22.04.

I'm having the same problem too, on Ubuntu 24.04.1.

cfenglv commented Oct 10, 2024

Same here, running a locally built Docker image on a Mac.
Looking into cron_logger.log gives:
{'type': 'document_parsing_exception', 'reason': '[1:90157] failed to parse: The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0, 0.0, 0.0, 0.0, 0.0, ...]', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0, 0.0, 0.0, 0.0, 0.0, ...]'}}

By the way, looking into database.log gives:
ERROR (21) Can't update token usage for bc2bb8c3862b11ef8466f966ff2a9065/CHAT

I am using local Ollama: chat model llama3.2:1b and embedding model "snowflake-arctic-embed".
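Not advice from the thread, but since the ES error points at a zero-magnitude vector, one way to check whether the embedding model itself is returning all-zero embeddings is to query Ollama directly. A minimal sketch, assuming Ollama's default port 11434 and jq installed (swap in your model name):

# Request an embedding and print the sum of its squared components;
# 0 means a zero vector, which ES's cosine similarity rejects.
curl -s http://localhost:11434/api/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "snowflake-arctic-embed", "prompt": "hello world"}' \
  | jq '[.embedding[] | . * .] | add'

If this prints 0 (or null), the problem is on the embedding side, e.g. the model is not pulled or the name is wrong, rather than in Elasticsearch.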

aalboori commented
I had this error due to low storage space. Check the status of ES under RAGFlow's user settings --> System. If it is red, follow https://www.elastic.co/guide/en/elasticsearch/reference/8.15/red-yellow-cluster-status.html#fix-red-yellow-cluster-status to diagnose the cause.
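For reference, the same status can be checked from the command line. A minimal sketch, assuming ES is reachable on its default port 9200; RAGFlow's compose setup maps it to a different port and requires credentials, as a later comment shows:

# "status" : "red" means at least one primary shard is unassigned
curl -s "localhost:9200/_cluster/health?pretty"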

EzeLLM commented Dec 4, 2024

> I'm having the same problem too, on Ubuntu 24.04.1.

The guide linked above solved my problem; the following command in particular, as it fits with my logs:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.enable" : null
  }
}
'
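For context: setting a persistent cluster setting to null removes the override, so cluster.routing.allocation.enable falls back to its default ("all") and shard allocation is re-enabled. To confirm the override is gone (same host and port as above):

# an empty "persistent" section means no overrides remain
curl -X GET "localhost:9200/_cluster/settings?pretty"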

WaterDrop-EarthDivision commented

I'm happy to share my experience here.

The cause in my case was a lack of disk space: Elasticsearch monitors storage usage and stops allocating shards once usage goes above 85 percent.

I will describe below how I discovered and solved this problem.

1. Find the username and password for Elasticsearch using docker logs -f ragflow-server:

'username': 'elastic', 'password': 'infini_rag_flow'
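A small sketch, not from the original comment: you can filter the startup log for that line instead of tailing the whole output, assuming the credentials are printed on a line containing "password" as in the screenshot:

docker logs ragflow-server 2>&1 | grep -i -m1 password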

2. Diagnose via the link shared by @aalboori.

2.1 Use curl -X GET "localhost:1200/_cluster/health?pretty" -u elastic:infini_rag_flow to view the cluster status. Note that the original command used "localhost:9200"; RAGFlow's docker-compose maps Elasticsearch to port 1200. The -u flag supplies the username and password.

2.2 Use curl -X GET "localhost:1200/_cat/allocation?v=true&h=node,shards,disk.*&pretty" -u elastic:infini_rag_flow to view disk usage. That is when I realized a whopping 94% of the disk was in use, even though I have almost 2 TB of storage.

3. Replace the percentage-based watermarks with absolute sizes.

curl -X PUT "localhost:1200/_cluster/settings?pretty" -H 'Content-Type: application/json' -u elastic:infini_rag_flow -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "50gb",
    "cluster.routing.allocation.disk.watermark.high": "30gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "20gb"
  }
}'

You can't change just one of them: all three must be either percentages or absolute sizes!
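A related sketch, not from the thread: if you later want to return to the percentage defaults (85% / 90% / 95%), the same all-or-nothing rule applies, so reset all three watermarks together by setting them to null:

curl -X PUT "localhost:1200/_cluster/settings?pretty" -H 'Content-Type: application/json' -u elastic:infini_rag_flow -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null
  }
}'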

I hope my experience helps you.
