[Question]: Multilingual support between embedding knowledge base, retrieval testing, search, and assistant chat #4503

predoctech · 2025-01-16T04:44:41Z

Describe your problem

As this project has a Chinese/English focus, I experimented with a bilingual test case.
The source document is in Chinese.

Embedding is done with maidalun1020/bce-embedding-base_v1, which I understand to be a bilingual and crosslingual embedding model.
I am working under the assumption that, although the source document is in Chinese, I should be able to run retrieval testing, search, and chat in English whenever the semantic meaning of a chunk matches the query. Obviously the LLM deployed (Gemini) needs to be bilingual as well, which it is.
However, that is not what I experienced.
Retrieval testing: Always returns "no data"
Search: No result

Chat: Knowledge base is empty

Please advise whether multilingual support is available in RAGFlow, or whether what I attempted was not the correct approach for this purpose. Thanks.

senovr · 2025-01-16T06:42:19Z

I would second this question. I tried a multi-language use case (one knowledge base, documents in two languages, and the multilingual embedder e5-medium).
When I ask a question in English, only the English documents are used for reference; when I ask in the second language, only the second-language documents are used.

KevinHuSh · 2025-01-17T01:58:15Z

Multilingual search is not supported well so far.

predoctech · 2025-01-17T11:03:56Z

After further experiments, I found that the limitation has more to do with the RAG process than with the LLM. Basically, the embedded vector of an English question does not retrieve any embedded vector of Chinese data, which makes any subsequent LLM interaction irrelevant. However, according to the description of the BCEmbedding model:
"EmbeddingModel handle bilingual and crosslingual retrieval task in English and Chinese"
So why would this become a hurdle when the model is adopted and used within RAGFlow?

senovr · 2025-01-17T11:49:38Z

Just thinking aloud: did you test the process outside of RAGFlow? Maybe the issue is in the embedding model and not on the RAGFlow side. I will also test some proprietary embedders available via API and will report back if something interesting comes up.
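
The check suggested here could be sketched roughly as below: embed an English query and a few Chinese chunks directly, outside RAGFlow, and rank the chunks by cosine similarity. This is only an illustration, not RAGFlow's retrieval code. The names `cosine`, `rank_chunks`, `toy_embed`, and `TOY_VECTORS` are hypothetical, and the vectors are hand-written placeholders rather than output from any real model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    embed: Callable[[str], Vector], query: str, chunks: Sequence[str]
) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return (score, chunk), best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Placeholder vectors (assumption): stand-ins for real embedding output.
TOY_VECTORS: Dict[str, List[float]] = {
    "What is the refund policy?": [0.9, 0.1, 0.0],
    "退款政策是什么？": [0.8, 0.2, 0.1],
    "今天天气很好。": [0.0, 0.1, 0.9],
}


def toy_embed(text: str) -> Vector:
    """Hypothetical embedder; swap in a wrapper around the real model to test it."""
    return TOY_VECTORS[text]


if __name__ == "__main__":
    ranked = rank_chunks(
        toy_embed,
        "What is the refund policy?",
        ["今天天气很好。", "退款政策是什么？"],
    )
    for score, chunk in ranked:
        print(f"{score:.3f}  {chunk}")
```

To test a real model, replace `toy_embed` with a function that calls that model and returns one vector per text. If the matching Chinese chunk scores clearly higher than the unrelated one there, the embedder is doing crosslingual matching and the problem is more likely in the retrieval pipeline. If it does not, the embedder itself is the first suspect.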

predoctech added the question Further information is requested label Jan 16, 2025

KevinHuSh added the Feature label Jan 17, 2025
