As this project has a Chinese/English focus, I experimented with a bilingual test case.
So the source document is in Chinese:
Embedding is done with maidalun1020/bce-embedding-base_v1, which I understood to be a Bilingual and Crosslingual Embedding model.
My assumption was that, although the source document is in Chinese, I would be able to perform retrieval testing, search, and chat in English whenever the semantic meaning of a chunk matches the query. The LLM deployed (Gemini) also needs to be bilingual, which it is.
However, that is not what I experienced.
Retrieval testing: always returns "no data"
Search: No result
Chat: Knowledge base is empty
Please advise whether multilingual support is available in RAGFlow, or whether what I attempted was not the correct approach for this purpose. Thanks.
I would second this question. I tried a multi-language use case (one knowledge base, documents in two languages, and the multilingual e5-medium embedder).
When I ask a question in English, only the English documents are used for reference; when I ask in the second language, only the second-language documents are used.
After further experiments I found that the limitation has more to do with the RAG process than with the LLM. Basically, the embedded vector of an English question does not retrieve any embedded vector of Chinese data, so retrieval returns nothing and any subsequent LLM interaction is irrelevant. However, according to the description of the BCEmbedding model: "EmbeddingModel handle bilingual and crosslingual retrieval task in English and Chinese".
So why would this become a hurdle when the model is used within RAGFlow?
Just thinking aloud:
Did you test the process outside of RAGFlow? Maybe the issue is in the embedding model, and not on the RAGFlow side?
I will also test some proprietary embedders available via API, and will come back if something interesting comes up.
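The check suggested above could be run outside RAGFlow roughly as follows. This is only a sketch: it assumes the model can be loaded through the sentence-transformers package, and the English query and Chinese passages are made-up placeholders, not data from this thread. If the on-topic Chinese passage scores clearly higher than the off-topic one, the embedding model itself is handling the cross-lingual case and the problem is more likely in the retrieval settings.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Encoder = Callable[[list[str]], list[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(encode: Encoder, query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Embed query and passages in one batch, then sort passages by similarity to the query."""
    vectors = encode([query] + passages)
    query_vec, passage_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), text) for vec, text in zip(passage_vecs, passages)]
    return sorted(scored, reverse=True)


def load_bce_encoder() -> Encoder:
    """Assumption: the model loads via sentence-transformers (weights download on first use)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("maidalun1020/bce-embedding-base_v1")
    return lambda texts: model.encode(texts, normalize_embeddings=True).tolist()


def demo() -> None:
    """Print similarity scores for a hypothetical English query against Chinese passages."""
    query = "What is the refund policy?"  # placeholder English query
    passages = [
        "退款政策：购买后三十天内可以申请全额退款。",  # placeholder, on-topic (refund policy)
        "今天的天气非常适合去公园散步。",  # placeholder, off-topic (weather)
    ]
    for score, passage in rank_passages(load_bce_encoder(), query, passages):
        print(f"{score:.3f}  {passage}")
```

Calling demo() prints one score per passage. The encoder is passed in as a plain function, so the same ranking code can be reused with e5-medium or an API-based embedder for comparison.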