pdf2markdown #200

Cheryl33990 · 2025-01-15T06:51:29Z

Hello! I am running Document Content Extraction as described in the PDF project,
following the steps at https://pdf-extract-kit.readthedocs.io/en/latest/project/pdf_extract.html.

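The JSON output is produced correctly, but writing the Markdown output raises a UnicodeEncodeError:
UnicodeEncodeError: 'cp950' codec can't encode character '\u5706' in position 78: illegal multibyte sequence
Testing with the demo data shows that the Chinese text triggers the error. My guess is that UTF-8 needs to be used (but I have not managed to debug it yet),
so I would like to ask whether there is a solution. Thank you!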

(Also, does using Traditional Chinese make a difference?)
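
A minimal sketch of the suspected cause, not taken from the PDF-Extract-Kit code: on Windows with a Traditional Chinese locale, Python's `open()` typically falls back to `cp950` when no `encoding` is given, and `'\u5706'` (the character in the traceback) has no `cp950` mapping. The helper `write_markdown` and the output file name are hypothetical; where the project actually writes its Markdown file is an assumption to verify.

```python
import tempfile
from pathlib import Path


def write_markdown(path, text):
    """Hypothetical helper: write Markdown text with an explicit UTF-8 encoding."""
    # Passing encoding="utf-8" stops open() from using the locale default
    # (cp950 on a Traditional Chinese Windows setup).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


sample = "\u5706"  # the character reported in the traceback

# Check whether cp950 can represent the character at all.
try:
    sample.encode("cp950")
    cp950_ok = True
except UnicodeEncodeError:
    cp950_ok = False

# Write with UTF-8 and read the file back to confirm a clean round trip.
out_path = Path(tempfile.gettempdir()) / "pdf2markdown_utf8_demo.md"
write_markdown(out_path, sample)
round_trip = out_path.read_text(encoding="utf-8")

print("cp950 can encode sample:", cp950_ok)
print("UTF-8 round trip matches:", round_trip == sample)
```

If the failing `open()` call cannot be edited directly, setting the environment variable `PYTHONUTF8=1` before launching Python (UTF-8 Mode, available since Python 3.7) may have the same effect, since it makes UTF-8 the default text encoding.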
