Full basic multilingual plane unifont #2535

Open
Zipdox2 opened this issue Nov 15, 2024 · 22 comments

@Zipdox2

Zipdox2 commented Nov 15, 2024

My project involves displaying song titles and artists, and I'm looking for a font that has the most coverage. The most important characters for me are characters with diacritics, Japanese, and Cyrillic. The closest ones I've found are:

  • Unifont, which doesn't have one font with all the characters,
  • Efont, which doesn't have diacritics, and
  • Boutique, which is too small (should be 15/16px tall).

Would it be possible to combine all the Unifont fonts into one big font? A 500kB monstrosity wouldn't really be a problem for me.

@olikraus
Owner

There is an external project which did this job: https://github.com/stgiga/UnifontEX/blob/main/UnifontExMonoU8G2.c

In general you can create u8g2 font files by yourself with "bdfconv.exe":

u8g2/doc/faq.txt

Lines 255 to 291 in 4b17158

Q: How can I generate my own font?
A: The font must be available in .bdf file format. Then use bdfconv to generate
the font data. The font data can be pasted into an existing file of your project.
There is also a nice Windows Bitmap Font Editor "Fony" (http://hukka.ncn.fi/?fony)
which can export .bdf files. A copy of Fony 1.4.7 is available here:
https://github.com/olikraus/u8g2/tree/master/tools/font/fony
Q: Where do I find bdfconv?
A: A Windows executable is available here:
https://github.com/olikraus/u8g2/tree/master/tools/font/bdfconv
Use the Makefile in this directory to create a Linux binary.
Q: How to convert .ttf to .bdf?
A1: With otf2bdf: http://sofia.nmsu.edu/~mleisher/Software/otf2bdf/
A2: With FontForge: https://learn.adafruit.com/custom-fonts-for-pyportal-circuitpython-display/conversion
Q: Which commandline options are required for bdfconv?
A: "bdfconv -f 1 -m '32-255' -n fontname -o myfont.c myfont.bdf"
"-f 1" generates a u8g2 font.
"-m '32-255'" selects unicode 32 to 255. On Windows, please use double quotes: -m "32-255"
"-n fontname": This is the name of the font in C/C++/Ino files.
"-o myfont.c": The font array will be stored in this file.
"myfont.bdf": The input file with the font data (bdf format).
Q: Is there any further help for the commandline options of bdfconv.exe?
A: There is an online tool which helps you to derive the commandline options:
https://stncrn.github.io/u8g2-unifont-helper/
Many thanks to "stncrn" for making this available.
Q: Are there any video-tutorials for bdfconv?
A1: This is what I found on youtube: https://www.youtube.com/watch?v=Igkb7ZmO31A
The above video uses the -M option (instead of -m) to provide a map file with the unicodes.
A2: There is a video about fony and how to convert fonts from different sources (ttf, png) with bdfconv.exe:
https://www.youtube.com/watch?v=WIAcy5FXuAA
Q: Is there any simple way to create a font with some custom chars?
A: This online tool might be useful: http://www.kidsgo.net/u8g2/

@Zipdox2
Author

Zipdox2 commented Nov 15, 2024

That solved my problem. But in my opinion there should be a pre-made full basic multilingual plane font. I guess I'll change my issue to that.

@Zipdox2 changed the title from "Font with diacritics, Japanese, and Cyrillic" to "Full basic multilingual plane unifont" on Nov 15, 2024
@olikraus
Owner

olikraus commented Nov 15, 2024

I guess most embedded systems don't have the memory for this

@Zipdox2
Author

Zipdox2 commented Nov 15, 2024

> I guess most embedded systems don't have the memory for this

ESP32 development boards typically have 4MB of flash. The whole basic multilingual plane takes up 2034754 bytes.
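
As a rough check of that claim, the share of flash the font would occupy can be computed directly. This is only a sketch: it assumes "4MB" means 4 MiB, and it ignores that the application and partition layout also need room in the same flash.

```javascript
// Share of flash used by the full-BMP font, using the figures quoted above.
const fontBytes = 2034754;            // size quoted for the whole BMP
const flashBytes = 4 * 1024 * 1024;   // assumption: "4MB" means 4 MiB

const percent = (fontBytes / flashBytes) * 100;
console.log(`${percent.toFixed(1)}% of flash`); // 48.5% of flash
```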

@Zipdox2
Author

Zipdox2 commented Nov 16, 2024

Pruning combining characters and other non-characters seems like a good way to save a little bit of space, since they can't be rendered by U8g2 anyway. The original character will still be visible without diacritics or whatever. Here's what I've pruned so far:

  • Combining Diacritical Marks (0300–036F)
  • Combining Diacritical Marks Extended (1AB0–1AFF)
  • Combining Diacritical Marks Supplement (1DC0–1DFF)
  • Combining Diacritical Marks for Symbols (20D0–20FF)
  • Cyrillic Extended-A (2DE0–2DFF)
  • Combining Half Marks (FE20–FE2F)
  • High Surrogates (D800–DB7F)
  • High Private Use Surrogates (DB80–DBFF)
  • Low Surrogates (DC00–DFFF)
  • Private Use Area (E000–F8FF)

These may also be pruned (in whole or parts)

  • Vedic Extensions (1CD0–1CFF)
  • General Punctuation (2000–206F)
  • Control Pictures (2400–243F)
  • Ideographic Description Characters (2FF0–2FFF)
  • Syloti Nagri (A800–A82F)
  • Saurashtra (A880–A8DF)
  • Devanagari Extended (A8E0–A8FF)
  • Kayah Li (A900–A92F)
  • Rejang (A930–A95F)
  • Javanese (A980–A9DF)
  • Myanmar Extended-B (A9E0–A9FF)
  • Cham (AA00–AA5F)
  • Myanmar Extended-A (AA60–AA7F)
  • Tai Viet (AA80–AADF)
  • Meetei Mayek Extensions (AAE0–AAFF)
  • Meetei Mayek (ABC0–ABFF)
  • Variation Selectors (FE00–FE0F)
  • Specials (FFF0–FFFF)
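
If the combining-mark blocks are pruned from the font, one possible way to keep as many diacritics as possible is to preprocess the text before drawing it. The sketch below is an assumption on my part, not something u8g2 does: it composes base letter plus mark into a precomposed character where Unicode defines one (NFC), then drops any leftover nonspacing or enclosing marks. The function name `stripCombiningMarks` is hypothetical.

```javascript
// Compose where a precomposed form exists, then drop leftover marks.
// \p{Mn} = nonspacing marks, \p{Me} = enclosing marks; spacing marks (Mc) are kept.
function stripCombiningMarks(text) {
    return text.normalize('NFC').replace(/[\p{Mn}\p{Me}]/gu, '');
}

console.log(stripCombiningMarks('Cafe\u0301')); // Café (e + U+0301 becomes U+00E9)
console.log(stripCombiningMarks('x\u0301'));    // x (no precomposed form, mark dropped)
```

With this approach a title only loses a diacritic when no precomposed code point exists for it.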

@olikraus
Owner

So, what was the remaining size of the Unifont after pruning?

@stgiga

stgiga commented Dec 1, 2024

Hello, UnifontEX developer here: There ARE Arduinos with 8MiB of RAM, like the Portenta H7. Also I had to use a specific version of bdfconv. Converting everything was done so that even song titles with emoji (which DO exist) can be displayed. You're welcome to compile it yourself as you need, I just did everything for the sake of completeness. Also, I targeted MANY more formats than regular Unifont does, in fact even this library, as well as its siblings. I really do want Unicode dot-matrix LCDs/VFDs/OLEDs. I just figured I'd chime in here.

@Zipdox2
Author

Zipdox2 commented Dec 2, 2024

I tried to preserve combining characters that are large and to the side of characters. I wrote a little JS to generate the ranges.

const exclusions = [
    [0x0300, 0x036F], // Combining Diacritical Marks
    [0x1AB0, 0x1AFF], // Combining Diacritical Marks Extended
    [0x1CD0, 0x1CD2], // Vedic Extensions
    [0x1CD4, 0x1CE8], // Vedic Extensions
    [0x1CED, 0x1CED], // Vedic Extensions
    [0x1CD4, 0x1CF4], // Vedic Extensions
    [0x1CF8, 0x1CF9], // Vedic Extensions
    [0x1CFB, 0x1CFF], // Vedic Extensions
    [0x1DC0, 0x1DFF], // Combining Diacritical Marks Supplement 
    [0x20D0, 0x20FF], // Combining Diacritical Marks for Symbols
    [0x2DE0, 0x2DFF], // Cyrillic Extended-A
    [0xA802, 0xA802], // Syloti Nagri
    [0xA806, 0xA806], // Syloti Nagri
    [0xA80B, 0xA80B], // Syloti Nagri
    [0xA825, 0xA826], // Syloti Nagri
    [0xA82C, 0xA82F], // Syloti Nagri
    [0xA8B6, 0xA8B6], // Saurashtra
    [0xA8C4, 0xA8CD], // Saurashtra
    [0xA8DA, 0xA8DF], // Saurashtra
    [0xA8E0, 0xA8F1], // Devanagari Extended
    [0xA8FF, 0xA8FF], // Devanagari Extended
    [0xA926, 0xA92D], // Kayah Li
    [0xA947, 0xA95E], // Rejang
    [0xA980, 0xA982], // Javanese
    [0xA9B3, 0xA9B3], // Javanese
    [0xA9B6, 0xA9B9], // Javanese
    [0xA9BC, 0xA9BD], // Javanese
    [0xA9CE, 0xA9CE], // Javanese
    [0xA9DA, 0xA9DD], // Javanese
    [0xA9E5, 0xA9E5], // Myanmar Extended-B
    [0xA9FF, 0xA9FF], // Myanmar Extended-B
    [0xAA28, 0xAA2E], // Cham
    [0xAA31, 0xAA32], // Cham
    [0xAA35, 0xAA3F], // Cham
    [0xAA43, 0xAA43], // Cham
    [0xAA4C, 0xAA4C], // Cham
    [0xAA4E, 0xAA4F], // Cham
    [0xAA5A, 0xAA5B], // Cham
    [0xAA7C, 0xAA7C], // Myanmar Extended-A
    [0xAAB0, 0xAAB0], // Tai Viet
    [0xAAB2, 0xAAB4], // Tai Viet
    [0xAAB7, 0xAAB8], // Tai Viet
    [0xAABE, 0xAABF], // Tai Viet
    [0xAAC1, 0xAAC1], // Tai Viet
    [0xAAC3, 0xAADA], // Tai Viet
    [0xAAEC, 0xAAED], // Meetei Mayek Extensions
    [0xAAF6, 0xAAFF], // Meetei Mayek Extensions
    [0xABE5, 0xABE5], // Meetei Mayek
    [0xABE8, 0xABEA], // Meetei Mayek
    [0xABED, 0xABEF], // Meetei Mayek
    [0xABFA, 0xABFF], // Meetei Mayek
    [0xD800, 0xDB7F], // High Surrogate
    [0xDB80, 0xDBFF], // High Private Use Surrogates
    [0xDC00, 0xDFFF], // Low Surrogates
    [0xE000, 0xF8FF],  // Private Use Area
    [0xFE20, 0xFE2F] // Combining Half Marks
];

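// Sort by start so overlapping or adjacent ranges end up next to each other.
exclusions.sort((a, b) => a[0] - b[0]);
console.log('Sorted exclusion ranges:');
console.log(exclusions);

// Merge overlapping or adjacent ranges. Each pair is copied so that
// the entries of `exclusions` are not modified while merging.
const merged = [[...exclusions[0]]];

for(let i = 1; i < exclusions.length; i++){
    const lastRange = merged[merged.length - 1];
    const currentRange = exclusions[i];

    if(currentRange[0] <= lastRange[1] + 1){
        lastRange[1] = Math.max(lastRange[1], currentRange[1]);
    }else{
        merged.push([...currentRange]);
    }
}

console.log('Merged exclusion ranges:');
console.log(merged);

// Build the decimal range list for bdfconv's -m option: everything from 0 to 65535
// except the merged exclusions. Assumes no exclusion starts at 0 or ends at 65535.
let range = '0-';
for(const exclusion of merged){
    range += String(exclusion[0] - 1) + ',' + String(exclusion[1] + 1) + '-';
}
range += '65535';

console.log('bdfconv ranges:');
console.log(range);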

This got it down to 2023516 bytes, a saving of only 11238 bytes. Honestly, removing combining characters is worth it less for the space saved than as a way to fix font rendering, since the marks are then simply ignored.
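
For anyone reusing the script, here is a slightly more defensive sketch of the last step. It is an illustration, not part of the original script: the helper name `buildMapString` is hypothetical, and it expects ranges that are already sorted and merged. It also copes with an exclusion that starts at 0 or ends at 65535. The command line uses only the flags quoted from the u8g2 FAQ earlier in this thread, with the FAQ's placeholder names `fontname`, `myfont.c`, and `myfont.bdf`.

```javascript
// Turn sorted, merged [start, end] exclusions into a bdfconv -m map string.
function buildMapString(merged, first = 0, last = 65535) {
    const parts = [];
    let next = first; // first code point not yet emitted
    for (const [start, end] of merged) {
        if (start > next) {
            parts.push(`${next}-${start - 1}`);
        }
        next = Math.max(next, end + 1);
    }
    if (next <= last) {
        parts.push(`${next}-${last}`);
    }
    return parts.join(',');
}

const map = buildMapString([[0x0300, 0x036F], [0xD800, 0xF8FF]]);
console.log(map); // 0-767,880-55295,63744-65535

// Flags as described in the FAQ; on Windows the FAQ says to use double quotes.
console.log(`bdfconv -f 1 -m '${map}' -n fontname -o myfont.c myfont.bdf`);
```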

@stgiga

stgiga commented Dec 2, 2024

UnifontEX has the SMP in it, and the way it fits under 65535 characters (the base versions used are a factor too) is by removing ALL black hex box placeholders, which allows Plane 1 to fit.

@stgiga

stgiga commented Dec 2, 2024

Also the LVGL version of UnifontEX is 2MiB.

@stgiga

stgiga commented Dec 2, 2024

> > UnifontEX has the SMP in it, and the way it fits it under 65535 characters (the base versions used is a factor too) is by removing ALL black hex box placeholders, which allows Plane 1 to fit.
>
> What is SMP?

Plane 1. Oh and UnifontEX also has some Plane 2 and Plane 3 Han characters (what Westerners would call Chinese characters, and what Japanese users would call Kanji.)

Most emoji live in Plane 1. Most "Fancy Text" (as the West calls it) lives in Plane 1. Musical notation lives in Plane 1.
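
To make the plane numbering concrete: a code point's plane is simply its value divided by 0x10000, so it can be read off with a shift. The snippet below is a small sketch; `planeOf` is a name made up for this example, and the sample characters are my own picks.

```javascript
// The plane is the code point shifted right by 16 bits (0 = BMP, 1 = SMP, ...).
function planeOf(ch) {
    return ch.codePointAt(0) >> 16;
}

// Latin with diacritic, Cyrillic, Kanji, emoji, musical symbol, Plane 2 Han.
const samples = ['\u00E9', '\u042F', '\u66F2', '\u{1F3B5}', '\u{1D11E}', '\u{20000}'];
for (const ch of samples) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    console.log(`U+${hex} is in plane ${planeOf(ch)}`);
}
```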

@Zipdox2
Author

Zipdox2 commented Dec 2, 2024

Can UnifontEX be used in u8g2?

@stgiga

stgiga commented Dec 2, 2024

Yes, and I've made a version for it, though it's 6MiB, so it effectively requires an Arduino Portenta H7. I converted the whole font. Note that it's the C source file that's 6MiB, so the compiled version should be smaller:
https://stgiga.github.io/UnifontEX/UnifontExMonoU8G2.c

@stgiga

stgiga commented Dec 2, 2024

I'd love to see what the finished music player looks like.

@Zipdox2
Author

Zipdox2 commented Dec 2, 2024

This is just a proof of concept UI.

poc.mp4

@stgiga

stgiga commented Dec 2, 2024

UnifontEX actually supports ▶️//, 👤 and so it can literally represent the symbols in your player as text.

Also I'm loving what you have, it looks so cool! It reminds me of a car music display. What you've made so far is very beautiful.

@stgiga

stgiga commented Dec 2, 2024

Honestly, this is exactly one of the intended use cases.

@olikraus
Owner

olikraus commented Dec 2, 2024

> Plane 1. Oh and UnifontEX also has some Plane 2 and Plane 3 Han characters (what Westerners would call Chinese characters, and what Japanese users would call Kanji.)
>
> Most emoji live in Plane 1. Most "Fancy Text" (as the West calls it) lives in Plane 1. Musical notation lives in Plane 1.

This is probably known, but one limitation of u8g2 is that only the base plane (Plane 0) is supported as of now.
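
Given that limitation, an application could filter its text before handing it to the display code. The following is only a sketch of that idea, not u8g2 behaviour: `toBmpOnly` and the `?` placeholder are assumptions made for illustration.

```javascript
// Replace every code point outside Plane 0 with a placeholder character.
function toBmpOnly(text, placeholder = '?') {
    let out = '';
    for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
        out += ch.codePointAt(0) > 0xFFFF ? placeholder : ch;
    }
    return out;
}

console.log(toBmpOnly('Song \u{1F3B5} \u66F2')); // Song ? 曲
```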

@olikraus
Owner

olikraus commented Dec 2, 2024

For u8g2 the unifont emoji got mapped into ASCII range:
https://github.com/olikraus/u8g2/wiki/fntpic/u8g2_font_unifont_t_emoticons.png

of course a workaround :-/

@Zipdox2
Author

Zipdox2 commented Dec 2, 2024

@stgiga How big would UnifontEX be if only the BMP is included?

@stgiga

stgiga commented Dec 2, 2024

> > Plane 1. Oh and UnifontEX also has some Plane 2 and Plane 3 Han characters (what Westerners would call Chinese characters, and what Japanese users would call Kanji.)
> > Most emoji live in Plane 1. Most "Fancy Text" (as the West calls it) lives in Plane 1. Musical notation lives in Plane 1.
>
> Probably this is known, but one limitation in u8g2 is, that only base plane (plane 0) is supported as of now.

It generated fine lol. Also, most characters are in the BMP so the savings just ain't there.

@stgiga

stgiga commented Dec 3, 2024

I specifically used this converter:
https://github.com/olikraus/u8g2/blob/master/tools/font/bdfconv/bdfconv_2_22.exe

This specific build did NOT give assert errors when trying to do the entire font, yet RLE still worked. It seems that the other versions of bdfconv have trouble during the RLE step when dealing with the whole font unabridged.

If you open the UnifontEX U8G2 C file I provide in my UnifontEX repo, it says that it converted ALL 65414 characters in the BDF, the Plane 1, 2, 3, and 14 stuff included. So U8G2's format supports stuff above Plane 0, but if olikraus is correct, the actual library won't display any of it.

The Arduino Portenta H7 is an Arduino with 8MiB of RAM and 16MiB of flash memory, but it's still an Arduino. Now, the ONLY thing that has a chance at running the Adafruit_GFX version is the Portenta X8, which is more-or-less a Raspberry Pi and Arduino fusion (it can run Linux). Is that even an Arduino anymore? At least the Portenta H7 is a more-conventional Arduino, but it just has a LOT more memory. And yes, I checked to make sure the display libraries I target support it.

Basically, U8G2 UnifontEX can work if olikraus enables stuff above Plane 0, and if the Arduino you use is a Portenta H7 or Portenta X8, assuming you don't do anything over a Raspberry Pi GPIO.

Also bdfconv outputs UCGLIB, and UnifontEX exported as THAT will also fit in a Portenta H7's RAM, but with a lot less breathing-room than the U8G2 version. The LVGL version may run on a non-Pro (Portentas are Arduino's pro line) Arduino since it's only 2MiB when compiled.

The Adafruit_GFX version is a C .h header file (compilation of that is apparently done in real-time), and that .h file is 315MiB, all but requiring either a Raspberry Pi or a Portenta X8 to use.

Out of the four display libraries I support (U8G2, UCGLIB, LVGL, and Adafruit_GFX), the most ideal one is the LVGL version. Keep in mind that different libraries support different displays.

For the people who think even an Arduino with 2MiB of RAM (LVGL) is too much to put in your project, there is a fifth way of LCD usage (other than the BDF), and that is using the TTF2PNG version in a character generator IC.
THAT version is only 980KiB, but it has the full font in it. It's actually structured in a way that gets around DEFLATE's lack of random access. Also, because I'm an American and it isn't code, rather, an image, legally-speaking, THAT version is DEFINITELY public domain.
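
The sizes mentioned in this comment can be lined up against a memory budget with a few lines of code. This is a rough sketch only: the figures are the ones quoted here and earlier in the thread, they mix different kinds of size (C source, compiled data, header file, image), and the names `builds` and `fitting` are made up for the example. The UCGLIB version is left out because no size is given for it.

```javascript
const KiB = 1024;
const MiB = 1024 * KiB;

// Sizes as quoted in the thread; note they are not all the same kind of size.
const builds = [
    { name: 'U8G2 (C source file)', bytes: 6 * MiB },
    { name: 'LVGL (compiled)', bytes: 2 * MiB },
    { name: 'Adafruit_GFX (.h file)', bytes: 315 * MiB },
    { name: 'TTF2PNG image', bytes: 980 * KiB },
];

// Names of the builds whose quoted size is within the given budget.
function fitting(budgetBytes) {
    return builds.filter((b) => b.bytes <= budgetBytes).map((b) => b.name);
}

console.log(fitting(8 * MiB)); // budget: the Portenta H7's 8MiB of RAM
console.log(fitting(1 * MiB)); // budget: a 1MiB SPI flash chip
```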

You know those ER3301 font ICs you can buy? UnifontEX flashed to a 1MiB SPI flash chip, like the ones you can buy from Microchip Technology, would be the same package but have MANY more characters available. I'd outright just buy and flash a bunch, then make them available somewhere as a new font IC that supports the vast majority of Unicode. The circuitry to display its contents would be up to you, but it would likely involve a DEFLATE decoder IC too. Nothing too wild though.

Basically, if you don't like the overhead of using an Arduino but you want a dot-matrix Unicode LCD/VFD/OLED, then there are options.

If you want a VFD, I should mention Noritake, a company that makes VFDs (and, to a lesser extent, other display technologies, though getting fancy with those takes a bit more convincing). You can have a 16x16 font of your choice baked into the firmware, you can customize the driver circuitry, AND there is no minimum order quantity. So for 3 years I've wanted to order a VFD from Noritake that has UnifontEX as the display font, no extra hardware required, and I'd get it in that beautiful green glow. Unfortunately, I couldn't find a non-VFD analogue to this, because all the character LCD and character OLED people are still obsessed with 5x7, which just ain't enough. Let's just say that I'm all for more-or-less legally obsoleting said 5x7 text-only displays in favor of ones that use UnifontEX, for the sake of better language support.

And yes, Noritake provides Arduino stuff for their displays. The best base display of theirs you could use would be this one https://www.noritake-elec.com/products/model?part=GU256X128D-D903M which even has touch support.

I wish I had the funds to actually do any of this.
