User:Cmglee/extract lang.py

This Python 3 script by user:cmglee extracts a monolingual SVG file from a multilingual SVG file and writes it to disk, so that a language version can be previewed in a Web browser during its development. (The alternative way to view a non-default language of a multilingual SVG in a Web browser is to change the browser's language, installing that language first if necessary, and then restart the browser.)

Output filenames are the original filename with "-<ISO639_CODE>" added before the last ".".
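
The naming rule above can be illustrated with a minimal sketch. The helper name output_name is hypothetical, and it uses os.path.splitext instead of the regular expression found in the script, so it shows the documented rule rather than the script's exact code.

```python
import os

def output_name(path_in, lang):
    """Insert '-<lang>' before the last '.' of the path (hypothetical helper)."""
    root, ext = os.path.splitext(path_in)  # splits at the last '.' of the file name
    return f"{root}-{lang}{ext}"

print(output_name("map.svg", "de"))        # map-de.svg
print(output_name("world.map.svg", "fr"))  # world.map-fr.svg
```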


python3 extract_lang.py <SVG_FILENAME> [<ISO639_CODE> <ISO639_CODE> ...]

Each ISO639_CODE is a language code as listed on commons:template:list of supported languages. If no codes are provided, one file is written for every language found in the file, plus one for the default (with the suffix "-default").
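
To see which codes would be used when none are given, the following sketch lists the languages found in a small multilingual SVG. It reuses the pattern that the script compiles as re_lang. The sample SVG and the helper name languages_in are made up for illustration, and the sorting is only there to make the result predictable.

```python
import re

# Same pattern as the script's re_lang: captures the value of each systemLanguage attribute.
RE_LANG = re.compile(r'\s*systemLanguage\s*=\s*"\s*([^\s"]+)"', flags=re.I)

# Made-up sample: two translated labels and one default label inside a <switch>.
SAMPLE_SVG = '''<svg xmlns="http://www.w3.org/2000/svg">
 <switch>
  <text systemLanguage="de">Hallo</text>
  <text systemLanguage="fr">Bonjour</text>
  <text>Hello</text>
 </switch>
</svg>'''

def languages_in(svg_source):
    """Return the language codes found in the source, followed by 'default'."""
    return sorted(set(RE_LANG.findall(svg_source))) + ['default']

print(languages_in(SAMPLE_SVG))  # ['de', 'fr', 'default']
```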

Source code


As Wikimedia does not allow general executable files to be uploaded, the source code is provided below:

#!/usr/bin/env python3
## Extract and write a monolingual SVG from a multilingual SVG, to preview in a browser, by CMG Lee.
## Usage: python3 extract_lang.py <SVG_FILENAME>  [<ISO639_CODE> <ISO639_CODE> ...] (all if omitted)
import re, sys
def extract_lang(svg_all, lang):
 svg_langs    = {} ## svg_langs[code] = source
 svg_currents = [] ## current language content
 level        = 1  ## DOM level under switch
 for svg_part in re.findall(r'.*?>', svg_all, flags=re.DOTALL):
  if       re.findall(r'<\s*/', svg_part): level -= 1
  elif not re.findall(r'/\s*>', svg_part): level += 1
  svg_currents.append(svg_part) ## collect the parts of the current child element
  if level == 1:
   findall_lang = re_lang.findall(svg_currents[0])
   lang_current = findall_lang[0] if len(findall_lang) > 0 else None
   svg_langs[lang_current] = ''.join(svg_currents)
   svg_currents            = []
 ## Use the requested language, else the default child, else nothing (switch without a default)
 return re_lang.sub('', svg_langs.get(lang, svg_langs.get(None, '')))

re_lang = re.compile(r'\s*systemLanguage\s*=\s*"\s*([^\s"]+)"', flags=re.I)
path_in = sys.argv[1]
with open(path_in, encoding='utf-8', newline='') as f: svg_in = f.read()
for lang in sys.argv[2:] if len(sys.argv) > 2 else set(re_lang.findall(svg_in) + ['default']):
 path_out = re.sub(r'(\.[^.]*)$', r'-%s\1' % (lang), path_in) ## insert before the last "."
 svg_out = re.sub(r'(<\s*switch[^>]*>)(.*?)(\s*<\s*/\s*switch[^>]*>)',
                  lambda matchs:extract_lang(matchs.group(2), lang),
                  re.sub(r'<!--.*?-->', '', svg_in), flags=re.I | re.DOTALL)
 with open(path_out, 'w', encoding='utf-8', newline='') as f: f.write(svg_out)
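
For comparison, the selection that the script performs on each switch element can also be sketched with an XML parser. This is an alternative illustration, not the script's method: it uses xml.etree.ElementTree instead of regular expressions, and the function name pick_language and the sample SVG are made up.

```python
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'
SWITCH_TAG = '{%s}switch' % SVG_NS

def pick_language(svg_source, lang):
    """Replace each <switch> by its child for lang, else by its default child."""
    ET.register_namespace('', SVG_NS)  # serialise without an ns0: prefix
    root = ET.fromstring(svg_source)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != SWITCH_TAG:
                continue
            index = list(parent).index(child)
            chosen = None
            for option in child:  # first child whose systemLanguage lists lang
                codes = [c.strip() for c in option.get('systemLanguage', '').split(',')]
                if lang in codes:
                    chosen = option
                    break
            if chosen is None:  # fall back to the first child without systemLanguage
                for option in child:
                    if option.get('systemLanguage') is None:
                        chosen = option
                        break
            parent.remove(child)
            if chosen is not None:
                chosen.attrib.pop('systemLanguage', None)
                chosen.tail = child.tail
                parent.insert(index, chosen)
    return ET.tostring(root, encoding='unicode')

SAMPLE_SVG = ('<svg xmlns="http://www.w3.org/2000/svg"><switch>'
              '<text systemLanguage="de">Hallo</text>'
              '<text systemLanguage="fr">Bonjour</text>'
              '<text>Hello</text></switch></svg>')

print(pick_language(SAMPLE_SVG, 'de'))
```

Unlike the script, a parser-based approach may change the formatting of the file, for example attribute quoting and namespace prefixes, which is one reason to work on the source text when the output should stay close to the original.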