by User:GreenC (en.wikipedia.org)
February 2024
MIT License
This program converts a Google Cache URL to its original source URL.
To: https://www.allsaints-online.co.uk/
It works by removing Google Cache URL data from the start and end of the URL.
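As a rough illustration of the start-and-end stripping described above, here is a minimal Python sketch. It is not a port of googcacheparse.awk. The function name `strip_google_cache`, the list of Google parameters, the 12-character cache ID pattern, and the `http://` default scheme are all assumptions made for this example, and the sample URL uses a placeholder cache ID and domain.

```python
import re
from urllib.parse import unquote

# Everything up to and including "q=cache:" is treated as Google data.
_PREFIX = re.compile(r"^.*?[?&]q=cache:")
# Assumption: the optional cache ID is 12 URL-safe characters followed by ":".
_CACHE_ID = re.compile(r"^[A-Za-z0-9_-]{12}:")
# Assumption: an illustrative list of parameters Google appends to the URL.
_TRAILING = re.compile(r"&(?:cd|hl|ct|gl|client|strip|vwsrc)=.*$")


def strip_google_cache(url):
    """Return a best guess at the source URL inside a Google Cache URL."""
    if not _PREFIX.search(url):
        return url  # not in the assumed Google Cache form; leave untouched
    rest = _PREFIX.sub("", url, count=1)     # strip data from the start
    rest = _CACHE_ID.sub("", rest, count=1)  # strip the cache ID, if any
    rest = _TRAILING.sub("", rest)           # strip data from the end
    rest = rest.split("+", 1)[0]             # drop appended search terms
    rest = unquote(rest)                     # decode %xx escapes
    if not re.match(r"^https?://", rest):
        rest = "http://" + rest              # assumed default scheme
    return rest


# Placeholder input: the cache ID and domain are made up for illustration.
SAMPLE = ("https://webcache.googleusercontent.com/search?q=cache:abcdefghijkl:"
          "https://www.example.org/page+some+words&cd=1&hl=en&ct=clnk&gl=us")
```

Calling `strip_google_cache(SAMPLE)` returns `https://www.example.org/page`. The sketch also shows where such heuristics can go wrong: a source URL that legitimately contains `&hl=` or a `+` would be truncated.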
- The program is not perfect, because Google parameters can look like authentic parameters of the source URL. YMMV, but accuracy is high. See the file testcases.txt.
- The program was developed against thousands of Google Cache URLs found on Wikipedia, and the results were manually verified.
- The function was written in Nim and converted to GNU awk for portability.
- The program is self-contained, requiring only awk to run.
- GNU awk 4.1 or higher
- chmod 750 googcacheparse.awk
- Edit googcacheparse.awk and change the shebang on the first line to the location of awk on your system, typically /usr/bin/awk
./googcacheparse.awk url
The file testcases.txt contains about 3,300 lines, with " ---- " as the field separator. Field 1 is the original Google Cache URL, and field 2 is the output of googcacheparse.awk for that URL.
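The two-field layout described above can be read with a few lines of Python. This is a sketch only: the function name `parse_testcases` is hypothetical, and skipping blank lines and rejecting lines without a separator are assumptions about how to handle input that the description does not cover.

```python
FIELD_BREAK = " ---- "  # separator between field 1 and field 2


def parse_testcases(lines):
    """Split each line into (google_cache_url, expected_result)."""
    cases = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines carry no test case
        cache_url, sep, expected = line.partition(FIELD_BREAK)
        if not sep:
            raise ValueError(f"line {number}: missing field break")
        cases.append((cache_url, expected))
    return cases
```

The function accepts any iterable of lines, so an open file handle for testcases.txt can be passed in directly. Each resulting pair can then be compared with the output of a fresh run of the program to check for regressions.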