
Project Naptha

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Jasslimhuimin at 06:23, 7 April 2015 (Jasslimhuimin moved page Jasslimhuimin/sandbox to Project Naptha: Shifting article out of sandbox to Wikipedia main page.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Project Naptha
Developer(s): Kevin Kwok
Initial release: April 2014
Written in: JavaScript
Platform: Google Chrome
Website: projectnaptha.com


Project Naptha is a browser extension for Google Chrome and the first of its kind to allow users to highlight, copy, edit and translate text from within images.[1] It was created by developer Kevin Kwok[2] and released in April 2014 as a Chrome add-on, downloadable from the Chrome Web Store. A version was later made available for Mozilla Firefox through the Mozilla add-ons repository, but it was soon removed. The reason for the removal is unknown, and it is unclear whether development for other browsers will continue.[3] The extension is not supported on other browsers such as Safari or Internet Explorer.

The extension uses computer imaging technology,[4] that is, the use of the computer as an imaging instrument. Such technology is also widely used in other fields, such as the art world, where digital imaging has been employed both to produce hardcopy art and to identify works of art.[5]

Using several optical character recognition (OCR) algorithms, including libraries developed by Microsoft Research and Google, the extension automatically identifies text in images on the web and builds a model of the text regions, words and letters in each image.[6]

The OCR technology that Project Naptha uses is similar to that used in software such as Google Drive and Microsoft OneNote to analyse text within images. Project Naptha also uses a text-detection method called the Stroke Width Transform (SWT),[7] developed by Microsoft Research in 2008.

Origin of name

The name Naptha is derived from naphtha, a general term, thousands of years old, for flammable liquid hydrocarbon mixtures. The act of highlighting text also inspired the name.

Words on the web

Words on the web exist in two broad forms. The first is the text of articles, emails, tweets, chats and blogs: words that can be selected, copied, searched, translated and edited. The second is text shackled to images, as found in comics, document scans, memes, GIFs, diagrams and charts.

Difficulty in translation of words from images

Editing, copying or quoting this second kind of text has long been a second-class experience, because no software of this kind existed before Project Naptha. Previously, the only way to search for or copy a sentence from an image was to transcribe the relevant region manually.

History

In May 2012, Kevin Kwok[2] was reading about seam carving, an algorithm that resizes images by removing low-importance paths of pixels ("seams") so that the important content is not distorted. Kwok noticed that the seams tended to converge and arrange themselves so that they cut through the spaces between letters. A particularly verbose comic inspired him to develop software that could read images (using the HTML canvas element), work out the positions of the lines and letters, and draw selection overlays to satisfy a pervasive text-selection habit.
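
The following is a minimal Python sketch of the dynamic-programming step at the heart of seam carving, included only for illustration. It assumes a greyscale image stored as a list of rows and a simple gradient energy (absolute intensity differences to neighbouring pixels); practical seam-carving implementations, and Kwok's own code, may use different energy functions. The function names are hypothetical. On a toy image of two dark "letters" separated by a blank column, the lowest-energy seam runs straight down the gap, which is the behaviour Kwok noticed.

```python
def energy(img):
    """Simple gradient energy: |right - left| + |down - up|, clamped at borders."""
    h, w = len(img), len(img[0])
    e = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            e[y][x] = abs(right - left) + abs(down - up)
    return e


def min_vertical_seam(img):
    """Return one x-coordinate per row forming the lowest-energy vertical seam."""
    e = energy(img)
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]  # cumulative minimum energy to reach each pixel
    for y in range(1, h):
        for x in range(w):
            # cheapest of the (up to three) connected pixels in the row above
            cost[y][x] += min(cost[y - 1][max(x - 1, 0):min(x + 2, w)])
    # start from the cheapest pixel in the bottom row and walk back upwards
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        candidates = range(max(x - 1, 0), min(x + 2, w))
        x = min(candidates, key=lambda i: cost[y - 1][i])
        seam.append(x)
    return seam[::-1]  # top-to-bottom order


# Two dark "letters" (value 9) separated by a blank column (value 0).
page = [[0, 9, 9, 0, 9, 9, 0]] * 3
print(min_vertical_seam(page))  # [3, 3, 3]
```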

Kwok's first attempt was simple. He projected the image onto its side, summing the pixels in each row to form a vertical histogram. Significant valleys in this histogram marked the gaps at the ends of text lines. Each detected line was then cropped out, and the histogram process was repeated until every horizontal line in the image had been identified. To find the positions of individual letters, he applied a similar process in the other direction. This second step was far less successful: the resulting projections were often unreadable, showing that the method worked only for horizontal, machine-printed text. Faced with these technical difficulties, Kwok abandoned the project in 2012.
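
A minimal Python sketch of the projection-histogram approach described above. It assumes a binary image (1 for ink, 0 for background) and treats any row or column containing no ink as a valley; the thresholds and other details of Kwok's original version are not documented, and the function names are hypothetical.

```python
def row_profile(img):
    """Count ink pixels in each row: a projection onto the vertical axis."""
    return [sum(row) for row in img]


def split_runs(profile, threshold=0):
    """Return (start, end) pairs of runs above `threshold`; `end` is exclusive.

    Positions at or below the threshold are the valleys separating runs.
    """
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(img):
    """Split a binary image into text lines, then each line into letters."""
    result = []
    for top, bottom in split_runs(row_profile(img)):
        line = img[top:bottom]                          # crop the detected line
        col_profile = [sum(col) for col in zip(*line)]  # project its columns
        result.append(((top, bottom), split_runs(col_profile)))
    return result


img = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
print(segment(img))
# [((0, 2), [(0, 2), (3, 5)]), ((3, 4), [(1, 3), (4, 5)])]
```

Because the letter pass only looks for empty columns, touching or slanted letters leave no valley between them, which is consistent with the method working only for horizontal, machine-printed text.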

It was not until Kwok began studying at the Massachusetts Institute of Technology (MIT) and entered a hackathon that he returned to the project, which went on to win second place. To him, selecting text in pictures was manageable on a technical level: the relevant technology had existed and been readily available for some time, yet for whatever reason it had not been applied to translating text from images. Once Kwok resumed the project, features for transcription, translation, text erasure and modification followed naturally.

Technical Features

Before optical character recognition can be applied, the extension must first determine whether an image contains blocks of text. Once these blocks are identified, OCR builds a model of the text regions, words and letters in the image.[6] This lets users copy, translate and even modify text directly in images, in real time, in their Google Chrome browser.[8]

The primary feature of Project Naptha is text detection, which relies on an algorithm called the Stroke Width Transform, developed by Microsoft Research in 2008.

Project Naptha automatically applies state-of-the-art computer vision algorithms to every image encountered while browsing the web, allowing users to highlight, copy and paste, edit and translate text that was previously trapped within an image.

Project Naptha uses a technique called inpainting, a type of algorithm best known from Adobe Photoshop's "Content-Aware Fill" feature.[9] The algorithm automatically fills the space previously occupied by text with colours from the surrounding area, and the translated text is then rendered in a font matching the style of the original image. It works by first detecting the text and sampling the solid colours of the regions surrounding it, then spreading those colours inward until the entire area is filled. By capturing and processing the colours around the edited text, this technique allows users to remove or edit words in an image while reconstructing the background.[8]
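
The exact inpainting algorithm used by Project Naptha is not described here. The Python sketch below shows a simple diffusion-style fill in the spirit of the description (sample the surrounding colours, then spread them inward), assuming a greyscale image and an already-known text mask. It is far simpler than Photoshop's Content-Aware Fill, and the function name is hypothetical.

```python
def inpaint(img, mask):
    """Fill masked pixels by repeatedly averaging already-known 4-neighbours.

    img:  2-D list of grey levels.
    mask: 2-D list of booleans, True where text was (pixels to reconstruct).
    Each pass fills the outermost ring of unknown pixels, so colours
    from the surrounding area spread inward until the hole is closed.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    unknown = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    while unknown:
        filled = {}
        for y, x in unknown:
            known = [
                out[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in unknown
            ]
            if known:
                filled[(y, x)] = sum(known) / len(known)
        if not filled:  # the hole has no known border pixels at all
            break
        for (y, x), value in filled.items():
            out[y][x] = value
        unknown.difference_update(filled)  # iterating a dict yields its keys
    return out


patch = [[10, 10, 10], [10, 0, 10], [10, 10, 10]]
hole = [[False, False, False], [False, True, False], [False, False, False]]
print(inpaint(patch, hole)[1][1])  # 10.0
```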

To provide a seamless and intuitive experience, the extension tracks cursor movements and continuously extrapolates the cursor's position one second ahead from its current position and velocity, predicting where the user might highlight text in an image.[1] It then runs its processor-intensive character recognition on the text the user is likely to select, ahead of time.[10]
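
A minimal Python sketch of the look-ahead idea, assuming the velocity is estimated from the two most recent cursor samples and extrapolated linearly; the article does not say how Project Naptha estimates velocity. The `Sample` type, the function names and the rectangle format for text regions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float  # timestamp in seconds
    x: float  # cursor position in pixels
    y: float


def predict(prev: Sample, cur: Sample, horizon: float = 1.0) -> tuple[float, float]:
    """Linearly extrapolate the cursor `horizon` seconds ahead from two samples."""
    dt = cur.t - prev.t
    if dt <= 0:
        return (cur.x, cur.y)  # no usable velocity estimate; assume it stays put
    vx = (cur.x - prev.x) / dt
    vy = (cur.y - prev.y) / dt
    return (cur.x + vx * horizon, cur.y + vy * horizon)


def regions_to_prefetch(prev, cur, regions, horizon=1.0):
    """Return text regions (x0, y0, x1, y1) that contain the predicted point."""
    px, py = predict(prev, cur, horizon)
    return [r for r in regions if r[0] <= px <= r[2] and r[1] <= py <= r[3]]


prev, cur = Sample(0.0, 100, 100), Sample(0.1, 110, 100)
print(predict(prev, cur))  # (210.0, 100.0)
print(regions_to_prefetch(prev, cur, [(200, 90, 260, 120), (0, 0, 50, 50)]))
```

In this sketch, a region returned by `regions_to_prefetch` is the one whose text would be recognized in advance.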

Application

Project Naptha works with many kinds of images, enabling users to copy text from any image displayed in the browser. These include comics, photos, screenshots, images with text overlays such as Internet memes, animated GIFs, scans and diagrams with labels, and the detected text can also be translated.[11]

Comics

In October 2013, the first prototype of the extension for comics was released. Comics needed particular attention because they use casual, informal lettering in which characters are often placed so close together that they appear connected, so text copied and pasted from a comic usually comes out jumbled and unclear.

Photos

For photos, Project Naptha uses the Stroke Width Transform, an algorithm designed specifically for detecting text in natural scenes and photographs, from which text is generally harder to copy than from most other images.

Screenshots

For screenshots, Project Naptha turns a static screenshot into something closer to an interactive snapshot of the computer as it was when the screen was captured: the cursor changes when hovering over different parts, and blocks of text become selectable.

Editing Text on Images

Project Naptha lets users erase and edit text in an image using the same technique as its translation feature, which relies on inpainting.

The Translate menu can translate in-image text into several languages, including English, Spanish, Russian, French, Simplified Chinese, Traditional Chinese, Japanese and German.[8]


Technical Limitations

Despite constant improvements, Project Naptha still faces several technical limitations.

Because its underlying Stroke Width Transform algorithm is language-agnostic, Project Naptha can detect small squiggles as text. This lets it pick up fine details, but it can also behave like a bug, detecting and including too many unwanted marks.

When the colours of the text and the background are similar, words become less distinct from the rest of the image and harder to detect, leading to inaccuracies in detecting and copying text.[11]

Handwriting is especially hard to recognize because of character segmentation: handwritten characters are often written so close together that it is difficult to separate the individual letters. Copying text from such sources therefore tends to give highly inaccurate results with jumbled letters. Future versions of Project Naptha aim to improve quality by using better-trained models and algorithms, and human-assisted transcription services may also be included.

Inpainting may also leave visible marks on the original image, making it obvious that the image has been edited. This is expected to improve as well, especially through more sophisticated font detection. Currently, inpainting chooses fonts with a simple rule: uppercase, very bold text is set in Impact, other uppercase text in the XKCD font, and everything else in Helvetica Neue.
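
The font rule described in the preceding paragraph can be written as a short Python function. How Project Naptha decides that text is "very bold" is not documented, so in this sketch it is a caller-supplied flag, and the function name is hypothetical.

```python
def pick_font(text: str, is_very_bold: bool) -> str:
    """Choose a replacement font using the simple rule described in the article.

    `is_very_bold` is an assumed input; the real boldness test is not documented.
    """
    if text.isupper() and is_very_bold:
        return "Impact"
    if text.isupper():
        return "XKCD"
    return "Helvetica Neue"


print(pick_font("ONE DOES NOT SIMPLY", True))  # Impact
print(pick_font("Hello", True))                # Helvetica Neue
```

Note that `str.isupper()` returns `False` for strings with no cased characters (for example, digits only), so such text falls through to Helvetica Neue.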

Kwok has acknowledged that many of Project Naptha's functions still need improvement, mainly because several of its subcomponents and algorithms are a few years behind the state of the art. He nonetheless firmly believes that text recognition, translation and deletion can all be developed further over time, and that their potential is considerable.

References

  1. ^ a b Robarts, Stu. "New Google Chrome extension lets you copy and delete text in images". Gizmag. Retrieved 7 April 2015.
  2. ^ a b Kwok, Kevin. "Profile". Google+. Retrieved 7 April 2015.
  3. ^ Brinkmann, Martin. "Project Naptha text on image recognition technology comes to Firefox". ghacks.net. Retrieved 2 April 2015.
  4. ^ Hoffman, Chris. "Edit Image Text With Chrome's Project Naptha: What It Is & How To Use It". Makeuseof. Retrieved 7 April 2015.
  5. ^ Jarry, Narelle. "Computer Imaging Technology: The Process of Identification". The Book and Paper Group. The American Institute for Conservation. Retrieved 2 April 2015.
  6. ^ a b Brian, Matt. "This Chrome add-on lets you copy and erase text inside any image on the web". Engadget. Retrieved 7 April 2015.
  7. ^ "Stroke Width Transform". Stroke Width Transform. Retrieved 7 April 2015.
  8. ^ a b c Chacos, Brad. "Meet Project Naptha, an amazing Chrome extension for modifying text in web images". PCWorld. Retrieved 7 April 2015.
  9. ^ Wollman, Dana. "Adobe unveils Photoshop CS6 beta with redesigned UI and 65 new features, download it for free today". Engadget. Retrieved 30 March 2015.
  10. ^ Chan, Norman. "In Brief: Project Naptha OCRs Web Images". Tested. Retrieved 2 April 2015.
  11. ^ a b "Project Naptha". Project Naptha. Retrieved 7 April 2015.