myirefa.blogg.se - Photo to text reader

#PHOTO TO TEXT READER CODE#

Words on the web exist in two forms: there's the text of articles, emails, tweets, chats and blogs, which can be copied, searched, translated, edited and selected, and then there's the text which is shackled to images, found in comics, document scans, photographs, posters, charts, diagrams, screenshots and memes. Interaction with this second type of text has always been a second-class experience: the only way to search or copy a sentence from an image was to do as the ancient monks did, manually transcribing regions of interest.

You can watch as moving your cursor over a block of words changes it into the little I-beam. You can drag over a few lines and watch as a semitransparent blue box highlights the text, helping you keep track of where you are and what you're reading. Hit Ctrl+C to copy the text, then paste it into a search bar, a Word document, an email or a chat window. Right-click and you can erase the words from an image, edit the words, or even translate them into a different language.

This was made by (+KevinKwok on Google+) and Guillermo Webster.

Early in October 2013, coincidentally less than a week before I developed the first prototype of this extension, xkcd published a comic (shown on the right) which somewhat ironically depicts the impetus for the extension. The comic decries websites which arbitrarily hinder users from absentmindedly selecting random blocks of text, but the irony is that xkcd should count itself among the long list of offenders, because up until now it simply wasn't possible to select text inside a comic.

An interesting thing to note is that the language-agnostic nature of Project Naptha's underlying SWT (Stroke Width Transform) algorithm (see the technical details by scrolling down a bit more) makes it detect the little squiggles as text as well. Depending on how you look at it, this can be seen as a bug or a feature. Also, because handwriting detection is particularly difficult (the specific problem is character segmentation: it's hard to separate letters that are smushed so close together that they touch), if you try to copy and paste text from a comic, it ends up jumbled. This might be improved in the future, because certain parts of the Naptha stack do lag behind the present state of the art by a few years.

The truth is that I've spent way too much time on reddit and 4chan in search of test images for the text detection and layout analysis algorithms. Time really does go by when you can rationalize procrastination as something "productive". The result is that my test corpus is something on the order of 50% internet memes (I'm particularly a fan of Doge, in part because Comic Sans is interpreted remarkably well by the built-in Ocrad text recognizer).

It's actually a bit difficult to recognize the text of the standard-template internet meme (mad props to CaptionBot, bro). Bold Impact font is notoriously hard for general-purpose text recognizers, because much of what distinguishes its letters isn't the overall shape but the subtle rounding of corners (compare D, 0 and O) or relatively short protrusions (the stubby little tail that differentiates an L from an I). I started building a text recognizer specifically designed for Impact font, and it was actually working pretty well, but I kind of misplaced the code somewhere. So, until I find it or replace it, you'll have to use Tesseract configured with the "Internet Meme" language.

During May 2012, I was reading about seam carving, an interesting and almost magical algorithm which could rescale images without apparently squishing them. After playing with the little seams that the seam carver tended to generate, I noticed that they tended to arrange themselves in a way that cut through the spaces between letters (dynamic programming approaches are actually fairly common when it comes to letter segmentation, but I didn't know that). It was then, while reading a particularly verbose smbc comic, that I thought it should be possible to come up with something which would read images (with ), figure out where lines and letters were, and draw little selection overlays to assuage a pervasive text-selection habit. It projected the image onto its side, forming a vertical pixel histogram.
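The post mentions two image-processing ideas without showing them: projecting an image onto its side to get a vertical pixel histogram, and the dynamic-programming seams of seam carving, which tend to cut through the spaces between letters. The following is a minimal Python sketch of both, not Project Naptha's code. It assumes a grayscale image stored as a list of rows where 0 means blank background and larger values mean more ink; the function names (`row_profile`, `line_bands`, `min_vertical_seam`) are hypothetical. One deliberate swap: standard seam carving uses a gradient-based energy, while this sketch uses the ink value itself as the energy, so the cheapest seam is the one that avoids ink.

```python
from typing import List, Optional, Tuple

# Assumed representation: grayscale rows, 0 = background, higher = more ink.
Image = List[List[int]]


def row_profile(img: Image) -> List[int]:
    """Project the image onto its side: total ink in each row."""
    return [sum(row) for row in img]


def line_bands(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Group consecutive rows whose ink exceeds `threshold` into
    (start, end) bands; each band is a candidate text line, end exclusive."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, ink in enumerate(profile):
        if ink > threshold and start is None:
            start = y                      # entering a line of text
        elif ink <= threshold and start is not None:
            bands.append((start, y))       # leaving it at a blank row
            start = None
    if start is not None:                  # text runs to the bottom edge
        bands.append((start, len(profile)))
    return bands


def min_vertical_seam(energy: Image) -> List[int]:
    """Seam carving's dynamic program: the top-to-bottom, 8-connected path
    (one pixel per row) with the lowest total energy.
    Returns the seam's column index in each row."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]      # cumulative minimum cost, input untouched
    for y in range(1, h):
        for x in range(w):
            # Cheapest of the up to three neighbours in the row above.
            cost[y][x] += min(cost[y - 1][max(x - 1, 0):min(x + 2, w)])
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        candidates = range(max(x - 1, 0), min(x + 2, w))
        # On ties, prefer going straight up.
        x = min(candidates, key=lambda i: (cost[y - 1][i], abs(i - x)))
        seam.append(x)
    seam.reverse()
    return seam


# Two "letters" on one line, a blank row, then a second line of text.
sample: Image = [
    [9, 9, 0, 9, 9],
    [9, 0, 0, 9, 0],
    [9, 9, 0, 9, 9],
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 9],
    [0, 9, 0, 0, 9],
]
bands = line_bands(row_profile(sample))        # [(0, 3), (4, 6)]
top, bottom = bands[0]
seam = min_vertical_seam(sample[top:bottom])   # [2, 2, 2]: down the blank column
```

The zero rows of the histogram separate the two text lines, and within the first line the zero-cost seam runs down the blank column between the two glyphs. On real images `threshold` would need to sit above zero to tolerate noise, and, as the post notes for handwriting, letters that touch leave no ink-free path, so any seam between them has to cut through strokes.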