Tech Word

Wednesday, 19 October 2011

reCAPTCHA'd!

Posted on 12:25 by Unknown

reCAPTCHA is an excellent example of solving an information-processing problem in a creative way and, in the process, solving a much larger one.

Before you can understand reCAPTCHA, you must first understand its predecessor: CAPTCHA. CAPTCHA was created to stop automated programs (or "bots") from logging into websites and generating spam in the form of emails and mass postings.

A CAPTCHA screen displays a distorted image of letters or words. A person can read the letters, but a bot cannot. The user must enter the letters correctly to gain access to the system, for example, to sign up for an email account.

This technology alone is a great example of a creative solution to a complex problem. But reCAPTCHA takes it a step further by solving an even bigger problem.

This larger problem involves an ancient form of communication - the printed page. Google is trying to convert tens of thousands of books and newspapers to digital text. Scanning the publications and then using OCR (optical character recognition) to convert the scanned images to text has its limits: if the text is distorted (as it is in many of the older publications), the OCR software cannot convert it.

How does this relate to CAPTCHA? Well, about 200 million CAPTCHAs are done by people every day. If each CAPTCHA takes ten seconds, this effort represents about 63 person years of work every day.
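
For anyone who wants to check that figure, here is a quick sketch of the arithmetic. The two inputs are the numbers quoted above; the one assumption of mine is that a "person year" means a full calendar year of 24-hour days, which is what makes the result come out at about 63.

```python
CAPTCHAS_PER_DAY = 200_000_000          # figure quoted in the post
SECONDS_PER_CAPTCHA = 10                # figure quoted in the post
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: a person-year is a full calendar year


def person_years_per_day(captchas=CAPTCHAS_PER_DAY, seconds_each=SECONDS_PER_CAPTCHA):
    """Total human time spent on CAPTCHAs in one day, expressed in years."""
    return captchas * seconds_each / SECONDS_PER_YEAR


print(round(person_years_per_day(), 1))  # 63.4
```

If you counted only working hours instead, the number of person years would be several times larger.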

Wouldn't it be amazing if there were a way to put all this time to good use? That is exactly what reCAPTCHA does.

Here's how reCAPTCHA works:
  1. When a document is scanned, the OCR software flags a word that it cannot convert. Let's call this the "unknown word".
  2. The reCAPTCHA process sends this unknown word out as a CAPTCHA for people to decipher.
  3. The CAPTCHA contains not only the "unknown word", but another word which the system already knows. We'll call this the "known word".
  4. In the CAPTCHA that is created, the user is asked to read both words and enter them.
  5. If the user gets the known word right, the system assumes that their answer for the unknown word is probably right as well.
  6. The system also gives the unknown word to a few other people to verify that the original answer was correct.
  7. If enough people agree on what the unknown word is, the information is sent back to the original system and the converted word is added to the document that is being digitized.
  8. This process is repeated until all the words in the document are converted.
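
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the real reCAPTCHA service is implemented: the names (UnknownWord, submit_answer), the case-insensitive matching, and the threshold of three matching answers are all assumptions of mine, since the only requirement stated is that "enough people" agree.

```python
from collections import Counter

AGREEMENT_THRESHOLD = 3  # assumption: stands in for "enough people agree"


class UnknownWord:
    """A scanned word the OCR could not convert, waiting for human answers."""

    def __init__(self, word_id):
        self.word_id = word_id
        self.votes = Counter()   # how many users typed each candidate answer
        self.resolved = None     # filled in once enough users agree


def normalize(text):
    """Compare answers without regard to case or surrounding spaces."""
    return text.strip().lower()


def submit_answer(known_word, known_answer, unknown, unknown_answer):
    """Return True if the user passes; record their guess for the unknown word."""
    if normalize(known_answer) != normalize(known_word):
        return False             # failed the known word, so the guess is ignored
    if unknown.resolved is None:
        guess = normalize(unknown_answer)
        unknown.votes[guess] += 1
        if unknown.votes[guess] >= AGREEMENT_THRESHOLD:
            unknown.resolved = guess   # ready to send back to the digitized document
    return True


word = UnknownWord("page12-word87")          # hypothetical identifier
submit_answer("upon", "open", word, "xyz")   # wrong known word: not counted
for guess in ["morning", "Morning", "mourning", "morning "]:
    submit_answer("upon", "upon", word, guess)
print(word.resolved)  # morning
```

Notice that the one user who typed "mourning" still passes the CAPTCHA, because the system can only check the known word. Their answer simply never collects enough votes.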

Can you even begin to imagine the flash of genius that occurred in the mind of Luis von Ahn, the creator of the reCAPTCHA process?

The problem is that these types of "eureka" moments are very difficult to create. They often just happen, much like the weather. You can no more force yourself to be creative than you can force yourself to love, hate, forget something, fall asleep or go back in time.

However, you can sometimes find creative solutions if you just stop what you're doing, and ask yourself some questions, such as:
  1. Is there a better way to present this information to the end user?
  2. What else would a user need to know about this concept, task, or thing?
  3. How does the user use our documents?
  4. What changes could be made to enhance the documentation development process?
I'll give some examples of real-life creative solutions that I've encountered:

Example 1: Our help files have to be checked into a version control system. Each help project can contain hundreds of individual files, and these files are often created, deleted, moved and renamed. It would have been very cumbersome to keep track of each file that was checked in and out. The solution (from a colleague of mine) was this: instead of checking in and out the various files, a zip file of the entire help system was created and checked in instead. The installation program then decompresses this zip file. Only one file now needs to be sent and tracked in the build.
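
To make the idea concrete, here is a minimal sketch using Python's standard shutil module. Our actual build and installer were not written this way; the function names and paths are hypothetical and only show the shape of the solution: one archive goes into version control, and the installer unpacks it.

```python
import shutil
from pathlib import Path


def package_help(help_dir, output_base):
    """Zip the whole help project into a single archive and return its path.

    output_base is the archive path without the .zip extension. Keep it
    outside help_dir so the archive is not written into the folder being zipped.
    """
    archive = shutil.make_archive(str(output_base), "zip", root_dir=str(help_dir))
    return Path(archive)


def unpack_help(archive, install_dir):
    """What the installation program does: decompress the single archive."""
    shutil.unpack_archive(str(archive), str(install_dir), "zip")
```

Calling package_help("help_project", "build/help_system") would produce build/help_system.zip, which is the only file that needs to be checked in. The trade-off is that version control can no longer show changes to individual help topics.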

Example 2: I was working with a developer on a complex database administration application. One of the things the user could do was rerun a query by clicking a button labeled, appropriately enough, Rerun query. The developer said the problem was that there were many different queries that the user could run, and that they needed a quick way to know which one they had run before re-running it. I asked if it was possible to embed the name of the query that had just run into the button label, so that, for example, if the user had run the Last Name query, the button label would be Rerun Last Name query. I still remember the developer's eyes widening and his face lighting up as he recognized the elegant beauty of this solution. "Yes," he said, "it can be done!"
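
In code, the change amounts to very little. The function below is a hypothetical sketch, not the developer's actual implementation; falling back to the plain label when no query has run yet is my own assumption.

```python
def rerun_button_label(last_query_name=None):
    """Build the button label from the name of the query that was last run."""
    if not last_query_name:
        return "Rerun query"          # nothing has been run yet
    return f"Rerun {last_query_name} query"


print(rerun_button_label("Last Name"))  # Rerun Last Name query
```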

Example 3: Many of our help projects share content, templates, and other settings. I wanted to develop a simple content management system that would allow all the writers to share these things across many locations. I created a master help project that contained all the common content and settings. I then linked my other help projects to this master project, so that if any of the common material changed, it would automatically be updated in the other help projects. Finally, I stored all the documentation in a version control system that any writer could access. As long as each writer has the current version of the master help project and links their other help projects to it, the templates and content remain standard.

So don't just think "outside the box".

Ask yourself if you even need the box in the first place.

Posted in creativity, technology