CSC400: Chinese Handwriting Detection

Jennifer Sadler, Advising professor: Nick Howe

See Project Progress for a weekly journal about the research.

A basic test of the recognition model: searching for a simple handwritten Chinese character in a passage written by the same writer. Black boxes mark locations of the query character, while white boxes mark locations of other characters. Match strength is indicated by the intensity of the blue shading.

Overview

Chinese handwriting recognition is an active area of research in computer science. With roughly 50,000 characters in the language, Chinese poses a greater challenge than English for both detection and recognition. In this project, I explore a handwriting recognition model (PSM) developed by Nick Howe and analyze how compatible and effective it is when adapted to Chinese.

Goals

  • Test compatibility and explore effectiveness of PSM with handwritten Chinese characters for a variety of writers
  • Identify conditions which improve PSM performance on handwritten Chinese characters
    • Normalization methods
    • PSM structuring
    • Single character vs. whole word matching

Brief Timeline

  • September: Choose project topic, set up workspace, perform background research, begin wiki, set next stage goals
  • October: Prep data set, informal experimentation, devise formal experiment plan, complete background section of wiki
  • November: Formal experimentation on PSM, collect recall/precision measurements of PSM performance
  • December: Continued experimentation, wrap-up, poster design

Background Information

The Chinese Language

Chinese differs fundamentally from English. The language contains roughly 50,000 characters, each of which is made up of a number of components called 'radicals' (of which there are around 500). Words in Chinese typically contain between two and four characters. As in English, handwriting in Chinese varies greatly from writer to writer. There is also a distinction between the cursive and print forms of written Chinese.

Handwriting Recognition

Handwriting recognition is a booming field in computer science, and in recent years much of the research interest has been in Chinese handwriting. There are two main types of handwriting recognition: online and offline. Online handwriting recognition occurs while the characters are being written, so it requires live input, for example from a tablet with some means of stroke recording. Online recognition is typically the easier of the two, and this is certainly the case with Chinese. Offline handwriting recognition works solely from images of handwritten characters. Especially in Chinese, accurate offline recognition is typically a greater challenge than online recognition; however, it is in many ways more useful. The research performed in this project is offline handwriting recognition.
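
To make the online/offline distinction concrete, here is a minimal sketch of the two kinds of input. It is an illustration only and is not part of PSM: the function name `rasterize`, the tiny 5x5 grid, and the stroke coordinates are all made up for this example. Online data is assumed to be an ordered list of strokes, each a list of pen positions; the offline image is what remains after the strokes are drawn onto a pixel grid.

```python
def rasterize(strokes, width, height):
    """Draw online stroke data onto a binary image (offline-style input).

    strokes: list of strokes in writing order; each stroke is a list of
             (x, y) integer pen positions (an assumed, simplified format).
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        # Connect consecutive pen positions with a straight line of pixels.
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                x = round(x0 + (x1 - x0) * i / steps)
                y = round(y0 + (y1 - y0) * i / steps)
                image[y][x] = 1
    return image


# The character 十 as two strokes: the horizontal stroke, then the vertical one.
strokes = [[(0, 2), (4, 2)], [(2, 0), (2, 4)]]
image = rasterize(strokes, 5, 5)

# Writing the strokes in the opposite order yields exactly the same image.
same_image = rasterize(list(reversed(strokes)), 5, 5)
```

The two images are identical even though the stroke order differs, which shows the stroke order and direction that an online system can use and that an offline system never sees.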

There are a number of approaches to developing an offline handwriting recognition system. This project focuses on an offline recognition system that Nick Howe originally developed for English. The method of recognition is not particularly dependent on the language, so we are interested in seeing how the algorithm performs with Chinese characters.

Results

Below is the average recall/precision curve for PSM same-writer single-character matching for 2,401 data points using the CASIA 1.0/2.0 handwriting database.

[Figure: Total RP curve]
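
For readers unfamiliar with recall/precision curves, the sketch below shows one common way such a curve can be computed from ranked match scores. It is a generic illustration, not the evaluation code used in this project: the function name `precision_recall_curve`, the scores, and the labels are hypothetical, and it assumes that a higher score means a stronger match.

```python
def precision_recall_curve(scores, labels):
    """Return a list of (recall, precision) pairs, one per rank cutoff.

    scores: match strengths; higher is assumed to mean a stronger match.
    labels: True where the candidate really is the query character.
    """
    total_relevant = sum(1 for is_match in labels if is_match)
    if total_relevant == 0:
        raise ValueError("need at least one true match to compute recall")

    # Rank candidates from strongest to weakest match.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    curve = []
    true_positives = 0
    for rank, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            true_positives += 1
        recall = true_positives / total_relevant  # fraction of true matches found
        precision = true_positives / rank         # fraction of retrieved that are true
        curve.append((recall, precision))
    return curve


# Hypothetical scores for five candidate locations in a passage.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, False, True, False, False]
curve = precision_recall_curve(scores, labels)
```

Each point answers the question "if only the top-ranked candidates up to this cutoff are accepted, how many of the true matches were found (recall) and how many of the accepted candidates were correct (precision)?" An average curve like the one above could then be obtained by combining such per-query curves, though the exact averaging scheme used in the project is not specified here.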


Below are the recall/precision curves for the individual character matches over all of the writers. Each curve pertains to one of the following Chinese characters:

不, 是, 大, 这, 的, 中