Minyue Dai's CPT Project
OCR System of Historical Documents
This summer I interned in the Engineering Practicum program at Google. My project, an Optical Character Recognition (OCR) system for historical documents, was supervised by the GoogleOCR team under Google Research. My podmate Carrie and I worked on building a character recognition system optimized for historical documents in various languages.
Engineering Practicum is a 12-week internship program designed for first-year and sophomore students. Interns work in pairs and get help from two hosts. My podmate Carrie and I have similar backgrounds and related experience with OCR of historical manuscripts, so our collaboration turned out to be great. An interesting but stressful fact is that our GoogleOCR team is under Google Research, which means almost all the Googlers on the team have PhDs. My host told me they had never had undergraduate interns before, and he did have the same expectations for us as for graduate students. Another challenge was that their whole system is built in C++, which neither Carrie nor I had any experience with. There are also many internal tools at Google, and to be honest, we spent most of our first two weeks learning new tools and frameworks.