# Graph Based Matching for Word Spotting in Documents

...under construction...

**September**

- 09/03/2017 - 09/07/2017: Basic background research regarding thesis & forms for thesis
- 09/10/2017 - 09/16/2017: More extensive research & paper reading
- 09/29/2017 - 10/02/2017: Looked into autoPsm, runKwsExperiments, runSbKwsExperiments, processKwsDatasets to understand the process & continued paper reading --> more clarification needed

**October**

- 10/03/2017 - 10/08/2017: attended GHC - no progress
- 10/09/2017 - 10/14/2017: Finished basic MATLAB tutorials & looked at different problems with varying numbers of edges & nodes ( photo to be attached ... )
- 10/15/2017 - 10/21/2017: Attempted to develop a mathematical algorithm: looked into shortest-path algorithms & use of dynamic programming, and compared the candidates below. Conclusion: use Dijkstra's, because the edge costs are all positive (no negative-weight edges or cycles).

  1) Floyd-Warshall
  2) Dijkstra's
  3) Johnson's

- 10/22/2017 - 10/28/2017: Pivoted to the code in psmSymFragmMatch & psmSymMatch to understand the preexisting energy costs, identify the problem & ask Nick for clarifications --> concept of springs
- 10/29/2017 - 10/30/2017: Tried to generate bimg using initGW20 & learned MATLAB via actual coding. Pipeline: initGW20 creates the bimg dataset --> each bimg is sent to autoPsm(bimg{i}) to generate a psm --> visPsm(psm) visualizes the psm
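
The shortest-path comparison in the 10/15 entry settled on Dijkstra's. Below is a minimal sketch of that algorithm in Python, not the MATLAB code used in this project. The function name `dijkstra`, the adjacency-list layout, and the toy graph are all illustrative assumptions, and the sketch assumes every edge cost is non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return the cheapest path cost from source to every reachable node.

    graph maps each node to a list of (neighbor, cost) pairs.
    Assumption: all costs are non-negative.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, cost in graph.get(node, []):
            if cost < 0:
                raise ValueError("Dijkstra's requires non-negative edge costs")
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist


# Hypothetical toy graph, not taken from the thesis data.
toy_graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 5.0)],
    "c": [("d", 1.0)],
    "d": [],
}
distances = dijkstra(toy_graph, "a")
```

For comparison, Floyd-Warshall computes all pairs of shortest paths at once, and Johnson's adds a reweighting step so that graphs with negative edge costs can still be handled; neither is needed for a single source with non-negative costs.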

**November**

- 11/01/2017 - 11/04/2017: Successfully generated bimg for the George Washington dataset (many conflicts with paths & unfamiliarity with the MATLAB setup --> progress into jchung)
- 11/05/2017 - 11/12/2017: Running runKwsExperiments on the George Washington dataset while learning MATLAB & recording errors, path-related issues & complexities. Basically providing a spring constant. Potentially comparing performance of psmSymMatch and regular runKwsExperiments --> need to identify a subset of strings and a threshold to look at, and extract those to use. Problems regarding psmFit.

- 11/13/2017 - 11/21/2017

- psmPageMatch --> "error" because prcoessPix < 2*ncol, where ncol is size(skpage) --- turned out to be a path problem: set the C:\Matlab\Handwriting\psmPageMatch.m > H:\Matlab\Handwriting\psmPageMatch.m
- need a better understanding of psmFit ( what scalar value do I put )
- rank wtags by correct matches
- running jcGwProcessPart2 - does a run of all the gw queries (estimated 40 ~ 80 days to complete lol)
- jcMatchingWords.m - trying to find non-unique words -- or wrong matches...

- 11/21/2017 - 11/26/2017

- Thanksgiving Break (No Progress)

- 11/27/2017 - 12/01/2017

**December**

- Initial write up

**January**

- reread paper
- start of implementation - jcAlgo1
- edit of implementation (must add new autopsm with neighbors)
- geodesicPointDistances
- convertBimgsToSkeletons

**February**

- 01/29/2018 - Present

- developed psmFitScoresTest & ran gwScoreGenerator - calculating fit scores using safePsmFit - estimated to finish in 10 days
- instead, used preloaded data from C:\Matlab\Handwriting\GW20PsmWordSpotApproxV0p5.mat created by C:\Matlab\Handwriting\ICAR_Experiments.m, which has faster performance because of pageMatch and use of the GPU --> saved as scoresData.mat in working folder
- order/rank matches by fit scores
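
The ranking step (ordering matches by fit score) can be sketched as follows. This is Python for illustration only, not the project's MATLAB code. The function name `rank_matches`, the candidate labels, and the score values are all made up, and the sketch assumes a lower fit score means a better match, which this log does not state; pass `lower_is_better=False` if the opposite holds.

```python
def rank_matches(scores, lower_is_better=True):
    """Return (candidate, score) pairs ordered best-first.

    scores maps a candidate label to its fit score.
    Assumption: lower scores are better unless lower_is_better is False.
    """
    return sorted(
        scores.items(),
        key=lambda item: item[1],
        reverse=not lower_is_better,
    )


# Hypothetical fit scores for one query word; not real data from scoresData.mat.
example_scores = {"candidate_1": 0.82, "candidate_2": 0.15, "candidate_3": 0.47}
ranked = rank_matches(example_scores)
```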

Random Notes

- skpage (skeleton page) created by bwmorph
- initial processing of query prep took ~ 4 hrs: see runTestExample1.docx for details

Notable Errors

- contains.m --> myContains.m (shadowing other files dependent on it)
- psmPageMatch --> error because prcoessPix < 2*ncol, where ncol is size(skpage) --- turned out to be a path problem: set the C:\Matlab\Handwriting\psmPageMatch.m > H:\Matlab\Handwriting\psmPageMatch.m
- cellsourceIndex (within Matlab/Utility) --> add to path

Paths

- initGW20 - C:\Matlab\Handwriting