Handwriting Recognition & Deep Learning

From CSclasswiki

Introduction

This project applies deep learning models to Syriac research, continuing Minyue Dai's SURF project.

All Python scripts are in H:\Fall2017. All data files are in C:\Syriac.

Week 1 (Sep. 18 -- Sep. 24)

  • Thursday (Sep. 21)
Convert the data files into numpy form
Reg3RingLoc: Data file of scaled coordinates of the critical points of the Reg3 data; 60887*50, with each row being y1, x1, y2, x2, ..., y25, x25. (In numpy's matrix indexing, down is the positive x-axis and right is the positive y-axis.)
DataPrepare/criticalPData.py: Read and save both scaled and original critical points data in .npy form. (60887*50)
CPtoImg.py: Read the CP data, mark the points on the images, and save the images. (60887*3600)
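As a minimal sketch (using a small synthetic stand-in for the 60887*50 array, not the actual files), each flat row can be reshaped into 25 (y, x) pairs for easier per-point access:

```python
import numpy as np

# Stand-in for the 60887x50 array: each row stores the 25 critical points
# as y1, x1, ..., y25, x25, so reshaping to (N, 25, 2) gives per-point access.
flat = np.arange(4 * 50, dtype=np.float32).reshape(4, 50)
points = flat.reshape(-1, 25, 2)          # points[i, k] is the (y, x) of point k
ys, xs = points[..., 0], points[..., 1]   # y and x coordinates separately
```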


  • Friday (Sep. 22)
Met with professor.
Talked about applying an AutoEncoder-based model to multiple tasks simultaneously: classification, denoising, and finding critical points.
For labeling critical points there are two possible approaches: 1. output the 50 coordinates directly; 2. produce an image with the critical points labeled and train on it the way an AutoEncoder is trained.
Expand each CP to a 3x3 matrix to visualize it and make it better suited to image-based training. (25/3600 is too small a fraction.)
Expand and visualize CP
CP_expand.py: Expand the CP data into 3x3 matrices, visualize them, and compare against the Reg3 data to make sure the CP coordinates are correct.
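A possible sketch of the expansion, assuming 60x60 images (3600 = 60*60) and clipping at the borders; the function name is hypothetical, not the script's actual code:

```python
import numpy as np

def expand_cp(points, size=60):
    """Mark each critical point as a 3x3 block so it is visible in a
    size x size image; blocks are clipped at the image borders."""
    img = np.zeros((size, size), dtype=np.uint8)
    for y, x in points:
        y0, y1 = max(y - 1, 0), min(y + 2, size)
        x0, x1 = max(x - 1, 0), min(x + 2, size)
        img[y0:y1, x0:x1] = 1
    return img

# A corner point keeps only the 2x2 in-bounds part; an interior point gets 3x3.
img = expand_cp([(0, 0), (30, 30)])
```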

Week 2 (Sep. 25 -- Oct. 01)

  • Monday (Sep. 25)
The target image for denoising is StackB, so the CP data for StackB is used; the Reg3 data is scaled from StackB.
Maultitask_B_S_VP.py: Read contextG, StackB, and CP data simultaneously and store them in .npy form.
Build a multi-task model based on the AutoEncoder that combines classification, denoising, and locating CPs
multiTask_3task.py:
Combining the 3 tasks with the CP task outputting an image: the CP task fails because the target image is heavily skewed (only about 1% of pixels should be labeled as CP). Should try directly outputting the 25 CP coordinates, or solving the skewed-target problem.
Classification and denoising work fine.


  • Tuesday (Sep. 26)
Build a multi-task model that directly outputs the scaled coordinates
multiTask_3task_scaledCP.py:
Add an output to the encoder; this structure does not seem to work, because the cost does not converge.


  • Wednesday (Sep. 27)
Try a CNN just for locating CPs (outputting coordinates)
CP_StackB: It converges but the performance is not good.
The problem with this model is that the order of the CPs now matters, when it actually should not.


  • Friday (Sep. 29)
Try a CNN just for locating CPs (outputting an image)
CP_StackB_img: Rewrote the cost function to handle the skewed data. It converges, but the output tends to be the full letter rather than just the CPs; the CP data acts like a boundary for it. This may greatly improve the performance of the denoise model.
Try to build a denoise model with the help of CPs
Structure:
1. Encoder(CP)-Decoder(CP)-Encoder(Denoise)-Decoder(Denoise)
2. Cost = cpcost + denoisecost
Different AutoEncoder structures for denoise and CP:
denoise needs shared parameters but CP needs separate parameters.
The F-measure comparing target and output after training on CPs is 0.7.
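One common way to rewrite the cost for such skewed targets is to up-weight the rare positive class. This numpy sketch (the names and the exact weight are assumptions, not the script's actual code) shows why the reweighting stops "predict all zeros" from being a cheap optimum:

```python
import numpy as np

def weighted_bce(target, pred, pos_weight=99.0, eps=1e-7):
    """Binary cross-entropy with the rare positive class (CP pixels, ~1% of
    the image) up-weighted; pos_weight ~ negatives/positives is a common pick."""
    pred = np.clip(pred, eps, 1 - eps)
    per_pixel = -(pos_weight * target * np.log(pred)
                  + (1 - target) * np.log(1 - pred))
    return per_pixel.mean()

target = np.zeros(100); target[0] = 1.0    # ~1% positive, like the CP images
lazy = np.full(100, 0.01)                  # "predict nothing" output
found = np.where(target == 1, 0.9, 0.01)   # output that actually marks the CP
```

With the up-weighting, the output that finds the CP gets a much lower cost than the all-zeros one, which a plain unweighted cross-entropy would barely penalize.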

Week 3 (Oct.02 -- Oct. 08)

  • Monday (Oct. 02)
Read the paper on multi-task CNNs for facial landmark detection
Early stopping is important.
Tested classification on uncleaned images: accuracy is about 90%, much higher than in the multi-task model (70%).
  • Tuesday (Oct. 03)
Pair one target image (same class) with each uncleaned image for the denoise model
AugDenoise\AugDenoise\sameLabel.py: Randomly pick a same-label target image for each input.
Test accuracy is similar to CDDenoise, about 82%; no further improvement.
  • Wednesday (Oct. 04)
Build CP-Denoise model
CP_Denoise.py: This model chains the CP model and the denoise model.
Performance is similar to the conditional denoise model.
  • Friday (Oct. 06)
Discussed with professor
Make a table of the performance of the different models on classification, AutoEncoder, and CP.
Discussed how to make the z vector more independent of the class label and encode more information about writing style.

Week 4 (Oct.09 -- Oct. 15)

  • Tuesday (Oct. 10)
Run and write scripts and record performance of models (independent, two-task, three-task)
Independent: Classification: 0.89 (accuracy), Denoise: 0.79 (F), CP: 0.007 (reduce_mean)
2-task:
Class+CP: 0.76, 0.036
Class+Denoise: 0.76, 0.80
Denoise+CP: 0.83, 0.037
3-task: 0.73 (accuracy), Denoise: 0.78 (F), CP: 0.034
Collect all main scripts and data into one folder for publishing
Summer2017\Scripts: Main scripts for CBCDGAN, Denoise, and classification model.


  • Wednesday (Oct. 11)
Check the scripts that prepare the dating data
They have code for encoding manuscript information, but I only have the dataset for the handreg data, and I would prefer the reg3 data.
Visualize high-dimensional data to check how well it encodes letter and class-label information. t-SNE seems to be the state-of-the-art method.
Comparison of many methods
PCA and t-SNE in Python
Use t-SNE more effectively
sklearn library for t-SNE
t-SNE Implementation
Prepare the reg3vec data for t-SNE
TSNE\tsne_reg3vec_dataprepare.py: Take z vec and label data and randomly select 1500 samples from 3 classes.
zvec_3class_1500samples.npy: randomly selected 1500 z vec of 3 classes.
label_3class_1500samples.npy: labels(number) of randomly selected 1500 z vec of 3 classes.
random_3class_1500samples.npy: randomly generated (same distribution as z vec) vectors with same shape as zvec.
tsne_test_1500reg3zvec.py: Generate graphs for both the zvec and random data.
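A minimal t-SNE sketch with scikit-learn, using small random stand-in vectors in place of the saved .npy files (the real run uses 1500 samples from 3 classes):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
z = rng.normal(size=(150, 32)).astype(np.float32)  # stand-in z vectors
labels = rng.integers(0, 3, size=150)              # 3 classes, as in the labels file

# Project to 2D; scatter-plot emb colored by labels to inspect class separation.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(z)
```

Perplexity is the main knob to tune (roughly the effective number of neighbors); running the same projection on random vectors of the same shape, as the script does, gives a baseline for how much structure is real.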


  • Friday (Oct. 13)
Met with professor to discuss the models
Training multi-task models simultaneously decreases performance, but we can try training the tasks alternately
A new model to force the z vector to represent style information rather than label information
Encoder: Takes an image and outputs a z vector
Discriminator: Takes a pair of images and outputs whether they are from the same manuscript (same style).
Generator: Takes a z vector and generates fake images.

Week 5 (Oct.16 -- Oct. 22)

  • Monday (Oct. 16)
Try training the multi-task model alternately
Found no paper describing alternate training of multi-task models, so just experimented intuitively.
Tried 50 steps, 100 steps, 200 steps, and 1 epoch per task; they all perform the same, and performance decreases further.
Build the encodeGAN model.
reg3_manu.py : Get manuscript data, known manu index, and padding matrix for getting data from given manu.
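The alternating schedule tried above can be sketched as a plain training loop; everything below (the names, the no-op step callables) is hypothetical, not the project's TensorFlow code:

```python
# Alternating multi-task training: instead of summing all task losses each
# step, run k optimizer steps per task in rotation (the 50/100/200-step
# variants just change k). task_steps maps each task name to a callable
# that runs one optimizer step for that task.
def train_alternating(task_steps, tasks=("classify", "denoise", "cp"),
                      k=100, total=600):
    schedule = []                      # record which task each step trained
    step = 0
    while step < total:
        for task in tasks:
            for _ in range(min(k, total - step)):
                task_steps[task]()
                schedule.append(task)
                step += 1
            if step >= total:
                break
    return schedule

log = train_alternating({t: (lambda: None) for t in ("classify", "denoise", "cp")})
```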


  • Tuesday (Oct. 17)
Manipulate the data
reg3_manu.py: Modify this script to build the padding matrices for training and test data separately
Build GANEncoder model
GANEncoder\GANEncoder.py: Build the encoder-centered GAN model.


  • Wednesday (Oct. 18)
Check how GANEncoder works
GANEncoderIMG\GANEncoder_1.png: Train 20000 steps (seems not enough)
Row1: fake image with z-vector from x1 and label from x2
Row2: x1 (same manu of x2)
Row3: fake image with random fakeZ with label from x1
Row4: x2 (same manu of x1)
Row5: fake image with random fakeZ with label from x2
Update TensorFlow and fix the TensorBoard issue
Because TensorBoard does not show scalar information, TensorFlow needs to be upgraded to 1.3.0, and the command changes to python -m tensorboard.main --logdir=/log_path
Test different parameters of GANEncoder
Fix the wrong equation for accuracy
Separate pretraining for En&D and the AutoEncoder
The training error curve seems much more stable.
Should think about how to balance En&D and G; right now D blindly outputs 0 for all pairs because D_loss is skewed.
Think about normalizing the output of En so that it matches the distribution of the random z: ~N(0,1)
Reproduce image examples for the hip2017 paper
Denoise_hip2017.py: Just choose from 3 manuscripts, add imshowpair(), and save the images directly.


  • Thursday (Oct. 19)
Check and save GANEncoder model
GANEncoder.py: Revised version with separate pretrain steps (En+D and AutoEn). The training curve shows that training is much more stable, but D does not work because the input data (3 groups) is skewed: only 1 group is labeled 1 and 2 are labeled 0.
pretrainedGANEncoder: Model saved.
GANEncoder_seperatepretrain.png: Image examples; they show that images rebuilt from real-image z vectors have the most noise, which makes sense because the model does not force the z vector to be ~N(0,1).
Incorporate the idea from the VAE (Variational Autoencoder) to force the encoder output to be ~N(0,1)
Combination of VAE and GAN
Tensorflow implementation example
The KL cost seems to make the model diverge, so just adding BN at the end of the encoder seems to work
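For reference, the KL term a VAE adds to pull the encoder output toward N(0,1) can be written as follows (a numpy sketch, not the project's TensorFlow code); it vanishes exactly when the encoder already outputs mu = 0, sigma = 1:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ) per sample, summed over z dimensions:
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# An encoder that already outputs a standard normal pays zero penalty.
kl_zero = kl_to_standard_normal(np.zeros((1, 8)), np.zeros((1, 8)))
```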
  • Friday (Oct. 20)
GANEncoder_VAE with only a BN output layer seems to work
The training curve looks better, and there is less noise in the fake images
Generate grayscale image for denoise examples in HIP2017
Dataprepare\Denoise_gray: All grayscale examples


  • Saturday (Oct. 21)
Read a paper about face-vector extraction
Cosine Similarity
Correlation Distance


Train GANEncoder with the encoder and the G-D structure independently
GANEncoder_independent.py: Features are no longer fed into G; En_cost = Dist(same-source)/Dist(diff-source)
Train the encoder by itself on only real images to show that this does not converge
Encoder_JustRealImage.py:
Train GANEncoder on just the encoder and generator
GANEncoder_EnANDGe.py:
En_cost = Dist(same-source)/Dist(diff-source)
Ge_cost = 1 - En_cost
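The En_cost above can be sketched directly; this numpy version (function name assumed) returns the ratio of the same-source distance to the different-source distance, so Ge_cost = 1 - En_cost follows immediately:

```python
import numpy as np

def encoder_ratio_cost(z_anchor, z_same, z_diff, eps=1e-8):
    """En_cost = Dist(same-source) / Dist(diff-source). Minimizing it pulls
    z vectors from the same manuscript together and pushes different ones
    apart; eps avoids division by zero."""
    d_same = np.linalg.norm(z_anchor - z_same, axis=-1)
    d_diff = np.linalg.norm(z_anchor - z_diff, axis=-1)
    return float((d_same / (d_diff + eps)).mean())

# Same-manuscript pair close, different-manuscript pair far -> cost well below 1.
cost = encoder_ratio_cost(np.array([[0.0, 0.0]]),
                          np.array([[0.1, 0.0]]),
                          np.array([[1.0, 0.0]]))
```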


  • Sunday (Oct. 22)
Train the encoder by itself on only real images, but the model does not converge
Encoder_JustRealImage.py: The cost is unstable and never converges. Tensorboard: --logdir=/tmp/tensorflow_logs/GANencoder_JustRealImage1


Week 6 (Oct.23 -- Oct. 29)

  • Monday (Oct. 23)
Study the adversarial Encoder-Decoder model structure
https://arxiv.org/pdf/1704.02304.pdf


  • Wednesday (Oct. 25)
Implement Adversarial Learned Inference
Adversarial Learned Inference
ALI Github
ALI\ALI.py: Implement ALI with only CNN and DeCNN layers (based on the paper), but it does not work and just generates random noise (it lacks pretraining steps)


  • Thursday (Oct. 26)
Implement a new ALI model with pooling and fully connected layers
ALI_normal.py: model with ordinary pooling and fully connected layers; D and G are trained at different paces.
/tmp/tensorflow_logs/ALI_Normal4: tensorboard directory for learning-cost graphs
ALI_normal_30000.png: examples of fake images from ALI
Try to figure out why the fully CNN/DeCNN structure does not work
Maybe the authors use a residual network to solve the vanishing-gradient problem; the pure CNN structure is too deep for this task.


  • Friday (Oct. 27)
Train ALI_normal for 50000 rounds
pretrainedALI_normal_50000\model.ckpt: trained model
ALI_normal_50000.png: Images generated from ALI_normal
Discussed with professor
Try to evaluate each model (visualization)
Modify ALI so that D actually discriminates between images from the same batch versus real+fake pairs.


  • Sunday (Oct. 29)
Retrain GANEncoder_VAE with z-vector length 128
pretrained_EncoderVAE_128\model.ckpt: trained model
Get z-vector of test data on different models and visualize them
Reg3_TestZ: Folder for all scripts
dataprepare.py: Prepare data index and labels for visualization (10 highest-frequency manu+letter in test set)
zVec\target: Folder for all visualization data index, manu, and letter label
zVec\chosen: Folder for all chosen zvectors from different models.


Week 7 (Oct.30 -- Nov.05)

  • Monday (Oct. 30)
Save all t-SNE visualizations in Reg3_TestZ_img
ALI, GANEncoderVAE, VAE, Encoderonly
Test manuscript classification with a naive SVM
\SVM: Folder for all scripts and data on SVM test
EncoderOnly : 44%
VAE : 67%
ALI: 72%
GANEncoderVAE: 78%
ALIEncoder: 80%
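A minimal version of the naive-SVM test with scikit-learn, using synthetic z vectors with two well-separated manuscript "styles" in place of the real data (all names and numbers here are stand-ins):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in z vectors: two manuscript "styles" as shifted Gaussian clusters.
z = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(3, 1, (100, 16))])
manu = np.array([0] * 100 + [1] * 100)

z_tr, z_te, y_tr, y_te = train_test_split(z, manu, test_size=0.3,
                                          random_state=0, stratify=manu)
clf = SVC(kernel="rbf").fit(z_tr, y_tr)   # naive SVM baseline
acc = clf.score(z_te, y_te)               # higher = z encodes more style info
```

The point of the test is exactly this: if an SVM on frozen z vectors separates manuscripts well, the encoder has captured style information.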


  • Tuesday (Oct. 31)
Train a simple GAN
GAN: 71%
Want to run the SVM experiment on separate manuscripts
The previous train and test sets were split randomly, so letters from a manuscript in the test set may also have appeared in the training set. We want to hold out 40 manuscripts so that none of them appears in the training set, then use the trained model to get z vectors for this test set, and finally run SVM on them.
seperateManu.npy: selects the train and test sets and saves all the information
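A manuscript-disjoint split like this can be done with scikit-learn's GroupShuffleSplit (a sketch with synthetic data, not the seperateManu.npy code), which guarantees no manuscript appears in both sets:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 500
manu_id = rng.integers(0, 100, size=n)   # stand-in manuscript label per sample
X = rng.normal(size=(n, 16))             # stand-in features

# Hold out ~40% of the manuscripts entirely; the split is by group (manuscript),
# not by individual sample, so train and test manuscripts never overlap.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=manu_id))
overlap = set(manu_id[train_idx]) & set(manu_id[test_idx])
```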


  • Wednesday (Nov. 01)
Run the SVM test on the test data
Split the 40 manuscripts in the test data into 4 groups and run SVM on each group; within each group, 70% of the data is used to train the SVM and 30% to test it.
ALI:
group 1 0.7292576419213974
group 2 0.6885714285714286
group 3 0.7489051094890511
group 4 0.6767676767676768
average: 0.710875464187
GAN:
group 1 0.6914119359534207
group 2 0.7085714285714285
group 3 0.724087591240876
group 4 0.5925925925925926
average: 0.67916588709


  • Thursday (Nov. 02)
Work on SVM test
VAE:
group 1 0.33915574963609896
group 2 0.3242857142857143
group 3 0.3562043795620438
group 4 0.32323232323232326
average: 0.335719541679
Encoder:
group 1 0.3333333333333333
group 2 0.2814285714285714
group 3 0.35036496350364965
group 4 0.4225589225589226
average: 0.346921447706
  • Friday (Nov. 03)
GANEncoder:
group 1 0.9737991266375546
group 2 0.93
group 3 0.9503649635036496
group 4 0.8720538720538721
average: 0.931554490549
ALIEncoder:
group 1 0.9592430858806404
group 2 0.93
group 3 0.9693430656934306
group 4 0.8939393939393939
average: 0.938131386378

Week 8 (Nov. 06 -- Nov. 12)

  • Monday (Nov.06)
Try to figure out why the accuracy is unreasonably high
Because the letter label is strongly correlated with the manuscript
Try to solve it by selecting the 10 most-used letters across 10 manuscripts.
This makes the data more balanced.
New Data
Encoder: 0.4892
VAE: 0.4973
GAN: 0.5622
ALI: 0.5027
GANEncoder: 0.6351
ALIEncoder: 0.6486
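The balancing step above can be sketched as keeping only samples whose letter and manuscript are among the 10 most frequent (synthetic stand-in data, hypothetical names):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
letters = rng.integers(0, 22, size=1000)   # stand-in letter labels (22 groups)
manu = rng.integers(0, 30, size=1000)      # stand-in manuscript labels

# Keep only samples whose letter and manuscript are both in the top 10 by
# frequency, so letter usage is more comparable across the chosen manuscripts.
top_letters = [l for l, _ in Counter(letters.tolist()).most_common(10)]
top_manu = [m for m, _ in Counter(manu.tolist()).most_common(10)]
mask = np.isin(letters, top_letters) & np.isin(manu, top_manu)
balanced_idx = np.nonzero(mask)[0]
```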
  • Thursday (Nov. 09)
Work with large image data

Week 9 (Nov. 13 -- Nov. 19)

We have all the data we need, but Professor Nich is in Japan now. I am waiting for him to return and working on another project (a multi-task model on the Yelp data set).


Week 10 (Nov. 20 -- Nov. 26)

  • Monday (Nov. 20)
Talked with professor
We separate identical letters that appear in different forms, since the forms should contain the most information, so we should relabel all the data and retrain all the networks we have.
Start writing a paper on this project.


Relabel all letters (now we have just 22 groups of letters)
DataPrepare\relabel.npy: Relabel all letters
  • Saturday (Nov. 25)
Finish retrain all models with new label and test them with SVM
VAE Test:
group 1 0.28966521106259097
group 2 0.2742857142857143
group 3 0.32408759124087594
group 4 0.265993265993266
average: 0.288507945646
Encoder Test:
group 1 0.5516739446870451
group 2 0.4714285714285714
group 3 0.6554744525547446
group 4 0.5336700336700336
average: 0.553061750585
GAN Test:
group 1 0.7481804949053857
group 2 0.6914285714285714
group 3 0.6963503649635037
group 4 0.6835016835016835
average: 0.7048652787
ALI Test:
group 1 0.7918486171761281
group 2 0.7371428571428571
group 3 0.743065693430657
group 4 0.6481481481481481
average: 0.730051328974
GANEncoder Test:
group 1 0.8005822416302766
group 2 0.7357142857142858
group 3 0.8014598540145985
group 4 0.7323232323232324
average: 0.767519903421
ALI_Encoder Test:
group 1 0.8078602620087336
group 2 0.8
group 3 0.8248175182481752
group 4 0.7626262626262627
average: 0.798826010721