Handwriting Recognition & Machine Learning; Minyue Dai

From CSclasswiki

Introduction

This project applies deep learning models to Syriac research, continuing Minyue Dai's project.

All Python scripts are in H:\Spring2017. All data files are in C:\Syriac.

Week 1 (Jan. 29 -- Feb. 04)

  • Monday (Jan. 29)
Try to compress the latent vector data we have.
t-SNE: It runs too slowly, and I could not find a GPU version of it.
PCA: It does not perform well.
  • Wednesday (Jan. 31)
Try applying KNN (k-nearest neighbors) to the SVM test I did in the fall semester, to see exactly what happens.
KNN\KNN.py: Script that runs KNN classification.
KNN\knnneightbors.py: Script that calculates neighbors.
The classifier seems to perform best when n=3, but because we want to see what happens, the script calculates the 10 nearest neighbors.
Idea: use a softmax function to convert KNN distances into probabilities, so that we can calculate an exact year for each sample.
  • Thursday (Feb. 01)
Use a softmax function to convert the distances to the 10 nearest neighbors into probabilities, then compute the date.
KNN\knnneightbors.py: Add another function to compute the probabilities.
  • Friday (Feb. 02)
Talk with the professor and decide to use KNN and softmax to transform distances into probabilities, and then date the manuscripts.
Date\manu_date.py: Converts the date information in .xls format into a NumPy array.
  • Sunday (Feb. 04)
Continue working on KNN Dating
choose_manu.py: Chooses the training and test indices for the dating test (36,000 for training, 7,000 for testing).
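A minimal sketch of this week's KNN-plus-softmax dating idea follows. It is an illustration under assumptions, not the code in KNN\knnneightbors.py or choose_manu.py: the function names `choose_split` and `knn_softmax_date`, the `temperature` parameter, applying the softmax to negative distances (so closer neighbors get higher probability), and taking the probability-weighted mean of the neighbors' years as the date are all assumptions. The demo at the bottom uses random stand-in data, not the Syriac data.

```python
import numpy as np


def choose_split(n_samples, n_train, n_test, seed=0):
    """Hypothetical helper: pick disjoint random train/test indices."""
    if n_train + n_test > n_samples:
        raise ValueError("not enough samples for the requested split")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return perm[:n_train], perm[n_train:n_train + n_test]


def knn_softmax_date(train_x, train_years, query, k=10, temperature=1.0):
    """Estimate a year for `query` from its k nearest training vectors.

    Returns (estimated_year, neighbor_indices, neighbor_probabilities).
    """
    train_x = np.asarray(train_x, dtype=float)
    train_years = np.asarray(train_years, dtype=float)
    # Euclidean distance from the query to every training vector
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    # Indices of the k closest training samples, nearest first
    nn_idx = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbors get larger weights.
    # Subtracting the max logit keeps exp() numerically stable.
    logits = -dists[nn_idx] / temperature
    logits -= logits.max()
    weights = np.exp(logits)
    probs = weights / weights.sum()
    # Date = probability-weighted average of the neighbors' years
    return float(probs @ train_years[nn_idx]), nn_idx, probs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    vectors = rng.normal(size=(100, 8))        # stand-in latent vectors
    years = rng.integers(400, 1200, size=100)  # stand-in manuscript years
    train_idx, test_idx = choose_split(len(vectors), 80, 20)
    year, _, _ = knn_softmax_date(vectors[train_idx], years[train_idx],
                                  vectors[test_idx[0]], k=10)
    print(f"estimated year for first test sample: {year:.1f}")
```

With a small `temperature` the estimate approaches the year of the single nearest neighbor; with a large one it approaches the plain average of the k neighbors' years.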

Week 2 (Feb. 05 -- Feb. 11)

  • Monday (Feb. 05)
Evaluate some special properties of the different latent codes.
KNN_date.py: Dates test letters based on their 10 nearest neighbors.
Letters encoded by ALIEncoder cluster together.
  • Thursday (Feb. 08)
Try dating letters based on the center of a group of images (same manuscript, same label).
KNN_center.py: Prepares the data for dating (computes the centers).
KNN_centerdate.py: Dates manuscripts based on the centers.
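The center-based dating can be sketched roughly as below. This is a minimal illustration, not the contents of KNN_center.py or KNN_centerdate.py: the names `group_centers` and `date_by_nearest_center`, using the mean latent vector as a group's "center", and comparing a query letter only against centers that share its letter label are assumptions.

```python
import numpy as np
from collections import defaultdict


def group_centers(vectors, manuscript_ids, labels):
    """Mean latent vector for every (manuscript, letter label) group."""
    groups = defaultdict(list)
    for vec, m, lab in zip(vectors, manuscript_ids, labels):
        groups[(m, lab)].append(vec)
    return {key: np.mean(vecs, axis=0) for key, vecs in groups.items()}


def date_by_nearest_center(query, label, centers, manuscript_years):
    """Return the year of the manuscript whose same-label center is closest."""
    best_key, best_dist = None, np.inf
    for (m, lab), center in centers.items():
        if lab != label:
            continue  # only compare against centers of the same letter
        dist = np.linalg.norm(np.asarray(query, dtype=float) - center)
        if dist < best_dist:
            best_key, best_dist = (m, lab), dist
    if best_key is None:
        raise ValueError(f"no center found for label {label!r}")
    return manuscript_years[best_key[0]]
```

Averaging each group first smooths out variation between individual letter images, at the cost of needing several letters with the same label from each manuscript.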

Week 4 (Feb. 26 -- Mar. 04)

  • Monday (Feb. 26)
Generate data that treats manuscripts dated within 50 years of each other as the "same" manuscript.
Groups of manuscripts by the same author:
31 33 7
74 17
18 21 (different letters chosen)
12 13 14 15
51 86 10 97
  • Wednesday (Feb. 28)
Try FaceNet, which is also based on the Euclidean distance between "similar" and "different" samples.
Results:
group 1: 0.5516739446870451
group 2: 0.4714285714285714
group 3: 0.6554744525547446
group 4: 0.5336700336700336
mean: 0.553061750585
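The FaceNet objective combined with this week's 50-year definition of "same" can be sketched as below. FaceNet's triplet loss pulls an anchor toward a positive ("same") example and pushes it away from a negative ("different") one, using squared Euclidean distance on L2-normalized embeddings, until the gap exceeds a margin. This is a minimal NumPy illustration, not the code behind the results above: the helper names (`same_period`, `l2_normalize`, `triplet_loss`, `sample_triplet`), treating the 50-year window as inclusive, and the margin value of 0.2 are assumptions.

```python
import numpy as np


def same_period(year_a, year_b, window=50):
    """Treat two manuscripts as "same" if their dates differ by at most `window` years."""
    return abs(year_a - year_b) <= window


def l2_normalize(x, eps=1e-12):
    """Project an embedding onto the unit hypersphere."""
    x = np.asarray(x, dtype=float)
    return x / max(np.linalg.norm(x), eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: max(||a - p||^2 - ||a - n||^2 + margin, 0)."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    pos_dist = np.sum((a - p) ** 2)
    neg_dist = np.sum((a - n) ** 2)
    return float(max(pos_dist - neg_dist + margin, 0.0))


def sample_triplet(years, anchor_idx, rng):
    """Hypothetical helper: pick a positive inside the window and a negative outside it."""
    anchor_year = years[anchor_idx]
    same = [i for i, y in enumerate(years)
            if i != anchor_idx and same_period(anchor_year, y)]
    diff = [i for i, y in enumerate(years) if not same_period(anchor_year, y)]
    if not same or not diff:
        return None  # no valid triplet for this anchor
    return anchor_idx, int(rng.choice(same)), int(rng.choice(diff))
```

A triplet whose negative is already farther away than the positive by more than the margin contributes zero loss, which is why FaceNet-style training usually focuses on selecting harder triplets.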