Handwriting Recognition & Machine Learning; Minyue Dai
Introduction
This project applies deep learning models to Syriac research, continuing Minyue Dai's project.
All Python scripts are in H:\Spring2017. All data files are in C:\Syriac.
Week 1 (Jan. 29 -- Feb. 04)
- Monday (Jan. 29)
- Try to compress the latent vector data we have
- t-SNE: It runs too slowly, and I failed to find a GPU version of it.
- PCA: It does not perform well.
- Wednesday (Jan. 31)
- Try to apply KNN (k-nearest neighbors) to the SVM test I did in the fall semester to see exactly what happens.
- KNN\KNN.py: Script that runs KNN classification.
- KNN\knnneightbors.py: Script that calculates neighbors.
- It seems the classifier performs best when n=3, but because we want to see what happens, the script calculates the 10 nearest neighbors.
- Maybe use the softmax function to convert KNN distances to probabilities so that we can calculate an exact year for the samples.
- Thursday (Feb. 01)
- Use the softmax function to convert the 10 nearest-neighbor distances into probabilities and then compute the date.
- KNN\knnneightbors.py: Add another function to compute probability.
- Friday (Feb. 02)
- Talk with the professor and decide to use KNN and softmax to transform distances into probabilities and then date manuscripts.
- Date\manu_date.py: Convert date information from .xls format into a NumPy array.
- Sunday (Feb. 04)
- Continue working on KNN Dating
- choose_manu.py: Choose the indices of the train and test data for the dating test (36000 for training, 7000 for testing).
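The KNN-plus-softmax dating idea from this week can be sketched roughly as follows. This is only a minimal illustration, not the contents of KNN\knnneightbors.py or KNN_date.py: the function name `knn_softmax_date`, the `temperature` parameter, and the choice of a probability-weighted mean year as the final estimate are all assumptions made for the example.

```python
import numpy as np

def knn_softmax_date(train_codes, train_years, query, k=10, temperature=1.0):
    """Estimate a year for `query` from its k nearest training codes.

    Hypothetical helper: a smaller distance gives a larger weight,
    using softmax(-distance / temperature).
    """
    train_codes = np.asarray(train_codes, dtype=float)
    train_years = np.asarray(train_years, dtype=float)
    # Euclidean distance from the query to every training code.
    dists = np.linalg.norm(train_codes - np.asarray(query, dtype=float), axis=1)
    nn = np.argsort(dists)[:k]           # indices of the k nearest neighbors
    logits = -dists[nn] / temperature    # negate so near neighbors score high
    logits -= logits.max()               # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    estimate = float(probs @ train_years[nn])  # probability-weighted mean year
    return estimate, probs
```

Because latent-space distances have an arbitrary scale, the `temperature` value would probably need tuning: a large value makes all 10 neighbors count almost equally, while a small value lets the single nearest neighbor dominate.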
Week 2 (Feb. 05 -- Feb. 11)
- Monday (Feb. 05)
- Evaluate some special properties of the different codes
- KNN_date.py: Date test letters based on the 10 nearest neighbors.
- ALIEncoder's letters clustered together.
- Thursday (Feb. 08)
- Try to date letters based on the center of one group of images (same manuscript, same label)
- KNN_center.py: Prepare data for dating (Compute the center)
- KNN_centerdate.py: Date manuscript based on center.
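The center-based approach might look something like the sketch below. It is not the code in KNN_center.py or KNN_centerdate.py: the helper names `group_centers` and `nearest_manuscript` are hypothetical, and matching a query center to the closest training center with the same label is an assumption about how the comparison is done.

```python
from collections import defaultdict

import numpy as np

def group_centers(codes, manuscript_ids, labels):
    """Return {(manuscript_id, label): mean latent vector} for each group."""
    buckets = defaultdict(list)
    for vec, manuscript, label in zip(codes, manuscript_ids, labels):
        buckets[(manuscript, label)].append(vec)
    return {key: np.mean(vecs, axis=0) for key, vecs in buckets.items()}

def nearest_manuscript(centers, query_center, label):
    """Return the manuscript id whose center for `label` is closest to the query.

    Assumption: only centers with the same letter label are compared.
    """
    candidates = [
        (np.linalg.norm(center - query_center), manuscript)
        for (manuscript, lab), center in centers.items()
        if lab == label
    ]
    return min(candidates, key=lambda pair: pair[0])[1]
```

The date of the returned manuscript could then be looked up in the array produced by Date\manu_date.py and assigned to the query group.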
Week 4 (Feb. 26 -- Mar. 04)
- Monday (Feb. 26)
- Generate data that defines "same" manuscripts as manuscripts within 50 years of each other
- Groups of same author
- 31 33 7
- 74 17
- 18 21 (different letters chosen)
- 12 13 14 15
- 51 86 10 97
- Wednesday (Feb. 28)
- Try FaceNet, which is also based on the Euclidean distance between "similar" and "different" samples.
- FaceNet
- Results:
- group 1 0.5516739446870451
- group 2 0.4714285714285714
- group 3 0.6554744525547446
- group 4 0.5336700336700336
- mean 0.553061750585
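For reference, the distance-based objective that FaceNet trains with is the triplet loss, which pushes an anchor closer to a "similar" sample than to a "different" one by at least a margin. Below is a minimal NumPy sketch of that loss, together with a 50-year "same manuscript" rule like the one used on Monday. The function names, the inclusive 50-year boundary, the 0.2 margin, and averaging over the batch are assumptions for illustration, not the settings used in this experiment.

```python
import numpy as np

def same_manuscript(year_a, year_b, window=50):
    """Treat two manuscripts as "same" if their dates differ by at most `window` years.

    Assumption: the boundary is inclusive.
    """
    return abs(year_a - year_b) <= window

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss on batches of embeddings (one row per sample)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # squared distance to "similar"
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # squared distance to "different"
    # A triplet contributes zero once the negative is farther than the positive by `margin`.
    return np.maximum(pos_dist - neg_dist + margin, 0.0).mean()
```

In this setting a triplet would pair an anchor letter with a letter from a "same" manuscript (positive) and a letter from a manuscript outside the window (negative).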