This summer, I did a project on cross-lingual NLP tasks. Recently I went through my notes and organized them into a better format. I would like to share some of them with readers who might be interested in this topic.
In the few years since my Master’s, I have had many jobs: long-term, short-term, internships, and full-time positions. I also went through a lot of interviews, some of which I failed. Together with my friends, I collected many materials, including basic algorithms, popular questions, basic machine learning knowledge, and deep learning knowledge. Then I organized them into one huge PDF (150+ pages).
A very brief outline:
- Data structures + popular questions
- Machine Learning
- SoftDev interview questions
The material includes some screenshots from other people’s lectures and books. [Some slide pages are not in English! I am too lazy to translate them.]
I went through this PDF before every interview, in case I had to answer questions like “what is knn”. I hope you find the material useful. Download link:
Recently, I have been working on a new version that adds more deep learning basics.
New items to be added: merge sort; sorting code in Python; the Boyer-Moore majority vote algorithm.
In this short post, I will share a very brief GAN (Generative Adversarial Network) model and show how to train it in practice using PyTorch. I will also include some training tips, since I found GANs hard to train myself, especially when working with my own data and model.
Training GAN models
I wrote a blog post on understanding GAN models before; check it out. You can also find the official PyTorch tutorial here. We will focus on the official tutorial, and I will try to share my understanding of the main steps along with some tips. Continue reading “Deep Learning 18: GANs with PyTorch”
If you are a big fan of PyTorch and NLP, you should try the PyTorch-based BERT implementation! If you have your own dataset and want to try a state-of-the-art model, BERT is a good choice.
Please check the code at https://github.com/huggingface/pytorch-pretrained-BERT for a closer look. In this post, I will help you apply a pre-trained BERT model to your own data for classification. Continue reading “Deep Learning 17: text classification with BERT using PyTorch”
My post about Auto-encoder.
For Variational Auto-Encoders (VAE) (from the paper Auto-Encoding Variational Bayes), we add latent variables to the existing Autoencoder. The main idea is to constrain the latent variables to follow a known distribution (typically a standard Gaussian). Why do we want this? We want the generative model to produce more “creative” outputs. If the model only ever reproduces the training samples, it will eventually lose the ability to “create” anything new! So we add some “noise” to the latent representation by forcing it to follow a known distribution.
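To make the "noise from a known distribution" idea concrete, here is a minimal sketch in plain Python. It is not the code from the post or the paper. It assumes a diagonal Gaussian latent with a standard normal prior, and the function names `reparameterize` and `kl_to_standard_normal` are hypothetical names chosen for illustration. The first function draws a noisy latent sample, and the second computes the closed-form KL penalty that pulls the latent distribution toward the prior.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps per dimension, with eps ~ N(0, 1).

    mu and log_var are lists holding the mean and log-variance that an
    encoder would output (here they are simply given by hand).
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0
        for m, lv in zip(mu, log_var)
    )


# Hypothetical encoder outputs for a single 2-dimensional latent code.
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, -2.0]

z = reparameterize(mu, log_var, rng)      # noisy latent sample
kl = kl_to_standard_normal(mu, log_var)   # regularization term
print("z =", z)
print("KL =", kl)
```

In a full VAE, the KL term would be added to the reconstruction loss, so the encoder is rewarded for keeping its latent distribution close to the prior while still reconstructing the input well.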
Continue reading “Understanding Variational Graph Auto-Encoders”
Corpora for general medical texts
Over 39 million published research papers in Computer Science, Neuroscience, and Biomedicine.
Full dataset: 36 GB, not restricted.
Continue reading “Resources for BioNLP: datasets and tools”