# Expanded AAN.how: Learning NLP Made Much Easier!

## More resources!

We have almost doubled the number of manually collected resources since our previous release, now totaling over 13,000.

# Slideshare (6): Cross-lingual Paper reading notes

During this summer, I did a project on cross-lingual NLP tasks. Recently I went back over my notes and organized them into a better format. I would like to share some of these notes with readers who might be interested in this topic.

Cross_lingual_NLP(PDF)

Papers covered:

# Deep Learning 17: text classification with BERT using PyTorch

## Why BERT

If you are a big fan of PyTorch and NLP, you should try the PyTorch-based BERT implementation! If you have your own dataset and want to try a state-of-the-art model, BERT is a good choice.
Please check the code at https://github.com/huggingface/pytorch-pretrained-BERT to take a closer look. In this post, I will help you apply a pre-trained BERT model to your own data for classification. Continue reading “Deep Learning 17: text classification with BERT using PyTorch”
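The overall shape of such a fine-tuning script can be sketched as below. This is a minimal sketch, not the post's actual code: the `texts`/`labels` arguments and the `make_batches` helper are hypothetical names, the attention mask is omitted for brevity, and the heavy imports are kept inside the function because loading the model requires the pytorch-pretrained-BERT package and a weight download.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches (the last one may be smaller)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def fine_tune(texts, labels, num_labels=2, epochs=3, batch_size=16):
    """Sketch of fine-tuning BERT for sequence classification with
    pytorch-pretrained-BERT. Imports are local so the helper above
    stays usable without torch installed."""
    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification, BertAdam

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                          num_labels=num_labels)
    optimizer = BertAdam(model.parameters(), lr=2e-5)

    # Each text becomes ["[CLS]"] + wordpieces + ["[SEP]"], mapped to ids.
    ids = [tokenizer.convert_tokens_to_ids(["[CLS]"] + tokenizer.tokenize(t) + ["[SEP]"])
           for t in texts]

    model.train()
    for _ in range(epochs):
        for batch_ids, batch_y in zip(make_batches(ids, batch_size),
                                      make_batches(labels, batch_size)):
            # Pad each batch to its longest sequence (no attention mask here,
            # which a real script should add).
            maxlen = max(len(x) for x in batch_ids)
            input_ids = torch.tensor([x + [0] * (maxlen - len(x)) for x in batch_ids])
            y = torch.tensor(batch_y)
            loss = model(input_ids, labels=y)  # this model returns the loss when labels are given
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

The batching helper is the only part that runs without the library; the rest follows the tokenize → convert-to-ids → pad → forward/backward loop that any BERT classification script needs.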

# Resources for BioNLP: datasets and tools

## Corpora for general medical texts

### Open Research Corpus

Over 39 million published research papers in computer science, neuroscience, and biomedicine.
The full dataset is 36 GB and is not restricted.

# Working with ROUGE 1.5.5 Evaluation Metric in Python

If you use the ROUGE evaluation metric for text summarization or machine translation systems, you must have noticed that there are many versions of it. So how do you get it to work with your own systems in Python? Which packages are helpful? In this post, I will share some ideas from an engineering point of view (which means I am not going to explain what ROUGE is). I also ran into a few issues and finally got them solved. My methods may not be the best, but they worked.
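For orientation, the core computation is small: the official ROUGE-1.5.5 is a Perl script with many options, but ROUGE-1 itself is just clipped unigram overlap. A minimal pure-Python illustration (whitespace tokenization only, no stemming or stopword handling as the Perl script offers):

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a candidate unigram is matched at most as many
    # times as it appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

p, r, f = rouge_1("the cat sat on the mat", "the cat is on the mat")
```

Here five of the six candidate unigrams are covered, so precision and recall are both 5/6. Real evaluations should still go through a tested package rather than this sketch.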

# NLP 05: From Word2vec to Doc2vec: a simple example with Gensim

#### Introduction

First introduced by Mikolov 1 in 2013, word2vec learns distributed representations of words (word embeddings) with a neural network. It is based on the distributional hypothesis: words that occur in similar contexts (with similar neighboring words) tend to have similar meanings. There are two models: CBOW (continuous bag of words), where we use a bag of context words to predict a target word, and skip-gram, where we use one word to predict its neighbors. For more, although not highly recommended, have a look at the TensorFlow tutorial here. Continue reading “NLP 05: From Word2vec to Doc2vec: a simple example with Gensim”

# NLP 04: Log-Linear Models for Tagging Task (Python)

We will focus on POS tagging in this blog.

##### Notations

An HMM gives us a joint probability over tags and words: $p(t_{[1:n]}, w_{[1:n]})$. Tags $t$ and words $w$ are in one-to-one correspondence, so the two sequences share the same length.
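To make the joint probability concrete before moving to log-linear models, here is a toy HMM with invented numbers (the tag set {D, N}, the probability tables, and the two-word sentence are all illustrative, not estimated from data):

```python
# Toy HMM: start probabilities, tag-transition probabilities,
# and word-emission probabilities. All values are made up.
start = {"D": 0.6, "N": 0.4}
trans = {("D", "N"): 0.9, ("D", "D"): 0.1, ("N", "N"): 0.3, ("N", "D"): 0.7}
emit = {("D", "the"): 0.8, ("N", "dog"): 0.5}

def joint_prob(tags, words):
    """p(t_[1:n], w_[1:n]) = q(t_1) e(w_1|t_1) * prod_i q(t_i|t_{i-1}) e(w_i|t_i)."""
    p = start[tags[0]] * emit[(tags[0], words[0])]
    for i in range(1, len(tags)):
        p *= trans[(tags[i - 1], tags[i])] * emit[(tags[i], words[i])]
    return p

prob = joint_prob(["D", "N"], ["the", "dog"])
```

For the sequence above this multiplies 0.6 × 0.8 × 0.9 × 0.5 = 0.216, and both sequences must indeed have the same length for the products to line up.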