If you use ROUGE Evaluation metric for text summarization systems or machine translation systems, you must have noticed that there are many versions of them. So how to get it work with your own systems with Python? What packages are helpful? In this post, I will give some ideas based on engineering’s view (which means I am not going to introduce what is ROUGE). I also suffered from few issues and finally got them solved. My methods might not be the best ways but they worked.