RNN, LSTM in TensorFlow for NLP in Python

We covered RNNs for MNIST data, but they are even better suited to NLP projects. You can find more details in Valentino Zocca, Gianmario Spacagna, and Daniel Slater’s book Python Deep Learning.
from __future__ import print_function, division
# -*- coding: utf-8 -*-
### “War and Peace” contains more than 500,000 words, making it the perfect
### candidate […]



Deep Learning Paper Sparks Online Feud!

Feature image created by Jannoon028 – Freepik.com
Researchers Yoav Goldberg and Yann LeCun face off on Natural Language Processing
Social media is humanity’s new intellectual battlefield. Sports fans, social justice warriors, and even the President of the United States tweet, take to discussion boards, or make memes to mock, preach, thrust, and parry against […]



Word2vec: Google’s New Leap Forward on the Vectorized Representation of Words

Introduction
Word2vec is an open source tool developed by a group of Google researchers led by Tomas Mikolov in 2013. It provides several efficient ways to represent words as M-dimensional real vectors, also known as word embeddings, which are of great importance in many natural language processing applications. Word2vec also expresses the quality of the […]



Preprocess in Python-Scale

We use data from the sklearn library, and the editor is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true
### Preprocess methods
### The StandardScaler in scikit-learn ensures that for each feature, the mean is zero,
### and the variance is one, bringing all features to the same magnitude. However,
### this scaling does not ensure […]
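To make the zero-mean, unit-variance idea in the excerpt concrete, here is a minimal sketch in plain Python. It is not the book's code and not scikit-learn's implementation; the function name `standardize` and the sample values are made up for illustration. It uses the population standard deviation (dividing by n), which is the convention StandardScaler follows.

```python
from statistics import fmean, pstdev


def standardize(columns):
    """Rescale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        mu = fmean(col)
        sigma = pstdev(col)  # population standard deviation (divides by n)
        # A constant column has sigma == 0; map it to zeros instead of dividing by zero.
        scaled.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    return scaled


# Hypothetical data: two features on very different scales.
features = [
    [1.0, 2.0, 3.0, 4.0],          # small-magnitude feature
    [100.0, 300.0, 500.0, 700.0],  # large-magnitude feature
]

scaled = standardize(features)
for col in scaled:
    # After scaling, each column's mean is ~0 and its standard deviation is ~1.
    print(round(fmean(col), 6), round(pstdev(col), 6))
```

After scaling, both columns have the same magnitude even though the raw values differed by two orders of magnitude, which is the effect the excerpt describes.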



Language understanding remains one of AI’s grand challenges

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: David Ferrucci on the evolution of AI systems for language understanding. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In […]



Mobvoi and Chinese Academy of Sciences’ Institute of Automation join forces and create the Joint Laboratory of Language Intelligence and Human Machine Interaction

On March 29, the “Joint Laboratory of Language Intelligence and Human Machine Interaction” opened its doors in Beijing. This lab is the result of a collaboration between Chinese AI company Mobvoi and the NLP and Machine Translation research teams from the National Laboratory of Pattern Recognition at the Chinese Academy of Sciences’ Institute of Automation […]



Natural language analysis using Hierarchical Temporal Memory

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Francisco Webber on building HTM-based enterprise applications. In this episode of the […]



A Practical Guide to Neural Machine Translation

Video link: https://www.microsoft.com/en-us/research/video/practical-guide-neural-machine-translation-2/
Intro: This talk focuses on engineering techniques for large-scale NMT systems. It helps us understand how GPUs work and how GPU computing makes machine translation faster. For example, how did engineers reduce the training time for a state-of-the-art NMT system from 15 days to 3 days on a single GPU?
Summary […]



A Quick Sentiment Analysis Example with Tidy Text Package in R

Find the data here: https://charleshsliao.wordpress.com/2017/03/03/a-sms-spam-test-with-naive-bayes-in-r-with-text-processing/
If we want to, we can explore the sentiment of Ham and Spam messages separately. I chose not to filter like this.
rawtext <- read.csv("HamorSpam.csv", header=F, sep=",", stringsAsFactors = F)
str(rawtext)
## 'data.frame':    5572 obs. of  2 variables:
##  $ V1: chr  "ham" "ham" "spam" "ham" ...
##  $ V2: chr  "Go until jurong point, […]
