Good Contents Are Everywhere, But Here, We Deliver The Best of The Best.Please Hold on!
Your address will show here +12 34 56 78

Our researchers at AYLIEN keep abreast of and contribute to the latest developments in the field of Machine Learning. Recently, two of our research scientists, John Glover and Sebastian Ruder, attended NIPS 2016 in Barcelona, Spain. In this post, Sebastian highlights some of the stand-out papers and trends from the conference.


The Conference on Neural Information Processing Systems (NIPS) is one of the two top conferences in machine learning. It took place for the first time in 1987 and is held every December, historically in close proximity to a ski resort. This year, it took place in sunny Barcelona. The conference (including tutorials and workshops) went on from Monday, December 5 to Saturday, December 10. The full conference program is available here.

Machine Learning seems to become more pervasive every month. However, it is still sometimes hard to keep track of the actual extent of this development. One of the most accurate barometers for this evolution is the growth of NIPS itself. The number of attendees skyrocketed at this year’s conference growing by over 50% year-over-year.


Image 1: The growth of the number of attendees at NIPS follows (the newly coined) Terry’s Law (named after Terrence Sejnowski, the president of the NIPS foundation; faster growth than Moore’s Law)

Unsurprisingly, Deep Learning (DL) was by far the most popular research topic, with about every fourth of more than 2,500 submitted papers (and 568 accepted papers) dealing with deep neural networks.


Image 2: Distribution of topics across all submitted papers (Source: The review process for NIPS 2016)

On the other hand, the distribution of research paper topics has quite a long tail and reflects the diversity of topics at the conference that span everything from theory to applications, from robotics to neuroscience, and from healthcare to self-driving cars.

Generative Adversarial Networks

One of the hottest developments within Deep Learning was Generative Adversarial Networks (GANs). The minimax game playing networks have by now won the favor of many luminaries in the field. Yann LeCun hails them as the most exciting development in ML in recent years. The organizers and attendees of NIPS seem to side with him: NIPS featured a tutorial by Ian Goodfellow about his brainchild, which led to a packed main conference hall.


Image 3: A full conference hall at the GAN tutorial

Though a fairly recent development, there are many cool extensions of GANs among the conference papers:

  • Reed et al. propose a model that allows you to specify not only what you want to draw (e.g. a bird) but also where to put it in an image.
  • Chen et al. disentangle factors of variation in GANs by representing them with latent codes. The resulting models allow you to adjust e.g. the type of a digit, its breadth and width, etc.

In spite of their popularity, we know alarmingly little about what makes GANs so capable of generating realistic-looking images. In addition, making them work in practice is an arduous endeavour and a lot of (undocumented) hacks are necessary to achieve the best performance. Soumith Chintala presents a collection of these hacks in his “How to train your GAN” talk at the Adversarial Training workshop.


Image 4: How to train your GAN (Source: Soumith Chintala)

Yann LeCun muses in his keynote that the development of GANs parallels the history of neural networks themselves: They were poorly understood and hard to get to work in the beginning and only took off once researchers figured out the right tricks and learned how to make them work. At this point, it seems unlikely that GANs will experience a winter anytime soon; the research community is still at the beginning in learning how to make the best use of them and it will be exciting to see what progress we can make in the coming years.

On the other hand, the success of GANs so far has been limited mostly to Computer Vision due to their difficulty in modelling discrete rather than continuous data. The Adversarial Training workshop showcased some promising work in this direction (see e.g. our own John Glover’s paper on modeling documents, this paper and this paper on generating text, and this paper on adversarial evaluation of dialogue models). It remains to be seen if 2017 will be the year in which GANs break through in NLP.

The Nuts and Bolts of Machine Learning

Andrew Ng gave one of the best tutorials of the conference with his take on building AI applications using Deep Learning. Drawing from his experience of managing the 1,300 people AI team at Baidu and hundreds of applied AI projects and equipped solely with two whiteboards, he shared many insights about how to build and deploy AI applications in production.

Besides better hardware, Ng attributes the success of Deep Learning to two factors: In contrast to traditional methods, deep NNs are able to learn more effectively from large amounts of data. Secondly, end-to-end (supervised) Deep Learning allows us to learn to map from inputs directly to outputs.

While this approach to training chatbots or self-driving cars is sufficient to write innovative research papers, Ng emphasized end-to-end DL is often not production-ready: A chatbot that maps from text directly to a response is not able to have a coherent conversation or fulfill a request, while mapping from an image directly to a steering command might have literally fatal side effects if the model has not encountered the corresponding part of the input space before. Rather, for a production model, we still want to have intermediate steps: For a chatbot, we prefer to have an inference engine that generates a response, while in a self-driving car, DL is used to identify obstacles, while the steering is performed by a traditional planning algorithm.


Image 5: Andrew Ng on end-to-end DL (right: end-to-end DL chatbot and chatbot with inference engine; left bottom: end-to-end DL self-driving car and self-driving car with intermediate steps)

Ng also shared that the most common mistakes he sees in project teams is that they track the wrong metrics: In an applied machine learning project, the only relevant metrics are the training error, the development error, and the test error. These metrics alone enable the project team to know what steps to take, as he demonstrated in the diagram below:


Image 6: Andrew Ng’s flowchart for applied ML projects

A key facilitator of the recent success of ML have been the advances in hardware that allowed faster computation and storage. Given that Moore’s Law will reach its limits sooner or later, one might reason that also the rise of ML might plateau. Ng, however, argued that the commitment by leading hardware manufacturers such as NVIDIA and Intel and the ensuing performance improvements to ML hardware would fuel further growth.

Among ML research areas, supervised learning is the undisputed driver of the recent success of ML and will likely continue to drive it for the foreseeable future. In second place, Ng saw neither unsupervised learning nor reinforcement learning, but transfer learning. We at AYLIEN are bullish on transfer learning for NLP and think that it has massive potential.

Recurrent Neural Networks

The conference also featured a symposium dedicated to Recurrent Neural Networks (RNNs). The symposium coincided with the 20 year anniversary of LSTM…


Image 7: Jürgen Schmidhuber kicking off the RNN symposium

… being rejected from NIPS 1996. The fact that papers that do not use LSTMs have been rare in the most recent NLP conferences (see our EMNLP blog post) is a testament to the perseverance of the authors of the original paper, Sepp Hochreiter and Jürgen Schmidhuber.

At NIPS, we had several papers that sought to improve RNNs in different ways:

Other improvements apply to Deep Learning in general:

  • Salimans and Kingma propose Weight Normalisation to accelerate training that can be applied in two lines of Python code.
  • Li et al. propose a multinomial variant of dropout that sets neurons to zero depending on the data distribution.

The Neural Abstract Machines & Program Induction (NAMPI) workshop also featured several speakers talking about RNNs:

  • Alex Graves focused on his recent work on Adaptive Computation Time (ACT) for RNNs that allows to decouple the processing time from the sequence length. He showed that a word-level language model with ACT could reach state-of-the-art with fewer computations.
  • Edward Grefenstette outlined several limitations and potential future research directions in the context of RNNs in his talk.

Improving classic algorithms

While Deep Learning is a fairly recent development, the conference featured also several improvements to algorithms that have been around for decades:

  • Ge et al. show in their best paper that the non-convex objective for matrix completion has no spurious local minima, i.e. every local minimum is a global minimum.
  • Bachem et al. present a method that guarantees accurate and fast seedings for large-scale k-means++ clustering. The presentation was one of the most polished ones of the conference and the code is open-source and can be installed via pip.
  • Ashtiani et al. show that we can make NP-hard k-means clustering problems solvable by allowing the model to pose queries for a few examples to a domain expert.

Reinforcement Learning

Reinforcement Learning (RL) was another much-discussed topic at NIPS with an excellent tutorial by Pieter Abbeel and John Schulman dedicated to RL. John Schulman also gave some practical advice for getting started with RL.

One of the best papers of the conference introduces Value Iteration Networks, which learn to plan by providing a differentiable approximation to a classic planning algorithm via a CNN. This paper was another cool example of one of the major benefits of deep neural networks: They allow us to learn increasingly complex behaviour as long as we can represent it in a differentiable way.

During the week of the conference, several research environments for RL were simultaneously released, among them OpenAI’s Universe, Deep Mind Lab, and FAIR’s Torchcraft. These will likely be a key driver in future RL research and should open up new research opportunities.

Learning-to-learn / Meta-learning

Another topic that came up in several discussions over the course of the conference was Learning-to-learn or Meta-learning:

  • Andrychowicz et al. learn an optimizer in a paper with the ingenious title “Learning to learn by gradient descent by gradient descent”.
  • Vinyals et al. learn how to one shot-learn in a paper that frames one-shot learning in the sequence-to-sequence framework and has inspired new approaches for one-shot learning.

Most of the existing papers on meta-learning demonstrate that wherever you are doing something that gives you gradients, you can optimize them using another algorithm via gradient descent. Prepare for a surge of “Meta-learning for X” and “(Meta-)+learning” papers in 2017. It’s LSTMs all the way down!

Meta-learning was also one of the key talking points at the RNN symposium. Jürgen Schmidhuber argued that a true meta-learner would be able to learn in the space of all programs and would have the ability to modify itself and elaborated on these ideas at his talk at the NAMPI workshop. Ilya Sutskever remarked that we currently have no good meta-learning models. However, there is hope as the plethora of new research environments should also bring progress in this area.

General Artificial Intelligence

Learning how to learn also plays a role in the pursuit of the elusive goal of attaining General Artificial Intelligence, which was a topic in several keynotes. Yann LeCun argued that in order to achieve General AI, machines need to learn common sense. While common sense is often vaguely mentioned in research papers, Yann LeCun gave a succinct explanation of what common sense is: “Predicting any part of the past, present or future percepts from whatever information is available.” He called this predictive learning, but notes that this is really unsupervised learning.

His talk also marked the appearance of a controversial and often tongue-in-cheek copied image of a cake, which he used to demonstrate that unsupervised learning is the most challenging task where we should concentrate our efforts, while RL is only the cherry on the icing of the cake.


Image 8: The Cake slide of Yann LeCun’s keynote

Drew Purves focused on the bilateral relationship between the environment and AI in what was probably the most aesthetically pleasing keynote of the conference (just look at those graphics!)


Image 9: Graphics by Max Cant of Drew Purves’ keynote (Source: Drew Purves)

He emphasized that while simulations of ecological tasks in naturalistic environments could be an important test bed for General AI, General AI is needed to maintain the biosphere in a state that will allow the continued existence of our civilization.


Image 10: Nature needs AI and AI needs Nature from Drew Purves’ keynote

While it is frequently — and incorrectly — claimed that neural networks work so well because they emulate the brain’s behaviour, Saket Navlakha argued during his keynote that we can still learn a great deal from the engineering principles of the brain. For instance, rather than pre-allocating a large number of neurons, the brain generates 1000s of synapses per minutes until its second year. Afterwards, until adolescence, the number of synapses is pruned and decreases by ~50%.


Image 11: Saket Navlakha’s keynote

It will be interesting to see how neuroscience can help us to advance our field further.

In the context of the Machine Intelligence workshop, another environment was introduced in the form of FAIR’s CommAI-env that allows to train agents through interaction with a teacher. During the panel discussion, the ability to learn hierarchical representations and to identify patterns was emphasized. However, although the field is making rapid progress on standard tasks such as object recognition, it is unclear if the focus on such specific tasks brings us indeed closer to General AI.

Natural Language Processing

While NLP is more of a niche topic at NIPS, there were a few papers with improvements relevant to NLP:

  • He et al. propose a dual learning framework for MT that has two agents translating in opposite directions teaching each other via reinforcement learning.
  • Sokolov et al. explore how to use structured prediction under bandit feedback.
  • Huang et al. extend Word Mover’s Distance, an unsupervised document similarity metric to the supervised setting.
  • Lee et al. model the helpfulness of reviews by taking into account position and presentation biases.

Finally, a workshop on learning methods for dialogue explored how end-to-end systems, linguistics and ML methods can be used to create dialogue agents.



Jürgen Schmidhuber, the father of the LSTM was not only present on several panels, but did his best to remind everyone that whatever your idea, he had had a similar idea two decades ago and you should better cite him lest he interrupt your tutorial.



Boston Robotics’ Spot proved that — even though everyone is excited by learning and learning-to-learn — traditional planning algorithms are enough to win the admiration of a hall full of learning enthusiasts.


Image 12: Boston Robotics’ Spot amid a crowd of fascinated onlookers


Apple, one of the most secretive companies in the world, has decided to be more open, to publish, and to engage with academia. This can only be good for the community. We’re looking forward to more apple research papers.


Image 13: Ruslan Salakhutdinov at the Apple lunch event


Uber announced their acquisition of Cambridge-based AI startup Geometric Intelligence and threw one of the most popular parties of NIPS.


Image 14: The Geometric Intelligence logo

Rocket AI

Talking about startups, the “launch” of Rocket AI and their patented Temporally Recurrent Optimal Learning had some people fooled (note the acronyms in the below tweets). Riva-Melissa Tez finally cleared up the confusion.


These were our impressions from NIPS 2016. We had a blast and hope to be back in 2017!


Text Analysis API - Sign up





Deep Learning is a new area of Machine Learning research that has been gaining significant media interest owing to the role it is playing in artificial intelligence applications like image recognition, self-driving cars and most recently the AlphaGo vs. Lee Sedol matches. Recently, Deep Learning techniques have become popular in solving traditional Natural Language Processing problems like Sentiment Analysis.

For those of you that are new to the topic of Deep Learning, we have put together a list of ten common terms and concepts explained in simple English, which will hopefully make them a bit easier to understand. We’ve done the same in the past for Machine Learning and NLP terms, which you might also find interesting.


In the human brain, a neuron is a cell that processes and transmits information. A perceptron can be considered as a super-simplified version of a biological neuron.

A perceptron will take several inputs and weigh them up to produce a single output. Each input is weighted according to its importance in the output decision.

Artificial Neural Networks

Artificial Neural Networks (ANN) are models influenced by biological neural networks such as the central nervous systems of living creatures and most distinctly, the brain.

ANN’s are processing devices, such as algorithms or physical hardware, and are loosely modeled on the cerebral cortex of mammals, albeit on a considerably smaller scale.

Let’s call them a simplified computational model of the human brain.


A neural network learns by training, using an algorithm called backpropagation. To train a neural network it is first given an input which produces an output. The first step is to teach the neural network what the correct, or ideal, output should have been for that input. The ANN can then take this ideal output and begin adapting the weights to yield an enhanced, more precise output (based on how much they contributed to the overall prediction) the next time it receives a similar input.

This process is repeated many many times until the margin of error between the input and the ideal output is considered acceptable.

Convolutional Neural Networks

A convolutional neural network (CNN) can be considered as a neural network that utilizes numerous identical replicas of the same neuron. The benefit of this is that it enables a network to learn a neuron once and use it in numerous places, simplifying the model learning process and thus reducing error. This has made CNNs particularly useful in the area of object recognition and image tagging.

CNNs learn more and more abstract representations of the input with each convolution. In the case of object recognition, a CNN might start with raw pixel data, then learn highly discriminative features such as edges, followed by basic shapes, complex shapes, patterns and textures.





Recurrent Neural Network

Recurrent Neural Networks (RNN) make use of sequential information. Unlike traditional neural networks, where it is assumed that all inputs and outputs are independent of one another, RNNs are reliant on preceding computations and what has previously been calculated. RNNs can be conceptualized as a neural network unrolled over time. Where you would have different layers in a regular neural network, you apply the same layer to the input at each timestep in an RNN, using the output, i.e. the state of the previous timestep as input. Connections between entities in a RNN form a directed cycle, creating a sort of internal memory, that helps the model leverage long chains of dependencies.

Recursive Neural Network

A Recursive Neural Network is a generalization of a Recurrent Neural Network and is generated by applying a fixed and consistent set of weights repetitively, or recursively, over the structure. Recursive Neural Networks take the form of a tree, while Recurrent is a chain. Recursive Neural Nets have been utilized in Natural Language Processing for tasks such as Sentiment Analysis.





Supervised Neural Network

For a supervised neural network to produce an ideal output, it must have been previously given this output. It is ‘trained’ on a pre-defined dataset and based on this dataset, can produce  accurate outputs depending on the input it has received. You could therefore say that it has been supervised in its learning, having for example been given both the question and the ideal answer.

Unsupervised Neural Network

This involves providing a programme or machine with an unlabeled data set that it has not been previously trained for, with the goal of automatically discovering patterns and trends through clustering.

Gradient Descent

Gradient Descent is an algorithm used to find the local minimum of a function. By initially guessing the solution and using the function gradient at that point, we guide the solution in the negative direction of the gradient and repeat this technique until the algorithm eventually converges at the point where the gradient is zero – local minimum. We essentially descend the error surface until we arrive at a valley.

Word Embedding

Similar to the way a painting might be a representation of a person, a word embedding is a representation of a word, using real-valued numbers. Word embeddings can be trained and used to derive similarities between both other words, and other relations. They are an arrangement of numbers representing the semantic and syntactic information of words in a format that computers can understand.

Word vectors created through this process manifest interesting characteristics that almost look and sound like magic at first. For instance, if we subtract the vector of Man from the vector of King, the result will be almost equal to the vector resulting from subtracting Woman from Queen. Even more surprisingly, the result of subtracting Run from Running almost equates to that of Seeing minus See. These examples show that the model has not only learnt the meaning and the semantics of these words, but also the syntax and the grammar to some degree.



So there you have it – some pretty technical deep learning terms explained in simple english. We hope this helps you get your head around some of the tricky terms you might come across as you begin to explore deep learning.



News API - Sign up