This Is What Sets Keras Apart From Other Libraries

What are some important engineering and design decisions you made in creating Keras? originally appeared on Quora - the knowledge sharing network where compelling questions are answered by people with unique insights.

Answer by François Chollet, Deep learning researcher at Google, author of Keras, on Quora:

The most important decision was that Keras was going to be a self-contained framework for deep learning. That is to say, you can use Keras to solve problems end-to-end without ever having to interact with the underlying backend engine, whether Theano or TensorFlow. Keras was initially built on top of Theano, but because it abstracts the backend away completely, it was easy to add TensorFlow as a backend shortly after the initial TensorFlow release. And in the future, we will be able to extend Keras to support next-generation computation graph engines when they come along. Other libraries, like Lasagne, chose instead to be a toolbox of utilities for working with Theano rather than wrapping it, requiring users to have extensive knowledge of Theano and tying their success to the popularity of Theano. These libraries are all but dead now.
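In practice, this backend independence surfaces to the user as a single configuration switch. A minimal sketch of the two standard mechanisms (the `~/.keras/keras.json` config file, and the `KERAS_BACKEND` environment variable that overrides it):

```json
{
    "backend": "tensorflow",
    "floatx": "float32",
    "epsilon": 1e-07
}
```

Setting `"backend"` to `"theano"` instead, or running a script with `KERAS_BACKEND=theano python script.py`, switches the engine without changing a line of model code.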

A consequence of this decision is that Keras has its own graph data structure for handling computational graphs, rather than relying on the native graph data structure of TensorFlow or Theano. As a result, Keras can do offline shape inference in Theano (a much-needed yet missing feature in Theano), and can do easy model sharing and model copying. For instance, when you call a Keras model on a new input (`y = model(x)`), Keras reapplies all operations contained in the graph underlying your model, which is made possible by the fact that Keras manages that graph independently of TensorFlow/Theano. In fact, it's even possible to: 1) define a Keras model with the Theano backend, 2) switch to the TensorFlow backend, and 3) re-apply your (Theano-built) Keras model to a TensorFlow input, thus creating a TF version of what was initially a Theano model. (Note that in practice we don't allow switching backends in the middle of a session, since that would be quite unsafe: the user might mix up TF and Theano tensors. But it's possible to do it manually if you are familiar with Keras internals.)
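The core idea, stripped of all backend machinery, is that a framework which records its own graph of operations can replay that graph on any fresh input. Here is a toy sketch of that principle in plain Python; `ToyLayer` and `ToyModel` are hypothetical names for illustration, not Keras internals:

```python
# Toy sketch (NOT Keras source): a framework that manages its own
# operation graph can reapply the whole graph to a new input.

class ToyLayer:
    """Wraps one operation; the model records these in order."""
    def __init__(self, fn, name):
        self.fn = fn
        self.name = name

    def __call__(self, x):
        return self.fn(x)

class ToyModel:
    """Holds the ordered layer list -- its own 'graph' -- so calling
    the model on a new input replays every recorded operation."""
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:  # replay the recorded graph
            x = layer(x)
        return x

model = ToyModel([ToyLayer(lambda v: v + 1, "add_one"),
                  ToyLayer(lambda v: v * 2, "double")])
print(model(3))   # (3 + 1) * 2 = 8
print(model(10))  # same graph reapplied to a new input: 22
```

Because the graph lives in the model object rather than in a backend session, the same replay works regardless of what engine executes each operation.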

Another important decision was to use an object-oriented design. Deep learning models can be understood as chains of functions, which makes a functional approach look potentially interesting. However, these functions are heavily parametrized, mostly by their weight tensors, and manipulating these parameters in a functional way would just be impractical. So in Keras, everything is an object: layers, models, optimizers, etc. All parameters of a model can be accessed as object properties: e.g. `model.layers[3].output` is the output tensor of the layer at index 3 in the model, `model.layers[3].weights` is the list of symbolic weight tensors of that layer, and so on.
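A minimal sketch of that property-based access pattern, with plain Python objects standing in for real layers and tensors (the `ToyDense` and `ToyModel` names are hypothetical, not the Keras API):

```python
# Toy sketch (not Keras source): parameters live as properties on the
# objects that own them, so no global name lookup is ever needed.

class ToyDense:
    """A layer object that owns its weights and remembers its output."""
    def __init__(self, units):
        self.units = units
        self.weights = [[0.0] * units]   # stand-in for weight tensors
        self.output = None

    def __call__(self, x):
        self.output = f"dense_{self.units}({x})"  # stand-in for a symbolic tensor
        return self.output

class ToyModel:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = ToyModel([ToyDense(8), ToyDense(8), ToyDense(8), ToyDense(2)])
model("input")
# Everything is reachable by object reference:
print(model.layers[3].units)          # 2
print(model.layers[3].output)         # dense_2(...)
print(len(model.layers[3].weights))   # 1
```

Two independently built models simply hold two independent sets of objects, so nothing ever needs to be disambiguated by name.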

The functional approach would have implied layers as functions that create their weights when called, and store them in global name-indexed collections (this is the approach taken by TensorFlow-Slim, for instance). This means that many operations (model loading, accessing an existing weight tensor) must be done by name-matching, so you need to carefully name every tensor you create rather than relying on auto-generated names. And of course there is a constant risk of name collision, which typically prevents you from manipulating multiple independent models in a single session. To me, this looks a lot like an anti-pattern. The object-oriented approach is cleaner and scales better.
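To make the collision risk concrete, here is a toy illustration of the global name-indexed pattern (this is illustrative pseudocode in the spirit of the approach described above, not actual TF-Slim code):

```python
# Toy illustration of the functional anti-pattern: weights keyed by
# name in a global collection collide as soon as two models reuse a name.

GLOBAL_WEIGHTS = {}

def dense(x, name):
    """Functional-style layer: creates its weight under a global name."""
    if name not in GLOBAL_WEIGHTS:
        GLOBAL_WEIGHTS[name] = [0.0]   # stand-in for a weight tensor
    return f"dense({x})"

# Model A uses the layer name "dense_1" and trains its weight ...
dense("input_a", name="dense_1")
GLOBAL_WEIGHTS["dense_1"][0] = 1.0

# ... then model B, built later with the same name, silently shares it.
dense("input_b", name="dense_1")
print(GLOBAL_WEIGHTS["dense_1"][0])   # 1.0 -- B inherited A's trained weight
```

Nothing in the second call signals the collision; the two "independent" models are now entangled through a shared weight, which is exactly why per-object parameter ownership scales better.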

This question originally appeared on Quora. You can follow Quora on Twitter, Facebook, and Google+.