Machine learning (ML) is a subfield of artificial intelligence (AI), rooted in mathematics and computational statistics, that focuses on tasks such as prediction, classification, clustering, and anomaly detection. Deep learning (DL), a subfield of ML, is currently achieving some of the most impressive results across industries and use cases, including computer vision, speech recognition, language translation, robotics, automated drug discovery, cybersecurity anomaly detection, and many more. DL leverages various Artificial Neural Network (ANN) architectures, which are connectionist structures or systems inspired by the physiology of biological brains. Acronyms abound (DNN, DBN, CNN, RNN, BMN, TCN, etc.), each with its own pros and cons depending upon the objective(s), the data, the available computing power, ML engineer preferences, and numerous other factors. Interestingly, the concepts underlying deep learning via neural networks can be traced back to the 1950s.

Deep learning essentially boils down to a hierarchical, non-linear approach to machine learning: ANNs with an input layer, an output layer, and at least one hidden layer in between, each layer composed of nodes, or artificial neurons. The stacking of layers is what makes these networks “deep” and inherently more complex. There is quite a bit more to this story with regard to node activation (transfer) functions, weights, hyperparameter tuning, and so on, and those details will be covered in a later post. Ultimately, many thought leaders, practitioners, academicians, computer scientists, and others believe DL is our best chance of achieving true artificial intelligence (Strong AI). In part due to this belief, there exist some incredibly powerful open-source libraries (e.g., TensorFlow, Theano, Keras) and platforms that allow deep learning models to be created quite easily, some requiring little or even no programming expertise whatsoever.
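To make the layered structure concrete, here is a minimal sketch of a forward pass through a tiny 2-3-1 network in plain Python. The weights, biases, and the choice of a sigmoid activation are illustrative assumptions, not values from any trained model; in practice the weights would be learned from data.

```python
import math

def sigmoid(x):
    # A common node activation function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each node in a layer sums its weighted inputs, adds a bias,
    # then applies the activation function.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny network: 2 input nodes, one hidden layer of 3 nodes, 1 output node.
# These weights are arbitrary illustrative values.
hidden_w = [[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[1.2, -0.4, 0.7]]
output_b = [0.05]

x = [0.9, 0.3]                      # one input example
h = layer(x, hidden_w, hidden_b)    # hidden-layer activations
y = layer(h, output_w, output_b)    # network output, a value in (0, 1)
```

Adding depth is just a matter of chaining more `layer` calls, feeding each layer's output into the next; that chaining is what the word “deep” refers to.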

The overarching premise is that ANNs can become as deep and complex as necessary to achieve the desired output(s) or outcome(s). Of course, there are many trade-offs to weigh, such as training time, tendency to overfit, model accuracy, and computing power, but the possibilities are vast. Notably, larger networks tend to keep improving as they grow and are fed more and more data, whereas traditional ML techniques often reach a point of diminishing returns. Moreover, deep learning diverges from traditional ML not just in its multilayered neural network architectures, but also in its uncanny ability to learn without hardcoded rules or expressly embedded human domain expertise. These important distinctions make deep learning inherently more flexible and pseudo-dynamic in nature, hinting, if only distantly, at actual non-biological intelligence.

The real key to DL success is the availability of a staggering amount of relevant data for training, testing, tweaking, and monitoring. The data can be quite diverse and, depending upon the algorithms used, labeled or unlabeled, corresponding to supervised versus unsupervised learning. The latter is particularly important given that the majority of data floating around our world today is unlabeled. To be clear, the importance of having access to sufficient computing power cannot be overstated, and it introduces the discussion of central processing units (CPUs) versus graphics processing units (GPUs), which will be tabled for a future installment. That aside, assuming the data is of sufficient quality and encapsulates some semblance of the features believed necessary to facilitate learning, size really does matter in this case.
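The labeled-versus-unlabeled distinction can be sketched with a toy 1-D dataset. The numbers, labels, and the threshold/clustering rules below are invented for illustration: the supervised case uses the labels to learn a decision threshold, while the unsupervised case must group the raw values by proximity alone (one assignment step of a k-means-style procedure).

```python
# Toy 1-D data, e.g. sensor readings. In the supervised case each reading
# carries a label; in the unsupervised case we only have raw values.
labeled = [(1.0, "low"), (1.2, "low"), (5.1, "high"), (4.8, "high")]
unlabeled = [1.1, 0.9, 5.0, 5.3, 1.3, 4.7]

# Supervised: the labels let us learn a decision threshold
# (here, the midpoint between the two class means).
lows = [x for x, y in labeled if y == "low"]
highs = [x for x, y in labeled if y == "high"]
threshold = (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2

def classify(x):
    return "high" if x > threshold else "low"

# Unsupervised: no labels, so we can only group points by similarity.
# One k-means-style assignment step from two starting centers.
centers = [min(unlabeled), max(unlabeled)]
clusters = [[], []]
for x in unlabeled:
    nearest = 0 if abs(x - centers[0]) < abs(x - centers[1]) else 1
    clusters[nearest].append(x)
```

The supervised model can now name a new reading (`classify(2.0)` returns `"low"`), while the unsupervised grouping can only say which points belong together, not what the groups mean. That gap is exactly why so much of the world's unlabeled data is harder to exploit.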

It is important to mention that one of the biggest challenges with leveraging deep learning is the “black box” predicament, whereby the steps and rationale behind the output(s) or outcome(s) are not well understood. This lack of transparency is significant enough that the Defense Advanced Research Projects Agency (DARPA), within the U.S. Department of Defense, has taken it on. Without question, we humans tend to feel much more comfortable when we can understand how and why decisions and actions are made. This is especially true when we begin to talk about potential military and societal applications of deep learning.