Tuesday, July 26, 2016
Kaz Sato
You may have heard the buzz about neural networks and deep learning, and want to learn more. But when you try to learn about the technology from a textbook, you may find yourself overwhelmed by mathematical models and formulas. I certainly was.
For people like me, there's an awesome tool to help you grasp the idea of neural networks without any hard math: TensorFlow Playground, a web app written in JavaScript that lets you play with a real neural network running in your browser and click buttons and tweak parameters to see how it works.
In this article, I'd like to show how you can play with TensorFlow Playground so that you can understand the core ideas behind neural networks. Then you can understand why people have become so excited by the technology as of late.
Computer programming requires a programmer. Humans instruct a computer to solve a problem by specifying each and every step through many lines of code. But with machine learning and neural networks, you can let the computer try to solve the problem itself. A neural network is a function that learns the expected output for a given input from training datasets.
A neural network is a function that learns from training datasets (From: Large-Scale Deep Learning for Intelligent Computer Systems, Jeff Dean, WSDM 2016, adapted from Untangling invariant object recognition, J DiCarlo et D Cox, 2007)
For example, to build a neural network that recognizes images of a cat, you train the network with a lot of sample cat images. The resulting network works as a function that takes a cat image as input and outputs the "cat" label. Or — to take a more practical example — you can train it to input a bunch of user activity logs from gaming servers and output which users have a high probability of conversion.
How does this work? Let's look at a simple classification problem. Imagine you have a dataset such as the one below. Each data point has two values: x1 (the horizontal axis) and x2 (the vertical axis). There are two groups of data points, the orange group and blue group.
How do you write code that classifies whether a data point is orange or blue? Perhaps you draw an arbitrary diagonal line between the two groups like below and define a threshold to determine in which group each data point belongs.
The condition of your IF statement would look like this.
where b is the threshold that determines the position of the line. By putting w1 and w2 as weights on x1 and x2 respectively, you make your code more reusable.
Further, if you tweak the values of w1 and w2, you can rotate the angle of the line as you like. You can also tweak the "b" value to move the line position. So you can reuse this condition for classifying any datasets that can be classified by a single straight line.
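That IF condition can be sketched in a few lines of Python. The function name and the parameter values below are purely illustrative, not ones learned by the Playground:

```python
def classify(x1, x2, w1, w2, b):
    """Classify a 2-D point as 'blue' or 'orange' with a straight line.

    The line w1*x1 + w2*x2 + b = 0 is the decision boundary: tweaking
    w1 and w2 rotates it, and tweaking b shifts its position.
    """
    if w1 * x1 + w2 * x2 + b > 0:
        return "blue"
    return "orange"

# Illustrative parameters describing the diagonal line x2 = x1
# (w1 = -1, w2 = 1, b = 0): points above the line come out "blue".
assert classify(1.0, 3.0, -1.0, 1.0, 0.0) == "blue"
assert classify(3.0, 1.0, -1.0, 1.0, 0.0) == "orange"
```

Because the line is entirely described by w1, w2 and b, the same function classifies any linearly separable dataset once you plug in suitable parameter values.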
But the thing is, the programmer has to find appropriate values for w1, w2 and b — the so-called parameters — and instruct the computer how to classify the data points.
Now, let's look at how the computer behind TensorFlow Playground solves this particular problem. On the Playground, click the Play button in the upper left corner. The line between blue and orange data points begins to move slowly. Hit the reset button and click play again a few times to see how the line moves with different initial values. What you're seeing is the computer trying to find the best combination of weights and threshold to draw the straight line between two groups.

A simple classification problem on TensorFlow Playground.
TensorFlow Playground is using a single artificial neuron for this classification. What's an artificial neuron? It’s an idea inspired by the behavior of biological neurons in the human brain.
For a detailed description about the mechanism of a biological neural network, visit the Wikipedia page: each neuron gets excited (activated) when it receives electrical signals from other connected neurons. Each connection between neurons has different strengths. Some connections are strong enough to activate other neurons whereas some connections suppress activation. Together, the hundreds of billions of neurons and connections in our brain embody human intelligence.
The research into biological neurons led to the creation of a new computing paradigm, the artificial neural network. With artificial neural networks, we mimic the behavior of biological neurons with simple mathematics. To solve the above classification problem, you can use the following simple neural network, which features a single neuron (aka Perceptron).
x1 and x2 are the input values, and w1 and w2 are weights that represent the strength of each connection to the neuron. b is the so-called bias, representing the threshold to determine whether or not a neuron is activated by the inputs. This single neuron can be calculated with the following formula.
Yes, that's exactly the same formula we used for classifying the datasets with a straight line. And actually, that's the only thing an artificial neuron can do: classify a data point into one of two kinds by examining input values with weights and bias. With two inputs, a neuron can classify the data points in two-dimensional space into two kinds with a straight line. If you have three inputs, a neuron can classify data points in three-dimensional space into two parts with a flat plane, and so on. This is called "dividing n-dimensional space with a hyperplane."
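The same rule generalizes to any number of inputs. Here is a minimal sketch of a single step-activated neuron (the function name is mine, not the Playground's):

```python
def neuron(inputs, weights, bias):
    """A single artificial neuron with a step activation.

    Returns 1 when the weighted sum of the inputs plus the bias is
    positive, 0 otherwise -- i.e. it reports which side of the
    hyperplane the data point lies on.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# With two inputs this is exactly the straight-line condition from
# earlier; with three inputs the boundary is a flat plane in 3-D
# space, and with n inputs a hyperplane in n-dimensional space.
assert neuron([1.0, 3.0], [-1.0, 1.0], 0.0) == 1
assert neuron([3.0, 1.0], [-1.0, 1.0], 0.0) == 0
```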
A neuron classifies any data point into one of two kinds
How can a "hyperplane" solve everyday problems? As an example, imagine you have lots of handwritten text images like below.

Pixel images of handwritten texts (From: MNIST For ML Beginners, tensorflow.org)
You can train a single neuron to classify a set of images as "images of number 8" or "other images."
How do you do that? At first, you need to prepare tens of thousands of sample images for training. Let’s say a single image has 28 x 28 grayscale pixels; it will fit to an array with 28 x 28 = 784 numbers. Given 55,000 sample images, you'd have an array with 784 x 55000 numbers.
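The bookkeeping in that paragraph can be sketched as follows. A toy image of zeros stands in for a real MNIST sample; the sizes are the ones from the text:

```python
# One 28 x 28 grayscale image flattens into an array of 784 numbers.
# (A grid of zeros stands in for real pixel values here.)
image = [[0.0] * 28 for _ in range(28)]
flat = [pixel for row in image for pixel in row]
assert len(flat) == 28 * 28 == 784

# 55,000 such images together form an array of 784 x 55,000 numbers.
num_samples = 55_000
total_numbers = len(flat) * num_samples
assert total_numbers == 784 * 55_000
```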
For each sample image in the 55K samples, you input the 784 numbers into a single neuron, along with the training label as to whether or not the image represents an "8."
As you saw on the Playground demo, the computer tries to find an optimal set of weights and bias to classify each image as an "8" or not.
After training with the 55K samples, this neuron will have generated a set of weights such as the ones below, where blue represents a positive value and red is a negative value.
That's it. Even with this very primitive single neuron, you can achieve 90% accuracy when recognizing a handwritten text image. To recognize all the digits from 0 to 9, you would need just ten such neurons, achieving 92% accuracy.
Again, the only thing this neuron can do is classify a data point as one of two kinds: "8" or not. What qualifies as a "data point" here? In this case, each image contains 28 x 28 = 784 numbers. In mathematical parlance, you could say that each image represents a single point in 784-dimensional space. The neuron divides the 784-dimensional space into two parts with a single hyperplane, and classifies each data point (or image) as "8" or not. (Yes, it's almost impossible to imagine what that dimensional space and hyperplane might look like. Forget about it.)
In the example above, we used handwritten text images as our sample data, but you can use a neural network to classify many kinds of data. For example, an online game provider could identify players that are cheating by examining player activity logs. An e-commerce provider can identify premium customers from web server access logs and transaction histories. In other words, any data that can be converted and expressed as numbers can be treated as a data point in n-dimensional space; you can then let the neuron try to find the hyperplane, and see whether it classifies your problem effectively.
As you can see, a neural network is a simple mechanism that's implemented with basic math. The only difference between traditional programming and a neural network is, again, that you let the computer determine the parameters (weights and bias) by learning from training datasets. In other words, the trained weight pattern in our example wasn't programmed by humans.
In this article, I won't discuss in detail how you can train the parameters with algorithms such as backpropagation and gradient descent. Suffice it to say that the computer tries to increase or decrease each parameter a little bit to see how that reduces the error against the training dataset, in hopes of finding the optimal combination of parameters.
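That nudging can be sketched as a naive parameter search. This toy version just perturbs each parameter and keeps any change that reduces the error on a tiny invented dataset; real training uses backpropagation and gradient descent, not this brute-force loop:

```python
def mean_error(params, data):
    """Fraction of points the line w1*x1 + w2*x2 + b misclassifies."""
    w1, w2, b = params
    wrong = 0
    for x1, x2, label in data:
        predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        wrong += predicted != label
    return wrong / len(data)

def train(data, steps=100, delta=0.1):
    """Nudge each parameter up and down, keeping changes that help."""
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        for i in range(3):
            for sign in (+delta, -delta):
                trial = list(params)
                trial[i] += sign
                if mean_error(trial, data) < mean_error(params, data):
                    params = trial
    return params

# Toy dataset: label is 1 when x2 > x1 (the point is above the diagonal).
data = [(0, 1, 1), (1, 2, 1), (2, 0, 0), (3, 1, 0)]
params = train(data, steps=20)
assert mean_error(params, data) == 0.0
```

On the Playground you are watching the same idea at work: the parameters are adjusted step by step until the error against the training set stops shrinking.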
Think of the computer as a student or junior worker. In the beginning, the computer makes many mistakes and it takes some time before it finds a practical way of solving real-world problems (including possible future problems) and minimizes the errors (so called generalization).
We may revisit the topic in a future article. For now, content yourself with the fact that a neural network library such as TensorFlow encapsulates most of the necessary math for training, and you don’t have to worry too much about it.
We’ve demonstrated how a single neuron can perform a simple classification, but you may be wondering how such simple neurons can be used to build a network that recognizes thousands of different images or competes with a professional Go player. There's a reason why neural networks can get much smarter than what we described above. Let's take a look at another example from TensorFlow Playground.
This dataset cannot be classified by a single neuron, as the two groups of data points can't be divided by a single line. This is a so-called nonlinear classification problem. In the real world, there's no end to complex, non-linear datasets like this one, and the question is how to capture these sorts of complex patterns.
The answer is to add a hidden layer between the input values and output neuron. Click here to try it out.
Nonlinear classification problem on TensorFlow Playground (click here to try it)
What's happening here? If you click each one of the neurons in the hidden layer, you see they're each doing a simple, single-line classification:
The first neuron checks if a data point is on the left or right
The second neuron checks if it's in the top right
The third one checks if it's in the bottom right
These three results are called features of the data. Outputs from these neurons indicate the strength of their corresponding features.
Finally, the neuron on the output layer uses these features to classify the data. If you draw a three dimensional space consisting of the feature values, the final neuron can simply divide this space with a flat plane. This is an example of a transformation of the original data into a feature space.
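A minimal sketch of that two-layer arrangement follows. The weights and biases below are invented for illustration (the Playground learns its own), but the structure is the same: hidden neurons turn the raw coordinates into feature values, and the output neuron draws a single plane through that feature space:

```python
def step_neuron(inputs, weights, bias):
    """Step-activated neuron: which side of its hyperplane the input is on."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def two_layer(x1, x2, hidden, output):
    """Hidden neurons map (x1, x2) to feature values; the output
    neuron then classifies the point in that feature space."""
    features = [step_neuron([x1, x2], w, b) for w, b in hidden]
    return step_neuron(features, output[0], output[1])

# Invented weights: three hidden neurons roughly checking left/right,
# top-right and bottom-right, combined by one output neuron.
hidden = [([1.0, 0.0], 0.0),      # is the point to the right?
          ([1.0, 1.0], -1.0),     # is it in the top-right?
          ([1.0, -1.0], -1.0)]    # is it in the bottom-right?
output = ([1.0, 1.0, 1.0], -0.5)  # fires if any feature is active

assert two_layer(2.0, 2.0, hidden, output) == 1
assert two_layer(-2.0, 0.0, hidden, output) == 0
```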
For some great visual examples of transformations, visit colah's blog.

A hidden layer transforms inputs to feature space, making it linearly classifiable (From: Visualizing Representations: Deep Learning and Human Beings and Neural Networks, Manifolds and Topology, Christopher Olah)
In the case of the Playground demo, the transformation results in a composition of multiple features corresponding to a triangular or rectangular area. If you add more neurons by clicking the "plus" button, you'll see that the output neuron can capture much more sophisticated polygonal shapes from the dataset.
Getting back to the office worker analogy, you can say the transformation is extracting the insights that an experienced professional has in their daily work. A new employee gets confused and distracted by random signals coming from e-mails, phones, the boss, customers, etc., but senior employees are very efficient about extracting the essential signal from those inputs, and organize the chaos according to a few important principles.
Neural networks work the same way — trying to extract the most important features in a dataset to solve the problem. That's why neural networks can sometimes get smart enough to handle some pretty complex tasks.
With more neurons in a single hidden layer, you can capture more features. And having more hidden layers means more complex constructs that you can extract from the dataset. You can see how powerful this can be in the next example.
What kind of code would you write to classify this dataset? Dozens of IF statements with many many conditions and thresholds, each checking which small area a given data point is in? I personally wouldn’t want to do that.
This is where machine learning and neural networks exceed the performance of a human programmer. Click here to see it in action (it will take a couple of minutes to train).
Double spiral problem on TensorFlow Playground (click here to try it)
Pretty cool, isn't it? What you just saw was the computer trying to build a hierarchy of abstraction with a deep neural network. The neurons in the first hidden layers are doing the same simple classifications, whereas the neurons in the second and third layers are composing complex features out of the simple features, eventually coming up with the double spiral pattern.
More neurons + a deeper network = more sophisticated abstraction. This is how simple neurons get smarter and perform so well for certain problems such as image recognition and playing Go.

Inception: an image recognition model published by Google (From: Going deeper with convolutions, Christian Szegedy et al.)
Some published examples of visualization by deep networks show how they're trained to build the hierarchy of recognized patterns, from simple edges and blobs to object parts and classes.
In this article, we looked at some TensorFlow Playground demos and how they explain the mechanism and power of neural networks. As you've seen, the basics of the technology are pretty simple. Each neuron just classifies a data point into one of two kinds. And yet, by having more neurons and deep layers, a neural network can extract hidden insights and complex patterns from a training dataset and build a hierarchy of abstraction.
The question is then, why isn't everybody using this great technology yet? There are two big challenges for neural networks right now. The first is that training deep neural networks requires a lot of computation power, and the second is that they require large training data sets. It can take several days or even weeks for a powerful GPU server to train a deep network with a dataset of millions of images.
Also, it takes a lot of trial and error to get the best training results with many combinations of different network designs and algorithms. Today, some researchers use tens of GPU servers or even supercomputers to perform large-scale distributed training.
But in very near future, fully managed distributed training and prediction services such as Google Cloud Machine Learning with TensorFlow may solve these problems with the availability of cloud-based CPUs and GPUs at an affordable cost, and may open the power of large and deep neural networks to everyone.
Thanks so much to David Ha, Etsuji Nakai, Christopher Olah and Alexandra Barrett for reviewing the post, giving such valuable comments and refining the text. And special thanks to the authors of TensorFlow Playground, Daniel Smilkov, Shan Carter and D. Sculley, for the truly awesome work.