Gary Marcus presents a systematic critique of deep learning

While practical applications of AI technology are gradually getting onto the right track, the pioneers of artificial intelligence have already turned their attention to the horizon. At the beginning of 2018, Gary Marcus, a professor at New York University and former director of Uber AI Labs, published a long article critically examining the current state and limitations of deep learning. In it, Marcus argues that we must look beyond deep learning if we are to reach truly general AI.

Although the history of deep learning stretches back decades, the method, and even the term "deep learning", only became popular about five years ago, when the field was reignited by papers such as the now-classic work of Krizhevsky, Sutskever, and Hinton on deep network models for ImageNet.

What has the field discovered in the five years that followed? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the mainstream media, I present ten concerns about deep learning, and suggest that if we want to achieve general artificial intelligence, deep learning must be supplemented by other techniques.

For most problems where deep learning has enabled transformationally better solutions (vision, speech), we entered diminishing-returns territory during 2016-2017. - François Chollet, Google, author of Keras, 2017.12.18

"Science is marching on the funeral," and the future is determined by the students who have questioned everything I said. - Geoffrey Hinton, Deep Learning Godfather, Google Brain Leader, 2017.9.15

1. Is deep learning hitting a wall?
Although the roots of deep learning go back decades (Schmidhuber, 2015), until five years ago it attracted only limited attention. Everything changed in 2012 with the publication of a series of high-profile papers, notably Krizhevsky, Sutskever, and Hinton's "ImageNet Classification with Deep Convolutional Neural Networks" (Krizhevsky, Sutskever, & Hinton, 2012), which achieved top results in the ImageNet object recognition challenge (Deng et al.). Other laboratories were already doing similar work at the time (Cireşan, Meier, Masci, & Schmidhuber, 2012). By the end of 2012, deep learning had made the front page of the New York Times, and it quickly became the best-known technique in artificial intelligence. The idea of training multilayer neural networks was not new, but owing to increases in computing power and data, deep learning became practical for the first time.

Since then, deep learning has produced many top results in areas such as speech recognition, image recognition, and language translation, and plays an important role in many current AI applications. Large companies have begun investing hundreds of millions of dollars to hire deep learning talent. Andrew Ng, one of deep learning's key advocates, has gone further, saying, "If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future." (Ng, 2016). A recent New York Times Sunday Magazine article about deep learning suggested that the technology is "poised to reinvent computing itself."

Now, however, deep learning may be approaching a wall, much as I anticipated at the start of deep learning's rise (Marcus, 2012), and as leading figures like Hinton (Sabour, Frosst, & Hinton, 2017) and Chollet (2017) have been hinting for months.

What exactly is deep learning? What has it shown about the nature of intelligence? What can we expect of it, and where will it break down? How far are we from general artificial intelligence, and how close? When will machines be as flexible as humans in dealing with unfamiliar problems? The purpose of this article is both to temper irrational exuberance and to consider the directions in which the field needs to advance.

The paper is written both for researchers in the field and for AI "consumers" who lack a technical background but may want to understand it. Accordingly, in the second part I give a brief, non-technical introduction to what deep learning systems can do and why they do it well. The third part then introduces the weaknesses of deep learning, the fourth part addresses misunderstandings about deep learning's abilities, and finally I outline directions in which we might move forward.

Deep learning is not likely to die, nor should it. But five years after the rise of deep learning, it seems time for a critical reflection on what deep learning can and cannot do.

2. What is deep learning? What can deep learning do?
Deep learning is essentially a statistical technique for classifying patterns, based on sample data, using multilayer neural networks. A neural network in the deep learning literature consists of a set of input units that stand for things like pixels or words, multiple hidden layers containing hidden units (also called nodes or neurons) - the more layers, the deeper the network - and a set of output units, with connections running between the nodes. In a typical application, such a network might be trained on a large set of handwritten digits (the inputs, represented as images) and labels (the outputs) identifying the categories to which the inputs belong.

Over time, an algorithm called back-propagation emerged, which allows the connections between the units to be adjusted through a process of gradient descent, so that any given input comes to yield the corresponding output.

In general, we can understand the relationship between inputs and outputs that a neural network learns as a mapping. Neural networks, especially those with multiple hidden layers, are particularly good at learning input-output mappings.
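To make the mechanics just described concrete, here is a minimal sketch written for this article (not code from Marcus's paper): a small fully connected network, trained by back-propagation and gradient descent, learns a toy input-output mapping. The data, architecture, and hyperparameters are all invented for illustration.

```python
# A minimal sketch of a multilayer network learning an input-output mapping
# via back-propagation and gradient descent (illustrative toy example).
import numpy as np

rng = np.random.default_rng(0)

# Toy mapping: label 2-D points by whether their coordinates share a sign.
X = rng.uniform(-1, 1, size=(512, 2))                      # inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # target labels

W1 = rng.normal(0, 0.5, size=(2, 16)); b1 = np.zeros(16)   # hidden layer
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)    # output layer

lr = 0.5
for step in range(2000):
    # Forward pass: compute the network's current output for every input.
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))      # sigmoid output in (0, 1)

    # Backward pass: gradients of cross-entropy loss w.r.t. each weight.
    dlogit = (p - y) / len(X)
    dW2 = h.T @ dlogit; db2 = dlogit.sum(0)
    dh = dlogit @ W2.T * (1 - h ** 2)         # tanh derivative
    dW1 = X.T @ dh; db1 = dh.sum(0)

    # Gradient descent: adjust the connections to reduce the error.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("training accuracy:", ((p > 0.5) == y).mean())
```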

Such systems are often described as neural networks because input, hidden, and output nodes are similar to biological neurons but have been greatly simplified. The connections between nodes resemble connections between neurons.

Most deep learning networks make heavy use of a technique called convolution (LeCun, 1989), which constrains the neural connections in the network so that they innately capture translational invariance. This is essentially the property that an object can slide around an image while maintaining its identity: a circle seen in the top-left corner can be recognized in the bottom-right corner, even without direct experience of it there.
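The following toy example (my own illustration, not from the paper) shows the sense in which convolution buys translation invariance: because one set of filter weights is slid across every position, a pattern produces the same pooled response wherever it appears.

```python
# Translation invariance via convolution plus pooling (illustrative sketch):
# the same filter weights are applied at every position, so a shifted
# pattern yields a shifted feature map, and max-pooling over positions
# gives the same response either way.
import numpy as np

pattern = np.array([1.0, 2.0, 0.5])           # the "object" to detect
filt = pattern[::-1]                          # flipped, so convolve == correlate

def detect(signal):
    fmap = np.convolve(signal, filt, mode="valid")  # slide filter everywhere
    return fmap.max()                               # max-pool over positions

a = np.zeros(16); a[2:5] = pattern            # object near the left edge
b = np.zeros(16); b[10:13] = pattern          # same object, shifted right

print(detect(a), detect(b))                   # identical responses: 5.25 5.25
```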

Deep learning is also known for its ability to self-generate intermediate representations, such as internal units that respond to things like horizontal lines or more complex elements of pictorial structure.

In principle, given infinite data, a deep learning system is powerful enough to represent any finite deterministic "mapping" between a given set of inputs and a corresponding set of outputs, though in practice whether the system can learn such a mapping depends on many factors. One common concern is the trap of local minima, in which the system settles on a suboptimal solution with no better solution nearby in the space of solutions. (Experts use a variety of techniques to avoid such problems and achieve better results.) In practice, results with large data sets are usually good, across a wide range of possible mappings.

For example, in speech recognition, a neural network learns a mapping between sets of speech sounds and sets of labels (such as words or phonemes). In object recognition, a neural network learns a mapping between a set of images and a set of labels. In DeepMind's Atari game system (Mnih et al., 2015), a neural network learns a mapping between pixels and joystick positions.

Deep learning systems are most often used as classification systems, in that their mission is to decide which category (defined by the output units of the neural network) a given input belongs to. With enough imagination, the power of classification is immense; the outputs can represent almost anything, such as words or positions on a Go board.

In a world with unlimited data and computing resources, other technologies may not be needed.

3. The limits of deep learning
Deep learning's limitations begin with a simple negative thesis: the world we live in does not supply infinite data. Systems that rely on deep learning frequently have to generalize beyond the data they have seen, and their ability to guarantee high-quality performance when they do so is limited.

We can think of generalization as coming in two flavors: interpolation between known examples, and extrapolation to data that lie beyond the space of the known training examples (Marcus, 1998a).

For a neural network to generalize well, there must usually be a large amount of data, and the test data must be similar to the training data, so that new answers can be interpolated among old ones. In Krizhevsky et al.'s paper (Krizhevsky, Sutskever, & Hinton, 2012), a nine-layer convolutional neural network with 60 million parameters and 650,000 nodes was trained on roughly a million distinct examples drawn from approximately a thousand categories.

This kind of brute-force approach works well on a bounded data set such as ImageNet, where all external stimuli can be sorted into a modest number of categories. It also works well in stable domains: in speech recognition, for example, data can be mapped in a regular way onto a limited set of speech-sound categories. But for many reasons, deep learning is not a universal solution for artificial intelligence.

The following are ten challenges facing current deep learning systems:

3.1 Current deep learning requires large amounts of data
Humans can learn abstract relationships in just a few trials. If I tell you that a schmister is a sister between the ages of 10 and 21, you may need only one example to immediately infer that you have no schmister, that your best friend has no schmister, that your children or parents have no schmister, and so on.

You don't need hundreds, let alone millions, of training examples. You can give schmister an exact definition using an abstract relationship between a few algebra-like variables.
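As a hedged illustration of what "an exact definition over a few variables" buys, here is the schmister example written as a single symbolic rule; the Person class and the sample data are invented for this sketch.

```python
# One symbolic rule over variables, defined once, applies to anyone at once
# (illustrative sketch; the Person class and data are made up).
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    sisters: list = field(default_factory=list)

def has_schmister(person: Person) -> bool:
    """schmister: a sister between the ages of 10 and 21."""
    return any(10 <= s.age <= 21 for s in person.sisters)

alice = Person("Alice", 40, sisters=[Person("Beth", 35)])
carol = Person("Carol", 12, sisters=[Person("Dana", 15)])

print(has_schmister(alice))  # False: no sister in the 10-21 range
print(has_schmister(carol))  # True
```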

Humans can learn such abstractions, whether through explicit definition or more implicit means (Marcus, 2001). Indeed, even a 7-month-old infant can learn abstract language-like rules from a small number of unlabeled examples in just two minutes (Marcus, Vijayan, Bandi Rao, & Vishton, 1999). A subsequent study by Gervain and colleagues (2012) suggested that newborns are capable of similar learning.

Deep learning currently lacks a mechanism for learning abstractions through explicit verbal definition. It works best when there are millions or even billions of training examples, as in DeepMind's work on board games and Atari. As Brenden Lake and his colleagues have recently emphasized in a series of papers, humans are far more efficient than deep learning systems at learning complex rules (Lake, Salakhutdinov, & Tenenbaum, 2015; Lake, Ullman, Tenenbaum, & Gershman, 2016; see also related work by George et al., 2017). My work with Steven Pinker comparing over-regularization errors in children and neural networks makes a similar point.

Geoff Hinton has also expressed concern about deep learning's reliance on large numbers of labeled examples. He makes this point in his recent capsule network research, noting that convolutional neural networks may face "exponential inefficiencies" that could lead to their demise. One problem is that convolutional networks have difficulty generalizing to novel viewpoints. The ability to handle translation (one kind of invariance) is built into the network, but for other common kinds of transformation invariance we must choose between replicating feature detectors across a grid, whose computational cost grows exponentially, or increasing the size of the labeled training set, which likewise grows exponentially.

For problems without large amounts of data, deep learning is usually not the ideal solution.

3.2 Deep learning thus far is shallow and has limited capacity for transfer
It is important to realize that "deep" in deep learning is a technical, architectural property (the use of many hidden layers in modern neural networks), not a conceptual one (the representations such networks acquire do not naturally apply to abstract concepts like "justice," "democracy," or "meddling").

Even concrete concepts like "ball" or "opponent" are difficult for deep learning to acquire. Consider DeepMind's research on Atari games using deep reinforcement learning, which combines deep learning with reinforcement learning. The results seem superb: using a single set of "hyperparameters" (which govern properties of the network such as the learning rate), and with no knowledge of the specific games or even their rules, the system reached or beat human experts on a large sample of games. But it is easy to over-interpret the results. For example, according to a widely circulated video of the system learning to play the brick-breaking Atari game Breakout, "after 240 minutes of training, the system realized that digging a tunnel through the wall was the most effective technique for earning a high score."

But in fact, the system has learned no such thing: it does not understand what a tunnel is, or what a wall is; it has only learned specific contingencies for specific scenarios. Transfer tests, in which a deep reinforcement learning system is confronted with scenarios that differ slightly from those it was trained on, suggest that deep reinforcement learning's solutions are often extremely superficial. For example, a team at Vicarious showed that a more advanced successor to DeepMind's technique, the Atari system "Asynchronous Advantage Actor-Critic" (also known as A3C), failed on a variety of minor perturbations of Breakout, such as shifting the Y coordinate of the paddle or adding a wall in the middle of the screen. These counterexamples demonstrate that deep reinforcement learning does not learn generalizable concepts like wall or paddle; rather, such commentary reflects what in comparative psychology would be called over-attribution. The Atari system never genuinely acquired a robust concept of a wall; it merely learned to break through walls superficially, within a narrow set of highly trained scenarios.

I found similar results in ski-game scenarios studied by the research team at the startup Geometric Intelligence (later acquired by Uber). In 2017, a team of researchers at Berkeley and OpenAI found that it was easy to construct adversarial examples in a variety of games that rendered DQN (the original DeepMind algorithm), A3C, and other related techniques ineffective (Huang, Papernot, Goodfellow, Duan, & Abbeel, 2017).

Recent experiments by Robin Jia and Percy Liang (2017) make a similar point in a different domain: language. They trained a variety of neural networks on a question-answering task known as SQuAD (the Stanford Question Answering Dataset), in which the goal is to highlight the words in a particular passage that answer a given question. For example, one trained system could, based on a short passage, correctly identify the winner of Super Bowl XXXIII as John Elway. But Jia and Liang showed that merely inserting distractor sentences (such as one claiming that Google's Jeff Dean won a different Bowl) caused accuracy to plummet: across sixteen models, mean accuracy dropped from 75% to 36%.

In general, the patterns extracted by deep learning are more superficial than they first appear.

3.3 Deep learning thus far has no natural way to deal with hierarchical structure
To a linguist like Chomsky, the troubles Robin Jia and Percy Liang documented would come as no surprise. Fundamentally, most current deep-learning-based language models represent sentences as mere sequences of words. Chomsky, by contrast, has long argued that language has a hierarchical structure, in which smaller components are recursively combined into larger structures. (For example, in the sentence "the teenager who previously crossed the Atlantic set a record for flying around the world", the main clause is "the teenager set a record for flying around the world", while "who previously crossed the Atlantic" is an embedded clause specifying which teenager.)

In the 1980s, Fodor and Pylyshyn (1988) expressed similar concerns about an earlier branch of neural network research. In my 2001 book, I likewise conjectured that simple recurrent networks (SRNs; Elman, 1990) - the predecessors of today's more sophisticated deep learning approaches based on recurrent neural networks (RNNs) - would have difficulty systematically representing and extending recursive structure to unfamiliar sentences (see the original paper for the specific type).

Earlier in 2017, Brenden Lake and Marco Baroni tested whether such pessimism was still correct. As they put it in the title of their article, contemporary neural networks are "still not systematic after all these years". RNNs could "generalize well when the differences between training and test... are small, but when generalization requires systematic compositional skills, RNNs fail spectacularly."

Similar problems are likely to surface in other domains, such as planning and motor control, that require complex hierarchical structure, especially when new environments are encountered. We can see this indirectly in the difficulties with the Atari game AI mentioned above. More generally, in robotics, systems typically fail to generalize abstract plans to brand-new environments.

At a minimum, the core problem for deep learning at present is that it learns feature sets that are relatively flat, or non-hierarchical - like a simple, unstructured list in which every feature is equal. Hierarchical structure (such as syntactic trees that distinguish main clauses from embedded clauses in a sentence) is neither inherent in, nor directly represented by, such systems. As a result, deep learning systems are forced to rely on a variety of fundamentally inadequate proxies, such as the sequential position of a word in a sentence.

Systems like Word2Vec (Mikolov, Chen, Corrado, & Dean, 2013) represent individual words as vectors with reasonable success. A number of systems have also used clever tricks to try to represent complete sentences in deep-learning-compatible vector spaces (Socher, Huval, Manning, & Ng, 2012). But, as Lake and Baroni's experiments demonstrate, the capacity of recurrent networks remains limited: insufficient to represent and generalize rich structural information accurately and reliably.

3.4 Deep learning thus far cannot handle open-ended inference
If you can't grasp the difference between "John promised Mary to leave" and "John promised to leave Mary," you can't tell who is leaving whom, or what is likely to happen next. Current machine reading systems have achieved some success on tasks like SQuAD, in which the answer to a given question is either explicitly contained in the text, or integrated across multiple sentences (so-called multi-hop inference), or combined with a few explicit sentences of background knowledge without being directly marked in the text. Humans, by contrast, routinely make wide-ranging inferences as they read, forming new and implicit conclusions - for instance, determining a character's intentions from dialogue alone.

Although Bowman and colleagues (Bowman, Angeli, Potts, & Manning, 2015; Williams, Nangia, & Bowman, 2017) have taken some important steps in this direction, there is, for now, no deep learning system that can perform open-ended reasoning based on real-world knowledge with human-level accuracy.

3.5 Deep learning thus far is not sufficiently transparent
The "black box" character of neural networks has been a focus of discussion in the past few years (Samek, Wiegand, & Müller, 2017; Ribeiro, Singh, & Guestrin, 2016). A typical current deep learning system has millions or even billions of parameters, identifiable to its developers not in the kind of human-readable labels programmers conventionally use ("last_character_typed"), but only in a geography-like form within a complex network (e.g., the activity value of the ith node in layer j of network module k). Although visualization tools let us see the contributions of individual nodes in complex networks (Nguyen, Clune, Bengio, Dosovitskiy, & Yosinski, 2016), most observers would agree that neural networks as a whole remain black boxes.

How much this matters in the long run remains unclear (Lipton, 2016). If systems are robust and self-contained, it may not matter; but if a neural network occupies an important place within a larger system, its debuggability becomes crucial.

The problem of transparency is potentially fatal for deep learning in domains such as finance or medical diagnosis, where humans must understand how the system makes its decisions. As Cathy O'Neil (2016) has pointed out, such opacity can also lead to serious problems of bias.

3.6 Deep learning thus far has not been well integrated with prior knowledge
A dominant approach in deep learning is hermetic: it isolates itself from other, potentially useful knowledge. Work in deep learning typically consists of finding a training data set that relates outputs to inputs, and then learning the relationship between input and output using whatever sophisticated architectural variants and data cleaning and/or augmentation techniques are available. With only a few exceptions, such as LeCun's convolutional constraint on neural network connections (LeCun, 1989), prior knowledge is deliberately minimized.

Thus, for example, in a system proposed by Lerer et al. (2016) that learns the physical properties of falling towers of blocks, there is no prior knowledge of physics (beyond what is implied by convolution). Newton's laws are not encoded; the system instead learns to approximate them (to some limited degree) from raw pixel-level data. As I point out in a forthcoming paper, deep learning researchers seem to have a strong bias against prior knowledge, even when (as with physics) that knowledge is well established.

In general, integrating prior knowledge into deep learning systems is not easy, in part because the knowledge represented in such systems consists mainly of (largely opaque) correlations between features, rather than abstract, quantified statements - such as universally quantified one-to-one mappings like "all mortals eventually die" (see the discussion in Marcus, 2001), or generics, claims that hold true but admit exceptions, such as "dogs have four legs" or "mosquitoes carry West Nile virus" (Gelman, Leslie, Was, & Koch, 2015).

This problem is rooted in a machine learning culture that emphasizes systems that are self-contained and competitive, requiring not even a modicum of prior knowledge. The Kaggle machine learning competition platform exemplifies this phenomenon: contestants strive for the best results on a given task with a given data set, where all the information needed for any given problem is neatly packaged in the relevant input and output files. Great progress has been made within this paradigm (mainly in image recognition and speech recognition).

The problem, of course, is that life is not a Kaggle competition; children don't get all their data neatly packaged into a single directory. In the real world we must learn from more fragmented data, and problems are not so neatly encapsulated. Deep learning works well on problems like speech recognition, where there are many labeled examples, but scarcely anyone knows how to apply it to more open-ended problems. How do you free a rope caught in a bicycle chain? Should I major in mathematics or neuroscience? No training set will tell us.

The further a problem is from classification, and the closer it is to common sense, the less it can be solved by deep learning. In a recent inventory of commonsense reasoning that I wrote with Ernie Davis (2015), we began with a series of easily drawn inferences: Who is taller, Prince William or his baby son Prince George? Can you make a salad out of a polyester shirt? If you stick a pin into a carrot, does it make a hole in the carrot or in the pin?

As far as I know, no deep learning system can answer questions like these.

Questions like these, trivially simple for humans, require integrating knowledge from many disparate sources, and thus lie far from the sweet spot of deep-learning-style classification. They may instead be an indication that if we want human-level flexible cognition, we will need tools entirely different from deep learning.

3.7 Deep learning thus far has not distinguished causation from correlation
If causation is not in fact the same thing as correlation, the distinction between the two is another serious problem for deep learning. Roughly speaking, deep learning learns complex correlations between input features and output features, but has no inherent representation of causality. A deep learning system can easily learn that, across people as a whole, height and vocabulary are correlated, but it has a much harder time representing how that correlation derives from growth and development (children learn more words as they grow older, but growing taller does not cause them to learn more words, and learning more words does not cause them to grow taller). Causality has been a central factor in some other approaches to artificial intelligence (Pearl, 2000), but, perhaps because deep learning was not aimed at such problems, the field has traditionally done relatively little to address them.
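A toy simulation (mine, not Marcus's) makes the height/vocabulary point concrete: when age drives both variables, they correlate strongly even though neither causes the other, and the correlation vanishes once age is accounted for.

```python
# Correlation without causation: age is the common cause of both height
# and vocabulary in this synthetic data (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(2, 12, 5000)                       # the common cause
height = 80 + 6 * age + rng.normal(0, 3, 5000)       # cm, grows with age
vocab = 300 * age + rng.normal(0, 200, 5000)         # words, grows with age

print(np.corrcoef(height, vocab)[0, 1])              # strong correlation

# Partial out age: the residuals are (nearly) uncorrelated.
h_res = height - np.polyval(np.polyfit(age, height, 1), age)
v_res = vocab - np.polyval(np.polyfit(age, vocab, 1), age)
print(np.corrcoef(h_res, v_res)[0, 1])               # approximately 0
```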

3.8 Deep learning presumes a largely stable world, in ways that may be problematic
The logic of deep learning works best in highly stable worlds, such as Go, whose rules never change, and less well in ever-changing domains such as politics and economics. Even applied to tasks like stock prediction, deep learning is likely to meet the fate of Google Flu Trends, which predicted epidemiological data well from search trends - until it completely missed the 2013 flu season (Lazer, Kennedy, King, & Vespignani, 2014).

3.9 Deep learning thus far works well as an approximation, but its answers cannot be fully trusted
This problem follows in part from the other problems raised in this section. Deep learning systems work quite well much of the time in a given domain, yet they remain easy to fool.

A growing number of papers have demonstrated this flaw, from the language examples of Jia and Liang mentioned above to a wide range of cases in vision, in which deep-learning-based image captioning systems have mistaken yellow-and-black stripe patterns for school buses (Nguyen, Yosinski, & Clune, 2014) and mislabeled a sticker-covered parking sign as a refrigerator filled with food (Vinyals, Toshev, Bengio, & Erhan, 2014), all while reporting high confidence.

There have also been cases in which real-world stop signs, lightly defaced, were mistaken for speed limit signs (Evtimov et al., 2017), and 3D-printed turtles were mistaken for rifles (Athalye, Engstrom, Ilyas, & Kwok, 2017). A recent news item reported that a system used by British police has trouble distinguishing nudes from sand dunes.

The paper that first pointed out the "spoofability" of deep learning systems was probably Szegedy et al. (2013). Four years have passed, and despite a great deal of active research, no robust solution has yet been found.

3.10 Deep learning thus far is difficult to engineer with
A further fact follows from the problems above: it is still hard to do robust engineering with deep learning. As a Google research team put it in the title of an important 2014 paper (Sculley, Phillips, Ebner, Chaudhary, & Young, 2014) whose concerns remain unanswered, machine learning is "the high-interest credit card of technical debt": it is relatively easy (the short-term gain) to create systems that work in some limited set of circumstances, but quite difficult (the long-term debt) to guarantee that they will also work in other circumstances, with novel data that may differ from the training data - especially when a system is used as part of another, larger system.

In an important talk at ICML, Leon Bottou (2015) compared machine learning with aircraft engine development, pointing out that while aircraft design relies on building complex systems out of simpler systems for which reliable guarantees can be given, machine learning lacks the capacity to produce comparable guarantees. As Google's Peter Norvig noted in 2016, machine learning currently lacks the incrementality, transparency, and debuggability of classical programming, trading a kind of simplicity for deep challenges in achieving robustness.

Henderson and colleagues have recently extended these points with a focus on deep reinforcement learning, noting that the field faces serious problems related to robustness and reproducibility (Henderson et al., 2017).

Although there has been some progress in automating the development of machine learning systems (Zoph, Vasudevan, Shlens, & Le, 2017), there is still a long way to go.

3.11 Discussion
Of course, deep learning is, by itself, just mathematics; none of the problems given above arises from a flaw in deep learning's underlying math. In general, deep learning is a perfectly fine way of optimizing a complex system that represents a mapping between inputs and outputs, given a sufficiently large data set.

The real problem lies in misunderstanding what deep learning is, and is not, good for. The technique excels at solving closed-end classification problems, mapping a large range of potential signals onto a limited number of categories, given enough available data and a test set that closely resembles the training set.

Deviations from these assumptions can cause problems; deep learning is just a statistical technique, and all statistical techniques suffer when their assumptions are violated.

Deep learning systems work less well when the available training data are limited, when the test set differs importantly from the training set, or when the space of examples is broad and full of novelty. And under real-world constraints, some problems cannot be thought of as classification problems at all. Open-ended natural language understanding, for example, should not be seen as a mapping between one large finite collection of sentences and another, but rather as a mapping between a potentially infinite range of input sentences and an equally vast array of meanings, many never previously encountered. Using deep learning on such a problem is like forcing a square peg into a round hole: at best a rough approximation, when the real solution must lie elsewhere.

Some intuition about where deep learning currently goes wrong can be gained by considering a series of experiments I did long ago (1997), when I tested some simple aspects of language development on a class of neural networks that were then popular in cognitive science. These networks were much simpler than current models: they used no more than three layers (one input layer, one hidden layer, one output layer) and no convolution, but they did use back-propagation.

In language, this problem is called generalization. Having heard a sentence like "John pilked a football to Mary," I can grammatically infer "John pilked Mary the football," and if I know what pilk means, I can infer the meaning of a new sentence like "Eliza pilked the ball to Alec," even on first hearing.

Distilling the broad problem of language down to a simple example that I believe is still of concern today, I ran a series of experiments training three-layer perceptrons (fully connected, no convolution) on the identity function, f(x) = x.

Training examples were represented as a set of input nodes (and corresponding output nodes) standing for binary digits; the number 7, for instance, would activate the input nodes representing 4, 2, and 1. To test generalization, I trained the networks on various sets of even numbers, and tested them on both even and odd inputs.

Across a wide variety of parameters, the result was the same: the network would correctly apply the identity function to the even numbers it had been trained on (unless it got stuck in a local minimum), and to some other even numbers, but it failed on all odd numbers, computing, for example, f(15) = 14.

In general, the neural networks I tested could learn their training examples and generalize to a set of points surrounding those examples in n-dimensional space (the training space), but they could not extrapolate beyond that training space.

Odd numbers lay outside this training space, and the networks could not generalize the identity function outside that space. Adding more hidden units or more hidden layers did not help: simple multilayer perceptrons cannot generalize outside their training space (Marcus, 1998a; Marcus, 1998b; Marcus, 2001).
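For readers who want to see the phenomenon for themselves, here is a small reconstruction in the spirit of that experiment; the architecture and training details are my assumptions, not Marcus's original code.

```python
# Train a fully connected network on the identity function f(x) = x over
# binary-coded even numbers, then test on odd numbers, which lie outside
# the training space (reconstruction sketch, not the original experiment).
import numpy as np

def to_bits(n, width=5):
    return np.array([(n >> i) & 1 for i in range(width)], dtype=float)

evens = np.array([to_bits(n) for n in range(0, 32, 2)])   # training set
odds = np.array([to_bits(n) for n in range(1, 32, 2)])    # test set

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (5, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 5)); b2 = np.zeros(5)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):                       # plain full-batch backprop
    h = sigmoid(evens @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    d_out = out - evens                      # cross-entropy grad; target = input
    dW2 = h.T @ d_out; db2 = d_out.sum(0)
    dh = d_out @ W2.T * h * (1 - h)
    dW1 = evens.T @ dh; db1 = dh.sum(0)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * g

pred = lambda X: (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(float)
print("even accuracy:", (pred(evens) == evens).all(1).mean())  # ~1.0
print("odd accuracy:", (pred(odds) == odds).all(1).mean())     # ~0.0
# The lowest-order bit is always 0 in training, so the network never learns
# to turn it on: exactly the failure described above (e.g., f(15) = 14).
```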

This extrapolation challenge persists in today's deep learning networks, some twenty years later. Many of the problems discussed in this article - data hungriness, vulnerability to fooling, difficulty with open-ended inference and transfer - can be seen as extensions of this fundamental issue. Modern neural networks generalize well on data close to their core training data, but their generalization begins to break down on data that differ greatly from the training examples.

The widespread use of convolution guarantees a solution to one particular class of problems analogous to my identity problem: so-called translational invariance, in which an object retains its identity after a change in position. But this solution does not apply to all problems, as Lake's recent demonstrations show. (Data augmentation, which extends the space of training examples, offers another way of coping with deep learning's extrapolation challenge, but such techniques are more useful in 2-D vision than in language.)

There is currently no general solution to the extrapolation problem in deep learning. For that reason, if we want to achieve general artificial intelligence, we will need to rely on different solutions.

4. The potential risks of over-hype
One of the biggest risks of the current overhyping of AI is another AI winter, like the one of the 1970s. Although there are far more AI applications now than in the 1970s, hype remains a major concern. When high-profile figures like Andrew Ng write in the Harvard Business Review that automation is imminent (at considerable variance with reality), over-expectation creates risk. Machines in fact cannot do many things that an ordinary person can do in a second, from understanding the world to understanding sentences. No healthy human would mistake a turtle for a rifle or a parking sign for a refrigerator.

People who have invested heavily in AI may end up disappointed, especially in natural language processing. Some large projects have already been abandoned, such as Facebook's project M, launched in August 2015 with the stated ambition of building a general-purpose personal virtual assistant, and later scaled back to helping users perform a small number of well-defined tasks, such as calendar entries.

It is fair to say that chatbots have not lived up to the hype of a few years ago. If, for example, driverless cars prove unsafe after large-scale deployment, or merely fall short of the full autonomy so often promised, disappointing the public relative to the early hype, then the whole AI field could suffer a major downturn in both enthusiasm and funding. We may already be seeing the first signs, as in Wired's recent article "After peak hype, self-driving cars enter the trough of disillusionment".

There are many other serious concerns, and not just the doomsday scenarios (which, for now, still seem like science fiction). My own biggest worry is that the AI field could get trapped in a local minimum of its own, dwelling too heavily on the wrong part of the space of intelligence, over-focusing on models that are accessible but limited, eager to pick low-hanging fruit while neglecting riskier "side roads" that might ultimately lead to a more robust path forward.

I am reminded of Peter Thiel's famous remark: "We wanted flying cars, instead we got 140 characters." I still dream of Rosie the Robot, the full-service domestic robot; but for now, sixty years into AI's history, our robots still mostly just play music, vacuum floors, and bid on ads.

It would be a shame if no further progress were made. AI carries risks, but also enormous potential. I believe AI's greatest contribution to society should ultimately come in areas like automated scientific discovery. But to get there, the field must first make sure it does not get stuck in a local minimum.

5. What would be better?
Despite all the problems I have sketched, I don't think we need to abandon deep learning. Rather, we need to reconceptualize it: not as a universal solvent, but as one tool among many. We have power screwdrivers, but we also need hammers, wrenches, and pliers, not to mention drills, voltmeters, logic probes, and oscilloscopes.

In perceptual classification, where vast amounts of data are available, deep learning is a valuable tool. In other, richer cognitive domains, it often falls short. So the question is: where should we look instead? Here are four possible directions.

5.1 Unsupervised learning
Recently, deep learning pioneers Geoffrey Hinton and Yann LeCun have both pointed to unsupervised learning as a key way to move beyond supervised, data-hungry deep learning. To be clear, deep learning and unsupervised learning are not logical opposites. Deep learning has mainly been used in supervised settings with labeled data, but there are also ways of using it in unsupervised settings. Still, in many domains there is good reason to move away from the massive labeled data sets that supervised deep learning demands.

Unsupervised learning, as the term is commonly used, covers several kinds of systems that do not require labeled data. One common type "clusters" together inputs that share properties, even without explicit labels marking them as members of a class. Google's cat detection model (Le et al., 2012) is perhaps the most prominent example of this approach.

Another approach, advocated by Yann LeCun among others (Luc, Neverova, Couprie, Verbeek, & LeCun, 2017), and not mutually exclusive with the first, is to replace labeled data sets with data that change over time, such as movies. Intuitively, a system trained on video can use each pair of successive frames as a substitute training signal, learning to predict the next frame. Using frame t to predict frame t+1 requires no human labeling at all.
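Here is a deliberately tiny sketch of that training signal (a toy example of the idea, not LeCun's actual system): a linear model learns to predict frame t+1 from frame t of a synthetic "video", with no human labels anywhere.

```python
# Next-frame prediction as free supervision: frame t is the input, frame
# t+1 is the target, so no labels are needed (illustrative sketch).
import numpy as np

T, D = 200, 8
video = np.zeros((T, D))
video[np.arange(T), np.arange(T) % D] = 1.0   # a dot sweeping across 8 pixels

X, Y = video[:-1], video[1:]                  # input: frame t; target: frame t+1

W = np.zeros((D, D))                          # linear next-frame predictor
for _ in range(500):
    grad = X.T @ (X @ W - Y) / len(X)         # least-squares gradient
    W -= 0.5 * grad

# The learned W approximates a one-pixel shift operator.
print(np.round(W, 1))
```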

My view is that both of these approaches are useful (and there are others this article does not discuss), but neither, by itself, solves the problems raised in section 3. Such systems still lack, for example, explicit variables, and I see no sign in them of open-ended inference, interpretability, or debuggability.

That said, there is a different notion of unsupervised learning, rarely discussed but deeply interesting: the kind of unsupervised learning that children do. Children often set themselves novel tasks, like building a tower of Lego bricks or climbing through the window-like gap in a chair. Often, such exploratory problem solving involves (or at least appears to involve) setting large numbers of self-chosen goals (what should I do?), high-level problem solving (how do I get my arm through the chair, now that the rest of my body is through?), and the integration of abstract knowledge (how bodies work, what kinds of openings various objects have and whether one could fit through them, and so on). If we could build systems that set their own goals and reason and solve problems at this more abstract level, major advances in artificial intelligence might follow.

5.2 Symbol manipulation and the need for hybrid models
Another area we should look at is classical symbolic AI, sometimes called GOFAI (Good Old-Fashioned AI). Symbolic AI takes its name from the idea, central to mathematics, logic, and computer science, that abstractions can be represented directly by symbols. Equations like f = ma allow us to calculate outputs for a wide range of inputs, regardless of whether we have ever observed any particular values before. Computer programs do the same thing (if the value of variable x is greater than the value of variable y, perform action a).
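The point is trivially illustrated in code; the function below is an invented example, but it shows how a symbolic rule, defined once over variables, extrapolates to inputs of any magnitude with no training data at all.

```python
# A symbolic rule over variables: f = m * a, defined once, valid everywhere
# (illustrative sketch; no learning involved).
def force(mass: float, acceleration: float) -> float:
    return mass * acceleration

print(force(2.0, 9.8))       # 19.6 N
print(force(1e30, 1e-12))    # works for magnitudes never "trained" on
```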

Symbol manipulation systems by themselves have often proven brittle, but they were developed largely in an era with vastly less data and computing power than we have today. The right move now may be to combine deep learning, which excels at perceptual classification, with symbolic systems, which excel at inference and abstraction. One might think of this potential merger by analogy to the brain: perceptual input systems like primary sensory cortex seem to do something like what deep learning does, but other areas, such as Broca's area and prefrontal cortex, seem to operate at much higher levels of abstraction. The power and flexibility of the brain comes in part from its capacity to dynamically integrate many different kinds of computation. The process of scene perception, for instance, seamlessly combines direct sensory information with complex abstractions about objects and their properties, light sources, and so forth.

Some tentative research has already begun to explore how existing approaches might be integrated, including neuro-symbolic modeling (Besold et al., 2017) and recent work on differentiable neural computers (Graves et al., 2016), programming with differentiable interpreters (Bošnjak, Rocktäschel, Naradowsky, & Riedel, 2016), and neural programming with discrete operations (Neelakantan, Le, Abadi, McCallum, & Amodei, 2016). While none of this work has yet fully scaled to anything like full-service general artificial intelligence, I have long argued (Marcus, 2001) that integrating more microprocessor-like operations into neural networks could be extremely valuable.

By extension, the brain might be viewed as consisting of "a broad array of reusable computational primitives - elementary units of processing akin to sets of basic instructions in a microprocessor - perhaps wired together in parallel, as in the reconfigurable integrated circuits known as field-programmable gate arrays," as I have argued elsewhere (Marcus, Marblestone, & Dean, 2014). Steadily enriching the instruction set on which our computational systems are built could be of great benefit.

5.3 More insights from cognitive and developmental psychology
Another area of potential value is human cognition (Davis & Marcus, 2015; Lake et al., 2016; Marcus, 2001; Pinker & Prince, 1988). Machines need not literally replicate the human mind, which is, after all, error-prone and far from perfect. But in many domains, from natural language understanding to commonsense reasoning, humans retain a clear advantage; borrowing from the underlying mechanisms could drive progress in AI, even though the goal is not, and should not be, an exact copy of the human brain.

To many people, learning from the human brain means neuroscience; in my view, that may be premature. We do not yet know enough neuroscience to genuinely reverse-engineer the brain. AI may help us decipher the brain, rather than the other way around.

Either way, it should be possible to use techniques and insights from cognitive and developmental psychology to build more robust and comprehensive artificial intelligence: models driven not just by mathematics, but also by clues from human psychology.

Understanding the innate machinery of the human mind could be a good starting place, serving as a source of hypotheses that might contribute to the development of AI. In a companion piece to this paper (Marcus, in preparation), I summarize some possibilities, some drawn from my own earlier work (Marcus, 2001) and others from Elizabeth Spelke's (Spelke & Kinzler, 2007). Those drawn from my own work focus on possible ways of representing and manipulating information, such as symbolic mechanisms for representing variables and the distinctions between kinds and individuals within a class; Spelke's work focuses on how infants represent concepts like space, time, and objects.

A second focal point might be commonsense knowledge: how it develops (some of it perhaps from innate capacities, but most of it learned), how it is represented, and how we put it to use in our interactions with the real world (Davis & Marcus, 2015). Recent work by Lerer et al. (2016), Watters and colleagues (2017), Tenenbaum and colleagues (Wu, Lu, Kohli, Freeman, & Tenenbaum, 2017), and by Davis and myself (Davis, Marcus, & Frazier-Logue, 2017) offers several different ways of thinking about this problem within the domain of everyday physical reasoning.

A third focus might be human understanding of narrative, a notion of long standing, proposed by Roger Schank and Abelson in 1977 and recently revisited (Marcus, 2014; Kočiský et al., 2017).

5.4 Bolder challenges
Whether deep learning retains its current form, morphs into something new, or is replaced altogether, one might consider a variety of challenge problems that push systems beyond what can be learned from large data sets via supervised learning. Here are some suggestions, drawn in part from a recent special issue of AI Magazine devoted to moving beyond the Turing Test, which I edited with Francesca Rossi and Manuela Veloso (Marcus, Rossi, & Veloso, 2016):

- A comprehension challenge (Paritosh & Marcus, 2016; Kočiský et al., 2017), which would require a system to watch an arbitrary video (or read a text, or listen to a broadcast) and answer open-ended questions about its content (Who is the protagonist? What is their motivation? What would have happened if the antagonist had succeeded?). No dedicated supervised training set could cover all the possible contingencies; inference and the integration of real-world knowledge would be essential.
- Scientific reasoning and understanding, as in the Allen AI Institute's eighth-grade science challenge (Schoenick, Clark, Tafjord, Turney, & Etzioni, 2017; Davis, 2016). While the answers to many basic science questions can easily be retrieved by web search, others require reasoning beyond what is explicitly stated, together with the integration of general knowledge.
- General game playing (Genesereth, Love, & Pell, 2005) with transfer between games (Kansky et al., 2017), such that, for example, learning one first-person shooter improves performance on another with entirely different images, equipment, and so forth. (A system that can learn many games separately, with no transfer between them, such as DeepMind's Atari game system, would not qualify; the point is to acquire cumulative, transferable knowledge.)
- A physically embodied test of an AI-driven robot that could build things - from a tent to an IKEA shelf - based on instructions and real-world physical interaction with the parts, rather than massive trial and error (Ortiz Jr, 2016).
No single challenge is likely to be sufficient on its own. Natural intelligence is multidimensional (Gardner, 2011), and given the complexity of the world, general artificial intelligence will necessarily be multidimensional as well.

By pushing beyond perceptual classification and into a fuller integration of inference and knowledge, artificial intelligence will advance greatly.

6. Conclusions
As a measure of progress, it is worth looking back at a somewhat pessimistic piece I wrote for The New Yorker five years ago, conjecturing that "deep learning is only part of the larger challenge of building intelligent machines" because "such techniques lack ways of representing causal relationships (such as between diseases and their symptoms)" and face challenges in acquiring abstract ideas like "sibling" or "identical to." They have no obvious ways of performing logical inference, and are still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used.

As we have seen, despite major advances in specific domains such as speech recognition, machine translation, and board games, and despite equally impressive progress in infrastructure, data volume, and computing power, many of these concerns remain.

Interestingly, over the past year a growing number of other scholars have begun to emphasize similar limitations from different angles, among them Brenden Lake and Marco Baroni (2017), François Chollet (2017), Robin Jia and Percy Liang (2017), Dileep George and his Vicarious colleagues (Kansky et al., 2017), and Pieter Abbeel and his Berkeley colleagues (Stoica et al., 2017).

Perhaps the most notable of all has been Geoffrey Hinton, who has had the courage to upend his own revolution. In an interview with Axios last August, he said he was "deeply suspicious" of back-propagation, because of his concerns about its dependence on labeled data sets.

Instead, he suggested, we should "develop entirely new methods." Like Hinton, I am deeply excited about what comes next.
