Although the history of deep learning can be traced back decades, both the method and the very term "deep learning" became popular only five years ago, when the field was reignited by papers such as the now-classic Krizhevsky, Sutskever and Hinton deep network model for ImageNet.
What has the field discovered in the five years since? Against a backdrop of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the mainstream media, I present ten concerns about deep learning, and suggest that if we want to achieve artificial general intelligence, deep learning must be supplemented by other techniques.
"For most problems where deep learning has enabled transformationally better solutions (vision, speech), we entered diminishing-returns territory in 2016-2017." - François Chollet, Google, author of Keras, 2017.12.18
"Science is marching on the funeral," and the future is determined by the students who have questioned everything I said. - Geoffrey Hinton, Deep Learning Godfather, Google Brain Leader, 2017.9.15
1. Is deep learning hitting a wall?
Although the roots of deep learning reach back decades (Schmidhuber, 2015), the technique attracted relatively little attention until five years ago. In 2012, Krizhevsky, Sutskever and Hinton published "ImageNet Classification with Deep Convolutional Neural Networks" (Krizhevsky, Sutskever, & Hinton, 2012), which achieved top results in the ImageNet object recognition challenge (Deng et al.). With the publication of this and a series of other high-profile papers, everything changed. Other labs were already doing similar work at the time (Cireşan, Meier, Masci, & Schmidhuber, 2012). By the end of 2012, deep learning had made the front page of The New York Times, and it quickly became the best-known technique in artificial intelligence. The idea of training multi-layer neural networks was not new (and indeed it is not), but thanks to increases in computing power and data, deep learning became practical for the first time.
Since then, deep learning has produced many state-of-the-art results in areas such as speech recognition, image recognition, and language translation, and it plays a role in many current AI applications. Large companies have begun investing hundreds of millions of dollars to recruit deep learning talent. Andrew Ng, one of deep learning's key advocates, has gone further, saying, "If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future" (Ng, 2016). A recent New York Times Sunday Magazine article about deep learning suggested that the technique is "poised to reinvent computing itself."
Nowadays, however, deep learning may be approaching a wall, much as I anticipated at the beginning of its resurgence (Marcus, 2012), and as important figures such as Hinton (Sabour, Frosst, & Hinton, 2017) and Chollet (2017) have been hinting for months.
What exactly is deep learning? What has it shown about the nature of intelligence? What can we expect from it, and when will it break down? How close or far are we from artificial general intelligence, and from machines that can deal with unfamiliar problems as flexibly as humans? The purpose of this article is both to temper some irrational exuberance and to consider the directions in which the field needs to advance.
The paper is written both for researchers in the field and for AI consumers with less technical background who may want to understand it. As such, in the second section I briefly and non-technically introduce what deep learning systems do well and why. The third section then introduces the weaknesses of deep learning, the fourth section discusses misunderstandings about its abilities, and finally I introduce directions in which we could move forward.
Deep learning is not likely to die, nor should it. But five years into its resurgence seems a good moment for a critical reflection on what deep learning can and cannot do.
2. What is deep learning? What can deep learning do?
Deep learning is essentially a statistical technique, based on sample data, for classifying patterns using multilayer neural networks. A neural network in the deep learning literature comprises a set of input units that stand for things like pixels or words, multiple hidden layers containing hidden units (also called nodes or neurons; the more layers, the deeper the network), a set of output units, and connections between the nodes. In a typical application, such a network might be trained on a large set of handwritten digits (the inputs, represented as images) and labels (the outputs) identifying the categories to which the inputs belong.
Over time, an algorithm called back-propagation emerged, which allows the connections between the units to be adjusted through a process of gradient descent, so that any given input comes to yield the corresponding output.
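As a minimal sketch of this training scheme (the dataset, layer sizes, and learning rate below are invented for illustration; real systems are vastly larger), the following trains a tiny fully connected network by back-propagation and gradient descent until it reproduces a simple input-output mapping:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mapping: 2-bit inputs, label = logical AND of the two bits.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [0], [0], [1]], dtype=float)

# One hidden layer: 2 inputs -> 3 hidden units -> 1 output.
W1 = rng.normal(0, 0.5, (2, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.5, (3, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: input -> hidden -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Back-propagation: push the output error back through the layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent update of every connection weight.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(0)

preds = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(preds.ravel())  # after training, matches the AND labels
```

Each pass computes the outputs, measures the error against the labels, and propagates that error backwards to nudge every connection weight downhill.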
In general, one can think of the relationship between inputs and outputs that a neural network learns as a mapping. Neural networks, particularly those with multiple hidden layers, are especially good at learning input-output mappings.
Such systems are often described as neural networks because input, hidden, and output nodes are similar to biological neurons but have been greatly simplified. The connections between nodes resemble connections between neurons.
Most deep learning networks make heavy use of a technique called convolution (LeCun, 1989), which constrains the neural connections in the network so that they innately capture translational invariance. This is essentially the idea that an object can slide around an image while maintaining its identity; as in the figure above, a circle seen in the upper-left corner can be presumed, even absent direct experience, to be the same kind of thing when it appears in the lower-right corner.
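The translational-invariance idea can be made concrete with a hand-written sketch (no deep learning library is assumed; the filter and toy images are invented): a single convolution filter, applied with the same weights at every position as it slides across the image, responds identically to a pattern wherever it appears.

```python
import numpy as np

# A 3x3 "corner detector" filter, slid across the image (a convolution).
# Because the same weights are applied at every position, the filter
# responds to its pattern wherever it occurs: translational invariance.
filt = np.array([[1,  1,  1],
                 [1, -1, -1],
                 [1, -1, -1]], dtype=float)

def conv2d(image, f):
    H, W = image.shape
    k = f.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(image[i:i+k, j:j+k] * f)
    return out

def place_pattern(r, c):
    img = np.zeros((8, 8))       # a toy 8x8 "image"
    img[r:r+3, c:c+3] = filt     # draw the corner pattern at (r, c)
    return img

top_left = conv2d(place_pattern(0, 0), filt)
bottom_right = conv2d(place_pattern(5, 5), filt)

# The peak response is identical; only its location differs.
print(top_left.max(), bottom_right.max())
```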
Deep learning is also well known for its ability to self-generate intermediate representations, such as internal units that respond to things like horizontal lines or more complex elements of pictorial structure.
In principle, given infinite data, a deep learning system is powerful enough to represent any finite deterministic "mapping" between a given set of inputs and a corresponding set of outputs, though in practice whether it can learn such a mapping depends on many factors. One common concern is the local-minimum trap, in which the system settles on a suboptimal solution with no better solution nearby in the space of solutions being searched. (Experts use a variety of techniques to avoid such problems, to reasonably good effect.) In practice, results with large datasets are usually quite good, across a wide range of potential mappings.
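As a toy illustration of the local-minimum trap (the one-dimensional loss function here is invented for illustration), plain gradient descent settles into whichever basin its starting point drains toward, even when a deeper minimum exists elsewhere:

```python
def loss(x):
    # An invented 1-D loss with a deeper minimum near x ~ -1.47
    # and a shallower (local) minimum near x ~ 1.35.
    return x**4 - 4 * x**2 + x

def grad(x):
    return 4 * x**3 - 8 * x + 1

def descend(x, lr=0.01, steps=2000):
    # Repeatedly step downhill along the gradient.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

a = descend(-2.0)  # drains into the deeper (global) basin
b = descend(2.0)   # drains into the shallower (local) basin
print(round(a, 2), round(b, 2), loss(b) > loss(a))
```

Starting from x = 2.0, descent gets stuck at the shallower minimum: no nearby move improves the loss, even though a better solution exists on the other side of the hump.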
For example, in speech recognition, a neural network learns a mapping between sets of speech sounds and sets of labels (such as words or phonemes). In object recognition, it learns a mapping between a set of images and a set of labels. In DeepMind's Atari game system (Mnih et al., 2015), neural networks learned mappings between pixels and joystick positions.
Deep learning systems are most often used as classification systems, in the sense that their mission is to decide which category (defined by the output units of the neural network) a given input belongs to. With enough imagination, the power of classification is immense; the outputs can represent almost anything, such as words or positions on a chess board.
In a world with unlimited data and computing resources, other technologies may not be needed.
3. The limitations of deep learning<br> Deep learning's limitations begin with a negative observation: the world we live in does not supply infinite data. Systems that rely on deep learning frequently have to generalize beyond the specific data they have seen, and their capacity to guarantee high-quality performance on such data is limited.
We can think of generalization as a contrast between interpolation among known examples and extrapolation beyond the space of known training examples (Marcus, 1998a).
For a neural network to generalize well, there must usually be a large amount of data, and the test data must be similar to the training data, so that new answers can be interpolated among the old ones. In Krizhevsky et al.'s paper (Krizhevsky, Sutskever, & Hinton, 2012), a nine-layer convolutional neural network with 60 million parameters and 650,000 nodes was trained on roughly one million distinct examples drawn from approximately one thousand categories.
This sort of brute-force approach works well on bounded datasets like ImageNet, in which all external stimuli can be classified into a comparatively small set of categories. It also works well in stable domains; in speech recognition, for example, data can be mapped onto a limited set of speech-sound categories in a fairly regular way. But for many reasons, deep learning is not a universal solution for artificial intelligence.
The following are the ten challenges facing the current deep learning system:
3.1 Current deep learning requires large amounts of data <br> Humans can learn abstract relationships in just a few trials. If I told you that a schmister is a sister between the ages of 10 and 21, perhaps giving you a single example, you could immediately infer whether you have any schmisters, whether your best friend has a schmister, whether your children or parents have any schmisters, and so on.
You would not need hundreds, let alone millions, of training examples; you could give schmister a precise definition using an abstract relationship between a few algebra-like variables.
Humans can learn such abstractions, whether through precise definitions or more implicit means (Marcus, 2001). Indeed, even 7-month-old infants can learn abstract language-like rules from a small number of unlabeled examples in just two minutes (Marcus, Vijayan, Bandi Rao, & Vishton, 1999), and a subsequent study by Gervain and colleagues (2012) showed that newborns can perform similar learning.
Deep learning currently lacks a mechanism for learning abstractions through explicit verbal definition; it achieves its best performance when there are millions or even billions of training examples, as in DeepMind's work on board games and Atari. As Brenden Lake and his colleagues have recently emphasized in a series of papers, humans are far more efficient than deep learning systems at learning complex rules (Lake, Salakhutdinov, & Tenenbaum, 2015; Lake, Ullman, Tenenbaum, & Gershman, 2016; see also related work by George et al., 2017). My work with Steven Pinker comparing over-regularization errors in children and neural networks makes the same point.
Geoff Hinton has likewise expressed concern about deep learning's reliance on large numbers of labeled examples. He voiced this view in his recent capsule network research, which notes that convolutional neural networks may face "exponential inefficiencies" that could lead to their demise. A related problem is that convolutional networks have difficulty generalizing to novel viewpoints. The ability to handle translation (one kind of invariance) is built into the network, but for other common types of transformational invariance we must choose between replicating feature detectors across a grid, at exponentially growing computational cost, or increasing the size of the labeled training set, which likewise grows exponentially.
For problems without large amounts of data, deep learning is usually not the ideal solution.
3.2 Deep learning so far is superficial and has limited capacity for transfer <br> It is important to realize that "deep" in deep learning refers to a technical, architectural property (the use of a large number of hidden layers in modern neural networks), not a conceptual one (the representations such networks acquire do not naturally apply to abstract concepts such as "justice," "democracy," or "intervention").
Even concrete concepts like "ball" or "opponent" are difficult for deep learning to acquire. Consider DeepMind's research on Atari games using deep reinforcement learning, which combines deep learning with reinforcement learning. The results seemed superb: using a single set of "hyperparameters" (which govern properties of the network such as the learning rate), and with no knowledge of the specific games, or even their rules, the system met or exceeded human experts on a large sample of games. But it is easy to over-interpret this result. For example, according to a widely circulated video about the system learning the brick-breaking Atari game Breakout, "after 240 minutes of training, the system realized that digging a tunnel through the wall was the most effective technique for earning a high score."
But in fact, the system learned no such thing: it does not understand what a tunnel is, or what a wall is; it has only learned specific strategies for specific scenarios. Transfer tests, in which a deep reinforcement learning system is confronted with scenarios that differ slightly from those it was trained on, show just how superficial what it learns can be. For example, a research team at Vicarious showed that DeepMind's more advanced successor technique, the Atari system "Asynchronous Advantage Actor-Critic" (also known as A3C), failed when playing a variety of minor variants of Breakout, such as shifting the Y coordinate of the paddle or adding a wall in the middle of the screen. These counterexamples demonstrate that deep reinforcement learning does not learn generalizable concepts such as walls or paddles; claims to the contrary reflect the kind of over-attribution familiar from comparative psychology. The Atari system never truly acquired a robust concept of a wall; it only learned to break through walls superficially, within a narrow set of heavily trained scenarios.
My team at the startup Geometric Intelligence (later acquired by Uber) found similar results in a slalom-skiing game scenario. In 2017, a team of researchers at Berkeley and OpenAI found that it was easy to construct adversarial examples in a variety of games that rendered DQN (the original DeepMind algorithm), A3C, and other related techniques ineffective (Huang, Papernot, Goodfellow, Duan, & Abbeel, 2017).
Recent experiments by Robin Jia and Percy Liang (2017) make a similar point in a different domain, language. They trained a variety of neural networks on a question answering task (known as SQuAD, for the Stanford Question Answering Dataset), in which the goal is to highlight the words in a particular passage that answer a given question. For example, one trained system could correctly identify, from a short passage, John Elway as the winner of Super Bowl XXXIII. But Jia and Liang showed that simply inserting distractor sentences (for example, one claiming that Google's Jeff Dean had won a different bowl game) caused accuracy to plummet: across 16 models, average accuracy dropped from 75% to 36%.
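One can mimic the fragility Jia and Liang exploit with a deliberately naive reader (this toy function is hypothetical, not their far more sophisticated models): a system that answers by counting word overlap is drawn to any distractor sentence that happens to share words with the question.

```python
def naive_reader(question, passage):
    """Answer by picking the sentence sharing the most words with the question."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

passage = ("John Elway led his team to victory in Super Bowl XXXIII. "
           "The game was played in Miami")
question = "Which quarterback won Super Bowl XXXIII"

answer_before = naive_reader(question, passage)

# An irrelevant sentence that merely shares words with the question:
distractor = "Quarterback Jeff Dean won his Champ Bowl XXXIII"
answer_after = naive_reader(question, passage + ". " + distractor)

print(answer_before)  # the sentence naming John Elway
print(answer_after)   # the distractor now wins the overlap count
```

The distractor says nothing about the question asked, yet because it overlaps the question on more surface tokens, the shallow matcher prefers it.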
In general, the patterns extracted by deep learning are more superficial than they first appear.
3.3 Deep learning so far has no natural way to deal with hierarchical structure <br> To a linguist like Chomsky, the troubles Robin Jia and Percy Liang documented would come as no surprise. Fundamentally, most current deep learning language models represent sentences as mere sequences of words, whereas Chomsky has long argued that language has a hierarchical structure, in which smaller components are recursively combined into larger structures. (For example, in the sentence "the teenager who recently crossed the Atlantic set a record for flying around the world," the main clause is "the teenager set a record for flying around the world," while "who recently crossed the Atlantic" is an embedded clause specifying which teenager.)
In the 1980s, Fodor and Pylyshyn (1988) expressed the same concern about an earlier breed of neural networks. In my 2001 work, I likewise conjectured that simple recurrent networks (SRNs; Elman, 1990), a forerunner of today's more sophisticated deep learning approaches based on recurrent neural networks (RNNs), had difficulty systematically representing and extending recursive structure to various kinds of unfamiliar sentences (see the cited work for the specific types).
Earlier in 2017, Brenden Lake and Marco Baroni tested whether such pessimistic conjectures still held. As they put it in the title of their article, contemporary neural networks are "still not systematic after all these years": RNNs can "generalize well when the differences between training and test ... are small, [but] when generalization requires systematic compositional skills, RNNs fail spectacularly."
Similar problems are likely to surface in other domains, such as planning and motor control, that demand complex hierarchical structure, especially when a new environment is encountered. One can see this indirectly in the Atari game difficulties mentioned above, and more generally in robotics, where systems typically cannot abstract plans well enough to generalize to brand-new environments.
At its core, the current problem with deep learning is that it learns feature sets that are relatively flat, or non-hierarchical, like a simple unstructured list in which every feature is equal. Hierarchical structure (for example, the syntactic trees that distinguish main clauses from embedded clauses in a sentence) is neither inherent in nor directly represented by such systems, and as a result deep learning systems are forced to rely on proxies that are fundamentally inadequate, such as the sequential position of a word in a sentence.
Systems like Word2Vec (Mikolov, Chen, Corrado, & Dean, 2013) represent individual words as vectors with reasonable success, and a number of systems have used clever tricks to try to represent complete sentences in deep-learning-compatible vector spaces (Socher, Huval, Manning, & Ng, 2012). But, as Lake and Baroni's experiments demonstrate, the capacity of recurrent networks remains too limited to represent and generalize rich structural information accurately and reliably.
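A small sketch of why flat, non-hierarchical representations fall short (the one-hot word vectors below are invented; systems like Word2Vec learn richer ones): averaging word vectors, a common order-blind trick, assigns identical representations to sentences whose structure, and therefore meaning, differ.

```python
import numpy as np

# Invented one-hot word vectors (real systems learn denser ones).
vecs = {"man":   np.array([1.0, 0.0, 0.0]),
        "bites": np.array([0.0, 1.0, 0.0]),
        "dog":   np.array([0.0, 0.0, 1.0])}

def sentence_vector(sentence):
    """Order-blind representation: average the vectors of the words."""
    return np.mean([vecs[w] for w in sentence.lower().split()], axis=0)

a = sentence_vector("man bites dog")
b = sentence_vector("dog bites man")
print(np.allclose(a, b))  # True: who bit whom has been lost
```

Any representation that discards structure in this way cannot, even in principle, distinguish which noun is the subject and which the object.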
3.4 Deep learning so far cannot perform open-ended inference <br> If you cannot tell the difference between "John promised Mary to leave" and "John promised to leave Mary," you cannot tell who is leaving whom, or what is likely to happen next. Current machine reading systems have achieved some success on tasks like SQuAD, in which the answer to a given question is explicitly contained in the text, but far less success on tasks in which inference must go beyond what is explicit, whether by integrating information across multiple sentences (so-called multi-hop inference) or by combining explicit sentences with background knowledge that no particular text spells out. Humans, by contrast, routinely make broad inferences as they read, forming new and implicit thoughts; we can, for example, infer a character's intentions from dialogue alone.
Although Bowman and colleagues (Bowman, Angeli, Potts & Manning, 2015; Williams, Nangia & Bowman, 2017) have taken some important steps in this direction, for now no deep learning system can perform open-ended reasoning based on real-world knowledge with human-level accuracy.
3.5 Deep learning so far is not sufficiently transparent <br> The "black box" character of neural networks has been a focus of discussion over the past few years (Samek, Wiegand & Müller, 2017; Ribeiro, Singh & Guestrin, 2016). In their current typical state, deep learning systems have millions or even billions of parameters, identifiable to their developers not in the kind of human-readable labels programmers conventionally use ("last_character_typed"), but only in a geography-like form buried within a complex network (e.g., the activity value of the ith node in layer j of network module k). Although visualization tools let us see the contributions of individual nodes in complex networks (Nguyen, Clune, Bengio, Dosovitskiy & Yosinski, 2016), most observers agree that neural networks as a whole remain black boxes.
How much this matters in the long run remains unclear (Lipton, 2016). If systems are robust and self-contained enough, it may not matter; but if a neural network occupies an important place within a larger system, its debuggability becomes crucial.
The transparency problem is potentially fatal for deep learning in domains such as finance or medical diagnosis, in which humans must understand how a system made its decisions. As Cathy O'Neil (2016) has pointed out, this opacity can also lead to serious problems of bias.
3.6 Deep learning so far has not been well integrated with prior knowledge <br> A dominant approach in deep learning is hermeneutic, in the sense of being self-contained and isolated from other potentially useful knowledge. Work in deep learning typically consists of finding a training dataset of inputs with their associated outputs, and learning the relation between them by whatever sophisticated architecture, variant, and data cleaning and/or augmentation techniques one can devise. With only a few exceptions, such as LeCun's convolution-inspired constraints on neural network connectivity (LeCun, 1989), prior knowledge is deliberately minimized.
Thus, for example, the system proposed by Lerer et al. (2016) learns the physical properties of falling towers of blocks with no prior knowledge of physics (beyond what convolution implies). Newton's laws are not encoded; instead, the system approximates them (to some limited degree) by learning from raw pixel-level data. As I point out in a forthcoming paper, deep learning researchers seem to hold a strong bias against prior knowledge, even when (as with physics) that prior knowledge is well established.
In general, it is not easy to integrate prior knowledge into deep learning systems, partly because the knowledge represented in such systems consists mainly of (largely opaque) correlations between features, rather than abstract quantified statements (such as "all mortals eventually die"; see the discussion of universally quantified one-to-one mappings in Marcus, 2001) or generics (claims that admit exceptions, such as "dogs have four legs" or "mosquitoes carry West Nile virus" (Gelman, Leslie, Was & Koch, 2015)).
The problem is rooted in a machine learning culture that emphasizes systems that are self-contained and competitive, requiring not even a little prior knowledge. The Kaggle machine learning competition platform exemplifies this phenomenon: contestants strive for the best results on a given dataset for a given task, with all the information needed for any given problem neatly packaged into the relevant input and output files. Great progress has been made within this paradigm (chiefly in image recognition and speech recognition).
The problem, of course, is that life is not a Kaggle competition; children do not get all their data neatly packaged into a single directory. Learning in the real world requires coping with far more fragmentary data, and with problems that are not so neatly encapsulated. Deep learning is very effective on heavily labeled problems like speech recognition, but scarcely anyone knows how to apply it to more open-ended problems. How do you free a rope that is stuck in a bicycle chain? Should I major in mathematics or neuroscience? No training set will tell us.
The further a problem lies from classification, and the closer to common sense, the less it can be solved by deep learning. In a recent survey of commonsense reasoning with Ernie Davis (2015), we began with a series of inferences that are easy for people: Who is taller, Prince William or his baby son Prince George? Can you make a salad out of a polyester shirt? If you stick a pin into a carrot, does it make a hole in the carrot or in the pin?
As far as I know, no deep learning system has the common sense to answer such questions.
Questions like these, trivial for humans, require integrating knowledge from a vast range of disparate sources, and so lie far from the sweet spot of deep-learning-style classification. Instead, they may signal that if we want to achieve human-level flexibility of cognition, we will need tools entirely different from deep learning.
3.7 Deep learning so far has not fundamentally distinguished causation from correlation <br> If causation is not literally the same thing as correlation, the distinction between the two is a serious problem for deep learning. Roughly speaking, deep learning learns complex correlations between input features and output features, with no inherent representation of causality. A deep learning system can easily learn that, across a population, height and vocabulary size are correlated, but it has more trouble representing how growth and development interrelate (children learn more words as they grow older, but growing taller does not cause them to learn more words, nor does learning more words cause them to grow taller). Causality has been a central concern in some other approaches to artificial intelligence (Pearl, 2000), but, perhaps because deep learning was not aimed at such questions, the field has traditionally done relatively little to address it.
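The height-and-vocabulary example can be simulated in a few lines (the numbers are synthetic and purely illustrative): because age drives both variables, they correlate strongly even though neither causes the other, and the correlation largely vanishes once age is held roughly fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic children: age is the common cause of both variables.
age = rng.uniform(2, 12, 1000)                       # years
height = 80 + 6 * age + rng.normal(0, 5, 1000)       # cm, grows with age
vocab = 500 + 900 * age + rng.normal(0, 800, 1000)   # words known, grows with age

# Height and vocabulary correlate strongly across the whole sample...
r_all = np.corrcoef(height, vocab)[0, 1]

# ...but the correlation largely disappears among same-age children,
# because age, not height, was doing the causal work.
band = (age > 6.5) & (age < 7.5)
r_same_age = np.corrcoef(height[band], vocab[band])[0, 1]

print(round(r_all, 2), round(r_same_age, 2))
```

A purely correlational learner sees only the strong raw association; representing why it exists (the confounding variable) requires causal machinery the learner does not have.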
3.8 Deep learning presumes a largely stable world, in ways that may be problematic <br> The logic of deep learning is such that it is likely to work best in highly stable worlds, such as the board game Go, whose rules never change, and less well in constantly shifting domains such as politics and economics. To the extent that deep learning is applied to tasks such as stock prediction, it may well meet the fate of Google Flu Trends, which predicted epidemiological data well from search trends, only to miss the 2013 flu season entirely (Lazer, Kennedy, King, & Vespignani, 2014).
3.9 Deep learning so far works well as an approximation, but its answers cannot be fully trusted <br> This problem is partly a consequence of the other problems discussed in this section. Deep learning works reasonably well in a given domain most of the time, yet it remains easy to fool.
A growing number of papers demonstrate this flaw, from the linguistic examples of Jia and Liang discussed above to a wide range of cases in vision, in which deep learning systems have mistaken yellow-and-black stripe patterns for school buses (Nguyen, Yosinski, & Clune, 2014) and mislabeled a sticker-covered parking sign as a well-stocked refrigerator (Vinyals, Toshev, Bengio, & Erhan, 2014), while appearing to perform well otherwise.
There have also been cases in which real-world stop signs, lightly defaced, were mistaken for speed limit signs (Evtimov et al., 2017), and a 3D-printed turtle was mistaken for a rifle (Athalye, Engstrom, Ilyas, & Kwok, 2017). A recent news story recounted a British police system's difficulty in distinguishing nudity from sand dunes.
The "spoofability" of deep learning systems was perhaps first noted by Szegedy et al. (2013). Four years later, despite a great deal of active research, no robust solution has been found.
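A minimal sketch of how such spoofing works (a hand-built linear classifier with invented weights, not any system cited above): nudging every input feature by a tiny amount in the direction that raises the wrong class's score flips the decision, even though no feature changes by more than a barely perceptible epsilon.

```python
import numpy as np

# A fixed, hand-built linear classifier (illustrative, not trained):
# score > 0 means "bus", otherwise "not a bus".
w = np.array([0.5, -0.3, 0.8, -0.6])
b = -0.1

def classify(x):
    return "bus" if float(x @ w + b) > 0 else "not a bus"

x = np.array([0.2, 0.4, 0.1, 0.3])
before = classify(x)  # "not a bus"

# Fast-gradient-style spoof: nudge every feature by epsilon in the
# direction that raises the score (the gradient of the score w.r.t. x is w).
eps = 0.2
x_adv = x + eps * np.sign(w)
after = classify(x_adv)

print(before, "->", after, "| max feature change:", np.abs(x_adv - x).max())
```

For deep networks the gradient must be computed through the whole model rather than read off directly, but the principle, a small worst-case perturbation aligned with the gradient, is the same.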
3.10. Deep learning so far is difficult to engineer with <br> Given all the problems above, a further fact follows: it is still difficult to do robust engineering with deep learning. As a team of Google researchers put it in the title of an important 2014 paper whose warnings remain unaddressed (Sculley, Phillips, Ebner, Chaudhary, & Young, 2014), machine learning is "the high-interest credit card of technical debt": it is comparatively easy (a short-term gain) to create systems that work in some limited set of circumstances, but quite difficult (a long-term debt) to guarantee that they will also work in alternative circumstances with novel data that may differ from the training data, especially when the system must serve as part of another, larger system.
In an important talk at ICML, Leon Bottou (2015) compared machine learning to the development of the airplane engine, pointing out that while aircraft design relies on building complex systems out of simpler systems whose performance can be guaranteed, machine learning lacks the capacity for such guarantees. As Google's Peter Norvig noted in 2016, machine learning currently lacks the incrementality, transparency, and debuggability of traditional programming, and achieving robustness with deep learning requires trading away some of that simplicity.
Henderson and colleagues recently extended these ideas around deep learning, pointing out that the field faces some serious problems related to robustness and reproducibility (Henderson et al., 2017).
Although there has been some progress in the automation of the machine learning system's development process (Zoph, Vasudevan, Shlens, & Le, 2017), there is still a long way to go.
3.11 Discussion <br> Of course, deep learning is, by itself, just mathematics; none of the problems above arises because the underlying mathematics of deep learning is somehow flawed. In general, deep learning is a perfectly sound way of optimizing a complex system to represent a mapping between inputs and outputs, given a sufficiently large dataset.
The real problem lies in misunderstanding what deep learning is, and is not, good for. The technique excels at solving closed-ended classification problems, in which a wide range of potential signals must be mapped onto a limited number of categories, given enough available data and a test set that resembles the training set.
Deviating from these assumptions can cause trouble; deep learning is just a statistical technique, and all statistical techniques suffer when their assumptions are violated.
Deep learning systems work less well when training data are limited, when the test set differs importantly from the training set, or when the space of examples is broad and full of novelty. And under real-world constraints, some problems cannot be treated as classification problems at all. Open-ended natural language understanding, for example, should not be thought of as a mapping between two large but finite sets of sentences, but as a mapping between a potentially infinite range of input sentences and an equally vast array of meanings, many never encountered before. Using deep learning on a problem like this is like forcing a square peg into a round hole: at best a rough approximation, when the real solution must lie elsewhere.
One can get an intuitive sense of the errors that persist today by considering a series of experiments I did long ago, in 1997, testing some simple aspects of language development on a class of neural networks that were then popular in cognitive science. Those networks were much simpler than current models: they used no more than three layers (one input layer, one hidden layer, one output layer) and no convolution, but they did use back-propagation.
In language, this problem is a matter of generalization. Having heard the sentence "John pilked a football to Mary," I can grammatically infer "John pilked Mary the football," and, if I know what pilk means, I can infer the meaning of a new sentence such as "Eliza pilked the ball to Alec," even on first hearing.
Distilling the vast range of language problems into a simple example that I believe is still relevant today, I ran a series of experiments in which I trained three-layer perceptrons (fully connected, no convolution) on the identity function f(x) = x.
Training examples were represented by input nodes (with corresponding output nodes) encoding binary digits; the number 7, for example, would be represented by activating the input nodes for 4, 2, and 1. To test generalization, I trained the networks on a variety of sets of even numbers and tested them on both even and odd inputs.
Across a wide variety of parameters, the outcome was the same: the networks could apply the identity function perfectly to the even numbers they were trained on (unless they got stuck in a local minimum), and to some other even numbers, but they failed on all odd numbers, producing outputs like f(15) = 14.
In general, the neural networks I tested could learn their training examples and generalize to points near those examples in the n-dimensional space they spanned (the training space), but they could not extrapolate beyond that training space.
Odd numbers lie outside this training space, and the networks could not generalize the identity function out to them. Adding more hidden units, or more hidden layers, did not help. Simple multilayer perceptrons simply cannot generalize outside their training space (Marcus, 1998a; Marcus, 1998b; Marcus, 2001).
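The experiment described above is small enough to reproduce directly. The following is a minimal sketch (not the original 1997 code): a fully connected three-layer perceptron trained with back-propagation on f(x) = x over binary-coded even numbers only; the network sizes and learning rate are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bits(n, width=4):
    # 7 -> [0, 1, 1, 1]: the input nodes for 4, 2, and 1 are active
    return np.array([(n >> i) & 1 for i in reversed(range(width))], float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

train_x = np.array([to_bits(n) for n in range(0, 16, 2)])  # evens only
train_y = train_x.copy()                                   # target = input

W1 = rng.normal(0, 0.5, (4, 8)); b1 = np.zeros(8)   # input  -> hidden
W2 = rng.normal(0, 0.5, (8, 4)); b2 = np.zeros(4)   # hidden -> output

lr = 0.5
for _ in range(10000):                     # plain batch gradient descent
    h = sigmoid(train_x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    d_out = (out - train_y) * out * (1 - out)    # output-layer deltas
    d_h = (d_out @ W2.T) * h * (1 - h)           # back-propagated deltas
    W2 -= lr * h.T @ d_out;       b2 -= lr * d_out.sum(0)
    W1 -= lr * train_x.T @ d_h;   b1 -= lr * d_h.sum(0)

def predict(n):
    bits = sigmoid(sigmoid(to_bits(n) @ W1 + b1) @ W2 + b2) > 0.5
    return int("".join(str(int(b)) for b in bits), 2)

print([predict(n) for n in range(0, 16, 2)])  # trained evens: reproduced
print([predict(n) for n in range(1, 16, 2)])  # odd inputs: the ones bit
# was 0 in every training example, so the network keeps outputting 0
# there, mapping e.g. 15 to 14 instead of generalizing the identity.
```

Because every training example has a 0 in the rightmost bit, the network never has any reason to output a 1 there, and so its failure on odd numbers is not a tuning accident but a structural consequence of the training space.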
This is the generalization challenge that today's deep learning networks still face, and it may persist twenty years on. Many of the problems discussed in this article - data hunger, vulnerability to fooling, difficulty with open-ended inference and transfer - can be seen as extensions of this basic problem. Modern neural networks generalize well on data close to their core training data, but generalization starts to break down on data that differs substantially from the training examples.
The now-ubiquitous use of convolution guarantees a solution to one particular class of problems (analogous to my identity problem): so-called translational invariance, in which an object retains its identity as its position changes. But this solution does not apply to all problems, as Lake has recently shown. (Data augmentation, which broadens the space of training examples itself, offers another way of attacking deep learning's extrapolation challenge, but such techniques are more effective in 2D vision than in language.)
There is, at present, no general solution to the generalization problem in deep learning. For that reason, if we want to achieve artificial general intelligence, we will need to rely on different solutions.
4. The potential risks of over-hype <br> One of the biggest risks of the current over-hyping of AI is another AI winter, like the one of the 1970s. Although there are far more AI applications now than there were then, hype remains a serious concern. When a high-profile figure like Andrew Ng writes in the Harvard Business Review that automation is imminent (a claim far removed from reality), inflated expectations create real risk. Machines still cannot do many things that an ordinary person can do in a second, from understanding the world to understanding a sentence, and no healthy human would mistake a turtle for a rifle or a parking sign for a refrigerator.
Those who have invested heavily in AI may end up disappointed, especially in natural language processing. Some large projects have already been abandoned, such as Facebook's M, launched in August 2015 with the promise of a general-purpose personal virtual assistant, and later scaled back to helping users perform a small number of well-defined tasks, such as calendar entries.
It is fair to say that chatbots have not lived up to the hype of a few years ago. If, for example, driverless cars prove unsafe after large-scale deployment, or merely fall short of the full autonomy so often promised and thus disappoint the public relative to the early hype, the whole field of AI could face a major downturn, in both enthusiasm and funding. We may already be seeing the first signs, as described in the recent Wired article "After peak hype, self-driving cars enter the trough of disillusionment" (https://).
There are many other serious concerns as well, and not just apocalyptic scenarios (which, for now, still read like science fiction). My own biggest worry is that the field of AI could get trapped in a local minimum: dwelling too heavily on the wrong part of the space of intelligence, focusing too much on models that are usable but limited, eager to pick low-hanging fruit while neglecting the riskier detours that might ultimately lead to a more robust path forward.
I am reminded of Peter Thiel's famous remark: "We wanted flying cars, instead we got 140 characters." I still dream of Rosie the Robot, a full-service domestic robot, but for now, sixty years into the history of AI, our robots can do little more than play music, sweep floors, and bid on advertisements.
It would be a shame if no further progress were made. AI carries risks, but also enormous potential. I believe AI's greatest contribution to society should ultimately come in areas such as automated scientific discovery. But to get there, the field must first make sure it does not get stuck in a local minimum.
5. What would be better?
Despite all the problems I have sketched, I do not think we should abandon deep learning. Rather, we should reconceptualize it: not as a universal solvent, but simply as one tool among many. We have power screwdrivers, but we also need hammers, wrenches, and pliers, not to mention drills, voltmeters, logic probes, and oscilloscopes.
In perceptual classification, where vast amounts of data are available, deep learning is a valuable tool. In other, more formal cognitive domains, it is often far less adequate. The question, then, is where we should look instead. Here are four possible directions.
5.1 Unsupervised learning <br> Deep learning pioneers Geoffrey Hinton and Yann LeCun have both recently pointed to unsupervised learning as a key way to move beyond supervised, data-hungry deep learning. To be clear, deep learning and unsupervised learning are not logical opposites. Deep learning has mostly been used in supervised settings with labeled data, but there are ways of using it in unsupervised settings as well. Still, in many fields there is good reason to want to move away from the massive amounts of labeled data that supervised deep learning demands.
Unsupervised learning is a common term that usually refers to several kinds of systems that do not require labeled data. One common type "clusters" inputs that share attributes, grouping them together even though they were never explicitly labeled as belonging to the same class. Google's cat-detection model (Le et al., 2012) is perhaps the most prominent example of this approach.
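The clustering idea can be illustrated in miniature. The following toy sketch (plain k-means on synthetic 2D points, not the cat-detector model itself) shows inputs that share attributes ending up grouped together without any labels ever being supplied:

```python
import numpy as np

# Two well-separated "blobs" of points; no labels are ever provided.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0, 0.3, (50, 2)),   # blob around (0, 0)
                    rng.normal(5, 0.3, (50, 2))])  # blob around (5, 5)

# Plain k-means with k = 2, initialized from two of the points.
centers = points[[0, -1]]
for _ in range(10):
    # assign each point to its nearest center, then recompute the centers
    labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])

# Points sharing attributes are now grouped, with no annotation at all.
print(labels[:5], labels[-5:])
```

The output shows all points from the first blob carrying one cluster label and all points from the second carrying the other, even though the algorithm was never told that two classes exist, only how many clusters to find.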
Another approach, advocated by Yann LeCun among others (Luc, Neverova, Couprie, Verbeek, & LeCun, 2017), and not mutually exclusive with the first, replaces labeled datasets with data that varies over time, such as movies. Intuitively, a system trained on video can use each pair of consecutive frames as a substitute training signal, predicting the next frame from the current one. Predicting frame t+1 from frame t in this way requires no human annotation at all.
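A minimal sketch of how such temporally ordered data supplies its own training signal (the array here is random stand-in data, not a real video):

```python
import numpy as np

# 100 frames of 32x32 "video"; in practice these would be real frames.
video = np.random.rand(100, 32, 32)

# Each frame t becomes an input and frame t+1 its target: the labels
# come from the data itself, with no human annotation needed.
inputs, targets = video[:-1], video[1:]

assert inputs.shape == targets.shape == (99, 32, 32)
# A predictive model would then be trained to map inputs -> targets.
```

The entire "labeling" step is the one slicing line: temporal order does the work that human annotators do in supervised learning.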
My view is that both of these approaches are useful (along with others not discussed here), but that neither, by itself, solves the problems raised in section 3. Such systems still lack, for example, explicit variables, and I see in them no capacity for open-ended inference, interpretability, or debuggability.
That said, there is a different notion of unsupervised learning, rarely discussed but deeply interesting: the unsupervised learning that children do. Children often set themselves a new task, such as building a tower of Lego or climbing through an opening in a chair. Often, this kind of exploratory problem solving involves (or at least appears to involve) setting a great many goals autonomously (What should I do?), solving problems at a high level (How do I get my arm through the chair, now that the rest of my body is already through?), and integrating abstract knowledge (how bodies work, what kinds of openings various objects have and whether they can be passed through, and so forth). If we could build systems that set their own goals and reasoned and solved problems at this more abstract level, the field of artificial intelligence would make major progress.
5.2 The need for symbol-manipulation and hybrid models <br> Another area we should look to is classical symbolic AI, sometimes known as GOFAI (Good Old-Fashioned AI). Symbolic AI takes its name from the idea that abstractions can be represented directly by symbols, an idea at the core of mathematics, logic, and computer science. Equations like f = ma allow us to calculate outputs for a wide range of inputs, regardless of whether we have ever observed any of those particular values before. Computer programs do the same thing (if the value of variable x is greater than the value of variable y, perform operation a).
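The point about operations over variables can be made concrete with a minimal sketch (the function names are mine, chosen for illustration). A rule stated over a variable applies uniformly to every binding of that variable, including values never seen before, which is precisely where the trained network in the identity experiment of section 3 failed:

```python
# A rule defined over a variable applies to every value uniformly,
# including values never observed before.
def identity(x):
    return x                     # f(x) = x as an explicit rule

def force(mass, acceleration):
    return mass * acceleration   # f = ma, the same idea

assert identity(15) == 15        # no "training space" to fall outside of
assert identity(10**9) == 10**9  # arbitrarily far from any seen example
assert force(3, 2) == 6          # holds for any values, seen or unseen
```

The contrast is the whole argument of this section: the symbolic rule generalizes universally but is brittle to noisy perceptual input, while the learned network handles noise but not extrapolation, hence the case for hybrids.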
Symbolic systems have often proved brittle on their own, but they were largely developed in an era with far less data and far less computing power than we have now. The right move today may be to combine deep learning, which excels at perceptual classification, with symbolic systems that excel at inference and abstraction. One might think of such a potential merger by analogy to the brain: perceptual input systems like the hierarchical perceptual cortex seem to do something like what deep learning does, while other areas, such as Broca's area and the prefrontal cortex, appear to operate at much higher levels of abstraction. The power and flexibility of the brain come in part from its capacity to dynamically integrate many different computations: the process of scene perception, for instance, seamlessly combines direct sensory information with complex abstractions about objects, their properties, light sources, and so on.
There are already some tentative studies exploring how to integrate the two approaches, including neurosymbolic modeling (Besold et al., 2017), the recent differentiable neural computers (Graves et al., 2016), programming with differentiable interpreters (Bošnjak, Rocktäschel, Naradowsky, & Riedel, 2016), and neural programming with discrete operations (Neelakantan, Le, Abadi, McCallum, & Amodei, 2016). While none of this research has yet scaled to anything like full-service artificial general intelligence, I have long argued (Marcus, 2001) that integrating more microprocessor-like operations into neural networks could be extremely valuable.
To the extent that the brain might be seen as consisting of "a broad array of reusable computational primitives - elementary units of processing akin to the basic instruction set of a microprocessor, perhaps wired together as in the reconfigurable integrated circuits known as field-programmable gate arrays," as I have argued elsewhere (Marcus, Marblestone, & Dean, 2014), there could be great value in gradually enriching the instruction set out of which our computational systems are built.
5.3 More insight from cognitive and developmental psychology <br> Another area of potential value is human cognition (Davis & Marcus, 2015; Lake et al., 2016; Marcus, 2001; Pinker & Prince, 1988). Machines need not literally replicate the human mind, which is, after all, deeply error-prone and far from perfect. But in many areas, from natural language understanding to commonsense reasoning, humans still hold a clear advantage, and borrowing from the mechanisms underlying those strengths could push AI forward, even though the goal is not, and should not be, an exact copy of the human brain.
For many people, learning from the human brain means neuroscience; I think that may be premature. We do not yet know enough neuroscience to genuinely reverse-engineer the brain. AI may help us decipher the brain, rather than the other way around.
Either way, the field should draw on techniques and insights from cognitive and developmental psychology to build more robust and comprehensive artificial intelligence, with models driven not only by mathematics but also by clues from human psychology.
Understanding the innate machinery in human minds could be a good place to start, as a source of hypotheses that might aid the development of AI. In a companion paper to this one (Marcus, in prep), I summarize a number of possibilities, some drawn from my own earlier work (Marcus, 2001) and others from Elizabeth Spelke's (Spelke & Kinzler, 2007). Those drawn from my own work focus on possible ways of representing and manipulating information, such as symbolic mechanisms for representing variables and the distinctions between kinds and individuals within a class; Spelke's work focuses on how infants represent concepts like space, time, and objects.
A second focal point might be commonsense knowledge: how it develops (some of it perhaps stemming from innate capacities, but much of it learned), how it is represented, and how we put it to use in our interactions with the real world (Davis & Marcus, 2015). Recent work by Lerer et al. (2016), Watters and colleagues (2017), Tenenbaum and colleagues (Wu, Lu, Kohli, Freeman, & Tenenbaum, 2017), and Davis and myself (Davis, Marcus, & Frazier-Logue, 2017) suggests several different ways of approaching this problem in the domain of everyday physical reasoning.
A third focus might be human understanding of narrative, a longstanding idea proposed by Roger Schank and Abelson as early as 1977 and updated since (Marcus, 2014; Kočiský et al., 2017).
5.4 Bigger challenges <br> Whether deep learning keeps its current form, morphs into something new, or is replaced altogether, one might consider a variety of challenge problems that push systems beyond what supervised learning can extract from large datasets. Here are some suggestions, drawn in part from a recent special issue of AI Magazine devoted to moving beyond the Turing Test, which I edited together with Francesca Rossi and Manuela Veloso (Marcus, Rossi, & Veloso, 2016):
A comprehension challenge (Paritosh & Marcus, 2016; Kočiský et al., 2017) would require a system to watch an arbitrary video (or read a text, or listen to a radio broadcast) and answer open-ended questions about its content (Who is the protagonist? What is their motivation? What will happen if the antagonist succeeds?). No dedicated supervised training set can cover every possible contingency; inference and the integration of real-world knowledge are a necessity.
Scientific reasoning and understanding, as in the Allen AI Institute's eighth-grade science challenge (Schoenick, Clark, Tafjord, Turney, & Etzioni, 2017; Davis, 2016). While the answers to many basic science questions can easily be found through web searches, others require inference beyond what is explicitly stated, together with the integration of commonsense knowledge.
General game playing (Genesereth, Love, & Pell, 2005) with transfer between games (Kansky et al., 2017), such that, for example, learning one first-person shooter improves performance on another with entirely different graphics, equipment, and so forth. (A system that learns many games separately, with no transfer between them, such as DeepMind's Atari system, would not qualify; the point is to acquire cumulative, transferable knowledge.)
A physically embodied test of an AI-driven robot that can build things, from tents to IKEA shelves, based on instructions and real-world interaction with the parts, rather than massive trial and error (Ortiz Jr, 2016).
No single challenge is likely to be sufficient on its own. Natural intelligence is multi-dimensional (Gardner, 2011), and given the complexity of the world, general artificial intelligence will have to be multi-dimensional as well.
By pushing beyond perceptual classification and into a fuller integration of reasoning and knowledge, artificial intelligence will make great strides.
6. Conclusion <br> As a measure of progress, it is worth looking back at a somewhat pessimistic piece I wrote for The New Yorker five years ago, conjecturing that "deep learning is only part of the larger challenge of building intelligent machines" because "such techniques lack ways of representing causal relationships (such as between diseases and their symptoms)" and face challenges in acquiring abstract ideas like "sibling" or "identical to." They have no explicit way of performing logical inference, and they are still a long way from integrating abstract knowledge, such as what objects are, what they are for, and how they are typically used.
As we have seen, many of these concerns remain, despite major advances in specific fields such as speech recognition, machine translation, and board games, and despite equally impressive advances in infrastructure, data volume, and computing power.
Interestingly, over the past year a steady stream of other scholars has begun to emphasize similar limits from different angles, among them Brenden Lake and Marco Baroni (2017), François Chollet (2017), Robin Jia and Percy Liang (2017), Dileep George and his colleagues at Vicarious (Kansky et al., 2017), and Pieter Abbeel and his colleagues at Berkeley (Stoica et al., 2017).
Perhaps the most notable of these is Geoffrey Hinton, who has been courageous enough to reconsider his own positions. In an Axios interview last August, he said he was "deeply suspicious" of back-propagation, because of concerns about its dependence on labeled datasets.
Instead, he suggested, we should "develop a completely new method." Like Hinton, I am deeply excited to see where the field goes next.