2018-01-31
The AAAI (Association for the Advancement of Artificial Intelligence) conference is one of the world’s premier artificial intelligence conferences, with annual meetings since 1979. This year, Baidu Research will present four papers on how artificial intelligence is affecting a variety of fields, from language translation to job-skill popularity. AAAI 2018 will take place in New Orleans on February 2-7.
In addition, Baidu Research will present one paper on an end-to-end neural approach to open-domain information extraction at WSDM (Web Search and Data Mining), one of the premier conferences on web-inspired research involving search and data mining. WSDM, which will take place in Los Angeles on February 5-9, is a highly selective, single-track meeting that includes invited talks as well as refereed full papers.
Ahead of our presentations, here is a quick look at Baidu’s papers, which explore improving language translation through multi-channel encoding, a surface normal representation for an unsupervised depth estimation framework, a data-driven approach to measuring job-skill popularity, keyless attention for video classification, and open-domain information extraction.
In this paper, we propose a multi-channel encoder that enhances both the standard encoder and the attention component of attention-based neural machine translation. To capture a sentence at several levels of composition, the encoder combines the raw word embeddings, the hidden states of a recurrent encoder, and an external memory based on a Neural Turing Machine. A gated annotation mechanism then automatically learns the weights of these different encoding components.
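As a rough illustration of the gated annotation idea, the sketch below combines three encoding channels for one source word with learned softmax gates. It is a minimal NumPy toy with made-up dimensions and random weights, not the paper’s implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_annotation(embedding, rnn_state, memory_read, W_g):
    """Combine three encoding channels for one source word.

    Each channel is a d-dimensional vector; W_g maps their
    concatenation to three gate logits. The softmax-normalized
    gates weight the channels, so the model can learn per word
    how much each encoding level contributes.
    """
    channels = np.stack([embedding, rnn_state, memory_read])  # (3, d)
    gates = softmax(W_g @ np.concatenate(channels))           # (3,)
    return gates @ channels                                   # (d,)

# Toy example: d = 4, random channel vectors and gate weights.
rng = np.random.default_rng(0)
d = 4
emb, h, m = rng.normal(size=(3, d))
W_g = rng.normal(size=(3, 3 * d))
annotation = gated_annotation(emb, h, m, W_g)
print(annotation.shape)  # (4,)
```

Because the gates are a softmax, the resulting annotation is a convex combination of the three channels, which is what lets the weights be interpreted as per-word channel importance.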
The results show that the multi-channel encoder improves translation quality on Chinese-English, while on English-French the shallow recurrent neural network-based model achieves performance comparable to previously reported results.
Developing machines that observe the world from a 3D perspective, much as humans do, has recently become a heavily researched topic. In this paper, we introduce a surface normal representation for an unsupervised depth estimation framework, incorporating an edge-aware depth-normal consistency constraint inside the network. This enables the network to learn depth without ground-truth supervision.
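A minimal sketch of an edge-aware depth-normal consistency term, assuming an orthographic camera and simple finite-difference normals (a simplification; the paper’s actual formulation works with the full camera geometry):

```python
import numpy as np

def normals_from_depth(depth):
    """Derive per-pixel surface normals from a depth map via
    finite differences (orthographic approximation)."""
    dzdy, dzdx = np.gradient(depth)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def edge_aware_consistency(depth, pred_normals, image):
    """Penalize disagreement between predicted normals and the
    normals implied by the predicted depth, down-weighting image
    edges where depth is expected to be discontinuous."""
    n_depth = normals_from_depth(depth)
    gy, gx = np.gradient(image)
    weight = np.exp(-(np.abs(gx) + np.abs(gy)))    # small near edges
    dot = (n_depth * pred_normals).sum(axis=-1)    # cosine similarity
    return np.mean(weight * (1.0 - dot))

# Toy example: a planar ramp. Its depth-derived normals agree with
# themselves exactly, so the consistency loss is zero.
h, w = 8, 8
ys, xs = np.mgrid[0:h, 0:w].astype(float)
depth = 0.1 * xs                 # depth increases to the right
true_n = normals_from_depth(depth)
image = np.ones((h, w))          # uniform image, no edges
print(round(edge_aware_consistency(depth, true_n, image), 6))  # 0.0
```

The edge-aware weight is the key ingredient: at image edges the penalty is relaxed, so the network is not forced to smooth depth across genuine object boundaries.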
The framework achieves state-of-the-art performance using only monocular videos. The novel depth-normal regularization enforces geometric consistency between different projections of a 3D scene, improving evaluation performance.
As the job market becomes increasingly competitive, several questions remain about the types of skills needed in various industries. Data is quickly becoming a new way to understand not only the importance of certain skills, but also the importance employers place on those skills for particular jobs.
In this paper, we propose a data-driven approach to modeling the popularity of job skills based on the analysis of large-scale recruitment data. By building a job-skill network, “Skill-Net,” and a Skill Popularity based Topic Model (SPTM), we were able to validate the effectiveness of SPTM for measuring the popularity of job skills, and also to reveal some interesting findings, including which popular job skills lead to high-paying employment.
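To give a flavor of what a skill network might look like, here is a hypothetical, much-simplified sketch of building a “Skill-Net”-style co-occurrence graph from job postings. The node and edge definitions here are assumptions for illustration, not the paper’s exact construction:

```python
from collections import Counter
from itertools import combinations

def build_skill_net(postings):
    """Build a skill co-occurrence network from job postings.

    Each posting is a list of skill keywords; an edge's weight is
    the number of postings mentioning both endpoint skills.
    """
    edges = Counter()
    for skills in postings:
        # Sort so each skill pair maps to one canonical edge key.
        for a, b in combinations(sorted(set(skills)), 2):
            edges[(a, b)] += 1
    return edges

# Toy corpus of three postings.
postings = [
    ["python", "sql", "spark"],
    ["python", "sql"],
    ["java", "sql"],
]
net = build_skill_net(postings)
print(net[("python", "sql")])  # 2
```

On real recruitment data the same idea scales up: heavily weighted edges mark skills that employers tend to demand together, which is the kind of structure a topic model like SPTM can then exploit.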
Image classification and understanding was one of the first major breakthroughs in AI, but understanding video poses additional challenges because video classification is inherently multimodal: image, motion, and sound cues may all be necessary to make an educated judgment. Existing end-to-end classification approaches, however, are restricted to small-scale datasets.
For videos, recurrent neural networks (RNNs) can be invoked to better capture longer-range temporal patterns and relationships, and multimodal features can then more easily be extracted from a new video using the trained models. Our research proposes Multimodal Keyless Attention Fusion, which allows fast and effective learning of RNN models and excels at discerning interactions between modalities.
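A minimal sketch of what keyless (query-free) attention can look like, assuming one learned scoring vector per modality and simple concatenation as the fusion step (a simplification for illustration, not the paper’s exact architecture):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def keyless_attention(features, w):
    """Keyless temporal attention: each timestep's feature vector is
    scored by a single learned vector w (no query/key pair), and the
    softmax-weighted sum summarizes the whole sequence."""
    scores = features @ w     # (T,)
    alpha = softmax(scores)   # attention weights over time
    return alpha @ features   # (d,)

def fuse_modalities(modality_feats, modality_ws):
    """Summarize each modality independently with keyless attention,
    then concatenate the summaries for a downstream classifier."""
    return np.concatenate([
        keyless_attention(f, w)
        for f, w in zip(modality_feats, modality_ws)
    ])

# Toy example: 10 timesteps of image features and audio features.
rng = np.random.default_rng(1)
image_feats = rng.normal(size=(10, 8))   # 8-dim visual features
audio_feats = rng.normal(size=(10, 4))   # 4-dim audio features
ws = [rng.normal(size=8), rng.normal(size=4)]
fused = fuse_modalities([image_feats, audio_feats], ws)
print(fused.shape)  # (12,)
```

Dropping the query/key machinery is what makes this "keyless": each modality needs only one scoring vector, which keeps the fusion model small and fast to train.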
In this paper, we consider the problem of open information extraction (OIE): extracting entity- and relation-level intermediate structures from open-domain sentences. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept) and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set containing 48,248 sentences and the corresponding facts in the SAOKE format, labeled by crowdsourcing. Using this data set, we train an end-to-end neural model in the sequence-to-sequence paradigm, called Logician, to transform sentences into facts. Unlike existing algorithms, which generally extract each fact in isolation without considering other possible facts, Logician performs a global optimization over all the facts involved in a sentence, in which facts not only compete with each other for the attention of words but also cooperate to share words.
An experimental study on various types of open-domain relation extraction tasks reveals the consistent superiority of Logician over other state-of-the-art algorithms. The experiments verify the reasonableness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm on supervised data sets to the challenging task of open information extraction.