Applying Logic Tensor Networks (Part 1)

In previous blog posts I have already talked about Logic Tensor Networks in general, their relation to Conceptual Spaces, and several additional membership functions that are in line with the Conceptual Spaces framework. As mentioned before, I want to apply them in a “proof of concept” scenario. Today I’m going to sketch this scenario in more detail.

The current scenario consists of a conceptual space for movies. It has been extracted by Derrac and Schockaert [1] from a large collection of movie reviews and published online, freely accessible to everyone. This is quite nice, because we can use an off-the-shelf conceptual space instead of having to define one ourselves.

The data set contains 15,000 movies that have been annotated with 25 genres (e.g., children, horror, news, and thriller). Each movie comes with both a point describing its location in the conceptual space and with a list of genres that apply to it. This means that a movie can belong to multiple genres (e.g., action and thriller) at the same time.

Derrac and Schockaert have extracted four different spaces from their movie review data, having 20, 50, 100, and 200 dimensions, respectively. In each of these spaces, they searched for interpretable directions and created an additional version of this space by projecting all movies onto these interpretable directions. So in total, we have eight different conceptual spaces that each contain 15,000 movies.
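To make the setup more concrete, here is a minimal sketch of how one of these spaces could be represented in memory. This is purely illustrative: the variable names, array shapes, and the short genre list are my own choices, not the actual file format published by Derrac and Schockaert.

```python
import numpy as np

# One of the eight spaces: 15,000 movies in a d-dimensional conceptual space
# (d = 20, 50, 100, or 200), plus a binary matrix over the 25 genres.
n_movies, n_dims, n_genres = 15000, 50, 25

points = np.random.randn(n_movies, n_dims)    # placeholder coordinates
labels = np.zeros((n_movies, n_genres), int)  # labels[i, g] = 1 iff genre g applies to movie i

genres = ["action", "children", "crime", "horror", "thriller"]  # 5 of the 25 genres

# A movie can belong to multiple genres at the same time:
labels[0, genres.index("action")] = 1
labels[0, genres.index("thriller")] = 1
```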

That’s our data set, but what’s our task?

On the one hand, we want to learn a good classifier for the different genres. This means that we want to be able to predict the relevant genres for a given movie based on its position in the conceptual space.
On the other hand, we also want to extract rules from the data set. Based on the movies seen during training, we want to judge how likely rules like “action AND crime ⇒ thriller” or “children ⇒ NOT horror” are to be true.

Logic Tensor Networks can in theory provide a solution to both tasks. My goal now is to find out whether they are also able to do so in practice. Moreover, I want to investigate the influence of different membership functions on the results.
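To give a flavor of what “membership” means here: each genre is grounded as a membership function, i.e., a mapping from a point in the conceptual space to a truth degree in [0, 1]. The sketch below uses a simple prototype-based membership (truth decays exponentially with the distance to a genre prototype, as is common in the Conceptual Spaces literature) and a Łukasiewicz implication to score a rule. It is only an illustration of the general idea with made-up prototypes, not the exact parametrization used in my experiments.

```python
import numpy as np

def membership(x, prototype, c=1.0):
    """Truth degree of 'x belongs to the concept': decays exponentially
    with the distance between x and the concept's prototype."""
    return np.exp(-c * np.linalg.norm(x - prototype))

# Toy example in a 2-dimensional space with made-up prototypes:
children_prototype = np.array([-1.0, 0.7])
horror_prototype = np.array([1.0, -0.5])
movie = np.array([-0.8, 0.5])

a = membership(movie, children_prototype)      # children(movie)
b = 1.0 - membership(movie, horror_prototype)  # NOT horror(movie), Lukasiewicz negation

# Rule "children => NOT horror" under the Lukasiewicz implication min(1, 1 - a + b);
# averaging this score over all training movies yields the rule's overall truth degree.
print(min(1.0, 1.0 - a + b))
```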

In order to perform such an analysis, we need to measure the performance of the LTN and compare it to some sort of baseline. For now, I will only introduce the two baselines used for the two tasks. I will give more detail on performance measures in one of my future blog posts.

For the classification task, the baseline consists of a simple “k nearest neighbor” (kNN) classifier. In order to classify a new movie, the kNN classifier looks at the movie’s position in the conceptual space and finds the k closest movies for which the genres are known. The classification is then done by simply counting how often the different genres occur among these k movies. This classifier is quite simple, but nevertheless a standard tool in machine learning. The LTN should achieve a performance that is at least comparable to a kNN classifier.
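A minimal version of this baseline can be written with scikit-learn, whose kNN classifier handles multi-label targets directly. The sketch below reuses the hypothetical `points` and `labels` arrays from above; the choice of k = 5 is arbitrary.

```python
from sklearn.neighbors import KNeighborsClassifier

# Each genre is predicted independently by majority vote among the
# k labeled movies closest to the query point.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(points, labels)

new_movie = points[:1]              # stand-in for an unseen movie's coordinates
predicted = knn.predict(new_movie)  # binary vector: 1 where a genre got enough votes
```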

For the rule extraction task, our baseline consists of simple label counting. We completely ignore the conceptual space and only look at the genres: In order to compute the likelihood that “children ⇒ NOT horror” is true, we simply compute the percentage of movies labeled as “children” that were not labeled as “horror”. Similarly, for “action AND crime ⇒ thriller” we look at all movies that are labeled as both “action” and “crime” and compute the percentage of these movies that were also labeled as “thriller”. Again, this is a simple yet reasonable baseline which the LTN should be able to beat.
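In code, both rule likelihoods reduce to the same conditional relative frequency over the binary label matrix. Here is a minimal sketch, again based on the hypothetical `labels` matrix from above (the column indices match the short genre list defined there):

```python
ACTION, CHILDREN, CRIME, HORROR, THRILLER = 0, 1, 2, 3, 4  # columns in `labels`

def rule_likelihood(labels, antecedent_cols, consequent_col, negate=False):
    """Fraction of movies carrying all antecedent genres that also carry
    (or, if negate=True, do not carry) the consequent genre."""
    mask = labels[:, antecedent_cols].all(axis=1)
    if not mask.any():
        return 0.0  # no movie satisfies the antecedent
    consequent = labels[mask, consequent_col].astype(bool)
    return (~consequent if negate else consequent).mean()

# "children => NOT horror": share of "children" movies not labeled "horror"
p1 = rule_likelihood(labels, [CHILDREN], HORROR, negate=True)

# "action AND crime => thriller": share of action+crime movies also labeled "thriller"
p2 = rule_likelihood(labels, [ACTION, CRIME], THRILLER)
```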

This concludes the general overview of my “proof of concept” scenario. In one of my next blog posts, I will elaborate on ways to evaluate and compare the performance of the different approaches with respect to the two tasks sketched above.

References

[1] Derrac, Joaquín, and Steven Schockaert. “Inducing semantic relations from conceptual spaces: A data-driven approach to plausible reasoning.” Artificial Intelligence 228 (2015): 66-94.

5 thoughts on “Applying Logic Tensor Networks (Part 1)”

  1. Over the months I’ve skipped over all your logic tensor network posts, but today I printed them out, aggregated them into a single packet, and look forward to reading them as a whole unit.

    My full-time job and the single class I’m taking (AI) have kept me from reading/studying conceptual spaces lately. But I did find an important nugget. Just three days ago Numenta published a paper about the relationship between grid cells and sparse distributed representations. Grid cells apparently work to orient the animal in physical space. But they are apparently much more important than just that. I haven’t had a chance to read the paper yet, but on a Numenta podcast, Jeff Hawkins, the paper’s first author and founder of Numenta, explained that their researchers hypothesize that grid cells also work to orient objects in a type of “object space,” which contributes to composition. Hawkins gave the example that grid cells help us understand that a coffee mug and its logo are a composite in the same space, even though they are different concepts. Hawkins said that he’s beginning to believe that grid cells are a primary tool of the cortex. As conceptual space theory is a theory of geometry/space, location, and distance, all this new neuroscience research supports some sort of physical instantiation of Gärdenfors’ framework in the brain.

    Here is the abstract and a link to the paper. I want to emphasize the very last sentence of the abstract: “The similarity of circuitry observed in all cortical regions is strong evidence that even high-level cognitive tasks are learned and represented in a location-based framework.”

    Abstract

    How the neocortex works is a mystery. In this paper we propose a novel framework for understanding its function. Grid cells are neurons in the entorhinal cortex that represent the location of an animal in its environment. Recent evidence suggests that grid cell-like neurons may also be present in the neocortex. We propose that grid cells exist throughout the neocortex, in every region and in every cortical column. They define a location-based framework for how the neocortex functions. Whereas grid cells in the entorhinal cortex represent the location of one thing, the body relative to its environment, we propose that cortical grid cells simultaneously represent the location of many things. Cortical columns in somatosensory cortex track the location of tactile features relative to the object being touched and cortical columns in visual cortex track the location of visual features relative to the object being viewed. We propose that mechanisms in the entorhinal cortex and hippocampus that evolved for learning the structure of environments are now used by the neocortex to learn the structure of objects. Having a representation of location in each cortical column suggests mechanisms for how the neocortex represents object compositionality and object behaviors. It leads to the hypothesis that every part of the neocortex learns complete models of objects and that there are many models of each object distributed throughout the neocortex. The similarity of circuitry observed in all cortical regions is strong evidence that even high-level cognitive tasks are learned and represented in a location-based framework.

    I am also trying to read about category theory in my lack of free time.

    I have actively tasked my subconscious to think about the relationship between conceptual spaces and sparse distributed representations. Lots of goodies here, a real path to AGI in my humble opinion, just don’t have the time to properly think about this relationship.

    1. Thanks for the link, that indeed sounds relevant. I read Hawkins’ book “On Intelligence” a couple of years ago and found it quite interesting. I’ll definitely take a look at the paper you mentioned, thanks for the pointer!
      With respect to LTNs: As already hinted at in the blog post, there will be additional posts with updates regarding my research. I hope that reading them back to back is not too painful – after all, I didn’t write them in one session, so I don’t know how well they fit together. Nevertheless: Have fun reading 🙂

  2. 24 hours later, I did get a chance to read the full paper, and my 30,000-foot summary in a few sentences is this: Hierarchical functionality of cortical columns should be severely discounted. Instead, as Hawkins writes, “…the function of the neocortex is best understood in a framework of locations and location spaces.” There is even a section of the paper where Hawkins discusses how abstract concepts (not just objects) can be represented by distances and locations represented by grid cells.

    My other main takeaway is that this new neuroscience research is a huge slap in the face for Deep Learning as a viable model of the brain. DL may be a great statistical method, but not the path to AGI.

    1. My other main takeaway is that this new neuroscience research is a huge slap in the face for Deep Learning as a viable model of the brain.

      I never believed much in deep learning “as a viable model of the brain” anyways – as you say, it’s a good function approximator and a good tool, but not the ultimate answer.
