On the topic of “NLP-like search interface”: we could do it with Controlled Natural Language (CNL).
AFAIK, two leading approaches to CNL are:
- Attempto Controlled English (ACE) at ETH Zurich
- Grammatical Framework, described below
Grammatical Framework (GF) is a programming language for multilingual grammar applications. It is
- a special-purpose language for grammars, like YACC, Bison, Happy, BNFC, but not restricted to programming languages
- a functional language, like Haskell, Lisp, OCaml, Scheme, SML, but specialized to grammar writing
- a natural language processing framework, like LKB, XLE, Regulus, but based on functional programming and type theory
- a categorial grammar formalism, like ACG, CCG, but different and equipped with different tools
- a logical framework, like Agda, Coq, Isabelle, but equipped with concrete syntax in addition to logic
It is used in the MOLTO project (www.molto-project.eu), in which Ontotext participates, in particular for NLP searches and verbalization of painting data.
Grammatical Framework: Programming with Multilingual Grammars
Grammars of natural languages are complex systems, and their computer implementation requires both programming skills and linguistic knowledge, especially when dealing with other languages than English. This book makes such tasks accessible for a wide range of programmers. It introduces GF (Grammatical Framework), which is a programming language designed for writing grammars, which may moreover address several languages in parallel. The book shows how to write grammars in GF and use them in applications such as tourist phrasebooks, spoken dialogue systems, and natural language interfaces. The examples and exercises address several languages, and the readers are guided to look at their own languages from the computational perspective.
- Slides for teaching the book chapter by chapter.
- Code examples. You can also download the complete example set as a compressed tar file, gf-book-examples.tgz
- GF Web IDE: build grammars in the cloud, without installing GF.
GF can be used for building
- translation systems
- multilingual web gadgets
- natural-language interfaces
- dialogue systems
- natural language resources
Quite interesting demos:
Mechanics of the Grammatical Framework
Grammatical Framework (GF) is a well-known theoretical framework and a mature programming language for the description of natural languages. The GF community is growing rapidly and the range of applications is expanding. Within the framework, there are computational resources for 26 languages, created by different people in different organizations. The coverage of the different resources varies, but there are complete morphologies and grammars for at least 20 languages. This advancement would not be possible without the continuous development of the GF compiler and interpreter.
- Krasimir Angelov is a former Ontotext employee
- He was a PhD student at the Department of Computer Science of Chalmers, working on the compiler and the interpreter of GF, authoring tools, and translation.
- He continues work as a postdoc funded by MOLTO, see Molto publications
- He is one of 3 current developers and maintainers of the Grammatical Framework:
Let's try one of the GF examples. TODO: add a walkthrough here.
The minibar demo showcases grammatical structures: http://grammaticalframework.org:41296/minibar/minibar.html
- play with a couple of the grammars
- notice how it lets you type, yet autocompletes words: very nice
In RS, when the input reaches a terminal position, thesaurus auto-complete will start.
- select the same or a different language and observe the translation (especially with "MiniGrammar.pgf")
- click on the tree icons to see the syntax trees
We MUST define the query syntax grammar before we dive into details.
- there are tons of parser generators available; ANTLR seems to be the most popular in the Java world
- we also want assisted grammar-controlled editing like the above example
- GF introduces a complication: it targets multi-lingual translation, so
there's an Abstract grammar that's mapped to various NL grammars (EN, FR, ...)
- controlled entry in any source NL is mapped to Abstract, then to any target NL
- for RS Search we also need 2 languages: EN and SPARQL
- so it may be appropriate to use GF
- another option (by Milen Chechev) is
EN grammar -> FSA to control autocompletion;
EN parse tree -> SPARQL generation (like in a compiler)
- the interesting twist is that our grammar is driven in part by FRs:
for each FR we know its domain, range, name; and that should generate part of the grammar
when you type, the terms can come from any thesaurus; once a term is selected, its source is known
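A minimal sketch of the FR-driven idea above, in plain Python rather than GF. The `FR` record, the English phrases, the `ex:` prefix and the example.org URIs are all made up for illustration; none of them come from RS. It shows one abstract clause being rendered both as controlled English and as SPARQL, with the FR's range checked against the type of the selected term.

```python
from dataclasses import dataclass

# Hypothetical prefix, for illustration only.
PREFIX = "PREFIX ex: <http://example.org/fr/>"


@dataclass(frozen=True)
class FR:
    """Hypothetical FR definition: a name plus its domain and range."""
    name: str
    domain_type: str   # type of the thing being searched
    range_type: str    # type of the value the user selects
    phrase: str        # wording used in the English rendering
    predicate: str     # made-up predicate used in the SPARQL rendering


@dataclass(frozen=True)
class Clause:
    """Abstract syntax node: one FR applied to one selected thesaurus term."""
    fr: FR
    value_label: str
    value_type: str
    value_uri: str

    def __post_init__(self):
        # The FR's range restricts which terms may fill the slot.
        if self.value_type != self.fr.range_type:
            raise ValueError(
                f"{self.fr.name} expects {self.fr.range_type}, got {self.value_type}"
            )


def to_english(clauses):
    """Render the abstract clauses as a controlled English phrase."""
    parts = [f'{c.fr.phrase} "{c.value_label}"' for c in clauses]
    return "thing " + " and ".join(parts)


def to_sparql(clauses):
    """Render the same abstract clauses as a SPARQL SELECT query."""
    patterns = " ".join(
        f"?thing {c.fr.predicate} <{c.value_uri}> ." for c in clauses
    )
    return f"{PREFIX} SELECT ?thing WHERE {{ {patterns} }}"


FROM_PERSON = FR("fromPerson", "Thing", "Person", "from person", "ex:fromPerson")
query = [
    Clause(FROM_PERSON, "Rembrandt", "Person", "http://example.org/person/rembrandt")
]

print(to_english(query))
print(to_sparql(query))
```

In GF the two renderings would be two concrete syntaxes over one abstract syntax; here they are just two functions over the same list of clauses.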
That's what we discussed yesterday, but how does it mesh with
the left-to-right ordering of a sentence, and the fact that we have definitions of "thing to <value>" but not "<value> to thing"?
I.e. FRs define "thing fromPerson <Rembrandt>" but not "<Rembrandt> isPersonOf thing".
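To make the direction problem concrete, here is a small sketch of left-to-right completion over FR-generated rules. The `FRS` and `THESAURUS` tables and the `completions` function are hypothetical; the only rule shape assumed is "thing <FR> <value>", as in the example above.

```python
# Hypothetical FR table: FR name -> type of the value it expects (its range).
FRS = {
    "fromPerson": "Person",
    "fromPlace": "Place",
}

# Hypothetical thesaurus: type -> terms offered for auto-complete.
THESAURUS = {
    "Person": ["Rembrandt", "Rubens"],
    "Place": ["Amsterdam", "Paris"],
}


def completions(tokens):
    """Return the tokens allowed next, given the clause typed so far."""
    if not tokens:
        return ["thing"]            # every rule starts with the thing
    if tokens[0] != "thing":
        return []                   # no rule starts with a value
    if len(tokens) == 1:
        return sorted(FRS)          # any FR may follow "thing"
    if len(tokens) == 2:
        range_type = FRS.get(tokens[1])
        return list(THESAURUS[range_type]) if range_type else []
    return []                       # clause is complete


print(completions(["thing"]))
print(completions(["thing", "fromPerson"]))
print(completions(["Rembrandt"]))
```

A sentence that starts with "Rembrandt" gets no completions because no rule begins with a value. Supporting that order would need an inverse rule per FR, which the FR definitions do not provide.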
How does the user edit the search?
In the above demo you have a <x] button meaning "backspace", so to change something, you have to delete everything to the right of it.
That is in the nature of grammars: they parse left-to-right (possibly with limited lookahead).
- If you had “Cotton” (material) OR “Silk” (material) and you changed “Cotton” (material) to “Paris” (location), the interface would then look like this:
- “Paris” OR _______ (blank). The other term should be removed.
- If a new location was not put in the blank space, “Paris” would just become another AND.
I can't imagine how we can implement this with formal grammars.
It is much simpler to ask the user to backspace all the way (same as in the demo!) and type "Paris": the result is the same.
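The backspace model can be sketched as a plain token list, where changing a token truncates everything to its right. The `replace_at` helper is hypothetical and only illustrates the editing behaviour, not any RS or GF code.

```python
def replace_at(tokens, index, new_token):
    """Backspace to `index`, then type `new_token`.

    Everything to the right of `index` is dropped, as in the demo.
    """
    if not 0 <= index < len(tokens):
        raise IndexError("index out of range")
    return tokens[:index] + [new_token]


query = ["Cotton", "OR", "Silk"]
edited = replace_at(query, 0, "Paris")
print(edited)
```

After the edit only "Paris" remains, and the user continues typing from there with grammar-controlled completion, so no special case for mixed term types is needed.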