In computing, an address space defines a range of discrete addresses, each of which may correspond to a physical or virtual memory register, a network host, peripheral device, disk sector or other logical or physical entity.
A memory address identifies a physical location in computer memory, much as a street address identifies a location in a town. The address points to the location where data is stored, just as your address points to where you live. In this analogy, the address space is an area containing locations, such as a neighborhood, town, city, or country. Two addresses may be numerically the same but refer to different locations if they belong to different address spaces. Similarly, your address may be "32, Main Street" while another person lives at "32, Main Street" in a different town. For more information, see Wikipedia.
An anchor is an area within a resource that can be the source or destination of zero, one, or more links. An anchor may refer to the whole resource, to particular parts of the resource, or to particular manifestations of the resource.
For example, an HTML <a name="http://www.w3.org/Protocols/#News">...</a> element.
An annotation is a form of meta-data attached to a particular section of document content. The section may be a single word, a sentence, or even a series of paragraphs. An annotation must have a type (or a name), which is used to create classes of similar annotations, usually linked together by their semantics. For more information, see Semantic Annotation.
The term "description logic" refers to a logic that focuses on descriptions as its principal means for expressing logical expressions. A description logic system emphasizes the use of classification and subsumption reasoning as its primary mode of inference.
Today description logic has become a cornerstone of the Semantic Web for its use in the design of ontologies. The OWL-DL and OWL-Lite sub-languages of the W3C-endorsed Web Ontology Language (OWL) are based on a description logic.
Document Repository is a KIM Platform component for storing, retrieving, and indexing annotated documents, with support for semantic, full-text, and co-occurrence queries. To achieve that, the KIM Platform integrates and adapts different storage engines, such as Oracle™ DB, Apache Lucene, and OWLIM.
An entity is something that has a distinct, separate existence, for example a particular person (Barack Obama) or a particular object (Air Force One). An entity does not need to have a material existence. In particular, abstractions and legal fictions are usually regarded as entities. In the Semantic Web, entities have unique and persistent URIs.
FOL is a system of deduction that extends propositional logic by allowing quantification over individuals of a given domain of discourse. For example, it can be stated in FOL "Every individual has the property P". For more information, see Wikipedia.
FTS refers to a technique for searching a computer-stored document or database. In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user. However, when the number of documents to search is potentially large, or the quantity of search queries to perform is substantial, the problem of full text search is often divided into two tasks: indexing and searching. The indexing stage scans the text of all the documents and builds a list of search terms, often called an index but more correctly named a concordance. In the search stage, when performing a specific query, only the index is referenced rather than the text of the original documents.
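The two stages described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the function names, the tokenization rule, and the sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing stage: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search stage: consult only the index, never the original texts."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
documents = {
    1: "The gazetteer lists names of cities.",
    2: "An ontology describes concepts and relationships.",
    3: "Cities and organizations are named entities.",
}
index = build_index(documents)
matches = search(index, "named cities")
```

Here a query matches a document only if every query term occurs in it; real engines usually add ranking, stemming, and stop-word handling on top of this basic structure.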
A gazetteer consists of a set of lists containing names of things such as cities, organizations, days of the week, etc. These lists are typically used to assist with the task of Named Entity Recognition (NER), although they may be used for any purpose. When the gazetteer is run on a document, annotations will be created for each matching string in the text.
Below is a small section from a list for units of currency:
- German mark
- German marks
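As a rough sketch of how such a list produces annotations, the Python fragment below matches the two entries above against a text. The list name `currency_unit`, the tuple layout of an annotation, and the sample sentence are assumptions for illustration; an actual gazetteer component will differ in its matching options and output format.

```python
import re

# Hypothetical gazetteer: list name -> entries. Only the two entries come
# from the list above; the list name is an assumption.
GAZETTEER = {
    "currency_unit": ["German mark", "German marks"],
}


def annotate(text, gazetteer):
    """Return (start, end, list_name, matched_text) for each match."""
    annotations = []
    for list_name, entries in gazetteer.items():
        # Try longer entries first so "German marks" wins over "German mark".
        ordered = sorted(entries, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b"
        for match in re.finditer(pattern, text):
            annotations.append(
                (match.start(), match.end(), list_name, match.group())
            )
    return sorted(annotations)


text = "He paid 20 German marks, then found one German mark more."
annotations = annotate(text, GAZETTEER)
```

Each annotation records the character offsets of the match, so `text[start:end]` recovers the matched string.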
Hypermedia is the use of text, data, graphics, audio, and video as elements of an extended hypertext system in which all elements are linked so that the user can move between them at will.
Search engine indexing collects, parses, and stores data to facilitate fast and accurate searching. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics, and computer science. Popular search engines focus on the full-text indexing of online, natural language documents. Media types such as video and audio and graphics are also searchable.
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power.
Information Extraction is a process that takes unseen texts as input and produces fixed-format, unambiguous data as output. This data may be used directly for display to users, or may be stored in a database or spreadsheet for later analysis, or may be used for indexing purposes in Information Retrieval (IR) applications. For more information, see GATE User Guide.
Information Retrieval simply finds texts and presents them to the user, while the typical Information Extraction application analyzes texts and presents only the specific information from them that the user is interested in.
KIMLO contains some lexical resource-related concepts, mostly used to represent the lexis required by the KIM information extraction sub-system. Both KIMLO and KIMSO import the System module of PROTON.
KIMSO contains a few meta- or system-level primitives used by KIM. Both KIMSO and KIMLO import the System module of PROTON.
Knowledge and Information Management (in the Ontotext sense) is the process of capturing, semantically annotating, indexing, and storing unstructured, semi-structured, and structured data from different sources. By collecting these artifacts in a central or distributed electronic environment (in a database called a knowledge base), it provides different search paradigms on top of this semantic index.
Knowledge Base is a kind of database that stores the knowledge of a particular domain. It consists of a set of data (entities, entity properties, descriptions, and aliases), a conceptual model (ontology), and rules for reasoning over this data. The knowledge base uses the ontology to specify its structure (entity types and relationships) and classification scheme. In other words, the ontology, together with the set of instances of its classes, constitutes the knowledge base.
Knowledge Domain is the content of a particular field of knowledge, such as life science, finance, or tourism. Knowledge that is applicable in every domain is called domain-independent knowledge, for example logic and mathematics.
Knowledge management comprises a range of strategies and practices used in an organization to identify, create, represent, distribute, and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizational processes or practice.
Knowledge Management System refers to a (generally IT-based) system for managing knowledge in organizations, supporting the creation, capture, storage, and dissemination of information.
Language Resource refers to data-only resources such as lexicons, corpora, thesauri or ontologies. Some LRs come with software (e.g. Wordnet has both a user query interface and C and Prolog APIs), but where this is only a means of accessing the underlying data we will still define such resources as LRs.
- A way of depicting the logical structure or semantics of a document and providing instructions to computers on how to handle or display the contents of the file. HTML, XML and RDF are markup languages. Markup indicators are often called tags.
- A language that has codes for indicating layout and styling (such as boldface, italics, paragraphs, placement of graphics, etc.) within a text file. Widely used markup languages include SGML (Standard Generalized Markup Language) and HTML (Hypertext Markup Language).
The Message Understanding Conferences (MUC) were initiated and financed by DARPA to encourage the development of new and better methods of information extraction. The character of this competition – many concurrent research teams competing against one another – required the development of standards for evaluation. For more information, see Tipster project.
Meta-data is data about data. Meta-data is information (authorship, classification, date, URL, etc.) about an informational resource. It could be a document (such as a webpage), an image, a dataset, or another resource. Metadata is valuable in the storage and retrieval of information. Resources supported by good-quality, structured metadata are more easily discoverable.
For instance, most websites contain metadata to tell the computer how to lay the words out on the screen.
(also known as entity identification (EI) and entity extraction)
NER is the simplest and most reliable IE technology. NE systems identify all the names of people, places, organisations, dates, and amounts of money.
For example, a NER system producing MUC-style output might tag the sentence "Jim bought 300 shares of Acme Corp. in 2006".
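To give a feel for MUC-style output, the toy Python tagger below wraps a few matches in SGML-like elements. The three rules are assumptions written only for this sentence; a real NER system would rely on grammars or a statistical model rather than a fixed table.

```python
import re

# Toy rules, assumed for illustration only: (pattern, element, entity type).
RULES = [
    (r"\bJim\b", "ENAMEX", "PERSON"),
    (r"\bAcme Corp\.", "ENAMEX", "ORGANIZATION"),
    (r"\b(?:19|20)\d{2}\b", "TIMEX", "DATE"),
]


def tag(sentence):
    """Wrap each rule match in a MUC-style element with a TYPE attribute."""
    for pattern, element, entity_type in RULES:
        sentence = re.sub(
            pattern,
            lambda m, e=element, t=entity_type: (
                f'<{e} TYPE="{t}">{m.group()}</{e}>'
            ),
            sentence,
        )
    return sentence


tagged = tag("Jim bought 300 shares of Acme Corp. in 2006")
```

The result marks "Jim" as a person, "Acme Corp." as an organization, and "2006" as a date, leaving the rest of the sentence untouched.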
NER systems have been created that use linguistic grammar-based techniques as well as statistical models. Hand-crafted grammar-based systems typically obtain better results, but at the cost of months of work by experienced linguists. Statistical NER systems typically require a large amount of manually annotated training data.
A namespace is the part of a URI that defines a set of resources with a common source, location, or purpose. Together with the local name of the URI, the namespace guarantees the uniqueness of the uniform resource identifier.
For example, in the URI http://proton.semanticweb.org/2006/05/protons#Entity, http://proton.semanticweb.org/2006/05/protons# is the namespace and Entity is the local name. The namespace shows that Entity is one of the resources in the PROTON System ontology, version 2006/05.
An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain. For more information, see Wikipedia.
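The split can be illustrated with a short Python helper. Cutting at the last `#` or `/` is a common convention rather than a rule stated here, and the function name `split_uri` is an assumption for the example.

```python
def split_uri(uri):
    """Split a URI into (namespace, local_name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut == -1:
        # No separator found: treat the whole string as the local name.
        return "", uri
    return uri[: cut + 1], uri[cut + 1:]


namespace, local_name = split_uri(
    "http://proton.semanticweb.org/2006/05/protons#Entity"
)
```

Concatenating the two parts always gives back the original URI.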
For more details on how the KIM Platform helps you work with ontologies, see the User's guide (over Latest news).
OEM is an ambiguous and abstruse phrase used in relation to the manufacturing and marketing of products. Usage of the phrase is not consistent, but it typically relates to a situation in which one company uses a component made by a second company in its own product, or sells the product of the second company under its own brand. For more information, see Wikipedia.
Platform as a Service (PaaS) delivers a computing platform and/or solution stack as a service, often consuming cloud infrastructure and sustaining cloud applications. It facilitates deployment of applications without the cost and complexity of buying and managing the underlying hardware and software layers.
A processing resource is a plug-in to GATE whose character is principally programmatic or algorithmic, such as lemmatizers, generators, translators, parsers, or speech recognizers. For example, a part-of-speech tagger is best characterized by reference to the process it performs on text. PRs typically include language resources (LRs), e.g. a tagger often has a lexicon; a word sense disambiguator uses a dictionary or thesaurus. For more information, see the GATE documentation.
The full triples notation (in RDF) requires that URI references be written out completely, in angle brackets, which can result in very long lines on a page. For convenience, a shorthand way of writing triples is sometimes used. This shorthand substitutes an XML qualified name (or QName) without angle brackets as an abbreviation for a full URI reference. A QName contains a prefix that has been assigned to a namespace URI, followed by a colon, and then a local name. The full URIref is formed from the QName by appending the local name to the namespace URI assigned to the prefix.
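The expansion rule in the last sentence can be sketched as follows. The prefix table is an assumption: `rdf` is bound to the standard RDF namespace, while the `protons` prefix name is chosen only for this example.

```python
# Prefix-to-namespace bindings. The "protons" prefix name is hypothetical.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "protons": "http://proton.semanticweb.org/2006/05/protons#",
}


def expand_qname(qname, prefixes):
    """Form the full URIref by appending the local name to the namespace."""
    prefix, separator, local_name = qname.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"Unknown or missing prefix in QName: {qname!r}")
    return prefixes[prefix] + local_name


full_uri = expand_qname("protons:Entity", PREFIXES)
```

A QName with an unbound prefix cannot be expanded, so the sketch raises an error instead of guessing.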
A relational database management system (RDBMS) is a program that lets you create, update, and administer a relational database. An RDBMS takes Structured Query Language (SQL) statements entered by a user or contained in an application program and creates, updates, or provides access to the database.
Relational database example
Here's a simple example of a relational database:
Your company needs a better way of keeping track of customers, products, and orders because your paper-based system is no longer adequate. One way of setting this up using the relational model is to create three tables: Customers, Products, and Orders.
The Customers table holds no information about orders or products, which keeps it focused on its objective: customers. Likewise, the Products table describes only products. The Orders table uses the 'CustomerID' and the 'ProductID' to relate a product to a customer based on an order.
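The three tables can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: apart from CustomerID and ProductID, the column names, the OrderID key, and the sample rows are invented for the example.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE Products (
        ProductID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    -- Orders relates a product to a customer through the two ID columns.
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        ProductID INTEGER NOT NULL REFERENCES Products(ProductID)
    );
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Products VALUES (10, 'Keyboard'), (11, 'Monitor');
    INSERT INTO Orders VALUES (100, 1, 11), (101, 2, 10);
    """
)

# Join the three tables to list who ordered what.
rows = conn.execute(
    """
    SELECT c.Name, p.Name
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID
    JOIN Products AS p ON p.ProductID = o.ProductID
    ORDER BY o.OrderID
    """
).fetchall()
```

The join shows the point of the design: customer and product details are stored once each, and an order only refers to them by ID.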
Java RMI enables the programmer to create distributed Java technology-based to Java technology-based applications, in which the methods of remote Java objects can be invoked from other Java virtual machines, possibly on different hosts. RMI uses object serialization to marshal and unmarshal parameters and does not truncate types, supporting true object-oriented polymorphism. For more information, see Sun's site...
A resource is a common term for "anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., 'today's weather report for Los Angeles'), as well as a collection of other resources. Not all resources are network 'retrievable'; e.g., human beings, corporations, and bound books in a library can also be considered resources."
For example, a web page, a collection of web pages, a service that provides information from a database, an e-mail message, Java classes, etc.
RDF is a language for representing information about resources in the World Wide Web. It is particularly intended for representing meta-data about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource.
RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values.
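A statement in this model is a (subject, property, value) triple, which plain Python tuples can represent. The sketch below deliberately avoids any RDF library; the `example.org` URIs and the literal values are hypothetical, and the property URIs use the Dublin Core element namespace.

```python
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core element namespace
PAGE = "http://www.example.org/index.html"    # hypothetical Web page

# Each triple is one arc of the graph: subject node, property, value node.
triples = [
    (PAGE, DC + "title", "Example home page"),
    (PAGE, DC + "creator", "http://www.example.org/staffid/85740"),
    (PAGE, DC + "date", "2004-02-10"),
]


def describe(subject, triples):
    """Collect the properties and values stated about one resource."""
    properties = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            properties[p].append(o)
    return dict(properties)


description = describe(PAGE, triples)
```

The creator value is itself a URI, so it could be the subject of further triples; that is what turns a flat list of statements into a graph.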
RDFS is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources. The first version was published by W3C in April 1998, and the final W3C recommendation was released in February 2004. Main RDFS components are included in the more expressive language OWL.
In KIM, the term semantic annotation is used for both:
- an annotation indicating the presence of a (semantic) entity in a particular place in a text
- the process of generating meta-data. For more information, see semantic annotation.
Semantic repositories are engines similar to database management systems (DBMS): they allow for storage, querying, and management of structured data. The major differences from a DBMS can be summarized as follows:
- they use ontologies as semantic schemata. This allows them to automatically reason about the data.
- they work with flexible and generic physical data-models (e.g. graphs). This allows them to easily interpret and adopt "on the fly" new ontologies or meta-data schemata.
"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
"The goal of the Semantic Web initiative is to create a universal medium for the exchange of data where data can be shared and processed by automated tools as well as by people. The Semantic Web is designed to smoothly interconnect personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data." Tim Berners-Lee
SOAP is a protocol for exchanging XML-based messages over computer networks, normally using HTTP/HTTPS. SOAP forms the foundation layer of the web services protocol stack providing a basic messaging framework upon which abstract layers can be built.
As a layman's example of how SOAP procedures can be used, a correctly formatted call could be sent to a Web Service-enabled web site (for example, a house price database) with the data ranges needed for a search. The site could then return a formatted XML document with all the required results and associated data (prices, location, features, etc.). These could then be integrated directly into a third-party site.
Software as a Service (SaaS) is software that is deployed over the internet and/or is deployed to run behind a firewall in your local area network or personal computer. With SaaS, a provider licenses an application to customers as a service on demand, through a subscription or a "pay-as-you-go" model. SaaS is also called "software on demand."
Structured content refers to information or content that has been broken down and classified using meta-data. Structured content often refers to information that has been classified using XML, but can also relate to information classified using other standard or proprietary forms of meta-data.
URI is a compact string of characters used to identify or name a resource. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in schemes defining a specific syntax and associated protocols.
Here's a URI example: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier. A URI may be classified as a locator (URL) or a name (URN) or both.
A Uniform Resource Name (URN) is like a person's name, while a Uniform Resource Locator (URL) is like their street address. The URN defines something's identity, while the URL provides a method for finding something. Essentially, "what" vs. "where". For more information, see http://www.w3.org/TR/cooluris/
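The parts of a URI can be inspected with `urlsplit` from Python's standard library. The URL is the example given above; the ISBN-based URN is a hypothetical value added only to contrast the two forms.

```python
from urllib.parse import urlsplit

# A locator: the scheme names the protocol, the rest says where to look.
url_parts = urlsplit("http://en.wikipedia.org/wiki/Uniform_Resource_Identifier")

# A name: it identifies something without saying where to find it.
# The ISBN value is a hypothetical example.
urn_parts = urlsplit("urn:isbn:0451450523")
```

The URL has a network location (`url_parts.netloc`) in addition to its path, whereas the URN has only a scheme and a scheme-specific path.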
(also known as Human Computer Interface or Man-Machine Interface (MMI))
UI is the aggregate of means by which people - the users - interact with the system - a particular machine, device, computer program or other complex tool. The user interface provides means of input (allowing the users to manipulate a system) and output (allowing the system to indicate the effects of the users' manipulation).
OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language.