
Information Retrieval

Projects primarily related to internet information retrieval technologies.

This page contains a concise overview of projects funded by NLnet foundation that belong to the theme Information Retrieval (see the thematic index). More information is available on each of the projects listed on this page: just click on the title, or on the link at the bottom of each project's section, to read more. If a description on this page is a bit technical and terse, don't despair; the dedicated project page has a more user-friendly description that should be intelligible to 'normal' people as well. If you cannot find a specific project you are looking for, please check the alphabetic index, or search for it or for a specific keyword.

AGFL — parser generator system for natural languages

With the AGFL (Affix Grammars over a Finite Lattice) formalism for the syntactic description of natural languages, very large context-free grammars can be described in a compact way. AGFLs belong to the family of two-level grammars, along with attribute grammars: a first, context-free level is augmented with set-valued features for expressing agreement between parts of speech. The AGFL parser includes a lexicon system that is suitable for the large lexica needed in real-life NLP applications.
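
To give a flavour of the two-level idea, here is a minimal Python sketch (not AGFL's actual notation or implementation): each word carries a set-valued feature, and agreement between parts of speech is checked by intersecting those sets.

    # Minimal illustration of set-valued feature agreement, in the spirit of
    # a two-level grammar: a context-free rule NP -> Det N succeeds only if
    # the determiner and noun agree, i.e. their NUMBER feature sets intersect.

    DET = {"the": {"sg", "pl"}, "a": {"sg"}, "these": {"pl"}}
    NOUN = {"cat": {"sg"}, "cats": {"pl"}}

    def parse_np(det: str, noun: str):
        """Return the agreed NUMBER features for 'det noun', or None on failure."""
        agreement = DET[det] & NOUN[noun]   # set intersection enforces agreement
        return agreement or None

    print(parse_np("a", "cat"))     # {'sg'}  -> accepted
    print(parse_np("a", "cats"))    # None    -> rejected, no agreement
    print(parse_np("the", "cats"))  # {'pl'}  -> 'the' is underspecified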

>> Read more about AGFL

AHA! — transparent adaptive functionality for web servers

AHA! is a general-purpose adaptive hypermedia add-on for web servers. It enables a web server to serve pages with conditionally included page fragments, and with link anchors that are conditionally colored or hidden. Adaptation is based on a domain model, a user model, and an adaptation model, using concepts, pages, fragments and condition-action rules.
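
As a rough illustration of the condition-action idea (a Python sketch, not AHA!'s actual rule syntax or engine): the user model records what a user already knows, and per-fragment rules decide whether a fragment is included in the served page.

    # Sketch of adaptive fragment inclusion driven by condition-action rules.
    # The user model is a simple dict of concept -> knowledge value.

    user_model = {"xml-basics": 80, "xslt": 10}

    # Each rule: a condition on the user model deciding fragment visibility.
    fragment_rules = {
        "xslt-intro":    lambda um: um.get("xml-basics", 0) >= 50,  # prerequisite known?
        "xslt-advanced": lambda um: um.get("xslt", 0) >= 60,
    }

    def render_page(fragments):
        """Conditionally include fragments; hide those the user is not ready for."""
        for name in fragments:
            if fragment_rules[name](user_model):
                print(f"include fragment: {name}")
            else:
                print(f"hide fragment:    {name}")

    render_page(["xslt-intro", "xslt-advanced"])
    # include fragment: xslt-intro
    # hide fragment:    xslt-advanced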

>> Read more about AHA!

ALIAS — analysis of legal and technical implications of the use of software agents

Properties associated with agents, such as autonomy, pro-activity, reasoning, learning, collaboration, negotiation, and social and physical manifestation, are properties developed by man. Notions such as anonymity and privacy acquire new meanings in the "digital world", and new concepts such as pseudo-anonymity emerge. Until now, much research on the deployment of information technology has been done within separate disciplines: Computer Science and AI develop the technical expertise and applications, and Law then fits these applications into existing legal frameworks (taking US, European, and Dutch traditions into account), proposing new frameworks if and when needed. In this project, members of the two disciplines, AI and Law, collaboratively investigate the legal possibilities and limitations of agent technology, ultimately leading to recommendations for both disciplines.

>> Read more about ALIAS

CPAN6 — collecting collections of digital information

People are designed to collect things, whether it is food, postage stamps, or digital information. On our hard drives, we collect software, photos, development sources, documents, music, e-mail, and much more. The typical application treats this 'collecting' as a problem secondary to its main task, offering little help in administering the data produced with it. CPAN6 focuses purely on this aspect, and can therefore improve the way people work in general.

>> Read more about CPAN6

Global Directories — Distributed contact information discovery mechanism

A global directory is a way of retrieving contact information from others, using standard technology, so you can employ automatic tools that download and update contact information without manual intervention, and without any third parties snooping into your private or business social environment. Moreover, you can use the same technology to share any relevant information (such as keys for the protection of your email) with anyone.
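
As an illustration (a hedged sketch, since the description above does not name a specific protocol): directories of this kind are typically served over LDAP, in which case an automatic tool could look up a contact's details roughly like this, using the ldap3 Python library. The host, base DN, and attribute names here are invented.

    # Hypothetical lookup against an LDAP-based global directory.
    # Host, base DN and attributes are illustrative, not a real service.
    from ldap3 import Server, Connection, ALL

    server = Server("ldap.example.org", get_info=ALL)
    conn = Connection(server, auto_bind=True)      # anonymous bind

    # Search for a person's entry and fetch contact data plus a public key.
    conn.search(
        search_base="dc=example,dc=org",
        search_filter="(mail=alice@example.org)",
        attributes=["cn", "mail", "telephoneNumber", "pgpKey"],
    )
    for entry in conn.entries:
        print(entry.entry_to_json())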

>> Read more about Global Directories

Globule — user-centric Content Delivery Network

Globule is a research project that aims at developing a user-centric Content Delivery Network (CDN). Such a network is organised as an overlay in which the nodes are owned by end users rather than by ISPs.

In Globule, nodes transparently collaborate to provide strong guarantees with respect to the performance and availability of Web documents. To this end, modules were developed that extend the basic functionality of the Apache2 Web server: they automatically replicate Web documents and redirect each client to the replica server that can best service its request.
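
A minimal sketch of the redirection idea in Python (not Globule's actual Apache modules; the replica list and the latency-based policy are assumptions for illustration): the node answering a request picks the replica judged best for this client and sends an HTTP redirect.

    # Toy replica-selection policy: redirect each client to the replica with
    # the lowest measured latency for its network region. Purely illustrative.

    REPLICAS = {                 # replica URL -> measured latency (ms) per region
        "http://r1.example.net": {"eu": 20, "us": 120},
        "http://r2.example.net": {"eu": 110, "us": 25},
    }

    def best_replica(region: str) -> str:
        return min(REPLICAS, key=lambda url: REPLICAS[url].get(region, float("inf")))

    def redirect(path: str, client_region: str) -> tuple[int, str]:
        """Return an HTTP status and Location value for the client."""
        return 302, best_replica(client_region) + path

    print(redirect("/index.html", "us"))  # (302, 'http://r2.example.net/index.html')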

>> Read more about Globule

LCC — local content caching system for new search engine architecture

This six-month pilot project will investigate what is needed to create a system of local content caching, in which a content provider can notify a Local Content Cache of new, updated, or deleted content. That content is then collected by the Local Content Cache. The cache can in turn be used by a search engine, or by any other content "user" such as an intelligent agent, for its own purposes. A proof-of-concept implementation of the software needed for a Content Provider, a Local Content Cache, and Content Users such as search engines and intelligent agents is part of this pilot project.
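
To make the flow concrete, here is a hedged Python sketch of what a provider-to-cache notification might look like (the endpoint, JSON fields, and event names are invented for illustration; the project text does not specify a wire format):

    # Hypothetical change notification from a Content Provider to a Local
    # Content Cache: the provider announces the event, and the cache fetches
    # the content itself afterwards. Endpoint and field names are made up.
    import json
    from urllib import request

    def notify_cache(cache_url: str, content_url: str, event: str) -> int:
        """event is one of 'created', 'updated', 'deleted'."""
        body = json.dumps({"url": content_url, "event": event}).encode()
        req = request.Request(cache_url + "/notify", data=body,
                              headers={"Content-Type": "application/json"})
        with request.urlopen(req) as resp:
            return resp.status

    # notify_cache("http://cache.example.org",
    #              "http://provider.example.org/page1", "updated")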

>> Read more about LCC

Parselov — Syntactic analysis of documents and protocol messages based on formal descriptions

Parselov is a system for the syntactic analysis of documents and protocol messages based on formal descriptions, as well as for the analysis and manipulation of such formal descriptions. It makes it easy to build parsers, validators, converters, test-case generators, and other tools. It also explains the process of syntactic analysis somewhat differently than usual, which has helped its author tremendously in understanding parsing. At the heart of the system is a computer program that converts a formal grammar into a graph and additionally computes all possible traversals of this graph (the IETF standard "ABNF" is used as input format for testing, but the system makes it easy to support W3C's "EBNF" format and similar formats). The result is stored in a simple JSON-based data format.
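
As a rough sketch of the underlying idea (in Python; this is not Parselov's actual graph construction or JSON schema): a grammar is turned into a graph of labelled vertices and edges, which can then be traversed and serialised to JSON.

    # Toy grammar-to-graph conversion: each rule alternative becomes a chain
    # of vertices, and the whole graph is dumped as JSON. Illustrative only;
    # Parselov's real data format is richer than this.
    import json

    grammar = {                  # rule -> list of alternatives (symbol sequences)
        "greeting": [["hello", "name"], ["hi", "name"]],
        "name":     [["world"]],
    }

    def grammar_to_graph(rules):
        vertices, edges = set(), []
        for rule, alternatives in rules.items():
            for a, alt in enumerate(alternatives):
                prev = f"start:{rule}"
                vertices.add(prev)
                for i, symbol in enumerate(alt):
                    node = f"{rule}.{a}:{symbol}:{i}"
                    vertices.add(node)
                    edges.append([prev, node])
                    prev = node
                vertices.add(f"final:{rule}")
                edges.append([prev, f"final:{rule}"])
        return {"vertices": sorted(vertices), "edges": edges}

    print(json.dumps(grammar_to_graph(grammar), indent=2))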

>> Read more about Parselov

Searsia — protocol and implementation for large-scale federated web search

Searsia provides the means to create a personal, private, and configurable search engine that freely combines search results from a very large number of sources. Searsia enables existing sources to cooperate such that together they provide a search service that resembles today's large search engines. In addition to using external services at will, you can also use it to integrate private information from within your organisation, so your users or community can use a single search engine to serve their needs.
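
A minimal sketch of the federation idea in Python (the source APIs, URL templates, and response shape here are invented for illustration; consult the Searsia documentation for the real protocol):

    # Toy federated search: fan a query out to several sources, each described
    # by a URL template, and collect the results. Endpoints are fictional.
    import json
    from urllib import request
    from urllib.parse import quote

    SOURCES = {
        "wiki": "https://wiki.example.org/search?q={q}",
        "shop": "https://shop.example.org/api?query={q}",
    }

    def search_source(template: str, query: str) -> list:
        url = template.format(q=quote(query))
        with request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("hits", [])   # assumed response shape

    def federated_search(query: str) -> list:
        results = []
        for name, template in SOURCES.items():
            try:
                results.extend({"source": name, **hit}
                               for hit in search_source(template, query))
            except OSError:
                pass   # a slow or dead source must not break the whole search
        return results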

>> Read more about Searsia

Sesame — storage and querying middleware for the Semantic Web

Sesame is a storage framework for RDF data; RDF is the W3C's proposed standard modeling language for the Semantic Web. RDF is used to describe all sorts of things (metadata): besides the content of documents and web pages, it can describe real-life things like persons and organisations. This data can, for instance, be used as the basis for news readers, search applications, or indexing.

Sesame is a modular architecture for persistent storage and querying of RDF and RDF Schema. Sesame supports various query languages and databases, and also offers ontology-management functionality such as change tracking and security.
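
For example, a stored repository can be queried over HTTP. A hedged Python sketch follows; the host, repository name, and data are invented, and the endpoint path follows the Sesame HTTP protocol as commonly deployed, so consult the Sesame documentation for the authoritative interface.

    # Query a (hypothetical) Sesame repository over its HTTP interface with
    # SPARQL. Host and repository name are made up for illustration.
    import json
    from urllib import request
    from urllib.parse import urlencode

    ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/example"
    QUERY = "SELECT ?name WHERE { ?p <http://xmlns.com/foaf/0.1/name> ?name }"

    req = request.Request(ENDPOINT + "?" + urlencode({"query": QUERY}),
                          headers={"Accept": "application/sparql-results+json"})
    with request.urlopen(req) as resp:
        for binding in json.load(resp)["results"]["bindings"]:
            print(binding["name"]["value"])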

>> Read more about Sesame

SIRS — Scalable Internet Resource Service

The SIRS project focuses on the development of a service that allows resources to be widely distributed and replicated across the Internet in a scalable way.

>> Read more about SIRS

ARPA2 SteamWorks — Near-instantaneous controlled configuration settings over any network

ARPA2 SteamWorks is a set of tools that cooperate to transmit more-or-less centrally controlled configuration settings over any network and make these settings available to individual programs. Updates are passed around near-instantaneously when network connections are good, but the last known version of the information can still be used when the network temporarily degrades. The project is part of ARPA2, which is engineering an overall architecture for a future internet that is secure by design.
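
A hedged Python sketch of the degraded-network behaviour described above (not SteamWorks' actual components; the fetch endpoint and cache path are invented for illustration):

    # Fallback pattern: prefer a fresh, centrally served configuration, but
    # fall back to the last version cached on disk when the network is down.
    # The fetch endpoint and cache location are made up for this sketch.
    import json, os

    CACHE = "/var/cache/app/config.json"

    def fetch_remote_config() -> dict:
        from urllib import request
        with request.urlopen("http://config.example.org/app.json", timeout=3) as r:
            return json.load(r)

    def current_config() -> dict:
        try:
            config = fetch_remote_config()
            os.makedirs(os.path.dirname(CACHE), exist_ok=True)
            with open(CACHE, "w") as f:      # remember the last good version
                json.dump(config, f)
        except OSError:                      # network degraded: use the cache
            with open(CACHE) as f:
                config = json.load(f)
        return config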

>> Read more about ARPA2 SteamWorks