Specifications for News Aggregator project outlined at the Network X gathering¶

Purpose of project¶

To create a website that pulls in news feeds from multiple sources and aggregates them into a single source of reference for the user.

Essential specifications¶

The essential specifications for this project laid out at the Network X gathering held in Manchester on the weekend of 15/16th January 2011 are as follows:

Website will initially be English language
Website will Focused on anti-cuts content(for the launch), but with the plan to
expand in the future
The website will display user-filtered content: there is to be no editorial control, (however some degree of
moderation will need to be practised in an unobtrusive way)
The website is to be launched as soon as possible

Division of contributions to the project¶

The main areas of contribution required for development of this project are as follows

Tech and programming
Source finding for news resources
Web design
Promotion and publicity of the site, copy when it needs to be written
Feedback from people on matters of accessibility and aesthetics
Ongoing assistance to put tags on articles and content

Technical specifications for project¶

Design principles – essential¶

Ability to aggregate RSS/atom feeds from a multitude of sources into a single unified output
Ability for the user to filter that output into a personalised news feed
Must be simple to use
Must be initially deployable quickly with further enhancement and development continuing concurrently

Roadmap¶

Stage 1: Simple system produced that can aggregate feeds and display them to the user based on their preferences. System then handed to designers and usability group for testing.

Stage 2: Changes made based on feedback from these groups

Stage 3: Alpha version released and publicised

Stage 4: Gain feedback from wider user base of alpha release and incorporate recommended changes

Stage 5: Beta version release and wider publicity. Forum or other tool linked to project to ensure ongoing feedback possible

Stage 6: Full release schedule to be mapped. Enhancements to Indexing engine released as available

Technical components¶

The finished aggregator will be made up of several components described here. The components do not necessarily need to be part of a single CMS eg Drupal and could be developed seperately provided they are able to communicate with each other in a standards compliant way. The project as a whole could be developed as a single tech effort, or broken down into smaller components with distinct teams of geeks working on each.

Aggregation Engine Spec¶

The aggregation engine can be either a standalone component or part of a CMS and must be able to do the following

Retrieve feeds from other sources in RSS/Atom format
Retrieve tags and other metadata along with the feeds
Aggregate those feeds into a single output, preserving metadata and other XML and pass them to the Indexing Engine

Indexing Engine spec¶

The indexing engine will order and sort incoming data into the (most likely MySQL) database. It will handle linking additional metadata from external sources to articles that have been handed to it by the aggregation engine. It must be able to do the following:

Retrieve aggregated list of content from the aggregation engine
Directly alter the underlying database
Be able to link additional metadata to an article
Be able to construct database queries
Be able to index and sort data from a multitude of sources
Be able to order results of a search query by relevance
Return a list of results from the database that are relevantly linked to each other

FISE¶

FISE is an open source framework for semantic enhancement engines. Running FISE you can send English documents as plain text to the FISE server via a RESTful web interface and get back semantic annotations for this content. The annotations are computed using different “enhancement engines” which can be plugged into FISE. Depending on the active enhancement engines FISE will annotate e.g. people and places. The possibility exists for using FISE to enhance the metadata that is included by both the originating feed and user-generated tagging and provide relevant, linked output.

Search query handler¶

The search query handler will be responsible for constructing database queries from a number of sources including

Direct user searches (a search box)
User defined preferences
knowledge of user preferences gained from previous sessions

User interface¶

The user interface design will be led by user feedback. The core requirements for the user interface are that it is as simple as possible to use and understand.