Introduction
This document describes various projects for extending the functionality of the current TREADS application.
TREADS is a web-based system that reads an input text and generates several forms of output. The system, written in Perl, runs on a lightweight server. See http://treads.emich.edu/ for a working demo.
We are looking for one or more web-based applications that read specified files and present interactive, well-formatted output. The application(s) should be lightweight, well documented, easy to maintain, and intuitive. Preferably, the application(s) will operate independently of the existing TREADS application before integration. The application(s) will need a flexible, easily updatable interface for reading data files from the TREADS system.
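These features are requested:
1. User Interface
2. Text-to-Speech
3. Dictionary and Thesaurus Lookup
4. User Accounts
5. Summarization Database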
Overview
The current process is as follows:
i. NOTE: TREADS will use a database for all storage in the future, but currently all data is stored on the file system.
Instead of TREADS generating a set of web pages (steps 1 and 4), a richer user experience is desired. This requires a robust user interface that can facilitate the various activities of the system (e.g., reading text and highlighting).
The user interface will handle logging in a user, basic preparation of the text for processing, and starting the TREADS application with the proper options for the specific user. If no user is logged in, TREADS should function in demo (guest) mode with a set of defaults. Once summarization is complete, the user will be presented with the available output types and options as specified by the user’s profile.
TREADS will handle the summarization and XML-like tagging of the documents, which will be returned to the interface for display to the user.
Interfacing to TREADS
The software interface is the connection between the display software and TREADS. TREADS output may be returned as a reference to an array of strings or, as is currently done, written to the file system as a set of text documents for further processing.
Each document on the file system will be named using a run number (a unique identifier for the current summarization) and any other necessary data, such as the type of output.
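The naming rule above could be sketched as follows. The exact scheme is not defined in this document, so the `run_<number>_<type>.txt` pattern and the names `build_filename` and `parse_filename` are assumptions made only for illustration.

```python
import re
from typing import NamedTuple

# Hypothetical naming scheme: run_<run number>_<output type>.txt
# The specification only states that a name contains a run number
# and other data such as the type of output.
FILENAME_PATTERN = re.compile(r"^run_(?P<run>\d+)_(?P<kind>[a-z]+)\.txt$")


class OutputFile(NamedTuple):
    run_number: int
    output_type: str


def parse_filename(name: str) -> OutputFile:
    """Split a file name into its run number and output type."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised output file name: {name!r}")
    return OutputFile(int(match.group("run")), match.group("kind"))


def build_filename(run_number: int, output_type: str) -> str:
    """Build a file name and check that it can be parsed back."""
    name = f"run_{run_number}_{output_type}.txt"
    parse_filename(name)  # raises ValueError for an unsupported output type
    return name
```

Keeping the build and parse steps in one place means the display software and TREADS only need to agree on a single pattern, which can be replaced once the real scheme is decided.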
The order and description of the files returned as an array reference will be decided upon with input from the development group.
Multiple types of output files are generated. They will all be in a tagged text format for the system.
· Untagged Text Files
    o Any file that is not tagged will be displayed as a standard text file.
· Tagged Files
    o A tagging system is not yet defined.
    o XML-like tagging is preferred.
    o Tags will include (but are not limited to):
        § Font Size
        § Font Color
        § Sentence Score
        § Sentence Number
        § Paragraph
        § Header
        § Bold
        § Italics
    o The summarizer need not include every tag; system defaults will be used to describe data with missing tags.
    o Tag names, requirements for closure, nesting, etc., are still mostly undefined; input is welcome.
    o Example:
        § <Paragraph><SentenceNumber='4'><Font Color = 'Red'><Score='.432'>The brown fox <Color='blue'>jumped</Color></Sentence><SentenceNumber='8'>Example text here. </Sentence>
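Since the tag format is still open, the following is only a sketch of how the display software might read tagged output if a well-formed XML variant were adopted. The element and attribute names (`Sentence`, `number`, `score`, `color`, `Font`) and the default values are assumptions, not part of any agreed format. Untagged input falls back to plain text, as described above.

```python
import xml.etree.ElementTree as ET

# Assumed system defaults for attributes the summarizer leaves out.
DEFAULTS = {"score": "0.0", "color": "black"}

# A well-formed variant of the example above (hypothetical tag names).
SAMPLE = (
    "<Paragraph>"
    "<Sentence number='4' score='.432' color='Red'>"
    "The brown fox <Font color='blue'>jumped</Font>"
    "</Sentence>"
    "<Sentence number='8'>Example text here. </Sentence>"
    "</Paragraph>"
)


def read_sentences(text):
    """Return one dict per sentence; untagged input becomes a single plain entry."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not tagged: treat the whole file as standard text.
        return [{"number": None, "score": None, "color": None, "text": text}]
    sentences = []
    for node in root.iter("Sentence"):
        sentences.append({
            "number": int(node.get("number")),
            "score": float(node.get("score", DEFAULTS["score"])),
            "color": node.get("color", DEFAULTS["color"]),
            # itertext() also collects text nested inside inner tags.
            "text": "".join(node.itertext()).strip(),
        })
    return sentences
```

Requiring every tag to be closed and properly nested would let a standard XML parser do the work; a looser format would need a custom parser.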
Requested Features in Detail
1. User Interface
Keep in mind that students with learning disabilities are the intended target for this system.
· Any output view should be printable.
· Readability
    o Large Sans-Serif Fonts
    o Simple Color Palettes
        § No specification, but theme it similarly to EMU’s colors, with more colorful elements when appropriate
    o Graphical icons / buttons
    o Few options to parse at each level
· Types of Output (Specific Views for the User)
    o Summarized text (may be of the following types):
        § Slider filtered
            · The user can adjust the number of sentences shown by moving a slider back and forth
                o Sentences are tagged with a threshold value that decides whether they are displayed
        § Sized All
            · All sentences from the input are included; some are sized by various criteria (defined through the XML from TREADS)
        § Colored
            · Sentences from the input may or may not all be included; some are colored by various criteria (defined through the XML from TREADS)
        § User Default – Primary output, which may be any combination of the above.
    o Raw statistics (if enabled, that is, if the current user is “Privileged”)
    o Input Text
    o Additional files may be included
        § HTML input to be passed through to the output screen directly
        § Image files (possibly scanned texts, or images related to the text)
        § Multiple columns to a single column and vice versa.
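The slider-filtered view can be illustrated with a small sketch. It assumes each sentence carries a numeric score between 0.0 and 1.0 and that a sentence is shown when its score is at least the slider value; both the scale and the comparison direction are assumptions, since the threshold semantics are not yet specified.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    number: int
    text: str
    score: float  # assumed range: 0.0 (least important) to 1.0 (most important)


def visible_sentences(sentences, slider_value):
    """Keep sentences whose score meets the slider value, in document order."""
    return [s for s in sentences if s.score >= slider_value]


document = [
    Sentence(1, "The brown fox jumped.", 0.43),
    Sentence(2, "It was a sunny day.", 0.10),
    Sentence(3, "The fence was very high.", 0.75),
]

# Moving the slider up hides lower-scoring sentences.
shown = visible_sentences(document, 0.4)
```

Because filtering only compares stored scores, the slider can update the view without another call to TREADS.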
2. Text-to-Speech
The software should be able to generate computer-simulated speech for the text from TREADS. Open source projects such as http://mary.dfki.de/ and http://www.xenocafe.com/tutorials/php/festival_text_to_speech/index.php exist. As long as the license of the software is acceptable, a third-party package may be integrated into the project, as developing a TTS engine is not the goal.
· When text-to-speech is enabled for the user
    o Clicking on a word will lead to a method for saying the word.
        § Light box?
        § Popup window?
        § Side bar?
        § Automatically say the word without any interruption?
    o The currently open output should have an option to be read in its entirety.
        § When reading the text to the user in a continuous fashion, the word currently being spoken should be highlighted on the screen.
        § Speed of playback should be adjustable in the user’s preferences.
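Both clicking on a word and highlighting the word being spoken depend on knowing where each word starts and ends in the displayed text. The sketch below shows one way to compute those positions; the word pattern and the function names `word_spans` and `word_at` are assumptions for illustration and are independent of whichever TTS package is chosen.

```python
import re

# Assumed definition of a word: letters, optionally with one inner apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def word_spans(text):
    """Return (start, end, word) for every word in the text."""
    return [(m.start(), m.end(), m.group()) for m in WORD.finditer(text)]


def word_at(text, offset):
    """Return the word covering a character offset, or None between words."""
    for start, end, word in word_spans(text):
        if start <= offset < end:
            return word
    return None
```

The same spans could serve the dictionary and thesaurus lookup, since that feature also starts from a clicked word.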
3. Dictionary and Thesaurus Lookup
As with text-to-speech, the user should be able to click on a word and get its definition, synonyms, and antonyms.
· When Dictionary and Thesaurus are enabled for the user (this option may always be enabled):
    o Clicking on a word will lead to a method for displaying the information
· The output of this information should make clear:
    o The word for which this information is being presented
    o What type of information it is (e.g., “this is a list of similar words”)
    o How to return to the previous view
4. User Accounts
The default user account will be “Guest” with no password. Further accounts should be creatable for users. These accounts will be used to configure the specific options for the user including the reading speed, user type (Privileged/Normal), etc. Only a “Privileged” user may add a new user to the system. Additionally, “Privileged” users will be the only ones able to set certain options available to the “Normal” users. Privileged users may access any other user’s account to configure options.
· Profile Information – these are options that apply to a user’s experience with the software.
    o A “P” indicates that only a Privileged user may configure this option. The possible values are given in parentheses.
    o Parameters describing the types of features to enable for the user
        · Enable Dictionary and Thesaurus (True / False) “P”
        · Enable Text-to-Speech (True / False) “P”
        · Enable Colored Paragraphs (True / False)
        · Enable Varied Fonts (True / False)
        · User Default View (Text) “P”
        · User Type (Privileged / Normal) “P”
        · …More to be described
· Documents summarized, read, etc. should be saved by the system and available for review.
· Any document the user has not completed shall be available for continuation. That is, if a user exits the program or navigates to a different view, the user’s current location in the text should be saved and restored once the user returns to this document and view.
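The “P” rule in the profile options can be sketched as a simple permission check. The option keys, the guest default values, and the function name `set_option` are assumptions chosen for illustration; only the split between Privileged-only and freely configurable options comes from the list above.

```python
# Options marked "P" above: only a Privileged user may change them.
PRIVILEGED_ONLY = {"enable_dictionary", "enable_tts", "default_view", "user_type"}

# Assumed defaults for the Guest account (the actual values are not specified).
GUEST_PROFILE = {
    "enable_dictionary": True,
    "enable_tts": False,
    "enable_colored_paragraphs": True,
    "enable_varied_fonts": True,
    "default_view": "summary",
    "user_type": "Normal",
}


def set_option(profile, option, value, acting_user_type):
    """Return an updated copy of the profile, enforcing the "P" rule."""
    if option not in profile:
        raise KeyError(f"unknown profile option: {option}")
    if option in PRIVILEGED_ONLY and acting_user_type != "Privileged":
        raise PermissionError(f"only a Privileged user may change {option}")
    updated = dict(profile)
    updated[option] = value
    return updated
```

Returning a copy rather than changing the profile in place keeps a rejected change from leaving a partly updated profile.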
5. Summarization Database
· All summaries generated should be accessible to any user, and all documents summarized should be stored.
· A simple option to search previously summarized documents is requested.
· Documents should be sortable and filterable by category, title, date, user, etc.
· RSS Feeds from sites may be included and kept for a selectable time period. For example, CNN’s articles for the last month may automatically be stored to the database for quick retrieval and summarization.
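The search, filter, and retention requirements above might look like the following in-memory sketch. The record fields and the function names `find_documents` and `within_retention` are assumptions; a real implementation would most likely express the same logic as database queries once TREADS moves its storage to a database.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredDocument:
    title: str
    category: str
    user: str
    summarized_on: date


def find_documents(documents, text="", category=None, user=None):
    """Filter by title text, category, and user; newest documents first."""
    wanted = text.lower()
    matches = [
        d for d in documents
        if wanted in d.title.lower()
        and (category is None or d.category == category)
        and (user is None or d.user == user)
    ]
    return sorted(matches, key=lambda d: d.summarized_on, reverse=True)


def within_retention(documents, today, keep_days):
    """Keep only documents (e.g. stored RSS articles) newer than the cutoff."""
    cutoff = today - timedelta(days=keep_days)
    return [d for d in documents if d.summarized_on >= cutoff]
```

For the CNN example, `keep_days` would be set to about 30 so that only the last month of articles stays available.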
Additional Notes