TREADS


Web-Based Interface to TREADS

Posted Feb 29, 2008 4:45 PM by Erik Schweller [updated Mar 28, 2008 5:18 PM]

Introduction

 

 

This document describes various projects for extending the functionality of the current TREADS application. 

 

TREADS is a web-based system that reads an input text and generates several forms of output.  The system, written in Perl, runs on a lightweight server.   See http://treads.emich.edu/ for a working demo.

 

We want one or more web-based applications that read specified files and present interactive, well-formatted output.  Each application should be lightweight, well documented, easy to maintain, and intuitive.  Preferably, each application operates independently of the existing TREADS application before integration.  Each application will need a flexible, easily updatable interface for reading data files from the TREADS system.

 

These features are requested:

  1. Easily adjustable interface for parsing TREADS output.  Data returned by TREADS should be displayed in a user interface that:
    1. Is simple and has large input elements.
    2. Has a minimal number of interface elements.
    3. Has large fonts.
    4. Includes any further HCI elements appropriate for special education use.
  2. Text to Speech.  Words, paragraphs, or the entire document may be read at the user’s request.
    1. Example Link (highlights words as they are read in MS Word): http://www.wordtalk.org.uk/
  3. Dictionary and Thesaurus lookup for any word in the output section.
    1. Example Link (double click on words in the article): http://www.cbsnews.com/stories/2008/03/22/sunday/main3960219.shtml
  4. User accounts that track preferences and current progress through documents, and that allow the reader to return to the last location viewed in a summary.
  5. Database of previously summarized and input texts and RSS feeds from a specified time.  Articles are sortable and searchable.
  6. Ability to accept multiple input formats, including RSS feeds, websites, text documents, and pasted form information.  The raw text from these inputs, with white space preserved, should be passed on to TREADS for summarization and further processing.

 

 

Overview

 

The current process is as follows:

  1. A user accesses TREADS through a web browser.
  2. The user selects process options or a user profile, which is a collection of process options (not implemented in the web demo), and enters input text to be processed.
    1. The process options are optional inputs.
  3. TREADS summarizes the input text using the requested method.
    1. TREADS generates multiple output files.
      i. NOTE: TREADS will use a database for all storage in the future, but currently all data is stored on the file system.
  4. Once all output is ready, the output files are made available to the user via a collection of web pages.

 

Instead of TREADS generating a set of web pages (steps 1 and 4), a richer user experience is desired.  This requires a robust user interface that can support the various activities of the system (e.g., reading text and highlighting).

 

The user interface will handle logging in a user, basic preparation of the text for processing, and starting the TREADS application with the proper options for the specific user.  If no user is logged in, TREADS should function in demo (guest) mode, with a set of defaults.  Once summarization is complete, the user will be presented with the available output types and options as specified by the user’s profile.

 

TREADS will handle the summarization and XML-like tagging of the documents that will be returned to the interface for display to the user.

 

 

Interfacing to TREADS

 

The software interface is the connection between the display software and TREADS.  TREADS output may be returned as a reference to an array of strings or, as is currently done, written as a set of text documents to the file system for further processing.

 

Each document on the file system will be named using a run number (a unique identifier for the current summarization), and any other necessary data, such as type of output. 
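
As a rough illustration of the naming idea above, the sketch below builds and parses file names that carry a run number and an output type.  The pattern `run<number>_<type>.txt` and the names `build_name`, `parse_name`, and `OutputFile` are assumptions made for this example only; the actual naming scheme is not defined in this document.

```python
import re
from typing import NamedTuple


class OutputFile(NamedTuple):
    """Pieces of information carried by one output file name."""
    run_number: int
    output_type: str


# Hypothetical naming scheme: run<run number>_<output type>.txt
_NAME_PATTERN = re.compile(r"^run(\d+)_([a-z_]+)\.txt$")


def build_name(run_number: int, output_type: str) -> str:
    """Return the file name for one output of one summarization run."""
    return f"run{run_number}_{output_type}.txt"


def parse_name(file_name: str) -> OutputFile:
    """Recover the run number and output type from a file name."""
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"not a recognized output file name: {file_name!r}")
    return OutputFile(int(match.group(1)), match.group(2))
```

With a scheme like this, the interface could list a directory and group files by run number without opening them.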

 

The order and description of the files returned as an array reference will be decided upon with input from the development group.


Files Generated by TREADS

TREADS generates many types of output files.  They will all be in a tagged text format for the system.

·        Untagged Text Files

o       Any file not tagged will be displayed as a standard text file.

·        Tagged Files

o       A tagging system is not yet defined. 

o       XML-like tagging is preferred.

o       Tags will include (but are not limited to):

§         Font Size

§         Font Color

§         Sentence Score

§         Sentence Number

§         Paragraph

§         Header

§         Bold

§         Italics

o       Not all tags must be included by the summarizer; system defaults will be used to describe data that is missing tags.

o       Tag names, requirements for closure, nesting, etc., are still mostly undefined; input is welcome.

o       Example:

§         <Paragraph><SentenceNumber=’4’><Font Color = ‘Red’><Score=’.432’>The brown fox <Color=’blue’>jumped</Color></Sentence><SentenceNumber=’8’>Example text here. </Sentence>
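
Because the tag format is still undefined, the following is only a sketch of how a display application might read such output if the tags ended up as well-formed XML.  The element and attribute names (`Sentence`, `number`, `score`, `color`, `Color`, `value`) and the default values are assumptions for illustration, not a proposed standard.  The sample is a well-formed variant of the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed variant of the example above.
SAMPLE = (
    '<Paragraph>'
    '<Sentence number="4" score=".432" color="red">The brown fox '
    '<Color value="blue">jumped</Color></Sentence>'
    '<Sentence number="8">Example text here.</Sentence>'
    '</Paragraph>'
)

# Assumed system defaults for data that is missing tags.
DEFAULT_SCORE = 0.0
DEFAULT_COLOR = "black"


def read_sentences(tagged_text):
    """Return one dict per sentence, filling in defaults for missing attributes."""
    root = ET.fromstring(tagged_text)
    sentences = []
    for node in root.iter("Sentence"):
        sentences.append({
            "number": int(node.get("number")),
            "score": float(node.get("score", DEFAULT_SCORE)),
            "color": node.get("color", DEFAULT_COLOR),
            # itertext() also collects text inside nested tags such as Color.
            "text": "".join(node.itertext()).strip(),
        })
    return sentences
```

Keeping the format well formed would let a standard XML parser do the work, which fits the goal of an easily adjustable interface.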

 

Requested Features in Detail

           

1. User Interface

Keep in mind that students with learning disabilities are the intended target for this system. 

·        Any output view should have the ability to send to the printer. 

·        Readability

o       Large Sans-Serif Fonts

o       Simple Color Palettes

§         No specification, but use a theme similar to EMU’s colors, with more colorful elements where appropriate

o       Graphical icons / buttons

o       Few Options to parse at each level

·        Types of Output (Specific Views for the User)

o       Summarized text (May be of the following types):

§         Slider filtered

·        The user can adjust the number of sentences shown by moving a slider back and forth

o       Sentences are tagged with a threshold value for deciding if they are to display or not

§         Sized All

·        All sentences from the input are included, but some are sized by various criteria (defined through the XML from TREADS)

§         Colored

·        Sentences from the input may or may not all be included; some are colored by various criteria (defined through the XML from TREADS)

§         User Default – Primary output, which may be any combination of the above.

 

o       Raw statistics (if enabled -- the current user is “Privileged”)

o       Input Text

o       Additional files may be included

§         HTML input to be passed through to the output screen directly

§         Image files (possibly scanned texts, or images related to the text)

§         Multiple columns to a single column and vice versa.
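
The slider-filtered view described above could work roughly as sketched below.  This assumes each sentence carries a numeric score, that a higher score means a more important sentence, and that the slider position maps directly to a minimum score; none of these details are fixed by this document, and the names `Sentence` and `visible_sentences` are made up for the example.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    number: int    # position of the sentence in the input text
    score: float   # assumed importance score supplied by TREADS
    text: str


def visible_sentences(sentences, slider_value):
    """Keep sentences scoring at least slider_value, in document order."""
    kept = [s for s in sentences if s.score >= slider_value]
    return sorted(kept, key=lambda s: s.number)
```

Moving the slider toward a lower value reveals more of the original text; moving it higher leaves only the top-scoring sentences.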

 

  • Performance
    • The system should respond promptly when running on a modern desktop.
      • Any performance issues that will lead to user confusion are unacceptable.  
      • Multiple users may be logged into and using the system at the same time.  The system should maintain its performance.

 

  • Reliability
    • Expect unusual inputs and handle them gracefully.
    • Handle multiple users simultaneously.

 

  • System Errors
    • Report errors in an unobtrusive fashion and write them to a log.

 

 

2. Text-to-Speech

 

The software should be able to generate computer-simulated speech for the text from TREADS.  Open-source projects such as http://mary.dfki.de/ and http://www.xenocafe.com/tutorials/php/festival_text_to_speech/index.php exist.  As long as the license of the software is acceptable, a third-party package may be integrated into the project, as developing a TTS engine is not the goal.

·        When text-to-speech is enabled for the user

o       Clicking on a word will lead to a method for saying that word.

§         Light box?

§         Popup window?

§         Side bar?

§         Automatically say the word without any interruption? 

o       The currently open output should have the option for being read in its entirety.

§         When reading the text to the user in a continuous fashion, the word currently being spoken should be highlighted on the screen.  

§         Speed of playback should be adjustable in the user’s preferences. 
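
Both click-to-speak and highlighting during continuous reading need to know where each word sits in the displayed text.  The sketch below shows one possible way to compute word positions; the function names and the simple definition of a word (letters, digits, and underscores, with optional internal apostrophes or hyphens) are assumptions for illustration.  It does not perform any speech synthesis.

```python
import re

# A word is a run of word characters, optionally joined by ' ’ or -
_WORD = re.compile(r"\w+(?:['’-]\w+)*")


def word_spans(text):
    """Return (start, end, word) for every word, in reading order."""
    return [(m.start(), m.end(), m.group()) for m in _WORD.finditer(text)]


def word_at(text, offset):
    """Return the word under a character offset (e.g. a click), or None."""
    for start, end, word in word_spans(text):
        if start <= offset < end:
            return word
    return None
```

A reader component could step through `word_spans` in order and highlight each span as the speech engine reaches it, while `word_at` could serve a click on a single word.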

 

 

3. Dictionary and Thesaurus Lookup

As with text-to-speech, the user should be able to click on a word and get its definition, synonyms, and antonyms.

·        When Dictionary and Thesaurus are enabled for the user (may always have this option enabled):

o       Clicking on a word will lead to a method for displaying the information

·        The output of this information should make clear:

o       The word for which this information is being presented

o       What type of information it is (e.g., “this is a list of similar words”)

o       How to return to the previous view

 

 

4. User Accounts

            The default user account will be “Guest” with no password.  Further accounts should be creatable for users.  These accounts will be used to configure the specific options for the user including the reading speed, user type (Privileged/Normal), etc.  Only a “Privileged” user may add a new user to the system.  Additionally, “Privileged” users will be the only ones able to set certain options available to the “Normal” users.  Privileged users may access any other user’s account to configure options.

 

·        Profile Information – these are options that apply to a user’s experience with the software.

o        A “P” indicates that only a Privileged user may configure this option.  The possible values are described in parentheses.

o       Parameters describing the types of features to enable for the user

·        Enable Dictionary and Thesaurus      (True / False)            “P”

·        Enable Text-to-Speech             (True / False)            “P”

·        Enable Colored Paragraphs      (True / False)

·        Enable Varied Fonts            (True / False)

·        User Default View            (Text)                  “P”

·        User Type                        (Privileged / Normal ) “P”

·        …More to be described

·        Documents summarized, read, etc. should be saved by the system and available for review.

·        Any document the user has not completed shall be available for continuation.  That is, if a user exits the program or navigates to a different view, the user’s current location in the text should be saved and then restored once the user returns to this document and view.
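
The account rules above can be sketched as a small profile record plus a permission check.  The field names, the default values, and the assumption that the Guest account is a “Normal” user are illustrative only; the real option list is still to be described.

```python
from dataclasses import dataclass, fields

# Options marked "P" above: only a Privileged user may change them.
PRIVILEGED_ONLY = {"enable_dictionary", "enable_tts", "default_view", "user_type"}


@dataclass
class Profile:
    name: str = "Guest"
    user_type: str = "Normal"              # "Privileged" or "Normal" (assumed default)
    enable_dictionary: bool = True         # assumed default
    enable_tts: bool = True                # assumed default
    enable_colored_paragraphs: bool = True
    enable_varied_fonts: bool = True
    default_view: str = "summary"          # assumed default


def set_option(actor, target, option, value):
    """Let `actor` change one option on `target`, enforcing the rules above."""
    known = {f.name for f in fields(Profile)} - {"name"}
    if option not in known:
        raise KeyError(f"unknown option: {option}")
    is_privileged = actor.user_type == "Privileged"
    if actor is not target and not is_privileged:
        raise PermissionError("only a Privileged user may edit another account")
    if option in PRIVILEGED_ONLY and not is_privileged:
        raise PermissionError(f"only a Privileged user may set {option}")
    setattr(target, option, value)
```

Keeping the privileged-only options in one set means new options can be added or reclassified without touching the check itself.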

 

5. Summarization Database

·        All summaries generated should be accessible to any user, and all documents summarized should be stored.  

·        A simple option to search previously summarized documents is requested.

·        Documents should be sortable and filterable by category, title, date, user, etc.

·        RSS Feeds from sites may be included and kept for a selectable time period.   For example, CNN’s articles for the last month may automatically be stored to the database for quick retrieval and summarization.
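
A minimal sketch of the search, sort, and retention ideas above, using an in-memory SQLite database.  The table layout, the column names, the `source` values, and the use of ISO date strings (`YYYY-MM-DD`) are assumptions made for this example; the real storage design is not specified here.

```python
import sqlite3

SORTABLE = {"title", "category", "user", "added"}


def open_store():
    """Create an empty in-memory document store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY, title TEXT, category TEXT,"
        " user TEXT, added TEXT, source TEXT)"
    )
    return conn


def add_document(conn, title, category, user, added, source="manual"):
    """Store one document; `added` is an ISO date string."""
    conn.execute(
        "INSERT INTO documents (title, category, user, added, source)"
        " VALUES (?, ?, ?, ?, ?)",
        (title, category, user, added, source),
    )


def search(conn, term, order_by="added"):
    """Return titles containing `term`, sorted by one allowed column."""
    if order_by not in SORTABLE:
        raise ValueError(f"cannot sort by {order_by!r}")
    rows = conn.execute(
        f"SELECT title FROM documents WHERE title LIKE ? ORDER BY {order_by}",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


def prune_feeds(conn, cutoff):
    """Delete RSS items added before `cutoff`; return how many were removed."""
    cursor = conn.execute(
        "DELETE FROM documents WHERE source = 'rss' AND added < ?", (cutoff,)
    )
    return cursor.rowcount
```

The sort column is checked against a fixed set before being placed in the query, since column names cannot be passed as SQL parameters.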

           

 

Additional Notes

  • This project will move to open source in time.
  • This project revolves around a developing research base, and as such it is not possible to describe every option the user will be presented with.  This implies that an easily adjustable interface is required.
  • Suggestions are welcome.
  • For HTML displays, use CSS and valid HTML (preferably XHTML Transitional) and JavaScript where possible.
  • Suggestions are welcome.
  • Contact Erik any time (othererik at gmail) with questions.
  • Suggestions are welcome. :)
-erik