Using User-Provided Information to Improve Internet Services (Thesis)

Report ID: TR-708-04
Author: Chen, Mao
Date: 2004-07-00
Pages: 174
Download Formats: PDF
Abstract:

Web users are not passive; they provide valuable information throughout the process of information generation, delivery, and access. User-provided information is openly available in various novel web services and can often be a valuable resource for constructing improved and enhanced services. This thesis explores the application of user-provided information in two important services, one at the application level and the other at the middleware level.

Our first study proposes and investigates a new reputation framework for improving rating services. Rating services allow users to harvest the collective wisdom of the broad community in making decisions. However, the difficulty with Internet ratings is that little is known about the people providing them. This thesis presents a methodology that automatically computes the reputation of each online rater according to the quality and the quantity of the ratings given by that rater. This reputation information can be used to weight the ratings when aggregating multiple users' opinions on a product and to guide readers to high-quality opinions. Using data collected from real rating sites, our experiments demonstrate that our system possesses a set of important properties and has the potential to greatly enhance the effectiveness of rating services.
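
The idea of weighting ratings by rater reputation can be illustrated with a minimal sketch. It assumes the reputation scores have already been computed by some means and are given as non-negative numbers; it does not reproduce the thesis's reputation computation, which the abstract does not specify. The function name `weighted_rating`, the data layout, and the default weight for unknown raters are hypothetical choices made for illustration.

```python
def weighted_rating(ratings, reputation, default_reputation=0.0):
    """Aggregate ratings for one product as a reputation-weighted mean.

    ratings: list of (rater_id, score) pairs.
    reputation: dict mapping rater_id to a non-negative weight
                (assumed to be precomputed; not the thesis's method).
    default_reputation: weight used for raters with no known reputation
                        (a hypothetical parameter).
    Returns None when no rating carries any weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rater_id, score in ratings:
        weight = reputation.get(rater_id, default_reputation)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Example: the more reputable rater "a" dominates the aggregate.
example_ratings = [("a", 5), ("b", 1)]
example_reputation = {"a": 3.0, "b": 1.0}
example_result = weighted_rating(example_ratings, example_reputation)
```

With these inputs the plain average would be 3.0, while the weighted aggregate is pulled toward the score given by the rater with the higher reputation.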

This thesis also proposes and investigates the use of user-provided information in middleware design for distributed content delivery and caching. The information needs of content consumers are the key to driving content delivery over the Internet. Typically, these information needs are determined from access patterns. This thesis explores a set of novel content placement approaches that are enhanced by using users' stated interests, expressed through subscriptions. Based on subscription and access information, our algorithms deliver content to the edge servers close to end users, both proactively at publishing time and on demand at access time. We studied the algorithms' performance using a simulator and workloads that we built to mimic the content and access dynamics of a busy news site. The results demonstrate that judiciously incorporating subscription information can substantially improve the hit rate at the local servers compared to access-based approaches, even when the subscription information does not perfectly reflect users' actual accesses.
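
The combination of proactive delivery at publishing time and on-demand delivery at access time can be sketched as a toy edge-server model. This is only an illustration under stated assumptions, not the thesis's algorithms or simulator: the class `EdgeServer`, the topic-based subscription matching, and the least-recently-used eviction policy are all hypothetical choices.

```python
from collections import OrderedDict


class EdgeServer:
    """Toy edge cache: push on publish for subscribed topics, pull on miss."""

    def __init__(self, capacity, subscribed_topics):
        self.capacity = capacity
        self.subscribed_topics = set(subscribed_topics)
        self.cache = OrderedDict()  # item_id -> topic, oldest entry first
        self.hits = 0
        self.requests = 0

    def _store(self, item_id, topic):
        # Insert or refresh the item, then evict least recently used entries
        # (LRU is an assumption made for this sketch).
        self.cache[item_id] = topic
        self.cache.move_to_end(item_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def on_publish(self, item_id, topic):
        # Proactive placement: push only content matching a subscription.
        if topic in self.subscribed_topics:
            self._store(item_id, topic)

    def on_access(self, item_id, topic):
        # On-demand placement: count a hit, or fetch and cache on a miss.
        self.requests += 1
        if item_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(item_id)
        else:
            self._store(item_id, topic)

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0


edge = EdgeServer(capacity=2, subscribed_topics={"sports"})
edge.on_publish("a", "sports")    # pushed: matches a subscription
edge.on_publish("b", "politics")  # not pushed: no subscription
edge.on_access("a", "sports")     # hit, thanks to the proactive push
edge.on_access("b", "politics")   # miss, fetched on demand
edge.on_access("b", "politics")   # hit, cached by the earlier miss
```

In this run the first access to the subscribed item is already a hit, whereas a purely access-based cache would have missed it; that difference is the effect the hit-rate comparison in the abstract refers to.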