Harvest : a collaborative system for distributed retrieval of social data
Main author: | |
---|---|
Format: | Dissertation |
Language: | eng |
Subjects: | |
Summary: | In recent years, social network providers have become one of the largest industries in the world. These networks created a new arena for sharing information over the Internet, and thus changed the way people interact with each other. Hundreds of millions of social network users update statuses and send messages to each other every day. These interactions produce vast amounts of social data. This data is the core of the social network providers' business model, and it is sold to large companies for personalized advertising, brand monitoring, and viral marketing. The price of this data can be intimidating, and some are unable or unwilling to pay it. If the data were freely available, research that could benefit from it could be carried out more readily, leading to new knowledge.
This thesis presents Harvest, a collaborative system for retrieving social data. Harvest is a peer-to-peer system made up of contributing social network users and inspired by public resource computing. It pools the resources bound to contributors' social network accounts to retrieve large social data sets. Users contribute by running an application on their own computers, as in other public resource computing systems such as the @home systems.
The system implements data retrieval from Twitter. Experiments on real Twitter data show that the system scales with increased contribution. The data retrieval bandwidth per contributing user is quite low, so a large number of contributors is needed to achieve a considerably large data retrieval bandwidth, but the system has no associated financial costs. Harvest would benefit greatly from retrieving data from more sources, as this would increase its data retrieval bandwidth in addition to offering more abundant data. |
---|