Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework

Reference: Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework
2nd paper review for 2019 iSURE @ Social Sensing Lab at University of Notre Dame.

This paper develops a Scalable Streaming Truth Discovery (SSTD) solution that addresses three challenges in truth discovery: dynamic truth, scalability, and the heterogeneity of streaming data.

  1. Dynamic truth discovery: the ground truth of claims changes over time.
  2. Scalability to large-scale social sensing events.
  3. The heterogeneity and unpredictability of social sensing data traffic, which pose additional challenges for resource allocation and system responsiveness.

Methods in the solution:

  1. A Hidden Markov Model (HMM) lets the dynamic truth discovery scheme effectively infer the evolving truth of reported claims.
  2. A distributed framework implements the dynamic truth discovery scheme using Work Queue on the HTCondor system.
  3. The SSTD scheme is integrated with an optimal workload allocation mechanism.

Evaluation using Twitter data feeds: Boston Bombing, Paris Shooting and College Football

Introduction

Goal of truth discovery: to identify the reliability of the sources and the truthfulness of the claims they make without prior knowledge.

Three challenges

The dynamic truth challenge

Two critical tasks:

  1. to capture the transition of truth in a timely manner
  2. to be robust against noisy data that may lead to the incorrect detection of the truth transition.

The Scalability aspect of the truth discovery problem

Current centralized truth discovery solutions cannot handle large volumes of social sensing data because of the resource limits of a single computing device.
A distributed solution based on Hadoop is a poor fit, because:

  1. It is a heavyweight solution that takes a long time to start.
  2. It is designed for very large data volumes (far larger than social sensing applications need).
  3. It ignores the heterogeneity of the computational resources.

The heterogeneity and unpredictability of the streaming data traffic

  1. Different topics or events generate different amounts of social sensing data.
  2. The traffic volume of the same event is not constant over time.

Solution of this paper

  1. A Hidden Markov Model based solution that dynamically estimates the true value of claims from the observations reported by social sensors.
  2. A lightweight distributed framework, built on Work Queue and the HTCondor system, that solves the truth discovery problem scalably and efficiently.
  3. An optimal workload allocation mechanism, integrated with the SSTD scheme, that uses feedback control (i.e., a Proportional Integral Derivative (PID) controller) to dynamically allocate resources (e.g., cores, memory) to the truth discovery tasks.
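To make the feedback-control idea concrete, here is a minimal sketch of a textbook PID controller in Python. It is not the paper's allocation mechanism: the choice of measured signal (an observed processing delay), the target delay, the gains, and the helper `adjust_workers` are all assumptions made for illustration.

```python
class PIDController:
    """Textbook PID controller (illustrative; gains and signals are assumptions)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint      # hypothetical target, e.g. desired delay in seconds
        self.integral = 0.0           # accumulated error over time
        self.prev_error = None        # no derivative term on the first update

    def update(self, measurement, dt=1.0):
        # Positive error means the system is slower than the target.
        error = measurement - self.setpoint
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def adjust_workers(current, control_signal, min_workers, max_workers):
    """Hypothetical helper: turn the control signal into a clamped worker count."""
    proposed = round(current + control_signal)
    return max(min_workers, min(max_workers, proposed))


if __name__ == "__main__":
    pid = PIDController(kp=0.5, ki=0.1, kd=0.2, setpoint=10.0)
    workers = 4
    for delay in [18.0, 15.0, 12.0, 10.5]:   # made-up delay readings
        workers = adjust_workers(workers, pid.update(delay), 1, 16)
        print(delay, workers)
```

When the measured delay exceeds the target, the control signal is positive and more workers are requested; as the delay approaches the target, the adjustment shrinks.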

Problem Formulation

For social media

Problem Formulation
Problem Formulation

For the deployed system

Problem Formulation

Scalable and Streaming Truth Discovery

The Scalable and Streaming Truth Discovery (SSTD) scheme, based on a Hidden Markov Model (HMM), decodes the streaming social sensing data and outputs the corresponding truth values of claims in real time.
HMM
This solution can be implemented in a distributed system where multiple truth discovery jobs can run in parallel to address the scalability challenge in social sensing applications.

Deriving Hidden States and Observation Sequence

HMM

Estimating Parameters of the HMM Model

Define HMM parameters
Goal of Estimating
This estimation problem is solved by an expectation maximization (EM) based algorithm.
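The paper's exact update equations are not reproduced in these notes. As a rough illustration of EM for an HMM, the sketch below runs one iteration of the standard Baum-Welch procedure on a discrete HMM. The two hidden states (claim false or true), the discretized observation symbols, and the names `pi`, `A`, `B` are assumptions for this example, not the paper's notation.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(obs[0..t], state_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(obs[t+1..] | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM iteration; returns updated (pi, A, B) and the likelihood of obs
    under the parameters that were passed in. No scaling, so short sequences only."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # E-step: posterior state and transition probabilities.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: re-estimate parameters from expected counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood
```

Calling `baum_welch_step` repeatedly until the likelihood stops improving gives a locally optimal parameter estimate; a production version would work in log space or rescale `alpha` and `beta` to avoid underflow on long streams.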

Decoding State Sequence

Goal of decoding

This decoding problem is solved by the Viterbi Algorithm.

Given the estimated HMM parameters ${\lambda}_u$ and the observation sequence $F(u)$, we can infer the hidden true value sequence $VT_u$ that is most likely to have generated the observations. It is solved recursively as follows:

Decoding
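For reference, the Viterbi recursion can be sketched in a few lines of Python. This is the generic algorithm for a discrete HMM, not the paper's implementation: the two states (0 = claim false, 1 = claim true), the binary observation symbols, and the parameter names `pi`, `A`, `B` are assumptions for the example.

```python
import math


def viterbi(obs, pi, A, B):
    """Return the most likely hidden state sequence for obs (log-space Viterbi).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    # delta[i]: best log-probability of any path ending in state i so far.
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log(A[i][j]))
            ptr.append(best_i)
            step.append(delta[best_i] + log(A[best_i][j]) + log(B[j][o]))
        delta = step
        backpointers.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    pi = [0.5, 0.5]
    A = [[0.9, 0.1], [0.1, 0.9]]   # truth values tend to persist
    B = [[0.9, 0.1], [0.1, 0.9]]   # observations are mostly correct
    print(viterbi([0, 0, 1, 1, 1], pi, A, B))
    print(viterbi([0, 0, 1, 0, 0], pi, A, B))
```

With transitions that favor staying in the same state, a sustained change in the observations is decoded as a truth transition, while a single contradicting observation is absorbed as noise. This matches the two tasks listed under the dynamic truth challenge.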

Implementation on a Distributed Computing Framework

(to be continued…)