Reference: The Age of Social Sensing
1st paper review for 2019 iSURE @ Social Sensing Lab at University of Notre Dame.
Social Sensing aims to better understand the physical world through social networks. The challenge is how to extract information from the medium and find appropriate properties to characterize the extracted information and the world it represents.
Introduction
Technology: physical technological sensors and social media.
Today’s challenge: find and understand the valuable and truthful messages in the much larger volume of social media content.
Collective intelligence:
human intelligence is derived from reflexive reasoning, with language being the semantic indexing scheme into the arguments and results.
Information Economy Meta Language (IEML):
empower reflexivity, facilitate discovery of semantic ties, and document information provenance (using blockchain-like technologies) to keep track of the origins of ideas and preserve the collective reasoning behind them.
Focus: physical (social) reality
Better algorithms: to curb misrepresentation of physical reality and help build smarter urban services.
Two perspectives on the challenge:
- challenge in the cyber-physical space
- challenge in the social and linguistic space
Cyber-physical challenges in social spaces
Understanding the attributes of social sensing systems requires modeling three interdependent components:
- the humans in the loop and their cognitive models
- the algorithms involved
- the laws of nature that govern the underlying physical and engineered artifacts.
This may require interdisciplinary approaches to address the complex interactions between the cyber, physical and social components of the holistic system.
Modeling instrument distortion (4 broad categories)
- Intentional disinformation.
- Personal conclusions not sufficiently supported by data or observations, which might resonate with other people’s biases and propagate further as facts.
- Biased interpretations by communities of people with similar opinions.
- Genuine random mistakes by people processing information, such as misspelling and typos.
Understanding the signal
Physical events trigger responses in the social media. Humans work as “sensors” in this process.
e.g. Work that uses Twitter as a data source has viewed statements about physical reality as a binary signal (1 for truth, 0 for rumors).
Quantifying data reliability and performance bounds
Estimation theory
To bound the error of an optimal estimator:
- Represent social media as noisy binary communication channels.
- Estimation-theoretic frameworks are applied to the reliability analysis of data cleaning systems.
e.g. exploit expressions of error bounds of maximum-likelihood estimators to assess the quality of estimation results on social media.
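As a rough illustration of the channel view above, the sketch below treats each source as a binary symmetric channel and computes the maximum-likelihood value of a single claim from a log-likelihood ratio. The function name `ml_truth_estimate`, the assumption that per-source reliabilities are already known, and the assumption that sources are independent are all illustrative choices, not something taken from the paper.

```python
import math

def ml_truth_estimate(reports, reliabilities):
    """Maximum-likelihood estimate of one binary claim from independent noisy reports.

    reports[i] is 1 if source i asserts the claim is true, 0 otherwise.
    reliabilities[i] is the assumed probability (strictly between 0 and 1)
    that source i reports the correct value.
    Returns (estimate, log_likelihood_ratio).
    """
    llr = 0.0  # log P(reports | claim true) - log P(reports | claim false)
    for report, p in zip(reports, reliabilities):
        weight = math.log(p / (1.0 - p))
        llr += weight if report == 1 else -weight
    return (1 if llr > 0 else 0), llr

# One reliable source says "true"; two weak sources say "false".
estimate, llr = ml_truth_estimate([1, 0, 0], [0.9, 0.6, 0.6])
```

Under these assumed reliabilities the single reliable source outweighs the two weaker ones, which a plain majority vote would not capture.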
Two factors complicate the analysis:
- Correlated errors that result from rumor-spreading behaviors: a person reports others' observations as their own without verification.
- The degree of vagueness expressed in human observations. The degree of confidence is low when people use words like “possibly” and “might”.
The conversion of natural language expressions of vagueness to quantifiable numbers is a very difficult problem that remains to be fully solved.
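To make the difficulty concrete, here is a deliberately crude heuristic that down-weights a report when it contains hedge words such as “possibly” or “might”. The lexicon `HEDGE_WEIGHTS` and its numeric values are arbitrary placeholders chosen for illustration; they are not calibrated and do not come from the paper, which is exactly the open problem noted above.

```python
import re

# Hypothetical hedge lexicon: the weights are arbitrary placeholders, not calibrated values.
HEDGE_WEIGHTS = {"possibly": 0.5, "might": 0.5, "maybe": 0.5, "probably": 0.75}

def confidence_weight(text, default=1.0):
    """Return the lowest weight among hedge words found in text, or default if none occur."""
    words = re.findall(r"[a-z']+", text.lower())
    weights = [HEDGE_WEIGHTS[w] for w in words if w in HEDGE_WEIGHTS]
    return min(weights) if weights else default

hedged = confidence_weight("The bridge might be closed")
plain = confidence_weight("The bridge is closed")
```

A word-list lookup like this ignores negation, sarcasm and differences between speakers, so it should be read as a baseline to argue against rather than a solution.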
The role of dependencies between sources
Dependencies can be found in social networks (e.g. the follower-followee relationship on Twitter and the friend relationship on Facebook). The complex and dynamic source dependency graphs on social networks deserve more investigation.
Recent work either:
- develops source selection schemes that carefully pick independent sources on social networks, or
- builds reliable social sensing models that explicitly incorporate source dependency into the social signal processing engine.
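The first option can be sketched as a greedy filter over the dependency graph. This is only a simple heuristic for illustration, not one of the published selection schemes: it assumes that a direct follow link is the only form of dependence and that `sources` is already ordered by preference. The function name `select_independent_sources` is hypothetical.

```python
def select_independent_sources(sources, follows):
    """Greedily pick sources so that no two picked sources share a follow edge.

    sources: iterable of source ids, in order of preference.
    follows: iterable of (follower, followee) pairs.
    """
    # Treat a follow link as a symmetric dependency between the two accounts.
    neighbors = {}
    for follower, followee in follows:
        neighbors.setdefault(follower, set()).add(followee)
        neighbors.setdefault(followee, set()).add(follower)

    selected = []
    for source in sources:
        # Keep the source only if it is not linked to anything already kept.
        if not (neighbors.get(source, set()) & set(selected)):
            selected.append(source)
    return selected

picked = select_independent_sources(["a", "b", "c", "d"], [("b", "a"), ("c", "b")])
```

Here "b" is dropped because it follows "a", while "c" is kept because its only link is to the already-dropped "b".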
Understanding communities, social trust and polarization
Communities and social trust
Humans interact, operate, and exchange and propagate information much more frequently within communities than across them. Communities increase the level of homophily among their members: people in the same community often share opinions and biases. This is a key notion for understanding information reliability.
People are likely to re-broadcast when:
- The information is from a source they trust.
- They agree with the opinion.
Thus, trust relations and biases of sources can be observed from the propagation patterns of information.
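One simple way to read trust off propagation patterns is to count whom each user rebroadcasts. The sketch below assumes the input is a list of `(user, original_source)` pairs and uses the share of a user's rebroadcasts per source as a stand-in for trust; both the function name `rebroadcast_shares` and this proxy are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def rebroadcast_shares(rebroadcasts):
    """Map each user to the fraction of their rebroadcasts that came from each source.

    rebroadcasts: iterable of (user, original_source) pairs.
    """
    counts = defaultdict(Counter)
    for user, source in rebroadcasts:
        counts[user][source] += 1

    shares = {}
    for user, per_source in counts.items():
        total = sum(per_source.values())
        shares[user] = {src: n / total for src, n in per_source.items()}
    return shares

shares = rebroadcast_shares([("u1", "s1"), ("u1", "s1"), ("u1", "s2"), ("u2", "s2")])
```

A high share may reflect agreement with an opinion rather than trust in the source, so the two causes listed above are not separated by this count.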
Polarization
When polarization is considered, the reconstruction from social observation tends to align more closely with ground truth in the physical world.
Fusion of physical and social sensors
Social networks can be viewed as an additional sensing resource that complements physical sensors.
Decision-makers will act on the data in ways that impact the physical state being observed. Hence, an interesting question is to understand what information to present to the decision-maker (and how) in order to offer the best decision-support despite the inherent noise in the underlying social channel.
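A minimal sketch of such fusion, assuming a naive Bayes model in which a physical sensor and social reports are conditionally independent binary observations of the same event. The function name `fuse_posterior` and the hit and false-alarm rates in the example are hypothetical, and the feedback loop in which decisions change the observed state is not modeled.

```python
def fuse_posterior(prior, observations):
    """Posterior probability that an event occurred, given independent binary observations.

    prior: P(event) before any observation.
    observations: list of (fired, p_fire_given_event, p_fire_given_no_event).
    """
    like_event, like_none = prior, 1.0 - prior
    for fired, p_hit, p_false in observations:
        like_event *= p_hit if fired else 1.0 - p_hit
        like_none *= p_false if fired else 1.0 - p_false
    return like_event / (like_event + like_none)

# A physical sensor (assumed 0.9 hit rate, 0.1 false-alarm rate) and one
# social report (assumed 0.6 and 0.4) both indicate the event.
posterior = fuse_posterior(0.5, [(True, 0.9, 0.1), (True, 0.6, 0.4)])
```

The noisier social report still moves the posterior, but by less than the physical sensor does, which is the kind of weighting a decision-support display would need to convey.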
Challenges in the linguistic space
To understand social media content, we need to seek advances in rapid low-cost development of Information Extraction (IE) and Text Mining technologies.
Basis of today’s language processing technologies:
- Supervised learning: suffers from high cost of large-scale manual annotation and limited predefined fact types.
- Unsupervised learning: needs to systematically discover and unify latent and expressed knowledge from traditional symbolic semantics and modern distributional semantics through advanced machine learning models.
Ambiguity in a Sentence
Natural Language Processing (NLP) technologies currently rely heavily on surface processing. This makes it difficult to exploit deep structure, background knowledge and source information.
The Importance of Context
With the growth of online social networking services, people rapidly invent new ways to communicate sensitive ideas. We call this phenomenon information morphing, which also existed in the traditional forms of communications. Morphing raises unique challenges for entity and event co-reference resolution.
Discourse Ambiguity
Besides sentence-level and subsentence-level ambiguities, there is one more level of ambiguity: super-sentential, or discourse-level, ambiguity, which goes beyond sentence boundaries. This third level of ambiguity comes from two sources:
- coreference ambiguity: pronouns often refer to entities outside of the current sentence and it is ambiguous to which entity they refer.
- discourse structural ambiguity: a discourse, like a sentence, has its own internal structure, often represented as a tree or graph. For deep understanding of the text, we need to know, for example, which sentence is the topic sentence, which sentence is the elaboration or contrast of another sentence, and the temporal structure between sentences (in a story line).
Expressions of Fuzziness and Vagueness
Human sources typically describe objects using fuzzy terms, e.g., “she is very tall” instead of precise terms, e.g., “she is 6’2’’ tall.” It is very difficult to convert fuzzy terms into numbers as the use of such terms varies wildly over different humans or societies. It is much easier to calibrate physical sensors whose noise performance can be consistently measured.
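One common way to attach numbers to a fuzzy term is a fuzzy-set membership function. The sketch below uses a piecewise-linear ramp for “tall”; the thresholds `low` and `high` are assumed values picked for illustration, and choosing them per speaker or society is exactly the calibration problem described above.

```python
def tall_membership(height_cm, low=165.0, high=190.0):
    """Degree in [0, 1] to which a height counts as "tall" under assumed thresholds."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    # Linear ramp between the two assumed thresholds.
    return (height_cm - low) / (high - low)

degree = tall_membership(177.5)
```

Changing `low` and `high` changes every degree, so two communities with different norms would need different functions for the same word.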
Future Directions
The field of social sensing lacks a unified interdisciplinary problem formulation that takes a holistic approach to modeling humans as sensors, and modeling social media as noisy measuring instruments or channels.
5 required synergistic disciplines
- computational social scientists: to model human behavior and quantify its susceptibility to errors, omissions, deceit, and other irregularities.
- linguists: to model strengths and imperfections of human communication and their compounding effects on reliability of information dissemination.
- information theorists: to model social networks as imperfect communication media and derive fundamental capacity limits and uncertainty envelopes.
- data mining experts: to investigate the impact of the underlying error models relying on knowledge extraction from imperfect information.
- cyber-physical experts: to develop estimation-theoretic observability and control limits and tools that offer closed-loop robustness guarantees in the face of derived capacity limits, uncertainty envelopes, and knowledge errors.
Social Sensing is an interdisciplinary research area which aims to bring about novel solutions for a better theoretical and systematic understanding of the future sensor and media-rich world.