CS 410 Text Information Systems


Goals and Objectives

  • Explain why it is necessary and useful to perform joint analysis and mining for text and non-text data.
  • Explain the general idea of Contextual Probabilistic Latent Semantic Analysis (CPLSA) and the main difference between CPLSA and PLSA.
  • Give multiple application examples of CPLSA for contextual text mining.
  • Explain the general idea of using the social network of authors as context to analyze topics in text data and its potential benefit from an application perspective.
  • Explain how a time series (such as stock prices) can be used as context to analyze topics in time-stamped text data using topic models.

Guiding Questions

  • Why is text-based prediction interesting from an application perspective? Why do humans play an important role in text-based prediction? What is the “data mining loop”?
  • Why is it necessary and useful to jointly mine and analyze text and non-text data? How can non-text data potentially help in analyzing text data? How can text data potentially help in mining non-text data?
  • Can you give some examples of the context of a text article? How can we partition text data using context information? Can you give some examples where we can leverage context information to perform interesting comparative analysis of topics in text data?
  • What’s the general idea of Contextual Probabilistic Latent Semantic Analysis (CPLSA)? How is it different from PLSA?
  • Can you give some examples of interesting topic patterns that can be found by CPLSA? What’s the general idea of using CPLSA for analyzing the impact of an event? Can you think of an interesting application of this kind?
  • What’s the general idea of using the social network of authors of text data as a complex context to improve topic analysis for text data? Can you give an example of an interesting application of this kind?
  • What’s the general idea of using a time series like stock prices over time to supervise the discovery of topics from text data? Can you give an example of an interesting application of this kind?

Additional Readings and Resources

  • C. Zhai and S. Massung. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. ACM and Morgan & Claypool Publishers, 2016. Chapters 18 & 19.
  • Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent aspect rating analysis on review text data: a rating regression approach. In Proceedings of ACM KDD 2010, pp. 783-792, 2010. doi: 10.1145/1835804.1835903
  • Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent aspect rating analysis without aspect keyword supervision. In Proceedings of ACM KDD 2011, pp. 618-626, 2011. doi: 10.1145/2020408.2020505
  • ChengXiang Zhai, Atulya Velivelli, and Bei Yu. A cross-collection mixture model for comparative text mining. In Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004). ACM, New York, NY, USA, 743-748. doi: 10.1145/1014052.1014150
  • Qiaozhu Mei. Contextual Text Mining. Ph.D. Thesis, University of Illinois at Urbana-Champaign, 2009.
  • Hyun Duk Kim, Malu Castellanos, Meichun Hsu, ChengXiang Zhai, Thomas Rietz, and Daniel Diermeier. Mining causal topics in text data: Iterative topic modeling with time series feedback. In Proceedings of the 22nd ACM international conference on information & knowledge management (CIKM 2013). ACM, New York, NY, USA, 885-890. doi: 10.1145/2505515.2505612
  • Noah Smith. Text-Driven Forecasting. Retrieved on May 31, 2015 from http://www.cs.cmu.edu/~nasmith/papers/smith.whitepaper10.pdf

Key Phrases and Concepts

  • Text-based prediction
  • The “data mining loop”
  • Context (of text data) and contextual text mining
  • Contextual probabilistic latent semantic analysis (CPLSA): views of a topic and coverage of topics
  • Spatiotemporal trends of topics
  • Event impact analysis
  • Network-regularized topic modeling
  • NetPLSA
  • Causal topics
  • Iterative topic modeling with time series supervision

Video Lecture Notes

12-1 Opinion Mining and Sentiment Analysis

12-1-1 Latent Aspect Rating Analysis Part 1
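
A heavily simplified sketch of the LARA idea from the Wang, Lu, and Zhai (KDD 2010) reading: a review's overall rating is modeled as a weighted sum of latent aspect ratings, and each aspect rating is a weighted sum of the sentiment weights of the words in that aspect's text. The aspect names, words, and weights (`BETA`, `alpha`) are hypothetical toy values, and the review is assumed to be segmented by aspect already. LARA itself solves the inverse problem, inferring aspect ratings and aspect weights from observed overall ratings; only the forward direction is shown here.

```python
from collections import Counter

# Hypothetical per-aspect word sentiment weights (beta); LARA would learn these.
BETA = {
    "value": {"cheap": 1.0, "overpriced": -1.0},
    "room": {"clean": 1.0, "dirty": -1.5},
}


def aspect_rating(words, weights):
    """Aspect rating s_i: word weights times normalized word frequencies."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(weights.get(w, 0.0) * c / total for w, c in counts.items())


def overall_rating(segments, alpha):
    """Overall rating r = sum_i alpha_i * s_i, with aspect weights alpha summing to 1."""
    if abs(sum(alpha.values()) - 1.0) > 1e-9:
        raise ValueError("aspect weights must sum to 1")
    return sum(alpha[a] * aspect_rating(segments[a], BETA[a]) for a in alpha)


# Toy review, already segmented by aspect (the segmentation step is assumed).
segments = {"value": ["cheap", "cheap"], "room": ["dirty", "small"]}
alpha = {"value": 0.25, "room": 0.75}
print(overall_rating(segments, alpha))  # -0.3125
```

The reviewer in this toy example liked the price but cares three times as much about the room, so the negative room rating dominates the overall score.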

12-1-2 Latent Aspect Rating Analysis Part 2

12-2 Text-Based Prediction

12-3 Contextual Text Mining

12-3-1 Motivation

12-3-2 Contextual Probabilistic Latent Semantic Analysis
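
A minimal sketch of how CPLSA generates a word, assuming two topics, views keyed by location, and coverages keyed by year (all hypothetical toy values; EM estimation is omitted). Context selects a distribution over views (context-specific word distributions for each topic) and over coverages (distributions over topics). With a single view and one coverage per document, the formula collapses to the PLSA mixture p(w|d) = sum_j p(theta_j|d) p(w|theta_j), which is the main difference between the two models.

```python
# Hypothetical toy model with two topics. Each view gives its own word
# distribution for every topic; each coverage is a distribution over topics.
VIEWS = {
    "US": [{"gov": 0.6, "aid": 0.4}, {"oil": 0.7, "price": 0.3}],
    "UK": [{"gov": 0.3, "aid": 0.7}, {"oil": 0.2, "price": 0.8}],
}
COVERAGES = {
    "2005": [0.8, 0.2],
    "2006": [0.3, 0.7],
}


def word_prob(word, view_probs, coverage_probs):
    """p(w | d) = sum_v p(v|d) sum_c p(c|d) sum_j p(j|c) p(w|theta_{v,j}).

    view_probs and coverage_probs are the document's distributions over views
    and coverages, which CPLSA conditions on the document's context.
    """
    total = 0.0
    for v, pv in view_probs.items():
        for c, pc in coverage_probs.items():
            for j, pj in enumerate(COVERAGES[c]):
                total += pv * pc * pj * VIEWS[v][j].get(word, 0.0)
    return total


# A US article from 2005 uses the US view and the 2005 coverage.
print(word_prob("oil", {"US": 1.0}, {"2005": 1.0}))  # approximately 0.14
```

Comparing the same topic across views (US vs. UK) or the same coverage across time (2005 vs. 2006) is what yields the spatiotemporal topic patterns mentioned in the key concepts.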

12-3-3 Mining topics with social network context
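
A sketch of the network-regularized objective behind NetPLSA, assuming its common form: each network node is a document with its own topic coverage p(theta_j|d), each edge (u, v) carries a weight w(u,v), and the PLSA log-likelihood is traded off against a penalty that grows when linked nodes have different topic coverage. Function names and toy data are hypothetical, and the EM procedure that actually optimizes the objective is omitted. With `lam = 0` the objective is plain PLSA.

```python
import math


def plsa_log_likelihood(counts, doc_topic, topic_word):
    """sum_d sum_w c(w,d) * log sum_j p(theta_j|d) p(w|theta_j).

    Assumes every observed word has nonzero probability under the model.
    """
    ll = 0.0
    for d, word_counts in counts.items():
        for w, c in word_counts.items():
            p = sum(pj * topic_word[j].get(w, 0.0) for j, pj in enumerate(doc_topic[d]))
            ll += c * math.log(p)
    return ll


def network_regularizer(edges, doc_topic):
    """(1/2) * sum over edges (u, v, weight) of weight * sum_j (p(theta_j|u) - p(theta_j|v))^2."""
    return 0.5 * sum(
        weight * sum((a - b) ** 2 for a, b in zip(doc_topic[u], doc_topic[v]))
        for u, v, weight in edges
    )


def netplsa_objective(counts, doc_topic, topic_word, edges, lam):
    """Quantity to maximize: (1 - lam) * log-likelihood - lam * network penalty."""
    return ((1 - lam) * plsa_log_likelihood(counts, doc_topic, topic_word)
            - lam * network_regularizer(edges, doc_topic))


topic_word = [{"x": 0.9, "y": 0.1}, {"x": 0.1, "y": 0.9}]
counts = {"a": {"x": 2}, "b": {"y": 1}}
edges = [("a", "b", 1.0)]  # a and b are linked, e.g., by shared authors
apart = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
together = {"a": [0.5, 0.5], "b": [0.5, 0.5]}
print(network_regularizer(edges, apart), network_regularizer(edges, together))
```

Linked nodes with very different coverage (`apart`) pay a penalty, while identical coverage (`together`) pays none, which is how the network pushes neighbors toward similar topics.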

12-3-4 Mining Causal Topics with Time Series Supervision
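
A sketch of one step of iterative topic modeling with time series supervision: turn per-document topic coverage into a daily topic time series, then rank topics by how strongly they track an external series such as a stock price. As a simplification, a lagged Pearson correlation stands in for a formal causality measure such as a Granger test, and the feedback step (turning words of the top topics into priors for the next round of topic modeling) is omitted. All names and data are hypothetical.

```python
import math


def topic_series(doc_day, doc_topic, num_days):
    """Topic coverage time series: mean p(topic | d) over the documents stamped each day."""
    k = len(next(iter(doc_topic.values())))
    series = [[0.0] * num_days for _ in range(k)]
    n_docs = [0] * num_days
    for d, day in doc_day.items():
        n_docs[day] += 1
        for j, p in enumerate(doc_topic[d]):
            series[j][day] += p
    for j in range(k):
        for day in range(num_days):
            if n_docs[day]:
                series[j][day] /= n_docs[day]
    return series


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_topics(series, prices, lag=0):
    """Rank topics by |corr(topic[t], price[t + lag])|, strongest first."""
    n = len(prices)
    scores = [abs(pearson(s[: n - lag], prices[lag:])) for s in series]
    return sorted(range(len(series)), key=lambda j: scores[j], reverse=True)


prices = [10.0, 11.0, 12.0, 13.0, 14.0]
series = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.5, 0.1, 0.4, 0.2, 0.3]]
print(rank_topics(series, prices))  # [0, 1]
```

A positive `lag` asks whether topic coverage today lines up with prices `lag` days later, which is closer in spirit to the "text helps predict the time series" reading of causality.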

12-4 Summary for Exam 2

CS 425 Distributed Systems


Goals

  • Know the internals of Distributed File Systems like NFS and AFS.
  • Know the internals of Distributed Shared Memory systems.
  • Know what’s inside a sensor mote and why networks of them are needed.

Key Concepts

  • Distributed File Systems: Why they’re different from single-node file systems
  • Internals of NFS
  • Internals of AFS
  • Distributed Shared Memory: How processes can share memory pages while communicating via messages
  • Invalidate protocols in Distributed Shared Memory systems
  • Sensor networks: Why they’ve emerged, what’s inside them, where they’re used, and what challenges they raise

Guiding Questions

  • Why are distributed file servers often designed to be stateless?
  • How does NFS provide transparency?
  • Why is whole file caching a reasonable approach in AFS?
  • When is an invalidate protocol preferable to an update protocol in Distributed Shared Memory systems?
  • Why can’t embedded operating systems be used in sensor motes?
  • What is the disadvantage of using a spanning tree for aggregation in a sensor network?

Readings and Resources

  • TinyOS

Video Lecture Notes

Distributed File Systems

File System Abstraction
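
A minimal sketch of why a stateless file server is attractive, modeled on a flat file service whose read and write calls carry the file ID and byte position. The server keeps no open-file table or per-client offsets, so a retried request is idempotent and a crashed server can restart without recovering client state. `FlatFileService` and its methods are hypothetical and are not NFS's actual RPC interface.

```python
class FlatFileService:
    """Hypothetical stateless file server: no open-file table, no per-client offsets.

    Every request names the file ID and the byte position explicitly, so a
    retried request has the same effect as the original (idempotence), and a
    server restart loses no session state.
    """

    def __init__(self, files=None):
        self.files = files if files is not None else {}  # file ID -> bytearray

    def create(self, fid):
        self.files.setdefault(fid, bytearray())

    def read(self, fid, position, n):
        return bytes(self.files[fid][position:position + n])

    def write(self, fid, position, data):
        buf = self.files[fid]
        if position > len(buf):
            buf.extend(b"\0" * (position - len(buf)))  # pad a hole with zeros
        buf[position:position + len(data)] = data


# The client, not the server, remembers where it is in the file.
server = FlatFileService()
server.create("f1")
server.write("f1", 0, b"hello")
server.write("f1", 0, b"hello")  # a retried write changes nothing
restarted = FlatFileService(server.files)  # simulated restart, same stored data
print(restarted.read("f1", 1, 3))  # b'ell'
```

An append-style call such as "write these bytes at the current end of file" would not be idempotent: a retry after a lost reply would append twice, which is why explicit positions matter.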

NFS and AFS
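
A simplified model of AFS-style whole-file caching with callback promises: a client fetches the entire file on open unless it still holds a valid promise, works on the local copy, and ships the whole file back on close only if it changed; the server breaks other clients' promises when a new version is stored. Class names are hypothetical, and failures, timeouts, and server lookup are ignored. Whole-file caching pays off under the assumption that files are mostly small and rarely write-shared, so repeated opens cost no server traffic.

```python
class AFSServer:
    """Hypothetical server: stores whole files and remembers callback promises."""

    def __init__(self):
        self.files = {}      # file name -> contents
        self.callbacks = {}  # file name -> clients holding a valid callback promise

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)
        return self.files.get(name, b"")

    def store(self, client, name, data):
        self.files[name] = data
        # Break the promise of every other client caching this file.
        for other in self.callbacks.get(name, set()) - {client}:
            other.break_callback(name)
        self.callbacks[name] = {client}


class AFSClient:
    """Hypothetical client: caches whole files and contacts the server only on
    open (when no valid promise is held) and on close (when the file changed)."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # file name -> contents
        self.valid = set()  # file names with a valid callback promise
        self.fetches = 0

    def break_callback(self, name):
        self.valid.discard(name)

    def open(self, name):
        if name not in self.valid:
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
            self.fetches += 1
        return self.cache[name]  # later reads and writes use this local copy

    def close(self, name, new_contents=None):
        if new_contents is not None:  # modified: write the whole file back
            self.cache[name] = new_contents
            self.server.store(self, name, new_contents)


server = AFSServer()
server.files["notes.txt"] = b"v1"
alice, bob = AFSClient(server), AFSClient(server)
alice.open("notes.txt")
alice.open("notes.txt")          # served from cache, no fetch
bob.open("notes.txt")
bob.close("notes.txt", b"v2")    # breaks alice's callback
print(alice.open("notes.txt"), alice.fetches)  # b'v2' 2
```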

Distributed Shared Memory
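
A minimal sketch of an invalidate protocol for a single DSM page, assuming one writer or many readers at a time: a read fault hands out a read-only copy (downgrading the writer), and a write fault invalidates every other copy and moves ownership to the writer. `DSMPage` is hypothetical and omits data transfer and message handling. The usage shows the case where invalidate tends to beat update: once a process owns the page, its repeated writes send no messages, whereas an update protocol would keep the other copies and push each write to them.

```python
class DSMPage:
    """Hypothetical single page under an invalidate protocol.

    access maps each process holding a copy to "R" (read-only) or "W" (read-write).
    Data transfer is omitted; only copy states and the invalidation count are tracked.
    """

    def __init__(self, owner):
        self.owner = owner
        self.access = {owner: "W"}
        self.invalidations = 0

    def read(self, proc):
        if proc not in self.access:  # read fault: fetch a read-only copy
            if self.access.get(self.owner) == "W":
                self.access[self.owner] = "R"  # writer downgrades so copies stay consistent
            self.access[proc] = "R"

    def write(self, proc):
        if self.access.get(proc) != "W":  # write fault: invalidate every other copy
            for other in [p for p in self.access if p != proc]:
                del self.access[other]
                self.invalidations += 1
            self.access[proc] = "W"
            self.owner = proc


page = DSMPage("P1")
page.read("P2")
page.read("P3")          # three read-only copies
page.write("P2")         # invalidates P1 and P3
for _ in range(100):
    page.write("P2")     # local writes: no further invalidations
print(page.access, page.invalidations)  # {'P2': 'W'} 2
```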

Sensor Networks
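
A sketch of in-network aggregation along a spanning tree rooted at the sink, illustrating one disadvantage raised in the guiding questions: because each node forwards a single partial sum through its parent, one failed link drops the readings of the whole subtree below it. The tree, node names, and readings are hypothetical.

```python
def aggregate(parent, readings, failed=frozenset()):
    """Sum readings toward the sink along a spanning tree.

    parent maps each node to its parent (None for the sink). A node in `failed`
    has lost its link to its parent, so its reading and its whole subtree's
    partial sum never reach the sink.
    """
    children = {}
    sink = None
    for node, par in parent.items():
        if par is None:
            sink = node
        else:
            children.setdefault(par, []).append(node)

    def subtotal(node):
        # In-network aggregation: each node forwards one partial sum upward.
        return readings[node] + sum(
            subtotal(c) for c in children.get(node, []) if c not in failed
        )

    return subtotal(sink)


# Hypothetical tree: sink S has children A and D; A has children B and C.
parent = {"S": None, "A": "S", "B": "A", "C": "A", "D": "S"}
readings = {"S": 0, "A": 1, "B": 2, "C": 3, "D": 4}
print(aggregate(parent, readings))                # 10
print(aggregate(parent, readings, failed={"A"}))  # 4: one link loses A, B, and C
```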

CS 427 Software Engineering


Goals and Objectives

Video Lecture Notes