Friday, November 25, 2011

Detecting emotions in voice is interesting: that content could be added to the markup of a speech-to-text conversion (here).

Wednesday, October 26, 2011

Visualizing a meme



As I finished up the classification system, I began to visualize a meme. Below is what I came up with.



  • The first thing that stands out is the square, which represents the package of the meme. It contains the meme and delivers it to all the targets.
  • Next are the targets: all the people the meme is aimed at. They can be planned or random. It is from the targets that the parts of the meme are developed.
  • The interaction between the meme and the targets is the stickiness of the meme.
  • Within the package are the symbols, signs, and signals that come together to form the message.
Here is the OCW video from which I developed this meme diagram.
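The parts of the diagram can be sketched as a small data structure. This is only my own illustration of the package/targets/stickiness idea; the class and field names (and the toy stickiness measure) are made up, not part of any established taxonomy:

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """A person the meme is aimed at; either planned or random."""
    name: str
    planned: bool = False

@dataclass
class MemePackage:
    """The square in the diagram: carries the message to the targets."""
    symbols: list = field(default_factory=list)   # symbols, signs, and signals
    targets: list = field(default_factory=list)
    stickiness: float = 0.0                        # meme-target interaction strength

pkg = MemePackage(
    symbols=["slogan", "image"],
    targets=[Target("planned viewer", planned=True),
             Target("random passerby")],
)
# Toy stand-in for stickiness: richer messages vs. more targets to reach.
pkg.stickiness = len(pkg.symbols) / (1 + len(pkg.targets))
```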





Wednesday, October 19, 2011

Beta Meme Classification Taxonomy 2.1

I have worked on the meme classification system and added a page so that we can track changes in the taxonomy.  Just go to the page if you are interested.  

Sunday, October 16, 2011

SIR Model for Meme Propagation

I've been thinking about this idea for a while, and thank goodness I am not the only one. Here is a good paper on the idea that the SIR model will work well to describe meme movements. The paper has this abstract:

We study the dynamics of information propagation in environments of low-overhead personal publishing, using a large collection of weblogs over time as our example domain. We characterize and model this collection at two levels. First, we present a macroscopic characterization of topic propagation through our corpus, formalizing the notion of long-running “chatter” topics consisting recursively of “spike” topics generated by outside world events, or more rarely, by resonances within the community. Second, we present a microscopic characterization of propagation from individual to individual, drawing on the theory of infectious diseases to model the flow. We propose, validate, and employ an algorithm to induce the underlying propagation network from a sequence of posts, and report on the results.
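To make the infectious-disease analogy concrete: in SIR terms, S is the population that has never seen the meme, I is the people actively spreading it, and R is the people who have lost interest. Here is a minimal discrete-time simulation; the rates beta and gamma are made-up illustration values, not fitted from the paper:

```python
def sir_step(s, i, r, beta, gamma, n):
    """One discrete-time SIR update (Euler step, dt = 1 day)."""
    new_infections = beta * s * i / n   # contacts between spreaders and the susceptible
    new_recoveries = gamma * i          # spreaders losing interest
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

n = 5000.0                  # population size, matching the 5k-user test system
s, i, r = n - 1, 1.0, 0.0   # one initial spreader
for day in range(120):
    s, i, r = sir_step(s, i, r, beta=0.5, gamma=0.2, n=n)
```

With beta > gamma the meme takes off, and by day 120 most of the population has passed through the "infected" stage into R.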

Saturday, September 24, 2011

Converting All Input to Text

I thought about this and decided that the whole system may need to pick out memes by treating everything it sees as text, so that the semantic content can be extracted with a tool like ResearchCyc.

So how do we convert everything to text?
  1. Text is already text. Lucky!
  2. Audio can be converted with open-source speech recognition tools. Example: CMU Sphinx.
  3. Convert videos to text by sampling frames through open-source OCR, grabbing all visible text in each frame, and recording it in some kind of markup. That way we might be able to piece the subtitles together into a paragraph, and treat the signage in each frame or "movie set" as a separate (and potentially useful) idea/meme. (Does this mean we need to think of "place" as a factor in memes?)
This is how we can absorb everything the internet can dish out. Of course, this is all just a thought experiment until we piece it all together, and the state of every open-source package is subject to great variation in capability (i.e., we might not be able to use the example packages or any other tools found on SourceForge, etc.).

Just how much data must TA 1 mine?

Here's a quote from the initial introduction to TA 1 technologies:
TA1 performers will develop automated and semi-automated operator support tools and techniques for the systematic and methodical use of social media at data scale and in a timely fashion 
Since I have been working on TA 2 test systems with just 5k users, finding about 185k posts per simulated year at a posting rate p of 0.1 posts/day, I wondered what the real world has in store for SMISC. That is, what is "social media at data scale and in a timely fashion"?
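The test-system numbers check out: 5,000 users posting at p = 0.1 posts/day over 365 days gives roughly the 185k posts observed.

```python
users = 5_000
p = 0.1            # posts per user per day
days = 365

posts_per_year = users * p * days
print(posts_per_year)   # 182500.0, i.e. roughly the 185k posts observed
```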

Well, here is "data scale" as of the article By The Numbers: Twitter Vs. Facebook Vs. Google Buzz:
Updates/Posts
  • Facebook status updates: 700 per second
  • Twitter tweets: 600 per second
  • Buzz posts: 55 per second
That is 1,355 updates per second to be discriminated, categorized, aggregated, and reported on. "Timely fashion" implies that it is okay to be behind by some amount of time, but eventually the system must process everything. I figure the maximum-delay requirement is set so that a report on any new or significant meme arrives within our leaders' decision-making cycle, so that leaders cannot be outfoxed by a rapidly spreading strategic message.
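To put that rate in perspective, the combined stream works out to over a hundred million updates a day, and any shortfall in processing capacity becomes a growing backlog. A sketch of the arithmetic (the capacity figure is an arbitrary illustration):

```python
rates = {"facebook": 700, "twitter": 600, "buzz": 55}   # updates per second

total_per_sec = sum(rates.values())
per_day = total_per_sec * 86_400        # seconds in a day

# If the system can only process `capacity` updates/s, the backlog
# grows at (total_per_sec - capacity) updates every second.
capacity = 1_000                         # hypothetical processing rate
backlog_per_hour = (total_per_sec - capacity) * 3_600

print(total_per_sec, per_day)           # 1355 117072000
```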

Yikes.

Here's stuff just on Facebook (current): FB stats
Twitter doesn't seem to have a similar page.
Couldn't find one for Google+ either.