- Users
- Posts
- Friendships
- Re-posts (+1 or "Share on XYZ social network" buttons)
- Fan pages (for artists or movies or companies)
- Discussions related to posts
- Relationships between users and between fan pages
- Relationships between posts (your friends are posting about XYZ idea too!)
- Simulation picks the first user to sign up (pick a random user from the list and then cross it off)
- Simulation picks the second user to sign up (it cannot be the first user again unless that user comes back with a pseudonym) (decide whether it is a pseudonym; if not, pick a different random user and mark it used)
- Simulation picks post for user 1 to contribute (pick a random post from the list and cross it off)
- Simulation picks post for user 1 to contribute (cannot be first post again unless user 2 is a pseudonym of 1) (pick another random post...)
- Simulation picks post for user 2 to contribute (might be either post 1 or 2 or some other post) (pick another random post)
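The pick-and-cross-off steps above can be sketched in a few lines of Python. This is only one reading of the steps, not a finished simulator: the names `pick_and_cross_off`, `sign_up`, `run_simulation`, the tiny pools of people and posts, and the `pseudonym_chance` parameter are all made up for illustration. As a simplification, every account here gets fresh posts, even when it is a pseudonym.

```python
import random


def pick_and_cross_off(pool, rng):
    """Remove and return one random item, so it cannot be picked again."""
    return pool.pop(rng.randrange(len(pool)))


def sign_up(available_people, accounts, rng, pseudonym_chance):
    """Create one account and return its name.

    `accounts` maps account name -> the real person behind it.
    """
    # Decide whether an existing person signs up again under a pseudonym.
    if accounts and rng.random() < pseudonym_chance:
        person = accounts[rng.choice(sorted(accounts))]
        name = f"{person}-alias{len(accounts)}"
    else:
        # Otherwise pick a person who has not signed up yet and mark them used.
        person = pick_and_cross_off(available_people, rng)
        name = person
    accounts[name] = person
    return name


def run_simulation(seed, n_signups=4, posts_per_account=2, pseudonym_chance=0.25):
    rng = random.Random(seed)
    available_people = [f"person{i}" for i in range(1, 6)]  # hypothetical pool
    available_posts = [f"post{i}" for i in range(1, 11)]    # hypothetical pool
    accounts = {}
    contributions = {}
    for _ in range(n_signups):
        name = sign_up(available_people, accounts, rng, pseudonym_chance)
        # Each account contributes posts that are crossed off the shared list.
        contributions[name] = [
            pick_and_cross_off(available_posts, rng)
            for _ in range(posts_per_account)
        ]
    return accounts, contributions


accounts, contributions = run_simulation(seed=1)
for name, posts in contributions.items():
    print(name, "is really", accounts[name], "and contributes", posts)
```

Popping from the list is what "crossing off" amounts to: once a person or post is taken, no later pick can return it.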
To get a handle on the size of this test database, start with users and posts. One attribute of a user or fan page is its posting frequency (posts per day). Since every one of those entities has a posting frequency, we need to understand the total number of posts that a 5000-user social network can generate over the course of t days. With m posters (users), f the average posting frequency (posts per user per day), and t the duration of the simulation (days), the simple equation for total posts in the system is p = m × f × t, in units of posts. Also, some "real" discussions get more play than others, which means we will need a lot of fake discussions of varying size....
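As a quick check of the p = m × f × t estimate, here is a small Python sketch using the numbers suggested in this post (5000 users, 0.1 posts per user per day, 365 days), which works out to 182,500 posts. The second half draws a separate frequency for each user from a Gaussian; the spread of 0.05 and the clipping of negative draws to zero are my own assumptions, and the function names are made up for illustration.

```python
import random


def total_posts(m, f, t):
    """p = m * f * t: posters times posts per poster per day times days."""
    return m * f * t


def sample_frequencies(m, mean_f, spread, rng):
    """Draw one posting frequency per user; negative draws are clipped to 0."""
    return [max(0.0, rng.gauss(mean_f, spread)) for _ in range(m)]


m, f, t = 5000, 0.1, 365
print("flat estimate:", round(total_posts(m, f, t)))

# Per-user variation: the spread of 0.05 is an assumed value, not from a spec.
rng = random.Random(0)
frequencies = sample_frequencies(m, mean_f=f, spread=0.05, rng=rng)
print("with variation:", round(sum(frequencies) * t))
```

Clipping at zero nudges the sampled average slightly above f, so the second figure will not match the flat estimate exactly.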
A whole test database... wow:
- t days' worth of activity (suggest 1 year)
- m user entities posting at f posts/day (suggest 5k users per the spec) (suppose the average is 0.1 posts/day with wide variation, thanks Gauss)
- p posts (5,000 × 0.1 × 365 suggests 182,500 posts, all unique! None can contain junk data, because we need to throw the meme detector at the posts)
- discussions with more or less play (again no junk data for the meme detector)
- it's already huge, and relationships between users and re-posts still need to be treated; I think I need another article to cover all of those. (Use some kind of connectedness term to express relationships per user; I have over 200 social networking friends, for example.) (Posts visible to friends might provoke reactions from those friends, but the reactions should not push any user entity past its own average posts per day.)
- mechanisms for providing "random" event data
- mechanisms for providing "repeatable" event data
- mechanisms for relating entities that mirror the capabilities of the test social network
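One way to get both "random" and "repeatable" event data from the same mechanism is a seeded random number generator: no seed gives a different stream every run, a fixed seed gives the same stream every time. The sketch below assumes Python's standard `random` module; `generate_events`, `EVENT_KINDS`, and the (day, user, kind) event shape are made up for illustration.

```python
import random

# Event kinds loosely based on the entity list at the top of this post.
EVENT_KINDS = ["post", "re-post", "comment"]


def generate_events(n_events, n_users, n_days, seed=None):
    """Return a day-ordered list of (day, user_id, kind) tuples.

    seed=None  -> "random" data, different on every run
    seed=<int> -> "repeatable" data, identical on every run
    """
    rng = random.Random(seed)
    events = [
        (rng.randrange(n_days), rng.randrange(n_users), rng.choice(EVENT_KINDS))
        for _ in range(n_events)
    ]
    return sorted(events)


repeatable = generate_events(5, n_users=5000, n_days=365, seed=2024)
same_again = generate_events(5, n_users=5000, n_days=365, seed=2024)
print(repeatable == same_again)

surprise = generate_events(5, n_users=5000, n_days=365)
print(surprise)
```

Keeping the generator local to the function, rather than seeding the module-level one, means a repeatable run is not disturbed by other code that also draws random numbers.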