Recently, I have been interested in analyzing social networking data, and I have mainly been playing with Foursquare data. For those who don't know Foursquare, it is a mobile location-based social network application. Foursquare users are associated with a home city and alert their friends when visiting a venue (e.g., a restaurant) by "checking in" on their mobile phones. During a "check-in", users can also leave "tips", which are free-text notes describing what they liked about the venue. Any other user can add the "tip" to her "to-do list" if she is interested in visiting the venue. Once a user visits a venue on the "to-do list", she marks it as "done". Also, the user who checks into a venue the most is considered the "mayor" of that venue.
I found that statistically analyzing Foursquare data can lead to some cool findings. The data was fetched through the APIs provided by Foursquare, which are pretty similar to the Facebook APIs (but not exactly the same). Collecting the data is not a piece of cake, though; as you may know, social networking applications tend to lock down their data as their user base grows. Anyway, one can always find a way to get such data, so it is not a big deal after all.
Given such data, I thought of posing this question: "Do Foursquare users visit the same places as their friends?" For instance, if Bob is a Foursquare user, the question is whether he checks in at the same spatial locations (restaurants, theaters, ...) as his Foursquare friends. To this end, I applied a simple statistical analysis to 2186 Foursquare users in the Twin Cities area. For each user, all of her friends and all the places she visited were retrieved. The average number of friends per user is 44, and the average number of visits per user is 14. The users were then classified into three popularity categories, (1) popular, (2) moderate, and (3) unpopular, based on the number of friends they have: a user with 200 friends or more is popular, a user with at least 70 but fewer than 200 friends is moderate, and a user with fewer than 70 friends is unpopular. Each user was also assigned a separate activity class, (1) active, (2) moderate, or (3) inactive, based on the number of visits: a user with more than 90 visits is active, a user with 40 to 90 visits is moderate, and a user with fewer than 40 visits is inactive. The histograms for user activity and user popularity are given in the following figures.
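To make the two classifications concrete, here is a minimal Python sketch of the thresholds described above. The function names are hypothetical, and I am assuming that a user with exactly 200 friends counts as popular, following the "200 friends or more" rule.

```python
def popularity_class(num_friends: int) -> str:
    """Popularity from friend count: 200+ popular, 70-199 moderate, below 70 unpopular."""
    if num_friends >= 200:
        return "popular"
    if num_friends >= 70:
        return "moderate"
    return "unpopular"


def activity_class(num_visits: int) -> str:
    """Activity from visit count: above 90 active, 40-90 moderate, below 40 inactive."""
    if num_visits > 90:
        return "active"
    if num_visits >= 40:
        return "moderate"
    return "inactive"


# A user with the average profile (44 friends, 14 visits) lands in the lowest classes.
average_user = (popularity_class(44), activity_class(14))
print(average_user)  # ('unpopular', 'inactive')
```

Note that with an average of 44 friends and 14 visits per user, the typical user falls into the unpopular and inactive classes under these thresholds.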
To answer this question, we define a parameter called the "CoVisits Ratio" for each user as follows. Let U be the set of the user's visits and n be the total number of the user's friends. Fi is the set of visits of friend i, and V is the set of unique visits among the user and all of the user's friends.
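The formula itself does not appear in the text above, so the following Python sketch is only one plausible way to combine U, n, Fi, and V, and the definition actually used in the analysis may differ. It assumes the ratio is the number of venues the user shares with each friend, averaged over the n friends and normalized by the size of V. The function name and the sample venues are hypothetical.

```python
def covisits_ratio(user_visits: set, friends_visits: list) -> float:
    """ASSUMED form: (1 / n) * sum over friends of |U & Fi| / |V|."""
    n = len(friends_visits)
    if n == 0:
        return 0.0  # no friends, so there is nothing to co-visit
    # V: unique venues visited by the user or by any friend
    all_venues = set(user_visits).union(*friends_visits)
    if not all_venues:
        return 0.0  # nobody visited anything
    # Sum of |U & Fi|: venues the user shares with each friend
    shared = sum(len(user_visits & friend) for friend in friends_visits)
    return shared / (n * len(all_venues))


# Hypothetical example: Bob shares 2 venues with one friend and 1 with another.
bob = {"cafe", "gym", "cinema"}
friends = [{"cafe", "gym"}, {"cinema", "diner"}]
print(covisits_ratio(bob, friends))  # 3 / (2 * 4) = 0.375
```

Under this assumed form the ratio lies between 0 (no shared venues) and 1 (every friend visited exactly the same venues as the user).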
The average CoVisits ratio was calculated for each activity category (active, moderate, inactive) and for each popularity category (popular, moderate, unpopular). In the following two figures, the red bars represent the CoVisits ratio for the user with his friends, and the blue bars represent the CoVisits ratio for the user with random users (who are not his friends). These random users were selected to mirror the user's friends (with the same activity level). As depicted in both figures, in the case of moderate activity and moderate popularity, users seem to visit the places that their friends visit more than the places visited by strangers (non-friends).
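The selection of the random comparison users could be sketched as follows. This is my reading of "same activity level", not necessarily the exact procedure used: for each friend, draw one non-friend from the same activity class. The function name, the `activity` mapping from user id to activity class, and the sample ids are all hypothetical.

```python
import random


def matched_random_users(user, friends, activity, rng):
    """For each friend, draw a distinct non-friend with the same activity class."""
    excluded = set(friends) | {user}
    chosen = []
    for friend in friends:
        candidates = [
            other for other in activity
            if other not in excluded and activity[other] == activity[friend]
        ]
        if not candidates:
            continue  # no stranger with this activity level is left
        pick = rng.choice(candidates)
        chosen.append(pick)
        excluded.add(pick)  # do not pick the same stranger twice
    return chosen


# Hypothetical data: Bob has one active friend and one inactive friend.
activity = {
    "bob": "active", "f1": "active", "f2": "inactive",
    "s1": "active", "s2": "inactive", "s3": "inactive",
}
strangers = matched_random_users("bob", ["f1", "f2"], activity, random.Random(0))
print(strangers)
```

The CoVisits ratio can then be computed twice per user, once against the friends and once against these matched strangers, which is what the red and blue bars compare.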
The correlation between being friends and the number of co-visits between two users was measured, and the result is shown in Figure 3. The x-axis represents the total number of co-visits between any pair of users, and the y-axis represents the total number of user pairs. The blue bars represent the user pairs that are friends, and the red bars represent the user pairs that are not friends. Notice that the values for X1 to X3 are trimmed from the graph, as they are very large compared to the other values. In addition, a correlation test was applied to two variables for each pair of users: (1) whether the pair are friends (1) or not friends (0), and (2) the number of co-visits between the pair. Both the Pearson and Spearman correlation methods were applied (using the R statistical analysis tool); the correlation coefficients were 0.1113512 and 0.08696049, respectively, and the 95% confidence interval for the Pearson coefficient is [0.1100980, 0.1126042]. This means that there is a positive correlation between being friends and visiting the same places, but the correlation is weak.
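The analysis above was done in R; for readers who want to see what the two coefficients compute, here is a minimal pure-Python sketch. The pair data is a made-up toy example, not the Foursquare data, and the helper names are hypothetical. Spearman is computed as Pearson on ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): one entry per user pair.
is_friend = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = friends, 0 = not friends
covisits = [3, 1, 0, 0, 1, 0, 0, 2]   # number of co-visits for the pair
print(pearson(is_friend, covisits), spearman(is_friend, covisits))
```

In practice a library routine is the better choice, for example `cor.test` in R or `scipy.stats.pearsonr` and `scipy.stats.spearmanr` in Python, since these also report significance.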
The analysis showed that there is a small positive correlation between being friends and visiting the same spatial locations. I believe the results could be extended to answer a more general question: "Are friends on online social networking websites also friends in real life (i.e., do they hang out together in real life)?" Even though this question has been raised before in many articles (Kari Henley 2009, T. J. Borchard 2011, etc.), I still believe that answering it using spatial co-visits could be a good future research direction.