# Finding Popular Places using Trajectories

Recently, I have read the following paper: finding popular places published in the 18th international conference on Algorithms and computation (ISAAC 2007). I have presented the paper with Abhigna adusumilli and Rahul Saladi in the advanced computational geometry seminar. The paper is quite interesting as the ubiquity of mobile devices (e.g., Smart Phones, GPS) made it easy to track humans (objects) movement. Storing users (objects) visits to different places can be leveraged to come up with new insights on how users move from one place to another. In the paper, the authors propose techniques to find the popular places by analyzing a set of given trajectories for n different entities and \tau time steps. A popular place is the one that is visited by at least K different entities. To this end, the users defines two different models to addressing the problem (1) Discrete Model and (2) Continuous Model, as follows:

Discrete Model: A popular place is defined as the unit square that is visited by k different entities. An entity i visits a unit square \sigma if at some point t \in \tau, i is within the spatial extents of \sigma.

Continuos Model: A popular place is defined as the unit square that is intersected by the trajectory of k different entities.

The problem of finding the most popular place in the discrete model is defined as follows:
Input: A set of n entities moving the 2D space for \tau time steps, forming a set of \taun points in space.
Output: The axis aligned unit square\sigma that is visited by the highest number of entities.

Finding the most popular place in the discrete model can be solved by employing a colored range counting algorithm. That can be achieved by exhaustively applying the colored range counting query of each unit square around each point in \taun points. That results into 2-approximation algorithm to solve the problem, which requires O(\tau^2n^2log^2\tau n) time and space.

The authors proposes an initial algorithm to solve the problem which relies of sweeping a vertical strip over the given \tau n points. A start event happens when a point p, belonging to entity \Lambda, enters the sweep strip. An end event happens when a point p, belonging to entity \Lambda, is about to leave the sweep strip. For each event, we check the rectangle R_p of width 1 and height 2, in which point p represents the midpoint of the right side of R_p. The algorithm sweeps a unit square over R_p while keeping of the number of entities in the sweep square to get the most popular unit square. The algorithm requires O(\tau n log \tau n) time per event as the number of points in R_p is O(\tau n). That means that the naive algorithm requires O(\tau^2 n^2 log \tau n) time to find the most popular unit square.

The authors presented a new technique to get the most popular place in the discrete model. They build a set of y-intervals I_\Lambda of length one for each entity \Lambda. Each interval I represents a y-interval for a point p (x_p, y_p) such that I is equal to [y_p-1/2, y_p+1/2]. The most popular unit square is the one for which the center belongs to as many Intervals as possible (each interval belongs to a different entity).

To this end, the proposed technique relies on the following data structure to maintain the set of intervals for each event (point) during the sweep: (1) B_\Lambda. (2) T_\Lambda, described as follows:

B_\Lambda: It is a balanced binary search tree storing all points in entity \Lambda ordered with respect to their y-coordinates. B_\Lambda can be queried and maintained in O(log\tau) time and space using Red-Black Trees such that \tau is the number of points in entity \Lambda.

T_\Lambda: It is a tree that stores all intervals I_\Lambda. To this end, it stored all endpoints of all intervals in I_\Lambda. The authors proved that maintaining T_\Lambda requires O(log \tau) time and O(\tau) space.

Throughout the sweep, the set of all B_\Lambda trees \cal{B}^{ent} and the set of T_\Lambda trees \cal{T}^{ent} can be maintained in O(\tau n log \tau) time and O(\tau n) space. Then, the authors explained how a single data structure T_{int} can be maintained for all entities during the sweep. T_{int} is a tree structure that stores all the intervals belonging to all entities. T_{int} stores the start and end points in \cal{I} such that \cal{I}=\cup I_\Lambda.
The main algorithm finds the leaf node in T_{int} that stabs the highest number of intervals. As the intervals for a single entity are disjoint, the algorithm guarantees that the intervals stabbed by a leaf node belong to different entities. The authors proved that finding the most popular place using T_{int} can be reported in O(\tau nlog\tau n) time using O(\tau n) space.

This page contains a single entry by elsay004 published on May 11, 2012 8:46 PM.

Spatial Location matters in Recommender Systems was the previous entry in this blog.

A quick look inside Sindbad is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.