Introduction

Community-detection is a powerful approach to uncover important structures in large networks. Since networks often describe flow of some entity, flow-based community-detection methods are particularly interesting. One such algorithm is called Infomap, which optimizes the objective function known as the map equation. While Infomap is known to be an effective algorithm, its serial implementation cannot take advantage of multicore processing in modern computers.

Thus, we propose a novel parallel generalization of Infomap called RelaxMap. This algorithm relaxes concurrency assumptions to avoid lock overhead, achieving high parallel efficiency in shared-memory multicore experiments while exhibiting similar convergence properties and finding similar community structures as the serial algorithm.

In addition to parallelization, we have worked on more efficient searching mechanism for the flow-based community detection algorithms. The new searching mechanism examines only subset of vertices, which selected based on the neighbors' status change during the previous iteration, for finding better outputs in each iteration, called prioritized search. We achieve more efficient search time, up to about 1.5 times faster, for the flow-based community detection without loss of output quality.

For the details of the RelaxMap algorithm, please refer to our paper appeared in the Proceedings of 2013 ICDM Workshops.

Source Code

You can find the source code of the RelaxMap from its Github repository.

Publications

Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, Bill Howe, "Scalable Flow-Based Community Detection for Large-Scale Network Analysis," in Proceedings of IEEE International Conference on Data Mining Workshops (ICDMW 2013), Dallas, Texas, Dec. 2013.

Dependency

RelaxMap is implemented in C++ and uses OpenMP for shared-memory parallelism, so your C++ compiler should contain OpenMP library in order to compile and run in parallel.

If your C++ compiler contains OpenMP library, you can compile this application by simple 'make' command.

User Guide

How to run RelaxMap

[Usage]: $ ./ompRelaxmap [seed] [network data] [# threads] [# attempts] [threshold] [vThresh] [maxIter] [outDir] [prior/normal]

example: $ ./ompRelaxmap 1 data/flow.txt 4 1 1e-3 0.0 10 outputs/ prior

The Required Arguments:

args[0]: seed - random seed value for generating random sequential order of vertices for each iteration.
args[1]: network data - the input graph data.
       RelaxMap supports 1) pajek format (.net) and 2) edge list format (.txt).
args[2]: # thread - the number of threads
args[3]: # attempts - the number of attempts.
       (this is not applied yet, so it only return with 1 attempt.)
args[4]: threshold - the stop condition threshold (recommended 1e-3 or 1e-4)
args[5]: vThresh - the threshold value for each vertex movement (recommended 0.0)
args[6]: maxIter - the number of maximum iteration for each super-step.
args[7]: outDir - the directory where the output files will be located.
args[8]: prior/normal flag - apply the prioritized search for efficient runs (prior) or not (normal).

Output Files

  • .clu file:
    This file will represent the assigned module (community) ID for each vertex in the ascending order of vertex ID.

    The .clu file looks like below:
    *Vertices 16
    1
    1
    1
    1
    7
    7
    7
    7
    12
    12
    12
    12
    13
    13
    13
    13
    example .clu file: flow.clu

  • .tree file:
    This file will show members (vertices) of each module with the names of members and corresponding pageranks. At first line, it will provide the MDL value of this mapping and the number of modules.

    The .tree file looks like below:
    # Code length 3.32773 in 4 modules.
    1:1 0.0820133 "1"
    1:2 0.0790863 "2"
    1:3 0.0429867 "3"
    1:4 0.0459137 "4"
    2:1 0.0820133 "5"
    2:2 0.0790863 "6"
    2:3 0.0429867 "7"
    2:4 0.0459137 "8"
    3:1 0.0820133 "9"
    3:2 0.0790863 "10"
    3:3 0.0429867 "11"
    3:4 0.0459137 "12"
    4:1 0.0820133 "13"
    4:2 0.0790863 "14"
    4:3 0.0429867 "15"
    4:4 0.0459137 "16"
    example .tree file: flow.tree
  • Related Resources

    www.mapequation.org

    The Map Equation is the objective function for searching communities with respect to the network flows. In this website, you can find a community detection algorithm, called Infomap, based on the map equation, as well as various useful information related to the map equation and the Infomap.

    SNAP Network Data Collection From this network data repository, you can find a bunch of realworld network datasets with various sizes.

    Contact Us

    If you have any question or comment with respect to RelaxMap project, please contact us:

    Seung-Hee Bae:
    Bill Howe: