Reading graph files

Narayanan edited this page Apr 20, 2017 · 4 revisions

GraphMat File I/O

File reading

GraphMat supports several formats for reading graph files. Currently, the only restriction is that vertices must be numbered consecutively from 1 to <nvertices>.
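If your data uses 0-based vertex IDs, a small preprocessing step along these lines could shift them into the 1-based range before writing the edge files. This is only a sketch: the `Edge` struct and both helper functions are hypothetical names for illustration and are not part of GraphMat.

```cpp
#include <vector>

// Hypothetical edge record for this sketch; not a GraphMat type.
struct Edge { long src; long dst; int weight; };

// Shift 0-based vertex IDs to 1-based IDs.
std::vector<Edge> to_one_based(std::vector<Edge> edges) {
    for (Edge& e : edges) {
        e.src += 1;
        e.dst += 1;
    }
    return edges;
}

// Return true if every endpoint lies in the range [1, nvertices].
bool ids_in_range(const std::vector<Edge>& edges, long nvertices) {
    for (const Edge& e : edges) {
        if (e.src < 1 || e.src > nvertices || e.dst < 1 || e.dst > nvertices) {
            return false;
        }
    }
    return true;
}
```

Running `ids_in_range` on the shifted list before writing the files is a cheap way to catch numbering mistakes early.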

File naming and distributed I/O

Files should share a common prefix followed by a number starting at 0: file0, file1, and so on. GraphMat reads one file per rank at a time until no files are left. For mtx files, there can be more or fewer files than MPI ranks. Given a file prefix, GraphMat keeps trying to read fileprefix0, fileprefix1, and so on until a file read fails.
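The naming rule can be illustrated with a short standalone sketch that probes prefix0, prefix1, ... and stops at the first file that cannot be opened. The function name `find_numbered_files` is hypothetical, and this is not GraphMat's actual loader, which also distributes the files across MPI ranks.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Collect prefix0, prefix1, ... and stop at the first name that cannot be opened.
std::vector<std::string> find_numbered_files(const std::string& prefix) {
    std::vector<std::string> found;
    for (int i = 0;; ++i) {
        const std::string name = prefix + std::to_string(i);
        std::ifstream in(name);
        if (!in.is_open()) {
            break;  // first missing file ends the scan
        }
        found.push_back(name);
    }
    return found;
}
```

One consequence of a scan like this is that a gap in the numbering (for example, edgefile0, edgefile1, edgefile3) hides every file after the gap, so it is worth checking that the file numbers are contiguous.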

If the files have headers, every file must have one. The header in each file should be "local", i.e. it should give the number of edges in that file only.
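As an illustration of "local" headers, the following sketch splits one edge list into several text files and gives each file a header with its own edge count. It assumes the `<nvertices nvertices nedges>` header layout and further assumes that the two vertex-count fields stay global while only the edge count is per file. The `Edge` struct and `split_with_local_headers` are hypothetical names, not GraphMat APIs.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical edge record for this sketch; not a GraphMat type.
struct Edge { long src; long dst; int weight; };

// Build the text of `nfiles` edge files, dealing edges out round-robin.
// Each file starts with "<nvertices> <nvertices> <number of edges in this file>".
std::vector<std::string> split_with_local_headers(const std::vector<Edge>& edges,
                                                  long nvertices,
                                                  std::size_t nfiles) {
    std::vector<std::string> files;
    if (nfiles == 0) {
        return files;
    }
    std::vector<std::vector<Edge>> parts(nfiles);
    for (std::size_t i = 0; i < edges.size(); ++i) {
        parts[i % nfiles].push_back(edges[i]);
    }
    for (const std::vector<Edge>& part : parts) {
        std::ostringstream text;
        // Assumption: vertex counts are global, the edge count is local.
        text << nvertices << ' ' << nvertices << ' ' << part.size() << '\n';
        for (const Edge& e : part) {
            text << e.src << ' ' << e.dst << ' ' << e.weight << '\n';
        }
        files.push_back(text.str());
    }
    return files;
}
```

The i-th returned string would be written to the file named prefix followed by i.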

Reading text mtx files

GraphMat can read text mtx files that contain no comment lines.

  • To read in text format with no header

    Say file "edgefile0" has the following lines

1 2 1
2 3 2

This edge list has two edges: 1 -> 2 with weight 1 and 2 -> 3 with weight 2. Load it as follows:

    typedef int edge_type;
    
    GraphMat::edgelist_t<edge_type> E;
    // The three boolean parameters are: binaryformat, header, edgeweights_present.
    GraphMat::load_edgelist("edgefile", &E, false, false, true);
    // Without a header, the edgelist may be interpreted as a "rectangular" matrix.
    // Make it "square" by setting both dimensions to the larger of the two.
    E.m = std::max(E.m, E.n);
    E.n = E.m;
    // vertex_property is the user-defined vertex data type (not defined in this snippet).
    GraphMat::Graph<vertex_property, edge_type> G;
    G.ReadEdgelist(E);
    E.clear();
  • To read in text format with a <nvertices nvertices nedges> header

    Say file "edgefile0" has the following lines

3 3 2
1 2 1
2 3 2

Change the load call to GraphMat::load_edgelist("edgefile", &E, false, true, true); so that the header argument is true.

  • To read in text format with no header and no edge weights

    Say file "edgefile0" has the following lines

1 2
2 3

Change the load call to GraphMat::load_edgelist("edgefile", &E, false, false, false); so that both header and edgeweights_present are false. By default, every edge is given a weight of 1.
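To make the effect of the header and edgeweights_present flags concrete, here is a small standalone parser sketch for the three text layouts above. It is not GraphMat's parser; `Edge`, `ParsedEdges` and `parse_text_edges` are hypothetical names. The way dimensions are inferred without a header (largest source ID as rows, largest destination ID as columns) is an assumption, shown only to illustrate how a headerless edge list can end up "rectangular".

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical edge record for this sketch; not a GraphMat type.
struct Edge { long src; long dst; int weight; };

struct ParsedEdges {
    long m = 0;                // rows: from the header, else largest src seen
    long n = 0;                // columns: from the header, else largest dst seen
    long nedges_declared = 0;  // edge count from the header, 0 if no header
    std::vector<Edge> edges;
};

ParsedEdges parse_text_edges(const std::string& text, bool header,
                             bool edgeweights_present) {
    ParsedEdges out;
    std::istringstream in(text);
    if (header) {
        in >> out.m >> out.n >> out.nedges_declared;
    }
    Edge e{0, 0, 1};
    while (in >> e.src >> e.dst) {
        e.weight = 1;  // default weight when the file has no weight column
        if (edgeweights_present && !(in >> e.weight)) {
            break;  // malformed line: weight column missing
        }
        if (!header) {
            // Assumed inference rule; see the note above.
            out.m = std::max(out.m, e.src);
            out.n = std::max(out.n, e.dst);
        }
        out.edges.push_back(e);
    }
    return out;
}
```

For the first example file (edges 1 -> 2 and 2 -> 3, no header), this sketch infers m = 2 and n = 3, which is why taking the maximum of the two dimensions is needed to get a square 3 x 3 matrix.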

Reading binary mtx files

Use GraphMat::load_edgelist("edgefile", &E, true, XXX, XXX); and replace each XXX with the appropriate value for the header and edgeweights_present boolean arguments. Binary mtx files generally load much faster than text files.

If your binary mtx file has a header and edge weights, GraphMat also provides a function that creates a graph directly from that file (named edgelist0, edgelist1, ...):

    typedef int edge_type; //if the edge type is int
    
    GraphMat::Graph<vertex_property, edge_type> G;
    G.ReadMTX("edgelist");

You can create the binary mtx files from the text edge list using the graph_converter utility (see below).

Reading GraphMat binary files (experimental)

This is an experimental file format for reading and writing GraphMat-compatible graph files. The format is extremely fast to load, but it has a limitation: a file can be loaded only with the same number of MPI ranks and OpenMP threads as were used to save it. It may be useful when the runtime system configuration is not expected to change between runs.

This file can be loaded in GraphMat code as follows:

    typedef int edge_type;
    
    GraphMat::Graph<vertex_property, edge_type> G;
    G.ReadGraphMatBin("edgefile");

You can create the GraphMat binary files from the text/binary mtx files using the graph_converter utility (see below).

Using the graph converter utility to convert between file formats

You can convert between GraphMat compatible file formats using the graph_converter utility.

To convert from a text file with 3 whitespace-separated columns (src, dst, edge_value), no header, and integer weights to binary mtx format, run

    mpirun -np <NRANKS> bin/graph_converter --selfloops 1 --duplicatededges 1 --inputformat 1 --outputformat 0 --inputheader 0 --outputheader 1 --nvertices <nvertices> <input text file prefix> <output graphmat file prefix>

This command reads the input files (inputfile0, inputfile1, inputfile2 .. inputfile<n>) in text format and converts them into files (outfile0, outfile1, outfile2 ... outfile<nranks>) in the required format.

You can remove self-loops and duplicate edges (when multiple edges with the same src and dst are found, only one is retained) by changing the values of --selfloops and --duplicatededges on the command line from 1 to 0.
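The two clean-up transformations can be pictured with the following standalone sketch. It is not the converter's implementation: `Edge` and `clean_edges` are hypothetical names, and the choice to keep the first occurrence of each duplicate is an assumption, since the text above only says that one edge is retained.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical edge record for this sketch; not a GraphMat type.
struct Edge { long src; long dst; int weight; };

// Remove self-loops, then keep one edge per (src, dst) pair.
// The result is sorted by (src, dst), so the input order is not preserved.
std::vector<Edge> clean_edges(std::vector<Edge> edges) {
    // Drop edges whose source and destination are the same vertex.
    edges.erase(std::remove_if(edges.begin(), edges.end(),
                               [](const Edge& e) { return e.src == e.dst; }),
                edges.end());
    // A stable sort keeps duplicates in input order, so the first one survives below.
    std::stable_sort(edges.begin(), edges.end(),
                     [](const Edge& a, const Edge& b) {
                         return a.src != b.src ? a.src < b.src : a.dst < b.dst;
                     });
    // std::unique keeps the first edge of each run of equal (src, dst) pairs.
    edges.erase(std::unique(edges.begin(), edges.end(),
                            [](const Edge& a, const Edge& b) {
                                return a.src == b.src && a.dst == b.dst;
                            }),
                edges.end());
    return edges;
}
```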

Run ./bin/graph_converter to get a list of the available options and transformations. This tool is useful for generating bidirectional edges, adding random edge weights, converting between file formats, etc.