Reading graph files
We support many formats for reading in graph files. Currently, the only restriction is that vertices must be numbered 1, 2, ..., <nvertices>.
Files should be named file0, file1, and so on. GraphMat will try to read one file per rank at a time until there are no more files left. For mtx files, there can be more or fewer files than MPI ranks. Given a file prefix, GraphMat keeps trying to read fileprefix0, fileprefix1, and so on until a file read fails.
If the files carry header information, every file must have a header. The header in each file should be "local", i.e. it should refer to the number of edges in that file only.
We can read text mtx files as long as they do not contain any comment lines.
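The prefix-based naming rule above can be illustrated with a small standalone sketch. This is not GraphMat's loader: find_partition_files is a hypothetical helper, and the only behavior it mirrors is "try prefix0, prefix1, ... until a file cannot be opened".

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of GraphMat): collects the names
// prefix0, prefix1, ... that can be opened, stopping at the first
// index whose file is missing.
std::vector<std::string> find_partition_files(const std::string& prefix) {
    assert(!prefix.empty());
    std::vector<std::string> names;
    for (int i = 0;; ++i) {
        const std::string name = prefix + std::to_string(i);
        std::ifstream in(name);
        if (!in.is_open()) break;  // first missing index ends the scan
        names.push_back(name);
    }
    return names;
}
```

Under this rule a gap in the numbering hides every later file: with edgefile0, edgefile1 and edgefile3 on disk, the scan stops at the missing edgefile2, so it is worth checking that the files are numbered consecutively from 0.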
- To read in text format with no header:
Say file "edgefile0" has the following lines
1 2 1
2 3 2
This edge list has two edges: 1 -> 2 with weight 1 and 2 -> 3 with weight 2. Load it as follows:
#include <algorithm> // for std::max
typedef int edge_type;
GraphMat::edgelist_t<edge_type> E;
// The three boolean parameters are: binaryformat, header, edgeweights_present.
GraphMat::load_edgelist("edgefile", &E, false, false, true);
// Without headers, the edge list may be interpreted as a "rectangular" matrix.
// Fix this by making E "square": set both dimensions to the max of the two.
E.m = std::max(E.m, E.n);
E.n = E.m;
GraphMat::Graph<vertex_property, edge_type> G; // vertex_property is your own vertex data type
G.ReadEdgelist(E);
E.clear();
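To see why the "square" fix can be needed, here is a minimal standalone sketch. It assumes that, without a header, dimensions are inferred from the largest source and destination ids seen; that inference rule is an assumption made for illustration, and Edge, Dims, infer_dims and make_square are hypothetical names, not GraphMat types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Edge { int src; int dst; int weight; };
struct Dims { int m; int n; };  // m rows (sources), n columns (destinations)

// Assumed inference rule: take the largest src and dst ids as dimensions.
Dims infer_dims(const std::vector<Edge>& edges) {
    Dims d{0, 0};
    for (const Edge& e : edges) {
        assert(e.src >= 1 && e.dst >= 1);  // vertex ids start at 1
        d.m = std::max(d.m, e.src);
        d.n = std::max(d.n, e.dst);
    }
    return d;
}

// Same fix as in the snippet above: both dimensions become the larger one.
Dims make_square(Dims d) {
    const int side = std::max(d.m, d.n);
    return Dims{side, side};
}
```

For the two edges in edgefile0, the largest source id is 2 and the largest destination id is 3, so the inferred shape is 2 x 3; make_square turns it into 3 x 3.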
- To read in text format with a <nvertices nvertices nedges> header:
Say file "edgefile0" has the following lines
3 3 2
1 2 1
2 3 2
Change the load call to GraphMat::load_edgelist("edgefile", &E, false, true, true); so that the header argument is true.
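If you generate the text files yourself, each file needs its own "local" header. The sketch below writes an edge list into several files named prefix0, prefix1, ..., each starting with a <nvertices nvertices nedges> line that counts only the edges in that file. write_partitions is a hypothetical helper, and the round-robin split is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct Edge { int src; int dst; int weight; };

// Hypothetical helper: splits edges round-robin across nfiles files and
// writes a local header (nvertices nvertices local_edge_count) in each.
void write_partitions(const std::string& prefix, int nvertices,
                      const std::vector<Edge>& edges, int nfiles) {
    assert(nfiles > 0);
    const std::size_t k = static_cast<std::size_t>(nfiles);
    std::vector<std::vector<Edge>> parts(k);
    for (std::size_t i = 0; i < edges.size(); ++i) {
        parts[i % k].push_back(edges[i]);
    }
    for (std::size_t f = 0; f < k; ++f) {
        std::ofstream out(prefix + std::to_string(f));
        // The edge count refers to this file only, not to the whole graph.
        out << nvertices << ' ' << nvertices << ' ' << parts[f].size() << '\n';
        for (const Edge& e : parts[f]) {
            out << e.src << ' ' << e.dst << ' ' << e.weight << '\n';
        }
    }
}
```

Note that the vertex count in every header is the global one, while the edge count differs from file to file.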
- To read in text format with no header and no edge weights:
Say file "edgefile0" has the following lines
1 2
2 3
Change the load call to GraphMat::load_edgelist("edgefile", &E, false, false, false); so that edgeweights_present is false. By default, all the edges are given a weight of 1.
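The effect of the edgeweights_present flag on a single text line can be sketched as follows. This is a standalone illustration rather than GraphMat's parser; parse_edge_line and Edge are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Edge { int src; int dst; int weight; };

// Hypothetical helper: parses "src dst" or "src dst weight" from one line.
// Returns false if the expected columns are missing or not integers.
bool parse_edge_line(const std::string& line, bool weights_present, Edge& out) {
    assert(line.find('\n') == std::string::npos);  // expects a single line
    std::istringstream in(line);
    if (!(in >> out.src >> out.dst)) return false;
    if (weights_present) {
        if (!(in >> out.weight)) return false;  // third column is required
    } else {
        out.weight = 1;  // default weight when the file has no weight column
    }
    return true;
}
```

With weights_present set to false, the line "1 2" yields an edge 1 -> 2 with weight 1; with it set to true, the same line is rejected because the weight column is missing.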
To read binary mtx files, use GraphMat::load_edgelist("edgefile", &E, true, XXX, XXX); and replace XXX with the appropriate values for the header and edgeweights_present boolean arguments. In general, binary mtx files can be read much faster than text files.
If your binary mtx file has a header and edge weights, we also provide a function to create a graph directly from that file (named edgelist0, edgelist1, ...):
typedef int edge_type; //if the edge type is int
GraphMat::Graph<vertex_property, edge_type> G;
G.ReadMTX("edgelist");
You can create the binary mtx files from the text edge list using the graph_converter utility (see below).
The GraphMat binary format is an experimental file format for reading and writing GraphMat compatible graph files. This format is extremely fast to load. However, there are limitations: the file can be loaded only with the same number of MPI ranks and OpenMP threads as were used to save it. This may be useful in scenarios where the runtime system configuration is not expected to change between runs.
This file can be loaded in GraphMat code as follows:
typedef int edge_type;
GraphMat::Graph<vertex_property, edge_type> G;
G.ReadGraphMatBin("edgefile");
You can create the GraphMat binary files from the text/binary mtx files using the graph_converter utility (see below).
You can convert between GraphMat compatible file formats using the graph_converter utility.
To convert from a text file with 3 whitespace-separated columns (src, dst, edge_value) with no header and integer weights to binary mtx format, do
mpirun -np <NRANKS> bin/graph_converter --selfloops 1 --duplicatededges 1 --inputformat 1 --outputformat 0 --inputheader 0 --outputheader 1 --nvertices <nvertices> <input text file prefix> <output graphmat file prefix>
This command reads the input files (inputfile0, inputfile1, inputfile2 .. inputfile<n>) in text format and converts them into files (outfile0, outfile1, outfile2 ... outfile<nranks>) in the required format.
You can remove self-loops and duplicate edges (when multiple edges with the same src and dst are found, only one is retained) by changing the values of --selfloops and --duplicatededges in the command line from 1 to 0.
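What removing self-loops and duplicate edges means for an edge list can be sketched in a few lines of standalone code. This is not the graph_converter implementation: clean_edges is a hypothetical helper, and keeping the first occurrence of a duplicated edge is an assumption, since the text above only says that one copy is retained.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Edge { int src; int dst; int weight; };

// Hypothetical helper: the two flags play the role of the 1/0 values
// passed to --selfloops and --duplicatededges (1 keeps, 0 removes).
std::vector<Edge> clean_edges(std::vector<Edge> edges,
                              bool keep_selfloops, bool keep_duplicates) {
    auto same_endpoints = [](const Edge& a, const Edge& b) {
        return a.src == b.src && a.dst == b.dst;
    };
    if (!keep_selfloops) {
        edges.erase(std::remove_if(edges.begin(), edges.end(),
                                   [](const Edge& e) { return e.src == e.dst; }),
                    edges.end());
    }
    if (!keep_duplicates) {
        // Stable sort by (src, dst) so the first copy in input order survives.
        std::stable_sort(edges.begin(), edges.end(),
                         [](const Edge& a, const Edge& b) {
                             return a.src != b.src ? a.src < b.src : a.dst < b.dst;
                         });
        edges.erase(std::unique(edges.begin(), edges.end(), same_endpoints),
                    edges.end());
        // Sanity check: no two neighbouring edges share both endpoints.
        assert(std::adjacent_find(edges.begin(), edges.end(), same_endpoints) ==
               edges.end());
    }
    return edges;
}
```

Note that deduplication in this sketch reorders the edges by (src, dst), and the weights of the dropped copies are discarded.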
Run ./bin/graph_converter to get a list of the available options and transformations. This tool is useful for generating bidirectional edges, adding random edge weights, converting between file formats, etc.