A synthetic data generator for online social network graphs

9/10/2023

Real and Synthetic Data for the Static Graph Challenge: Subgraph Isomorphism

Real-world graphs from Stanford's Large Network Dataset Collection (SNAP), as well as synthetic data at various scales generated using the scalable Graph500 Kronecker generator, are being provided. Each of the SNAP datasets is provided in both TSV (Tab-Separated Values) and MMIO (Matrix Market I/O) formats.

A CSV file with metadata about the SNAP datasets is available (SNAP Metadata). The metadata includes the number of edges, nodes and triangles. You can access any desired file directly by crafting an HTTPS or AWS CLI URL from the URL suffixes and instructions provided with the data sets.
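The three metadata fields (nodes, edges, triangles) can be recomputed from a TSV edge list as a sanity check. The sketch below is only an illustration: it assumes each row holds two integer node IDs followed by an optional weight, treats the graph as undirected, and uses a hypothetical helper name `graph_stats`. The real files may differ in layout.

```python
import csv
import io
from collections import defaultdict


def graph_stats(tsv_text):
    """Return (nodes, edges, triangles) for an undirected TSV edge list."""
    adj = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        u, v = int(row[0]), int(row[1])
        if u == v:
            continue  # ignore self-loops
        # Store both directions so duplicate or reversed rows collapse.
        adj[u].add(v)
        adj[v].add(u)

    nodes = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2

    # Count each triangle {u, v, w} once by requiring u < v < w.
    triangles = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if v > u:
                triangles += sum(1 for w in nbrs & adj[v] if w > v)
    return nodes, edges, triangles


# Assumed row layout: source <TAB> target <TAB> weight
sample = "1\t2\t1\n2\t1\t1\n2\t3\t1\n3\t1\t1\n3\t4\t1\n"
print(graph_stats(sample))
```

Comparing these numbers against the metadata file is a quick way to confirm that a download is complete and was parsed with the right delimiter.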

Amazon is making the Graph Challenge data sets available to the community free of charge as part of the AWS Public Data Sets program. The data is being presented in several file formats, and there are a variety of ways to access it. Data is available in the 'graphchallenge' Amazon S3 bucket.

Synthetic Sparse Deep Neural Network data for the Sparse DNN Graph Challenge

The official 2019 Sparse Deep Neural Network Challenge uses synthetic DNNs created using RadiX-Net with varying numbers of neurons and layers. Truth categories for MNIST are included for performing inference using DNNs with specific numbers of layers. The networks come in four sizes: 1024 neurons per layer (small, 176 MB), 4096 neurons per layer (medium, 800 MB), 16384 neurons per layer (large, 3.6 GB) and 65536 neurons per layer (very large, 16.3 GB). Sparse DNNs are generated using interpolated sparse versions of images in the MNIST corpus, resized to produce neural networks of varying dimensions.

This graph generator creates a synthetic social graph, containing non-uniform value distributions and structural correlations, which is intended as test data for scalable graph analysis algorithms.

Technical details for Synthetic Data Generator

We developed and open-sourced Synthetic Data Showcase, an automated pipeline for generating both synthetic and aggregate datasets that conserve the utility of the original, along with dashboards for visualizing and exploring these derived datasets.

Synthetic datasets are produced using our concept of, and algorithm for, k-synthetic anonymity. The algorithm constructs synthetic records whose attribute combination values appear at least a pre-determined number of times, k, in the original, sensitive dataset. Attribute combinations that do not meet this privacy resolution aren't disclosed, to prevent singling out individual data subjects or linking small groups of subjects to known individuals in the real world.

The synthetic data is complemented with precomputed aggregate data for reportable, short attribute combinations that appear in the sensitive dataset. We enable the selection of a privacy resolution k that provides both a minimum reporting threshold and rounding precision to prevent disclosing small counts that can pose privacy risks.

The synthetic and aggregate data are automatically loaded into a Power BI interface for interactive, privacy-preserving data exploration. Capable of being easily customized to meet specific visualization goals, these dashboards enable rich and code-free analysis independent of data science expertise.

Synthetic Data Showcase started as a project within our Tech Against Trafficking initiative, and we believe that its ability to improve the representation of at-risk groups can help us solve pressing societal problems and build a more resilient world. We are pleased to announce that Synthetic Data Showcase has been adopted by the UN International Organization for Migration (IOM).
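The reporting threshold and rounding precision that k provides for the aggregate data can be sketched as follows. This is a minimal illustration, not the Synthetic Data Showcase implementation: the function name `reportable_counts`, the `max_len` parameter and the choice to round counts down to a multiple of k are all assumptions made for the example, and the record-synthesis step is not shown.

```python
from collections import Counter
from itertools import combinations


def reportable_counts(records, k, max_len=2):
    """Count short attribute combinations and keep only those seen >= k times.

    Reported counts are rounded down to a multiple of k (an assumption made
    for this sketch), so k acts as both threshold and rounding precision.
    """
    counts = Counter()
    for record in records:
        # Sort by attribute name so each combination has one canonical form.
        items = sorted(record.items())
        for n in range(1, max_len + 1):
            for combo in combinations(items, n):
                counts[combo] += 1
    return {combo: (c // k) * k for combo, c in counts.items() if c >= k}


# Toy records with hypothetical attributes.
records = [
    {"age": "18-25", "region": "north"},
    {"age": "18-25", "region": "north"},
    {"age": "18-25", "region": "north"},
    {"age": "18-25", "region": "south"},
    {"age": "26-35", "region": "south"},
]

result = reportable_counts(records, k=3)
for combo, count in sorted(result.items()):
    print(combo, count)
```

With k = 3, the combination "region = south" (2 records) and "age = 26-35" (1 record) are withheld, and "age = 18-25" (4 records) is reported as 3, so no count below k and no exact small difference between counts is disclosed.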
