RoaringBitmap/real-roaring-datasets

Name: real-roaring-datasets

Owner: Roaring bitmaps: A better compressed bitset

Description: for benchmarking other implementations, just the datasets from https://github.com/RoaringBitmap/RoaringBitmap/tree/master/real-roaring-dataset/src/main/resources/real-roaring-dataset

Created: 2016-12-19 22:18:24.0

Updated: 2018-05-12 09:42:16.0

Pushed: 2016-12-19 22:30:22.0

Size: 45219

README

Real data sets for bitmap testing

See also https://github.com/RoaringBitmap/CRoaring/tree/master/benchmarks/realdata for uncompressed .txt versions.

Essentially, each file represents a set of integer values. You can create bitmaps out of these files.
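As a minimal sketch of turning one such set of integers into a bitmap, the snippet below parses text into a `java.util.BitSet`. It assumes the integers are non-negative decimal values separated by commas, which is an assumption about the file layout rather than something this README specifies; the class and method names are hypothetical.

```java
import java.util.BitSet;

final class IntegerSetParser {
    // Parses text such as "1,5,9" into a BitSet with exactly those bits set.
    // ASSUMPTION: values are comma-separated, non-negative decimal integers.
    static BitSet parse(String text) {
        BitSet bits = new BitSet();
        for (String token : text.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                // BitSet.set throws IndexOutOfBoundsException for negative values.
                bits.set(Integer.parseInt(trimmed));
            }
        }
        return bits;
    }
}
```

A plain `BitSet` allocates memory up to the largest value it holds, so for sparse sets with large values a compressed bitmap is likely the better target; the same loop applies with a different container.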

In many cases, the description of the data sets is provided in :

To be used with software published on http://roaringbitmap.org/

Files starting with the prefix “dimension” were prepared by Xavier Léauté from a Druid dump.


Some bitsets from a real-world example, stored in a form that can be deserialized into BitSets.

To deserialize a file, first read an int, which is the number of rows to come (e.g. 1925630 rows). Then read each row by first reading an int, the number of longs to come (e.g. 96 longs), and then reading that many longs. The values use the binary encoding read by Java's DataInputStream (big-endian ints and longs).
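The layout described above (a row count, then for each row a long count followed by the longs) could be read back roughly as follows. This is a sketch, not the code used to produce the files: the class and method names are hypothetical, and it assumes the longs of each row are in the word order that `BitSet.valueOf(long[])` expects (word 0 holds bits 0 to 63).

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BitSetFileReader {
    // Reads: int rowCount, then per row: int longCount followed by longCount longs.
    // ASSUMPTION: each row's longs follow the BitSet.toLongArray() word order.
    static List<BitSet> readBitSets(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        int rowCount = data.readInt();
        List<BitSet> rows = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            int longCount = data.readInt();
            long[] words = new long[longCount];
            for (int i = 0; i < longCount; i++) {
                words[i] = data.readLong();
            }
            rows.add(BitSet.valueOf(words));
        }
        return rows;
    }
}
```

Calling `readBitSets` on a `FileInputStream` opened over one of the files would return one `BitSet` per row. With millions of rows it may be preferable to process each row as it is read instead of collecting them all in a list.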

See also BitSetUtilBenchmark.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.