simpleweb/randumb

Name: randumb

Owner: Simpleweb

Description: Adds ability to pull back random records from Active Record

Created: 2014-02-21 16:04:24.0

Updated: 2014-02-21 18:03:44.0

Pushed: 2014-02-21 18:01:45.0

Homepage:

Size: 226

Language: Ruby

README

randumb


randumb is a Ruby gem that lets you easily pull random records from your database of choice.

Requires ActiveRecord >= 3.0.0 and supports SQLite, MySQL and Postgres/PostGIS.

Install

    # Add the following to your Gemfile
    gem 'randumb'

    # Update your bundle
    bundle install

Usage

    Artist.random     # a random Artist if there are any, otherwise nil
    Artist.random(3)  # an array of three Artists picked at random
    Artist.random(1)  # an array containing one random Artist

Scopes

    # randumb works like the active record "all, first, and last" methods
    Artist.has_views.includes(:albums).where(["created_at > ?", 2.days.ago]).random(10)

If only 5 records matched the conditions specified above, randumb will return an array with those 5 records in random order (as opposed to 10 records with duplicates).

How It Works

randumb simply tacks an additional `ORDER BY RANDOM()` (or `RAND()` for MySQL) onto your query.

If your scope includes other orders, the random order gets the lowest sort precedence.
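
The precedence rule can be sketched in plain Ruby with no database. `append_random_order` and its `adapter` argument are hypothetical names used only for illustration; they are not part of randumb's API, and the sketch only shows where the random term lands relative to existing orders.

```ruby
# Hypothetical helper (not part of randumb): builds an ORDER BY clause
# by placing the random function after any orders already on the scope.
def append_random_order(existing_orders, adapter: :postgresql)
  # MySQL spells the function RAND(); the README uses RANDOM() otherwise.
  random_fn = adapter == :mysql ? "RAND()" : "RANDOM()"
  "ORDER BY " + (existing_orders + [random_fn]).join(", ")
end

puts append_random_order([])
# ORDER BY RANDOM()
puts append_random_order(["view_count DESC"])
# ORDER BY view_count DESC, RANDOM()
puts append_random_order(["view_count DESC"], adapter: :mysql)
# ORDER BY view_count DESC, RAND()
```

Because the random term comes last, it only decides the order of rows that tie on every earlier order.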

Advanced Usage
Stacking the Deck

You can use the `random_weighted` method to favor certain records more than others.

For example, if you want to favor higher-rated Movies, and your Movie model has a numeric `score` column, you can do any of the following:

    Movie.random_weighted(:score)
    Movie.random_weighted_by_score
    # returns 1 random movie by:
    # select * from movies ORDER BY (score * RANDOM()) DESC

    Movie.random_weighted(:score, 10)
    Movie.random_weighted_by_score(10)
    # returns an array of up to 10 movies and executes:
    # select * from movies ORDER BY (score * RANDOM()) DESC LIMIT 10

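
To see why ordering by score times a random number favors higher scores, here is a rough in-memory simulation. It assumes the random function returns a float in [0, 1), as Postgres's `RANDOM()` and MySQL's `RAND()` do. `weighted_sample` and the sample rows are hypothetical stand-ins, not randumb code.

```ruby
# Stand-in rows; in the README these would be Movie records with a score column.
movies = [
  { title: "A", score: 1 },
  { title: "B", score: 5 },
  { title: "C", score: 10 }
]

# In-memory analogue of ORDER BY (score * RANDOM()) DESC LIMIT n.
def weighted_sample(rows, n, rng: Random.new)
  rows.sort_by { |row| -(row[:score] * rng.rand) }.first(n)
end

# Draw one movie many times and count how often each title comes out on top.
rng = Random.new(42)
wins = Hash.new(0)
10_000.times { wins[weighted_sample(movies, 1, rng: rng).first[:title]] += 1 }

puts wins.sort.to_h
```

Higher-scored rows win more often, but low-scored rows still have a chance of being picked.
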
Planting A Seed

If you wish to seed the randomness so that you can have predictable outcomes, provide an optional integer seed to any of randumb's method calls:

    # assuming no records have been added between calls
    # these will return the same 2 artists in the same order both times
    Artist.random(2, seed: 123)
    Artist.random(2, seed: 123)

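
How randumb hands the seed to each database is not covered here. The plain Ruby sketch below only illustrates the general idea that the same seed over the same data yields the same order. `seeded_sample` is a hypothetical helper, not randumb's API.

```ruby
artists = %w[Bowie Prince Bjork Nina Miles]

# Hypothetical in-memory analogue of Artist.random(n, seed: s):
# a generator built from the same seed produces the same shuffle.
def seeded_sample(rows, n, seed:)
  rows.shuffle(random: Random.new(seed)).first(n)
end

first_call  = seeded_sample(artists, 2, seed: 123)
second_call = seeded_sample(artists, 2, seed: 123)

puts first_call == second_call  # true: same seed, same data, same result
```
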
Pick Your Poison

The adventurous may wish to try randumb's earlier algorithm for random record selection: `random_by_id_shuffle`.

You cannot apply weighting when using this method and limits/orders also behave a little differently:

    # gimmie 5 random artists that are in the top 100 most viewed
    artists = Artist.limit(100).order("view_count DESC").random_by_id_shuffle(5)

    # executes:
    # select artist.id from artists ORDER BY view_count DESC LIMIT 100
    # in ruby:  artist_ids = ids.shuffle[0..4]
    # select * from artists WHERE id in (artist_ids)
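
The three steps listed above can be mimicked in memory as follows. This is only a sketch: `random_by_id_shuffle_sketch` and the `artists_table` array of hashes are hypothetical stand-ins for the real method and table.

```ruby
# Stand-in for the artists table: 500 rows with distinct view counts.
artists_table = (1..500).map { |id| { id: id, view_count: id * 3 } }

# Hypothetical in-memory version of the two queries plus the Ruby shuffle.
def random_by_id_shuffle_sketch(table, limit:, count:, rng: Random.new)
  # 1. select id ... ORDER BY view_count DESC LIMIT <limit>
  ids = table.sort_by { |row| -row[:view_count] }.first(limit).map { |row| row[:id] }
  # 2. in ruby: shuffle the ids and keep the first <count>
  picked = ids.shuffle(random: rng)[0...count]
  # 3. select * ... WHERE id IN (picked)
  table.select { |row| picked.include?(row[:id]) }
end

sample = random_by_id_shuffle_sketch(artists_table, limit: 100, count: 5)
puts sample.map { |row| row[:id] }.inspect
```

Here the limit and order apply to the id query first, so the 5 results are drawn from the top 100 rather than from the whole table.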

Compare this to the default `random()`, which will use the lesser of the limits you provide and apply `ORDER BY RANDOM()` sorting after any other orders you provide.

    # (belligerently) Gimme the top 5 artists and I'll pointlessly provide a limit of 100!
    # plus I want artists with the same view count to be sorted randomly!
    # this is clearly a silly thing to do...
    artists = Artist.limit(100).order("view_count DESC").random(5)

    # executes:
    # select * from artists ORDER BY view_count DESC, RANDOM() LIMIT 5

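
For contrast with the id shuffle, the default behavior described above can be approximated in memory like this. `random_sketch` and the sample rows are hypothetical; the sketch takes the lesser of the two limits and uses a random number only as a tie-breaker.

```ruby
# Stand-in rows with some tied view counts.
rows = [90, 90, 80, 70, 70, 70, 60, 50].each_with_index.map do |views, i|
  { id: i + 1, view_count: views }
end

# Hypothetical analogue of:
#   ORDER BY view_count DESC, RANDOM() LIMIT <lesser of the two limits>
def random_sketch(table, scope_limit:, count:, rng: Random.new)
  limit = [scope_limit, count].min
  table.sort_by { |row| [-row[:view_count], rng.rand] }.first(limit)
end

top = random_sketch(rows, scope_limit: 100, count: 5)
puts top.map { |row| row[:view_count] }.inspect
# [90, 90, 80, 70, 70]
```

The view counts always come back in the same descending order; only which of the tied rows appear, and in what order, varies between calls.
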
A Note on Performance

As stated above, by default, randumb uses the simple approach of applying an `ORDER BY RANDOM()` clause to your query. For many data sets, this performs well enough not to matter. However, as many blog posts and articles note, the database must generate a random number for each row matching the scope, which can result in rather slow queries for large result sets. The last time I tested randumb on a test data set with 1 million rows (with no scopes), it took over 2 seconds.

In earlier versions of randumb I tried to alleviate this by doing two db queries: one to select the possible IDs into an array, and a second with a randomly selected set of those IDs. This was sometimes faster on very large data sets; however, for most sizes I tested, it did not perform significantly better than `ORDER BY RAND()`, and it could run out of memory because it selects all the IDs into a Ruby array.

If you are noticing slow speeds on your random queries and you have a very large database table, my advice is to scope down your query to a subset of the table via an indexed scope. Ex: `Artist.where('views > 10').random`. This will result in fewer calls to `RAND()` and a faster query. You might also experiment with the old method by using `random_by_id_shuffle` and gauge the resulting speeds.

ActiveRecord Caching

By default, ActiveRecord keeps a cache of the queries executed during the current request. If you call random multiple times on the same model or scope, you will end up with the same SQL query again, which causes the cache to return the result of the last query. You will see the following in your log if this happens:

    Artist Load (0.3ms)  SELECT "artists".* FROM "artists" ORDER BY RANDOM() LIMIT 1
    CACHE (0.0ms)  SELECT "artists".* FROM "artists" ORDER BY RANDOM() LIMIT 1

Fortunately, there is an easy workaround: just wrap your query in a call to `uncached`, e.g. `Artist.uncached { Artist.random }`.
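
The toy model below shows why identical SQL returns an identical "random" result while a cache is active, and how bypassing it helps. `ToyConnection` is a hypothetical illustration keyed on the SQL string; it is not ActiveRecord's implementation.

```ruby
# Toy stand-in for a per-request query cache keyed by the SQL string.
class ToyConnection
  attr_reader :executions

  def initialize(rows)
    @rows = rows
    @cache = {}
    @cache_enabled = true
    @executions = 0
  end

  # Runs the block with the cache bypassed, then restores the old setting.
  def uncached
    previous, @cache_enabled = @cache_enabled, false
    yield
  ensure
    @cache_enabled = previous
  end

  # Returns the cached result for this SQL if caching is on, else runs it.
  def select_random(sql)
    return @cache[sql] ||= execute if @cache_enabled
    execute
  end

  private

  # Pretends to hit the database and pick a random row.
  def execute
    @executions += 1
    @rows.sample
  end
end

conn = ToyConnection.new((1..1000).to_a)
sql = 'SELECT "artists".* FROM "artists" ORDER BY RANDOM() LIMIT 1'

first  = conn.select_random(sql)
second = conn.select_random(sql)                    # served from the cache
third  = conn.uncached { conn.select_random(sql) }  # runs a fresh query

puts first == second   # true
puts conn.executions   # 2
```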

Why

I built this for use on Compare Vinyl. Check out the homepage to see it in action :)


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.