April 21, 2023

Groovy Goodness: Calculate The Median Of A Collection

Since Groovy 4 we can use SQL-like queries on in-memory collections with GINQ (Groovy-Integrated Query). GINQ provides built-in aggregate functions such as min, max, sum and median. The median function returns the value in the middle of the sorted list of values. If the list has an odd number of elements, the middle element is returned. If the list has an even number of elements, the average of the two middle elements is returned.
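To make the definition concrete, here is a minimal sketch in plain Groovy that calculates the median by hand, without GINQ. The helper name plainMedian is made up for illustration and is not part of GINQ, and the sketch assumes a non-empty list of numbers:

```groovy
// Hypothetical helper (not part of GINQ): calculate the median by hand.
// Assumes a non-empty list of numbers.
def plainMedian(List<? extends Number> values) {
    // sort(false) returns a sorted copy and leaves the input untouched.
    def sorted = values.sort(false)
    int size = sorted.size()
    int mid = size.intdiv(2)
    // Odd size: take the middle element.
    // Even size: take the average of the two middle elements.
    size % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

assert plainMedian([201, 200, 179, 211, 350]) == 201
assert plainMedian([201, 200, 179, 211, 350, 192]) == 200.5
```

The GINQ median function saves us from writing this logic ourselves and can be used inside a query.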

In the following example we see the use of the median function with GINQ:

// List with an odd number of response times.
def responseTimes = [201, 200, 179, 211, 350]

// Get the median from the list of response times.
// As the list has an odd number of items
// the median is in the middle of the list after
// it has been sorted.
assert GQ {
    from time in responseTimes
    select median(time)
}.toList() == [201]

// List with an even number of response times.
responseTimes = [201, 200, 179, 211, 350, 192]

// The two middle numbers are 200 and 201,
// so the median is the average of those 2 numbers.
assert GQ {
    from time in responseTimes
    select median(time)
}.findResult() == 200.5

// Use the GQ annotation and return a List from the method.
@groovy.ginq.transform.GQ(List)
def medianSize(List<String> values) {
    from s in values
    // We can also use an expression to get the median.
    // Here we take the size of the string values to
    // calculate the median.
    select median(s.size())
}

assert medianSize(["Java", "Clojure", "Groovy", "Kotlin", "Scala"]) == [6]

// Sample data structure where each record
// is structured data (map in this case).
// Could also come from JSON for example.
def data = [
    [test: "test1", time: 200],
    [test: "test1", time: 161],
    [test: "test2", time: 427],
    [test: "test2", time: 411],
    [test: "test1", time: 213]
]

// We want to get each record, but also
// the median for all times belonging to a single test.
// We can use the windowing functions provided by GINQ
// together with median.
def query = GQ {
    from result in data
    orderby result.test
    select result.test as test_name,
           result.time as response_time,
           (median(result.time) over(partitionby result.test)) as median_per_test
}

assert query
        .collect { row -> [name: row.test_name,
                           response: row.response_time,
                           median: row.median_per_test] } ==
    [
        [name: "test1", response: 200, median: 200],
        [name: "test1", response: 161, median: 200],
        [name: "test1", response: 213, median: 200],
        [name: "test2", response: 427, median: 419],
        [name: "test2", response: 411, median: 419]
    ]
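
If we only need one median per test instead of every record, a groupby clause could be an alternative to the window function. The following is a sketch, assuming median can be combined with groupby in the same way as the other GINQ aggregate functions. The aliases test_name and median_time are chosen for illustration, and the sample data is repeated so the snippet can be run on its own:

```groovy
// Same sample data as before, repeated to keep the snippet self-contained.
def results = [
    [test: "test1", time: 200],
    [test: "test1", time: 161],
    [test: "test2", time: 427],
    [test: "test2", time: 411],
    [test: "test1", time: 213]
]

// Group the records by test name and calculate
// a single median per group.
def medians = GQ {
    from result in results
    groupby result.test as test_name
    select test_name, median(result.time) as median_time
}.toList().collectEntries { row -> [(row.test_name): row.median_time] }

// test1: 161, 200, 213 -> middle value is 200.
assert medians.test1 == 200
// test2: 411, 427 -> average of both values is 419.
assert medians.test2 == 419
```

The window function keeps every record and adds the median next to it, while groupby collapses each group to a single row.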

Written with Groovy 4.0.11.