Sequences vs Collections

Introduction

Do you often work on large collections with tons of transformations required? You have to know how to improve the performance.

Collections

  • New collection is created after each transformation
  • Good for small list with low amount of transformations needed
  • Transformations are executed on whole collection
  • Non-Lazy

Sequences

  • Creates TransformingSequence instead of new collections
  • Good for large list with a lot of transformations needed
  • Transformations are executed on each element
  • Lazy

Performance

Code

The task: Cleanup those urls. Remove protocol, not needed characters. Remember that’s just example.

For test purposes we will multiply initial list

val urls = listOf(
        "https://www.example.com/?battle=authority",
        "https://example.com/?achiever=bubble#amusement",
        "http://actor.example.net/?blow=amusement&afterthought=adjustment",
        "http://www.example.net/box/badge?beef=bottle#bag",
        "http://example.com/",
        "https://www.example.com/blow#balance",
        "http://brake.example.com/box.aspx",
        "https://www.example.com/base",
        "https://example.net/belief",
        "http://www.example.com/",
        "http://www.example.com/",
        "https://breath.example.com/afterthought.htm?bit=balance",
        "https://example.net/bite/believe",
        "http://www.example.com/#bird",
        "http://example.com/behavior/basin.html?airport=berry&bedroom=acoustics",
        "https://www.example.net/",
        "https://www.example.com/books?bead=ants",
        "http://www.example.com/",
        "http://www.example.com/anger",
        "http://beginner.example.com/birds?account=badge&amusement=bike#birds"
    )

    val manyUrls = urls * 100
    println("Urls ${manyUrls.count()}")

    val (nonSequenceList, nonSequenceTime) = measureTimedValue {
        manyUrls
            .filter { it.contains("example.com") }
            .map { it.removePrefix("https://") }
            .map { it.removePrefix("http://") }
            .map { it.removePrefix("www.") }
            .map { it.removeSuffix("/") }
            .map { "$it/random_path" }
    }

    val (sequenceList, sequenceTime) = measureTimedValue {
        manyUrls
            .asSequence()
            .filter { it.contains("example.com") }
            .map { it.removePrefix("https://") }
            .map { it.removePrefix("http://") }
            .map { it.removePrefix("www.") }
            .map { it.removeSuffix("/") }
            .map { "$it/random_path" }
            .toList()
    }

    println("Count Sequence ${sequenceList.count()} vs Collection ${nonSequenceList.count()}")
    println("Time Sequence $sequenceTime vs Collection $nonSequenceTime")
}

Remember to import needed resources

@file:OptIn(ExperimentalTime::class)

import kotlin.time.ExperimentalTime
import kotlin.time.measureTimedValue

Performance tests

6 transformations

            .filter { it.contains("example.com") }
            .map { it.removePrefix("https://") }
            .map { it.removePrefix("http://") }
            .map { it.removePrefix("www.") }
            .map { it.removeSuffix("/") }
            .map { "$it/random_path" }

20 urls & 6 transformations

RunSequenceCollection
110.655706ms17.304207ms
210.392919ms20.190484ms
310.275253ms18.017698ms

2k urls & 6 transformations

RunSequenceCollection
119.171929ms31.647107ms
215.764996ms27.570382ms
317.748853ms31.490926ms

20k urls & 6 transformations

RunSequenceCollection
127.254874ms53.043007ms
227.231176ms52.437605ms
327.075628ms55.805330ms

200k urls & 6 transformations

RunSequenceCollection
155.448596ms124.119595ms
249.847435ms134.345523ms
352.328476ms144.570364ms

2kk urls & 6 transformations

RunSequenceCollection
1420.379349ms759.274169ms
2368.751423ms689.146264ms
3373.576750ms691.970425ms

1 transformation

            .filter { it.contains("example.com") }

20 urls & 1 transformation

RunSequenceCollection
110.474949ms12.888582ms
28.225425ms12.196659ms
37.857158ms13.368340ms

2k urls & 1 transformation

RunSequenceCollection
19.959861ms13.648944ms
215.699487ms24.050235ms
312.971994ms16.896744ms

20k urls & 1 transformation

RunSequenceCollection
112.637821ms19.825526ms
211.659926ms18.008385ms
312.154623ms20.246985ms

200k urls & 1 transformation

RunSequenceCollection
127.254874ms39.902933ms
227.231176ms52.437605ms
327.075628ms55.805330ms

2kk urls & 1 transformation

RunSequenceCollection
1145.232857ms85.404677ms
2140.731783ms83.639622ms
3142.480267ms89.809333ms

Summary

You can have huge performance gain if you work with large sets of data. It also can reduce performance in some cases so use carefully.