# Sequences vs Collections

## Introduction
Do you often work with large collections that require many chained transformations? Then it is worth knowing how to improve their performance.
## Collections

- A new collection is created after each transformation
- Good for small lists with only a few transformations
- Each transformation is executed on the whole collection before the next one starts (see the sketch after this list)
- Eager (non-lazy)
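As a quick illustration, here is a minimal sketch with a hypothetical numbers list (not part of the benchmark below) showing the eager behaviour: every operation runs over the whole list and materializes a new intermediate list.

```kotlin
fun main() {
    val numbers = listOf(1, 2, 3)

    // Each step below runs over the WHOLE list and creates a new intermediate list.
    val result = numbers
        .map { println("map $it"); it * 2 }        // prints: map 1, map 2, map 3
        .filter { println("filter $it"); it > 2 }  // then:   filter 2, filter 4, filter 6

    println(result) // [4, 6]
}
```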
## Sequences

- Each intermediate operation creates a lightweight wrapper such as TransformingSequence instead of a new collection
- Good for large lists with many transformations
- Transformations are executed element by element, and only when a terminal operation is called (see the sketch after this list)
- Lazy
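The same hypothetical sketch with asSequence() shows the lazy behaviour: the intermediate operations only build wrappers, and each element flows through the whole chain one at a time once a terminal operation runs.

```kotlin
fun main() {
    val numbers = listOf(1, 2, 3)

    // map/filter only build TransformingSequence / FilteringSequence wrappers here;
    // no lambda has run yet.
    val lazyChain = numbers.asSequence()
        .map { println("map $it"); it * 2 }
        .filter { println("filter $it"); it > 2 }

    println("nothing has been printed yet")

    // The terminal operation pulls elements one by one through the whole chain:
    // map 1, filter 2, map 2, filter 4, map 3, filter 6
    println(lazyChain.toList()) // [4, 6]
}
```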
## Code

The task: clean up these URLs by removing the protocol and other unneeded characters. Keep in mind that this is just an example. For test purposes we multiply the initial list:
```kotlin
fun main() {
    val urls = listOf(
        "https://www.example.com/?battle=authority",
        "https://example.com/?achiever=bubble#amusement",
        "http://actor.example.net/?blow=amusement&afterthought=adjustment",
        "http://www.example.net/box/badge?beef=bottle#bag",
        "http://example.com/",
        "https://www.example.com/blow#balance",
        "http://brake.example.com/box.aspx",
        "https://www.example.com/base",
        "https://example.net/belief",
        "http://www.example.com/",
        "http://www.example.com/",
        "https://breath.example.com/afterthought.htm?bit=balance",
        "https://example.net/bite/believe",
        "http://www.example.com/#bird",
        "http://example.com/behavior/basin.html?airport=berry&bedroom=acoustics",
        "https://www.example.net/",
        "https://www.example.com/books?bead=ants",
        "http://www.example.com/",
        "http://www.example.com/anger",
        "http://beginner.example.com/birds?account=badge&amusement=bike#birds"
    )

    // Repeat the list to get a larger input; `*` is not a stdlib operator, see the sketch after the imports.
    val manyUrls = urls * 100
    println("Urls ${manyUrls.count()}")

    val (nonSequenceList, nonSequenceTime) = measureTimedValue {
        manyUrls
            .filter { it.contains("example.com") }
            .map { it.removePrefix("https://") }
            .map { it.removePrefix("http://") }
            .map { it.removePrefix("www.") }
            .map { it.removeSuffix("/") }
            .map { "$it/random_path" }
    }

    val (sequenceList, sequenceTime) = measureTimedValue {
        manyUrls
            .asSequence()
            .filter { it.contains("example.com") }
            .map { it.removePrefix("https://") }
            .map { it.removePrefix("http://") }
            .map { it.removePrefix("www.") }
            .map { it.removeSuffix("/") }
            .map { "$it/random_path" }
            .toList()
    }

    println("Count Sequence ${sequenceList.count()} vs Collection ${nonSequenceList.count()}")
    println("Time Sequence $sequenceTime vs Collection $nonSequenceTime")
}
```
Remember to add the file-level opt-in and the required imports:

```kotlin
@file:OptIn(ExperimentalTime::class)

import kotlin.time.ExperimentalTime
import kotlin.time.measureTimedValue
```
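The snippet also relies on `urls * 100` to repeat the list, which is not a standard-library operator on `List`. The original helper isn't shown, so here is a minimal sketch of a `times` extension (an assumption) that makes the example compile:

```kotlin
// Hypothetical helper assumed by `urls * 100` above: repeats the list `count` times.
operator fun <T> List<T>.times(count: Int): List<T> =
    (1..count).flatMap { this }
```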
First, the results for the full transformation chain:

```kotlin
.filter { it.contains("example.com") }
.map { it.removePrefix("https://") }
.map { it.removePrefix("http://") }
.map { it.removePrefix("www.") }
.map { it.removeSuffix("/") }
.map { "$it/random_path" }
```
| Run | Sequence | Collection |
|---|---|---|
| 1 | 10.655706ms | 17.304207ms |
| 2 | 10.392919ms | 20.190484ms |
| 3 | 10.275253ms | 18.017698ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 19.171929ms | 31.647107ms |
| 2 | 15.764996ms | 27.570382ms |
| 3 | 17.748853ms | 31.490926ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 27.254874ms | 53.043007ms |
| 2 | 27.231176ms | 52.437605ms |
| 3 | 27.075628ms | 55.805330ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 55.448596ms | 124.119595ms |
| 2 | 49.847435ms | 134.345523ms |
| 3 | 52.328476ms | 144.570364ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 420.379349ms | 759.274169ms |
| 2 | 368.751423ms | 689.146264ms |
| 3 | 373.576750ms | 691.970425ms |
For comparison, here are the results when running only a single operation:

```kotlin
.filter { it.contains("example.com") }
```
| Run | Sequence | Collection |
|---|---|---|
| 1 | 10.474949ms | 12.888582ms |
| 2 | 8.225425ms | 12.196659ms |
| 3 | 7.857158ms | 13.368340ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 9.959861ms | 13.648944ms |
| 2 | 15.699487ms | 24.050235ms |
| 3 | 12.971994ms | 16.896744ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 12.637821ms | 19.825526ms |
| 2 | 11.659926ms | 18.008385ms |
| 3 | 12.154623ms | 20.246985ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 27.254874ms | 39.902933ms |
| 2 | 27.231176ms | 52.437605ms |
| 3 | 27.075628ms | 55.805330ms |

| Run | Sequence | Collection |
|---|---|---|
| 1 | 145.232857ms | 85.404677ms |
| 2 | 140.731783ms | 83.639622ms |
| 3 | 142.480267ms | 89.809333ms |
## Summary

Sequences can give a substantial performance gain when you work with large data sets and long transformation chains. As the single-operation benchmark shows, they can also make things slower in some cases, so measure before choosing.
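As one more hedged illustration (not one of the benchmarks above) of where laziness pays off: with a short-circuiting terminal operation such as first(), a sequence stops after the first match, while the list version still transforms every element.

```kotlin
fun main() {
    val numbers = (1..1_000_000).toList()

    // The list version maps all 1,000,000 elements before first() runs.
    val viaList = numbers.map { it * 2 }.first { it > 10 }

    // The sequence version maps elements lazily and stops at the first match.
    val viaSequence = numbers.asSequence().map { it * 2 }.first { it > 10 }

    println("$viaList $viaSequence") // 12 12
}
```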