0.8.2 on Kubuntu 20.04: extremely slow scan of large library
I'm encountering extremely slow scans by Strawberry 0.8.2 of my library (72K audio files) under Kubuntu 20.04. The machine should be sufficient: 8 × Intel Core i5-10210U CPU @ 1.60GHz, 32GB RAM. It takes nearly a whole day to build the database from scratch, and I can watch the process slowing down in the course of the day, on the console.
What I've tried, so far:
- tested scanning from a remote NAS vs. from local files: no significant scan speed difference, similar slowdown
- tested Clementine, in comparison, on both remote NAS vs. local files: finishes scan from scratch within 1-2 hours
What I've noticed:
- I don't know the source code, so I don't know whether I could speed up the scanning by having Strawberry place any "intermediate files" on a RAM disk (~/.local/share/strawberry/strawberry/strawberry.db has a size of about 290KB at the start of a full scan, and it does not grow until the end of the scanning process; after the scan has finished, its size is around 64MB)
- Updating the database, a day or more later, almost never completes within a day; Clementine takes about 2 minutes, in comparison
I'm running out of ideas about what might be going wrong or what else I could try.
jonas last edited by jonas
The scan times you are seeing do not sound normal. That said, the speed of the drive is what matters most, not the CPU and RAM. What hard drive is it?
Where did you install Strawberry from? Are you running it as a snap, from PPA or from the regular releases? What audio files do you have? Do you have CUE sheets?
I compared clementine vs strawberry here using a stopwatch, and this is the result:
My collection has 26610 files including music and album covers in almost all album directories, mostly FLAC, but a few MP4, no CUE sheets. It is stored on a KINGSTON SA400S3 SSD 900GB. The drive only has 200GB of space left, which might or might not decrease its speed.
I used the latest code of Clementine from source compiled with Qt 5.15.1.
I used the latest code of Strawberry from source compiled with the latest source of Qt 6 from the dev branch.
I use openSUSE Tumbleweed.
First I rebooted the computer so any caching on the HDD is cleared. Then I added my music folder to Clementine, and the scan was done in 9 minutes and 41 seconds.
Then I rebooted the computer (so the caching is cleared) and did the same with Strawberry. It was done in 9 minutes and 50 seconds, so only circa 10 seconds difference, but that's probably coincidental, i.e. other things going on on the computer, etc.
I have not tested on a regular mechanical drive, only on an SSD. The speed would be significantly slower on a regular HDD.
When removing the music directory and re-adding it, the scan is done in 20 seconds (because of HDD cache).
I tried to replicate a setup similar to yours. I purged (including .config & .local files) and re-installed from scratch:
- Clementine 1.3.1 (Ubuntu stock version)
- Clementine 1.4.0rc1-347-gfc4cb6fc7 (most recent build I could find; I'm using 1.4 RC1 as my standard player)
- Strawberry 0.8.2 (.deb downloaded straight from www.strawberrymusicplayer.org, as before)
My audio file collection consists of about 56k mp3s, 16k flacs, and less than 100 .oggs and others, combined.
I've put the HDD containing the collection into an external enclosure, connected via USB 3.0. Disabled all unnecessary services and background processes. Rebooted before every test run, and even let the HDD cool down, in between tests (verified equal start temperatures, via smartctl).
A scan from scratch is taking, respectively,
- 13 minutes with Clementine 1.3.1
- 32 minutes with Clementine 1.4 RC1 (roughly equal to your scan time, for ~3x the number of files)
- running for 3h 15 mins and still not finished, with Strawberry 0.8.2
Since a real HDD is used, a distinct difference is audible: Clementine 1.3.1 and 1.4 RC1 give the disk heads a "proper workout", i.e. I heard very fast movements, while Strawberry 0.8.2 makes it sound almost as if every single file scan results in a separate head movement, i.e. a much lower frequency of head movements.
Since I haven't changed any part of the hardware and even let the HDDs cool down before the next test run, it seems to me that Clementine and Strawberry must perform the scan differently, somehow. HDD speed does not seem to be a limiting factor here, judged by the other results.
I'm willing to try any test that you suggest.
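One test along these lines would be to time a plain read pass over the collection with no tag parsing at all, to get a disk-only baseline for comparison with the scan times above. The following is a minimal sketch, not anything either player does internally: the function name read_pass and the 64 KB head size are assumptions chosen for illustration. For a fair number, it should be run after a reboot, like the scans.

```python
import os
import sys
import time


def read_pass(root, head_bytes=64 * 1024):
    """Walk `root`, read the first `head_bytes` of every file.

    Returns (files_read, bytes_read, elapsed_seconds).
    """
    files = 0
    total = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    total += len(f.read(head_bytes))
            except OSError:
                continue  # unreadable file: skip it and do not count it
            files += 1
    return files, total, time.monotonic() - start


# Usage: python3 read_pass.py /path/to/music
if __name__ == "__main__" and len(sys.argv) > 1:
    n, nbytes, secs = read_pass(sys.argv[1])
    rate = n / secs if secs > 0 else float("inf")
    print(f"{n} files, {nbytes} bytes in {secs:.1f}s ({rate:.0f} files/s)")
```

If this pass finishes in minutes while the Strawberry scan takes hours, the drive itself is unlikely to be the bottleneck.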
It would be nice to figure out what is causing this, but this one is quite puzzling.
I tried some more things to try and reproduce the problem here.
I booted up Ubuntu 20.04 in VirtualBox and mounted the music as a VirtualBox share.
Also tried to attach a USB HDD (non-SSD) with the same music.
The scan speed is normal on both attempts.
One thing you can check is how many tagreader processes Clementine uses vs. Strawberry. You can check it with: ps xua|grep tagreader
In the newer builds of Clementine there is also a setting for it under the Behavior settings, while Strawberry never uses more than 4. I doubt that could lead to hours of scanning, but it might be worth checking: set Clementine to use 4 and see if it has the same problem then.
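For repeated checks, the same count can be taken with a small script instead of reading the ps output by eye. This is a rough sketch that assumes a Linux system with /proc mounted. It does not hardcode any executable name: it counts every process whose executable name contains "tagreader" and groups the result by that name.

```python
import glob
import os
from collections import Counter


def running_command_lines():
    """Yield the argv list of every process visible under /proc (Linux only)."""
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        try:
            with open(path, "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited in the meantime or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv:
            yield argv


def count_tagreaders(argvs):
    """Count processes per executable name containing 'tagreader'."""
    counts = Counter()
    for argv in argvs:
        exe = os.path.basename(argv[0])
        if "tagreader" in exe:
            counts[exe] += 1
    return counts


if __name__ == "__main__":
    for exe, n in sorted(count_tagreaders(running_command_lines()).items()):
        print(f"{exe}: {n}")
```

Run it once while Clementine is scanning and once while Strawberry is scanning, then compare the two counts.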
The collection/library watcher/scanner code hasn't changed much from Clementine, but the tagreader and the taglib we use are completely changed.
I did not check with MP3, so the issue could be with MP3.
Another thing I could think of is different filesystems could trigger different behavior. I use ext4 for everything. What filesystem do you use?
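To answer the filesystem question without guessing, the mount table can be looked up for the music directory. The sketch below assumes Linux and the usual /proc/mounts layout (device, mount point, filesystem type, ...). The function name filesystem_for is made up for this example, and symlinks in the given path are not resolved.

```python
import os
import sys


def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) of the mount containing `path`, or None."""
    path = os.path.abspath(path)  # note: symlinks are not resolved
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Usage: python3 fs_for.py /path/to/music
if __name__ == "__main__" and len(sys.argv) > 1 and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(filesystem_for(sys.argv[1], f.read()))
```

For a NAS share, the reported type would be the network filesystem in use rather than the filesystem on the NAS disks themselves.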
Increasing tagreader workers to 8 (from 4) made the scan finish 1 minute sooner for my collection. That is not really worth it, since each process takes up around 72 MB RAM, but maybe there could be a setting like Clementine has now.
I realize that of course increasing tagreaders won't really make much difference as we use the blocking method in the collection watcher.
In the test installation, 2 tag readers were running.
Anyway, I see your efforts and testing and I don't think we can resolve this issue within a reasonable time frame. I think I'll simply reduce my collection to be scanned by around 50K files that I have mainly for archival purposes; I can manage them elsewhere. With 20K files, Strawberry performs ok, as-is. I'll continue to test and will let you know when the larger collection gets scanned fine.
BTW: Thank you for the great work on Strawberry, I like the configurability and the easy way of retrieving good quality cover images. So, I keep using it. As I saw on GitHub, somebody else has already posted a wish for multiple genres per file, which seems to imply a lot of rework with respect to the tag reader and the database. It's really the one feature that would make Strawberry perfect for me. Using Picard and EasyTag for tagging, I like to keep my collection clean and well-tagged, and I'd love to see the player list all tags separately, in the tree. But that's another story, for another day...
I think it would help if you start Strawberry from a terminal, because then you see the files being scanned. That might give a hint as to which files take an unreasonable time to scan.
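With 72K files the console output scrolls by too fast to spot pauses by eye, so a small wrapper that timestamps each line and reports the longest gaps may help. This is only a sketch. It assumes the player prints roughly one line per scanned file and that the output arrives unbuffered through a pipe, neither of which is verified here. Since it is not known whether a line is printed before or after a file is read, each gap is shown with both neighbouring lines.

```python
import heapq
import subprocess
import sys
import time


def stamp(stream):
    """Yield (monotonic_time, line) for each line as it arrives."""
    for line in stream:
        yield time.monotonic(), line.rstrip("\n")


def largest_gaps(stamped_lines, top=10):
    """Return the `top` longest pauses as (seconds, line_before, line_after)."""
    gaps = []
    prev = None
    for t, line in stamped_lines:
        if prev is not None:
            gaps.append((t - prev[0], prev[1], line))
        prev = (t, line)
    return heapq.nlargest(top, gaps, key=lambda g: g[0])


# Usage: python3 scan_gaps.py strawberry
# The report is printed once the launched program exits.
if __name__ == "__main__" and len(sys.argv) > 1:
    proc = subprocess.Popen(
        sys.argv[1:],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into one
        text=True,
        errors="replace",
    )
    report = largest_gaps(stamp(proc.stdout))
    proc.wait()
    for seconds, before, after in report:
        print(f"{seconds:8.2f}s between:\n  {before}\n  {after}")
```

The file names next to the largest gaps are the first candidates to test individually, for example by moving them out of the collection and rescanning.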