Remove a loop, adding a new dependency or having two loops
I'm developing a game with C++ and Unreal Engine 5.4.4.
I currently have a problem on how to remove a loop without having to add a new dependency to a module.
Unreal has modules for grouping related classes. I have a module to access a database (SQLite 3) and another module to perform mathematical calculations. First I get the data, using a loop to get it all. Then I go back through the data to do some calculations. My problem is that it is a lot of data, more than 200,000 records and twice.
If I do the calculations inside the database method, I add a dependency to the database module. But if I leave it as it is, it takes twice as long.
Also, I don't think it's right to do calculations in a module whose job is to retrieve data from a database. Or maybe it can be done.
How would you do it?
Knowing nothing about the data structure details, I would say:
it's very common to have a separate module which is the data access layer ("repository" pattern)
if you have to process every record, then you have to loop over them - but that doesn't imply they all have to be in memory at the same time unless they're interdependent
this is a common case for an iterator pattern.
You can do that one of two ways. You can put the iteration in the repository and have a callback function which is called per-record which does the calculation; or you have the repository return some sort of "results" object which provides something like a Next() method. You call Next() until you run out of result records.
The iterator pattern would call sqlite3_step; the callback of sqlite3_exec is the other approach.
When weâre talking about 200,000 records I suspect the problem isnât the second loop directly. Itâs that you have to put those records somewhere.
If youâre blasting them out to a file youâll take a IO hit. If youâre keeping them in memory thereâs a good chance you have a cache miss if you arenât careful. Either will cost you time that goes well beyond simple looping.
The strategy youâre employing is called slurping. Youâre loading all the data before acting on it. Sometimes this is required.
The typical alternative is called chunking. You read as much as is handy to load (respecting both DB and cache limitations) and then do your work on that chunk. So long as the work only depends on what is in the chunk this is simple. When the work is done the result is saved off somewhere it wonât interfere with getting the next chunk. If this save is small enough that it doesnât hit the DB, file system, network, or cause a cache miss you can see some real savings here.
Sometimes some accumulating state is required as you go from chunk to chunk. This is fine so long as it doesnât increase the required memory enough to cause a cache miss.
Far too many people who study big O notation focus on time cycles. They forget that big O can also be used for space. Which can slow you down as well. You can trade between them. Itâs actually called the time-space trade off.