Remove a loop: add a new dependency, or keep two loops?

I'm developing a game with C++ and Unreal Engine 5.4.4.

I currently have a problem on how to remove a loop without having to add a new dependency to a module.

Unreal has modules for grouping related classes. I have one module to access a database (SQLite 3) and another module to perform mathematical calculations. First I fetch the data, using a loop to get all of it. Then I go back through the data to do some calculations. My problem is that it is a lot of data (more than 200,000 records) and I have to loop over it twice.

If I do the calculations inside the database method, the database module has to depend on the calculation module. But if I leave it as it is, it takes twice as long.

Also, I don't think it's right to do calculations in a module whose job is to retrieve data from a database. Or maybe it can be done.

How would you do it?

Knowing nothing about the data structure details, I would say:

- it's very common to have a separate module which is the data access layer (the "repository" pattern)

- if you have to process every record, then you have to loop over them, but that doesn't imply they all have to be in memory at the same time unless they're interdependent

- this is a common case for the iterator pattern.

You can do that in one of two ways. You can put the iteration in the repository and pass in a callback function that is invoked per record to do the calculation; or you can have the repository return some sort of "results" object which provides something like a Next() method, which you call until you run out of result records.
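For concreteness, here is a minimal sketch of what those two shapes could look like; FMeasurement, IMeasurementRepository, ForEachMeasurement, IMeasurementCursor and Next are made-up names for illustration, not existing Unreal or SQLite types:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical record type produced by the data-access module.
struct FMeasurement
{
    int64_t Id = 0;
    double  Value = 0.0;
};

// Option 1: the repository owns the loop and invokes a callback per record.
class IMeasurementRepository
{
public:
    virtual ~IMeasurementRepository() = default;

    // Calls Visitor once per row; the calculation lives in the caller's lambda,
    // so the database module never needs to depend on the math module.
    virtual void ForEachMeasurement(const std::function<void(const FMeasurement&)>& Visitor) = 0;
};

// Option 2: the repository returns a cursor and the caller owns the loop.
class IMeasurementCursor
{
public:
    virtual ~IMeasurementCursor() = default;

    // Fills OutRecord and returns true, or returns false when the rows run out.
    virtual bool Next(FMeasurement& OutRecord) = 0;
};

// The calculation code drives either shape in a single pass:
inline double SumAllValues(IMeasurementRepository& Repo)
{
    double Sum = 0.0;
    Repo.ForEachMeasurement([&Sum](const FMeasurement& Record)
    {
        Sum += Record.Value;   // no second loop over an in-memory copy
    });
    return Sum;
}
```

Either way, the math module depends on the repository interface, not the other way around.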

The iterator pattern would call sqlite3_step; the callback of sqlite3_exec is the other approach.
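Roughly, against the raw SQLite C API, the two approaches look like this; the measurements table and its value column are placeholders for whatever your schema actually contains:

```cpp
#include <sqlite3.h>
#include <cstdlib>

// Iterator-style: sqlite3_step pulls one row at a time, so only the current
// row is in memory and the calculation happens right here in the caller.
double SumValues(sqlite3* Db)
{
    sqlite3_stmt* Stmt = nullptr;
    if (sqlite3_prepare_v2(Db, "SELECT value FROM measurements", -1, &Stmt, nullptr) != SQLITE_OK)
    {
        return 0.0;
    }

    double Sum = 0.0;
    while (sqlite3_step(Stmt) == SQLITE_ROW)
    {
        Sum += sqlite3_column_double(Stmt, 0);
    }
    sqlite3_finalize(Stmt);
    return Sum;
}

// Callback-style: sqlite3_exec owns the loop and calls us once per row.
static int OnRow(void* UserData, int ColumnCount, char** Values, char** /*Names*/)
{
    double* Sum = static_cast<double*>(UserData);
    if (ColumnCount > 0 && Values[0] != nullptr)
    {
        *Sum += std::atof(Values[0]);   // sqlite3_exec delivers columns as text
    }
    return 0;                           // returning non-zero aborts the query
}

double SumValuesWithExec(sqlite3* Db)
{
    double Sum = 0.0;
    sqlite3_exec(Db, "SELECT value FROM measurements", OnRow, &Sum, nullptr);
    return Sum;
}
```

Since sqlite3_exec hands every column back as text, the sqlite3_step route is usually the better fit when the values are numeric.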

When we’re talking about 200,000 records, I suspect the problem isn’t the second loop directly. It’s that you have to put those records somewhere.

If you’re blasting them out to a file, you’ll take an I/O hit. If you’re keeping them in memory, there’s a good chance you’ll get cache misses if you aren’t careful. Either will cost you time that goes well beyond simple looping.

The strategy you’re employing is called slurping. You’re loading all the data before acting on it. Sometimes this is required.

The typical alternative is called chunking. You read as much as is handy to load (respecting both DB and cache limitations) and then do your work on that chunk. So long as the work only depends on what is in the chunk this is simple. When the work is done the result is saved off somewhere it won’t interfere with getting the next chunk. If this save is small enough that it doesn’t hit the DB, file system, network, or cause a cache miss you can see some real savings here.
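A minimal chunking sketch against SQLite, assuming a placeholder table measurements(id INTEGER PRIMARY KEY, value REAL); it pages on the primary key so each chunk picks up where the previous one stopped instead of re-scanning skipped rows with OFFSET:

```cpp
#include <sqlite3.h>
#include <vector>

struct FMeasurement
{
    sqlite3_int64 Id;
    double        Value;
};

// Read rows in fixed-size chunks, process each chunk, then discard it
// before fetching the next one.
void ProcessInChunks(sqlite3* Db, int ChunkSize)
{
    sqlite3_int64 LastId = 0;
    std::vector<FMeasurement> Chunk;
    Chunk.reserve(ChunkSize);

    for (;;)
    {
        Chunk.clear();

        sqlite3_stmt* Stmt = nullptr;
        if (sqlite3_prepare_v2(Db,
                "SELECT id, value FROM measurements WHERE id > ?1 ORDER BY id LIMIT ?2",
                -1, &Stmt, nullptr) != SQLITE_OK)
        {
            return;
        }
        sqlite3_bind_int64(Stmt, 1, LastId);
        sqlite3_bind_int(Stmt, 2, ChunkSize);

        while (sqlite3_step(Stmt) == SQLITE_ROW)
        {
            Chunk.push_back({ sqlite3_column_int64(Stmt, 0),
                              sqlite3_column_double(Stmt, 1) });
        }
        sqlite3_finalize(Stmt);

        if (Chunk.empty())
        {
            break;                      // no more rows
        }

        for (const FMeasurement& Record : Chunk)
        {
            // Per-record calculation goes here; only ChunkSize records
            // are resident in memory at any point.
            (void)Record;
        }

        LastId = Chunk.back().Id;       // resume after the last row we saw
    }
}
```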

Sometimes some accumulating state is required as you go from chunk to chunk. This is fine so long as it doesn’t increase the required memory enough to cause a cache miss.
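For example, if the end result is an aggregate such as an average, the state carried from chunk to chunk can stay a handful of bytes no matter how many records flow through; this FRunningAverage is just an illustration:

```cpp
// Accumulator carried across chunks: its size is constant, so it never
// grows with the number of records processed.
struct FRunningAverage
{
    long long Count = 0;
    double    Sum = 0.0;

    void Add(double Value)
    {
        ++Count;
        Sum += Value;
    }

    double Average() const
    {
        return Count > 0 ? Sum / static_cast<double>(Count) : 0.0;
    }
};
```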

Far too many people who study big O notation focus only on time. They forget that big O can also be used for space, which can slow you down as well. You can trade one for the other; it’s called the time-space trade-off.
