The Detail-Backfill Pattern or DataLoader Before It Was Cool

While optimization is often premature and, therefore, the root of all evil[1], using an efficient design for data access is choosing a good path rather than paving a bad one. A common task for APIs is to retrieve a “page” of master rows and their detail rows (e.g. Invoice and Items), as well as a total count of the master rows that match the given criteria. Here are three ways to get the data from a standard SQL database:

Option 1: for each row of the master table query result, execute a query per detail table to get the detail rows for that master row (also known, derisively, as N+1) #

Pro #

Con #

Option 2: join all the detail rows to the master rows in one query #

Pro #

Con #

Option 3: get the master rows with one query and all the details for those rows in one other query per detail table #

Pro #

Con #

The effects of these three implementations can affect the usage, if not the interface, of the function

Pseudo-code Examples #

Option 1: Run a Query on the Detail for each Master

DataQuery = "SELECT *"
CountQuery = "SELECT COUNT(*) AS count"
DataQuery & CountQuery += " FROM table T1 WHERE T1.status = 'active'"
DataQuery += " LIMIT @startOffset, @pageSize

Result = empty Array

execute(CountQuery)
if (CountQuery.count <= startOffset) OR (pageSize == 0) OR 
  (startOffset > CountQuery.count)
    return CountQuery.count, Result

execute(DataQuery)
for each DataQuery.row
    store master column data to new CurrentMaster
    append CurrentMaster to Result

    DetailQuery = "SELECT * from detail T2 \
                 WHERE T2.master_id = @CurrentMaster.id

    execute(DetailQuery)
    for each DetailQuery.row
        append detail column data to CurrentMaster.DetailArray

return CountQuery.count, Result

Option 2: SQL Join the Master and Detail
This code is for one detail table. Every additional table adds a join to the query

CountQuery = "SELECT COUNT(*) AS count FROM table T1"
DataQuery = "SELECT * FROM table T1 \
             JOIN detail T2 ON (T1.id = T2.master_id)"
DataQuery & CountQuery += "WHERE T1.status = 'active'"
DataQuery += "ORDER BY T1.id, T2.id"
Result = empty Array

execute(CountQuery)
if CountQuery.count == 0
    return CountQuery.count, Result

CurrentMaster = null
execute(DataQuery)
MasterRowIndex = 0
for each DataQuery.row
    if (CurrentMaster == null) OR (CurrentMaster.id != DataQuery.id)
        MasterRowIndex += 1
        if MasterRowIndex > startOffset + pageSize
            break

        store master column data to new CurrentMaster
        if MasterRowIndex > startOffset
            append CurrentMaster to Result

    if MasterRowIndex > startOffset
        append detail column data to CurrentMaster.DetailArray

return CountQuery.count, Result

Because joining the Master and Detail tables can produce 1 - n rows for each master row, one cannot paginate using the query LIMIT clause. One must load enough rows to ensure that one can return a page-full of rows

Option 3: Manage the Join between Master and Detail in Code

CountQuery = "SELECT COUNT(*) AS count FROM table T1"
DataQuery = "SELECT * FROM table T1"
DataQuery & CountQuery += "WHERE T1.status = 'active'"
DataQuery += " LIMIT @startOffset, @pageSize", 
Result = empty Array

execute(CountQuery)
if (CountQuery.count <= startOffset) OR (pageSize == 0) OR 
  (startOffset > CountQuery.count)
    return CountQuery.count, Result

execute(DataQuery)
for each DataQuery.row
    append master column data to Result

MasterIds = the id values for all master rows in Result
DetailQuery = "SELECT * from detail T2 \
                 WHERE T2.master_id IN @MasterIds \
                 ORDER BY T2.master_id"

CurrentMaster = null
execute DetailQuery
for each DetailQuery.row
    if (CurrentMaster == null) OR 
       (CurrentMaster.id != DetailQuery.master_id)

        CurrentMaster = find Result[] for DetailQuery.master_id

    append detail column data to CurrentMaster.DetailArray

return CountQuery.count, Result

One can iterate Result to gather the MasterIds or save them in a list as one fetches the master rows

To find an item in Result[] for a master id in order to add a detail result, one can search Result linearly or index it depending on the possible size of Result, memory cost, etc.

[1]: “…; premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth

 
0
Kudos
 
0
Kudos

Now read this

We Can Afford To Buy Vowels

TL;DR Programs must be written for people to read, and only incidentally for machines to execute —Harold Abelson In the old days <eyeroll/>, we had 26 variables: the letters of the alphabet. The next leap in programming, we could... Continue →