The Detail-Backfill Pattern or DataLoader Before It Was Cool

While optimization is often premature and, therefore, the root of all evil[1], using an efficient design for data access is choosing a good path rather than paving a bad one. A common task for APIs is to retrieve a “page” of master rows and their detail rows (e.g. Invoice and Items), as well as a total count of the master rows that match the given criteria. Here are three ways to get the data from a standard SQL database:

Option 1: for each row of the master table query result, execute a query per detail table to get the detail rows for that master row (also known, derisively, as N+1) #

Pro #

Con #

Option 2: join all the detail rows to the master rows in one query #

Pro #

Con #

Option 3: get the master rows with one query and all the details for those rows in one other query per detail table #

Pro #

Con #

The effects of these three implementations can affect the usage, if not the interface, of the function

Pseudo-code Examples #

Option 1: Run a Query on the Detail for each Master

DataQuery = "SELECT *"
CountQuery = "SELECT COUNT(*) AS count"
DataQuery & CountQuery += " FROM table T1 WHERE T1.status = 'active'"
DataQuery += " LIMIT @startOffset, @pageSize

Result = empty Array

execute(CountQuery)
if (CountQuery.count <= startOffset) OR (pageSize == 0) OR 
  (startOffset > CountQuery.count)
    return CountQuery.count, Result

execute(DataQuery)
for each DataQuery.row
    store master column data to new CurrentMaster
    append CurrentMaster to Result

    DetailQuery = "SELECT * from detail T2 \
                 WHERE T2.master_id = @CurrentMaster.id

    execute(DetailQuery)
    for each DetailQuery.row
        append detail column data to CurrentMaster.DetailArray

return CountQuery.count, Result

Option 2: SQL Join the Master and Detail
This code is for one detail table. Every additional table adds a join to the query

CountQuery = "SELECT COUNT(*) AS count FROM table T1"
DataQuery = "SELECT * FROM table T1 \
             JOIN detail T2 ON (T1.id = T2.master_id)"
DataQuery & CountQuery += "WHERE T1.status = 'active'"
DataQuery += "ORDER BY T1.id, T2.id"
Result = empty Array

execute(CountQuery)
if CountQuery.count == 0
    return CountQuery.count, Result

CurrentMaster = null
execute(DataQuery)
MasterRowIndex = 0
for each DataQuery.row
    if (CurrentMaster == null) OR (CurrentMaster.id != DataQuery.id)
        MasterRowIndex += 1
        if MasterRowIndex > startOffset + pageSize
            break

        store master column data to new CurrentMaster
        if MasterRowIndex > startOffset
            append CurrentMaster to Result

    if MasterRowIndex > startOffset
        append detail column data to CurrentMaster.DetailArray

return CountQuery.count, Result

Because joining the Master and Detail tables can produce 1 - n rows for each master row, one cannot paginate using the query LIMIT clause. One must load enough rows to ensure that one can return a page-full of rows

Option 3: Manage the Join between Master and Detail in Code

CountQuery = "SELECT COUNT(*) AS count FROM table T1"
DataQuery = "SELECT * FROM table T1"
DataQuery & CountQuery += "WHERE T1.status = 'active'"
DataQuery += " LIMIT @startOffset, @pageSize", 
Result = empty Array

execute(CountQuery)
if (CountQuery.count <= startOffset) OR (pageSize == 0) OR 
  (startOffset > CountQuery.count)
    return CountQuery.count, Result

execute(DataQuery)
for each DataQuery.row
    append master column data to Result

MasterIds = the id values for all master rows in Result
DetailQuery = "SELECT * from detail T2 \
                 WHERE T2.master_id IN @MasterIds \
                 ORDER BY T2.master_id"

CurrentMaster = null
execute DetailQuery
for each DetailQuery.row
    if (CurrentMaster == null) OR 
       (CurrentMaster.id != DetailQuery.master_id)

        CurrentMaster = find Result[] for DetailQuery.master_id

    append detail column data to CurrentMaster.DetailArray

return CountQuery.count, Result

One can iterate Result to gather the MasterIds or save them in a list as one fetches the master rows

To find an item in Result[] for a master id in order to add a detail result, one can search Result linearly or index it depending on the possible size of Result, memory cost, etc.

[1]: “…; premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth

 
0
Kudos
 
0
Kudos

Now read this

Source Code as Poetic

source code, like poetry, is more than its words and symbols (or the bytes they represent) it has shapes that communicate its intentions While we might share Mr. Jourdain’s[1] delight, we should not ignore the shape of our code as well... Continue →