I have a table with structure like this:
create table to_much_data ( id primary key clustered, dt datetime, data varbinary(400) )
it didn't have an index by datetime, but i know that dt non-decreasing sequence. I need to query data from this table with specific condition by date field like this:
select * from to_much_data where dt between '20190220' and '20190221'
since no index for dt, i prefer to convert query to:
select * from to_much_data where id between StartDateID and EndDateID
I believe that
EndDateID could be found with log(N) or better complexity. But I didn't know any solution to do this.
Does any one know the way to do it?
It looks like no wide-known ready to use solution exists. If index creation is not possible some workarounds can be used:
- filtered index, but it can affect table performance and increase locks
- another table with mapping, but it needs to be manually updated (or through trigger or stored procedure) and it can affect performance and increase locks
- t-sql code with silly binary search but it looks like bicycle reinventing
despite this I believe that databases can be more effective and intuitive in some cases like this. I will be glad if some day we will be able to write:
select * from to_much_data with(sequence_order(id asc, dt asc)) where dt between '20190220' and '20190221'
You could just reproduce the binary search algorithm in TSQL or use a recursive CTE but this is still going to need greater than 70 seeks to get both ends and is tedious to do.
A possible middle ground might be to create an indexed view with at least every nth row. For example
CREATE VIEW dbo.to_much_data_Sample WITH SCHEMABINDING AS SELECT id, dt FROM dbo.to_much_data WHERE id % 100000 = 0 GO CREATE UNIQUE CLUSTERED INDEX ix ON dbo.to_much_data_Sample(dt, id);
you can then use (assuming
id is an integer)
DECLARE @StartDate DATETIME = '20190220', @EndDate DATETIME = '20190221'; DECLARE @StartDateID INT, @EndDateID INT; SELECT TOP 1 @StartDateID = id FROM dbo.to_much_data_Sample WITH (NOEXPAND) WHERE dt < @StartDate ORDER BY dt DESC; SELECT TOP 1 @EndDateID = id FROM dbo.to_much_data_Sample WITH (NOEXPAND) WHERE dt > @EndDate ORDER BY dt ASC; SELECT * FROM to_much_data WHERE id BETWEEN isnull(@StartDateID, -2147483648) AND isnull(@EndDateID, 2147483647) AND dt BETWEEN @StartDate AND @EndDate;
The value of
n will be a trade off between index size and number of additional rows read at runtime.