BinaryIndexed Trees (BIT) were first introduced by Boris Ryabko in 1989 and later popularized by Peter M. Fenwick in A New Data Structure for Cumulative Frequency Tables (1994). This data structure provides an efficient solution to the problem of range queries and updates.
Given a positive integer $n$, two indices $i$ and $j$ such that $0 \le i \le j < n,$ and an integer $d$, implement an arraylike type $T$ that provides the following operations:
 $\texttt{QUERYRANGE}(T, i, j)$ returns the sum of the values in the range $i..j,$
 $\texttt{UPDATE}(T, i, d)$ adds $d$ to the value at index $i.$
The goal is to minimize the complexity of the solution, evaluated by performing a sequence of $q$ calls to $\texttt{QUERYRANGE}$ and $u$ calls to $\texttt{UPDATE}$ in arbitrary order.
The full source code for the solutions presented in this article is available here.
§Introduction
Let's review two common solutions to this problem that demonstrate the tradeoff between fast queries and fast updates.
§Naive solution
The naive solution is the direct translation of the problem statement into code:
If you are are not entirely familiar with C++:
NaiveSolution()
is a constructor, responsible for initializing class members of new instances.A(n)
initializes the array withn
elements set to their default value (0 for integers).assert(bool)
is used to enforce preconditions, causing the program to abort when violated. It ensures indices passed as arguments are valid.size_t
is an unsigned integer type that represents the size of objects. It is commonly used for indexing into arrays (or vectors), and because it is always positive, you only have to assert values of this type against the size of the array.
Assuming $n$ is the maximum number of elements that can be stored in $T$, here are the time and space complexities of this implementation:
Method  Time  Space 

queryRange 
$\mathcal{O}(n)$  $\mathcal{O}(1)$ 
update 
$\mathcal{O}(1)$  $\mathcal{O}(1)$ 
As a reminder:
 The complexity of an algorithm tells how much time or space it consumes relative to the size of the input, where "time" is best understood as the number of steps.
 $\mathcal{O}(.)$ (big O) denotes an upper bound on the complexity of an algorithm, but it commonly indicates the worst case complexity (least upper bound).
For example:
 $\mathcal{O}(1)$ indicates the algorithm takes constant time or space, regardless of the size of the input.
 $\mathcal{O}(n)$ indicates the algorithm takes linear time or space, relative to an input of size $n$. That means the time or space the algorithm takes is proportional to the size of the input.
Going back the original problem, in a series of $q$ calls to queryRange
and $u$ calls to update
, the naive solution takes
$\mathcal{O}(qn+u)$ time and $\mathcal{O}(1)$ space.
§Prefix sums
It is possible to frame this problem in terms of prefix sums. For all integers $k$ such that $0 \le k < A$, the prefix sum $P[k]$ is defined as $\sum_{i = 0}^k A[i]$. The following figure shows an array $A$ and its prefix sum array $P$:
$\begin{array}{ccccccc} A[i] & 0 & 1 & 2 & 3 & 4 & 5\\ \hline P[i] & 0 & 1 & 3 & 6 & 10 & 15 \end{array}$The cumulative sum of the elements in a subarray (called partial sum) can be expressed as the difference between two prefix sums. Given two indices $i$ and $j$ such that $0 < i \le j < A$:
$\begin{align*} A[i] + A[i + 1] + \ldots + A[j] &= \sum_{k = 0}^j A[k]  \sum_{k = 0}^{i  1} A[k] \\ &= P[j]  P[i  1]. \end{align*}$Instead of having $T$ directly expose the values from the internal array $A$ like naive solution, you can replace $A$ by its prefix sum array $P$ to implement queries and updates:
You should not to confuse the interface that $T$ exposes (an arraylike
type with the methods queryRange
and update
), and how this interface is
actually implemented ($P$ is a prefix sum array, hence $P[i] \neq
T[i]$).
The prefix sum solution has the following complexity:
Method  Time  Space 

queryRange 
$\mathcal{O}(1)$  $\mathcal{O}(1)$ 
update 
$\mathcal{O}(n)$  $\mathcal{O}(1)$ 
In a sequence of $q$ calls to queryRange
and $u$ calls to
update
, the total time complexity is $\mathcal{O}(q + un)$. Given an
arbitrary sequence of queries and updates, this solution doesn't beat the naive
solution.
§A better solution
You can represent the prefix sum array $P$ from the previous section as a sequence of ranges covering parts of the original array $A$:
To understand the inefficiency of updates, consider that changing the first element implies updating all the prefix sums that contain this element. That explains why the maximum number of steps for updates is proportional to the number of elements, matching the $\mathcal{O}(n)$ time complexity we determined previously.
§Aggregating partial sums
We need to find a way to split the information contained in this array so both queries and updates are performed in fewer steps, assuming the worst possible sequence of operations. This rearrangement must consists of:
 Less overlapping between prefix sums, to provide better update performance. The only way to do that is to rely on partial sums instead of full prefix sums, with just enough overlapping not to sacrifice range query performance.
 A simple heuristic to compose partial sums into full prefix sums, so we can apply the same algorithm as the prefix sum solution to efficiently compute range queries.
Here's the incremental construction of candidate arrangements up to 3 elements:
 On the left side, there is only one layer. Each partial sum only aggregates a single element, so the partial sum array is simply equal to the original array. This arrangement corresponds to the naive solution.
 The right side has one layer for each new element, and each of them represents a full prefix sum. This arrangement corresponds to the prefix sum solution.
In the middle, the arrangement provides better worst case performance for both queries and updates than the two other solutions:
 The first two elements have a regular prefix sum that you can access in a single operation.
 The third partial sum only contains the value of the third element, making its update a single operation.
Thus, at most two steps are required to compute any prefix sums:
This decrease in query efficiency compared to the prefix sum solution allows updates of any elements in at most 2 steps instead of 3:
You may find other arrangements that appear equally efficient. But our goal is to compute full prefix sums from these partial sums in a way that generalizes to any number of elements. That means you cannot arrange them arbitrarily, their structure needs well defined properties that we will review next.
§Making it scale
The previous arrangement decreases the maximum number of steps required for both queries and updates compared to the naive solutions, at least up to 3 elements. But that tells us nothing about the complexity of the associated algorithm.
To evaluate its efficiency, the only thing that matters is how the number of operations scales with the size of the input. In particular, you want to make sure that adding new elements doesn't increase the number of steps proportionally, otherwise it would only be as good as our naive solutions.
Suppose you have to go from 3 to 4 elements by adding as few additional steps as possible. For the 4th element, there is no other choice than adding another step, either for queries or for updates:
We can establish that starting with 4 elements, a BIT requires at least 3 update steps and 2 query steps. But if the 4th partial sum is the full prefix sum shown on the right of the previous figure, then you can add 2 more elements without changing the maximum number of steps for both operations:
Note how the structure of the last two partial sums matches the arrangement of the first two. This is a good sign that some generalization is possible.
To better understand the relationship between the number of items and the number of steps, try to increase the number of elements to 8 while minimizing the maximum number of steps for both queries and updates. You should find a structure that looks like this:
A BIT with 8 elements requires at least 4 update steps and 3 query steps, whereas 3 update steps and 2 query steps were required for 4 elements. We managed to double the size of the input without adding more than 1 step to either queries and updates. That gives us good hopes for reaching a $\mathcal{O}(\log n)$ time complexity.
§Implementation
The two core operations of BITs are range queries and point updates. To keep things as simple as possible, this section ignores the element at index 0 (as if we were working with 1indexed array).
§Range queries
To compute the result of a query, you have to aggregate partial sums until you can reconstruct two full prefix sums, and then take the difference between them like in the naive prefix sum solution.
For all $i > 1,$ the next index after $i$ is the index of the next nonoverlapping partial sum you have to consider in order to reconstruct the full prefix sum up to $i$.
In the example that follows, the sequence of next indices is $7 \rightarrow 6 \rightarrow 4,$ because adding the nonoverlapping partial sums $T[7] + T[6] + T[4]$ allows you to retrieve the full prefix sum $P[7]$:
In the previous figure, each index is written in binary. Going to the next index is equivalent to clearing the least significant set bit (LSSB):
If your bit manipulation is a bit rusty:

i  1
sets all the bits until the LSSB, which becomes unset: 
i & (i  1)
clears all the new bits that were set ini  1
before the LSSB, and also the LSSB itself because it was cleared ini  1
:
Given nextIndex
, you can implement a function prefixSum
that returns the
prefix sum $P[i]$ from the BIT. Compared to the prefix sum solution, this
method replaces the direct indexing into the prefix sum array $P$:
It follows the implementation of queryRange
:
The time complexity of queryRange
depends on the time complexity of
prefixSum
, and consequently, on the number of partial sums this function
needs to aggregate in order to recompute a full prefix sum. Given that the
number of such partial sums corresponds to the number of next indices:
 The maximum index cannot exceed the length of the array $n$, which means any valid index can be written with only $\lfloor\log_2 n\rfloor + 1$ bits.
 At each step, the algorithm clears the LSSB, which effectively removes 1 set bit from the index, so the number of steps cannot exceed $\lfloor\log_2 n\rfloor + 1$, the maximum number of bits from the previous point.
As a conclusion, this implementation of queryRange
works in
$\mathcal{O}(\log n)$ time, $\mathcal{O}(1)$ space.
§Point updates
The point update operation is similar to its implementation in the prefix sum solution: you have to update all the partial sums that contain the changed element.
For all $i > 1$, the parent index of $i$ is the index of the smallest (by number of elements) overlapping partial sum greater (by number of elements) than the $i\text{th}$ partial sum.
In the following example, the sequence of parent indices is $5 \rightarrow 6 \rightarrow 8$ because the partial sums $T[5]$, $T[6]$, and $T[8]$ all contain $A[5]$, and they are ordered by increasing number of elements:
Looking back at the binary representation, you can go from any index $i$
to its parent by adding the LSSB, computed as i & (i)
:
Consider how ~i
(the negation of i
) behaves around the LSSB:
Performing i & ~i
gives you 0, but looking at the binary representation,
you could almost get the LSSB if only the least significant unset bit in
~i
was set:
Since what's on the right of the LSSB doesn't matter for the &
operation
(by definition of the LSSB, these bits are unset in i
) you can just add 1
to ~i
:
The &
operation between i
and ~i+1
gives you the LSSB:
Because negative integers are represented using the two's complement
operation, we have i = ~i + 1
, and the formula for the LSSB simplifies
to i & (i)
.
It follows the implementation of update
:
Because each iteration adds the LSSB up to the maximum index $n$, and adding the LSSB moves it by at least 1 bit towards the most significant bit, there can't be more than $\log n$ parent indices.
As a conclusion, this implementation of update
works in $\mathcal{O}(\log
n)$ time, $\mathcal{O}(1)$ space.
§Construction
Now that we have the core BIT operations in place, let's see how to build a BIT from an array.
§Using update
update
The simplest method is to initialize an empty BIT (where all the partial sums are initially set to 0), and to call update for each element:
The time complexity is $n$ times the complexity of update
, which equals
$\mathcal{O}(n\log n)$, but we can do better.
§Linear time
Partial sums in BITs have two interesting properties:
 Parent indices are greater than their children.
 The partial sum at index
i
ends with the element at that index.
So you can update the array from left to right, adding the value of the current element to its parent, and keep on iterating. Each time you reach a new element, you have already accumulated the partial sums from its descendants (property 1), and the current element is the partial sum at that position (property 2):
The complexity is much better than the previous solution: $\mathcal{O}(n)$ time.
§Constant space
Our constructors so far work in linear space because the reference to the input vector is only valid while inside the constructor. When you further assign this vector to the class member, all of its elements are copied in $\mathcal{O}(n)$.
To eliminate the copy from the input vector to the class member, you can
indicate to the compiler that it can move the ownership of the vector with the
function std::move
:
The generated code will just copy a few bytes for the vector layout (address to the data, size, capacity), instead of copying all the elements over.
Because you can only move an object that you own, a copy is still happening when the input vector is provided to the constructor, unless the caller also allows the compiler to move it, using the same technique.
The final complexity for this implementation is $\mathcal{O}(n)$ time, $\mathcal{O}(1)$ space.
§Index 0
The previous section ignored the element at index 0 because it makes the implementation simpler, but there are two ways to reintroduce this element:
 Adding it back as a special case (the easiest and it's commonly how BITs are described).
 Tweaking the formulas of
nextIndex
andparentIndex
to make it work (more difficult).
Let's look at both.
§Special case
The first solution is to handle index 0 as a special case.
This solution only requires minor tweaks to queryRange
and update
but this
is nothing new. Looking back at the prefix sum solution, we handled index 0
similarly.
First, you have to include $T[0]$ in the computation of the prefix sums:
Then, you have to update queryRange
, because calling queryRange(0, j)
would
call prefixSum(1)
which is undefined:
As for update
, the first index is handled in a similar way:
For the constructor, you just have to the start iterating at index 0:
§Tweaked formulas
Alternatively, you could shift the representation of the BIT to include the index 0 as a regular partial sum.
As an exercise, try to find how you could adjust nextIndex
and parentIndex
to make that work (spoilers ahead).
§Advanced modes of operation
We've only covered one mode of operation: range queries / point updates. But regular BITs are the building block for two other modes of operation.
§Point queries / range updates
The point queries / range updates problem consists in implementing an arraylike type $T$ that supports the following operations:
 $\texttt{QUERY}(T, i)$ returns the value at index $i$.
 $\texttt{UPDATERANGE}(T, i, j, v)$ adds $v$ to all the values in the range $i..j$.
Starting from a regular BIT (range queries / point updates), you could implement:
query(i)
asqueryRange(i, i)
,updateRange(i, j, v)
by callingupdate(k, v)
for $k$ in $i..j.$
Unfortunately, the time complexity of updateRange
would be
$\mathcal{O}(n\log n)$. The implementation from the naive solution can
already provide a better $\mathcal{O}(n)$ time complexity for this
method:
It is helpful to step back and examine the values of query(i)
after calling
updateRange(2, 4, v)
on an empty BIT:
In the introduction, you saw that prefix sums can shift a constant time
complexity from updates to queries. Indeed, the previous sequence of query(i)
values can be interpreted as prefix sums of the following array:
So you can implement this problem using prefix sums:
query(i)
returns the $i$th prefix sum of $T_1$ in $\mathcal{O}(n)$ time,updateRange(i, j, v)
adds $v$ to $T_1[i]$ and $v$ to $T_1[j]$ in $\mathcal{O}(1)$ time.
The astute reader would have recognized the mode of operation of a regular BIT:
query(i)
is the same asprefixSum(i)
,updateRange(i, j, v)
can be implemented asupdate(i, v)
followed byupdate(j + 1, v)
.
It follows the implementation of a point queries / range updates BIT from a range queries / point updates BIT:
Both of these methods take $\mathcal{O}(\log n)$ time, $\mathcal{O}(1)$ space.
Looking at this implementation, you may be wondering why do we need another
class instead of just implementing updateRange
on a regular BIT. The
reason is that it would be incompatible with the semantics of queryRange
.
After calling updateRange(2, 4, v)
on an empty BIT:
query(3)
would return $v$ as expected (because it is implemented asprefixSum(3)
),queryRange(3, 3)
would return 0 (because it is implemented asprefixSum(3)  prefixSum(2)
).
Similarly, you wouldn't want keep update
public because update(i, v)
doesn't have the same semantics as updateRange(i, i, v)
, which is
implemented as update(i, v)
followed by update(i, v)
.
A separate implementation makes it clear that the arraylike datastructure
exposed by the PointQueryRangeUpdate
BIT T
(which can be represented as
$[0, 0, v, v, v, 0, 0]$) differs from the arraylike datastructure
exposed by the RangeQueryPointUpdate
BIT T1
(which can be represented
as [0, 0, v, 0, 0, v, 0]).
That's why point queries / range updates is an entirely different mode of operation that answers to this specific problem. But you'll see in the next section that it is actually possible to implement a BIT that supports both range queries and range updates.
§Range queries / range updates
The range queries / range updates problem consists in implementing an arraylike type $T$ that supports the following operations:
 $\texttt{QUERYRANGE}(T, i, j)$ returns the sum of the values in the range $i..j,$
 $\texttt{UPDATERANGE}(T, i, j, v)$ adds $v$ to all the values in the range $i..j.$
§Example
The point queries / range updates BIT from the previous section only lacks
support for range queries. Implementing queryRange
by calling query
for
each element would give a $\mathcal{O}(n\log n)$ time complexity, so we
need another representation of this BIT to allow efficient range queries.
After calling updateRange(2, 4, v)
, the BIT could be represented by the
following array:
Applying the same logic as in the introduction, we would like to leverage prefix sums to perform efficient range queries on $A$, so let's look at $S$, the prefix sum array of $A$:
$S = [0,\ 0,\ v,\ 2v,\ 3v,\ 3v,\ 3v].$$S$ would have to support an operation that looks like point queries if we ever hope to compute efficient range queries on $A$, but it doesn't seem to have any of the properties that would make updates particularly efficient or that would make it suitable for being backed by a BIT.
The key insight is that you don't need an intermediate representation because you can almost express $S$ directly from $A$ by considering another array $B$ defined by $B[k] = k * A[k]$, such that
$\begin{align*} B &= [0,\ 0,\ 2v,\ 3v,\ 4v,\ 0,\ 0]. \end{align*}$This array doesn't quite match $S$:
 the values are shifted at the beginning,
 the prefix sums are missing at the end.
But this is something you can easily fix by adding some corrective terms, collected in the following array:
$\begin{align*} C & = [0,\ 0,\ v,\ v,\ v,\ 3v,\ 3v]. \end{align*}$You can verify that $S = B + C$:
$\def\arraystretch{1.2}\begin{array}{rllrrrrrrr} B & = & [0, & 0, & 2v, & 3v, & 4v, & 0, & 0]\\ C & = & [0, & 0, & v, & v, & v, & 3v, & 3v]\\ S = B + C & = & [0, & 0, & v, & 2v, & 3v, & 3v, & 3v] \end{array}$$C$ exhibits an interesting property: it is the prefix sum array of an array with only two nonzero elements:
$T_2 = [0,\ 0,\ v,\ 0,\ 0,\ 4v,\ 0].$Assuming we can generalize from this example:
 We already have access to efficient point queries / point updates on $A$ thanks to $T_1$, which allows point queries on $B$ with the same $\mathcal{O}(\log n)$ time complexity using the relation $B[k] = k * \texttt{query($k$)}$.
 We can use a regular BIT to represent $T_2$, providing $\mathcal{O}(\log n)$ time range queries / point updates on $C$.
That means we can implement efficient queries and updates for $S = B + C,$ which solves the problem of efficient range queries and updates on $A$.
§Generalization
Now that you get the intuition behind the solution, let's work through the
general case. After updateRange(i, j, v)
,
Hence, the prefix sum array $S$ of $A$ is defined by
$S[k] = \begin{cases} 0 &\text{if}\ 0 \le k \le i,\\ (k  i + 1)v &\text{if}\ i \le k \le j,\\ (j  i + 1)v &\text{if}\ j < k < n. \end{cases}$In an attempt to approximate $S$, consider $B[k] = k * A[k]$, which gives
$B[k] = \begin{cases} 0 &\text{if } 0 \le k \le i,\\ kv &\text{if } i \le k \le j,\\ 0 &\text{if } j < k < n. \end{cases}$To verify $S[k] = B[k] + C[k]$, the corrective terms array $C$ is defined by
$C[k] = \begin{cases} 0 &\text{if } 0 \le k \le i,\\ (i + 1)v &\text{if } i \le k \le j,\\ (j  i + 1)v &\text{if } j < k < n. \end{cases}$As a result, both $A$ and $C$ can be represented by point queries / range updates BITs:

$A$ can be represented by $T_1$, as shown in the previous section. To compute $B[k]$ from $T_1$, you just have to multiply the result of
T1.prefixSum(k)
byk
. 
$C$ can be represented by $T_2$, another regular BIT updated as follows:
T2.update(i, (i + 1) * v)
T2.update(j + 1, j * v)
§Implementation
Range queries / range updates BITs are implemented using two range queries / point updates BITs:
The complexity for queryRange
and updateRange
is $\mathcal{O}(\log
n)$ time.
§Going further
Here are common problems that can be solved using BITs:

2D range queries and updates on matrices. [GeeksForGeeks]

Counting the number of element inversions in an array. [GeeksForGeeks]

Maintaining the XOR of a range of elements instead of their sum. [GeeksForGeeks]

Orderstatistic tree (BIT + common algorithm). [GeeksForGeeks]

Queue reconstruction by height (BIT + common algorithm). [LeetCode]

Maximum sum increasing subsequence (BIT + another data structure). [GeeksForGeeks]

The problem of arithmetic coding, that requires storing and updating cumulative frequencies, and lead to the discovery of BITs.
Some operation have more efficient solutions:

Implementation of updates with lower constant factor. [Wikipedia]

Binary search in $\mathcal{O}(\log n)$ instead of $\mathcal{O}(\log\log n)$. [SOI, CodeForces]
Finally, you may be interested by segment trees. They are harder to implement, have a higher constant factor compared to BITs, but they provide a wider range of operations and also work with noninvertible functions (for example, you could compute the min/max of a subrange of elements).