Binary-Indexed Trees (BITs) were first introduced by Boris Ryabko in 1989 and later popularized by Peter M. Fenwick in *A New Data Structure for Cumulative Frequency Tables* (1994). This data structure provides an efficient solution to the problem of range queries and updates.
Given a positive integer n, two indices i and j such that 0 ≤ i ≤ j < n, and an integer v, implement an array-like type that provides the following operations:
- `queryRange(i, j)` returns the sum of the values in the range [i, j]
- `update(i, v)` adds v to the value at index i
The goal is to minimize the complexity of the solution, evaluated by performing a sequence of q calls to `queryRange` and u calls to `update` in arbitrary order.
The full source code for the solutions presented in this article is available here.
§Introduction
Let's review two common solutions to this problem that demonstrate the trade-off between fast queries and fast updates.
§Naive solution
The naive solution is the direct translation of the problem statement into code:
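Here is a minimal sketch of what this could look like (the class and member names are illustrative; the exact code is linked above). It stores the values as-is and sums them on demand:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the naive solution: store the values directly, sum them on demand.
class NaiveSolution {
 public:
  explicit NaiveSolution(size_t n) : A(n) {}

  // Sum of the values in the range [i, j].
  int queryRange(size_t i, size_t j) const {
    assert(i <= j && j < A.size());
    int sum = 0;
    for (size_t k = i; k <= j; ++k) {
      sum += A[k];
    }
    return sum;
  }

  // Add v to the value at index i.
  void update(size_t i, int v) {
    assert(i < A.size());
    A[i] += v;
  }

 private:
  std::vector<int> A;
};
```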
If you are not entirely familiar with C++:
- `NaiveSolution()` is a constructor, responsible for initializing class members of new instances. `A(n)` initializes the array with `n` elements set to their default value (0 for integers).
- `assert(bool)` is used to enforce preconditions, causing the program to abort when violated. It ensures indices passed as arguments are valid.
- `size_t` is an unsigned integer type that represents the size of objects. It is commonly used for indexing into arrays (or vectors), and because it is always positive, you only have to assert values of this type against the size of the array.
Assuming n is the maximum number of elements that can be stored in the array, here are the time and space complexities of this implementation:

| Method | Time | Space |
| --- | --- | --- |
| `queryRange` | O(n) | O(n) |
| `update` | O(1) | O(n) |
As a reminder:
- The complexity of an algorithm tells how much time or space it consumes relative to the size of the input, where "time" is best understood as the number of steps.
- O (big O) denotes an upper bound on the complexity of an algorithm, but it commonly indicates the worst case complexity (least upper bound).
For example:
- O(1) indicates the algorithm takes constant time or space, regardless of the size of the input.
- O(n) indicates the algorithm takes linear time or space, relative to an input of size n. That means the time or space the algorithm takes is proportional to the size of the input.
Going back to the original problem, in a sequence of q calls to `queryRange` and u calls to `update`, the naive solution takes O(q·n + u) time and O(n) space.
§Prefix sums
It is possible to frame this problem in terms of prefix sums. For all integers i such that 0 ≤ i < n, the prefix sum P[i] is defined as P[i] = A[0] + A[1] + ... + A[i]. The following figure shows an array A and its prefix sum array P:
The cumulative sum of the elements in a subarray (called a partial sum) can be expressed as the difference between two prefix sums. Given two indices i and j such that 0 < i ≤ j < n, A[i] + ... + A[j] = P[j] - P[i - 1] (for i = 0, the partial sum is simply P[j]).
Instead of directly exposing the values from the internal array like the naive solution, you can replace the internal array A by its prefix sum array P to implement queries and updates:
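A minimal sketch of this solution (again, the names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the prefix sum solution: the member array stores prefix sums.
class PrefixSumSolution {
 public:
  explicit PrefixSumSolution(size_t n) : P(n) {}

  // Sum of the values in the range [i, j], as a difference of two prefix sums.
  int queryRange(size_t i, size_t j) const {
    assert(i <= j && j < P.size());
    return i == 0 ? P[j] : P[j] - P[i - 1];
  }

  // Add v to the value at index i: every prefix sum from i onwards changes.
  void update(size_t i, int v) {
    assert(i < P.size());
    for (size_t k = i; k < P.size(); ++k) {
      P[k] += v;
    }
  }

 private:
  std::vector<int> P;  // P[k] is the sum of the first k + 1 values
};
```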
You should not confuse the interface that this type exposes (an array-like type with the methods `queryRange` and `update`) with how this interface is actually implemented (the internal array now stores prefix sums rather than the original values).
The prefix sum solution has the following complexity:
| Method | Time | Space |
| --- | --- | --- |
| `queryRange` | O(1) | O(n) |
| `update` | O(n) | O(n) |
In a sequence of q calls to `queryRange` and u calls to `update`, the total time complexity is O(q + u·n). Given an arbitrary sequence of queries and updates, this solution doesn't beat the naive solution.
§A better solution
You can represent the prefix sum array P from the previous section as a sequence of ranges covering parts of the original array A:
To understand the inefficiency of updates, consider that changing the first element implies updating all the prefix sums that contain this element. That explains why the maximum number of steps for updates is proportional to the number of elements, matching the O(n) time complexity we determined previously.
§Aggregating partial sums
We need to find a way to split the information contained in this array so both queries and updates are performed in fewer steps, assuming the worst possible sequence of operations. This rearrangement must consist of:
- Less overlap between prefix sums, to provide better update performance. The only way to do that is to rely on partial sums instead of full prefix sums, with just enough overlap not to sacrifice range query performance.
- A simple heuristic to compose partial sums into full prefix sums, so we can apply the same algorithm as the prefix sum solution to efficiently compute range queries.
Here's the incremental construction of candidate arrangements up to 3 elements:
- On the left side, there is only one layer. Each partial sum only aggregates a single element, so the partial sum array is simply equal to the original array. This arrangement corresponds to the naive solution.
- The right side has one layer for each new element, and each of them represents a full prefix sum. This arrangement corresponds to the prefix sum solution.
In the middle, the arrangement provides better worst case performance for both queries and updates than the two other solutions:
- The first two elements have a regular prefix sum that you can access in a single operation.
- The third partial sum only contains the value of the third element, making its update a single operation.
Thus, at most two steps are required to compute any prefix sum:
This decrease in query efficiency compared to the prefix sum solution allows updates of any element in at most 2 steps instead of 3:
You may find other arrangements that appear equally efficient. But our goal is to compute full prefix sums from these partial sums in a way that generalizes to any number of elements. That means you cannot arrange them arbitrarily: their structure needs well-defined properties that we will review next.
§Making it scale
The previous arrangement decreases the maximum number of steps required for both queries and updates compared to the naive solutions, at least up to 3 elements. But that tells us nothing about the complexity of the associated algorithm.
To evaluate its efficiency, the only thing that matters is how the number of operations scales with the size of the input. In particular, you want to make sure that adding new elements doesn't increase the number of steps proportionally, otherwise it would only be as good as our naive solutions.
Suppose you have to go from 3 to 4 elements by adding as few additional steps as possible. For the 4th element, there is no other choice than adding another step, either for queries or for updates:
We can establish that starting with 4 elements, a BIT requires at least 3 update steps and 2 query steps. But if the 4th partial sum is the full prefix sum shown on the right of the previous figure, then you can add 2 more elements without changing the maximum number of steps for both operations:
Note how the structure of the last two partial sums matches the arrangement of the first two. This is a good sign that some generalization is possible.
To better understand the relationship between the number of items and the number of steps, try to increase the number of elements to 8 while minimizing the maximum number of steps for both queries and updates. You should find a structure that looks like this:
A BIT with 8 elements requires at least 4 update steps and 3 query steps, whereas 3 update steps and 2 query steps were required for 4 elements. We managed to double the size of the input without adding more than 1 step to either queries or updates. That gives us good hope of reaching an O(log(n)) time complexity.
§Implementation
The two core operations of BITs are range queries and point updates. To keep things as simple as possible, this section ignores the element at index 0 (as if we were working with a 1-indexed array).
§Range queries
To compute the result of a query, you have to aggregate partial sums until you can reconstruct two full prefix sums, and then take the difference between them like in the naive prefix sum solution.
For every index i > 0, the next index after i is the index of the next non-overlapping partial sum you have to consider in order to reconstruct the full prefix sum up to i.
In the example that follows, adding the non-overlapping partial sums found by following the chain of next indices allows you to retrieve the full prefix sum:
In the previous figure, each index is written in binary. Going to the next index is equivalent to clearing the least significant set bit (LSSB):
If your bit manipulation is a bit rusty:
- `i - 1` sets all the bits below the LSSB, which itself becomes unset:
- `i & (i - 1)` clears all the new bits that were set in `i - 1` below the LSSB, and also the LSSB itself because it was cleared in `i - 1`:
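As a small sketch, the corresponding helper might look like this (the function name is illustrative):

```cpp
// Clear the least significant set bit, e.g. nextIndex(0b0110) == 0b0100.
size_t nextIndex(size_t i) {
  return i & (i - 1);
}
```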
Given `nextIndex`, you can implement a function `prefixSum` that returns a prefix sum of the original array from the BIT. Compared to the prefix sum solution, this method replaces the direct indexing into the prefix sum array P:
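A minimal sketch, assuming the BIT stores its partial sums in a member `std::vector<int> tree` (an illustrative name) and uses the `nextIndex` helper above:

```cpp
// Prefix sum of the original array up to index i (index 0 is ignored for now).
int prefixSum(size_t i) const {
  int sum = 0;
  while (i > 0) {
    sum += tree[i];    // add the partial sum ending at index i
    i = nextIndex(i);  // jump to the next non-overlapping partial sum
  }
  return sum;
}
```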
The implementation of `queryRange` follows:
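Again as a sketch, under the same assumptions:

```cpp
// Sum of the values in the range [i, j] (index 0 is ignored for now).
int queryRange(size_t i, size_t j) const {
  assert(0 < i && i <= j && j < tree.size());
  return prefixSum(j) - prefixSum(i - 1);
}
```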
The time complexity of `queryRange` depends on the time complexity of `prefixSum`, and consequently, on the number of partial sums this function needs to aggregate in order to recompute a full prefix sum. Given that the number of such partial sums corresponds to the number of next indices:
- The maximum index cannot exceed the length n of the array, which means any valid index can be written with only log₂(n) bits.
- At each step, the algorithm clears the LSSB, which effectively removes 1 set bit from the index, so the number of steps cannot exceed log₂(n), the maximum number of bits from the previous point.
As a conclusion, this implementation of `queryRange` works in O(log(n)) time, O(1) space.
§Point updates
The point update operation is similar to its implementation in the prefix sum solution: you have to update all the partial sums that contain the changed element.
For every index i > 0, the parent index of i is the index of the smallest (by number of elements) overlapping partial sum greater (by number of elements) than the partial sum at i.
In the following example, the partial sums found by following the chain of parent indices all contain the updated element, and they are ordered by increasing number of elements:
Looking back at the binary representation, you can go from any index to its parent by adding the LSSB, computed as `i & (-i)`:
Consider how `~i` (the negation of `i`) behaves around the LSSB:
Performing `i & ~i` gives you 0, but looking at the binary representation, you could almost get the LSSB if only the least significant unset bit in `~i` was set:
Since what's on the right of the LSSB doesn't matter for the `&` operation (by definition of the LSSB, these bits are unset in `i`), you can just add 1 to `~i`:
The `&` operation between `i` and `~i + 1` gives you the LSSB:
Because negative integers are represented using the two's complement operation, we have `-i = ~i + 1`, and the formula for the LSSB simplifies to `i & (-i)`.
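As a sketch, the corresponding helpers might look like this (the names `lssb` and `parentIndex` are illustrative):

```cpp
// Isolate the least significant set bit, e.g. lssb(0b0110) == 0b0010.
size_t lssb(size_t i) {
  return i & (~i + 1);  // equivalent to i & (-i) with two's complement
}

// Jump from a partial sum to the parent partial sum that contains it.
size_t parentIndex(size_t i) {
  return i + lssb(i);
}
```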
The implementation of `update` follows:
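A minimal sketch, under the same assumptions as `prefixSum` above:

```cpp
// Add v to the value at index i (index 0 is ignored for now).
void update(size_t i, int v) {
  assert(0 < i && i < tree.size());
  while (i < tree.size()) {
    tree[i] += v;        // every partial sum containing index i changes
    i = parentIndex(i);  // move to the next partial sum containing index i
  }
}
```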
Because each iteration adds the LSSB until the index exceeds the maximum index n, and adding the LSSB moves it by at least 1 bit towards the most significant bit, there can't be more than log₂(n) parent indices.
As a conclusion, this implementation of `update` works in O(log(n)) time, O(1) space.
§Construction
Now that we have the core BIT operations in place, let's see how to build a BIT from an array.
§Using `update`
The simplest method is to initialize an empty BIT (where all the partial sums are initially set to 0), and to call update for each element:
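A sketch of such a constructor, still ignoring index 0 and taking the initial values as input (illustrative signature):

```cpp
// Build the BIT by inserting each element into an initially empty tree.
explicit BIT(const std::vector<int>& values) : tree(values.size()) {
  for (size_t i = 1; i < values.size(); ++i) {
    update(i, values[i]);
  }
}
```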
The time complexity is n times the complexity of `update`, which equals O(n·log(n)), but we can do better.
§Linear time
Partial sums in BITs have two interesting properties:
- Parent indices are greater than their children.
- The partial sum at index `i` ends with the element at that index.
So you can update the array from left to right, adding the value of the current element to its parent, and keep on iterating. Each time you reach a new element, you have already accumulated the partial sums from its descendants (property 1), and the current element is the partial sum at that position (property 2):
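A sketch of a linear-time constructor following this idea (same illustrative names as before):

```cpp
// Build the BIT in linear time: fold each partial sum into its parent.
explicit BIT(const std::vector<int>& values) : tree(values) {
  for (size_t i = 1; i < tree.size(); ++i) {
    size_t parent = parentIndex(i);
    if (parent < tree.size()) {
      tree[parent] += tree[i];  // the parent's partial sum contains index i
    }
  }
}
```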
The complexity is much better than the previous solution: O(n) time.
§Constant space
Our constructors so far work in linear space because the reference to the input vector is only valid while inside the constructor. When you further assign this vector to the class member, all of its elements are copied into the member array.
To eliminate the copy from the input vector to the class member, you can indicate to the compiler that it can move the ownership of the vector with the function `std::move`:
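A sketch of what this might look like, taking the vector by value and moving it into the member:

```cpp
// Build the BIT in linear time and constant additional space: the input vector
// is moved into the class member instead of being copied element by element.
explicit BIT(std::vector<int> values) : tree(std::move(values)) {
  for (size_t i = 1; i < tree.size(); ++i) {
    size_t parent = parentIndex(i);
    if (parent < tree.size()) {
      tree[parent] += tree[i];
    }
  }
}
```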
The generated code will just copy a few bytes for the vector layout (pointer to the data, size, capacity), instead of copying all the elements over.
Because you can only move an object that you own, a copy is still happening when the input vector is provided to the constructor, unless the caller also allows the compiler to move it, using the same technique.
The final complexity for this implementation is O(n) time, O(1) space.
§Index 0
The previous section ignored the element at index 0 because it makes the implementation simpler, but there are two ways to reintroduce this element:
- Adding it back as a special case (the easiest, and commonly how BITs are described).
- Tweaking the formulas of `nextIndex` and `parentIndex` to make it work (more difficult).
Let's look at both.
§Special case
The first solution is to handle index 0 as a special case.
This solution only requires minor tweaks to `queryRange` and `update`, but this is nothing new. Looking back at the prefix sum solution, we handled index 0 similarly.
First, you have to include the element at index 0 in the computation of the prefix sums:
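A sketch, assuming `tree[0]` simply stores the plain value of the element at index 0 (it is not part of the tree structure):

```cpp
// Prefix sum of the original array up to index i, index 0 included.
int prefixSum(size_t i) const {
  int sum = tree[0];  // the element at index 0 is stored as a plain value
  while (i > 0) {
    sum += tree[i];
    i = nextIndex(i);
  }
  return sum;
}
```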
Then, you have to update `queryRange`, because calling `queryRange(0, j)` would call `prefixSum(-1)`, which is undefined:
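A sketch of the updated method:

```cpp
// Sum of the values in the range [i, j], index 0 included.
int queryRange(size_t i, size_t j) const {
  assert(i <= j && j < tree.size());
  if (i == 0) {
    return prefixSum(j);  // avoid calling prefixSum(-1)
  }
  return prefixSum(j) - prefixSum(i - 1);
}
```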
As for `update`, the first index is handled in a similar way:
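A sketch, under the same assumption that `tree[0]` holds the value at index 0:

```cpp
// Add v to the value at index i, index 0 included.
void update(size_t i, int v) {
  assert(i < tree.size());
  if (i == 0) {
    tree[0] += v;  // index 0 is not part of the tree, update it directly
    return;
  }
  while (i < tree.size()) {
    tree[i] += v;
    i = parentIndex(i);
  }
}
```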
For the constructor, you just have to start iterating at index 0:
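For example, with the `update`-based constructor from earlier (a sketch):

```cpp
// Build the BIT by inserting each element, index 0 included.
explicit BIT(const std::vector<int>& values) : tree(values.size()) {
  for (size_t i = 0; i < values.size(); ++i) {
    update(i, values[i]);
  }
}
```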
§Tweaked formulas
Alternatively, you could shift the representation of the BIT to include the index 0 as a regular partial sum.
As an exercise, try to find how you could adjust `nextIndex` and `parentIndex` to make that work (spoilers ahead).
§Advanced modes of operation
We've only covered one mode of operation: range queries / point updates. But regular BITs are the building block for two other modes of operation.
§Point queries / range updates
The point queries / range updates problem consists in implementing an array-like type that supports the following operations:
- `query(i)` returns the value at index i.
- `updateRange(i, j, v)` adds v to all the values in the range [i, j].
Starting from a regular BIT (range queries / point updates), you could implement:
- `query(i)` as `queryRange(i, i)`,
- `updateRange(i, j, v)` by calling `update(k, v)` for every k in [i, j].
Unfortunately, the time complexity of `updateRange` would be O((j - i)·log(n)). The implementation from the naive solution can already provide a better O(j - i) time complexity for this method:
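As a sketch, on the plain array of the naive solution:

```cpp
// Naive range update: add v to each element of the plain array, one at a time.
void updateRange(size_t i, size_t j, int v) {
  assert(i <= j && j < A.size());
  for (size_t k = i; k <= j; ++k) {
    A[k] += v;
  }
}
```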
It is helpful to step back and examine the values of `query(i)` after calling `updateRange(2, 4, v)` on an empty BIT:
In the introduction, you saw that prefix sums can shift a constant time complexity from updates to queries. Indeed, the previous sequence of `query(i)` values can be interpreted as prefix sums of the following array:
So you can implement this problem using prefix sums:
- `query(i)` returns the i-th prefix sum of that array in O(n) time,
- `updateRange(i, j, v)` adds v at index i and -v at index j + 1 in O(1) time.
The astute reader will have recognized the mode of operation of a regular BIT:
- `query(i)` is the same as `prefixSum(i)`,
- `updateRange(i, j, v)` can be implemented as `update(i, v)` followed by `update(j + 1, -v)`.
The implementation of a point queries / range updates BIT on top of a range queries / point updates BIT follows:
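A sketch, assuming the regular BIT class from the previous sections offers a constructor taking the number of elements (all values at 0) and exposes `prefixSum` and `update` (illustrative names):

```cpp
// Point queries / range updates, built on top of a regular BIT.
class PointQueryRangeUpdate {
 public:
  explicit PointQueryRangeUpdate(size_t n) : n(n), bit(n) {}

  // Value at index i: the prefix sum of the underlying difference array.
  int query(size_t i) const { return bit.prefixSum(i); }

  // Add v to all the values in the range [i, j].
  void updateRange(size_t i, size_t j, int v) {
    bit.update(i, v);
    if (j + 1 < n) {
      bit.update(j + 1, -v);  // cancel the update past the end of the range
    }
  }

 private:
  size_t n;  // number of elements
  BIT bit;   // regular range queries / point updates BIT
};
```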
Both of these methods take O(log(n)) time, O(1) space.
Looking at this implementation, you may be wondering why we need another class instead of just implementing `updateRange` on a regular BIT. The reason is that it would be incompatible with the semantics of `queryRange`. After calling `updateRange(2, 4, v)` on an empty BIT:
- `query(3)` would return v as expected (because it is implemented as `prefixSum(3)`),
- `queryRange(3, 3)` would return 0 (because it is implemented as `prefixSum(3) - prefixSum(2)`).
Similarly, you wouldn't want to keep `update` public, because `update(i, v)` doesn't have the same semantics as `updateRange(i, i, v)`, which is implemented as `update(i, v)` followed by `update(i + 1, -v)`.
A separate implementation makes it clear that the array-like data structure exposed by the `PointQueryRangeUpdate` BIT T (which can be represented as [0, 0, v, v, v, 0, 0]) differs from the array-like data structure exposed by the `RangeQueryPointUpdate` BIT T1 (which can be represented as [0, 0, v, 0, 0, -v, 0]).
That's why point queries / range updates is an entirely different mode of operation that answers this specific problem. But you'll see in the next section that it is actually possible to implement a BIT that supports both range queries and range updates.
§Range queries / range updates
The range queries / range updates problem consists in implementing an array-like type that supports the following operations:
- `queryRange(i, j)` returns the sum of the values in the range [i, j]
- `updateRange(i, j, v)` adds v to all the values in the range [i, j]
§Example
The point queries / range updates BIT from the previous section only lacks support for range queries. Implementing `queryRange` by calling `query` for each element would give a O((j - i)·log(n)) time complexity, so we need another representation of this BIT to allow efficient range queries.
After calling `updateRange(2, 4, v)`, the BIT could be represented by the following array:
Applying the same logic as in the introduction, we would like to leverage prefix sums to perform efficient range queries on this array T, so let's look at P, the prefix sum array of T:
P would have to support an operation that looks like point queries if we ever hope to compute efficient range queries on T, but it doesn't seem to have any of the properties that would make updates particularly efficient or that would make it suitable for being backed by a BIT.
The key insight is that you don't need an intermediate representation, because you can almost express P directly from T1 by considering another array S defined by S[k] = k × T1.prefixSum(k), such that:
This array doesn't quite match P:
- the values are shifted at the beginning,
- the prefix sums are missing at the end.
But this is something you can easily fix by adding some corrective terms, collected in the following array C:
You can verify that P[k] = S[k] + C[k]:
C exhibits an interesting property: it is the prefix sum array of an array with only two non-zero elements:
Assuming we can generalize from this example:
- We already have access to efficient point queries / range updates on T thanks to T1, which allows point queries on S with the same time complexity using the relation S[k] = k × T1.prefixSum(k).
- We can use a regular BIT to represent C, providing O(log(n)) time range queries / point updates on its underlying two-element array.
That means we can implement efficient queries and updates for P = S + C, which solves the problem of efficient range queries and updates on T.
§Generalization
Now that you get the intuition behind the solution, let's work through the general case. After `updateRange(i, j, v)`, the array T exposed by the BIT contains:
- T[k] = 0 for k < i,
- T[k] = v for i ≤ k ≤ j,
- T[k] = 0 for k > j.
Hence, the prefix sum array P of T is defined by:
- P[k] = 0 for k < i,
- P[k] = (k - i + 1) × v for i ≤ k ≤ j,
- P[k] = (j - i + 1) × v for k > j.
In an attempt to approximate P, consider S[k] = k × T1.prefixSum(k), which gives:
- S[k] = 0 for k < i,
- S[k] = k × v for i ≤ k ≤ j,
- S[k] = 0 for k > j.
To verify P[k] = S[k] + C[k], the corrective terms array C is defined by:
- C[k] = 0 for k < i,
- C[k] = (-i + 1) × v for i ≤ k ≤ j,
- C[k] = (j - i + 1) × v for k > j.
As a result, both S and C can be represented by point queries / range updates BITs:
- S can be represented by T1, as shown in the previous section. To compute S[k] from T1, you just have to multiply the result of `T1.prefixSum(k)` by `k`.
- C can be represented by T2, another regular BIT updated as follows:
  - `T2.update(i, (-i + 1) * v)`
  - `T2.update(j + 1, j * v)`
§Implementation
Range queries / range updates BITs are implemented using two range queries / point updates BITs:
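A sketch, assuming the index-0-aware BIT from earlier (with public `prefixSum` and `update`, and a constructor taking the number of elements):

```cpp
// Range queries / range updates, built on top of two regular BITs T1 and T2.
class RangeQueryRangeUpdate {
 public:
  explicit RangeQueryRangeUpdate(size_t n) : n(n), t1(n), t2(n) {}

  // Add v to all the values in the range [i, j].
  void updateRange(size_t i, size_t j, int v) {
    const int i_ = static_cast<int>(i), j_ = static_cast<int>(j);
    t1.update(i, v);
    t2.update(i, (-i_ + 1) * v);
    if (j + 1 < n) {
      t1.update(j + 1, -v);
      t2.update(j + 1, j_ * v);
    }
  }

  // Sum of the values in the range [i, j].
  int queryRange(size_t i, size_t j) const {
    return i == 0 ? prefixSum(j) : prefixSum(j) - prefixSum(i - 1);
  }

 private:
  // Prefix sum of the represented array: P[k] = k * T1.prefixSum(k) + T2.prefixSum(k).
  int prefixSum(size_t k) const {
    return static_cast<int>(k) * t1.prefixSum(k) + t2.prefixSum(k);
  }

  size_t n;  // number of elements
  BIT t1;    // backs S, the scaled point queries
  BIT t2;    // backs C, the corrective terms
};
```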
The complexity for `queryRange` and `updateRange` is O(log(n)) time.
§Going further
Here are common problems that can be solved using BITs:
- 2D range queries and updates on matrices. [GeeksForGeeks]
- Counting the number of element inversions in an array. [GeeksForGeeks]
- Maintaining the XOR of a range of elements instead of their sum. [GeeksForGeeks]
- Order-statistic tree (BIT + common algorithm). [GeeksForGeeks]
- Queue reconstruction by height (BIT + common algorithm). [LeetCode]
- Maximum sum increasing subsequence (BIT + another data structure). [GeeksForGeeks]
- The problem of arithmetic coding, which requires storing and updating cumulative frequencies, and led to the discovery of BITs.
Some operations have more efficient solutions:
- Implementation of updates with a lower constant factor. [Wikipedia]
- Binary search in O(log(n)) instead of O(log²(n)). [SOI, CodeForces]
Finally, you may be interested in segment trees. They are harder to implement and have a higher constant factor compared to BITs, but they provide a wider range of operations and also work with non-invertible functions (for example, you could compute the min/max of a subrange of elements).