Querying large collections in Cloud Firestore
2018-01-13
Cloud Firestore is a NoSQL database optimized for storing large collections of small documents. If you, like me, come from a more traditional RDBMS background, it might not always be obvious how to query Firestore collections effectively. So let's take a look at some common scenarios.
Realtime updates
The awesome realtime updates functionality means we can listen for changes to documents in a collection with the onSnapshot method. The initial call contains added events for all existing documents that match the query, which might cause performance problems if the collection holds thousands of documents or more. The solution is to add a date criterion so that only new documents match the query. See more: https://github.com/firebase/firebase-js-sdk/issues/265
For a concrete example, let's say you are building an app where users can rate books. You store the ratings in firebase.firestore().collection('rates'), and you want new ratings to show up in realtime for all connected users. See the code example below:
// NOT SO GOOD
firebase.firestore().collection('rates')
  .onSnapshot(snap => {
    // snap.docs contains all documents in the collection on the initial call
  })
// BETTER
firebase.firestore().collection('rates')
  .where('createdAt', '>', new Date()) // added a date criterion
  .onSnapshot(snap => {
    // snap is empty on the initial call
    if (snap.size > 0) {
      console.log(snap.docs[0].data()) // do something with subsequent events
    }
  })
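The BETTER listener above only logs the first document of each snapshot. If several ratings can arrive at once, it may be safer to walk the snapshot's list of changes instead. Here is a minimal sketch, assuming a newer Firebase JS SDK where docChanges() is a method (in early releases it was a plain property). The helper name addedDocs is my own and not part of the SDK:

```javascript
// Collect the data of every newly added document in a snapshot.
// Assumption: `snap` exposes docChanges() as a method, and each change
// has a `type` ('added' | 'modified' | 'removed') and a `doc`.
function addedDocs(snap) {
  return snap.docChanges()
    .filter(change => change.type === 'added')
    .map(change => change.doc.data())
}

// Usage with the listener from above (not executed here):
// firebase.firestore().collection('rates')
//   .where('createdAt', '>', new Date())
//   .onSnapshot(snap => addedDocs(snap).forEach(rate => console.log(rate)))
```

Keeping the helper free of any Firebase import makes it easy to try out with a hand-made snapshot object.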
Pagination and sorting
Let's say another requirement is to list ratings with paging, a total count, and the option to change the sort order. Maybe the UI looks something like this:
The document structure looks like this for this example:
{
createdAt: "January 10, 2018 at 10:37:10 PM UTC+1", //timestamp type
rate: 7,
uid: "7DwKkQIqFYbU72Gbuzb9WVtvkR53",
userDisplayName: "Test User",
volumeId: "dEocBAAAQBAJ",
volumeTitle: "Php Architect's Guide to Php 5 Migration ",
image: "http://..."
}
Clearly you do not want to fetch all items in one request; it is better to fetch data in batches. This also has the advantage that you avoid pagination and sorting logic on the client side. With query cursors you can split the data returned by a query into batches according to the parameters you define in your query. This is very convenient when you want to paginate data and can't use simple values to define a start point in a dataset. See more in the official documentation.
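To get a feel for what a cursor does, here is a small in-memory model of an orderBy + startAfter + limit query. It is only a sketch under my own assumptions: getPage and the sample data are hypothetical, plain objects with an id stand in for documents, and the real work of course happens on the server.

```javascript
// In-memory model of orderBy(field, dir).startAfter(lastDoc).limit(pageSize).
// `items` is a plain array of objects with a unique `id`;
// `lastDoc` is the last item of the previous page, or null for the first page.
function getPage(items, field, dir, pageSize, lastDoc = null) {
  const sign = dir === 'desc' ? -1 : 1
  const sorted = [...items].sort((a, b) => {
    if (a[field] < b[field]) return -sign
    if (a[field] > b[field]) return sign
    return 0
  })
  // Start right after the cursor item, or at the beginning without a cursor
  const start = lastDoc === null
    ? 0
    : sorted.findIndex(item => item.id === lastDoc.id) + 1
  return sorted.slice(start, start + pageSize)
}

const rates = [
  { id: 'a', rate: 3 }, { id: 'b', rate: 9 }, { id: 'c', rate: 1 },
  { id: 'd', rate: 7 }, { id: 'e', rate: 5 }
]
const page1 = getPage(rates, 'rate', 'desc', 2)
const page2 = getPage(rates, 'rate', 'desc', 2, page1[page1.length - 1])
// page1 holds the rates 9 and 7, page2 holds 5 and 3
console.log(page1.map(r => r.rate), page2.map(r => r.rate))
```

The point is that the client only has to remember the last item it has seen, not a page number or an offset.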
Totals count
There is currently no built-in way to get only the count of all documents in a collection. It is possible to check the size of the querySnapshot, but that means all the documents are loaded, which might result in performance problems:
firebase.firestore().collection('rates')
  .get()
  .then(function (querySnapshot) {
    console.log(querySnapshot.size);
  });
Code example
Let's look at how we could write some JavaScript code that queries Firestore for ratings data and a total ratings count:
import * as firebase from 'firebase/app'
import 'firebase/firestore'
// helper to return the data of a QuerySnapshot as an array
export const snapToArray = snap => {
  if (snap.empty) return []
  return snap.docs.map(doc => doc.data())
}
// helper to get the last document in a QuerySnapshot
export const getLastDoc = snap => {
  if (snap.empty) return null
  return snap.docs[snap.docs.length - 1]
}
// assumes a 'totals/rates' document whose numRates field is incremented for each new rating
export const getRatesCount = async () => {
  const ref = firebase.firestore().collection('totals').doc('rates')
  const snap = await ref.get()
  // fall back to 0 if the totals document has not been created yet
  return snap.exists ? snap.data().numRates : 0
}
// dynamically build the query; param "doc" is the last visible document (null for the first page)
export const getRates = (doc, orderBy, pagesize, dir = 'desc') => {
  let query = firebase.firestore().collection('rates')
    .orderBy(orderBy, dir)
    .limit(pagesize)
  if (doc !== null) {
    query = query.startAfter(doc)
  }
  return query.get() // resolves to a QuerySnapshot
}
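The getRatesCount helper above assumes that something keeps the numRates field in the totals/rates document up to date. One way to do that is to write the rating and bump the counter in the same transaction. The following is only a sketch: addRate is a hypothetical helper, and db is assumed to be a firebase.firestore() instance.

```javascript
// Sketch: store a new rating and increment the counter atomically.
// Assumptions: `db` is a firebase.firestore() instance, and the counter
// lives in the 'totals/rates' document under the field `numRates`.
function addRate(db, rate) {
  const totalsRef = db.collection('totals').doc('rates')
  const rateRef = db.collection('rates').doc() // auto-generated document ID
  return db.runTransaction(async tx => {
    // All reads in a transaction must happen before any writes
    const totals = await tx.get(totalsRef)
    const numRates = totals.exists ? totals.data().numRates : 0
    tx.set(rateRef, rate)
    tx.set(totalsRef, { numRates: numRates + 1 })
  })
}

// Usage (not executed here):
// addRate(firebase.firestore(), { rate: 7, volumeId: 'dEocBAAAQBAJ', createdAt: new Date() })
```

Keep in mind that Firestore documents a limit of roughly one sustained write per second to a single document, so a very busy app would probably need to spread the counter over several documents.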
The question is where to keep the state used for building the query. One approach is to avoid stateful custom classes or variables and use the UI state mechanism your framework provides, if you use one. Let's look at a ReactJS example where all state lives in a React component:
import React, { Component } from 'react'
import RatesRender from './RatesRender'
import { getRates, getRatesCount, snapToArray, getLastDoc } from './RatesQuery'
class Rates extends Component {
  state = {
    loading: true,
    rates: [],
    start: 0,
    count: 0,
    orderBy: 'createdAt',
    pageSize: 10,
    lastDoc: null
  }
  componentDidMount() {
    this._getRates()
    this._getRatesCount()
  }
  _getRatesCount = async () => {
    const count = await getRatesCount()
    this.setState({ count: count })
  }
  _getRates = async (startAfter = null, order = null) => {
    // calculate new state
    const { pageSize, orderBy, start } = this.state
    const newOrderBy = order || orderBy
    const newStart = startAfter === null ? 0 : start + pageSize
    // get new data
    const snap = await getRates(startAfter, newOrderBy, pageSize)
    // re-render with new data
    this.setState({
      rates: snapToArray(snap),
      lastDoc: getLastDoc(snap),
      start: newStart,
      loading: false,
      orderBy: newOrderBy
    })
  }
  setLoading = () => this.setState({ loading: true, rates: [] })
  orderBy = orderBy => {
    this.setLoading()
    this._getRates(null, orderBy) // a new sort order starts again from the first page
  }
  page = startAt => {
    this.setLoading()
    // without an explicit cursor, continue after the last visible document;
    // lastDoc is null when nothing is loaded, which fetches the first page
    this._getRates(startAt || this.state.lastDoc)
  }
  render() {
    // filter out state 'lastDoc', which is not needed for rendering
    const { lastDoc, ...passThroughState } = this.state
    return (
      <RatesRender
        {...passThroughState} // send the rest of the state as props
        onOrderBy={this.orderBy}
        onPage={this.page} />
    )
  }
}
export default Rates
The markup could be separated from state handling and reside in a separate component:
import React from 'react'
import T from 'prop-types'
// stateless function to handle rendering
const RatesRender = props =>
  <div>
    {props.loading && <span>loading...</span>}
    {/* rendering code for paging omitted for brevity ... */}
    {props.rates.map(rate =>
      <div key={rate.uid + rate.volumeId}>
        {rate.volumeTitle}
        {/* ...etc */}
      </div>
    )}
  </div>
RatesRender.propTypes = {
  rates: T.array.isRequired,
  count: T.number.isRequired,
  pageSize: T.number.isRequired,
  start: T.number.isRequired,
  loading: T.bool.isRequired,
  orderBy: T.string.isRequired,
  onPage: T.func.isRequired,
  onOrderBy: T.func.isRequired,
}
export default RatesRender
Some things to be aware of with this example:
- There is no error handling for API errors or similar. This is rather easily handled by wrapping API calls in try/catch and keeping an error property in our state.
- There is no loading indicator/handling for the total count API call. Also, a re-render is done once when the count completes and once when the data fetch completes. This is good for perceived performance but might complicate the rendering logic.
- The total count is not recalculated on paging or on a change of sort order. This could lead to a mismatch if ratings are submitted after the first render.
- We are essentially exposing the Firestore document property names in our UI logic/rendering. We could map the data to a custom object if this is a concern.
- You might want to prevent multiple clicks on pagination links while loading is in progress.
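As an illustration of the first point in the list, the try/catch could live in a small wrapper that turns the outcome of a fetch into a state patch. This is just a sketch: fetchToState is a name I made up, and fetchFn stands in for a call such as () => getRates(...).

```javascript
// Run an async fetch and turn the outcome into a patch for this.setState().
// `fetchFn` is a hypothetical stand-in for a call such as () => getRates(...).
async function fetchToState(fetchFn) {
  try {
    const snap = await fetchFn()
    return { snap, error: null, loading: false }
  } catch (err) {
    // keep only a displayable message in the state
    return { snap: null, error: err.message || String(err), loading: false }
  }
}

// Usage inside the component (not executed here):
// const result = await fetchToState(() => getRates(startAfter, newOrderBy, pageSize))
// if (result.error) return this.setState({ error: result.error, loading: false })
```

The render component could then show the error property instead of the list whenever it is set.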
Indexing
What about indexing? No problem. From the official docs:
All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes.
If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error.
The error message includes a direct link to create the missing index in the Firebase console.