sections_fts
Link | rowid | title | content | sections_fts | rank |
---|---|---|---|---|---|
1 | 1 | Plugins | Datasette's plugin system allows additional features to be implemented as Python code (or front-end JavaScript) which can be wrapped up in a separate Python package. The underlying mechanism uses pluggy . See the Datasette plugins directory for a list of existing plugins, or take a look at the datasette-plugin topic on GitHub. Things you can do with plugins include: Add visualizations to Datasette, for example datasette-cluster-map and datasette-vega . Make new custom SQL functions available for use within Datasette, for example datasette-haversine and datasette-jellyfish . Define custom output formats with custom extensions, for example datasette-atom and datasette-ics . Add template functions that can be called within your Jinja custom templates, for example datasette-render-markdown . Customize how database values are rendered in the Datasette interface, for example datasette-render-binary and datasette-pretty-json . Customize how Datasette's authentication and permissions systems work, for example datasette-auth-tokens and datasette-permissions-sql . | 1 | |
2 | 2 | Installing plugins | If a plugin has been packaged for distribution using setuptools you can use the plugin by installing it alongside Datasette in the same virtual environment or Docker container. You can install plugins using the datasette install command: datasette install datasette-vega You can uninstall plugins with datasette uninstall : datasette uninstall datasette-vega You can upgrade plugins with datasette install --upgrade or datasette install -U : datasette install -U datasette-vega This command can also be used to upgrade Datasette itself to the latest released version: datasette install -U datasette These commands are thin wrappers around pip install and pip uninstall , which ensure they run pip in the same virtual environment as Datasette itself. | 1 | |
3 | 3 | One-off plugins using --plugins-dir | You can also define one-off per-project plugins by saving them as plugin_name.py functions in a plugins/ folder and then passing that folder to datasette using the --plugins-dir option: datasette mydb.db --plugins-dir=plugins/ | 1 | |
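To make this concrete, here is a minimal sketch of what such a one-off plugin file might contain. The file name plugins/add_hello_world.py and the hello_world() SQL function are hypothetical; the prepare_connection hook and the hookimpl decorator are the standard Datasette plugin mechanisms.

```python
# plugins/add_hello_world.py - hypothetical one-off plugin for --plugins-dir.
# Registers a custom SQL function on each new database connection.
from datasette import hookimpl


@hookimpl
def prepare_connection(conn):
    # conn is a standard sqlite3 connection object
    conn.create_function("hello_world", 0, lambda: "Hello world")
```

Start Datasette with datasette mydb.db --plugins-dir=plugins/ and the function becomes available to SQL queries, for example select hello_world().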
4 | 4 | Deploying plugins using datasette publish | The datasette publish and datasette package commands both take an optional --install argument. You can use this one or more times to tell Datasette to pip install specific plugins as part of the process: datasette publish cloudrun mydb.db --install=datasette-vega You can use the name of a package on PyPI or any of the other valid arguments to pip install such as a URL to a .zip file: datasette publish cloudrun mydb.db \ --install=https://url-to-my-package.zip | 1 | |
5 | 5 | Seeing what plugins are installed | You can see a list of installed plugins by navigating to the /-/plugins page of your Datasette instance - for example: https://fivethirtyeight.datasettes.com/-/plugins You can also use the datasette plugins command: $ datasette plugins [ { "name": "datasette_json_html", "static": false, "templates": false, "version": "0.4.0" } ] If you run datasette plugins --all it will include default plugins that ship as part of Datasette: [ { "name": "datasette.actor_auth_cookie", "static": false, "templates": false, "version": null, "hooks": [ "actor_from_request" ] }, { "name": "datasette.blob_renderer", "static": false, "templates": false, "version": null, "hooks": [ "register_output_renderer" ] }, { "name": "datasette.default_magic_parameters", "static": false, "templates": false, "version": null, "hooks": [ "register_magic_parameters" ] }, { "name": "datasette.default_menu_links", "static": false, "templates": false, "version": null, "hooks": [ "menu_links" ] }, { "name": "datasette.default_permissions", "static": false, "templ… | 1 | |
6 | 6 | Plugin configuration | Plugins can have their own configuration, embedded in a Metadata file. Configuration options for plugins live within a "plugins" key in that file, which can be included at the root, database or table level. Here is an example of some plugin configuration for a specific table: { "databases": { "sf-trees": { "tables": { "Street_Tree_List": { "plugins": { "datasette-cluster-map": { "latitude_column": "lat", "longitude_column": "lng" } } } } } } } This tells the datasette-cluster-map plugin which latitude and longitude columns should be used for a table called Street_Tree_List inside a database file called sf-trees.db . | 1 | |
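For plugin authors, configuration like the example above can be read back at runtime with datasette.plugin_config(). The sketch below is a hypothetical use of the extra_template_vars hook; the fallback column names are assumptions for illustration only.

```python
# Hypothetical sketch: reading the table-level configuration shown above.
from datasette import hookimpl


@hookimpl
def extra_template_vars(datasette, database, table):
    config = (
        datasette.plugin_config(
            "datasette-cluster-map", database=database, table=table
        )
        or {}
    )
    return {
        "latitude_column": config.get("latitude_column", "latitude"),
        "longitude_column": config.get("longitude_column", "longitude"),
    }
```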
7 | 7 | Secret configuration values | Any values embedded in metadata.json will be visible to anyone who views the /-/metadata page of your Datasette instance. Some plugins may need configuration that should stay secret - API keys for example. There are two ways in which you can store secret configuration values. As environment variables . If your secret lives in an environment variable that is available to the Datasette process, you can indicate that the configuration value should be read from that environment variable like so: { "plugins": { "datasette-auth-github": { "client_secret": { "$env": "GITHUB_CLIENT_SECRET" } } } } As values in separate files . Your secrets can also live in files on disk. To specify a secret should be read from a file, provide the full file path like this: { "plugins": { "datasette-auth-github": { "client_secret": { "$file": "/secrets/client-secret" } } } } If you are publishing your data using the datasette publish family of commands, you can use the --plugin-secret option to set these secrets at publish time. For example, using Heroku you might run the following command: $ datasette publish heroku my_database.db \ --name my-heroku-app-demo \ --install=datasette-auth-github \ --plugin-secret datasette-auth-github client_id your_client_id \ --plugin-secret datasette-auth-github client_secret your_client_secret This will set the necessary environment variables and add the following to the deployed metadata.json : { "plugins": { "datasette-auth-github": { "client_id": { "$env": "DATASETTE_AUTH_GITHUB_CLIENT_ID" }, "client_secret": { "$env": "DATASETTE_AUTH_GITHUB_CLIENT_SECRET" } } } } | 1 | |
8 | 8 | Facets | Datasette facets can be used to add a faceted browse interface to any database table. With facets, tables are displayed along with a summary showing the most common values in specified columns. These values can be selected to further filter the table. Facets can be specified in two ways: using query string parameters, or in metadata.json configuration for the table. | 1 | |
9 | 9 | Facets in query strings | To turn on faceting for specific columns on a Datasette table view, add one or more _facet=COLUMN parameters to the URL. For example, if you want to turn on facets for the city_id and state columns, construct a URL that looks like this: /dbname/tablename?_facet=state&_facet=city_id This works for both the HTML interface and the .json view. When enabled, facets will cause a facet_results block to be added to the JSON output, looking something like this: { "state": { "name": "state", "results": [ { "value": "CA", "label": "CA", "count": 10, "toggle_url": "http://...?_facet=city_id&_facet=state&state=CA", "selected": false }, { "value": "MI", "label": "MI", "count": 4, "toggle_url": "http://...?_facet=city_id&_facet=state&state=MI", "selected": false }, { "value": "MC", "label": "MC", "count": 1, "toggle_url": "http://...?_facet=city_id&_facet=state&state=MC", "selected": false } ], "truncated": false } "city_id": { "name": "city_id", "results": [ { "value": 1, "label": "San Francisco", "count": 6, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=1", "selected": false }, { "value": 2, "label": "Los Angeles", "count": 4, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=2", "selected": false }, { "value": 3, "label": "Detroit", "count": 4, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=3", "selected": false }, { "value": 4, "label": "Memnonia", "count": 1, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=4", "selected": false } ], "truncated": false } } If Datasette detects that a column is a foreign key… | 1 | |
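If you are consuming facets programmatically, a short sketch like the following can pull the facet counts out of the JSON output. It assumes the fixtures/facetable demo table referenced elsewhere in this document and uses httpx, though any HTTP client would do.

```python
# Sketch: request two facets and print the counts from facet_results.
import httpx

response = httpx.get(
    "https://latest.datasette.io/fixtures/facetable.json",
    params={"_facet": ["state", "city_id"]},
)
response.raise_for_status()
for column, facet in response.json()["facet_results"].items():
    for result in facet["results"]:
        print(column, result["label"], result["count"])
```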
10 | 10 | Facets in metadata.json | You can turn facets on by default for specific tables by adding them to a "facets" key in a Datasette Metadata file. Here's an example that turns on faceting by default for the qLegalStatus column in the Street_Tree_List table in the sf-trees database: { "databases": { "sf-trees": { "tables": { "Street_Tree_List": { "facets": ["qLegalStatus"] } } } } } Facets defined in this way will always be shown in the interface and returned in the API, regardless of the _facet arguments passed to the view. You can specify array or date facets in metadata using JSON objects with a single key of array or date and a value specifying the column, like this: { "facets": [ {"array": "tags"}, {"date": "created"} ] } | 1 | |
11 | 11 | Suggested facets | Datasette's table UI will suggest facets for the user to apply, based on the following criteria: For the currently filtered data are there any columns which, if applied as a facet... Will return 30 or fewer unique options Will return more than one unique option Will return fewer unique options than the total number of filtered rows And the query used to evaluate these criteria can be completed in under 50ms That last point is particularly important: Datasette runs a query for every column that is displayed on a page, which could get expensive - so to avoid slow load times it sets a time limit of just 50ms for each of those queries. This means suggested facets are unlikely to appear for tables with millions of records in them. | 1 | |
12 | 12 | Speeding up facets with indexes | The performance of facets can be greatly improved by adding indexes on the columns you wish to facet by. Adding indexes can be performed using the sqlite3 command-line utility. Here's how to add an index on the state column in a table called Food_Trucks : $ sqlite3 mydatabase.db SQLite version 3.19.3 2017-06-27 16:48:08 Enter ".help" for usage hints. sqlite> CREATE INDEX Food_Trucks_state ON Food_Trucks("state"); Or using the sqlite-utils command-line utility: $ sqlite-utils create-index mydatabase.db Food_Trucks state | 1 | |
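The same index can be created from Python using the standard library sqlite3 module - a small sketch, reusing the mydatabase.db / Food_Trucks example from above:

```python
# Create the index used for faceting on the "state" column.
import sqlite3

conn = sqlite3.connect("mydatabase.db")
conn.execute(
    'CREATE INDEX IF NOT EXISTS "Food_Trucks_state" ON "Food_Trucks"("state")'
)
conn.commit()
conn.close()
```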
13 | 13 | Facet by JSON array | If your SQLite installation provides the json1 extension (you can check using /-/versions ) Datasette will automatically detect columns that contain JSON arrays of values and offer a faceting interface against those columns. This is useful for modelling things like tags without needing to break them out into a new table. Example here: latest.datasette.io/fixtures/facetable?_facet_array=tags | 1 | |
14 | 14 | Facet by date | If Datasette finds any columns that contain dates in the first 100 values, it will offer a faceting interface against the dates of those values. This works especially well against timestamp values such as 2019-03-01 12:44:00 . Example here: latest.datasette.io/fixtures/facetable?_facet_date=created | 1 | |
15 | 15 | Testing plugins | We recommend using pytest to write automated tests for your plugins. If you use the template described in Starting an installable plugin using cookiecutter your plugin will start with a single test in your tests/ directory that looks like this: from datasette.app import Datasette import pytest @pytest.mark.asyncio async def test_plugin_is_installed(): datasette = Datasette(memory=True) response = await datasette.client.get("/-/plugins.json") assert response.status_code == 200 installed_plugins = {p["name"] for p in response.json()} assert ( "datasette-plugin-template-demo" in installed_plugins ) This test uses the datasette.client object to exercise a test instance of Datasette. datasette.client is a wrapper around the HTTPX Python library which can imitate HTTP requests using ASGI. This is the recommended way to write tests against a Datasette instance. This test also uses the pytest-asyncio package to add support for async def test functions running under pytest. You can install these packages like so: pip install pytest pytest-asyncio If you are building an installable package you can add them as test dependencies to your setup.py module like this: setup( name="datasette-my-plugin", # ... extras_require={"test": ["pytest", "pytest-asyncio"]}, tests_require=["datasette-my-plugin[test]"], ) You can then install the test dependencies like so: pip install -e '.[test]' Then run the tests using pytest like so: pytest | 1 | |
16 | 16 | Using pdb for errors thrown inside Datasette | If an exception occurs within Datasette itself during a test, the response returned to your plugin will have a response.status_code value of 500. You can add pdb=True to the Datasette constructor to drop into a Python debugger session inside your test run instead of getting back a 500 response code. This is equivalent to running the datasette command-line tool with the --pdb option. Here's what that looks like in a test function: async def test_that_opens_the_debugger_or_errors(): ds = Datasette([db_path], pdb=True) response = await ds.client.get("/") If you use this pattern you will need to run pytest with the -s option to avoid capturing stdin/stdout in order to interact with the debugger prompt. | 1 | |
17 | 17 | Using pytest fixtures | Pytest fixtures can be used to create initial testable objects which can then be used by multiple tests. A common pattern for Datasette plugins is to create a fixture which sets up a temporary test database and wraps it in a Datasette instance. Here's an example that uses the sqlite-utils library to populate a temporary test database. It also sets the title of that table using a simulated metadata.json configuration: from datasette.app import Datasette import pytest import sqlite_utils @pytest.fixture(scope="session") def datasette(tmp_path_factory): db_directory = tmp_path_factory.mktemp("dbs") db_path = db_directory / "test.db" db = sqlite_utils.Database(db_path) db["dogs"].insert_all( [ {"id": 1, "name": "Cleo", "age": 5}, {"id": 2, "name": "Pancakes", "age": 4}, ], pk="id", ) datasette = Datasette( [db_path], metadata={ "databases": { "test": { "tables": { "dogs": {"title": "Some dogs"} } } } }, ) return datasette @pytest.mark.asyncio async def test_example_table_json(datasette): response = await datasette.client.get( "/test/dogs.json?_shape=array" ) assert response.status_code == 200 assert response.json() == [ {"id": 1, "name": "Cleo", "age": 5}, {"id": 2, "name": "Pancakes", "age": 4}, ] @pytest.mark.asyncio async def test_example_table_html(datasette): response = await datasette.client.get("/test/dogs") assert ">Some dogs</h1>" in response.text Here the datasette() function defines the fixture, which is then automatically passed to the two test functions based on pytest automatically matching their datasette function parameters. The @pytest.fixture(scope="session") line here ensures the fixture is reused for the full pytest execution session. This… | 1 | |
18 | 18 | Testing outbound HTTP calls with pytest-httpx | If your plugin makes outbound HTTP calls - for example datasette-auth-github or datasette-import-table - you may need to mock those HTTP requests in your tests. The pytest-httpx package is a useful library for mocking calls. It can be tricky to use with Datasette though since it mocks all HTTPX requests, and Datasette's own testing mechanism uses HTTPX internally. To avoid breaking your tests, you can return ["localhost"] from the non_mocked_hosts() fixture. As an example, here's a very simple plugin which makes an outbound HTTP request and returns the resulting content: from datasette import hookimpl from datasette.utils.asgi import Response import httpx @hookimpl def register_routes(): return [ (r"^/-/fetch-url$", fetch_url), ] async def fetch_url(datasette, request): if request.method == "GET": return Response.html( """ <form action="/-/fetch-url" method="post"> <input type="hidden" name="csrftoken" value="{}"> <input name="url"><input type="submit"> </form>""".format( request.scope["csrftoken"]() ) ) vars = await request.post_vars() url = vars["url"] return Response.text(httpx.get(url).text) Here's a test for that plugin that mocks the HTTPX outbound request: from datasette.app import Datasette import pytest @pytest.fixture def non_mocked_hosts(): # This ensures httpx-mock will not affect Datasette's own # httpx calls made in the tests by datasette.client: return ["localhost"] async def test_outbound_http_call(httpx_mock): httpx_mock.add_response( url="https://www.example.com/", text="Hello world", ) datasette = Datasette([], memory=True) response = await datasette.client.post( "/-/fetch-url", data={"url": "https://www.example.com/"}, ) assert response.text == "Hello world" outbound_request = httpx_mock.get_request()… | 1 | |
19 | 19 | Binary data | SQLite tables can contain binary data in BLOB columns. Datasette includes special handling for these binary values. The Datasette interface detects binary values and provides a link to download their content, for example on https://latest.datasette.io/fixtures/binary_data Binary data is represented in .json exports using Base64 encoding. https://latest.datasette.io/fixtures/binary_data.json?_shape=array [ { "rowid": 1, "data": { "$base64": true, "encoded": "FRwCx60F/g==" } }, { "rowid": 2, "data": { "$base64": true, "encoded": "FRwDx60F/g==" } }, { "rowid": 3, "data": null } ] | 1 | |
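A consumer of the .json export can recover the original bytes by decoding the $base64 wrapper. A minimal sketch, using the binary_data example above:

```python
# Decode the base64-wrapped BLOB values returned by the JSON API.
import base64

import httpx

rows = httpx.get(
    "https://latest.datasette.io/fixtures/binary_data.json?_shape=array"
).json()
for row in rows:
    value = row["data"]
    if value and value.get("$base64"):
        blob = base64.b64decode(value["encoded"])
        print(row["rowid"], len(blob), "bytes")
    else:
        print(row["rowid"], "null")
```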
20 | 20 | Linking to binary downloads | The .blob output format is used to return binary data. It requires a _blob_column= query string argument specifying which BLOB column should be downloaded, for example: https://latest.datasette.io/fixtures/binary_data/1.blob?_blob_column=data This output format can also be used to return binary data from an arbitrary SQL query. Since such queries do not specify an exact row, an additional ?_blob_hash= parameter can be used to specify the SHA-256 hash of the value that is being linked to. Consider the query select data from binary_data - demonstrated here . That page links to the binary value downloads. Those links look like this: https://latest.datasette.io/fixtures.blob?sql=select+data+from+binary_data&_blob_column=data&_blob_hash=f3088978da8f9aea479ffc7f631370b968d2e855eeb172bea7f6c7a04262bb6d These .blob links are also returned in the .csv exports Datasette provides for binary tables and queries, since the CSV format does not have a mechanism for representing binary data. | 1 | |
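The _blob_hash value is the SHA-256 hex digest of the binary content itself, so you can reproduce it for your own data. A sketch, assuming a hypothetical local copy of the fixtures database containing the binary_data table:

```python
# Compute the SHA-256 digest used in _blob_hash for a BLOB value.
import hashlib
import sqlite3

conn = sqlite3.connect("fixtures.db")  # hypothetical local copy
(data,) = conn.execute(
    "select data from binary_data limit 1"
).fetchone()
print(hashlib.sha256(data).hexdigest())
```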
21 | 21 | Binary plugins | Several Datasette plugins are available that change the way Datasette treats binary data. datasette-render-binary modifies Datasette's default interface to show an automatic guess at what type of binary data is being stored, along with a visual representation of the binary value that displays ASCII strings directly in the interface. datasette-render-images detects common image formats and renders them as images directly in the Datasette interface. datasette-media allows Datasette interfaces to be configured to serve binary files from configured SQL queries, and includes the ability to resize images directly before serving them. | 1 | |
22 | 22 | Performance and caching | Datasette runs on top of SQLite, and SQLite has excellent performance. For small databases almost any query should return in just a few milliseconds, and larger databases (100s of MBs or even GBs of data) should perform extremely well provided your queries make sensible use of database indexes. That said, there are a number of tricks you can use to improve Datasette's performance. | 1 | |
23 | 23 | Immutable mode | If you can be certain that a SQLite database file will not be changed by another process you can tell Datasette to open that file in immutable mode . Doing so will disable all locking and change detection, which can result in improved query performance. This also enables further optimizations relating to HTTP caching, described below. To open a file in immutable mode pass it to the datasette command using the -i option: datasette -i data.db When you open a file in immutable mode like this Datasette will also calculate and cache the row counts for each table in that database when it first starts up, further improving performance. | 1 | |
24 | 24 | Using "datasette inspect" | Counting the rows in a table can be a very expensive operation on larger databases. In immutable mode Datasette performs this count only once and caches the results, but this can still cause server startup time to increase by several seconds or more. If you know that a database is never going to change you can precalculate the table row counts once and store them in a JSON file, then use that file when you later start the server. To create a JSON file containing the calculated row counts for a database, use the following: datasette inspect data.db --inspect-file=counts.json Then later you can start Datasette against the counts.json file and use it to skip the row counting step and speed up server startup: datasette -i data.db --inspect-file=counts.json You need to use the -i immutable mode against the database file here or the counts from the JSON file will be ignored. You will rarely need to use this optimization in everyday use, but several of the datasette publish commands described in Publishing data use this optimization for better performance when deploying a database file to a hosting provider. | 1 | |
25 | 25 | HTTP caching | If your database is immutable and guaranteed not to change, you can gain major performance improvements from Datasette by enabling HTTP caching. This can work at two different levels. First, it can tell browsers to cache the results of queries and serve future requests from the browser cache. More significantly, it allows you to run Datasette behind a caching proxy such as Varnish or use a cache provided by a hosted service such as Fastly or Cloudflare . This can provide incredible speed-ups since a query only needs to be executed by Datasette the first time it is accessed - all subsequent hits can then be served by the cache. Using a caching proxy in this way could enable a Datasette-backed visualization to serve thousands of hits a second while running Datasette itself on extremely inexpensive hosting. Datasette's integration with HTTP caches can be enabled using a combination of configuration options and query string arguments. The default_cache_ttl setting sets the default HTTP cache TTL for all Datasette pages. This is 5 seconds unless you change it - you can set it to 0 if you wish to disable HTTP caching entirely. You can also change the cache timeout on a per-request basis using the ?_ttl=10 query string parameter. This can be useful when you are working with the Datasette JSON API - you may decide that a specific query can be cached for a longer time, or maybe you need to set ?_ttl=0 for some requests for example if you are running a SQL order by random() query. | 1 | |
26 | 26 | datasette-hashed-urls | If you open a database file in immutable mode using the -i option, you can be assured that the content of that database will not change for the lifetime of the Datasette server. The datasette-hashed-urls plugin implements an optimization where your database is served with part of the SHA-256 hash of the database contents baked into the URL. A database at /fixtures will instead be served at /fixtures-aa7318b , and a year-long cache expiry header will be returned with those pages. This will then be cached by both browsers and caching proxies such as Cloudflare or Fastly, providing a potentially significant performance boost. To install the plugin, run the following: datasette install datasette-hashed-urls Prior to Datasette 0.61 hashed URL mode was a core Datasette feature, enabled using the hash_urls setting. This implementation has now been removed in favor of the datasette-hashed-urls plugin. Prior to Datasette 0.28 hashed URL mode was the default behaviour for Datasette, since all database files were assumed to be immutable and unchanging. From 0.28 onwards the default has been to treat database files as mutable unless explicitly configured otherwise. | 1 | |
27 | 27 | Getting started | 1 | ||
28 | 28 | Play with a live demo | The best way to experience Datasette for the first time is with a demo: global-power-plants.datasettes.com provides a searchable database of power plants around the world, using data from the World Resources Institute, rendered using the datasette-cluster-map plugin. fivethirtyeight.datasettes.com shows Datasette running against over 400 datasets imported from the FiveThirtyEight GitHub repository . | 1 | |
29 | 29 | Follow a tutorial | Datasette has several tutorials to help you get started with the tool. Try one of the following: Exploring a database with Datasette shows how to use the Datasette web interface to explore a new database. Learn SQL with Datasette introduces SQL, and shows how to use that query language to ask questions of your data. | 1 | |
30 | 30 | Try Datasette without installing anything using Glitch | Glitch is a free online tool for building web apps directly from your web browser. You can use Glitch to try out Datasette without needing to install any software on your own computer. Here's a demo project on Glitch which you can use as the basis for your own experiments: glitch.com/~datasette-csvs Glitch allows you to "remix" any project to create your own copy and start editing it in your browser. You can remix the datasette-csvs project by clicking this button: Find a CSV file and drag it onto the Glitch file explorer panel - datasette-csvs will automatically convert it to a SQLite database (using sqlite-utils ) and allow you to start exploring it using Datasette. If your CSV file has a latitude and longitude column you can visualize it on a map by uncommenting the datasette-cluster-map line in the requirements.txt file using the Glitch file editor. Need some data? Try this Public Art Data for the city of Seattle - hit "Export" and select "CSV" to download it as a CSV file. For more on how this works, see Running Datasette on Glitch . | 1 | |
31 | 31 | Using Datasette on your own computer | First, follow the Installation instructions. Now you can run Datasette against a SQLite file on your computer using the following command: datasette path/to/database.db This will start a web server on port 8001 - visit http://localhost:8001/ to access the web interface. Add -o to open your browser automatically once Datasette has started: datasette path/to/database.db -o Use Chrome on OS X? You can run datasette against your browser history like so: datasette ~/Library/Application\ Support/Google/Chrome/Default/History --nolock The --nolock option ignores any file locks. This is safe as Datasette will open the file in read-only mode. Now visiting http://localhost:8001/History/downloads will show you a web interface to browse your downloads data: http://localhost:8001/History/downloads.json will return that data as JSON: { "database": "History", "columns": [ "id", "current_path", "target_path", "start_time", "received_bytes", "total_bytes", ... ], "rows": [ [ 1, "/Users/simonw/Downloads/DropboxInstaller.dmg", "/Users/simonw/Downloads/DropboxInstaller.dmg", 13097290269022132, 626688, 0, ... ] ] } http://localhost:8001/History/downloads.json?_shape=objects will return that data as JSON in a more convenient format: { ... "rows": [ { "start_time": 13097290269022132, "interrupt_reason": 0, "hash": "", "id": 1, "site_url": "", "referrer": "https://www.dropbox.com/downloading?src=index", ... } ] } | 1 | |
32 | 32 | datasette --get | The --get option can specify the path to a page within Datasette and cause Datasette to output the content from that path without starting the web server. This means that all of Datasette's functionality can be accessed directly from the command-line. For example: $ datasette --get '/-/versions.json' | jq . { "python": { "version": "3.8.5", "full": "3.8.5 (default, Jul 21 2020, 10:48:26) \n[Clang 11.0.3 (clang-1103.0.32.62)]" }, "datasette": { "version": "0.46+15.g222a84a.dirty" }, "asgi": "3.0", "uvicorn": "0.11.8", "sqlite": { "version": "3.32.3", "fts_versions": [ "FTS5", "FTS4", "FTS3" ], "extensions": { "json1": null }, "compile_options": [ "COMPILER=clang-11.0.3", "ENABLE_COLUMN_METADATA", "ENABLE_FTS3", "ENABLE_FTS3_PARENTHESIS", "ENABLE_FTS4", "ENABLE_FTS5", "ENABLE_GEOPOLY", "ENABLE_JSON1", "ENABLE_PREUPDATE_HOOK", "ENABLE_RTREE", "ENABLE_SESSION", "MAX_VARIABLE_NUMBER=250000", "THREADSAFE=1" ] } } The exit code will be 0 if the request succeeds and 1 if the request produced an HTTP status code other than 200 - e.g. a 404 or 500 error. This means you can use datasette --get / to run tests against a Datasette application in a continuous integration environment such as GitHub Actions. Running datasette without specifying a command runs the default command, datasette serve . See datasette serve --help for the full list of options for that command. | 1 | |
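Because the exit code reflects the HTTP status, --get works well as a smoke test script. A sketch using the standard library subprocess module; the /-/versions.json path matches the example above:

```python
# Run "datasette --get" as a smoke test and fail if it exits non-zero.
import subprocess

result = subprocess.run(
    ["datasette", "--get", "/-/versions.json"],
    capture_output=True,
    text=True,
)
assert result.returncode == 0, result.stderr
print(result.stdout[:200])
```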
33 | 33 | Pages and API endpoints | The Datasette web application offers a number of different pages that can be accessed to explore the data in question, each of which is accompanied by an equivalent JSON API. | 1 | |
34 | 34 | Top-level index | The root page of any Datasette installation is an index page that lists all of the currently attached databases. Some examples: fivethirtyeight.datasettes.com global-power-plants.datasettes.com register-of-members-interests.datasettes.com Add /.json to the end of the URL for the JSON version of the underlying data: fivethirtyeight.datasettes.com/.json global-power-plants.datasettes.com/.json register-of-members-interests.datasettes.com/.json | 1 | |
35 | 35 | Database | Each database has a page listing the tables, views and canned queries available for that database. If the execute-sql permission is enabled (it's on by default) there will also be an interface for executing arbitrary SQL select queries against the data. Examples: fivethirtyeight.datasettes.com/fivethirtyeight global-power-plants.datasettes.com/global-power-plants The JSON version of this page provides programmatic access to the underlying data: fivethirtyeight.datasettes.com/fivethirtyeight.json global-power-plants.datasettes.com/global-power-plants.json | 1 | |
36 | 36 | Table | The table page is the heart of Datasette: it allows users to interactively explore the contents of a database table, including sorting, filtering, Full-text search and applying Facets . The HTML interface is worth spending some time exploring. As with other pages, you can return the JSON data by appending .json to the URL path, before any ? query string arguments. The query string arguments are described in more detail here: Table arguments You can also use the table page to interactively construct a SQL query - by applying different filters and a sort order for example - and then click the "View and edit SQL" link to see the SQL query that was used for the page and edit and re-submit it. Some examples: ../items lists all of the line-items registered by UK MPs as potential conflicts of interest. It demonstrates Datasette's support for Full-text search . ../antiquities-act%2Factions_under_antiquities_act is an interface for exploring the "actions under the antiquities act" data table published by FiveThirtyEight. ../global-power-plants?country_long=United+Kingdom&primary_fuel=Gas is a filtered table page showing every Gas power plant in the United Kingdom. It includes some default facets (configured using its metadata.json ) and uses the datasette-cluster-map plugin to show a map of the results. | 1 | |
37 | 37 | Row | Every row in every Datasette table has its own URL. This means individual records can be linked to directly. Table cells with extremely long text contents are truncated on the table view according to the truncate_cells_html setting. If a cell has been truncated the full length version of that cell will be available on the row page. Rows which are the targets of foreign key references from other tables will show a link to a filtered search for all records that reference that row. Here's an example from the Registers of Members Interests database: ../people/uk.org.publicwhip%2Fperson%2F10001 Note that this URL includes the encoded primary key of the record. Here's that same page as JSON: ../people/uk.org.publicwhip%2Fperson%2F10001.json | 1 | |
38 | 38 | The Datasette Ecosystem | Datasette sits at the center of a growing ecosystem of open source tools aimed at making it as easy as possible to gather, analyze and publish interesting data. These tools are divided into two main groups: tools for building SQLite databases (for use with Datasette) and plugins that extend Datasette's functionality. The Datasette project website includes a directory of plugins and a directory of tools: Plugins directory on datasette.io Tools directory on datasette.io | 1 | |
39 | 39 | sqlite-utils | sqlite-utils is a key building block for the wider Datasette ecosystem. It provides a collection of utilities for manipulating SQLite databases, both as a Python library and a command-line utility. Features include: Insert data into a SQLite database from JSON, CSV or TSV, automatically creating tables with the correct schema or altering existing tables to add missing columns. Configure tables for use with SQLite full-text search, including creating triggers needed to keep the search index up-to-date. Modify tables in ways that are not supported by SQLite's default ALTER TABLE syntax - for example changing the types of columns or selecting a new primary key for a table. Adding foreign keys to existing database tables. Extracting columns of data into a separate lookup table. | 1 | |
40 | 40 | Dogsheep | Dogsheep is a collection of tools for personal analytics using SQLite and Datasette. The project provides tools like github-to-sqlite and twitter-to-sqlite that can import data from different sources in order to create a personal data warehouse. Personal Data Warehouses: Reclaiming Your Data is a talk that explains Dogsheep and demonstrates it in action. | 1 | |
41 | 41 | Publishing data | Datasette includes tools for publishing and deploying your data to the internet. The datasette publish command will deploy a new Datasette instance containing your databases directly to a Heroku or Google Cloud hosting account. You can also use datasette package to create a Docker image that bundles your databases together with the datasette application that is used to serve them. | 1 | |
42 | 42 | datasette publish | Once you have created a SQLite database (e.g. using csvs-to-sqlite ) you can deploy it to a hosting account using a single command. You will need a hosting account with Heroku or Google Cloud . Once you have created your account you will need to install and configure the heroku or gcloud command-line tools. | 1 | |
43 | 43 | Publishing to Google Cloud Run | Google Cloud Run allows you to publish data in a scale-to-zero environment, so your application will start running when the first request is received and will shut down again when traffic ceases. This means you only pay for time spent serving traffic. Cloud Run is a great option for inexpensively hosting small, low traffic projects - but costs can add up for projects that serve a lot of requests. Be particularly careful if your project has tables with large numbers of rows. Search engine crawlers that index a page for every row could result in a high bill. The datasette-block-robots plugin can be used to request search engine crawlers omit crawling your site, which can help avoid this issue. You will first need to install and configure the Google Cloud CLI tools by following these instructions . You can then publish a database to Google Cloud Run using the following command: datasette publish cloudrun mydatabase.db --service=my-database A Cloud Run service is a single hosted application. The service name you specify will be used as part of the Cloud Run URL. If you deploy to a service name that you have used in the past your new deployment will replace the previous one. If you omit the --service option you will be asked to pick a service name interactively during the deploy. You may need to interact with prompts from the tool. Many of the prompts ask for values that can be set as properties for the Google Cloud SDK if you want to avoid the prompts. For example, the default region for the deployed instance can be set using the command: gcloud config set run/region us-central1 You should replace us-central1 with your desired region . Alternately, you can specify the region by setting the CLOUDSDK_RUN_REGION environment variable. … | 1 | |
44 | 44 | Publishing to Heroku | To publish your data using Heroku , first create an account there and install and configure the Heroku CLI tool . You can publish a database to Heroku using the following command: datasette publish heroku mydatabase.db This will output some details about the new deployment, including a URL like this one: https://limitless-reef-88278.herokuapp.com/ deployed to Heroku You can specify a custom app name by passing -n my-app-name to the publish command. This will also allow you to overwrite an existing app. See datasette publish heroku --help for the full list of options for this command. | 1 | |
45 | 45 | Publishing to Vercel | Vercel - previously known as Zeit Now - provides a layer over AWS Lambda to allow for quick, scale-to-zero deployment. You can deploy Datasette instances to Vercel using the datasette-publish-vercel plugin. pip install datasette-publish-vercel datasette publish vercel mydatabase.db --project my-database-project Not every feature is supported: consult the datasette-publish-vercel README for more details. | 1 | |
46 | 46 | Publishing to Fly | Fly is a competitively priced Docker-compatible hosting platform that supports running applications in globally distributed data centers close to your end users. You can deploy Datasette instances to Fly using the datasette-publish-fly plugin. pip install datasette-publish-fly datasette publish fly mydatabase.db --app="my-app" Consult the datasette-publish-fly README for more details. | 1 | |
47 | 47 | Custom metadata and plugins | datasette publish accepts a number of additional options which can be used to further customize your Datasette instance. You can define your own Metadata and deploy that with your instance like so: datasette publish cloudrun --service=my-service mydatabase.db -m metadata.json If you just want to set the title, license or source information you can do that directly using extra options to datasette publish : datasette publish cloudrun mydatabase.db --service=my-service \ --title="Title of my database" \ --source="Where the data originated" \ --source_url="http://www.example.com/" You can also specify plugins you would like to install. For example, if you want to include the datasette-vega visualization plugin you can use the following: datasette publish cloudrun mydatabase.db --service=my-service --install=datasette-vega If a plugin has any Secret configuration values you can use the --plugin-secret option to set those secrets at publish time. For example, using Heroku with datasette-auth-github you might run the following command: $ datasette publish heroku my_database.db \ --name my-heroku-app-demo \ --install=datasette-auth-github \ --plugin-secret datasette-auth-github client_id your_client_id \ --plugin-secret datasette-auth-github client_secret your_client_secret | 1 | |
48 | 48 | datasette package | If you have docker installed (e.g. using Docker for Mac ) you can use the datasette package command to create a new Docker image in your local repository containing the datasette app bundled together with your selected SQLite databases: datasette package mydatabase.db Here's example output for the package command: $ datasette package parlgov.db --extra-options="--setting sql_time_limit_ms 2500" Sending build context to Docker daemon 4.459MB Step 1/7 : FROM python:3 ---> 79e1dc9af1c1 Step 2/7 : COPY . /app ---> Using cache ---> cd4ec67de656 Step 3/7 : WORKDIR /app ---> Using cache ---> 139699e91621 Step 4/7 : RUN pip install datasette ---> Using cache ---> 340efa82bfd7 Step 5/7 : RUN datasette inspect parlgov.db --inspect-file inspect-data.json ---> Using cache ---> 5fddbe990314 Step 6/7 : EXPOSE 8001 ---> Using cache ---> 8e83844b0fed Step 7/7 : CMD datasette serve parlgov.db --port 8001 --inspect-file inspect-data.json --setting sql_time_limit_ms 2500 ---> Using cache ---> 1bd380ea8af3 Successfully built 1bd380ea8af3 You can now run the resulting container like so: docker run -p 8081:8001 1bd380ea8af3 This exposes port 8001 inside the container as port 8081 on your host machine, so you can access the application at http://localhost:8081/ You can customize the port that is exposed by the container using the --port option: datasette package mydatabase.db --port 8080 A full list of options can be seen by running datasette package --help : See datasette package --help for the full list of options for this command. | 1 | |
49 | 49 | Running SQL queries | Datasette treats SQLite database files as read-only and immutable. This means it is not possible to execute INSERT or UPDATE statements using Datasette, which allows us to expose SELECT statements to the outside world without needing to worry about SQL injection attacks. The easiest way to execute custom SQL against Datasette is through the web UI. The database index page includes a SQL editor that lets you run any SELECT query you like. You can also construct queries using the filter interface on the tables page, then click "View and edit SQL" to open that query in the custom SQL editor. Note that this interface is only available if the execute-sql permission is allowed. Any Datasette SQL query is reflected in the URL of the page, allowing you to bookmark them, share them with others and navigate through previous queries using your browser back button. You can also retrieve the results of any query as JSON by adding .json to the base URL. | 1 | |
50 | 50 | Named parameters | Datasette has special support for SQLite named parameters. Consider a SQL query like this: select * from Street_Tree_List where "PermitNotes" like :notes and "qSpecies" = :species If you execute this query using the custom query editor, Datasette will extract the two named parameters and use them to construct form fields for you to provide values. You can also provide values for these fields by constructing a URL: /mydatabase?sql=select...&species=44 SQLite string escaping rules will be applied to values passed using named parameters - they will be wrapped in quotes and their content will be correctly escaped. Values from named parameters are treated as SQLite strings. If you need to perform numeric comparisons on them you should cast them to an integer or float first using cast(:name as integer) or cast(:name as real) , for example: select * from Street_Tree_List where latitude > cast(:min_latitude as real) and latitude < cast(:max_latitude as real) Datasette disallows custom SQL queries containing the string PRAGMA (with a small number of exceptions ) as SQLite pragma statements can be used to change database settings at runtime. If you need to include the string "pragma" in a query you can do so safely using a named parameter. | 1 | |
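Named parameters also make it straightforward to call a query from code by passing the values in the query string. A sketch against the Street_Tree_List example above - the host, database name and parameter values are hypothetical:

```python
# Pass values for :notes and :species as ordinary query string arguments.
import httpx

sql = """
select * from Street_Tree_List
where "PermitNotes" like :notes and "qSpecies" = :species
"""
response = httpx.get(
    "http://localhost:8001/mydatabase.json",
    params={"sql": sql, "notes": "%fire%", "species": "44", "_shape": "array"},
)
print(response.json()[:3])
```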
51 | 51 | Views | If you want to bundle some pre-written SQL queries with your Datasette-hosted database you can do so in two ways. The first is to include SQL views in your database - Datasette will then list those views on your database index page. The quickest way to create views is with the SQLite command-line interface: $ sqlite3 sf-trees.db SQLite version 3.19.3 2017-06-27 16:48:08 Enter ".help" for usage hints. sqlite> CREATE VIEW demo_view AS select qSpecies from Street_Tree_List; <CTRL+D> | 1 | |
52 | 52 | Canned queries | As an alternative to adding views to your database, you can define canned queries inside your metadata.json file. Here's an example: { "databases": { "sf-trees": { "queries": { "just_species": { "sql": "select qSpecies from Street_Tree_List" } } } } } Then run Datasette like this: datasette sf-trees.db -m metadata.json Each canned query will be listed on the database index page, and will also get its own URL at: /database-name/canned-query-name For the above example, that URL would be: /sf-trees/just_species You can optionally include "title" and "description" keys to show a title and description on the canned query page. As with regular table metadata you can alternatively specify "description_html" to have your description rendered as HTML (rather than having HTML special characters escaped). | 1 | |
53 | 53 | Canned query parameters | Canned queries support named parameters, so if you include those in the SQL you will then be able to enter them using the form fields on the canned query page or by adding them to the URL. This means canned queries can be used to create custom JSON APIs based on a carefully designed SQL statement. Here's an example of a canned query with a named parameter: select neighborhood, facet_cities.name, state from facetable join facet_cities on facetable.city_id = facet_cities.id where neighborhood like '%' || :text || '%' order by neighborhood; In the canned query metadata (here Using YAML for metadata as metadata.yaml ) it looks like this: databases: fixtures: queries: neighborhood_search: sql: |- select neighborhood, facet_cities.name, state from facetable join facet_cities on facetable.city_id = facet_cities.id where neighborhood like '%' || :text || '%' order by neighborhood title: Search neighborhoods Here's the equivalent using JSON (as metadata.json ): { "databases": { "fixtures": { "queries": { "neighborhood_search": { "sql": "select neighborhood, facet_cities.name, state\nfrom facetable\n join facet_cities on facetable.city_id = facet_cities.id\nwhere neighborhood like '%' || :text || '%'\norder by neighborhood", "title": "Search neighborhoods" } } } } } Note that we are using SQLite string concatenation here - the || operator - to add wildcard % characters to the string provided by the user. You can try this canned query out here: https://latest.datasette.io/fixtures/neighborhood_search?text=town In this example the :text named parameter is automatically extracted from the query using a regular expression. … | 1 | |
54 | 54 | Additional canned query options | Additional options can be specified for canned queries in the YAML or JSON configuration. | 1 | |
55 | 55 | hide_sql | Canned queries default to displaying their SQL query at the top of the page. If the query is extremely long you may want to hide it by default, with a "show" link that can be used to make it visible. Add the "hide_sql": true option to hide the SQL query by default. | 1 | |
56 | 56 | fragment | Some plugins, such as datasette-vega , can be configured by including additional data in the fragment hash of the URL - the bit that comes after a # symbol. You can set a default fragment hash that will be included in the link to the canned query from the database index page using the "fragment" key. This example demonstrates both fragment and hide_sql : { "databases": { "fixtures": { "queries": { "neighborhood_search": { "sql": "select neighborhood, facet_cities.name, state\nfrom facetable join facet_cities on facetable.city_id = facet_cities.id\nwhere neighborhood like '%' || :text || '%' order by neighborhood;", "fragment": "fragment-goes-here", "hide_sql": true } } } } } See here for a demo of this in action. | 1 | |
57 | 57 | Writable canned queries | Canned queries by default are read-only. You can use the "write": true key to indicate that a canned query can write to the database. See Controlling access to specific canned queries for details on how to add permission checks to canned queries, using the "allow" key. { "databases": { "mydatabase": { "queries": { "add_name": { "sql": "INSERT INTO names (name) VALUES (:name)", "write": true } } } } } This configuration will create a page at /mydatabase/add_name displaying a form with a name field. Submitting that form will execute the configured INSERT query. You can customize how Datasette represents success and errors using the following optional properties: on_success_message - the message shown when a query is successful on_success_redirect - the path or URL the user is redirected to on success on_error_message - the message shown when a query throws an error on_error_redirect - the path or URL the user is redirected to on error For example: { "databases": { "mydatabase": { "queries": { "add_name": { "sql": "INSERT INTO names (name) VALUES (:name)", "write": true, "on_success_message": "Name inserted", "on_success_redirect": "/mydatabase/names", "on_error_message": "Name insert failed", "on_error_redirect": "/mydatabase" } } } } } You can use "p… | 1 | |
58 | 58 | Magic parameters | Named parameters that start with an underscore are special: they can be used to automatically add values created by Datasette that are not contained in the incoming form fields or query string. These magic parameters are only supported for canned queries: to avoid security issues (such as queries that extract the user's private cookies) they are not available to SQL that is executed by the user as a custom SQL query. Available magic parameters are: _actor_* - e.g. _actor_id , _actor_name Fields from the currently authenticated Actors . _header_* - e.g. _header_user_agent Header from the incoming HTTP request. The key should be in lower case and with hyphens converted to underscores e.g. _header_user_agent or _header_accept_language . _cookie_* - e.g. _cookie_lang The value of the incoming cookie of that name. _now_epoch The number of seconds since the Unix epoch. _now_date_utc The date in UTC, e.g. 2020-06-01 _now_datetime_utc The ISO 8601 datetime in UTC, e.g. 2020-06-24T18:01:07Z _random_chars_* - e.g. … | 1 | |
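As an illustration of how a magic parameter might be used, the sketch below defines a writable canned query that records the authenticated actor's id via :_actor_id. The comments database and table are hypothetical; the metadata structure follows the canned query examples earlier in this document.

```python
# Hypothetical canned query using the _actor_id magic parameter.
from datasette.app import Datasette

datasette = Datasette(
    ["comments.db"],
    metadata={
        "databases": {
            "comments": {
                "queries": {
                    "add_comment": {
                        "sql": (
                            "insert into comments (author_id, body) "
                            "values (:_actor_id, :body)"
                        ),
                        "write": True,
                    }
                }
            }
        }
    },
)
```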
59 | 59 | JSON API for writable canned queries | Writable canned queries can also be accessed using a JSON API. You can POST data to them using JSON, and you can request that their response is returned to you as JSON. To submit JSON to a writable canned query, encode key/value parameters as a JSON document: POST /mydatabase/add_message {"message": "Message goes here"} You can also continue to submit data using regular form encoding, like so: POST /mydatabase/add_message message=Message+goes+here There are three options for specifying that you would like the response to your request to return JSON data, as opposed to an HTTP redirect to another page. Set an Accept: application/json header on your request Include ?_json=1 in the URL that you POST to Include "_json": 1 in your JSON body, or &_json=1 in your form encoded body The JSON response will look like this: { "ok": true, "message": "Query executed, 1 row affected", "redirect": "/data/add_name" } The "message" and "redirect" values here will take into account on_success_message , on_success_redirect , on_error_message and on_error_redirect , if they have been set. | 1 | |
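From Python this might look like the following sketch - POSTing JSON to the hypothetical add_message query from above and asking for a JSON response via the Accept header:

```python
# Call a writable canned query and request a JSON response.
import httpx

response = httpx.post(
    "http://localhost:8001/mydatabase/add_message",
    json={"message": "Message goes here"},
    headers={"Accept": "application/json"},
)
print(response.json())  # e.g. {"ok": True, "message": "...", "redirect": "..."}
```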
60 | 60 | Pagination | Datasette's default table pagination is designed to be extremely efficient. SQL OFFSET/LIMIT pagination can have a significant performance penalty once you get into multiple thousands of rows, as each page still requires the database to scan through every preceding row to find the correct offset. When paginating through tables, Datasette instead orders the rows in the table by their primary key and performs a WHERE clause against the last seen primary key for the previous page. For example: select rowid, * from Tree_List where rowid > 200 order by rowid limit 101 This represents page three for this particular table, with a page size of 100. Note that we request 101 items in the limit clause rather than 100. This allows us to detect if we are on the last page of the results: if the query returns less than 101 rows we know we have reached the end of the pagination set. Datasette will only return the first 100 rows - the 101st is used purely to detect if there should be another page. Since the where clause acts against the index on the primary key, the query is extremely fast even for records that are a long way into the overall pagination set. | 1 | |
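The same keyset technique can be sketched in plain Python against a local SQLite file, which makes the role of the extra 101st row explicit. The data.db file and Tree_List table here are hypothetical:

```python
# Keyset pagination: order by primary key, filter on the last seen key,
# and fetch one extra row to detect whether another page exists.
import sqlite3

conn = sqlite3.connect("data.db")
page_size = 100
last_seen = None
while True:
    if last_seen is None:
        rows = conn.execute(
            "select rowid, * from Tree_List order by rowid limit ?",
            (page_size + 1,),
        ).fetchall()
    else:
        rows = conn.execute(
            "select rowid, * from Tree_List where rowid > ? "
            "order by rowid limit ?",
            (last_seen, page_size + 1),
        ).fetchall()
    page, has_more = rows[:page_size], len(rows) > page_size
    # ... process the rows in `page` here ...
    if not has_more:
        break
    last_seen = page[-1][0]  # rowid of the last row on this page
```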
61 | 61 | Cross-database queries | SQLite has the ability to run queries that join across multiple databases. Up to ten databases can be attached to a single SQLite connection and queried together. Datasette can execute joins across multiple databases if it is started with the --crossdb option: datasette fixtures.db extra_database.db --crossdb If it is started in this way, the /_memory page can be used to execute queries that join across multiple databases. References to tables in attached databases should be preceded by the database name and a period. For example, this query will show a list of tables across both of the above databases: select 'fixtures' as database, * from [fixtures].sqlite_master union select 'extra_database' as database, * from [extra_database].sqlite_master Try that out here . | 1 | |
62 | 62 | JSON API | Datasette provides a JSON API for your SQLite databases. Anything you can do through the Datasette user interface can also be accessed as JSON via the API. To access the API for a page, either click on the .json link on that page or edit the URL and add a .json extension to it. If you started Datasette with the --cors option, each JSON endpoint will be served with the following additional HTTP headers: Access-Control-Allow-Origin: * Access-Control-Allow-Headers: Authorization Access-Control-Expose-Headers: Link This means JavaScript running on any domain will be able to make cross-origin requests to fetch the data. If you start Datasette without the --cors option only JavaScript running on the same domain as Datasette will be able to access the API. | 1 | |
63 | 63 | Different shapes | The default JSON representation of data from a SQLite table or custom query looks like this: { "database": "sf-trees", "table": "qSpecies", "columns": [ "id", "value" ], "rows": [ [ 1, "Myoporum laetum :: Myoporum" ], [ 2, "Metrosideros excelsa :: New Zealand Xmas Tree" ], [ 3, "Pinus radiata :: Monterey Pine" ] ], "truncated": false, "next": "100", "next_url": "http://127.0.0.1:8001/sf-trees-02c8ef1/qSpecies.json?_next=100", "query_ms": 1.9571781158447266 } The columns key lists the columns that are being returned, and the rows key then returns a list of lists, each one representing a row. The order of the values in each row corresponds to the columns. The _shape parameter can be used to access alternative formats for the rows key which may be more convenient for your application. There are several options: ?_shape=arrays - "rows" is the default option, shown above ?_shape=objects - "rows" is a list of JSON key/value objects ?_shape=array - a JSON array of objects ?_shape=array&_nl=on - a newline-separated list of JSON objects ?_shape=arrayfirst - a flat JSON array containing just the first value from each row ?_shape=object - a JSON object keyed using the primary keys of the rows _shape=objects looks like this: { "database": "sf-trees", ... "rows": [ { "id": 1, … | 1 | |
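A quick way to compare the shapes is to request the same table with different _shape values. This sketch uses the fixtures demo instance referenced elsewhere in this document; the keys printed follow the default shape described above.

```python
# Compare _shape=arrays (default), objects and array for one table.
import httpx

base = "https://latest.datasette.io/fixtures/facetable.json"

default = httpx.get(base).json()
print(default["columns"], default["rows"][0])   # rows as a list of lists

objects = httpx.get(base, params={"_shape": "objects"}).json()
print(objects["rows"][0])                       # rows as key/value objects

array = httpx.get(base, params={"_shape": "array"}).json()
print(array[0])                                 # bare array, no envelope
```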
64 | 64 | Pagination | The default JSON representation includes a "next_url" key which can be used to access the next page of results. If that key is null or missing then it means you have reached the final page of results. Other representations include pagination information in the link HTTP header. That header will look something like this: link: <https://latest.datasette.io/fixtures/sortable.json?_next=d%2Cv>; rel="next" Here is an example Python function built using requests that returns a list of all of the paginated items from one of these API endpoints: def paginate(url): items = [] while url: response = requests.get(url) try: url = response.links.get("next").get("url") except AttributeError: url = None items.extend(response.json()) return items | 1 | |
65 | 65 | Special JSON arguments | Every Datasette endpoint that can return JSON also accepts the following query string arguments: ?_shape=SHAPE The shape of the JSON to return, documented above. ?_nl=on When used with ?_shape=array produces newline-delimited JSON objects. ?_json=COLUMN1&_json=COLUMN2 If any of your SQLite columns contain JSON values, you can use one or more _json= parameters to request that those columns be returned as regular JSON. Without this argument those columns will be returned as JSON objects that have been double-encoded into a JSON string value. Compare this query without the argument to this query using the argument ?_json_infinity=on If your data contains infinity or -infinity values, Datasette will replace them with None when returning them as JSON. If you pass _json_infinity=1 Datasette will instead return them as Infinity or -Infinity which is invalid JSON but can be processed by some custom JSON parsers. ?_timelimit=MS Sets a custom time limit for the query in ms. You can use this for optimistic queries where you would like Datasette to give up if the query takes too long, for example if you want to implement autocomplete search but only… | 1 | |
66 | 66 | Table arguments | The Datasette table view takes a number of special query string arguments. | 1 | |
67 | 67 | Column filter arguments | You can filter the data returned by the table based on column values using a query string argument. ?column__exact=value or ?_column=value Returns rows where the specified column exactly matches the value. ?column__not=value Returns rows where the column does not match the value. ?column__contains=value Rows where the string column contains the specified value ( column like "%value%" in SQL). ?column__endswith=value Rows where the string column ends with the specified value ( column like "%value" in SQL). ?column__startswith=value Rows where the string column starts with the specified value ( column like "value%" in SQL). ?column__gt=value Rows which are greater than the specified value. ?column__gte=value Rows which are greater than or equal to the specified value. ?column__lt=value Rows which are less than the specified value. … | 1 | |
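A sketch of building these filters with the requests library; the URL, column names and values here are hypothetical placeholders, not from the original docs:

    import requests

    params = {
        "_shape": "objects",
        "name__startswith": "A",   # name like "A%"
        "age__gte": 21,            # age >= 21
    }
    data = requests.get(
        "http://127.0.0.1:8001/mydatabase/people.json", params=params
    ).json()
    for row in data["rows"]:
        print(row)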
68 | 68 | Special table arguments | ?_col=COLUMN1&_col=COLUMN2 List specific columns to display. These will be shown along with any primary keys. ?_nocol=COLUMN1&_nocol=COLUMN2 List specific columns to hide - any column not listed will be displayed. Primary keys cannot be hidden. ?_labels=on/off Expand foreign key references for every possible column. See below. ?_label=COLUMN1&_label=COLUMN2 Expand foreign key references for one or more specified columns. ?_size=1000 or ?_size=max Sets a custom page size. This cannot exceed the max_returned_rows limit passed to datasette serve . Use max to get max_returned_rows . ?_sort=COLUMN Sorts the results by the specified column. ?_sort_desc=COLUMN Sorts the results by the specified column in descending order. ?_search=keywords For SQLite tables that have been configured for full-text search executes a search … | 1 | |
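A sketch combining several of these arguments (again with hypothetical database, table and column names):

    import requests

    params = {
        "_shape": "objects",
        "_col": ["name", "age"],   # repeated as _col=name&_col=age
        "_sort_desc": "age",
        "_size": 50,
    }
    data = requests.get(
        "http://127.0.0.1:8001/mydatabase/people.json", params=params
    ).json()
    print([row["name"] for row in data["rows"]])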
69 | 69 | Expanding foreign key references | Datasette can detect foreign key relationships and resolve those references into labels. The HTML interface does this by default for every detected foreign key column - you can turn that off using ?_labels=off . You can request foreign keys be expanded in JSON using the _labels=on or _label=COLUMN special query string parameters. Here's what an expanded row looks like: [ { "rowid": 1, "TreeID": 141565, "qLegalStatus": { "value": 1, "label": "Permitted Site" }, "qSpecies": { "value": 1, "label": "Myoporum laetum :: Myoporum" }, "qAddress": "501X Baker St", "SiteOrder": 1 } ] The column in the foreign key table that is used for the label can be specified in metadata.json - see Specifying the label column for a table . | 1 | |
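A sketch of reading an expanded foreign key from Python, using the column from the example above (the instance URL and table name are placeholders):

    import requests

    url = "http://127.0.0.1:8001/sf-trees/Street_Tree_List.json"
    rows = requests.get(url, params={"_shape": "array", "_labels": "on"}).json()
    for row in rows[:3]:
        expanded = row["qLegalStatus"]   # e.g. {"value": 1, "label": "Permitted Site"}
        print(expanded["value"], expanded["label"])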
70 | 70 | Discovering the JSON for a page | Most of the HTML pages served by Datasette provide a mechanism for discovering their JSON equivalents using the HTML link mechanism. You can find this near the top of the source code of those pages, looking like this: <link rel="alternate" type="application/json+datasette" href="https://latest.datasette.io/fixtures/sortable.json"> The JSON URL is also made available in a Link HTTP header for the page: Link: https://latest.datasette.io/fixtures/sortable.json; rel="alternate"; type="application/json+datasette" | 1 | |
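Since requests parses the Link header automatically, discovering the JSON URL from Python can look like this (a sketch against the demo instance):

    import requests

    response = requests.get("https://latest.datasette.io/fixtures/sortable")
    alternate = response.links.get("alternate")
    if alternate:
        print(alternate["url"])   # .../fixtures/sortable.json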
71 | 71 | Authentication and permissions | Datasette does not require authentication by default. Any visitor to a Datasette instance can explore the full data and execute read-only SQL queries. Datasette's plugin system can be used to add many different styles of authentication, such as user accounts, single sign-on or API keys. | 1 | |
72 | 72 | Actors | Through plugins, Datasette can support both authenticated users (with cookies) and authenticated API agents (via authentication tokens). The word "actor" is used to cover both of these cases. Every request to Datasette has an associated actor value, available in the code as request.actor . This can be None for unauthenticated requests, or a JSON compatible Python dictionary for authenticated users or API agents. The actor dictionary can be any shape - the design of that data structure is left up to the plugins. A useful convention is to include an "id" string, as demonstrated by the "root" actor below. Plugins can use the actor_from_request(datasette, request) hook to implement custom logic for authenticating an actor based on the incoming HTTP request. | 1 | |
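As a sketch of that hook (the header name and shared secret are hypothetical; a real plugin would verify credentials properly):

    from datasette import hookimpl

    API_SECRET = "change-me"   # hypothetical shared secret

    @hookimpl
    def actor_from_request(datasette, request):
        # Treat requests carrying the right header as an authenticated API agent.
        if request.headers.get("x-api-secret") == API_SECRET:
            return {"id": "api-client"}
        return None   # anything else stays unauthenticated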
73 | 73 | Using the "root" actor | Datasette currently leaves almost all forms of authentication to plugins - datasette-auth-github for example. The one exception is the "root" account, which you can sign into while using Datasette on your local machine. This provides access to a small number of debugging features. To sign in as root, start Datasette using the --root command-line option, like this: $ datasette --root http://127.0.0.1:8001/-/auth-token?token=786fc524e0199d70dc9a581d851f466244e114ca92f33aa3b42a139e9388daa7 INFO: Started server process [25801] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit) The URL on the first line includes a one-use token which can be used to sign in as the "root" actor in your browser. Click on that link and then visit http://127.0.0.1:8001/-/actor to confirm that you are authenticated as an actor that looks like this: { "id": "root" } | 1 | |
74 | 74 | Permissions | Datasette has an extensive permissions system built-in, which can be further extended and customized by plugins. The key question the permissions system answers is this: Is this actor allowed to perform this action , optionally against this particular resource ? Actors are described above . An action is a string describing the action the actor would like to perform. A full list is provided below - examples include view-table and execute-sql . A resource is the item the actor wishes to interact with - for example a specific database or table. Some actions, such as permissions-debug , are not associated with a particular resource. Datasette's built-in view permissions ( view-database , view-table etc) default to allow - unless you configure additional permission rules unauthenticated users will be allowed to access content. Permissions with potentially harmful effects should default to deny . Plugin authors should account for this when designing new plugins - for example, the datasette-upload-csvs plugin defaults to deny so that installations don't accidentally allow unauthenticated users to create new tables by uploading a CSV file. | 1 | |
75 | 75 | Defining permissions with "allow" blocks | The standard way to define permissions in Datasette is to use an "allow" block. This is a JSON document describing which actors are allowed to perform an action. The most basic form of allow block is this ( allow demo , deny demo ): { "allow": { "id": "root" } } This will match any actors with an "id" property of "root" - for example, an actor that looks like this: { "id": "root", "name": "Root User" } An allow block can specify "deny all" using false ( demo ): { "allow": false } An "allow" of true allows all access ( demo ): { "allow": true } Allow keys can provide a list of values. These will match any actor that has any of those values ( allow demo , deny demo ): { "allow": { "id": ["simon", "cleopaws"] } } This will match any actor with an "id" of either "simon" or "cleopaws" . Actors can have properties that feature a list of values. These will be matched against the list of values in an allow block. Consider the following actor: { "id": "simon", "roles": ["staff", "developer"] } This allow block will provide access to any actor that has "developer" as one of their roles ( allow demo , deny demo ): { "allow": { "roles": ["developer"] } } Note that "roles" is not a concept that is baked into Datasette - it's a convention that plugins can choose to implement and act on. If you want to provide access to any actor with a value for a specific key, use "*" . For example, to match any logged-in user specify the following ( allow demo , deny demo ): { "allow": { "id": "*" } } You can specify that only unauthenticated actors (from anonymous HTTP requests) should be all… | 1 |
76 | 76 | The /-/allow-debug tool | The /-/allow-debug tool lets you try out different "action" blocks against different "actor" JSON objects. You can try that out here: https://latest.datasette.io/-/allow-debug | 1 | |
77 | 77 | Configuring permissions in metadata.json | You can limit who is allowed to view different parts of your Datasette instance using "allow" keys in your Metadata configuration. You can control the following: Access to the entire Datasette instance Access to specific databases Access to specific tables and views Access to specific Canned queries If a user cannot access a specific database, they will not be able to access tables, views or queries within that database. If a user cannot access the instance they will not be able to access any of the databases, tables, views or queries. | 1 | |
78 | 78 | Controlling access to an instance | Here's how to restrict access to your entire Datasette instance to just the "id": "root" user: { "title": "My private Datasette instance", "allow": { "id": "root" } } To deny access to all users, you can use "allow": false : { "title": "My entirely inaccessible instance", "allow": false } One reason to do this is if you are using a Datasette plugin - such as datasette-permissions-sql - to control permissions instead. | 1 | |
79 | 79 | Controlling access to specific databases | To limit access to a specific private.db database to just authenticated users, use the "allow" block like this: { "databases": { "private": { "allow": { "id": "*" } } } } | 1 | |
80 | 80 | Controlling access to specific tables and views | To limit access to the users table in your bakery.db database: { "databases": { "bakery": { "tables": { "users": { "allow": { "id": "*" } } } } } } This works for SQL views as well - you can list their names in the "tables" block above in the same way as regular tables. Restricting access to tables and views in this way will NOT prevent users from querying them using arbitrary SQL queries, like this for example. If you are restricting access to specific tables you should also use the "allow_sql" block to prevent users from bypassing the limit with their own SQL queries - see Controlling the ability to execute arbitrary SQL . | 1 | |
81 | 81 | Controlling access to specific canned queries | Canned queries allow you to configure named SQL queries in your metadata.json that can be executed by users. These queries can be set up to both read and write to the database, so controlling who can execute them can be important. To limit access to the add_name canned query in your dogs.db database to just the root user : { "databases": { "dogs": { "queries": { "add_name": { "sql": "INSERT INTO names (name) VALUES (:name)", "write": true, "allow": { "id": ["root"] } } } } } } | 1 | |
82 | 82 | Controlling the ability to execute arbitrary SQL | The "allow_sql" block can be used to control who is allowed to execute arbitrary SQL queries, both using the form on the database page e.g. https://latest.datasette.io/fixtures or by appending a ?_where= parameter to the table page as seen on https://latest.datasette.io/fixtures/facetable?_where=city_id=1 . To enable just the root user to execute SQL for all databases in your instance, use the following: { "allow_sql": { "id": "root" } } To limit this ability for just one specific database, use this: { "databases": { "mydatabase": { "allow_sql": { "id": "root" } } } } | 1 | |
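To tie the examples in this section together, one approach (a sketch with hypothetical file and database names) is to build the configuration in Python, write it to metadata.json and pass it to datasette serve with --metadata:

    import json

    metadata = {
        "databases": {
            "mydatabase": {
                "allow_sql": {"id": "root"},            # only root runs arbitrary SQL
                "tables": {
                    "users": {"allow": {"id": "*"}},    # any authenticated actor
                },
            }
        }
    }
    with open("metadata.json", "w") as fp:
        json.dump(metadata, fp, indent=4)

    # Then start Datasette with:
    #     datasette mydatabase.db --metadata metadata.json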
83 | 83 | Checking permissions in plugins | Datasette plugins can check if an actor has permission to perform an action using the datasette.permission_allowed(...) method. Datasette core performs a number of permission checks, documented below . Plugins can implement the permission_allowed(datasette, actor, action, resource) plugin hook to participate in decisions about whether an actor should be able to perform a specified action. | 1 | |
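A minimal sketch of that hook (the deny rule here is invented for illustration): return False to deny, True to allow, or None to express no opinion and leave the decision to other plugins and the defaults.

    from datasette import hookimpl

    BLOCKED_IDS = {"guest"}   # hypothetical list of actors to restrict

    @hookimpl
    def permission_allowed(datasette, actor, action, resource):
        if action == "execute-sql" and actor and actor.get("id") in BLOCKED_IDS:
            return False   # explicit deny for arbitrary SQL
        return None        # no opinion; defer to other plugins and defaults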
84 | 84 | actor_matches_allow() | Plugins that wish to implement this same "allow" block permissions scheme can take advantage of the datasette.utils.actor_matches_allow(actor, allow) function: from datasette.utils import actor_matches_allow actor_matches_allow({"id": "root"}, {"id": "*"}) # returns True The currently authenticated actor is made available to plugins as request.actor . | 1 | |
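A couple more illustrative calls, reusing the role-based allow block shown earlier:

    from datasette.utils import actor_matches_allow

    allow = {"roles": ["developer"]}
    actor_matches_allow({"id": "simon", "roles": ["staff", "developer"]}, allow)  # True
    actor_matches_allow({"id": "cleopaws", "roles": ["dog"]}, allow)              # False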
85 | 85 | The permissions debug tool | The debug tool at /-/permissions is only available to the authenticated root user (or any actor granted the permissions-debug action according to a plugin). It shows the thirty most recent permission checks that have been carried out by the Datasette instance. This is designed to help administrators and plugin authors understand exactly how permission checks are being carried out, in order to effectively configure Datasette's permission system. | 1 | |
86 | 86 | The ds_actor cookie | Datasette includes a default authentication plugin which looks for a signed ds_actor cookie containing a JSON actor dictionary. This is how the root actor mechanism works. Authentication plugins can set signed ds_actor cookies themselves like so: response = Response.redirect("/") response.set_cookie( "ds_actor", datasette.sign({"a": {"id": "cleopaws"}}, "actor"), ) Note that you need to pass "actor" as the namespace to .sign(value, namespace="default") . The shape of data encoded in the cookie is as follows: { "a": {... actor ...} } | 1 | |
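For completeness, a sketch of signing and reading back such a value with the same "actor" namespace (datasette.unsign is assumed to be the counterpart of datasette.sign):

    from datasette.app import Datasette

    datasette = Datasette(memory=True)
    signed = datasette.sign({"a": {"id": "cleopaws"}}, "actor")
    print(datasette.unsign(signed, "actor"))   # {'a': {'id': 'cleopaws'}}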
87 | 87 | Including an expiry time | ds_actor cookies can optionally include a signed expiry timestamp, after which the cookies will no longer be valid. Authentication plugins may choose to use this mechanism to limit the lifetime of the cookie. For example, if a plugin implements single sign-on against another source it may decide to set short-lived cookies so that if the user is removed from the SSO system their existing Datasette cookies will stop working shortly afterwards. To include an expiry, add an "e" key to the cookie value containing a base62-encoded integer representing the timestamp when the cookie should expire. For example, here's how to set a cookie that expires after 24 hours: import time from datasette.utils import baseconv expires_at = int(time.time()) + (24 * 60 * 60) response = Response.redirect("/") response.set_cookie( "ds_actor", datasette.sign( { "a": {"id": "cleopaws"}, "e": baseconv.base62.encode(expires_at), }, "actor", ), ) The resulting cookie will encode data that looks something like this: { "a": { "id": "cleopaws" }, "e": "1jjSji" } | 1 |
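A sketch of honouring that expiry when reading the cookie back (it assumes the dictionary has already been recovered with datasette.unsign and that baseconv.base62 offers a matching decode):

    import time
    from datasette.utils import baseconv

    decoded = {"a": {"id": "cleopaws"}, "e": "1jjSji"}   # value recovered from the cookie
    expires_at = baseconv.base62.decode(decoded["e"])
    if expires_at < time.time():
        actor = None           # expired: treat the request as unauthenticated
    else:
        actor = decoded["a"]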
88 | 88 | The /-/logout page | The page at /-/logout provides the ability to log out of a ds_actor cookie authentication session. | 1 | |
89 | 89 | Built-in permissions | This section lists all of the permission checks that are carried out by Datasette core, along with the resource if it was passed. | 1 | |
90 | 90 | view-instance | Top level permission - Actor is allowed to view any pages within this instance, starting at https://latest.datasette.io/ Default allow . | 1 | |
91 | 91 | view-database | Actor is allowed to view a database page, e.g. https://latest.datasette.io/fixtures resource - string The name of the database Default allow . | 1 | |
92 | 92 | view-database-download | Actor is allowed to download a database, e.g. https://latest.datasette.io/fixtures.db resource - string The name of the database Default allow . | 1 | |
93 | 93 | view-table | Actor is allowed to view a table (or view) page, e.g. https://latest.datasette.io/fixtures/complex_foreign_keys resource - tuple: (string, string) The name of the database, then the name of the table Default allow . | 1 | |
94 | 94 | view-query | Actor is allowed to view (and execute) a canned query page, e.g. https://latest.datasette.io/fixtures/pragma_cache_size - this includes executing Writable canned queries . resource - tuple: (string, string) The name of the database, then the name of the canned query Default allow . | 1 | |
95 | 95 | execute-sql | Actor is allowed to run arbitrary SQL queries against a specific database, e.g. https://latest.datasette.io/fixtures?sql=select+100 resource - string The name of the database Default allow . | 1 | |
96 | 96 | permissions-debug | Actor is allowed to view the /-/permissions debug page. Default deny . | 1 | |
97 | 97 | debug-menu | Controls if the various debug pages are displayed in the navigation menu. Default deny . | 1 | |
98 | 98 | Installation | If you just want to try Datasette out you don't need to install anything: see Try Datasette without installing anything using Glitch There are two main options for installing Datasette. You can install it directly on to your machine, or you can install it using Docker. If you want to start making contributions to the Datasette project by installing a copy that lets you directly modify the code, take a look at our guide to Setting up a development environment . Basic installation Datasette Desktop for Mac Using Homebrew Using pip Advanced installation options Using pipx Installing plugins using pipx Upgrading packages using pipx Using Docker Loading SpatiaLite Installing plugins | 1 | |
99 | 99 | Basic installation | 1 | ||
100 | 100 | Datasette Desktop for Mac | Datasette Desktop is a packaged Mac application which bundles Datasette together with Python and allows you to install and run Datasette directly on your laptop. This is the best option for local installation if you are not comfortable using the command line. | 1 |