docs
rowid | title | content | sections_fts | rank |
---|---|---|---|---|
1 | Events | Datasette includes a mechanism for tracking events that occur while the software is running. This is primarily intended to be used by plugins, which can both trigger events and listen for events. The core Datasette application triggers events when certain things happen. This page describes those events. Plugins can listen for events using the track_event(datasette, event) plugin hook, which will be called with instances of the following classes - or additional classes registered by other plugins. class datasette.events.LoginEvent actor: dict \| None Event name: login A user (represented by event.actor) has logged in. class datasette.events.LogoutEvent actor: dict \| None Event name: logout A user (represented by event.actor) has logged out. class datasette.events.CreateTokenEvent actor: dict \| None expires_after: int \| None restrict_all: list restrict_database: dict restrict_resource: dict Event name: create-token A user created an API token. Variables expires_after -- Number of seconds after which this token will expire. restrict_all -- Restricted permissions for this token. restrict_database -- Restricted database permissions for this token. … | 106 | |
2 | Facets | Datasette facets can be used to add a faceted browse interface to any database table. With facets, tables are displayed along with a summary showing the most common values in specified columns. These values can be selected to further filter the table. Here's an example: Facets can be specified in two ways: using query string parameters, or in metadata.json configuration for the table. | 106 | |
3 | Facets in query strings | To turn on faceting for specific columns on a Datasette table view, add one or more _facet=COLUMN parameters to the URL. For example, if you want to turn on facets for the city_id and state columns, construct a URL that looks like this: /dbname/tablename?_facet=state&_facet=city_id This works for both the HTML interface and the .json view. When enabled, facets will cause a facet_results block to be added to the JSON output, looking something like this: { "state": { "name": "state", "results": [ { "value": "CA", "label": "CA", "count": 10, "toggle_url": "http://...?_facet=city_id&_facet=state&state=CA", "selected": false }, { "value": "MI", "label": "MI", "count": 4, "toggle_url": "http://...?_facet=city_id&_facet=state&state=MI", "selected": false }, { "value": "MC", "label": "MC", "count": 1, "toggle_url": "http://...?_facet=city_id&_facet=state&state=MC", "selected": false } ], "truncated": false }, "city_id": { "name": "city_id", "results": [ { "value": 1, "label": "San Francisco", "count": 6, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=1", "selected": false }, { "value": 2, "label": "Los Angeles", "count": 4, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=2", "selected": false }, { "value": 3, "label": "Detroit", "count": 4, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=3", "selected": false }, { "value": 4, "label": "Memnonia", "count": 1, "toggle_url": "http://...?_facet=city_id&_facet=state&city_id=4", "selected": false } ], "truncated": false } } If Datasette detect… | 106 | |
4 | Facets in metadata | You can turn facets on by default for specific tables by adding them to a "facets" key in a Datasette Metadata file. Here's an example that turns on faceting by default for the qLegalStatus column in the Street_Tree_List table in the sf-trees database: { "databases": { "sf-trees": { "tables": { "Street_Tree_List": { "facets": ["qLegalStatus"] } } } } } Facets defined in this way will always be shown in the interface and returned in the API, regardless of the _facet arguments passed to the view. You can specify array or date facets in metadata using JSON objects with a single key of array or date and a value specifying the column, like this: { "facets": [ {"array": "tags"}, {"date": "created"} ] } You can change the default facet size (the number of results shown for each facet) for a table using facet_size: { "databases": { "sf-trees": { "tables": { "Street_Tree_List": { "facets": ["qLegalStatus"], "facet_size": 10 } } } } } | 106 | |
5 | Suggested facets | Datasette's table UI will suggest facets for the user to apply, based on the following criteria. For the currently filtered data, are there any columns which, if applied as a facet, would return 30 or fewer unique options; would return more than one unique option; and would return fewer unique options than the total number of filtered rows? The query used to evaluate these criteria must also complete in under 50ms. That last point is particularly important: Datasette runs a query for every column that is displayed on a page, which could get expensive - so to avoid slow load times it sets a time limit of just 50ms for each of those queries. This means suggested facets are unlikely to appear for tables with millions of records in them. | 106 | |
6 | Speeding up facets with indexes | The performance of facets can be greatly improved by adding indexes on the columns you wish to facet by. Adding indexes can be performed using the sqlite3 command-line utility. Here's how to add an index on the state column in a table called Food_Trucks : sqlite3 mydatabase.db SQLite version 3.19.3 2017-06-27 16:48:08 Enter ".help" for usage hints. sqlite> CREATE INDEX Food_Trucks_state ON Food_Trucks("state"); Or using the sqlite-utils command-line utility: sqlite-utils create-index mydatabase.db Food_Trucks state | 106 | |
7 | Facet by JSON array | If your SQLite installation provides the json1 extension (you can check using /-/versions), Datasette will automatically detect columns that contain JSON arrays of values and offer a faceting interface against those columns. This is useful for modelling things like tags without needing to break them out into a new table. Example here: latest.datasette.io/fixtures/facetable?_facet_array=tags | 106 | |
8 | Facet by date | If Datasette finds any columns that contain dates in the first 100 values, it will offer a faceting interface against the dates of those values. This works especially well against timestamp values such as 2019-03-01 12:44:00 . Example here: latest.datasette.io/fixtures/facetable?_facet_date=created | 106 | |
9 | Full-text search | SQLite includes a powerful mechanism for enabling full-text search against SQLite records. Datasette can detect if a table has had full-text search configured for it in the underlying database and display a search interface for filtering that table. Here's an example search: Datasette automatically detects which tables have been configured for full-text search. | 106 | |
10 | The table page and table view API | Table views that support full-text search can be queried using the ?_search=TERMS query string parameter. This will run the search against content from all of the columns that have been included in the index. Try this example: fara.datasettes.com/fara/FARA_All_ShortForms?_search=manafort SQLite full-text search supports wildcards. This means you can easily implement prefix auto-complete by including an asterisk at the end of the search term - for example: /dbname/tablename/?_search=rob* This will return all records containing at least one word that starts with the letters rob . You can also run searches against just the content of a specific named column by using _search_COLNAME=TERMS - for example, this would search for just rows where the name column in the FTS index mentions Sarah : /dbname/tablename/?_search_name=Sarah | 106 | |
11 | Advanced SQLite search queries | SQLite full-text search includes support for a variety of advanced queries, including AND, OR, NOT and NEAR. By default Datasette escapes search terms so that these operators are treated as ordinary words, which avoids errors or confusion for users who are not aware of them. You can disable this escaping and use the advanced queries by adding &_searchmode=raw to the table page query string. If you want to enable these operators by default for a specific table, you can do so by adding "searchmode": "raw" to the metadata configuration for that table; see Configuring full-text search for a table or view. If that option has been specified in the table metadata but you want to override it and return to the default behavior, you can append &_searchmode=escaped to the query string. | 106 | |
12 | Configuring full-text search for a table or view | If a table has a corresponding FTS table set up using the content= argument to CREATE VIRTUAL TABLE shown below, Datasette will detect it automatically and add a search interface to the table page for that table. You can also manually configure which table should be used for full-text search using query string parameters or Metadata. You can set the associated FTS table for a specific table, and you can also set one for a view; if you do that, the page for that SQL view will offer a search option. Use ?_fts_table=x to override the FTS table for a specific page. If the primary key is something other than rowid, you can use ?_fts_pk=col to set that as well. This is particularly useful for views, for example: https://latest.datasette.io/fixtures/searchable_view?_fts_table=searchable_fts&_fts_pk=pk The fts_table metadata property can be used to specify an associated FTS table. If the primary key column in your table which was used to populate the FTS table is something other than rowid, you can specify the column to use with the fts_pk property. The "searchmode": "raw" property can be used to default the table to accepting SQLite advanced search operators, as described in Advanced SQLite search queries. Here is an example which enables full-text search (with SQLite advanced search operators) for a display_ads view which is defined against the ads table and hence needs to run FTS against the ads_fts table, using the id as the primary key: { "databases": { "russian-ads": { "tables": { "display_ads": { "fts_table": "ads_fts", "fts_pk": "id", "searchmode": "raw" } } } } } | 106 | |
13 | Searches using custom SQL | You can include full-text search results in custom SQL queries. The general pattern with SQLite search is to run the search as a sub-select that returns rowid values, then include those rowids in another part of the query. You can see the syntax for a basic search by running that search on a table page and then clicking "View and edit SQL" to see the underlying SQL. For example, consider this search for manafort in the US FARA database: /fara/FARA_All_ShortForms?_search=manafort If you click View and edit SQL you'll see that the underlying SQL looks like this: select rowid, Short_Form_Termination_Date, Short_Form_Date, Short_Form_Last_Name, Short_Form_First_Name, Registration_Number, Registration_Date, Registrant_Name, Address_1, Address_2, City, State, Zip from FARA_All_ShortForms where rowid in ( select rowid from FARA_All_ShortForms_fts where FARA_All_ShortForms_fts match escape_fts(:search) ) order by rowid limit 101 | 106 | |
14 | Enabling full-text search for a SQLite table | Datasette takes advantage of the external content mechanism in SQLite, which allows a full-text search virtual table to be associated with the contents of another SQLite table. To set up full-text search for a table, you need to do two things: Create a new FTS virtual table associated with your table Populate that FTS table with the data that you would like to be able to run searches against | 106 | |
15 | Configuring FTS using sqlite-utils | sqlite-utils is a CLI utility and Python library for manipulating SQLite databases. You can use it from Python code to configure FTS search, or you can achieve the same goal using the accompanying command-line tool . Here's how to use sqlite-utils to enable full-text search for an items table across the name and description columns: sqlite-utils enable-fts mydatabase.db items name description | 106 | |
16 | Configuring FTS using csvs-to-sqlite | If your data starts out in CSV files, you can use Datasette's companion tool csvs-to-sqlite to convert those files into a SQLite database and enable full-text search on specific columns. For a file called items.csv where you want full-text search to operate against the name and description columns you would run the following: csvs-to-sqlite items.csv items.db -f name -f description | 106 | |
17 | Configuring FTS by hand | We recommend using sqlite-utils , but if you want to hand-roll a SQLite full-text search table you can do so using the following SQL. To enable full-text search for a table called items that works against the name and description columns, you would run this SQL to create a new items_fts FTS virtual table: CREATE VIRTUAL TABLE "items_fts" USING FTS4 ( name, description, content="items" ); This creates a set of tables to power full-text search against items . The new items_fts table will be detected by Datasette as the fts_table for the items table. Creating the table is not enough: you also need to populate it with a copy of the data that you wish to make searchable. You can do that using the following SQL: INSERT INTO "items_fts" (rowid, name, description) SELECT rowid, name, description FROM items; If your table has columns that are foreign key references to other tables you can include that data in your full-text search index using a join. Imagine the items table has a foreign key column called category_id which refers to a categories table - you could create a full-text search table like this: CREATE VIRTUAL TABLE "items_fts" USING FTS4 ( name, description, category_name, content="items" ); And then populate it like this: INSERT INTO "items_fts" (rowid, name, description, category_name) SELECT items.rowid, items.name, items.description, categories.name FROM items JOIN categories ON items.category_id=categories.id; You can use this technique to populate the full-text search index from any combination of tables and joins that makes sense for your project. | 106 | |
18 | FTS versions | There are three different versions of the SQLite FTS module: FTS3, FTS4 and FTS5. You can tell which versions are supported by your instance of Datasette by checking the /-/versions page. FTS5 is the most advanced module but may not be available in the SQLite version that is bundled with your Python installation. Most importantly, FTS5 is the only version that has the ability to order by search relevance without needing extra code. If you can't be sure that FTS5 will be available, you should use FTS4. | 106 | |
19 | Plugin hooks | Datasette plugins use plugin hooks to customize Datasette's behavior. These hooks are powered by the pluggy plugin system. Each plugin can implement one or more hooks using the @hookimpl decorator against a function whose name matches one of the hooks documented on this page. When you implement a plugin hook you can accept any or all of the parameters that are documented as being passed to that hook. For example, you can implement the render_cell plugin hook like this even though the full documented hook signature is render_cell(row, value, column, table, database, datasette) : @hookimpl def render_cell(value, column): if column == "stars": return "*" * int(value) List of plugin hooks prepare_connection(conn, database, datasette) prepare_jinja2_environment(env, datasette) Page extras extra_template_vars(template, database, table, columns, view_name, request, datasette) extra_css_urls(template, database, table, columns, view_name, request, datasette) extra_js_urls(template, database, table, columns, view_name, request, datasette) extra_body_script(template, database, table, columns, view_name, request, datasette) publish_subcommand(publish) render_cell(row, value, column, table, database, datasette, request) register_output_re… | 106 | |
20 | prepare_connection(conn, database, datasette) | conn - sqlite3 connection object The connection that is being opened database - string The name of the database datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) This hook is called when a new SQLite database connection is created. You can use it to register custom SQL functions , aggregates and collations. For example: from datasette import hookimpl import random @hookimpl def prepare_connection(conn): conn.create_function( "random_integer", 2, random.randint ) This registers a SQL function called random_integer which takes two arguments and can be called like this: select random_integer(1, 10); Examples: datasette-jellyfish , datasette-jq , datasette-haversine , datasette-rure | 106 | |
21 | prepare_jinja2_environment(env, datasette) | env - jinja2 Environment The template environment that is being prepared datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) This hook is called with the Jinja2 environment that is used to evaluate Datasette HTML templates. You can use it to do things like register custom template filters, for example: from datasette import hookimpl @hookimpl def prepare_jinja2_environment(env): env.filters["uppercase"] = lambda u: u.upper() You can now use this filter in your custom templates like so: Table name: {{ table\|uppercase }} This function can return an awaitable function if it needs to run any async code. Examples: datasette-edit-templates | 106 | |
22 | Page extras | These plugin hooks can be used to affect the way HTML pages for different Datasette interfaces are rendered. | 106 | |
23 | extra_template_vars(template, database, table, columns, view_name, request, datasette) | Extra template variables that should be made available in the rendered template context. template - string The template that is being rendered, e.g. database.html database - string or None The name of the database, or None if the page does not correspond to a database (e.g. the root page) table - string or None The name of the table, or None if the page does not correspond to a table columns - list of strings or None The names of the database columns that will be displayed on this page. None if the page does not contain a table. view_name - string The name of the view being displayed. ( index , database , table , and row are the most important ones.) request - Request object or None The current HTTP request. This can be None if the request object is not available. datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) This… | 106 | |
24 | extra_css_urls(template, database, table, columns, view_name, request, datasette) | This takes the same arguments as extra_template_vars(...) Return a list of extra CSS URLs that should be included on the page. These can take advantage of the CSS class hooks described in Custom pages and templates . This can be a list of URLs: from datasette import hookimpl @hookimpl def extra_css_urls(): return [ "https://stackpath.bootstrapcdn.com/bootstrap/4.1.0/css/bootstrap.min.css" ] Or a list of dictionaries defining both a URL and an SRI hash : @hookimpl def extra_css_urls(): return [ { "url": "https://stackpath.bootstrapcdn.com/bootstrap/4.1.0/css/bootstrap.min.css", "sri": "sha384-9gVQ4dYFwwWSjIDZnLEWnxCjeSWFphJiwGPXr1jddIhOegiu1FwO5qRGvFXOdJZ4", } ] This function can also return an awaitable function, useful if it needs to run any async code: @hookimpl def extra_css_urls(datasette): async def inner(): db = datasette.get_database() results = await db.execute( "select url from css_files" ) return [r[0] for r in results] return inner Examples: datasette-cluster-map , datasette-vega | 106 | |
25 | extra_js_urls(template, database, table, columns, view_name, request, datasette) | This takes the same arguments as extra_template_vars(...) This works in the same way as extra_css_urls() but for JavaScript. You can return a list of URLs, a list of dictionaries or an awaitable function that returns those things: from datasette import hookimpl @hookimpl def extra_js_urls(): return [ { "url": "https://code.jquery.com/jquery-3.3.1.slim.min.js", "sri": "sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo", } ] You can also return URLs to files from your plugin's static/ directory, if you have one: @hookimpl def extra_js_urls(): return ["/-/static-plugins/your-plugin/app.js"] Note that your-plugin here should be the hyphenated plugin name - the name that is displayed in the list on the /-/plugins debug page. If your code uses JavaScript modules you should include the "module": True key. See Custom CSS and JavaScript for more details. @hookimpl def extra_js_urls(): return [ { "url": "/-/static-plugins/your-plugin/app.js", "module": True, } ] Examples: datasette-cluster-map , datasette-vega | 106 | |
26 | extra_body_script(template, database, table, columns, view_name, request, datasette) | Extra JavaScript to be added to a <script> block at the end of the <body> element on the page. This takes the same arguments as extra_template_vars(...) The template , database , table and view_name options can be used to return different code depending on which template is being rendered and which database or table are being processed. The datasette instance is provided primarily so that you can consult any plugin configuration options that may have been set, using the datasette.plugin_config(plugin_name) method documented above. This function can return a string containing JavaScript, or a dictionary as described below, or a function or awaitable function that returns a string or dictionary. Use a dictionary if you want to specify that the code should be placed in a <script type="module">...</script> element: @hookimpl def extra_body_script(): return { "module": True, "script": "console.log('Your JavaScript goes here...')", } This will add the following to the end of your page: <script type="module">console.log('Your JavaScript goes here...')</script> Example: datasette-cluster-map | 106 | |
27 | publish_subcommand(publish) | publish - Click publish command group The Click command group for the datasette publish subcommand This hook allows you to create new providers for the datasette publish command. Datasette uses this hook internally to implement the default cloudrun and heroku subcommands, so you can read their source to see examples of this hook in action. Let's say you want to build a plugin that adds a datasette publish my_hosting_provider --api_key=xxx mydatabase.db publish command. Your implementation would start like this: from datasette import hookimpl from datasette.publish.common import ( add_common_publish_arguments_and_options, ) import click @hookimpl def publish_subcommand(publish): @publish.command() @add_common_publish_arguments_and_options @click.option( "-k", "--api_key", help="API key for talking to my hosting provider", ) def my_hosting_provider( files, metadata, extra_options, branch, template_dir, plugins_dir, static, install, plugin_secret, version_note, secret, title, license, license_url, source, source_url, about, about_url, api_key, ): ... Examples: datasette-publish-fly , datasette-publish-vercel | 106 | |
28 | render_cell(row, value, column, table, database, datasette, request) | Lets you customize the display of values within table cells in the HTML table view. row - sqlite3.Row The SQLite row object that the value being rendered is part of value - string, integer, float, bytes or None The value that was loaded from the database column - string The name of the column being rendered table - string or None The name of the table - or None if this is a custom SQL query database - string The name of the database datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. request - Request object The current request object If your hook returns None , it will be ignored. Use this to indicate that your hook is not able to custom render this particular value. If the hook returns a string, that string will be rendered in the table cell. If you want to return HTML markup you can do so by returning a markupsafe.Markup object (jinja2.Markup was removed in Jinja 3.1). You can also return an awaitable function which returns a value. Datasette will loop through… | 106 | |
29 | register_output_renderer(datasette) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) Registers a new output renderer, to output data in a custom format. The hook function should return a dictionary, or a list of dictionaries, of the following shape: @hookimpl def register_output_renderer(datasette): return { "extension": "test", "render": render_demo, "can_render": can_render_demo, # Optional } This will register render_demo to be called when paths with the extension .test (for example /database.test , /database/table.test , or /database/table/row.test ) are requested. render_demo is a Python function. It can be a regular function or an async def render_demo() awaitable function, depending on if it needs to make any asynchronous calls. can_render_demo is a Python function (or async def function) which accepts the same arguments as render_demo but just returns True or False . It lets Datasette know if the current SQL query can be represented by the plugin - and hence influence if a link to this output format is displayed in the user interface. If you omit the "can_render" key from the dictionary every query will be treated as being supported by the plugin. When a request is received, the "render" callback function is called with zero or more of the following arguments. Datasette will inspect your callback function and pass arguments that match its function signature. datasette - Datasette class For accessing plugin configuration and executing queries. columns - list of strings The names of the columns … | 106 | |
30 | register_routes(datasette) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) Register additional view functions to execute for specified URL routes. Return a list of (regex, view_function) pairs, something like this: from datasette import hookimpl, Response import html async def hello_from(request): name = request.url_vars["name"] return Response.html( "Hello from {}".format(html.escape(name)) ) @hookimpl def register_routes(): return [(r"^/hello-from/(?P<name>.*)$", hello_from)] The view functions can take a number of different optional arguments. The corresponding argument will be passed to your function depending on its named parameters - a form of dependency injection. The optional view function arguments are as follows: datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. request - Request object The current HTTP request. scope - dictionary The incoming ASGI scope dictionary. send - function The ASGI send function. receive - function The ASGI receive function. The view funct… | 106 | |
31 | register_commands(cli) | cli - the root Datasette Click command group Use this to register additional CLI commands Register additional CLI commands that can be run using datasette yourcommand ... . This provides a mechanism by which plugins can add new CLI commands to Datasette. This example registers a new datasette verify file1.db file2.db command that checks if the provided file paths are valid SQLite databases: from datasette import hookimpl import click import sqlite3 @hookimpl def register_commands(cli): @cli.command() @click.argument( "files", type=click.Path(exists=True), nargs=-1 ) def verify(files): "Verify that files can be opened by Datasette" for file in files: conn = sqlite3.connect(str(file)) try: conn.execute("select * from sqlite_master") except sqlite3.DatabaseError: raise click.ClickException( "Invalid database: {}".format(file) ) The new command can then be executed like so: datasette verify fixtures.db Help text (from the docstring for the function plus any defined Click arguments or options) will become available using: datasette verify --help Plugins can register multiple commands by making multiple calls to the @cli.command() decorator. Consult the Click documentation for full details on how to build a CLI command, including how to define arguments and options. Note that register_commands() plugins cannot be used with the --plugins-dir mechanism - they need to be installed into the same virtual environment as Datasette using pip install . Provided it has a setup.py file (see Packaging a plugin ) you can run pip install directly against the directory in which you are developing your plugin like so: pip install -e path/t… | 106 | |
32 | register_facet_classes() | Return a list of additional Facet subclasses to be registered. The design of this plugin hook is unstable and may change. See issue 830 . Each Facet subclass implements a new type of facet operation. The class should look like this: class SpecialFacet(Facet): # This key must be unique across all facet classes: type = "special" async def suggest(self): # Use self.sql and self.params to suggest some facets suggested_facets = [] suggested_facets.append( { "name": column, # Or other unique name # Construct the URL that will enable this facet: "toggle_url": self.ds.absolute_url( self.request, path_with_added_args( self.request, {"_facet": column} ), ), } ) return suggested_facets async def facet_results(self): # This should execute the facet operation and return results, again # using self.sql and self.params as the starting point facet_results = [] facets_timed_out = [] facet_size = self.get_facet_size() # Do some calculations here... for column in columns_selected_for_facet: try: facet_results_values = [] # More calculations... facet_results_values.append( { "value": value, "label": label, "count": count, "toggle_url": self.ds.absolute_url( self.request, toggle_path ), "selected": selected, } ) facet_results.append( { "name": column, "results": facet_results_values, "trunc… | 106 | |
33 | register_permissions(datasette) | If your plugin needs to register additional permissions unique to that plugin - upload-csvs for example - you can return a list of those permissions from this hook. from datasette import hookimpl, Permission @hookimpl def register_permissions(datasette): return [ Permission( name="upload-csvs", abbr=None, description="Upload CSV files", takes_database=True, takes_resource=False, default=False, ) ] The fields of the Permission class are as follows: name - string The name of the permission, e.g. upload-csvs . This should be unique across all plugins that the user might have installed, so choose carefully. abbr - string or None An abbreviation of the permission, e.g. uc . This is optional - you can set it to None if you do not want to pick an abbreviation. Since this needs to be unique across all installed plugins it's best not to specify an abbreviation at all. If an abbreviation is provided it will be used when creating restricted signed API tokens. description - string or None A human-readable description of what the permission lets you do. Should make sense as the second part of a sentence that starts "A user with this permission can ...". takes_database - boolean True if this permission can be granted on a per-database basis, False if it is only valid at the overall Datasette instance level. takes_resource - boolean … | 106 | |
34 | asgi_wrapper(datasette) | Return an ASGI middleware wrapper function that will be applied to the Datasette ASGI application. This is a very powerful hook. You can use it to manipulate the entire Datasette response, or even to configure new URL routes that will be handled by your own custom code. You can write your ASGI code directly against the low-level specification, or you can use the middleware utilities provided by an ASGI framework such as Starlette . This example plugin adds a x-databases HTTP header listing the currently attached databases: from datasette import hookimpl from functools import wraps @hookimpl def asgi_wrapper(datasette): def wrap_with_databases_header(app): @wraps(app) async def add_x_databases_header( scope, receive, send ): async def wrapped_send(event): if event["type"] == "http.response.start": original_headers = ( event.get("headers") or [] ) event = { "type": event["type"], "status": event["status"], "headers": original_headers + [ [ b"x-databases", ", ".join( datasette.databases.keys() ).encode("utf-8"), ] ], } await send(event) await app(scope, receive, wrapped_send) return add_x_databases_header return wrap_with_databases_header Examples: datasette-cors , datasette-pyinstrument , datasette-total-page-time | 106 | |
35 | startup(datasette) | This hook fires when the Datasette application server first starts up. Here is an example that validates required plugin configuration. The server will fail to start and show an error if the validation check fails: from datasette import hookimpl @hookimpl def startup(datasette): config = datasette.plugin_config("my-plugin") or {} assert ( "required-setting" in config ), "my-plugin requires setting required-setting" You can also return an async function, which will be awaited on startup. Use this option if you need to execute any database queries, for example this function which creates the my_table database table if it does not yet exist: @hookimpl def startup(datasette): async def inner(): db = datasette.get_database() if "my_table" not in await db.table_names(): await db.execute_write( """ create table my_table (mycol text) """ ) return inner Potential use-cases: Run some initialization code for the plugin Create database tables that a plugin needs on startup Validate the configuration for a plugin on startup, and raise an error if it is invalid If you are writing unit tests for a plugin that uses this hook and doesn't exercise Datasette by sending any simulated requests through it, you will need to explicitly call await ds.invoke_startup() in your tests. An example: import pytest from datasette.app import Datasette @pytest.mark.asyncio async def test_my_plugin(): ds = Datasette() await ds.invoke_startup() # Rest of test goes here Examples: datasette-saved-queries , datasette-init | 106 | |
36 | canned_queries(datasette, database, actor) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. database - string The name of the database. actor - dictionary or None The currently authenticated actor . Use this hook to return a dictionary of additional canned query definitions for the specified database. The return value should be the same shape as the JSON described in the canned query documentation. from datasette import hookimpl @hookimpl def canned_queries(datasette, database): if database == "mydb": return { "my_query": { "sql": "select * from my_table where id > :min_id" } } The hook can alternatively return an awaitable function that returns a dictionary. Here's an example that returns queries that have been stored in the saved_queries database table, if one exists: from datasette import hookimpl @hookimpl def canned_queries(datasette, database): async def inner(): db = datasette.get_database(database) if await db.table_exists("saved_queries"): results = await db.execute( "select name, sql from saved_queries" ) return { result["name"]: {"sql": result["sql"]} for result in results } return inner The actor parameter can be used to include the currently authenticated actor in your decision. Here's an example that returns saved queries that were saved by that actor: from datasette import hookimpl @hookimpl def canned_queries… | 106 | |
37 | actor_from_request(datasette, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. request - Request object The current HTTP request. This is part of Datasette's authentication and permissions system . The function should attempt to authenticate an actor (either a user or an API actor of some sort) based on information in the request. If it cannot authenticate an actor, it should return None . Otherwise it should return a dictionary representing that actor. Here's an example that authenticates the actor based on an incoming API key: from datasette import hookimpl import secrets SECRET_KEY = "this-is-a-secret" @hookimpl def actor_from_request(datasette, request): authorization = ( request.headers.get("authorization") or "" ) expected = "Bearer {}".format(SECRET_KEY) if secrets.compare_digest(authorization, expected): return {"id": "bot"} If you install this in your plugins directory you can test it like this: curl -H 'Authorization: Bearer this-is-a-secret' http://localhost:8003/-/actor.json Instead of returning a dictionary, this function can return an awaitable function which itself returns either None or a dictionary. This is useful for authentication functions that need to make a database query - for example: from datasette import hookimpl @hookimpl def actor_from_request(datasette, request): async def inner(): token = request.args.get("_token") if not token: return None # Look up ?_token=xxx in sessions table result = await datasette.get_database().execute( "select count(*) from sessions where … | 106 | |
38 | actors_from_ids(datasette, actor_ids) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor_ids - list of strings or integers The actor IDs to look up. The hook must return a dictionary that maps the incoming actor IDs to their full dictionary representation. Some plugins that implement social features may store the ID of the actor that performed an action - added a comment, bookmarked a table or similar - and then need a way to resolve those IDs into display-friendly actor dictionaries later on. The await datasette.actors_from_ids(actor_ids) internal method can be used to look up actors from their IDs. It will dispatch to the first plugin that implements this hook. Unlike other plugin hooks, this only uses the first implementation of the hook to return a result. You can expect users to only have a single plugin installed that implements this hook. If no plugin is installed, Datasette defaults to returning actors that are just {"id": actor_id} . The hook can return a dictionary or an awaitable function that then returns a dictionary. This example implementation returns actors from a database table: from datasette import hookimpl @hookimpl def actors_from_ids(datasette, actor_ids): db = datasette.get_database("actors") async def inner(): sql = "select id, name from actors where id in ({})".format( ", ".join("?" for _ in actor_ids) ) actors = {} for row in (await db.execute(sql, actor_ids)).rows: actor = dict(row) actors[actor["id"]] = actor return actors return inner The returned dictionary fro… | 106 | |
39 | jinja2_environment_from_request(datasette, request, env) | datasette - Datasette class A Datasette instance. request - Request object or None The current HTTP request, if one is available. env - Environment The Jinja2 environment that will be used to render the current page. This hook can be used to return a customized Jinja environment based on the incoming request. If you want to run a single Datasette instance that serves different content for different domains, you can do so like this: from datasette import hookimpl from jinja2 import ChoiceLoader, FileSystemLoader @hookimpl def jinja2_environment_from_request(request, env): if request and request.host == "www.niche-museums.com": return env.overlay( loader=ChoiceLoader( [ FileSystemLoader( "/mnt/niche-museums/templates" ), env.loader, ] ), enable_async=True, ) return env This uses the Jinja overlay() method to create a new environment identical to the default environment except for having a different template loader, which first looks in the /mnt/niche-museums/templates directory before falling back on the default loader. | 106 | |
40 | filters_from_request(request, database, table, datasette) | request - Request object The current HTTP request. database - string The name of the database. table - string The name of the table. datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. This hook runs on the table page, and can influence the where clause of the SQL query used to populate that page, based on query string arguments on the incoming request. The hook should return an instance of datasette.filters.FilterArguments which has one required and three optional arguments: return FilterArguments( where_clauses=["id > :max_id"], params={"max_id": 5}, human_descriptions=["max_id is greater than 5"], extra_context={}, ) The arguments to the FilterArguments class constructor are as follows: where_clauses - list of strings, required A list of SQL fragments that will be inserted into the SQL query, joined by the and operator. These can include :named parameters which will be populated using data in params . params - dictionary, optional Additional keyword arguments to be used when the query is executed. These should match any :arguments in the where clauses. … | 106 | |
41 | permission_allowed(datasette, actor, action, resource) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary The current actor, as decided by actor_from_request(datasette, request) . action - string The action to be performed, e.g. "edit-table" . resource - string or None An identifier for the individual resource, e.g. the name of the table. Called to check that an actor has permission to perform an action on a resource. Can return True if the action is allowed, False if the action is not allowed or None if the plugin does not have an opinion one way or the other. Here's an example plugin which randomly selects whether a permission should be allowed or denied, except for view-instance which always uses the default permission scheme instead. from datasette import hookimpl import random @hookimpl def permission_allowed(action): if action != "view-instance": # Return True or False at random return random.random() > 0.5 # Returning None falls back to default permissions This function can alternatively return an awaitable function which itself returns True , False or None . You can use this option if you need to execute additional database queries using await datasette.execute(...) . Here's an example that allows users to view the admin_log table only if their actor id is present in the admin_users table. It also disallows arbitrary SQL queries for the staff… | 106 | |
42 | register_magic_parameters(datasette) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . Magic parameters can be used to add automatic parameters to canned queries . This plugin hook allows additional magic parameters to be defined by plugins. Magic parameters all take this format: _prefix_rest_of_parameter . The prefix indicates which magic parameter function should be called - the rest of the parameter is passed as an argument to that function. To register a new function, return it as a tuple of (string prefix, function) from this hook. The function you register should take two arguments: key and request , where key is the rest_of_parameter portion of the parameter and request is the current Request object . This example registers two new magic parameters: :_request_http_version returning the HTTP version of the current request, and :_uuid_new which returns a new UUID: from datasette import hookimpl from uuid import uuid4 def uuid(key, request): if key == "new": return str(uuid4()) else: raise KeyError def request(key, request): if key == "http_version": return request.scope["http_version"] else: raise KeyError @hookimpl def register_magic_parameters(datasette): return [ ("request", request), ("uuid", uuid), ] | 106 | |
43 | forbidden(datasette, request, message) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to render templates or execute SQL queries. request - Request object The current HTTP request. message - string A message hinting at why the request was forbidden. Plugins can use this to customize how Datasette responds when a 403 Forbidden error occurs - usually because a page failed a permission check, see Permissions . If a plugin hook wishes to react to the error, it should return a Response object . This example returns a redirect to a /-/login page: from datasette import hookimpl, Response from urllib.parse import urlencode @hookimpl def forbidden(request, message): return Response.redirect( "/-/login?" + urlencode({"message": message}) ) The function can alternatively return an awaitable function if it needs to make any asynchronous method calls. This example renders a template: from datasette import hookimpl, Response @hookimpl def forbidden(datasette, request): async def inner(): return Response.html( await datasette.render_template( "render_message.html", request=request ) ) return inner | 106 | |
44 | handle_exception(datasette, request, exception) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to render templates or execute SQL queries. request - Request object The current HTTP request. exception - Exception The exception that was raised. This hook is called any time an unexpected exception is raised. You can use it to record the exception. If your handler returns a Response object it will be returned to the client in place of the default Datasette error page. The handler can return a response directly, or it can return an awaitable function that returns a response. This example logs an error to Sentry and then renders a custom error page: from datasette import hookimpl, Response import sentry_sdk @hookimpl def handle_exception(datasette, request, exception): sentry_sdk.capture_exception(exception) async def inner(): return Response.html( await datasette.render_template( "custom_error.html", request=request ) ) return inner Example: datasette-sentry | 106 | |
45 | skip_csrf(datasette, scope) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. scope - dictionary The ASGI scope for the incoming HTTP request. This hook can be used to skip CSRF protection for a specific incoming request. For example, you might have a custom path at /submit-comment which is designed to accept comments from anywhere, whether or not the incoming request originated on the site and has an accompanying CSRF token. This example will disable CSRF protection for that specific URL path: from datasette import hookimpl @hookimpl def skip_csrf(scope): return scope["path"] == "/submit-comment" If any of the currently active skip_csrf() plugin hooks return True , CSRF protection will be skipped for the request. | 106 | |
46 | get_metadata(datasette, key, database, table) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . actor - dictionary or None The currently authenticated actor . database - string or None The name of the database for which metadata is being requested. table - string or None The name of the table. key - string or None The name of the key for which data is being requested. This hook is responsible for returning a dictionary corresponding to Datasette Metadata . This function is passed the database , table and key which were passed to the upstream internal request for metadata. Regardless, it is important to return a global metadata object, where "databases": {} would be a top-level key. The dictionary returned here will be merged with, and overwritten by, the contents of the physical metadata.yaml if one is present. The design of this plugin hook does not currently provide a mechanism for interacting with async code, and may change in the future. See issue 1384 . @hookimpl def get_metadata(datasette, key, database, table): metadata = { "title": "This will be the Datasette landing page title!", "description": get_instance_description(datasette), "databases": {}, } for db_name, db_data_dict in get_my_database_meta( datasette, database, table, key ): … | 106 | |
47 | menu_links(datasette, actor, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . request - Request object or None The current HTTP request. This can be None if the request object is not available. This hook allows additional items to be included in the menu displayed by Datasette's top right menu icon. The hook should return a list of {"href": "...", "label": "..."} menu items. These will be added to the menu. It can alternatively return an async def awaitable function which returns a list of menu items. This example adds a new menu item but only if the signed in user is "root" : from datasette import hookimpl @hookimpl def menu_links(datasette, actor): if actor and actor.get("id") == "root": return [ { "href": datasette.urls.path( "/-/edit-schema" ), "label": "Edit schema", }, ] Using datasette.urls here ensures that links in the menu will take the base_url setting into account. Examples: datasette-search-all , datasette-graphql | 106 | |
48 | Action hooks | Action hooks can be used to add items to the action menus that appear at the top of different pages within Datasette. Unlike menu_links() , which are displayed on every page, actions should only be relevant to the page the user is currently viewing. Each of these hooks should return a list of {"href": "...", "label": "..."} menu items, with optional "description": "..." keys describing each action in more detail. They can alternatively return an async def awaitable function which, when called, returns a list of those menu items. | 106 | |
49 | table_actions(datasette, actor, database, table, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . database - string The name of the database. table - string The name of the table. request - Request object or None The current HTTP request. This can be None if the request object is not available. This example adds a new table action if the signed in user is "root" : from datasette import hookimpl @hookimpl def table_actions(datasette, actor, database, table): if actor and actor.get("id") == "root": return [ { "href": datasette.urls.path( "/-/edit-schema/{}/{}".format( database, table ) ), "label": "Edit schema for this table", "description": "Add, remove, rename or alter columns for this table.", } ] Example: datasette-graphql | 106 | |
50 | view_actions(datasette, actor, database, view, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . database - string The name of the database. view - string The name of the SQL view. request - Request object or None The current HTTP request. This can be None if the request object is not available. Like table_actions(datasette, actor, database, table, request) but for SQL views. | 106 | |
51 | query_actions(datasette, actor, database, query_name, request, sql, params) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . database - string The name of the database. query_name - string or None The name of the canned query, or None if this is an arbitrary SQL query. request - Request object The current HTTP request. sql - string The SQL query being executed. params - dictionary The parameters passed to the SQL query, if any. Populates a "Query actions" menu on the canned query and arbitrary SQL query pages. This example adds a new query action linking to a page for explaining a query: from datasette import hookimpl import urllib.parse @hookimpl def query_actions(datasette, database, query_name, sql): # Don't explain an explain if sql.lower().startswith("explain"): return return [ { "href": datasette.u… | 106 | |
52 | row_actions(datasette, actor, request, database, table, row) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . request - Request object or None The current HTTP request. database - string The name of the database. table - string The name of the table. row - sqlite3.Row The SQLite row object being displayed on the page. Return links for the "Row actions" menu shown at the top of the row page. This example displays the row in JSON plus some additional debug information if the user is signed in: from datasette import hookimpl import json @hookimpl def row_actions(datasette, database, table, actor, row): if actor: return [ { "href": datasette.urls.instance(), "label": f"Row details for {actor['id']}", "description": json.dumps( dict(row), default=repr ), }, ] Example: datasette-enrichments | 106 | |
53 | database_actions(datasette, actor, database, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . database - string The name of the database. request - Request object The current HTTP request. Populates an actions menu on the database page. This example adds a new database action for creating a table, if the user has the edit-schema permission: from datasette import hookimpl @hookimpl def database_actions(datasette, actor, database): async def inner(): if not await datasette.permission_allowed( actor, "edit-schema", resource=database, default=False, ): return [] return [ { "href": datasette.urls.path( "/-/edit-schema/{}/-/create".format( database ) ), "label": "Create a table", } ] return inner Example: datasette-graphql , datasette-edit-schema | 106 | |
54 | homepage_actions(datasette, actor, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) , or to execute SQL queries. actor - dictionary or None The currently authenticated actor . request - Request object The current HTTP request. Populates an actions menu on the top-level index homepage of the Datasette instance. This example adds a link to an imagined tool for editing the homepage, only for signed in users: from datasette import hookimpl @hookimpl def homepage_actions(datasette, actor): if actor: return [ { "href": datasette.urls.path( "/-/customize-homepage" ), "label": "Customize homepage", } ] | 106 | |
55 | Template slots | The following set of plugin hooks can be used to return extra HTML content that will be inserted into the corresponding page, directly below the <h1> heading. Multiple plugins can contribute content here. The order in which it is displayed can be controlled using Pluggy's call time order options . Each of these plugin hooks can return either a string or an awaitable function that returns a string. | 106 | |
56 | top_homepage(datasette, request) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . request - Request object The current HTTP request. Returns HTML to be displayed at the top of the Datasette homepage. | 106 | |
57 | top_database(datasette, request, database) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . request - Request object The current HTTP request. database - string The name of the database. Returns HTML to be displayed at the top of the database page. | 106 | |
58 | top_table(datasette, request, database, table) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . request - Request object The current HTTP request. database - string The name of the database. table - string The name of the table. Returns HTML to be displayed at the top of the table page. | 106 | |
59 | top_row(datasette, request, database, table, row) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . request - Request object The current HTTP request. database - string The name of the database. table - string The name of the table. row - sqlite3.Row The SQLite row object being displayed. Returns HTML to be displayed at the top of the row page. | 106 | |
60 | top_query(datasette, request, database, sql) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . request - Request object The current HTTP request. database - string The name of the database. sql - string The SQL query. Returns HTML to be displayed at the top of the query results page. | 106 | |
61 | top_canned_query(datasette, request, database, query_name) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . request - Request object The current HTTP request. database - string The name of the database. query_name - string The name of the canned query. Returns HTML to be displayed at the top of the canned query page. | 106 | |
62 | Event tracking | Datasette includes an internal mechanism for tracking notable events. This can be used for analytics, but can also be used by plugins that want to listen out for when key events occur (such as a table being created) and take action in response. Plugins can register to receive events using the track_event plugin hook. They can also define their own events for other plugins to receive using the register_events() plugin hook , combined with calls to the datasette.track_event() internal method . | 106 | |
63 | track_event(datasette, event) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . event - Event Information about the event, represented as an instance of a subclass of the Event base class. This hook will be called any time an event is tracked by code that calls the datasette.track_event(...) internal method. The event object will always have the following properties: name : a string representing the name of the event, for example logout or create-table . actor : a dictionary representing the actor that triggered the event, or None if the event was not triggered by an actor. created : a datetime.datetime object in the timezone.utc timezone representing the time the event object was created. Other properties on the event will be available depending on the type of event. You can also access those as a dictionary using event.properties() . The events fired by Datasette core are documented here . This example plugin logs details of all events to standard error: from datasette import hookimpl import json import sys @hookimpl def track_event(event): name = event.name actor = event.actor properties = event.properties() msg = json.dumps( { "name": name, "actor": actor, "properties": properties, } ) print(msg, file=sys.stderr, flush=True) T… | 106 | |
64 | register_events(datasette) | datasette - Datasette class You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name) . This hook should return a list of Event subclasses that represent custom events that the plugin might send to the datasette.track_event() method. This example registers event subclasses for ban-user and unban-user events: from dataclasses import dataclass from datasette import hookimpl, Event @dataclass class BanUserEvent(Event): name = "ban-user" user: dict @dataclass class UnbanUserEvent(Event): name = "unban-user" user: dict @hookimpl def register_events(): return [BanUserEvent, UnbanUserEvent] The plugin can then call datasette.track_event(...) to send a ban-user event: await datasette.track_event( BanUserEvent(user={"id": 1, "username": "cleverbot"}) ) | 106 | |
65 | Installation | If you just want to try Datasette out you don't need to install anything: see Try Datasette without installing anything using Glitch. There are two main options for installing Datasette. You can install it directly on to your machine, or you can install it using Docker. If you want to start making contributions to the Datasette project by installing a copy that lets you directly modify the code, take a look at our guide to Setting up a development environment . Basic installation Datasette Desktop for Mac Using Homebrew Using pip Advanced installation options Using pipx Installing plugins using pipx Upgrading packages using pipx Using Docker Loading SpatiaLite Installing plugins A note about extensions | 106 | |
66 | Basic installation | 106 | ||
67 | Datasette Desktop for Mac | Datasette Desktop is a packaged Mac application which bundles Datasette together with Python and allows you to install and run Datasette directly on your laptop. This is the best option for local installation if you are not comfortable using the command line. | 106 | |
68 | Using Homebrew | If you have a Mac and use Homebrew , you can install Datasette by running this command in your terminal: brew install datasette This should install the latest version. You can confirm by running: datasette --version You can upgrade to the latest Homebrew packaged version using: brew upgrade datasette Once you have installed Datasette you can install plugins using the following: datasette install datasette-vega If the latest packaged release of Datasette has not yet been made available through Homebrew, you can upgrade your Homebrew installation in-place using: datasette install -U datasette | 106 | |
69 | Using pip | Datasette requires Python 3.8 or higher. The Python.org Python For Beginners page has instructions for getting started. You can install Datasette and its dependencies using pip : pip install datasette You can now run Datasette like so: datasette | 106 | |
70 | Advanced installation options | 106 | ||
71 | Using pipx | pipx is a tool for installing Python software with all of its dependencies in an isolated environment, to ensure that they will not conflict with any other installed Python software. If you use Homebrew on macOS you can install pipx like this: brew install pipx pipx ensurepath Without Homebrew you can install it like so: python3 -m pip install --user pipx python3 -m pipx ensurepath The pipx ensurepath command configures your shell to ensure it can find commands that have been installed by pipx - generally by making sure ~/.local/bin has been added to your PATH . Once pipx is installed you can use it to install Datasette like this: pipx install datasette Then run datasette --version to confirm that it has been successfully installed. | 106 | |
72 | Installing plugins using pipx | You can install additional datasette plugins with pipx inject like so: pipx inject datasette datasette-json-html injected package datasette-json-html into venv datasette done! ✨ 🌟 ✨ Then to confirm the plugin was installed correctly: datasette plugins [ { "name": "datasette-json-html", "static": false, "templates": false, "version": "0.6" } ] | 106 | |
73 | Upgrading packages using pipx | You can upgrade your pipx installation to the latest release of Datasette using pipx upgrade datasette : pipx upgrade datasette upgraded package datasette from 0.39 to 0.40 (location: /Users/simon/.local/pipx/venvs/datasette) To upgrade a plugin within the pipx environment use pipx runpip datasette install -U name-of-plugin - like this: datasette plugins [ { "name": "datasette-vega", "static": true, "templates": false, "version": "0.6" } ] Now upgrade the plugin: pipx runpip datasette install -U datasette-vega Collecting datasette-vega Downloading datasette_vega-0.6.2-py3-none-any.whl (1.8 MB) |████████████████████████████████| 1.8 MB 2.0 MB/s ... Installing collected packages: datasette-vega Attempting uninstall: datasette-vega Found existing installation: datasette-vega 0.6 Uninstalling datasette-vega-0.6: Successfully uninstalled datasette-vega-0.6 Successfully installed datasette-vega-0.6.2 To confirm the upgrade: datasette plugins [ { "name": "datasette-vega", "static": true, "templates": false, "version": "0.6.2" } ] | 106 | |
74 | Using Docker | A Docker image containing the latest release of Datasette is published to Docker Hub here: https://hub.docker.com/r/datasetteproject/datasette/ If you have Docker installed (for example with Docker for Mac on OS X) you can download and run this image like so: docker run -p 8001:8001 -v `pwd`:/mnt \ datasetteproject/datasette \ datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db This will start an instance of Datasette running on your machine's port 8001, serving the fixtures.db file in your current directory. Now visit http://127.0.0.1:8001/ to access Datasette. (You can download a copy of fixtures.db from https://latest.datasette.io/fixtures.db ) To upgrade to the most recent release of Datasette, run the following: docker pull datasetteproject/datasette | 106 | |
75 | Loading SpatiaLite | The datasetteproject/datasette image includes a recent version of the SpatiaLite extension for SQLite. To load and enable that module, use the following command: docker run -p 8001:8001 -v `pwd`:/mnt \ datasetteproject/datasette \ datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db \ --load-extension=spatialite You can confirm that SpatiaLite is successfully loaded by visiting http://127.0.0.1:8001/-/versions | 106 | |
76 | Installing plugins | If you want to install plugins into your local Datasette Docker image you can do so using the following recipe. This will install the plugins and then save a brand new local image called datasette-with-plugins : docker run datasetteproject/datasette \ pip install datasette-vega docker commit $(docker ps -lq) datasette-with-plugins You can now run the new custom image like so: docker run -p 8001:8001 -v `pwd`:/mnt \ datasette-with-plugins \ datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db You can confirm that the plugins are installed by visiting http://127.0.0.1:8001/-/plugins Some plugins such as datasette-ripgrep may need additional system packages. You can install these by running apt-get install inside the container: docker run datasetteproject/datasette bash -c ' apt-get update && apt-get install ripgrep && pip install datasette-ripgrep' docker commit $(docker ps -lq) datasette-with-ripgrep | 106 | |
77 | A note about extensions | SQLite supports extensions, such as SpatiaLite for geospatial operations. These can be loaded using the --load-extension argument, like so: datasette --load-extension=/usr/local/lib/mod_spatialite.dylib Some Python installations do not include support for SQLite extensions. If this is the case you will see the following error when you attempt to load an extension: Your Python installation does not have the ability to load SQLite extensions. In some cases you may see the following error message instead: AttributeError: 'sqlite3.Connection' object has no attribute 'enable_load_extension' On macOS the easiest fix for this is to install Datasette using Homebrew: brew install datasette Use which datasette to confirm that datasette will run that version. The output should look something like this: /usr/local/opt/datasette/bin/datasette If you get a different location here such as /Library/Frameworks/Python.framework/Versions/3.10/bin/datasette you can run the following command to cause datasette to execute the Homebrew version instead: alias datasette=$(echo $(brew --prefix datasette)/bin/datasette) You can undo this operation using: unalias datasette If you need to run SQLite with extension support for other Python code, you can do so by installing Python itself using Homebrew: brew install python Then execute Python using: /usr/local/opt/python@3/libexec/bin/python A more convenient way to work with this version of Python may be to use it to create a virtual environment: /usr/local/opt/python@3/libexec/bin/python -m venv datasette-venv Then activate it like this: source datasette-venv/bin/activate Now running python and pip will work against a version of … | 106 | |
78 | Testing plugins | We recommend using pytest to write automated tests for your plugins. If you use the template described in Starting an installable plugin using cookiecutter your plugin will start with a single test in your tests/ directory that looks like this: from datasette.app import Datasette import pytest @pytest.mark.asyncio async def test_plugin_is_installed(): datasette = Datasette(memory=True) response = await datasette.client.get("/-/plugins.json") assert response.status_code == 200 installed_plugins = {p["name"] for p in response.json()} assert ( "datasette-plugin-template-demo" in installed_plugins ) This test uses the datasette.client object to exercise a test instance of Datasette. datasette.client is a wrapper around the HTTPX Python library which can imitate HTTP requests using ASGI. This is the recommended way to write tests against a Datasette instance. This test also uses the pytest-asyncio package to add support for async def test functions running under pytest. You can install these packages like so: pip install pytest pytest-asyncio If you are building an installable package you can add them as test dependencies to your setup.py module like this: setup( name="datasette-my-plugin", # ... extras_require={"test": ["pytest", "pytest-asyncio"]}, tests_require=["datasette-my-plugin[test]"], ) You can then install the test dependencies like so: pip install -e '.[test]' Then run the tests using pytest like so: pytest | 106 | |
79 | Setting up a Datasette test instance | The above example shows the easiest way to start writing tests against a Datasette instance: from datasette.app import Datasette import pytest @pytest.mark.asyncio async def test_plugin_is_installed(): datasette = Datasette(memory=True) response = await datasette.client.get("/-/plugins.json") assert response.status_code == 200 Creating a Datasette() instance like this is a useful shortcut in tests, but there is one detail you need to be aware of. It's important to ensure that the async method .invoke_startup() is called on that instance. You can do that like this: datasette = Datasette(memory=True) await datasette.invoke_startup() This method registers any startup(datasette) or prepare_jinja2_environment(env, datasette) plugins that might themselves need to make async calls. If you are using await datasette.client.get() and similar methods then you don't need to worry about this - Datasette automatically calls invoke_startup() the first time it handles a request. | 106 | |
80 | Using datasette.client in tests | The datasette.client mechanism is designed for use in tests. It provides access to a pre-configured HTTPX async client instance that can make GET, POST and other HTTP requests against a Datasette instance from inside a test. A simple test looks like this: @pytest.mark.asyncio async def test_homepage(): ds = Datasette(memory=True) response = await ds.client.get("/") html = response.text assert "<h1>" in html Or for a JSON API: @pytest.mark.asyncio async def test_actor_is_null(): ds = Datasette(memory=True) response = await ds.client.get("/-/actor.json") assert response.json() == {"actor": None} To make requests as an authenticated actor, create a signed ds_actor cookie using the datasette.client.actor_cookie() helper function and pass it in cookies= like this: @pytest.mark.asyncio async def test_signed_cookie_actor(): ds = Datasette(memory=True) cookies = {"ds_actor": ds.client.actor_cookie({"id": "root"})} response = await ds.client.get("/-/actor.json", cookies=cookies) assert response.json() == {"actor": {"id": "root"}} | 106 | |
81 | Using pdb for errors thrown inside Datasette | If an exception occurs within Datasette itself during a test, the response returned to your test will have a response.status_code value of 500. You can add pdb=True to the Datasette constructor to drop into a Python debugger session inside your test run instead of getting back a 500 response code. This is equivalent to running the datasette command-line tool with the --pdb option. Here's what that looks like in a test function: @pytest.mark.asyncio async def test_that_opens_the_debugger_or_errors(): ds = Datasette([db_path], pdb=True) response = await ds.client.get("/") If you use this pattern you will need to run pytest with the -s option to avoid capturing stdin/stdout in order to interact with the debugger prompt. | 106 | |
82 | Using pytest fixtures | Pytest fixtures can be used to create initial testable objects which can then be used by multiple tests. A common pattern for Datasette plugins is to create a fixture which sets up a temporary test database and wraps it in a Datasette instance. Here's an example that uses the sqlite-utils library to populate a temporary test database. It also sets the title of that table using a simulated metadata.json configuration: from datasette.app import Datasette import pytest import sqlite_utils @pytest.fixture(scope="session") def datasette(tmp_path_factory): db_directory = tmp_path_factory.mktemp("dbs") db_path = db_directory / "test.db" db = sqlite_utils.Database(db_path) db["dogs"].insert_all( [ {"id": 1, "name": "Cleo", "age": 5}, {"id": 2, "name": "Pancakes", "age": 4}, ], pk="id", ) datasette = Datasette( [db_path], metadata={ "databases": { "test": { "tables": { "dogs": {"title": "Some dogs"} } } } }, ) return datasette @pytest.mark.asyncio async def test_example_table_json(datasette): response = await datasette.client.get( "/test/dogs.json?_shape=array" ) assert response.status_code == 200 assert response.json() == [ {"id": 1, "name": "Cleo", "age": 5}, {"id": 2, "name": "Pancakes", "age": 4}, ] @pytest.mark.asyncio async def test_example_table_html(datasette): response = await datasette.client.get("/test/dogs") assert ">Some dogs</h1>" in response.text Here the datasette() function defines the fixture, which is then automatically passed to the two test functions because pytest matches the fixture name to their datasette function parameter. The @pytest.fixture(scope="session") line here ensures the fixture is reused for the full pytest execution session. This… | 106 | |
83 | Testing outbound HTTP calls with pytest-httpx | If your plugin makes outbound HTTP calls - for example datasette-auth-github or datasette-import-table - you may need to mock those HTTP requests in your tests. The pytest-httpx package is a useful library for mocking calls. It can be tricky to use with Datasette though, since it mocks all HTTPX requests and Datasette's own testing mechanism uses HTTPX internally. To avoid breaking your tests, you can define a non_mocked_hosts() fixture that returns ["localhost"] . As an example, here's a very simple plugin which makes an outbound HTTP request and returns the resulting content: from datasette import hookimpl from datasette.utils.asgi import Response import httpx @hookimpl def register_routes(): return [ (r"^/-/fetch-url$", fetch_url), ] async def fetch_url(datasette, request): if request.method == "GET": return Response.html( """ <form action="/-/fetch-url" method="post"> <input type="hidden" name="csrftoken" value="{}"> <input name="url"><input type="submit"> </form>""".format( request.scope["csrftoken"]() ) ) vars = await request.post_vars() url = vars["url"] return Response.text(httpx.get(url).text) Here's a test for that plugin that mocks the HTTPX outbound request: from datasette.app import Datasette import pytest @pytest.fixture def non_mocked_hosts(): # This ensures httpx-mock will not affect Datasette's own # httpx calls made in the tests by datasette.client: return ["localhost"] @pytest.mark.asyncio async def test_outbound_http_call(httpx_mock): httpx_mock.add_response( url="https://www.example.com/", text="Hello world", ) datasette = Datasette([], memory=True) response = await datasette.client.post( "/-/fetch-url", data={"url": "https://www.example.com/"}, ) assert response.text == "Hello world" outbound_request = httpx_mock.get_request()… | 106 | |
84 | Registering a plugin for the duration of a test | When writing tests for plugins you may find it useful to register a test plugin just for the duration of a single test. You can do this using pm.register() and pm.unregister() like this: from datasette import hookimpl from datasette.app import Datasette from datasette.plugins import pm import pytest @pytest.mark.asyncio async def test_using_test_plugin(): class TestPlugin: __name__ = "TestPlugin" # Use hookimpl and method names to register hooks @hookimpl def register_routes(self): return [ (r"^/error$", lambda: 1 / 0), ] pm.register(TestPlugin(), name="undo") try: # The test implementation goes here datasette = Datasette() response = await datasette.client.get("/error") assert response.status_code == 500 finally: pm.unregister(name="undo") To reuse the same temporary plugin in multiple tests, you can register it inside a fixture in your conftest.py file like this: from datasette import hookimpl from datasette.app import Datasette from datasette.plugins import pm import pytest import pytest_asyncio @pytest_asyncio.fixture async def datasette_with_plugin(): class TestPlugin: __name__ = "TestPlugin" @hookimpl def register_routes(self): return [ (r"^/error$", lambda: 1 / 0), ] pm.register(TestPlugin(), name="undo") try: yield Datasette() finally: pm.unregister(name="undo") Note the yield statement here - this ensures that the finally: block that unregisters the plugin is executed only after the test function itself has completed. Then in a test: @pytest.mark.asyncio async def test_error(datasette_with_plugin): response = await datasette_with_plugin.client.get("/error") assert response.status_code == 500 | 106 | |
85 | The Datasette Ecosystem | Datasette sits at the center of a growing ecosystem of open source tools aimed at making it as easy as possible to gather, analyze and publish interesting data. These tools are divided into two main groups: tools for building SQLite databases (for use with Datasette) and plugins that extend Datasette's functionality. The Datasette project website includes a directory of plugins and a directory of tools: Plugins directory on datasette.io Tools directory on datasette.io | 106 | |
86 | sqlite-utils | sqlite-utils is a key building block for the wider Datasette ecosystem. It provides a collection of utilities for manipulating SQLite databases, both as a Python library and a command-line utility. Features include: Insert data into a SQLite database from JSON, CSV or TSV, automatically creating tables with the correct schema or altering existing tables to add missing columns. Configure tables for use with SQLite full-text search, including creating triggers needed to keep the search index up-to-date. Modify tables in ways that are not supported by SQLite's default ALTER TABLE syntax - for example changing the types of columns or selecting a new primary key for a table. Add foreign keys to existing database tables. Extract columns of data into a separate lookup table. | 106 | |
87 | Dogsheep | Dogsheep is a collection of tools for personal analytics using SQLite and Datasette. The project provides tools like github-to-sqlite and twitter-to-sqlite that can import data from different sources in order to create a personal data warehouse. Personal Data Warehouses: Reclaiming Your Data is a talk that explains Dogsheep and demonstrates it in action. | 106 | |
88 | SpatiaLite | The SpatiaLite module for SQLite adds features for handling geographic and spatial data. For an example of what you can do with it, see the tutorial Building a location to time zone API with SpatiaLite . To use it with Datasette, you need to install the mod_spatialite dynamic library. This can then be loaded into Datasette using the --load-extension command-line option. Datasette can look for SpatiaLite in common installation locations if you run it like this: datasette --load-extension=spatialite --setting default_allow_sql off If SpatiaLite is in another location, use the full path to the extension instead: datasette --setting default_allow_sql off \ --load-extension=/usr/local/lib/mod_spatialite.dylib | 106 | |
89 | Warning | The SpatiaLite extension adds a large number of additional SQL functions , some of which are not safe for untrusted users to execute: they may cause the Datasette server to crash. You should not expose a SpatiaLite-enabled Datasette instance to the public internet without taking extra measures to secure it against potentially harmful SQL queries. The following steps are recommended: Disable arbitrary SQL queries by untrusted users. See Controlling the ability to execute arbitrary SQL for ways to do this. The easiest is to start Datasette with the datasette --setting default_allow_sql off option. Define Canned queries with the SQL queries that use SpatiaLite functions that you want people to be able to execute. The Datasette SpatiaLite tutorial includes detailed instructions for running SpatiaLite safely using these techniques. | 106 | |
90 | Installation | 106 | ||
91 | Installing SpatiaLite on OS X | The easiest way to install SpatiaLite on OS X is to use Homebrew . brew update brew install spatialite-tools This will install the spatialite command-line tool and the mod_spatialite dynamic library. You can now run Datasette like so: datasette --load-extension=spatialite | 106 | |
92 | Installing SpatiaLite on Linux | SpatiaLite is packaged for most Linux distributions. apt install spatialite-bin libsqlite3-mod-spatialite Depending on your distribution, you should be able to run Datasette something like this: datasette --load-extension=/usr/lib/x86_64-linux-gnu/mod_spatialite.so If you are unsure of the location of the module, try running locate mod_spatialite and see what comes back. | 106 | |
93 | Spatial indexing latitude/longitude columns | Here's a recipe for taking a table with existing latitude and longitude columns, adding a SpatiaLite POINT geometry column to that table, populating the new column and then populating a spatial index: import sqlite3 conn = sqlite3.connect("museums.db") # Load the spatialite extension: conn.enable_load_extension(True) conn.load_extension("/usr/local/lib/mod_spatialite.dylib") # Initialize spatial metadata for this database: conn.execute("select InitSpatialMetadata(1)") # Add a geometry column called point_geom to our museums table: conn.execute( "SELECT AddGeometryColumn('museums', 'point_geom', 4326, 'POINT', 2);" ) # Now update that geometry column with the lat/lon points conn.execute( """ UPDATE museums SET point_geom = GeomFromText('POINT('||"longitude"||' '||"latitude"||')',4326); """ ) # Now add a spatial index to that column conn.execute( "select CreateSpatialIndex('museums', 'point_geom');" ) # If you don't commit, your changes will not be persisted: conn.commit() conn.close() | 106 | |
94 | Making use of a spatial index | SpatiaLite spatial indexes are R*Trees. They allow you to run efficient bounding box queries using a sub-select, with a similar pattern to that used for Searches using custom SQL . In the above example, the resulting index will be called idx_museums_point_geom . This takes the form of a SQLite virtual table. You can inspect its contents using the following query: select * from idx_museums_point_geom limit 10; Here's a live example: timezones-api.datasette.io/timezones/idx_timezones_Geometry pkid xmin xmax ymin ymax 1 -8.601725578308105 -2.4930307865142822 4.162120819091797 10.74019718170166 2 … | 106 | |
95 | Importing shapefiles into SpatiaLite | The shapefile format is a common format for distributing geospatial data. You can use the spatialite command-line tool to create a new database table from a shapefile. Try it now with the North America shapefile available from the University of North Carolina Global River Database project. Download the file and unzip it (this will create files called narivs.dbf , narivs.prj , narivs.shp and narivs.shx in the current directory), then run the following: spatialite rivers-database.db SpatiaLite version ..: 4.3.0a Supported Extensions: ... spatialite> .loadshp narivs rivers CP1252 23032 ======== Loading shapefile at 'narivs' into SQLite table 'rivers' ... Inserted 467973 rows into 'rivers' from SHAPEFILE This will load the data from the narivs shapefile into a new database table called rivers . Exit out of spatialite (using Ctrl+D ) and run Datasette against your new database like this: datasette rivers-database.db \ --load-extension=/usr/local/lib/mod_spatialite.dylib If you browse to http://localhost:8001/rivers-database/rivers you will see the new table... but the Geometry column will contain unreadable binary data (SpatiaLite uses a custom format based on WKB ). The easiest way to turn this into semi-readable data is to use the SpatiaLite AsGeoJSON function. Try the following using the SQL query interface at http://localhost:8001/rivers-database : select *, AsGeoJSON(Geometry) from rivers limit 10; This will give you back an additional column of GeoJSON. You can copy and paste GeoJSON from this column into the debugging tool at geojson.io to visualize it on a map. To see a more interesting example, try ordering the records with the longest geometry first. Since there are 467,000 rows in the table you will first need to increase the SQL time limit imposed by Datasette: datasette rivers-database.db \ --load-e… | 106 | |
96 | Importing GeoJSON polygons using Shapely | Another common form of polygon data is the GeoJSON format. This can be imported into SpatiaLite directly, or by using the Shapely Python library. Who's On First is an excellent source of openly licensed GeoJSON polygons. Let's import the geographical polygon for Wales. First, we can use the Who's On First Spelunker tool to find the record for Wales: spelunker.whosonfirst.org/id/404227475 That page includes a link to the GeoJSON record, which can be accessed here: data.whosonfirst.org/404/227/475/404227475.geojson Here's Python code to create a SQLite database, enable SpatiaLite, create a places table and then add a record for Wales: import sqlite3 conn = sqlite3.connect("places.db") # Enable SpatiaLite extension conn.enable_load_extension(True) conn.load_extension("/usr/local/lib/mod_spatialite.dylib") # Initialize spatial metadata, then create the basic places table conn.execute("select InitSpatialMetadata(1)") conn.execute( "create table places (id integer primary key, name text);" ) # Add a MULTIPOLYGON Geometry column conn.execute( "SELECT AddGeometryColumn('places', 'geom', 4326, 'MULTIPOLYGON', 2);" ) # Add a spatial index against the new column conn.execute("SELECT CreateSpatialIndex('places', 'geom');") # Now populate the table from shapely.geometry import shape import requests geojson = requests.get( "https://data.whosonfirst.org/404/227/475/404227475.geojson" ).json() # Convert to "Well Known Text" format wkt = shape(geojson["geometry"]).wkt # Insert and commit the record conn.execute( "INSERT INTO places (id, name, geom) VALUES(null, ?, GeomFromText(?, 4326))", ("Wales", wkt), ) conn.commit() | 106 | |
97 | Querying polygons using within() | The within() SQL function can be used to check if a point is within a geometry: select name from places where within(GeomFromText('POINT(-3.1724366 51.4704448)'), places.geom); The GeomFromText() function takes a string of well-known text. Note that the order used here is longitude then latitude . To run that same within() query in a way that benefits from the spatial index, use the following: select name from places where within(GeomFromText('POINT(-3.1724366 51.4704448)'), places.geom) and rowid in ( SELECT pkid FROM idx_places_geom where xmin < -3.1724366 and xmax > -3.1724366 and ymin < 51.4704448 and ymax > 51.4704448 ); | 106 | |
98 | Contributing | Datasette is an open source project. We welcome contributions! This document describes how to contribute to Datasette core. You can also contribute to the wider Datasette ecosystem by creating new Plugins . | 106 | |
99 | General guidelines | main should always be releasable . Incomplete features should live in branches. This ensures that any small bug fixes can be quickly released. The ideal commit should bundle together the implementation, unit tests and associated documentation updates. The commit message should link to an associated issue. New plugin hooks should only be shipped if accompanied by a separate release of a non-demo plugin that uses them. | 106 | |
100 | Setting up a development environment | If you have Python 3.8 or higher installed on your computer (on OS X the quickest way to do this is using Homebrew ) you can install an editable copy of Datasette using the following steps. If you want to use GitHub to publish your changes, first create a fork of datasette under your own GitHub account. Now clone that repository somewhere on your computer: git clone git@github.com:YOURNAME/datasette If you want to get started without creating your own fork, you can do this instead: git clone git@github.com:simonw/datasette The next step is to create a virtual environment for your project and use it to install Datasette's dependencies: cd datasette # Create a virtual environment in ./venv python3 -m venv ./venv # Now activate the virtual environment, so pip can install into it source venv/bin/activate # Install Datasette and its testing dependencies python3 -m pip install -e '.[test]' That last line does most of the work: pip install -e means "install this package in a way that allows me to edit the source code in place". The .[test] option means "use the setup.py in this directory and install the optional testing dependencies as well". | 106 | |
101 | Running the tests | Once you have done this, you can run the Datasette unit tests from inside your datasette/ directory using pytest like so: pytest You can run the tests faster using multiple CPU cores with pytest-xdist like this: pytest -n auto -m "not serial" The -n auto option detects the number of available cores automatically. The -m "not serial" option skips tests that don't work well in a parallel test environment. You can run those tests separately like so: pytest -m "serial" | 106 | |