1,900 rows sorted by updated_at descending




id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app
813880401 MDExOlB1bGxSZXF1ZXN0NTc3OTUzNzI3 5 WIP: Add Gmail takeout mbox import UtahDave 306240 open 0     24 2021-02-22T21:30:40Z 2021-07-22T17:47:50Z   FIRST_TIME_CONTRIBUTOR dogsheep/google-takeout-to-sqlite/pulls/5


This PR adds the ability to import emails from a Gmail mbox export from Google Takeout.

This is my first PR to a datasette/dogsheep repo. I've tested this on my personal Google Takeout mbox with ~520,000 emails going back to 2004. This took around 20 minutes to process.

To provide some feedback on the progress of the import I added the "rich" python module. I'm happy to remove that if adding a dependency is discouraged. However, I think it makes a nice addition to give feedback on the progress of a long import.

Do we want to log emails that have errors when trying to import them?

Dealing with encodings with emails is a bit tricky. I'm very open to feedback on how to deal with those better. As well as any other feedback for improvements.
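
One way to approach the encoding question, sketched here with only the standard library mailbox and email modules. This is not the PR's implementation, and the function names decode_payload and iter_plain_text are hypothetical. The idea is to decode each text part with its declared charset, substitute undecodable bytes rather than fail, and fall back to UTF-8 when the declared charset is unknown to Python.

```python
import mailbox


def decode_payload(part):
    """Return the text of a non-multipart message part, tolerating bad charsets."""
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        # errors="replace" keeps the import going when bytes do not match the charset
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The declared charset is not one Python knows about
        return raw.decode("utf-8", errors="replace")


def iter_plain_text(mbox_path):
    """Yield (Message-ID, text) for every text/plain part in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            for part in message.walk():
                if part.get_content_type() == "text/plain":
                    yield message.get("Message-ID"), decode_payload(part)
    finally:
        box.close()
```

Messages that still fail could be collected into a separate list and logged, which would also address the question above about recording emails that error during import.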

google-takeout-to-sqlite 206649770 pull    
950664971 MDU6SXNzdWU5NTA2NjQ5NzE= 1401 unordered list is not rendering bullet points in description_html on database page fgregg 536941 open 0     1 2021-07-22T13:24:18Z 2021-07-22T13:26:01Z   NONE  

Thanks for this tremendous package, @simonw!

In the description_html for a database, I have an unordered list.

However, on the database page on the deployed site, it is not rendering this as a bulleted list.

Page here: https://labordata-warehouse.herokuapp.com/nlrb-9da4ae5

The documentation gives an example of using an unordered list in a description_html, so I expected this would work.

datasette 107914493 issue    
803333769 MDU6SXNzdWU4MDMzMzM3Njk= 32 KeyError: 'Contents' on running upload robmarkcole 11855322 open 0     3 2021-02-08T08:36:37Z 2021-07-22T06:40:25Z   NONE  

Following the README on macOS Big Sur, having entered my auth creds via dogsheep-photos s3-auth:

(venv) (base) Robins-MacBook:datasette robin$ dogsheep-photos upload photos.db     ~/Pictures/Photos\ /Users/robin/Pictures/Library.photoslibrary --dry-run

Fetching existing keys from S3...
Traceback (most recent call last):
  File "/Users/robin/datasette/venv/bin/dogsheep-photos", line 8, in <module>
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/dogsheep_photos/cli.py", line 96, in upload
    key.split(".")[0] for key in get_all_keys(client, creds["photos_s3_bucket"])
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/dogsheep_photos/utils.py", line 46, in get_all_keys
    for row in page["Contents"]:
KeyError: 'Contents'

Possibly because the bucket is in EU (London) eu-west-2 and this info is not requested?
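
Whatever the root cause turns out to be, S3 list responses omit the "Contents" key entirely when a page has no objects (for example, a freshly created empty bucket), so a defensive loop avoids this KeyError. A minimal sketch, assuming pages is any iterable of response dictionaries shaped like the one in the traceback. The name keys_from_pages is hypothetical and is not the function in dogsheep_photos/utils.py.

```python
def keys_from_pages(pages):
    """Yield object keys from S3 list pages, treating a missing "Contents" as empty."""
    for page in pages:
        # An empty result page has no "Contents" key at all
        for row in page.get("Contents", []):
            yield row["Key"]


# An empty page followed by a page with one object
example_pages = [{"KeyCount": 0}, {"Contents": [{"Key": "abc123.jpg"}]}]
print(list(keys_from_pages(example_pages)))
```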

dogsheep-photos 256834907 issue    
855446829 MDExOlB1bGxSZXF1ZXN0NjEzMTc4OTY4 1296 Dockerfile: use Ubuntu 20.10 as base tmcl-it 82332573 open 0     4 2021-04-12T00:23:32Z 2021-07-20T08:52:13Z   FIRST_TIMER simonw/datasette/pulls/1296

This PR changes the main Dockerfile to use ubuntu:20.10 as base image instead of python:3.9.2-slim-buster (itself based on debian:buster-slim).

The Dockerfile is essentially the one from https://github.com/simonw/datasette/issues/1249#issuecomment-803698983 with some additional cleanups to slim it down.

This fixes a couple of issues:
1. The SQLite version in Debian Buster (2.6.0) doesn't support generated columns
2. Installing SpatiaLite from the Debian sid repositories has the side effect of also installing updates to libc and libstdc++ from sid.

As a bonus, the Docker image becomes smaller:

$ docker image ls
REPOSITORY                   TAG           IMAGE ID       CREATED       SIZE
datasette                    0.56-ubuntu   f7aca255140a   5 hours ago   212MB
datasetteproject/datasette   0.56          efb3b282f390   13 days ago   258MB

Reproduction of the first issue

$ curl -O https://latest.datasette.io/fixtures.db
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  260k    0  260k    0     0   489k      0 --:--:-- --:--:-- --:--:--  489k

$ docker run -v `pwd`:/mnt datasetteproject/datasette:0.56 datasette /mnt/fixtures.db
Traceback (most recent call last):
  File "/usr/local/bin/datasette", line 8, in <module>
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/datasette/cli.py", line 544, in serve
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/datasette/cli.py", line 584, in check_databases
    await database.execute_fn(check_connection)
  File "/usr/local/lib/python3.9/site-packages/datasette/database.py", line 155, in execute_fn
    return await asyncio.get_event_loop().run_in_executor(
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/datasette/database.py", line 153, in in_thread
    return fn(conn)
  File "/usr/local/lib/python3.9/site-packages/datasette/utils/__init__.py", line 892, in check_connection
    for r in conn.execute(
sqlite3.DatabaseError: malformed database schema (generated_columns) - near "AS": syntax error

Here is the SQLite version:

$ docker run -v `pwd`:/mnt -it datasetteproject/datasette:0.56 /bin/bash
root@d9220d3b95dd:/# python3
Python 3.9.2 (default, Mar 27 2021, 02:50:26) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.version
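
When checking for generated column support, note that sqlite3.version reports the version of the Python module, while sqlite3.sqlite_version reports the linked SQLite library. Generated columns were introduced in SQLite 3.31.0. A small sketch of the check follows; the helper name supports_generated_columns is hypothetical.

```python
import sqlite3

# Generated columns were introduced in SQLite 3.31.0
MIN_GENERATED_COLUMNS = (3, 31, 0)


def supports_generated_columns(sqlite_version=sqlite3.sqlite_version):
    """Return True if the given SQLite library version supports generated columns."""
    parts = tuple(int(p) for p in sqlite_version.split("."))
    return parts >= MIN_GENERATED_COLUMNS


# Report the library version in use and whether it is new enough
print(sqlite3.sqlite_version, supports_generated_columns())
```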

Reproduction of the second issue

$ docker build . -t datasette --build-arg VERSION=0.55
The following packages will be upgraded:
  libc-bin libc6 libstdc++6
Unpacking libc6:amd64 (2.31-11) over (2.28-10) ...
Unpacking libstdc++6:amd64 (10.2.1-6) over (8.3.0-6) ...

Both libc and libstdc++ are backwards compatible, so the image still works, but it will result in a combination of libraries and Python versions that exists only in the Datasette image, so it's likely untested. In addition, since Debian sid is an always-changing rolling-release, the versions of libc, libstdc++, Spatialite, and their dependencies change frequently, so the library versions in the Datasette image will depend on the day when it was built.

datasette 107914493 pull    
930855052 MDExOlB1bGxSZXF1ZXN0Njc4NDU5NTU0 1385 Fix + improve get_metadata plugin hook docs brandonrobertz 2670795 open 0     1 2021-06-27T05:43:20Z 2021-07-19T18:47:04Z   CONTRIBUTOR simonw/datasette/pulls/1385

This fixes documentation inaccuracies and adds a disclaimer about the signature of the get_metadata hook.

Addresses the following comments:
- https://github.com/simonw/datasette/issues/1384#issuecomment-869069926
- https://github.com/simonw/datasette/issues/1384#issuecomment-869075368

datasette 107914493 pull    
947640902 MDExOlB1bGxSZXF1ZXN0NjkyNTk2MDA2 1400 Bump black from 21.6b0 to 21.7b0 dependabot[bot] 49699333 open 0     1 2021-07-19T13:13:41Z 2021-07-19T13:20:52Z   CONTRIBUTOR simonw/datasette/pulls/1400

Bumps black from 21.6b0 to 21.7b0.

Release notes

Sourced from black's releases.



  • Configuration files using TOML features higher than spec v0.5.0 are now supported (#2301)
  • Add primer support and test for code piped into black via STDIN (#2315)
  • Fix internal error when FORCE_OPTIONAL_PARENTHESES feature is enabled (#2332)
  • Accept empty stdin (#2346)
  • Provide a more useful error when parsing fails during AST safety checks (#2304)


  • Add new latest_release tag automation to follow latest black release on docker images (#2374)


  • The vim plugin now searches upwards from the directory containing the current buffer instead of the current working directory for pyproject.toml. (#1871)
  • The vim plugin now reads the correct string normalization option in pyproject.toml (#1869)
  • The vim plugin no longer crashes Black when there's boolean values in pyproject.toml (#1869)

Sourced from black's changelog.



  • Configuration files using TOML features higher than spec v0.5.0 are now supported (#2301)
  • Add primer support and test for code piped into black via STDIN (#2315)
  • Fix internal error when FORCE_OPTIONAL_PARENTHESES feature is enabled (#2332)
  • Accept empty stdin (#2346)
  • Provide a more useful error when parsing fails during AST safety checks (#2304)


  • Add new latest_release tag automation to follow latest black release on docker images (#2374)


  • The vim plugin now searches upwards from the directory containing the current buffer instead of the current working directory for pyproject.toml. (#1871)
  • The vim plugin now reads the correct string normalization option in pyproject.toml (#1869)
  • The vim plugin no longer crashes Black when there's boolean values in pyproject.toml (#1869)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
datasette 107914493 pull    
947596222 MDExOlB1bGxSZXF1ZXN0NjkyNTU3Mzgx 1399 Multiple sort jgryko5 87192257 open 0     0 2021-07-19T12:20:14Z 2021-07-19T12:20:14Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/1399

Closes #197.
I have added support for sorting by multiple parameters as mentioned in the issue above, and together with that, a suggestion on how to implement such sorting in the user interface.

datasette 107914493 pull    
275125561 MDU6SXNzdWUyNzUxMjU1NjE= 123 Datasette serve should accept paths/URLs to CSVs and other file formats simonw 9599 open 0     9 2017-11-19T02:05:48Z 2021-07-19T00:04:32Z   OWNER  

This would remove the csvs-to-sqlite step which I end up using for almost everything.

I'm hesitant to introduce pandas as a required dependency though, since it requires compiling numpy. Could build it so this option is only available if you have pandas installed.
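
The optional-dependency idea could look roughly like this. It is only a sketch: the names pandas_available and load_csv_rows are hypothetical and nothing here is Datasette's actual code.

```python
import importlib.util


def pandas_available():
    """Return True if pandas can be imported, without importing it yet."""
    return importlib.util.find_spec("pandas") is not None


def load_csv_rows(path):
    """Load a CSV file as a list of dicts, but only when pandas is installed."""
    if not pandas_available():
        raise RuntimeError("Loading CSV files requires the optional pandas dependency")
    import pandas as pd  # imported lazily so pandas stays optional

    return pd.read_csv(path).to_dict(orient="records")
```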

datasette 107914493 issue    
947044667 MDU6SXNzdWU5NDcwNDQ2Njc= 1398 Documentation on using Datasette as a library simonw 9599 open 0     0 2021-07-18T14:15:27Z 2021-07-18T14:15:27Z   OWNER  

Instantiating Datasette() directly is an increasingly interesting pattern. I do it in tests all the time, but thanks to datasette.client there are plenty of neat things you can do with it in a library context.

Maybe support from datasette import Datasette for this.

datasette 107914493 issue    
944846776 MDU6SXNzdWU5NDQ4NDY3NzY= 297 Option for importing CSV data using the SQLite .import mechanism simonw 9599 open 0     6 2021-07-14T22:36:41Z 2021-07-18T12:59:20Z   OWNER  

As seen in https://til.simonwillison.net/sqlite/import-csv - .mode csv and then .import school.csv schools is hugely faster than importing via sqlite-utils insert and doing the work in Python - but it can only be implemented by shelling out to the sqlite3 CLI tool, it's not functionality that is exposed to the Python sqlite3 module.

An option to use this would be useful - maybe something like this:

sqlite-utils insert blah.db blah blah.csv --fast
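
A rough sketch of what shelling out could look like from Python. It assumes the sqlite3 command-line tool is on the PATH. The function names are hypothetical, and this is not how sqlite-utils implements anything today.

```python
import subprocess


def build_import_command(db_path, csv_path, table):
    """Build the argument list for a sqlite3 CLI import of a CSV file."""
    # -cmd runs ".mode csv" first; the final argument is then executed as a dot-command
    return ["sqlite3", db_path, "-cmd", ".mode csv", f".import {csv_path} {table}"]


def fast_import(db_path, csv_path, table):
    """Run the import, raising CalledProcessError if the sqlite3 tool fails."""
    subprocess.run(build_import_command(db_path, csv_path, table), check=True)


print(build_import_command("blah.db", "blah.csv", "blah"))
```

Paths or table names containing spaces would need quoting inside the .import dot-command, which this sketch does not attempt.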
sqlite-utils 140912432 issue    
792652391 MDU6SXNzdWU3OTI2NTIzOTE= 1199 Experiment with PRAGMA mmap_size=N simonw 9599 open 0     2 2021-01-23T21:24:09Z 2021-07-17T17:39:17Z   OWNER  

https://sqlite.org/mmap.html - SQLite supports memory-mapped I/O but it's disabled by default. The PRAGMA mmap_size=N option can be used to enable it.

It would be very interesting to understand the impact this could have on Datasette performance for various different shapes of data.
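
For experimenting, the pragma can be issued on a plain sqlite3 connection. A minimal sketch with a hypothetical helper name, enable_mmap. The value SQLite reports back may be lower than the one requested, or zero if memory-mapped I/O is unavailable in that build, so the helper returns whatever the pragma reports.

```python
import sqlite3


def enable_mmap(conn, size_bytes):
    """Request memory-mapped I/O and return the limit SQLite reports, if any."""
    row = conn.execute(f"PRAGMA mmap_size={int(size_bytes)}").fetchone()
    return row[0] if row else None
```

A benchmark could then open the same database file twice, once with enable_mmap(conn, 268435456) and once without, and time identical queries against each connection.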

datasette 107914493 issue    
944903881 MDU6SXNzdWU5NDQ5MDM4ODE= 1396 "invalid reference format" publishing Docker image simonw 9599 closed 0     8 2021-07-15T01:02:07Z 2021-07-16T20:02:44Z 2021-07-15T19:47:25Z OWNER  

Error occurred at the end of the publish flow for Datasette 0.58: https://github.com/simonw/datasette/runs/3072216421

Removing intermediate container cf32b9440907
 ---> dfd6985b2afc
Successfully built dfd6985b2afc
Successfully tagged ***/datasette:0.58
invalid reference format
Error: Process completed with exit code 1.
datasette 107914493 issue    
946553953 MDExOlB1bGxSZXF1ZXN0NjkxNzA3NDA5 1397 Fix for race condition in refresh_schemas(), closes #1231 simonw 9599 closed 0     0 2021-07-16T19:44:43Z 2021-07-16T19:45:00Z 2021-07-16T19:44:58Z OWNER simonw/datasette/pulls/1397
datasette 107914493 pull    
811367257 MDU6SXNzdWU4MTEzNjcyNTc= 1231 Race condition errors in new refresh_schemas() mechanism simonw 9599 closed 0     11 2021-02-18T18:49:54Z 2021-07-16T19:44:59Z 2021-07-16T19:44:59Z OWNER  

I tried running a Locust load test against Datasette and hit an error message about a failure to create tables because they already existed. I think this means there are race conditions in the new refresh_schemas() mechanism added in #1150.
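
The class of error described here is easy to reproduce in isolation: a second CREATE TABLE for the same name fails, which is what two concurrent schema refreshes would run into. The sketch below only illustrates that failure and one common guard (IF NOT EXISTS). It is not the fix that was applied in Datasette, and the table name catalog_tables is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog_tables (name TEXT)")

# A second, unguarded CREATE TABLE for the same name raises an error
try:
    conn.execute("CREATE TABLE catalog_tables (name TEXT)")
    second_create_failed = False
    error_message = ""
except sqlite3.OperationalError as ex:
    second_create_failed = True
    error_message = str(ex)

# The guarded form is a no-op when the table already exists
conn.execute("CREATE TABLE IF NOT EXISTS catalog_tables (name TEXT)")
print(second_create_failed, error_message)
```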

datasette 107914493 issue    
944870799 MDU6SXNzdWU5NDQ4NzA3OTk= 1394 Big performance boost on faceting: skip the inner order by simonw 9599 closed 0     4 2021-07-14T23:32:29Z 2021-07-16T02:23:32Z 2021-07-15T00:05:50Z OWNER  

I just noticed something that could make for a huge performance improvement in faceting.

The default query used by Datasette when faceting looks like this:

select
  country_long,
  count(*)
from (
  select * from [global-power-plants] order by rowid
)
where
  country_long is not null
group by
  country_long
order by
  count(*) desc

Here it takes 53ms: https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28*%29%0D%0Afrom+%28%0D%0A++select+*+from+%5Bglobal-power-plants%5D+order+by+rowid%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28*%29+desc

Note that there's an order by rowid in there which isn't necessary - the order on that inner query doesn't matter since we're grouping and counting.

I had assumed SQLite would optimize this away - but it turns out it doesn't! Consider this version of the query, with that pointless order by removed:

select
  country_long,
  count(*)
from (
  select * from [global-power-plants]
)
where
  country_long is not null
group by
  country_long
order by
  count(*) desc

https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28*%29%0D%0Afrom+%28%0D%0A++select+*+from+%5Bglobal-power-plants%5D%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28*%29+desc runs in 7.2ms!

I tried this optimization on a table with 2.5m rows in it - without the optimization it took 5 seconds, with the optimization it took 450ms. So this is a very significant improvement!
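
The claim that the inner order by does not affect the result can be checked on a toy table. This is a sketch using Python's sqlite3 module. The plants table and its rows are made up for illustration, and timings on a table this small are meaningless, so only the results are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table plants (name text, country_long text)")
conn.executemany(
    "insert into plants values (?, ?)",
    [("a", "France"), ("b", "France"), ("c", "Chile"), ("d", None)],
)

# Same facet query shape as above, with the inner query swapped in
FACET_SQL = """
select country_long, count(*)
from ({inner})
where country_long is not null
group by country_long
order by count(*) desc
"""

with_order = conn.execute(
    FACET_SQL.format(inner="select * from plants order by rowid")
).fetchall()
without_order = conn.execute(
    FACET_SQL.format(inner="select * from plants")
).fetchall()

print(with_order == without_order)
```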

datasette 107914493 issue    
612673948 MDU6SXNzdWU2MTI2NzM5NDg= 759 fts search on a column doesn't work anymore due to escape_fts Krazybug 133845 closed 0     3 2020-05-05T15:03:44Z 2021-07-16T02:11:54Z 2020-05-06T17:50:57Z NONE  

Hi, and first, thank you for the awesome work you have done on this project.

On a database indexed for full-text search, I can no longer query on an indexed column.

The query "cauvin language:ita" runs smoothly on an old version of datasette but not on the current version.

Compare the query from the current version:
select uuid, title, authors, year, series, language, formats, publisher, tags, identifiers from summary where rowid in (select rowid from summary_fts where summary_fts match escape_fts(:search)) order by uuid limit 101

To an older version:

select title, authors, series, uuid, language, identifiers, tags, publisher, formats, year, links from summary where rowid in (select rowid from summary_fts where summary_fts match :search) order by uuid limit 101

language is a searchable column, but now the search string "cauvin language:ita" is treated literally as a search term. Column filters are not parsed.

datasette 107914493 issue    
323718842 MDU6SXNzdWUzMjM3MTg4NDI= 268 Mechanism for ranking results from SQLite full-text search simonw 9599 open 0   Datasette Next 6158551 12 2018-05-16T17:36:40Z 2021-07-14T19:31:00Z   OWNER  

This isn't particularly straightforward - all the more reason for Datasette to implement it for you. This article is helpful: http://charlesleifer.com/blog/using-sqlite-full-text-search-with-python/

datasette 107914493 issue    
539590148 MDU6SXNzdWU1Mzk1OTAxNDg= 651 fts5 syntax error when using punctuation clausjuhl 2181410 closed 0     3 2019-12-18T10:25:35Z 2021-07-14T19:26:06Z 2019-12-30T06:42:55Z NONE  

Hi Simon

I get a syntax error when using punctuation or special characters in a fulltext search (using fts5). I created the virtual table using sqlite-utils' "enable-fts"-command.

The same error appears on Niche Museums https://www.niche-museums.com/browse/search?q=park., but works fine in most of your other datasette-examples, e.g. register-of-members-interests https://register-of-members-interests.datasettes.com/regmem-98dc8b7/items?_search=mins.

What am I doing wrong? Many thanks!

datasette 107914493 issue    
944326512 MDU6SXNzdWU5NDQzMjY1MTI= 296 table.search() allows prohibited characters to be in the search query. deafmute1 32427188 open 0     0 2021-07-14T11:26:47Z 2021-07-14T11:26:47Z   NONE  

Recently got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py", line 38, in <module>
    start("/home/ethan/git/music-metadata-indexer/sample", "/home/ethan/git/music-metadata-indexer/test.db")
  File "/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py", line 23, in start
  File "/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py", line 79, in build_database
    _import_song(self.db, Path(dirpath).joinpath(f), self.logger) 
  File "/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py", line 23, in _import_song
  File "/home/ethan/git/music-metadata-indexer/src/mmindexer/index.py", line 166, in add_song
    for match in self.search("albums", album): 
  File "/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py", line 1625, in search
    cursor = self.db.execute(
  File "/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py", line 243, in execute
    return self.conn.execute(sql, parameters)
sqlite3.OperationalError: fts5: syntax error near "." 

So, the error seems to suggest there was a "." character somewhere in the SQL command that was causing the error. I did a little digging and found this in the docs: https://www.sqlite.org/fts5.html#fts5_strings. "." is one of the many prohibited characters.

My solution was to just strip these out of the query using this line
query = query.translate({e: None for e in itertools.chain(range(0,26), range(27, 48), range(58,65), range(91,95), [96], range(123,128))})

Perhaps this could be included in the table.search() function?
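
An alternative to stripping characters is to quote each term as an FTS5 string, doubling any embedded double quotes, so that punctuation such as "." is matched literally instead of being parsed as query syntax. A minimal sketch follows. The function name quote_fts5_terms is hypothetical and this is not the sqlite-utils implementation.

```python
def quote_fts5_terms(query):
    """Wrap each whitespace-separated term in double quotes for use with FTS5 MATCH."""
    terms = query.split()
    # Inside an FTS5 string, a literal double quote is written as two double quotes
    return " ".join('"{}"'.format(term.replace('"', '""')) for term in terms)


print(quote_fts5_terms("park."))
print(quote_fts5_terms("AC/DC  live"))
```

The trade-off is that quoting also disables FTS5 query syntax such as column filters and boolean operators, so callers who want that syntax would need a way to opt out.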

sqlite-utils 140912432 issue    
727848625 MDU6SXNzdWU3Mjc4NDg2MjU= 12 Some workout columns should be float, not text simonw 9599 open 0     3 2020-10-23T02:47:02Z 2021-07-13T23:50:06Z   MEMBER   healthkit-to-sqlite 197882382 issue    
941412189 MDExOlB1bGxSZXF1ZXN0Njg3MzA0MjQy 1393 Update deploying.rst aslakr 80737 closed 0     1 2021-07-11T09:32:16Z 2021-07-13T18:32:49Z 2021-07-13T18:32:49Z CONTRIBUTOR simonw/datasette/pulls/1393

Example of how to use the Unix domain socket option on Apache. Not tested.

(Usually I would have used ProxyPassReverse in combination with ProxyPass , i.e.

ProxyPass /my-datasette/
ProxyPassReverse /my-datasette/


ProxyPass /my-datasette/ unix:/tmp/datasette.sock|http://localhost/my-datasette/
ProxyPassReverse /my-datasette/ unix:/tmp/datasette.sock|http://localhost/my-datasette/


datasette 107914493 pull    
941403676 MDExOlB1bGxSZXF1ZXN0Njg3Mjk4MTEy 1392 Update deploying.rst aslakr 80737 closed 0     1 2021-07-11T08:43:19Z 2021-07-13T17:42:31Z 2021-07-13T17:42:27Z CONTRIBUTOR simonw/datasette/pulls/1392

Use same base url for Apache as in the example

datasette 107914493 pull    
456578474 MDU6SXNzdWU0NTY1Nzg0NzQ= 511 Get Datasette tests passing on Windows in GitHub Actions simonw 9599 open 0     13 2019-06-15T21:41:58Z 2021-07-11T17:23:05Z   OWNER  

This should almost happen as a side-effect of moving from Sanic to Uvicorn during the port to ASGI: #272

Additional steps:

  • test it manually
  • update documentation
  • set up some form of Windows CI
datasette 107914493 issue    
931557895 MDExOlB1bGxSZXF1ZXN0Njc5MDM1ODQ3 1386 Update asgiref requirement from <3.4.0,>=3.2.10 to >=3.2.10,<3.5.0 dependabot[bot] 49699333 closed 0     1 2021-06-28T13:13:07Z 2021-07-11T01:36:19Z 2021-07-11T01:36:18Z CONTRIBUTOR simonw/datasette/pulls/1386

Updates the requirements on asgiref to permit the latest version.


Sourced from asgiref's changelog.

3.4.0 (2021-06-27)

  • Calling sync_to_async directly from inside itself (which causes a deadlock when in the default, thread-sensitive mode) now has deadlock detection.

  • asyncio usage has been updated to use the new versions of get_event_loop, ensure_future, wait and gather, avoiding deprecation warnings in Python 3.10. Python 3.6 installs continue to use the old versions; this is only for 3.7+

  • sync_to_async and async_to_sync now have improved type hints that pass through the underlying function type correctly.

  • All Websocket* types are now spelled WebSocket, to match our specs and the official spelling. The old names will work until release 3.5.0, but will raise deprecation warnings.

  • The typing for WebSocketScope and HTTPScope's extensions key has been fixed.

3.3.4 (2021-04-06)

  • The async_to_sync type error is now a warning due the high false negative rate when trying to detect coroutine-returning callables in Python.

3.3.3 (2021-04-06)

  • The sync conversion functions now correctly detect functools.partial and other wrappers around async functions on earlier Python releases.

3.3.2 (2021-04-05)

  • SyncToAsync now takes an optional "executor" argument if you want to supply your own executor rather than using the built-in one.

  • async_to_sync and sync_to_async now check their arguments are functions of the correct type.

  • Raising CancelledError inside a SyncToAsync function no longer stops a future call from functioning.

  • ThreadSensitive now provides context hooks/override options so it can be made to be sensitive in a unit smaller than threads (e.g. per request)

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
datasette 107914493 pull    
939051549 MDU6SXNzdWU5MzkwNTE1NDk= 1388 Serve using UNIX domain socket aslakr 80737 closed 0     13 2021-07-07T16:13:37Z 2021-07-11T01:18:38Z 2021-07-10T23:38:32Z CONTRIBUTOR  

Would it be possible to make datasette serve using UNIX domain socket similar to Uvicorn's --uds?

datasette 107914493 issue    
646448486 MDExOlB1bGxSZXF1ZXN0NDQwNzM1ODE0 868 initial windows ci setup joshmgrant 702729 open 0     3 2020-06-26T18:49:13Z 2021-07-10T23:41:43Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/868

Picking up the work done on #557 with a new PR. Seeing if I can get this working.

datasette 107914493 pull    
466996584 MDExOlB1bGxSZXF1ZXN0Mjk2NzM1MzIw 557 Get tests running on Windows using Travis CI simonw 9599 closed 0     4 2019-07-11T16:36:57Z 2021-07-10T23:39:48Z 2021-07-10T23:39:48Z OWNER simonw/datasette/pulls/557

Refs #511

datasette 107914493 pull    
941300946 MDU6SXNzdWU5NDEzMDA5NDY= 1391 Stop using generated columns in fixtures.db simonw 9599 closed 0     5 2021-07-10T18:26:11Z 2021-07-10T19:26:58Z 2021-07-10T19:26:00Z OWNER  

Refs #1376 - but I also keep running into this myself, where I try to run something against fixtures.db and get this confusing error:

sqlite3.DatabaseError: malformed database schema (generated_columns) - near "AS": syntax error

I'm going to stop using generated columns in fixtures.db and instead dynamically generate the generated column table for the duration of the relevant test.

datasette 107914493 issue    
940077168 MDU6SXNzdWU5NDAwNzcxNjg= 1389 "searchmode": "raw" in table metadata simonw 9599 closed 0     6 2021-07-08T17:32:10Z 2021-07-10T18:33:13Z 2021-07-10T18:33:13Z OWNER  


But I'm not able to get it working in the metadata file. Here is mine (note that the sort column is taken into account):

{
    "databases": {
        "index": {
            "tables": {
                "summary": {
                    "sort": "title",
                    "searchmode": "raw"
                }
            }
        }
    }
}

_Originally posted by @Krazybug in https://github.com/simonw/datasette/issues/759#issuecomment-624860451_

datasette 107914493 issue    
940891698 MDU6SXNzdWU5NDA4OTE2OTg= 1390 Mention restarting systemd in documentation simonw 9599 closed 0     2 2021-07-09T16:05:15Z 2021-07-09T16:32:57Z 2021-07-09T16:32:33Z OWNER  


Need to clarify that if you add a new database or change metadata you need to restart systemd.

datasette 107914493 issue    
935930820 MDU6SXNzdWU5MzU5MzA4MjA= 1387 absolute_url() behind a proxy assembles incorrect URLs simonw 9599 closed 0     8 2021-07-02T16:58:25Z 2021-07-02T17:58:23Z 2021-07-02T17:33:05Z OWNER  

Reported in the wild on https://ilsweb.cincinnatilibrary.org/collection-analysis/current_collection-3d4a4b7/bib?_facet=bib_level_callnumber - the "next page" link links to

That installation uses "base_url": "/collection-analysis/"

Weirdly all of the other links on that page - to facet results, sort orders, row permalinks etc - work fine. It's JUST the next_url one that is broken.

Also broken in their JSON: https://ilsweb.cincinnatilibrary.org/collection-analysis/current_collection-3d4a4b7/bib.json?_size=1 returns

  "suggested_facets": [],
  "next": "1",
  "next_url": "",
  "private": false,
datasette 107914493 issue    
934123448 MDU6SXNzdWU5MzQxMjM0NDg= 295 Insert with --tsv and --no-headers give error about --nl arguments davidscotson 7288187 open 0     0 2021-06-30T21:01:01Z 2021-06-30T21:01:01Z   NONE  

Not quite sure if this is a bug, or just an assumption I made but I thought --tsv and --no-headers would work together when inserting from a file, and currently they seem not to (sqlite-utils, version 3.12, installed on Mac OS X via brew)

Instead it says:

Error: Use just one of --nl, --csv or --tsv

As if it has interpreted the --no-headers as --nl.

The --help does specifically say CSV:
--no-headers CSV file has no header row

And this heading in the documentation also only refers to CSV, but the text does mention TSV in passing, and I'd generally expect them to behave the same in most cases.

sqlite-utils 140912432 issue    
931752773 MDU6SXNzdWU5MzE3NTI3NzM= 294 Add a `sqlite-utils memory` example to the README simonw 9599 open 0     0 2021-06-28T16:35:59Z 2021-06-28T16:35:59Z   OWNER  
sqlite-utils 140912432 issue    
749283032 MDU6SXNzdWU3NDkyODMwMzI= 1101 register_output_renderer() should support streaming data simonw 9599 open 0   Datasette 1.0 3268330 7 2020-11-24T02:17:09Z 2021-06-28T16:07:24Z   OWNER  

I'd like to implement this by first extending the register_output_renderer() hook to support streaming huge responses, then switching CSV to use the plugin hook in addition to TSV using it.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/1096#issuecomment-732542285_

datasette 107914493 issue    
930946817 MDU6SXNzdWU5MzA5NDY4MTc= 7 KeyError: 'accuracy' when processing Location History davidwilemski 403152 open 0     0 2021-06-27T14:39:43Z 2021-06-27T14:39:43Z   NONE  

I'm new to both the dogsheep tools and datasette but have been experimenting a bit the last few days and these are really cool tools!

I encountered a problem running my Google location history through this tool running the latest release in a docker container:

Traceback (most recent call last):
  File "/usr/local/bin/google-takeout-to-sqlite", line 8, in <module>
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/cli.py", line 49, in my_activity
    utils.save_location_history(db, zf)
  File "/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py", line 27, in save_location_history
  File "/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py", line 1105, in upsert_all
    return self.insert_all(
  File "/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py", line 990, in insert_all
    chunk = list(chunk)
  File "/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py", line 33, in <genexpr>
    "accuracy": row["accuracy"],
KeyError: 'accuracy'

It looks like the tool assumes the accuracy key will be in every location history entry.

My first attempt at a local patch was to access the accuracy key with .get instead, hoping to make the column nullable, but I wasn't sure what sqlite_utils would do there. That did work, in that the import completed, so I was going to propose a patch with that change. However, while updating the existing test to include an entry with a missing accuracy value, I noticed that the expected type of the field appeared to change to a string in the test (and, from a quick scan through the sqlite_utils code, probably TEXT in the database). Given this change in column type, opening an issue before proposing a fix seemed warranted. It seems the schema would need to be specified explicitly if you wanted a nullable integer column.

Now that I've done a successful import run using my initial fix of calling .get on the row dict, I can see with datasette that I only have 7 data points (out of ~250k) that have a null accuracy column. They are all from 2011-2012 in an import that includes points spanning ~2010-2016 so perhaps another approach might be to filter those entries out during import if it really is that infrequent?

I'm happy to provide a PR for a fix but figured I'd ask about which direction is preferred first.
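
To make the "explicit schema plus .get" direction concrete, here is a rough sketch using only the standard library sqlite3 module rather than sqlite_utils. The table name locations, the id key and the save_locations helper are hypothetical and are not taken from google-takeout-to-sqlite.

```python
import sqlite3


def save_locations(conn, entries):
    # Declare the column type up front so a missing value cannot change it.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations "
        "(id INTEGER PRIMARY KEY, accuracy INTEGER)"
    )
    conn.executemany(
        "INSERT INTO locations (id, accuracy) VALUES (?, ?)",
        # dict.get() returns None for a missing key, which SQLite stores as NULL.
        ((entry["id"], entry.get("accuracy")) for entry in entries),
    )


conn = sqlite3.connect(":memory:")
save_locations(conn, [{"id": 1, "accuracy": 20}, {"id": 2}])
stored = conn.execute("SELECT id, accuracy FROM locations ORDER BY id").fetchall()
```

With the type declared in advance, the entry without an accuracy value is stored as NULL and the column stays INTEGER.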

google-takeout-to-sqlite 206649770 issue    
777333388 MDU6SXNzdWU3NzczMzMzODg= 1168 Mechanism for storing metadata in _metadata tables simonw 9599 open 0     20 2021-01-01T18:47:27Z 2021-06-27T00:05:51Z   OWNER  

Original title: Perhaps metadata should all live in a _metadata in-memory database

Inspired by #1150 - metadata should be exposed as an API, and for large Datasette instances that API may need to be paginated. So why not expose it through an in-memory database table?

One catch to this: plugins. #860 aims to add a plugin hook for metadata. But if the metadata comes from an in-memory table, how do the plugins interact with it?

The need to paginate over metadata does make a plugin hook that returns metadata for an individual table seem less wise, since we don't want to have to do 10,000 plugin hook invocations to show a list of all metadata.

If those plugins write directly to the in-memory table how can their contributions survive the server restarting?

datasette 107914493 issue    
930807135 MDU6SXNzdWU5MzA4MDcxMzU= 1384 Plugin hook for dynamic metadata simonw 9599 open 0     13 2021-06-26T22:36:03Z 2021-06-26T23:59:21Z   OWNER  

@brandonrobertz contributed an implementation of this in PR #1368, which I just merged. Opening this ticket to track further work on this before it goes out in a Datasette release (likely preceded by an alpha).

datasette 107914493 issue    
642651572 MDU6SXNzdWU2NDI2NTE1NzI= 860 Plugin hook for instance/database/table metadata simonw 9599 closed 0   Datasette Next 6158551 10 2020-06-21T22:20:25Z 2021-06-26T22:56:33Z 2021-06-26T22:56:28Z OWNER  

I'm not happy with how metadata.(json|yaml) keeps growing new features. Rather than having a single plugin hook for all of metadata.json I'm going to split out the feature that shows actual real metadata for tables and databases - source, license etc - into its own plugin-powered mechanism.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/357#issuecomment-647189045_

datasette 107914493 issue    
913865304 MDExOlB1bGxSZXF1ZXN0NjYzODM2OTY1 1368 DRAFT: A new plugin hook for dynamic metadata brandonrobertz 2670795 closed 0     5 2021-06-07T18:56:00Z 2021-06-26T22:24:54Z 2021-06-26T22:24:54Z CONTRIBUTOR simonw/datasette/pulls/1368

Note that this is a WORK IN PROGRESS!

This PR adds the following plugin hook:

  datasette=self, key=key, database=database, table=table,

This gets called when we're building our metadata for the rest of the system to use. Datasette merges whatever the plugins return with any local metadata (from metadata.yml/yaml/json) allowing for a live-editable dynamic Datasette. A major design consideration is this: should Datasette perform the metadata merge? Or should Datasette allow plugins to perform any modifications themselves?

As a security precaution, local meta is not overwritable by plugin hooks. The workflow for transitioning to live-meta would be to load the plugin with the full metadata.yaml and save. Then remove the parts of the metadata that you want to be able to change from the file.

I have a WIP dynamic configuration plugin here, for reference: https://github.com/next-LI/datasette-live-config/

datasette 107914493 pull    
465815372 MDU6SXNzdWU0NjU4MTUzNzI= 37 Experiment with type hints simonw 9599 open 0     5 2019-07-09T14:30:34Z 2021-06-25T23:24:28Z   OWNER  

Since it's designed to be used in Jupyter or for rapid prototyping in an IDE (and it's still pretty small) sqlite-utils feels like a great candidate for me to finally try out Python type hints.

https://veekaybee.github.io/2019/07/08/python-type-hints/ is good.

It suggests the mypy docs for getting started: https://mypy.readthedocs.io/en/latest/existing_code.html plus this tutorial: https://pymbook.readthedocs.io/en/latest/typehinting.html

sqlite-utils 140912432 issue    
927789811 MDU6SXNzdWU5Mjc3ODk4MTE= 292 Add contributing documentation simonw 9599 closed 0     0 2021-06-23T02:13:05Z 2021-06-25T17:53:51Z 2021-06-25T17:53:51Z OWNER  

Like https://docs.datasette.io/en/latest/contributing.html (but simpler) - should cover how to run black and flake8 and mypy and how to run the tests.

sqlite-utils 140912432 issue    
929748885 MDExOlB1bGxSZXF1ZXN0Njc3NTU0OTI5 293 Test against Python 3.10-dev simonw 9599 open 0     3 2021-06-25T01:40:39Z 2021-06-25T17:39:35Z   OWNER simonw/sqlite-utils/pulls/293
sqlite-utils 140912432 pull    
926777310 MDU6SXNzdWU5MjY3NzczMTA= 290 `db.query()` method (renamed `db.execute_returning_dicts()`) simonw 9599 closed 0     6 2021-06-22T03:03:54Z 2021-06-24T23:17:38Z 2021-06-24T22:54:43Z OWNER  

Most of this library deals with lists of Python dictionaries - .insert_all(), .rows, .rows_where(), .search().

The db.execute() method is the only thing that returns a sqlite3 cursor.

There is a clumsily named db.execute_returning_dicts(sql) method but it's not currently mentioned in the documentation.

It needs a better name, and needs to be properly documented.
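
As a sketch of the behaviour being described (rows as dictionaries rather than a raw cursor), the following uses only the standard library sqlite3 module. It is not the sqlite-utils implementation; the standalone query function and the dogs table are assumptions for illustration.

```python
import sqlite3


def query(conn, sql, params=()):
    """Yield each result row as a dictionary keyed by column name."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER, name TEXT)")
conn.execute("INSERT INTO dogs VALUES (1, 'Cleo')")
results = list(query(conn, "SELECT id, name FROM dogs"))
```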

sqlite-utils 140912432 issue    
927766296 MDU6SXNzdWU5Mjc3NjYyOTY= 291 Adopt flake8 simonw 9599 closed 0     2 2021-06-23T01:19:37Z 2021-06-24T17:50:27Z 2021-06-24T17:50:27Z OWNER  
sqlite-utils 140912432 issue    
920884085 MDU6SXNzdWU5MjA4ODQwODU= 1377 Mechanism for plugins to exclude certain paths from CSRF checks simonw 9599 closed 0     3 2021-06-15T00:48:20Z 2021-06-23T22:51:33Z 2021-06-23T22:51:33Z OWNER  

I need this for a plugin I'm building that offers a POST API.

datasette 107914493 issue    
925677191 MDU6SXNzdWU5MjU2NzcxOTE= 289 Mypy fixes for rows_from_file() adamchainz 857609 closed 0     3 2021-06-20T20:34:59Z 2021-06-22T18:44:36Z 2021-06-22T18:13:26Z NONE  

Following https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927

You had two mypy errors.

The first:

sqlite_utils/utils.py:157: error: Argument 1 to "BufferedReader" has incompatible type "BinaryIO"; expected "RawIOBase"

Looking at the BufferedReader docs, it seems to expect a RawIOBase, and this has been copied into typeshed. There may be scope to change how BufferedReader is documented and typed upstream, but for now it wouldn't be too bad to use a typing.cast():

# Detect the format, then call this recursively
buffered = io.BufferedReader(
    cast(io.RawIOBase, fp),  # Undocumented BufferedReader support for BinaryIO

The second error seems to be flagging a legitimate bug in your code:

sqlite_utils/utils.py:163: error: Argument 1 to "decode" of "bytes" has incompatible type "Optional[str]"; expected "str"

From your type hints, encoding may be None. In the CSV format block, you use encoding or "utf-8-sig" to set a default, maybe that's desirable in this case too?
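
A minimal sketch of that suggestion, assuming a helper named decode_first_bytes (hypothetical, not a function in sqlite_utils). Whether "utf-8-sig" is the right default for this code path is also an assumption carried over from the CSV block.

```python
from typing import Optional


def decode_first_bytes(first_bytes: bytes, encoding: Optional[str] = None) -> str:
    # bytes.decode() requires a str, so substitute a default when encoding is None.
    return first_bytes.decode(encoding or "utf-8-sig")


decoded_default = decode_first_bytes(b"id,name")
decoded_latin = decode_first_bytes(b"caf\xe9", encoding="latin-1")
```

Because the `or` expression always evaluates to a str here, mypy no longer sees an Optional[str] being passed to decode().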

sqlite-utils 140912432 issue    
915421499 MDU6SXNzdWU5MTU0MjE0OTk= 267 row.update() or row.pk Gravitar64 12721157 open 0     4 2021-06-08T19:56:00Z 2021-06-22T17:27:27Z   NONE  


fantastic framework for working with Sqlite3 databases!!!

I tried to update specific rows in a table and used

for row in db[tablename]:
    newValue = row["counter"] * row["prize"]
    row.update({"Fieldname": newValue})

This updates the value in the printed row, but not in the database. So I switched to

db[tablename].update(id, {"Fieldname": newValue})

This works fine. But row.update would be nicer, because there is no need for the id (it's that row), and no need for the tablename and the db (all defined in the for row ... loop).
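
For comparison, here is a sketch of the same read-then-update loop written against the standard library sqlite3 module, where the primary key has to be carried along explicitly. The items table and the total column are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, counter INTEGER, prize INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO items (id, counter, prize) VALUES (?, ?, ?)",
    [(1, 2, 10), (2, 3, 5)],
)

# Read first, then write: fetchall() finishes the SELECT before any UPDATE runs.
for row_id, counter, prize in conn.execute(
    "SELECT id, counter, prize FROM items"
).fetchall():
    conn.execute("UPDATE items SET total = ? WHERE id = ?", (counter * prize, row_id))

totals = conn.execute("SELECT id, total FROM items ORDER BY id").fetchall()
```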


sqlite-utils 140912432 issue    
927385540 MDU6SXNzdWU5MjczODU1NDA= 8 any guidance / experience on imessage-to-sqlite ? Casyfill 2675621 open 0     0 2021-06-22T15:46:16Z 2021-06-22T15:46:16Z   NONE  
dogsheep.github.io 214746582 issue    
913017577 MDU6SXNzdWU5MTMwMTc1Nzc= 1365 pathlib.Path breaks internal schema eyeseast 25778 closed 0     1 2021-06-07T01:40:37Z 2021-06-21T15:57:39Z 2021-06-21T15:57:39Z CONTRIBUTOR  

Ran into an issue while trying to build a plugin to render GeoJSON. I'm using pytest's tmp_path fixture, which is a pathlib.Path, to get a temporary database path. I was getting a weird error involving writes, but I was doing reads. Turns out it's the internal database trying to insert a Path where it wants a string.

My test looked like this:

async def test_render_feature_collection(tmp_path):
    database = tmp_path / "test.db"
    datasette = Datasette([database])

    # this will break with a path
    await datasette.refresh_schemas()

    # build a url
    url = datasette.urls.table(database.stem, TABLE_NAME, format="geojson")

    response = await datasette.client.get(url)
    fc = response.json()

    assert 200 == response.status_code

I only ran into this while running tests, because passing in database paths from the CLI uses strings, but it's a weird error and probably something other people have run into.

The fix is easy enough: Convert the path to a string and everything works. So this:

async def test_render_feature_collection(tmp_path):
    database = tmp_path / "test.db"
    datasette = Datasette([str(database)])

    # this is fine now
    await datasette.refresh_schemas()

This could (probably, haven't tested) be fixed here by calling str(db.path) or by doing that conversion earlier.
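
The underlying behaviour can be reproduced with the standard library alone. This sketch assumes nothing about Datasette's internal schema; the databases table here is a stand-in.

```python
import sqlite3
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE databases (name TEXT, path TEXT)")
path = Path("test.db")

try:
    # sqlite3 has no adapter for pathlib.Path, so binding it as a parameter fails.
    conn.execute("INSERT INTO databases VALUES (?, ?)", ("test", path))
    bound_path_directly = True
except sqlite3.Error:
    bound_path_directly = False

# Converting to str first gives sqlite3 a type it can store.
conn.execute("INSERT INTO databases VALUES (?, ?)", ("test", str(path)))
stored_path = conn.execute("SELECT path FROM databases").fetchone()[0]
```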

datasette 107914493 issue    
914130834 MDExOlB1bGxSZXF1ZXN0NjY0MDcyMDQ2 1370 Ensure db.path is a string before trying to insert into internal database eyeseast 25778 closed 0     2 2021-06-08T01:16:48Z 2021-06-21T15:57:39Z 2021-06-21T15:57:39Z CONTRIBUTOR simonw/datasette/pulls/1370

Fixes #1365

This is the simplest possible fix, with a test that will fail without it. There are a bunch of places where db.path is getting converted to and from a Path type, so this fix errs on the side of calling str(db.path) right before it's inserted.

datasette 107914493 pull    
923697888 MDU6SXNzdWU5MjM2OTc4ODg= 278 Support db as first parameter before subcommand, or as environment variable mcint 601708 closed 0     3 2021-06-17T09:26:29Z 2021-06-20T22:39:57Z 2021-06-18T15:43:19Z CONTRIBUTOR  
sqlite-utils 140912432 issue    
925487946 MDU6SXNzdWU5MjU0ODc5NDY= 286 Add installation instructions simonw 9599 closed 0     1 2021-06-19T23:55:36Z 2021-06-20T18:47:13Z 2021-06-20T18:47:13Z OWNER  

pip install sqlite-utils, pipx install sqlite-utils and brew install sqlite-utils

sqlite-utils 140912432 issue    
925544070 MDU6SXNzdWU5MjU1NDQwNzA= 287 Update rowid examples in the docs simonw 9599 closed 0     0 2021-06-20T08:03:00Z 2021-06-20T18:26:21Z 2021-06-20T18:26:21Z OWNER  

Changed in #284 - a couple of examples need updating on https://github.com/simonw/sqlite-utils/blob/3.10/docs/cli.rst.

sqlite-utils 140912432 issue    
925545468 MDU6SXNzdWU5MjU1NDU0Njg= 288 sqlite-utils memory blah.json --schema simonw 9599 closed 0     0 2021-06-20T08:10:40Z 2021-06-20T18:26:21Z 2021-06-20T18:26:21Z OWNER  

Like --dump but only outputs the schema - useful for understanding what you are about to run queries against.

sqlite-utils 140912432 issue    
925491857 MDU6SXNzdWU5MjU0OTE4NTc= 1383 Improve test coverage for `inspect.py` simonw 9599 open 0     0 2021-06-20T00:22:43Z 2021-06-20T00:22:49Z   OWNER  

https://codecov.io/gh/simonw/datasette/src/main/datasette/inspect.py shows only 36% coverage for that module at the moment.

datasette 107914493 issue    
925406964 MDU6SXNzdWU5MjU0MDY5NjQ= 1382 Datasette with Glitch - is it possible to use CSV with ISO-8859-1 encoding? reichaves 23701514 closed 0     1 2021-06-19T14:37:20Z 2021-06-20T00:21:02Z 2021-06-20T00:20:06Z NONE  

I used Remix on Glitch to create a project and uploaded a CSV file.
However, it is a CSV with ISO-8859-1 encoding (https://en.wikipedia.org/wiki/ISO/IEC_8859-1).
Is it possible for me to change the encoding so that the data displays correctly?
Example: https://emphasized-carpal-pillow.glitch.me/data/Emendas
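
One possible workaround, sketched here as an assumption rather than a documented Glitch or Datasette feature, is to re-encode the file as UTF-8 before uploading it. The sample data is invented.

```python
import csv
import io

# Bytes as they would appear in an ISO-8859-1 encoded CSV file.
raw = "id,nome\n1,S\u00e3o Paulo\n".encode("iso-8859-1")

# Decode with the file's real encoding, then re-encode as UTF-8.
text = raw.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

rows = list(csv.DictReader(io.StringIO(utf8_bytes.decode("utf-8"))))
```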

datasette 107914493 issue    
923910375 MDExOlB1bGxSZXF1ZXN0NjcyNjIwMTgw 1378 Update pytest-xdist requirement from <2.3,>=2.2.1 to >=2.2.1,<2.4 dependabot[bot] 49699333 closed 0     1 2021-06-17T13:11:56Z 2021-06-20T00:17:07Z 2021-06-20T00:17:06Z CONTRIBUTOR simonw/datasette/pulls/1378

Updates the requirements on pytest-xdist to permit the latest version.


Sourced from pytest-xdist's changelog.

pytest-xdist 2.3.0 (2021-06-16)

Deprecations and Removals


Bug Fixes

Trivial Changes

pytest-xdist 2.2.1 (2021-02-09)

Bug Fixes

pytest-xdist 2.2.0 (2020-12-14)


... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
datasette 107914493 pull    
921878733 MDU6SXNzdWU5MjE4Nzg3MzM= 272 Idea: import CSV to memory, run SQL, export in a single command simonw 9599 closed 0     22 2021-06-15T23:02:48Z 2021-06-19T23:36:48Z 2021-06-18T15:05:03Z OWNER  

I quite often load a CSV file into a SQLite DB, then do stuff with it (like export results back out again as a new CSV) without any intention of keeping the CSV file around afterwards.

What if sqlite-utils could do this for me? Something like this:

sqlite-utils --csv blah.csv --csv baz.csv "select * from blah join baz ..."
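
A rough standard-library sketch of the idea: load CSV into an in-memory SQLite database, run a query, and throw the database away. The load_csv helper is hypothetical and every column is stored as TEXT.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fp):
    """Create `table` from a CSV file object; every column is stored as TEXT."""
    reader = csv.reader(fp)
    headers = next(reader)
    columns = ", ".join(f'"{h}" TEXT' for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "blah", io.StringIO("id,name\n1,Cleo\n"))
load_csv(conn, "baz", io.StringIO("id,age\n1,5\n"))
joined = conn.execute(
    "SELECT blah.name, baz.age FROM blah JOIN baz ON blah.id = baz.id"
).fetchall()
```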
sqlite-utils 140912432 issue    
925320167 MDU6SXNzdWU5MjUzMjAxNjc= 284 .transform(types=) turns rowid into a concrete column simonw 9599 closed 0     5 2021-06-19T05:25:27Z 2021-06-19T15:28:30Z 2021-06-19T15:28:30Z OWNER  

Noticed this in the tests for sqlite-utils memory in #282 - is it possible to fix this?


sqlite-utils 140912432 issue    
925410305 MDU6SXNzdWU5MjU0MTAzMDU= 285 Introspection property for telling if a table is a rowid table simonw 9599 closed 0     7 2021-06-19T14:56:16Z 2021-06-19T15:12:33Z 2021-06-19T15:12:33Z OWNER  
sqlite-utils 140912432 issue    
925319214 MDU6SXNzdWU5MjUzMTkyMTQ= 283 memory: Shouldn't detect types for JSON simonw 9599 closed 0     1 2021-06-19T05:17:35Z 2021-06-19T14:52:48Z 2021-06-19T14:52:48Z OWNER  


This runs against JSON as well as CSV/TSV - which isn't necessary and in fact throws errors if there is any nested data.

sqlite-utils 140912432 issue    
925384329 MDExOlB1bGxSZXF1ZXN0NjczODcyOTc0 7 Add instagram-to-sqlite gavindsouza 36654812 open 0     0 2021-06-19T12:26:16Z 2021-06-19T12:26:16Z   FIRST_TIME_CONTRIBUTOR dogsheep/dogsheep.github.io/pulls/7

The tool covers only chat imports at the time of opening this PR, but I'm planning to import everything else that I feel curious about.

dogsheep.github.io 214746582 pull    
925305186 MDU6SXNzdWU5MjUzMDUxODY= 282 Automatic type detection for CSV data simonw 9599 closed 0     4 2021-06-19T03:33:21Z 2021-06-19T04:42:03Z 2021-06-19T04:38:00Z OWNER  

I've touched on this before in #179 - but now that I've added sqlite-utils memory this is much more important - because unlike with sqlite-utils insert the in-memory command doesn't give you the opportunity to fix any types you imported from CSV, so queries like select * from stdin where age > 3 are never going to work correctly against these temporary in-memory tables.

Teaching sqlite-utils insert to detect types for columns in a CSV file would be a backwards-compatibility breaking change. Teaching sqlite-utils memory that trick would not be, since it hasn't been included in a release yet.

It's a little inconsistent, but I'm going to have sqlite-utils memory default to detecting types while sqlite-utils insert does not. In each case this can be controlled by a new command-line option:

cat file.csv | sqlite-utils memory - --no-detect-types

To opt-in for sqlite-utils insert:

cat file.csv | sqlite-utils insert blah.db blah - --detect-types

I'll have short options for these too: -n for --no-detect-types and -d for --detect-types.
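
To show what detecting types could mean in practice, here is a simplified sketch. It is not the detection logic used by sqlite-utils; the detect_type function and its three-way INTEGER/FLOAT/TEXT result are assumptions for illustration.

```python
def detect_type(values):
    """Return "INTEGER", "FLOAT" or "TEXT" for a list of CSV string values."""
    candidates = [("INTEGER", int), ("FLOAT", float)]
    for name, convert in candidates:
        try:
            for value in values:
                convert(value)
        except ValueError:
            # At least one value did not fit this type; try the next candidate.
            continue
        return name
    return "TEXT"


age_type = detect_type(["3", "12"])
weight_type = detect_type(["3", "4.5"])
name_type = detect_type(["3", "cleo"])
```

Note that values must be a list (it is iterated more than once), and that Python's int() and float() accept some strings, such as "1_000" or "nan", that a stricter detector might reject.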

sqlite-utils 140912432 issue    
709577625 MDU6SXNzdWU3MDk1Nzc2MjU= 179 sqlite-utils transform/insert --detect-types simonw 9599 closed 0     4 2020-09-26T17:28:55Z 2021-06-19T03:36:16Z 2021-06-19T03:36:05Z OWNER  

Idea from https://github.com/simonw/datasette-edit-tables/issues/13 - provide Python utility methods and accompanying CLI options for detecting the likely types of TEXT columns.

So if you have a text column that actually contains exclusively integer string values, it can let you know and let you run transform against it.

sqlite-utils 140912432 issue    
924990677 MDU6SXNzdWU5MjQ5OTA2Nzc= 279 sqlite-utils memory should handle TSV and JSON in addition to CSV simonw 9599 closed 0     7 2021-06-18T15:02:54Z 2021-06-19T03:11:59Z 2021-06-19T03:11:59Z OWNER  
  • Use sniff to detect CSV or TSV (if :tsv or :csv was not specified) and delimiters

Follow-on from #272
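
The sniffing step might look something like the following, using csv.Sniffer from the standard library. The sniff_delimiter wrapper is a hypothetical name, and restricting the candidates to comma and tab is a choice made for this sketch.

```python
import csv


def sniff_delimiter(sample):
    """Guess whether a text sample is comma- or tab-separated."""
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return dialect.delimiter


csv_delimiter = sniff_delimiter("id,name\n1,Cleo\n2,Pancakes\n")
tsv_delimiter = sniff_delimiter("id\tname\n1\tCleo\n2\tPancakes\n")
```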

sqlite-utils 140912432 issue    
924992318 MDU6SXNzdWU5MjQ5OTIzMTg= 281 Mechanism for explicitly stating CSV or JSON or TSV for sqlite-utils memory simonw 9599 closed 0     1 2021-06-18T15:04:53Z 2021-06-19T03:11:59Z 2021-06-19T03:11:59Z OWNER  

Follows #272

sqlite-utils 140912432 issue    
924991194 MDU6SXNzdWU5MjQ5OTExOTQ= 280 Add --encoding option to sqlite-utils memory simonw 9599 closed 0     0 2021-06-18T15:03:32Z 2021-06-18T15:29:46Z 2021-06-18T15:29:46Z OWNER  

Follow-on from #272 - this will work like --encoding on sqlite-utils insert and will affect all CSV files processed by sqlite-utils memory.

sqlite-utils 140912432 issue    
922099793 MDExOlB1bGxSZXF1ZXN0NjcxMDE0NzUx 273 sqlite-utils memory command for directly querying CSV/JSON data simonw 9599 closed 0     8 2021-06-16T05:04:58Z 2021-06-18T15:01:17Z 2021-06-18T15:00:52Z OWNER simonw/sqlite-utils/pulls/273

Refs #272. Initial implementation only does CSV data, still needs:

  • Implement --save
  • Add --dump to the documentation
  • Add --attach example to the documentation
  • Replace :memory: in documentation
sqlite-utils 140912432 pull    
923602693 MDU6SXNzdWU5MjM2MDI2OTM= 276 support small help flag -h mcint 601708 closed 0     0 2021-06-17T07:59:31Z 2021-06-18T14:56:59Z 2021-06-18T14:56:59Z CONTRIBUTOR  
sqlite-utils 140912432 issue    
923612361 MDExOlB1bGxSZXF1ZXN0NjcyMzU5NjA5 277 add -h support closes #276 mcint 601708 closed 0     2 2021-06-17T08:08:26Z 2021-06-18T14:56:59Z 2021-06-18T14:56:59Z CONTRIBUTOR simonw/sqlite-utils/pulls/277

This appears to be the canonical solution.

sqlite-utils 140912432 pull    
924748955 MDU6SXNzdWU5MjQ3NDg5NTU= 1380 Serve all db files in a folder stratosgear 193463 open 0     0 2021-06-18T10:03:32Z 2021-06-18T10:03:32Z   NONE  

I tried to get the serve command to serve all the .db files in the /mnt folder but it seems that the server does not refresh the list of files.

In more detail:

  • Starting datasette as a docker container with:
docker run -p 8001:8001 -v `pwd`:/mnt \
    datasetteproject/datasette \
    datasette -p 8001 -h /mnt
  • Datasette correctly serves all the *.db files found in the /mnt folder
  • While the server is running, if I copy a new file into the $PWD folder, Datasette does not seem to see the new file, forcing me to restart Docker.

Is there an option/setting that I overlooked, or is this something missing?

BTW, the --reload setting, although at first glance it looks like what you need, does not seem to do anything with regard to seeing all *.db files.


datasette 107914493 issue    
268176505 MDU6SXNzdWUyNjgxNzY1MDU= 34 Support CSV export with a .csv extension simonw 9599 closed 0     1 2017-10-24T20:34:43Z 2021-06-17T18:14:48Z 2018-05-28T20:45:34Z OWNER  

Maybe do this using streaming with multiple pagination SQL queries so we can support arbitrarily large exports.

How would this work against a view which doesn’t have an obvious efficient pagination mechanism? Maybe limit views to up to 1000 exported records?

Relates to #5
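
A sketch of streaming via repeated pagination queries, using keyset pagination on rowid with the standard library. This is an illustration of the general technique, not Datasette's implementation; stream_csv and the dogs table are hypothetical.

```python
import csv
import io
import sqlite3


def stream_csv(conn, table, page_size=100):
    """Yield CSV text chunk by chunk, fetching one page of rows at a time."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    last_rowid = 0  # assumes rowids are positive
    wrote_header = False
    while True:
        cursor = conn.execute(
            f'SELECT rowid, * FROM "{table}" WHERE rowid > ? ORDER BY rowid LIMIT ?',
            (last_rowid, page_size),
        )
        rows = cursor.fetchall()
        if not rows:
            break
        if not wrote_header:
            # Skip the leading rowid column, which is only used for pagination.
            writer.writerow([d[0] for d in cursor.description][1:])
            wrote_header = True
        for row in rows:
            writer.writerow(row[1:])
        last_rowid = rows[-1][0]
        yield buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (name TEXT)")
conn.executemany("INSERT INTO dogs VALUES (?)", [("Cleo",), ("Pancakes",), ("Rex",)])
chunks = list(stream_csv(conn, "dogs", page_size=2))
```

Only one page is held in memory at a time. This approach relies on rowid, which views do not have, so it does not answer the question about views.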

datasette 107914493 issue    
459882902 MDU6SXNzdWU0NTk4ODI5MDI= 526 Stream all results for arbitrary SQL and canned queries matej-fr 50578294 open 0     5 2019-06-24T13:09:45Z 2021-06-17T18:14:25Z   NONE  

I think that there is a difficulty with canned queries.

When I want to stream all results of the canned query TwoDays, I get only the first 1,000 records.


returns only the first 1,000 records.

If I do the same with the whole database, i.e.

I correctly get all records.

Any ideas?

datasette 107914493 issue    
323681589 MDU6SXNzdWUzMjM2ODE1ODk= 266 Export to CSV simonw 9599 closed 0     27 2018-05-16T15:50:24Z 2021-06-17T18:14:24Z 2018-06-18T06:05:25Z OWNER  

Datasette needs to be able to export data to CSV.

datasette 107914493 issue    
333000163 MDU6SXNzdWUzMzMwMDAxNjM= 312 HTML, CSV and JSON views should support ?_col=&_col= simonw 9599 closed 0     1 2018-06-16T16:53:35Z 2021-06-17T18:14:24Z 2018-06-16T17:00:12Z OWNER  

To support whitelisting columns to display.

datasette 107914493 issue    
335141434 MDU6SXNzdWUzMzUxNDE0MzQ= 326 CSV should respect --cors and return cors headers simonw 9599 closed 0     1 2018-06-24T00:44:07Z 2021-06-17T18:14:24Z 2018-06-24T00:59:45Z OWNER  

Otherwise tools like Vega can't load data via CSV.

datasette 107914493 issue    
395236066 MDU6SXNzdWUzOTUyMzYwNjY= 393 CSV export in "Advanced export" pane doesn't respect query ltrgoddard 1727065 closed 0     6 2019-01-02T12:39:41Z 2021-06-17T18:14:24Z 2019-01-03T02:44:10Z NONE  

It looks like there's an inconsistency when exporting to CSV via the web interface. Say I'm looking at songs released in 1989 in the classic-rock/classic-rock-song-list table from the Five Thirty Eight data. The JSON and CSV export links at the top of the page both give me filtered data using Release+Year__exact=1989 in the URL. In the Advanced export tab, though, the CSV option gives me the whole data set, while the JSON options preserve the query.

It may be that this is intended behaviour related to the streaming CSV stuff discussed here, but if that's the case then I think it should be a little clearer.

datasette 107914493 issue    
725184645 MDU6SXNzdWU3MjUxODQ2NDU= 1034 Better way of representing binary data in .csv output simonw 9599 closed 0   0.51 6026070 19 2020-10-20T04:28:58Z 2021-06-17T18:13:21Z 2020-10-29T22:47:46Z OWNER  

I just noticed this: https://latest.datasette.io/fixtures/binary_data.csv


There's no good way to represent binary data in a CSV file, but this seems like one of the more-bad options.

datasette 107914493 issue    
732674148 MDU6SXNzdWU3MzI2NzQxNDg= 1062 Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows simonw 9599 open 0   Datasette 1.0 3268330 2 2020-10-29T21:25:02Z 2021-06-17T18:13:21Z   OWNER  

This can drive the upgrade of the register_output_renderer hook to be able to handle streaming all rows in a large query.

datasette 107914493 issue    
503190241 MDU6SXNzdWU1MDMxOTAyNDE= 584 Codec error in some CSV exports simonw 9599 closed 0     2 2019-10-07T01:15:34Z 2021-06-17T18:13:20Z 2019-10-18T05:23:16Z OWNER  

Got this exploring my Swarm checkins:


datasette 107914493 issue    
508100844 MDU6SXNzdWU1MDgxMDA4NDQ= 598 Character encoding bug with CSV export JoeGermuska 46313 closed 0     1 2019-10-16T21:09:30Z 2021-06-17T18:13:20Z 2019-10-18T22:52:21Z NONE  

I was just poking around, and at this URL, I encountered this error:

'latin-1' codec can't encode character '\u2019' in position 27: ordinal not in range(256)
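
The error message itself can be reproduced in plain Python. This sketch only shows why the character fails under latin-1; where Datasette applies that codec is not stated in this report.

```python
text = "It\u2019s"  # contains U+2019, a right single quotation mark

try:
    # latin-1 only covers code points 0-255, so U+2019 cannot be encoded.
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent every Unicode code point.
utf8_bytes = text.encode("utf-8")
```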
datasette 107914493 issue    
516748849 MDU6SXNzdWU1MTY3NDg4NDk= 612 CSV export is broken for tables with null foreign keys simonw 9599 closed 0     2 2019-11-02T22:52:47Z 2021-06-17T18:13:20Z 2019-11-02T23:12:53Z OWNER  

Following on from #406 - this CSV export appears to be broken:



That second row should have 5 values, but it only has 4.

datasette 107914493 issue    
910088936 MDU6SXNzdWU5MTAwODg5MzY= 1355 datasette --get should efficiently handle streaming CSV simonw 9599 open 0     1 2021-06-03T04:40:40Z 2021-06-17T18:12:33Z   OWNER  

It would be great if you could use datasette --get to run queries that return streaming CSV data without running out of RAM.

Current implementation looks like it loads the entire result into memory first: https://github.com/simonw/datasette/blob/f78ebdc04537a6102316d6dbbf6c887565806078/datasette/cli.py#L546-L552

datasette 107914493 issue    
775666296 MDU6SXNzdWU3NzU2NjYyOTY= 1160 "datasette insert" command and plugin hook simonw 9599 open 0     23 2020-12-29T02:37:03Z 2021-06-17T18:12:32Z   OWNER  

Tools for loading data into Datasette currently mostly exist as separate utilities - yaml-to-sqlite and csvs-to-sqlite and suchlike.

Bringing these into Datasette could have some interesting properties:

  • A datasette insert command could be extended with plugins to handle more formats
  • Any format that can be inserted on the command-line could also be inserted using a web UI or web API - which would benefit from new format plugin hooks
  • If Datasette ever grows beyond SQLite (see #670) a built-in import mechanism could work for those other databases as well - without me needing to write yaml-to-postgresql and suchlike
datasette 107914493 issue    
776128269 MDU6SXNzdWU3NzYxMjgyNjk= 1162 First working version of "datasette insert data.db file.csv" simonw 9599 open 0     0 2020-12-29T23:20:11Z 2021-06-17T18:12:32Z   OWNER  

Refs #1160

datasette 107914493 issue    
776128565 MDU6SXNzdWU3NzYxMjg1NjU= 1163 "datasette insert data.db url-to-csv" simonw 9599 open 0     1 2020-12-29T23:21:21Z 2021-06-17T18:12:32Z   OWNER  

Refs #1160 - get filesystem imports working first for #1162, then add import-from-URL.

datasette 107914493 issue    
906385991 MDU6SXNzdWU5MDYzODU5OTE= 1349 CSV ?_stream=on redundantly calculates facets for every page simonw 9599 closed 0     9 2021-05-29T06:11:23Z 2021-06-17T18:12:32Z 2021-06-01T15:52:53Z OWNER  

I'm trying to figure out why a full CSV export from https://covid-19.datasettes.com/covid/ny_times_us_counties runs unbearably slowly.

It's because the streaming endpoint works by scrolling through every page, and it turns out every page calculates facets and suggested facets!

datasette 107914493 issue    
906993731 MDU6SXNzdWU5MDY5OTM3MzE= 1351 Get `?_trace=1` working with CSV and streaming CSVs simonw 9599 closed 0     1 2021-05-31T03:02:15Z 2021-06-17T18:12:32Z 2021-06-01T15:50:09Z OWNER  

I think it's worth getting ?_trace=1 to work with streaming CSV - this would have helped me spot this issue a long time ago.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/1349#issuecomment-851133125_

datasette 107914493 issue    
736365306 MDU6SXNzdWU3MzYzNjUzMDY= 1083 Advanced CSV export for arbitrary queries simonw 9599 open 0     2 2020-11-04T19:23:05Z 2021-06-17T18:12:31Z   OWNER  

There's no link to download the CSV file - the table page has that as an advanced export option, but this is missing from the query page.

datasette 107914493 issue    
743359646 MDU6SXNzdWU3NDMzNTk2NDY= 1096 TSV should be a default export option simonw 9599 open 0     1 2020-11-15T22:24:02Z 2021-06-17T18:12:31Z   OWNER  

Refs #1095

datasette 107914493 issue    
759695780 MDU6SXNzdWU3NTk2OTU3ODA= 1133 Option to omit header row in CSV export simonw 9599 closed 0     2 2020-12-08T18:54:46Z 2021-06-17T18:12:31Z 2020-12-10T23:28:51Z OWNER  

?_header=off - for symmetry with existing option ?_nl=on.

datasette 107914493 issue    
763361458 MDU6SXNzdWU3NjMzNjE0NTg= 1142 "Stream all rows" is not at all obvious simonw 9599 open 0     9 2020-12-12T06:24:57Z 2021-06-17T18:12:31Z   OWNER  

Got a question about how to download all rows - the current option isn't at all clear.


datasette 107914493 issue    
732685643 MDU6SXNzdWU3MzI2ODU2NDM= 1063 .csv should link to .blob downloads simonw 9599 closed 0   0.51 6026070 3 2020-10-29T21:45:58Z 2021-06-17T18:12:30Z 2020-10-29T22:47:45Z OWNER  
  • Update .csv output to link to these things (and get that xfail test to pass)
  • <del>Add a .csv?_blob_base64=1 argument that causes them to be output in base64 in the CSV</del>

Moving the CSV work to a separate ticket.
_Originally posted by @simonw in https://github.com/simonw/datasette/pull/1061#issuecomment-719042601_

datasette 107914493 issue    
924203783 MDU6SXNzdWU5MjQyMDM3ODM= 1379 Idea: ?_end=1 option for streaming CSV responses simonw 9599 open 0     0 2021-06-17T18:11:21Z 2021-06-17T18:11:30Z   OWNER  

As discussed in this thread: https://twitter.com/simonw/status/1405554676993433605 - one of the disadvantages of Datasette's streaming CSV feature is that it's hard to tell if you got the whole file or if the connection ended early - or if an error occurred.

Idea: offer an optional ?_end=1 parameter which, if enabled, adds a single row to the end of the CSV file that looks like this:


For however many columns the CSV file usually has.

datasette 107914493 issue    
923270900 MDExOlB1bGxSZXF1ZXN0NjcyMDUzODEx 65 basic support for events khimaros 231498 open 0     0 2021-06-17T00:51:30Z 2021-06-17T00:51:30Z   FIRST_TIME_CONTRIBUTOR dogsheep/github-to-sqlite/pulls/65

a quick first pass at implementing the feature requested in https://github.com/dogsheep/github-to-sqlite/issues/64

testing instructions:

$ github-to-sqlite events events.db user/khimaros

if the specified user is the authenticated user, it will also include private events.

caveat: pagination appears to be broken (i don't see next in the response JSON from GitHub)
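
On the pagination caveat: GitHub's REST API generally reports the next page in the `Link` response header rather than in the JSON body, which may be why no `next` key shows up. The sketch below parses such a header; the function name and the sample header value are made up for illustration.

```python
import re

# Matches entries of the form: <url>; rel="name"
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for url, rel in LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


# Illustrative header value, not captured from a real response.
header = (
    '<https://api.github.com/users/khimaros/events?page=2>; rel="next", '
    '<https://api.github.com/users/khimaros/events?page=3>; rel="last"'
)
print(next_page_url(header))
```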

github-to-sqlite 207052882 pull    
922955697 MDU6SXNzdWU5MjI5NTU2OTc= 275 Enable code coverage simonw 9599 closed 0     1 2021-06-16T18:33:49Z 2021-06-17T00:12:12Z 2021-06-17T00:12:12Z OWNER  


Same mechanism as Datasette. Need to copy across the token from that page and add an equivalent of this workflow: https://github.com/simonw/datasette/blob/main/.github/workflows/test-coverage.yml

sqlite-utils 140912432 issue    
922832113 MDU6SXNzdWU5MjI4MzIxMTM= 274 sqlite-utils dump my.db command simonw 9599 closed 0     0 2021-06-16T16:30:14Z 2021-06-16T23:51:54Z 2021-06-16T23:51:54Z OWNER  

Inspired by the --dump mechanism I added to sqlite-utils memory here: https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862018937

Can use .iterdump() to implement this: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.iterdump

Maybe instead (or as well as) offer --dump which dumps out the SQL from that.
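
A minimal sketch of the `.iterdump()` approach using only the standard library. The `dump_sql` helper and the `dogs` table are illustrative, not part of sqlite-utils.

```python
import sqlite3


def dump_sql(conn):
    """Return the full SQL dump of a connection as one string."""
    # iterdump() yields one SQL statement per item, from BEGIN to COMMIT.
    return "\n".join(conn.iterdump())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dogs (name) VALUES (?)", ("Cleo",))
conn.commit()

sql = dump_sql(conn)
print(sql)
```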

sqlite-utils 140912432 issue    
919314806 MDU6SXNzdWU5MTkzMTQ4MDY= 270 Cannot set type JSON frafra 4068 closed 0     4 2021-06-11T23:53:22Z 2021-06-16T17:34:49Z 2021-06-16T15:47:06Z NONE  

It would be great if the column type could be set to JSON. That would not be different from handling a regular string. It would be something like repr(value) and it would work with both JSON and CSV inputs, no matter if value is a real list or just a string representing a list.
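
To illustrate the request, here is a sketch that stores both a real list and a string holding a list as JSON text in a SQLite TEXT column. It uses `json.dumps` rather than `repr(value)`, since `repr` of a Python list containing strings uses single quotes and is therefore not valid JSON. The helper name `to_json_text` and the `items` table are hypothetical.

```python
import json
import sqlite3


def to_json_text(value):
    """Return JSON text for value.

    Strings that already parse as JSON are passed through unchanged;
    everything else is serialized with json.dumps.
    """
    if isinstance(value, str):
        try:
            json.loads(value)
            return value
        except ValueError:
            return json.dumps(value)
    return json.dumps(value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (tags TEXT)")
conn.executemany(
    "INSERT INTO items (tags) VALUES (?)",
    [(to_json_text(["a", "b"]),), (to_json_text('["a", "b"]'),)],
)
decoded = [json.loads(row[0]) for row in conn.execute("SELECT tags FROM items")]
print(decoded)
```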

sqlite-utils 140912432 issue    
675753042 MDU6SXNzdWU2NzU3NTMwNDI= 131 sqlite-utils insert: options for column types simonw 9599 open 0     4 2020-08-09T18:59:11Z 2021-06-16T15:52:33Z   OWNER  

The insert command currently results in string types for every column - at least when used against CSV or TSV inputs.

It would be useful if you could do the following:

  • automatically detect the column types based on e.g. the first 1000 records
  • explicitly state the rule for specific columns

--detect-types could work for the former - or it could do that by default and allow opt-out using --no-detect-types

For specific columns maybe this:

sqlite-utils insert db.db images images.tsv \
  --tsv \
  -c id int \
  -c score float
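
One possible shape for the detection step, sketched with the standard library only. This is not how sqlite-utils implements it; the function names, the `int`/`float`/`text` labels and the 1000-row default are assumptions drawn from the proposal above.

```python
import csv
import io


def detect_type(values):
    """Return 'int', 'float' or 'text' for a list of strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the narrowest type first; fall back when any value fails to parse.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def detect_column_types(fp, delimiter=",", sample_size=1000):
    """Guess a type per column from the first sample_size records."""
    reader = csv.DictReader(fp, delimiter=delimiter)
    samples = {name: [] for name in reader.fieldnames or []}
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        for name in samples:
            samples[name].append(row[name] or "")
    return {name: detect_type(vals) for name, vals in samples.items()}


tsv = "id\tscore\tname\n1\t0.5\tcat\n2\t3\tdog\n"
print(detect_column_types(io.StringIO(tsv), delimiter="\t"))
```

Sampling only the first records keeps the check cheap, at the cost of misclassifying a column whose later rows contain a different type.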

sqlite-utils 140912432 issue
756876238 MDExOlB1bGxSZXF1ZXN0NTMyMzQ4OTE5 1130 Fix footer not sticking to bottom in short pages abdusco 3243482 open 0     4 2020-12-04T07:29:01Z 2021-06-15T13:27:48Z   CONTRIBUTOR simonw/datasette/pulls/1130 datasette 107914493 pull    

CREATE TABLE [issues] (
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT,
   [active_lock_reason] TEXT,
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);