issues

230 rows where state = "open" sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association pull_request body repo type active_lock_reason
621989740 MDU6SXNzdWU2MjE5ODk3NDA= 114 table.transform_table() method for advanced alter table simonw 9599 open 0     10 2020-05-20T18:20:46Z 2020-07-08T22:16:54Z   OWNER  

SQLite's ALTER TABLE can only do the following:

  • Rename a table
  • Rename a column
  • Add a column

Notably, it cannot drop columns - so tricks like "add a float version of this text column, populate it, then drop the old one and rename" won't work.

The docs here https://www.sqlite.org/lang_altertable.html describe a way of implementing full alters safely within a transaction, but it's fiddly.

  1. Create new table
  2. Copy data
  3. Drop old table
  4. Rename new into old

It would be great if sqlite-utils provided an abstraction to help make these kinds of changes safely.
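
As a rough sketch of what such an abstraction would have to wrap - this is the generic pattern from the SQLite docs applied to a made-up table, not a proposed sqlite-utils API:

    import sqlite3

    # Convert a hypothetical text column "price" into a REAL column using the
    # create / copy / drop / rename pattern, all inside one transaction.
    conn = sqlite3.connect("example.db")
    conn.isolation_level = None  # manage the transaction explicitly
    try:
        conn.execute("BEGIN")
        conn.execute(
            "CREATE TABLE items_new (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
        )
        conn.execute(
            "INSERT INTO items_new (id, name, price) "
            "SELECT id, name, CAST(price AS REAL) FROM items"
        )
        conn.execute("DROP TABLE items")
        conn.execute("ALTER TABLE items_new RENAME TO items")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise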

sqlite-utils 140912432 issue  
653529088 MDU6SXNzdWU2NTM1MjkwODg= 891 Consider using enable_callback_tracebacks(True) simonw 9599 open 0     0 2020-07-08T19:07:16Z 2020-07-08T19:07:16Z   OWNER  

From https://docs.python.org/3/library/sqlite3.html#sqlite3.enable_callback_tracebacks

sqlite3.enable_callback_tracebacks(flag)

By default you will not get any tracebacks in user-defined functions, aggregates, converters, authorizer callbacks etc. If you want to debug them, you can call this function with flag set to True. Afterwards, you will get tracebacks from callbacks on sys.stderr. Use False to disable the feature again.

Maybe turn this on for all of Datasette? Are there any disadvantages to doing that?
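
For a sense of what the flag buys you, here is a minimal standard-library illustration (nothing Datasette-specific):

    import sqlite3

    sqlite3.enable_callback_tracebacks(True)

    conn = sqlite3.connect(":memory:")

    def broken(value):
        return value / 0  # deliberately raises ZeroDivisionError

    conn.create_function("broken", 1, broken)
    try:
        conn.execute("select broken(1)").fetchall()
    except sqlite3.OperationalError:
        # Without the flag you only see "user-defined function raised exception";
        # with it, the ZeroDivisionError traceback is also printed to sys.stderr.
        pass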

datasette 107914493 issue  
652700770 MDU6SXNzdWU2NTI3MDA3NzA= 119 Ability to remove a foreign key simonw 9599 open 0     1 2020-07-07T22:31:37Z 2020-07-08T18:10:18Z   OWNER  

Useful if you add one but make a mistake and need to undo it without recreating the database from scratch.

sqlite-utils 140912432 issue  
652961907 MDU6SXNzdWU2NTI5NjE5MDc= 121 Better documented support for transactions simonw 9599 open 0     2 2020-07-08T04:56:51Z 2020-07-08T18:08:11Z   OWNER  

Originally posted by @simonw in https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655283393

We should put some thought into how this library supports and encourages smart use of transactions.
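
One building block worth documenting: a plain sqlite3.Connection used as a context manager already commits on success and rolls back on an exception. A minimal illustration (standard library only, not a settled sqlite-utils API):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")

    try:
        with conn:  # opens a transaction, commits if the block succeeds
            conn.execute("INSERT INTO events VALUES (1, 'first')")
            conn.execute("INSERT INTO events VALUES (2, 'second')")
            raise RuntimeError("something went wrong")
    except RuntimeError:
        pass

    # Both inserts were rolled back along with the failed block
    print(conn.execute("SELECT count(*) FROM events").fetchone()[0])  # prints 0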

sqlite-utils 140912432 issue  
573755726 MDU6SXNzdWU1NzM3NTU3MjY= 690 Mechanism for plugins to add UI to pages in specific locations simonw 9599 open 0   Datasette 0.46 5607421 5 2020-03-02T06:48:36Z 2020-07-02T17:11:25Z   OWNER  

Now that we have support for plugins that can write I'm seeing all sorts of places where a plugin might need to add UI to the table page.

Some examples:

  • datasette-configure-fts needs to add a "configure search for this table" link
  • a plugin that lets you rename or delete tables needs to add a link or button somewhere
  • existing plugins like datasette-vega and datasette-cluster-map already do this with JavaScript

The challenge here is that multiple plugins may want to do this, so simply overriding templates and populating named blocks doesn't entirely work, as templates may override each other.

datasette 107914493 issue  
627794879 MDU6SXNzdWU2Mjc3OTQ4Nzk= 782 Redesign default JSON format in preparation for Datasette 1.0 simonw 9599 open 0   Datasette 0.46 5607421 2 2020-05-30T18:47:07Z 2020-07-02T17:11:15Z   OWNER  

The default JSON just isn't right. I find myself using ?_shape=array for almost everything I build against the API.
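
Roughly, the difference between the two shapes (illustrative only, not exact output):

    # Default shape: an envelope object with rows as arrays of values
    {"database": "fixtures", "table": "facetable", "columns": ["id", "name"], "rows": [[1, "one"], [2, "two"]], ...}

    # ?_shape=array: just a list of objects keyed by column name
    [{"id": 1, "name": "one"}, {"id": 2, "name": "two"}]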

datasette 107914493 issue  
649907676 MDU6SXNzdWU2NDk5MDc2NzY= 889 asgi_wrapper plugin hook is crashing at startup amjith 49260 open 0     2 2020-07-02T12:53:13Z 2020-07-02T13:22:14Z   CONTRIBUTOR  

Steps to reproduce:

  1. Install datasette-media plugin
    pip install datasette-media
  2. Launch datasette
    datasette databasename.db
  3. Error
INFO:     Started server process [927704]
INFO:     Waiting for application startup.
ERROR:    Exception in 'lifespan' protocol
Traceback (most recent call last):
  File "/home/amjith/.virtualenvs/itsysearch/lib/python3.7/site-packages/uvicorn/lifespan/on.py", line 48, in main
    await app(scope, self.receive, self.send)
  File "/home/amjith/.virtualenvs/itsysearch/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/home/amjith/.virtualenvs/itsysearch/lib/python3.7/site-packages/datasette_media/__init__.py", line 9, in wrapped_app
    path = scope["path"]
KeyError: 'path'
ERROR:    Application startup failed. Exiting.
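
The underlying issue is that the "lifespan" scope uvicorn sends at startup has no "path" key - only request scopes do. A generic defensive pattern for an asgi_wrapper implementation, sketched here rather than taken from datasette-media:

    from datasette import hookimpl

    @hookimpl
    def asgi_wrapper(datasette):
        def wrap(app):
            async def wrapped_app(scope, receive, send):
                # Lifespan scopes have no "path", so only inspect it for HTTP requests
                if scope["type"] == "http":
                    path = scope["path"]
                    # ... path-based handling would go here ...
                await app(scope, receive, send)
            return wrapped_app
        return wrap
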
datasette 107914493 issue  
649702801 MDU6SXNzdWU2NDk3MDI4MDE= 888 URLs in release notes point to 127.0.0.1 abdusco 3243482 open 0     0 2020-07-02T07:28:04Z 2020-07-02T07:28:04Z   CONTRIBUTOR  

Just a quick heads up:

Release notes for 0.45 include urls that point to localhost.

https://github.com/simonw/datasette/releases/tag/0.45

datasette 107914493 issue  
649429772 MDU6SXNzdWU2NDk0Mjk3NzI= 886 Reconsider how _actor_X magic parameter deals with missing values simonw 9599 open 0   Datasette 0.46 5607421 2 2020-07-02T00:00:38Z 2020-07-02T01:52:02Z   OWNER  

I had to build a custom _actorornull prefix for datasette-saved-queries:

def actorornull(key, request):
    if request.actor is None:
        return None
    return request.actor.get(key)


@hookimpl
def register_magic_parameters():
    return [
        ("actorornull", actorornull),
    ]

Maybe the actor magic in Datasette core should do that out of the box?

https://github.com/simonw/datasette/blob/f1f581b7ffcd5d8f3ae6c1c654d813a6641410eb/datasette/default_magic_parameters.py#L14-L17

datasette 107914493 issue  
648749062 MDExOlB1bGxSZXF1ZXN0NDQyNTA1MDg4 883 Skip counting hidden tables abdusco 3243482 open 0     4 2020-07-01T07:38:08Z 2020-07-02T00:25:44Z   CONTRIBUTOR simonw/datasette/pulls/883

Potential fix for https://github.com/simonw/datasette/issues/859.

Disabling table counts for hidden tables speeds up database page quite a bit. In my setup it reduced load time by 2/3 (~300 -> ~90ms)

datasette 107914493 pull  
632724154 MDU6SXNzdWU2MzI3MjQxNTQ= 805 Writable canned queries live demo on Glitch simonw 9599 open 0     11 2020-06-06T20:52:13Z 2020-07-01T22:44:01Z   OWNER  

Needs to run somewhere with a mutable disk drive, so not Cloud Run or Heroku or Vercel.

I think I'll put it on Glitch.

datasette 107914493 issue  
648637666 MDU6SXNzdWU2NDg2Mzc2NjY= 880 POST to /db/canned-query.json should be supported simonw 9599 open 0   Datasette 0.46 5607421 2 2020-07-01T03:14:43Z 2020-07-01T21:06:21Z   OWNER  

Now that CSRF is solved for API requests (#835) it would be good to support API requests to the .json extension.

datasette 107914493 issue  
648421105 MDU6SXNzdWU2NDg0MjExMDU= 877 Consider dropping explicit CSRF protection entirely? simonw 9599 open 0     8 2020-06-30T19:00:55Z 2020-07-01T19:12:16Z   OWNER  

https://scotthelme.co.uk/csrf-is-dead/ from Feb 2017 has background here. The SameSite=lax cookie property effectively eliminates CSRF in modern browsers. https://caniuse.com/#search=SameSite shows 92.13% global support for it.

Datasette already uses SameSite=lax when it sets cookies by default: https://github.com/simonw/datasette/blob/af350ba4571b8e3f9708c40f2ddb48fea7ac1084/datasette/utils/asgi.py#L327-L341

A few options then. I could ditch CSRF protection entirely. I could make it optional - turn it off by default, but let users who care about that remaining 7.87% of global users opt back into it.

One catch is login CSRF: I don't see how SameSite=lax protects against that attack.

datasette 107914493 issue  
648659536 MDU6SXNzdWU2NDg2NTk1MzY= 881 Figure out why restore_working_directory is needed in some places simonw 9599 open 0     0 2020-07-01T04:19:25Z 2020-07-01T04:19:25Z   OWNER  

This is a frustrating workaround. I have a restore_working_directory fixture that I wrote to solve errors that look like this:

/Users/simon/Dropbox/Development/datasette/tests/test_publish_cloudrun.py:148: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112: in __enter__
    return next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <click.testing.CliRunner object at 0x1135ad110>

    @contextlib.contextmanager
    def isolated_filesystem(self):
        """A context manager that creates a temporary folder and changes
        the current working directory to it for isolated filesystem tests.
        """
>       cwd = os.getcwd()
E       FileNotFoundError: [Errno 2] No such file or directory

Here's an example of it in use: removing the restore_working_directory argument from this function causes the failure. https://github.com/simonw/datasette/blob/549b1c2063db48c4622ee5c7b478a1e3cbc1ac07/tests/test_plugins.py#L689-L690

I'd like to not have to do this.
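
For context, the workaround fixture is roughly this shape (a sketch, not the exact code): chdir into a directory that is guaranteed to exist for the duration of the test, then chdir back afterwards so later os.getcwd() calls can't fail.

    import os
    import pytest

    @pytest.fixture
    def restore_working_directory(tmpdir, request):
        previous_cwd = os.getcwd()
        tmpdir.chdir()  # run the test from a directory that definitely exists

        def return_to_previous():
            os.chdir(previous_cwd)

        request.addfinalizer(return_to_previous)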

datasette 107914493 issue  
642572841 MDU6SXNzdWU2NDI1NzI4NDE= 859 Database page loads too slowly with many large tables (due to table counts) abdusco 3243482 open 0     17 2020-06-21T14:23:17Z 2020-07-01T03:10:21Z   CONTRIBUTOR  

Hey,
I have a database that I save in HTML from a couple of web scrapers. There are around 200k+ and 50+ rows in a couple of tables, and the sqlite file weighs around 600MB.

The app runs on a VPS with a 2 core CPU and 4GB RAM, and refreshing the database page regularly takes more than 10 seconds. I suspected that counting tables was the culprit, but manually running select count(*) from table_name for the largest table finishes in under a second.

I've looked at the source code. There's a check on the index page for mutable databases larger than 100MB
https://github.com/simonw/datasette/blob/799c5d53570d773203527f19530cf772dc2eeb24/datasette/views/index.py#L15

but this check is not performed for the database page.
I've manually crippled the Database::table_counts method

async def table_counts(self, limit=10):
    if not self.is_mutable and self.cached_table_counts is not None:
        return self.cached_table_counts
    # Try to get counts for each table, $limit timeout for each count
    counts = {}
    for table in await self.table_names():
        try:
            # table_count = (
            #     await self.execute(
            #         "select count(*) from [{}]".format(table),
            #         custom_time_limit=limit,
            #     )
            # ).rows[0][0]
            counts[table] = 10 # table_count
        # In some cases I saw "SQL Logic Error" here in addition to
        # QueryInterrupted - so we catch that too:
        except (QueryInterrupted, sqlite3.OperationalError, sqlite3.DatabaseError):
            counts[table] = None
    if not self.is_mutable:
        self.cached_table_counts = counts
    return counts

now the page loads in <100ms.

Is it possible to apply the size check to the database page too?



/-/versions output


{
    "python": {
        "version": "3.8.0",
        "full": "3.8.0 (default, Oct 28 2019, 16:14:01) \n[GCC 8.3.0]"
    },
    "datasette": {
        "version": "0.44"
    },
    "asgi": "3.0",
    "uvicorn": "0.11.5",
    "sqlite": {
        "version": "3.22.0",
        "fts_versions": [
            "FTS5",
            "FTS4",
            "FTS3"
        ],
        "extensions": {
            "json1": null
        },
        "compile_options": [
            "COMPILER=gcc-7.4.0",
            "ENABLE_COLUMN_METADATA",
            "ENABLE_DBSTAT_VTAB",
            "ENABLE_FTS3",
            "ENABLE_FTS3_PARENTHESIS",
            "ENABLE_FTS3_TOKENIZER",
            "ENABLE_FTS4",
            "ENABLE_FTS5",
            "ENABLE_JSON1",
            "ENABLE_LOAD_EXTENSION",
            "ENABLE_PREUPDATE_HOOK",
            "ENABLE_RTREE",
            "ENABLE_SESSION",
            "ENABLE_STMTVTAB",
            "ENABLE_UNLOCK_NOTIFY",
            "ENABLE_UPDATE_DELETE_LIMIT",
            "HAVE_ISNAN",
            "LIKE_DOESNT_MATCH_BLOBS",
            "MAX_SCHEMA_RETRY=25",
            "MAX_VARIABLE_NUMBER=250000",
            "OMIT_LOOKASIDE",
            "SECURE_DELETE",
            "SOUNDEX",
            "TEMP_STORE=1",
            "THREADSAFE=1"
        ]
    }
}

datasette 107914493 issue  
646737558 MDU6SXNzdWU2NDY3Mzc1NTg= 870 Refactor default views to use register_routes simonw 9599 open 0   Datasette 1.0 3268330 10 2020-06-27T18:53:12Z 2020-06-30T19:26:35Z   OWNER  

It would be much cleaner if Datasette's default views were all registered using the new register_routes() plugin hook. Could dramatically reduce the code in datasette/app.py.

The ideal fix here would be to rework my BaseView subclass mechanism to work with register_routes() so that those views don't have any special privileges above plugin-provided views.
Originally posted by @simonw in https://github.com/simonw/datasette/issues/864#issuecomment-648580556

datasette 107914493 issue  
648435885 MDU6SXNzdWU2NDg0MzU4ODU= 878 BaseView should be a documented API for plugins to use simonw 9599 open 0   Datasette 1.0 3268330 0 2020-06-30T19:26:13Z 2020-06-30T19:26:26Z   OWNER  

Can be part of #870 - refactoring existing views to use register_routes().

I'm going to put the new check_permissions() method on BaseView as well. If I want that method to be available to plugins I can do so by turning that BaseView class into a documented API that plugins are encouraged to use themselves.
Originally posted by @simonw in https://github.com/simonw/datasette/issues/832#issuecomment-651995453

datasette 107914493 issue  
648245071 MDU6SXNzdWU2NDgyNDUwNzE= 8 Error thrown: table photos has no column named hasSticker harperreed 18504 open 0     0 2020-06-30T14:54:37Z 2020-06-30T14:54:37Z   NONE  

While running swarm-to-sqlite it throws an error:

harper@:~/dogsheep/swarm$ swarm-to-sqlite checkins.db --save=checkins.json
Please provide your Foursquare OAuth token:
Importing 8127 checkins  [#################-------------------]   49%  00:01:52
Traceback (most recent call last):
File "/home/harper/.local/bin/swarm-to-sqlite", line 11, in <module>
    sys.exit(cli())
File "/home/harper/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
File "/home/harper/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
File "/home/harper/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/home/harper/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
File "/home/harper/.local/lib/python3.6/site-packages/swarm_to_sqlite/cli.py", line 73, in cli
    save_checkin(checkin, db)
File "/home/harper/.local/lib/python3.6/site-packages/swarm_to_sqlite/utils.py", line 94, in save_checkin
    photos_table.insert(photo, replace=True)
File "/home/harper/.local/lib/python3.6/site-packages/sqlite_utils/db.py", line 963, in insert
    alter = self.value_or_default("alter", alter)
File "/home/harper/.local/lib/python3.6/site-packages/sqlite_utils/db.py", line 1142, in insert_all
    def upsert_all(
sqlite3.OperationalError: table photos has no column named hasSticker

Where should I dig in?

swarm-to-sqlite 205429375 issue  
646448486 MDExOlB1bGxSZXF1ZXN0NDQwNzM1ODE0 868 initial windows ci setup joshmgrant 702729 open 0     2 2020-06-26T18:49:13Z 2020-06-30T03:51:22Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/868

Picking up the work done on #557 with a new PR. Seeing if I can get this working.

datasette 107914493 pull  
647095487 MDU6SXNzdWU2NDcwOTU0ODc= 873 "datasette -p 0 --root" gives the wrong URL simonw 9599 open 0     12 2020-06-29T04:03:06Z 2020-06-29T15:44:54Z   OWNER  
$ datasette -p 0 --root
http://127.0.0.1:0/-/auth-token?token=2d498c...

The port is incorrect.

datasette 107914493 issue  
639072811 MDU6SXNzdWU2MzkwNzI4MTE= 849 Rename master branch to main simonw 9599 open 0   Datasette 1.0 3268330 6 2020-06-15T19:05:54Z 2020-06-26T02:09:10Z   OWNER  

I was waiting for consensus to form around this (and kind-of hoping for trunk since I like the tree metaphor) and it looks like main is it.

I've seen convincing arguments against trunk too - it indicates that the branch has some special significance like in Subversion (where all branches come from trunk) when it doesn't. So main is better anyway.

datasette 107914493 issue  
644582921 MDU6SXNzdWU2NDQ1ODI5MjE= 865 base_url doesn't seem to work when adding criteria and clicking "apply" tballison 6739646 open 0     2 2020-06-24T12:39:57Z 2020-06-24T18:43:08Z   NONE  

Over on Apache Tika, we're using datasette to allow users to make sense of the metadata for our file regression testing corpus.

This could be user error in how I've set up the reverse proxy!

I started datasette like so:
docker run -d -p 8001:8001 -v `pwd`:/mnt datasetteproject/datasette datasette -p 8001 -h 0.0.0.0 /mnt/corpora-metadata.db --config sql_time_limit_ms:60000 --config base_url:/datasette/

I then reverse proxied like so:

ProxyPreserveHost On
ProxyPass /datasette http://x.y.z.q:xxxx
ProxyPassReverse /datasette http://x.y.z.q:xxx

Regular sql works perfectly:
https://corpora.tika.apache.org/datasette/corpora-metadata?sql=select+mime_string%2C+count%281%29+as+cnt%0D%0Afrom+profiles+p%0D%0Ajoin+mimes+m+on+p.mime_id%3Dm.mime_id%0D%0Agroup+by+mime_string%0D%0Aorder+by+cnt+desc

However, adding criteria and clicking 'Apply'
https://corpora.tika.apache.org/datasette/corpora-metadata/tika_1_24_1_mimes?_sort=file&mime__exact=text%2Fplain

bounces back to:
https://corpora.tika.apache.org/corpora-metadata/tika_1_24_1_mimes?_sort=file&file__contains=bug&mime__exact=text%2Fplain

datasette 107914493 issue  
642388564 MDU6SXNzdWU2NDIzODg1NjQ= 858 publish heroku does not work on Windows 10 simonlau 870912 open 0     1 2020-06-20T14:40:28Z 2020-06-24T18:42:10Z   NONE  

When executing "datasette publish heroku schools.db" on Windows 10, I get the following error

  File "c:\users\dell\.virtualenvs\sec-schools-jn-cwk8z\lib\site-packages\datasette\publish\heroku.py", line 54, in heroku
    line.split()[0] for line in check_output(["heroku", "plugins"]).splitlines()
  File "c:\python38\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "c:\python38\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "c:\python38\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "c:\python38\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Changing https://github.com/simonw/datasette/blob/55a6ffb93c57680e71a070416baae1129a0243b8/datasette/publish/heroku.py#L54

to

line.split()[0] for line in check_output(["heroku", "plugins"], shell=True).splitlines()

as well as the other check_output() and call() within the same file leads me to another recursive error about temp files

datasette 107914493 issue  
637395097 MDU6SXNzdWU2MzczOTUwOTc= 838 Incorrect URLs when served behind a proxy with base_url set tsibley 79913 open 0     4 2020-06-11T23:58:55Z 2020-06-24T12:51:48Z   NONE  

I'm running datasette serve --config base_url:/foo/ …, proxying to it with this Apache config:

    ProxyPass /foo/ http://localhost:8001/ 
    ProxyPassReverse /foo/ http://localhost:8001/

and then accessing it via https://example.com/foo/.

Although many of the URLs in the pages are correct (presumably because they either use absolute paths which include base_url or relative paths), the faceting and pagination links still use fully-qualified URLs pointing at http://localhost:8001.

I looked into this a little in the source code, and it seems to be an issue anywhere request.url or request.path is used, as these contain the values for the request between the frontend (Apache) and backend (Datasette) server. Those properties are primarily used via the path_with_… family of utility functions and the Datasette.absolute_url method.

datasette 107914493 issue  
644161221 MDU6SXNzdWU2NDQxNjEyMjE= 117 Support for compound (composite) foreign keys simonw 9599 open 0     3 2020-06-23T21:33:42Z 2020-06-23T21:40:31Z   OWNER  

It turns out SQLite supports composite foreign keys: https://www.sqlite.org/foreignkeys.html#fk_composite

Their example looks like this:

CREATE TABLE album(
  albumartist TEXT,
  albumname TEXT,
  albumcover BINARY,
  PRIMARY KEY(albumartist, albumname)
);

CREATE TABLE song(
  songid     INTEGER,
  songartist TEXT,
  songalbum TEXT,
  songname   TEXT,
  FOREIGN KEY(songartist, songalbum) REFERENCES album(albumartist, albumname)
);

Here's what that looks like in sqlite-utils:

In [1]: import sqlite_utils                                                                                                                

In [2]: import sqlite3                                                                                                                     

In [3]: conn = sqlite3.connect(":memory:")                                                                                                 

In [4]: conn                                                                                                                               
Out[4]: <sqlite3.Connection at 0x1087186c0>

In [5]: conn.executescript(""" 
   ...: CREATE TABLE album( 
   ...:   albumartist TEXT, 
   ...:   albumname TEXT, 
   ...:   albumcover BINARY, 
   ...:   PRIMARY KEY(albumartist, albumname) 
   ...: ); 
   ...:  
   ...: CREATE TABLE song( 
   ...:   songid     INTEGER, 
   ...:   songartist TEXT, 
   ...:   songalbum TEXT, 
   ...:   songname   TEXT, 
   ...:   FOREIGN KEY(songartist, songalbum) REFERENCES album(albumartist, albumname) 
   ...: ); 
   ...: """)                                                                                                                               
Out[5]: <sqlite3.Cursor at 0x1088def10>

In [6]: db = sqlite_utils.Database(conn)                                                                                                   

In [7]: db.tables                                                                                                                          
Out[7]: 
[<Table album (albumartist, albumname, albumcover)>,
 <Table song (songid, songartist, songalbum, songname)>]

In [8]: db.tables[0].foreign_keys                                                                                                          
Out[8]: []

In [9]: db.tables[1].foreign_keys                                                                                                          
Out[9]: 
[ForeignKey(table='song', column='songartist', other_table='album', other_column='albumartist'),
 ForeignKey(table='song', column='songalbum', other_table='album', other_column='albumname')]

The table appears to have two separate foreign keys, when actually it has a single compound (composite) foreign key.

sqlite-utils 140912432 issue  
507454958 MDU6SXNzdWU1MDc0NTQ5NTg= 596 Handle really wide tables better simonw 9599 open 0     4 2019-10-15T20:05:46Z 2020-06-23T03:59:52Z   OWNER  

If a table has hundreds of columns the Datasette UI starts getting unwieldy.

Addressing this would be neat. One option would be to only select the first 30 columns by default and provide a UI for selecting more.

datasette 107914493 issue  
643510821 MDU6SXNzdWU2NDM1MTA4MjE= 862 Set an upper limit on total facet suggestion time for a page simonw 9599 open 0     1 2020-06-23T03:57:55Z 2020-06-23T03:58:48Z   OWNER  

If a table has 100 columns the facet suggestion code will currently run 100 times, taking a max of facet_suggest_time_limit_ms which defaults to 50ms per column:

https://github.com/simonw/datasette/blob/000528192eaf891118932250141dabe7a1561ece/datasette/facets.py#L142-L162

So for 100 columns, that's 100 * 50ms = 5s total time that might be spent attempting to calculate facets on a large table!

I should implement a hard upper limit on the total amount of time taken suggesting facets - probably of around 500ms. If it takes longer than that the remaining columns will not be considered.
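
A sketch of the kind of change that implies (names here are hypothetical, not the actual facets.py code): track total elapsed time across the loop and bail out once the page-wide budget is exhausted.

    import time

    SUGGEST_TIME_BUDGET_MS = 500  # assumed page-wide budget

    async def suggest_facets(columns, suggest_for_column):
        # suggest_for_column is assumed to run the existing per-column logic,
        # itself capped at facet_suggest_time_limit_ms
        suggestions = []
        started = time.monotonic()
        for column in columns:
            elapsed_ms = (time.monotonic() - started) * 1000
            if elapsed_ms > SUGGEST_TIME_BUDGET_MS:
                break  # skip the remaining columns entirely
            suggestion = await suggest_for_column(column)
            if suggestion is not None:
                suggestions.append(suggestion)
        return suggestions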

datasette 107914493 issue  
642651572 MDU6SXNzdWU2NDI2NTE1NzI= 860 Plugin hook for database/table metadata simonw 9599 open 0     1 2020-06-21T22:20:25Z 2020-06-21T22:25:27Z   OWNER  

I'm not happy with how metadata.(json|yaml) keeps growing new features. Rather than having a single plugin hook for all of metadata.json I'm going to split out the feature that shows actual real metadata for tables and databases - source, license etc - into its own plugin-powered mechanism.

Originally posted by @simonw in https://github.com/simonw/datasette/issues/357#issuecomment-647189045

datasette 107914493 issue  
348043884 MDU6SXNzdWUzNDgwNDM4ODQ= 357 Plugin hook for loading metadata.json simonw 9599 open 0     6 2018-08-06T19:00:01Z 2020-06-21T22:19:58Z   OWNER  

For https://github.com/simonw/russian-ira-facebook-ads-datasette/tree/af6d956995e14afd585c35a6a06bb01da32043ba I wrote a script to convert YAML to JSON because YAML is a better format for embedding multi-line HTML descriptions and canned SQL statements.

Example yaml metadata file: https://github.com/simonw/russian-ira-facebook-ads-datasette/blob/af6d956995e14afd585c35a6a06bb01da32043ba/russian-ads-metadata.yaml

It would be useful if Datasette could be fed a YAML file directly:

datasette -m metadata.yaml

Question is... should this be a native feature (hence adding a YAML dependency) or should it be handled by a datasette-metadata-yaml plugin, using a new plugin hook for loading metadata? If so, what would other use-cases for that plugin hook be?
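
Whichever route it goes, the loading step itself is small - something like this sketch, which picks a parser based on the file extension (assumes PyYAML if .yaml support is wanted):

    import json
    from pathlib import Path

    def load_metadata(path):
        path = Path(path)
        text = path.read_text()
        if path.suffix in (".yaml", ".yml"):
            import yaml  # the new dependency this decision hinges on
            return yaml.safe_load(text)
        return json.loads(text)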

datasette 107914493 issue  
642297505 MDU6SXNzdWU2NDIyOTc1MDU= 857 Comprehensive documentation for variables made available to templates simonw 9599 open 0   Datasette 1.0 3268330 0 2020-06-20T03:19:43Z 2020-06-20T03:19:44Z   OWNER  

Needed for the Datasette 1.0 release, so template authors can trust that Datasette is unlikely to break their templates.

datasette 107914493 issue  
642296989 MDU6SXNzdWU2NDIyOTY5ODk= 856 Consider pagination of canned queries simonw 9599 open 0     0 2020-06-20T03:15:59Z 2020-06-20T03:15:59Z   OWNER  

The new canned_queries() plugin hook from #852 combined with plugins like https://github.com/simonw/datasette-saved-queries could mean that some installations end up with hundreds or even thousands of canned queries. I should consider pagination or some other way of ensuring that this doesn't cause performance problems for Datasette.

datasette 107914493 issue  
639542974 MDU6SXNzdWU2Mzk1NDI5NzQ= 47 Fall back to FTS4 if FTS5 is not available hpk42 73579 open 0     3 2020-06-16T10:11:23Z 2020-06-17T20:13:48Z   NONE  

Got this with version 0.21.1 from PyPI. twitter-to-sqlite auth worked but then "twitter-to-sqlite user-timeline USER.db" produced a traceback ending in "no such module: FTS5".
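
One common way to detect what is available, as a sketch of what a fallback could key off (not necessarily how this ends up being implemented):

    import sqlite3

    def best_fts_version():
        # Probe an in-memory database and return the newest FTS module that works
        conn = sqlite3.connect(":memory:")
        for fts in ("FTS5", "FTS4"):
            try:
                conn.execute(
                    "CREATE VIRTUAL TABLE fts_probe USING {}(content)".format(fts)
                )
                conn.execute("DROP TABLE fts_probe")
                return fts
            except sqlite3.OperationalError:
                continue
        return None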

twitter-to-sqlite 206156866 issue  
639993467 MDU6SXNzdWU2Mzk5OTM0Njc= 850 Proof of concept for Datasette on AWS Lambda with EFS simonw 9599 open 0     25 2020-06-16T21:48:31Z 2020-06-16T23:52:16Z   OWNER  

https://aws.amazon.com/about-aws/whats-new/2020/06/aws-lambda-support-for-amazon-elastic-file-system-now-generally-/

If Datasette can run on Lambda with access to EFS it could both read AND write large databases there.

datasette 107914493 issue  
317001500 MDU6SXNzdWUzMTcwMDE1MDA= 236 datasette publish lambda plugin simonw 9599 open 0     4 2018-04-23T22:10:30Z 2020-06-16T23:50:59Z   OWNER  

Refs #217 - create a publish plugin that can deploy to AWS Lambda.

https://docs.aws.amazon.com/lambda/latest/dg/limits.html says lambda packages can be up to 50 MB, so this would only work with smaller databases (the command can check the filesize before attempting to package and deploy it).

Lambdas do get a 512 MB /tmp directory too, so for larger databases the function could start and then download up to 512MB from an S3 bucket - so the plugin could take an optional S3 bucket to write to and know how to upload the .db file there and then have the lambda download it on startup.
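
A sketch of that startup step (bucket and key names are made up; assumes boto3 is available in the Lambda runtime):

    import os
    import boto3

    DB_PATH = "/tmp/large.db"  # Lambda's writable scratch space, up to 512 MB

    def ensure_database():
        # Download the SQLite file from S3 on cold start, reuse it while the
        # container stays warm
        if not os.path.exists(DB_PATH):
            s3 = boto3.client("s3")
            s3.download_file("my-datasette-bucket", "large.db", DB_PATH)
        return DB_PATH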

datasette 107914493 issue  
638375985 MDExOlB1bGxSZXF1ZXN0NDM0MTYyMzE2 29 Fixed bug in SQL query for photo scores RhetTbull 41546558 open 0     0 2020-06-14T15:39:22Z 2020-06-14T15:39:22Z   FIRST_TIME_CONTRIBUTOR dogsheep/dogsheep-photos/pulls/29

The join on ZCOMPUTEDASSETATTRIBUTES used the wrong columns. In most of the Photos database tables, table.ZASSET joins with ZGENERICASSET.Z_PK

dogsheep-photos 256834907 pull  
574021194 MDU6SXNzdWU1NzQwMjExOTQ= 691 --reload should reload server if code in --plugins-dir changes simonw 9599 open 0     1 2020-03-02T14:42:21Z 2020-06-14T02:35:17Z   OWNER   datasette 107914493 issue
638238548 MDU6SXNzdWU2MzgyMzg1NDg= 845 Code coverage should ignore files in .coveragerc simonw 9599 open 0     0 2020-06-13T21:45:42Z 2020-06-13T21:46:03Z   OWNER  

I'm not sure why this is, but the code coverage I have running in a GitHub Action doesn't take my .coveragerc file into account. It should:

https://github.com/simonw/datasette/blob/cf7a2bdb404734910ec07abc7571351a2d934828/.github/workflows/test-coverage.yml#L31-L35

Here's the bit that's ignored:

https://github.com/simonw/datasette/blob/cf7a2bdb404734910ec07abc7571351a2d934828/.coveragerc#L1-L2

As a result my coverage score is 84%, when it should be 92%:

2020-06-13T21:41:18.4404252Z ----------- coverage: platform linux, python 3.8.3-final-0 -----------
2020-06-13T21:41:18.4404570Z Name                                 Stmts   Miss  Cover
2020-06-13T21:41:18.4404971Z --------------------------------------------------------
2020-06-13T21:41:18.4405227Z datasette/__init__.py                    3      0   100%
2020-06-13T21:41:18.4405441Z datasette/__main__.py                    3      3     0%
2020-06-13T21:41:18.4405668Z datasette/_version.py                  279    279     0%
2020-06-13T21:41:18.4405921Z datasette/actor_auth_cookie.py          20      0   100%
2020-06-13T21:41:18.4406135Z datasette/app.py                       499     27    95%
2020-06-13T21:41:18.4406343Z datasette/cli.py                       162     45    72%
2020-06-13T21:41:18.4406553Z datasette/database.py                  236     17    93%
2020-06-13T21:41:18.4406761Z datasette/default_permissions.py        40      0   100%
2020-06-13T21:41:18.4406975Z datasette/facets.py                    210     24    89%
2020-06-13T21:41:18.4407186Z datasette/filters.py                   122      7    94%
2020-06-13T21:41:18.4407394Z datasette/hookspecs.py                  34      0   100%
2020-06-13T21:41:18.4407600Z datasette/inspect.py                    36     23    36%
2020-06-13T21:41:18.4407807Z datasette/plugins.py                    34      6    82%
2020-06-13T21:41:18.4408014Z datasette/publish/__init__.py            0      0   100%
2020-06-13T21:41:18.4408240Z datasette/publish/cloudrun.py           57      2    96%
2020-06-13T21:41:18.4408786Z datasette/publish/common.py             19      1    95%
2020-06-13T21:41:18.4409029Z datasette/publish/heroku.py             97     13    87%
2020-06-13T21:41:18.4409243Z datasette/renderer.py                   63      4    94%
2020-06-13T21:41:18.4409450Z datasette/sql_functions.py               5      0   100%
2020-06-13T21:41:18.4410480Z datasette/tracer.py                     87     16    82%
2020-06-13T21:41:18.4410972Z datasette/utils/__init__.py            504     31    94%
2020-06-13T21:41:18.4411755Z datasette/utils/asgi.py                264     24    91%
2020-06-13T21:41:18.4412173Z datasette/utils/shutil_backport.py      44     44     0%
2020-06-13T21:41:18.4412822Z datasette/version.py                     4      0   100%
2020-06-13T21:41:18.4413562Z datasette/views/__init__.py              0      0   100%
2020-06-13T21:41:18.4414276Z datasette/views/base.py                288     19    93%
2020-06-13T21:41:18.4414579Z datasette/views/database.py            120      2    98%
2020-06-13T21:41:18.4414860Z datasette/views/index.py                57      2    96%
2020-06-13T21:41:18.4415379Z datasette/views/special.py              72     16    78%
2020-06-13T21:41:18.4418994Z datasette/views/table.py               418     18    96%
2020-06-13T21:41:18.4428811Z --------------------------------------------------------
2020-06-13T21:41:18.4430394Z TOTAL                                 3777    623    84%
datasette 107914493 issue  
636511683 MDU6SXNzdWU2MzY1MTE2ODM= 830 Redesign register_facet_classes plugin hook simonw 9599 open 0   Datasette 1.0 3268330 0 2020-06-10T20:03:27Z 2020-06-10T20:03:27Z   OWNER  

Nothing uses this plugin hook yet, so the design is not yet proven.

I'm going to build a real plugin against it and use that process to inform any design changes that may need to be made.

I'll add a warning about this to the documentation.

datasette 107914493 issue  
634651079 MDU6SXNzdWU2MzQ2NTEwNzk= 814 Remove --debug option from datasette serve simonw 9599 open 0   Datasette 1.0 3268330 1 2020-06-08T14:10:14Z 2020-06-08T22:42:17Z   OWNER  

It doesn't appear to do anything useful at all:

https://github.com/simonw/datasette/blob/f786033a5f0098371cb1df1ce83959b27c588115/datasette/cli.py#L251-L253

https://github.com/simonw/datasette/blob/f786033a5f0098371cb1df1ce83959b27c588115/datasette/cli.py#L365-L367

datasette 107914493 issue  
449886319 MDU6SXNzdWU0NDk4ODYzMTk= 493 Rename metadata.json to config.json simonw 9599 open 0   Datasette 1.0 3268330 3 2019-05-29T15:48:03Z 2020-06-08T22:40:01Z   OWNER  

It is increasingly being used for configuration options, when it started out as purely metadata.

Could cause confusion with the --config mechanism though - maybe that should be called "settings" instead?

datasette 107914493 issue  
634663505 MDU6SXNzdWU2MzQ2NjM1MDU= 815 Group permission checks by request on /-/permissions debug page simonw 9599 open 0   Datasette 1.0 3268330 6 2020-06-08T14:25:23Z 2020-06-08T14:42:56Z   OWNER  

Now that we're making a LOT more permission checks (on the DB index page we do a check for every listed table for example) the /-/permissions page gets filled up pretty quickly.

Can make this more readable by grouping permission checks by request. Have the most recent request at the top of the page, but sort the permission checks within each request chronologically, most recent last.

datasette 107914493 issue  
628572716 MDU6SXNzdWU2Mjg1NzI3MTY= 791 Tutorial: building a something-interesting with writable canned queries simonw 9599 open 0   Datasette 1.0 3268330 2 2020-06-01T16:32:05Z 2020-06-06T20:51:07Z   OWNER  

Initial idea: TODO list, as a tutorial for #698 writable canned queries.

datasette 107914493 issue  
610829227 MDU6SXNzdWU2MTA4MjkyMjc= 749 Respect Cloud Run max response size of 32MB simonw 9599 open 0   Datasette 1.0 3268330 1 2020-05-01T16:06:46Z 2020-06-06T20:01:54Z   OWNER  

https://cloud.google.com/run/quotas lists the maximum response size as 32MB.

I spotted a bug where attempting to download a database file larger than that from a Cloud Run deployment (in this case it was https://github-to-sqlite.dogsheep.net/github.db after I accidentally increased the size of that database) returned a 500 error because of this.

datasette 107914493 issue  
449854604 MDU6SXNzdWU0NDk4NTQ2MDQ= 492 Facets not correctly persisted in hidden form fields simonw 9599 open 0   Datasette 1.0 3268330 3 2019-05-29T14:49:39Z 2020-06-06T20:01:53Z   OWNER  

Steps to reproduce: visit https://2a4b892.datasette.io/fixtures/roadside_attractions?_facet_m2m=attraction_characteristic and click "Apply"

Result is a 500: no such column: attraction_characteristic

The error occurs because of this hidden HTML input:

<input type="hidden" name="_facet" value="attraction_characteristic">

This should be:

<input type="hidden" name="_facet_m2m" value="attraction_characteristic">
datasette 107914493 issue  
450032134 MDU6SXNzdWU0NTAwMzIxMzQ= 495 facet_m2m gets confused by multiple relationships simonw 9599 open 0   Datasette 1.0 3268330 2 2019-05-29T21:37:28Z 2020-06-06T20:01:53Z   OWNER  

I got this for a database I was playing with:

I think this is because of these three tables:

datasette 107914493 issue  
463492815 MDU6SXNzdWU0NjM0OTI4MTU= 534 500 error on m2m facet detection simonw 9599 open 0   Datasette 1.0 3268330 1 2019-07-03T00:42:42Z 2020-06-06T20:01:53Z   OWNER  

This may help debug:

diff --git a/datasette/facets.py b/datasette/facets.py
index 76d73e5..07a4034 100644
--- a/datasette/facets.py
+++ b/datasette/facets.py
@@ -499,11 +499,14 @@ class ManyToManyFacet(Facet):
                 "outgoing"
             ]
             if len(other_table_outgoing_foreign_keys) == 2:
-                destination_table = [
-                    t
-                    for t in other_table_outgoing_foreign_keys
-                    if t["other_table"] != self.table
-                ][0]["other_table"]
+                try:
+                    destination_table = [
+                        t
+                        for t in other_table_outgoing_foreign_keys
+                        if t["other_table"] != self.table
+                    ][0]["other_table"]
+                except IndexError:
+                    import pdb; pdb.pm()
                 # Only suggest if it's not selected already
                 if ("_facet_m2m", destination_table) in args:
                     continue
datasette 107914493 issue  
520740741 MDU6SXNzdWU1MjA3NDA3NDE= 625 If you apply ?_facet_array=tags then &_facet=tags does nothing simonw 9599 open 0   Datasette 1.0 3268330 0 2019-11-11T04:59:29Z 2020-06-06T20:01:53Z   OWNER  

Start here: https://v0-30-2.datasette.io/fixtures/facetable?_facet_array=tags

Note that tags is offered as a suggested facet. But if you click that you get this:

https://v0-30-2.datasette.io/fixtures/facetable?_facet_array=tags&_facet=tags

The _facet=tags is added to the URL and it's removed from the list of suggested tags... but the facet itself is not displayed:

The _facet=tags facet should look like this:

datasette 107914493 issue  
542553350 MDU6SXNzdWU1NDI1NTMzNTA= 655 Copy and paste doesn't work reliably on iPhone for SQL editor simonw 9599 open 0   Datasette 1.0 3268330 2 2019-12-26T13:15:10Z 2020-06-06T20:01:53Z   OWNER  

I'm having a lot of trouble copying and pasting from the codemirror editor on my iPhone.

datasette 107914493 issue  
576722115 MDU6SXNzdWU1NzY3MjIxMTU= 696 Single failing unit test when run inside the Docker image simonw 9599 open 0   Datasette 1.0 3268330 1 2020-03-06T06:16:36Z 2020-06-06T20:01:53Z   OWNER  
docker run -it -v `pwd`:/mnt datasetteproject/datasette:latest /bin/bash
root@0e1928cfdf79:/# cd /mnt
root@0e1928cfdf79:/mnt# pip install -e .[test]
root@0e1928cfdf79:/mnt# pytest

I get one failure!

It was for test_searchable[/fixtures/searchable.json?_search=te*+AND+do*&_searchmode=raw-expected_rows3]

    def test_searchable(app_client, path, expected_rows):
        response = app_client.get(path)
>       assert expected_rows == response.json["rows"]
E       AssertionError: assert [[1, 'barry c...sel', 'puma']] == []
E         Left contains 2 more items, first extra item: [1, 'barry cat', 'terry dog', 'panther']
E         Full diff:
E         + []
E         - [[1, 'barry cat', 'terry dog', 'panther'],
E         -  [2, 'terry dog', 'sara weasel', 'puma']]

Originally posted by @simonw in https://github.com/simonw/datasette/issues/695#issuecomment-595614469

datasette 107914493 issue  
398011658 MDU6SXNzdWUzOTgwMTE2NTg= 398 Ensure downloading a 100+MB SQLite database file works simonw 9599 open 0   Datasette 1.0 3268330 2 2019-01-10T20:57:52Z 2020-06-06T20:01:52Z   OWNER  

I've seen attempted downloads of large files fail after about ten seconds.

datasette 107914493 issue  
440222719 MDU6SXNzdWU0NDAyMjI3MTk= 448 _facet_array should work against views simonw 9599 open 0   Datasette 1.0 3268330 1 2019-05-03T21:08:04Z 2020-06-06T20:01:52Z   OWNER  

I created this view: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads-8dbda00/ads_with_targets

CREATE VIEW ads_with_targets as select ads.*, json_group_array(targets.name) as target_names from ads
  join ad_targets on ad_targets.ad_id = ads.id
  join targets on ad_targets.target_id = targets.id
  group by ad_targets.ad_id

When I try to apply faceting by array it appears to work at first: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads/ads_with_targets?_facet_array=target_names

But actually it's doing the wrong thing - the SQL for the facets uses rowid, but rowid is not present on views at all! These results are incorrect, and clicking to select a facet will fail to produce any rows: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads/ads_with_targets?_facet_array=target_names&target_names__arraycontains=people_who_match%3Ainterests%3AAfrican-American+Civil+Rights+Movement+%281954%E2%80%9468%29

Here's the SQL it should be using when you select a facet (note that it does not use a rowid):

https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads?sql=select+*+from+ads_with_targets+where+id+in+%28%0D%0A++++++++++++select+ads_with_targets.id+from+ads_with_targets%2C+json_each%28ads_with_targets.target_names%29+j%0D%0A++++++++++++where+j.value+%3D+%3Ap0%0D%0A++++++++%29+limit+101&p0=people_who_match%3Ainterests%3ABlack+%28Color%29
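
URL-decoded, that query is:

    select * from ads_with_targets where id in (
        select ads_with_targets.id from ads_with_targets, json_each(ads_with_targets.target_names) j
        where j.value = :p0
    ) limit 101

with :p0 set to people_who_match:interests:Black (Color).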

So we need to do something a lot smarter here. I'm not sure what the fix will look like, or even if it's feasible given that views don't have a rowid to hook into so the JSON faceting SQL may have to be completely rewritten.

datasette publish cloudrun \
    russian-ads.db \
    --name json-view-facet-bug-demo \
    --branch master \
    --extra-options "--config sql_time_limit_ms:5000 --config facet_time_limit_ms:5000"
datasette 107914493 issue  
626593402 MDU6SXNzdWU2MjY1OTM0MDI= 780 Internals documentation for datasette.metadata() method simonw 9599 open 0   Datasette 1.0 3268330 2 2020-05-28T15:14:22Z 2020-06-02T22:13:12Z   OWNER  

https://github.com/simonw/datasette/blob/40885ef24e32d91502b6b8bbad1c7376f50f2830/datasette/app.py#L297-L328

datasette 107914493 issue  
497170355 MDU6SXNzdWU0OTcxNzAzNTU= 576 Documented internals API for use in plugins simonw 9599 open 0   Datasette 1.0 3268330 8 2019-09-23T15:28:50Z 2020-06-02T22:13:09Z   OWNER  

Quite a few of the plugin hooks make a "datasette" instance of the Datasette class available to the plugins, so that they can look up configuration settings and execute database queries.

This means it should provide a documented, stable API so that plugin authors can rely on it.

datasette 107914493 issue  
440134714 MDU6SXNzdWU0NDAxMzQ3MTQ= 446 Define mechanism for plugins to return structured data simonw 9599 open 0   Datasette 1.0 3268330 6 2019-05-03T17:00:16Z 2020-06-02T22:12:15Z   OWNER  

Several plugin hooks now expect plugins to return data in a specific shape - notably the new output format hook and the custom facet hook.

These use Python dictionaries right now but that's quite error prone: it would be good to have a mechanism that supported a more structured format.

Full list of current hooks is here: https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hooks

datasette 107914493 issue  
629473827 MDU6SXNzdWU2Mjk0NzM4Mjc= 5 Suggestion: Add output example to readme harryvederci 26745575 open 0     0 2020-06-02T19:56:49Z 2020-06-02T19:56:49Z   NONE

First off, thanks for open sourcing this application! This is a suggestion to increase the number of people who would make use of it: an example in the readme file would help.

Currently, users have to clone the app, install it, authorize through Pocket, run a command, and then find out if this application does what they hope it does.

Another possibility is to add a file example-output.db, containing one (mock) Pocket article.

Keep up the good work!

pocket-to-sqlite 213286752 issue  
628156527 MDU6SXNzdWU2MjgxNTY1Mjc= 789 Mechanism for enabling pluggy tracing simonw 9599 open 0     2 2020-06-01T05:10:14Z 2020-06-01T05:11:03Z   OWNER  

Could be useful for debugging plugins: https://pluggy.readthedocs.io/en/latest/#call-tracing

I tried this out by adding these two lines in plugins.py:

pm = pluggy.PluginManager("datasette")
pm.add_hookspecs(hookspecs)
# Added these:
pm.trace.root.setwriter(print)
pm.enable_tracing()

Output looked something like this:

INFO:     127.0.0.1:52724 - "GET /-/-/static/app.css HTTP/1.1" 404 Not Found
  actor_from_request [hook]
      datasette: <datasette.app.Datasette object at 0x106277ad0>
      request: <datasette.utils.asgi.Request object at 0x106550a50>

  finish actor_from_request --> [] [hook]

  extra_body_script [hook]
      template: show_json.html
      database: None
      table: None
      view_name: json_data
      datasette: <datasette.app.Datasette object at 0x106277ad0>

  finish extra_body_script --> [] [hook]

  extra_template_vars [hook]
      template: show_json.html
      database: None
      table: None
      view_name: json_data
      request: <datasette.utils.asgi.Request object at 0x1065504d0>
      datasette: <datasette.app.Datasette object at 0x106277ad0>

  finish extra_template_vars --> [] [hook]

  extra_css_urls [hook]
      template: show_json.html
      database: None
      table: None
      datasette: <datasette.app.Datasette object at 0x106277ad0>

  finish extra_css_urls --> [] [hook]

  extra_js_urls [hook]
      template: show_json.html
      database: None
      table: None
      datasette: <datasette.app.Datasette object at 0x106277ad0>

  finish extra_js_urls --> [] [hook]

INFO:     127.0.0.1:52724 - "GET /-/actor HTTP/1.1" 200 OK
  actor_from_request [hook]
      datasette: <datasette.app.Datasette object at 0x106277ad0>
      request: <datasette.utils.asgi.Request object at 0x1065500d0>

  finish actor_from_request --> [] [hook]
datasette 107914493 issue  
459590021 MDU6SXNzdWU0NTk1OTAwMjE= 519 Decide what goes into Datasette 1.0 simonw 9599 open 0   Datasette 1.0 3268330 2 2019-06-23T15:47:41Z 2020-05-30T18:55:24Z   OWNER  

Datasette ASGI #272 is a big part of it... but 1.0 will generally be an indicator that Datasette is a stable platform for developers to write plugins and custom templates against. So lots to think about.

datasette 107914493 issue  
326800219 MDU6SXNzdWUzMjY4MDAyMTk= 292 Mechanism for customizing the SQL used to select specific columns in the table view simonw 9599 open 0     14 2018-05-27T09:05:52Z 2020-05-30T18:45:38Z   OWNER  

Some columns don't make a lot of sense in their default representation - binary blobs such as SpatiaLite geometries for example, or lengthy columns that really should be truncated somehow.

We may also find that there are tables where we don't want to show all of the columns - so a mechanism to select a subset of columns would be nice.

I think there are two features here:

  • the ability to request a subset of columns on the table view
  • the ability to override the SQL for a specific column and/or add extra columns - AsGeoJSON(Geometry) for example

Both features should be available via both querystring arguments and in metadata.json

The querystring argument for custom SQL should only work if allow_sql config is turned on.

Refs #276

datasette 107914493 issue  
445850934 MDU6SXNzdWU0NDU4NTA5MzQ= 473 Plugin hook: register_filters simonw 9599 open 0     7 2019-05-19T18:44:33Z 2020-05-30T18:44:55Z   OWNER  

I meant to add this as part of the facets plugin mechanism but didn't quite get to it. This will allow plugins to register extra filters, as seen in datasette/filters.py:

https://github.com/simonw/datasette/blob/260085838887ee343f4d3b177c422e7aef5ade9d/datasette/filters.py#L83-L98

datasette 107914493 issue  
374953006 MDU6SXNzdWUzNzQ5NTMwMDY= 369 Interface should show same JSON shape options for custom SQL queries slygent 416374 open 0   Datasette 1.0 3268330 2 2018-10-29T10:39:15Z 2020-05-30T17:24:06Z   NONE  

At the moment the page returning a custom SQL query shows the JSON and CSV APIs, but not the multiple JSON shapes. However, adding the _shape parameter to the JSON API URL manually still works, so perhaps there should be consistency in the interface by having the same "Advanced Export" box for custom SQL queries.

datasette 107914493 issue  
459397625 MDU6SXNzdWU0NTkzOTc2MjU= 514 Documentation with recommendations on running Datasette in production without using Docker chrismp 7936571 open 0   Datasette 1.0 3268330 26 2019-06-21T22:48:12Z 2020-05-30T17:22:56Z   NONE  

I've got some SQLite databases too big to push to Heroku or the other services with built-in support in datasette.

So instead I moved my datasette code and databases to a remote server on Kimsufi. In the folder containing the SQLite databases I run the following code.

nohup datasette serve -h 0.0.0.0 *.db --cors --port 8000 --metadata metadata.json > output.log 2>&1 &

When I go to http://my-remote-server.com:8000, the site loads. But I know this is not a good long-term solution to running datasette on this server.

What is the "correct" way to have this site run, preferably on server port 80?

datasette 107914493 issue  
520667773 MDU6SXNzdWU1MjA2Njc3NzM= 620 Mechanism for indicating foreign key relationships in the table and query page URLs simonw 9599 open 0   Datasette 1.0 3268330 5 2019-11-10T22:26:27Z 2020-05-30T17:22:56Z   OWNER  

Datasette currently only inflates foreign keys (into named hyperlinks) if it detects them as foreign key constraints in the underlying database.

It would be useful if you could specify additional "foreign keys" using both metadata.json and the querystring - similar to how you can pass ?_fts_table=x https://datasette.readthedocs.io/en/stable/full_text_search.html#configuring-full-text-search-for-a-table-or-view

datasette 107914493 issue  
520681725 MDU6SXNzdWU1MjA2ODE3MjU= 621 Syntax for ?_through= that works as a form field simonw 9599 open 0   Datasette 1.0 3268330 3 2019-11-11T00:19:03Z 2020-05-30T17:22:56Z   OWNER  

The current syntax for ?_through= uses JSON to avoid any risk of confusion with table or column names that contain special characters.

This means you can't target a form field at it.

We should be able to support both - ?x.y.z=value for tables and columns with "regular" names, falling back to the current JSON syntax for columns or tables that won't work with the key/value syntax.

datasette 107914493 issue  
531502365 MDU6SXNzdWU1MzE1MDIzNjU= 646 Make database level information from metadata.json available in the index.html template lagolucas 18017473 open 0   Datasette 1.0 3268330 3 2019-12-02T19:55:10Z 2020-05-30T17:22:56Z   NONE  

Did a search on the issues here and didn't find anything related to what I want.

I want to have information that is on the database level of the JSON like title, source and source_url, and use it on the index page.

I tried some small tweaks on the python and html files, but failed to get that result.

Is there a way? Thanks!

datasette 107914493 issue  
626582657 MDU6SXNzdWU2MjY1ODI2NTc= 779 Make human_description_en explicitly available to output renderers simonw 9599 open 0     0 2020-05-28T14:59:54Z 2020-05-28T14:59:54Z   OWNER  

datasette-atom uses this:

https://github.com/simonw/datasette-atom/blob/df98a6c43a443224b6cd232f84703ec297ef046b/datasette_atom/init.py#L36-L37

    if data.get("human_description_en"):
        title += ": " + data["human_description_en"]

It's a nice way to generate a useful title for a filtered table.

datasette 107914493 issue  
612382643 MDU6SXNzdWU2MTIzODI2NDM= 758 Question: Access to immutable database-path clausjuhl 2181410 open 0     6 2020-05-05T07:01:18Z 2020-05-28T08:23:27Z   NONE  

Hi Simon

Is there anywhere in the app-context where one can access the hashed urlpath of the database? Currently it's included in the template-context (databases[0]["path"]) when rendering urls of the database (eg. /db-44b06v9/cases...), but where can I find the hashed url when rendering the index-page? I'm trying to avoid redirects. Thanks!

datasette 107914493 issue  
626211658 MDU6SXNzdWU2MjYyMTE2NTg= 778 Ability to configure keyset pagination for views simonw 9599 open 0     0 2020-05-28T04:48:56Z 2020-05-28T04:48:56Z   OWNER  

Currently views offer pagination, but it uses offset/limit - e.g. https://latest.datasette.io/fixtures/paginated_view?_next=100

This means pagination will perform poorly on deeper pages.

If a view is based on a table that has a primary key it should be possible to configure efficient keyset pagination that works the same way that table pagination works.

This may be as simple as configuring a column that can be treated as a "primary key" for the purpose of pagination using metadata.json - or with a ?_view_pk=colname querystring argument.
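
For reference, the underlying difference between the two approaches (column names illustrative):

    -- offset/limit pagination: cost grows with the offset
    select * from paginated_view order by pk limit 101 offset 10000;

    -- keyset pagination: seeks straight to the next page
    select * from paginated_view where pk > :last_seen_pk order by pk limit 101;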

datasette 107914493 issue  
624490929 MDU6SXNzdWU2MjQ0OTA5Mjk= 28 Invalid SQL no such table: main.uploads dmd 41439 open 0     0 2020-05-25T21:25:39Z 2020-05-25T21:25:39Z   NONE  

http://127.0.0.1:8001/photos/photos_with_apple_metadata gives "Invalid SQL no such table: main.uploads"

dogsheep-photos 256834907 issue  
621486115 MDU6SXNzdWU2MjE0ODYxMTU= 27 photos_with_apple_metadata view should include labels simonw 9599 open 0     0 2020-05-20T06:06:17Z 2020-05-20T06:06:17Z   MEMBER  

https://dogsheep-photos.dogsheep.net/public/photos_with_apple_metadata?place_city=New+Orleans&_facet=place_city&_facet_array=albums&_facet_array=persons

Here's one way to add that:

        select
          rowid,
          photo,
          (
            select
              json_group_array(
                json_object(
                  'label',
                  normalized_string,
                  'href',
                  '/photos/labelled?_hide_sql=1&label=' || normalized_string
                )
              )
            from
              labels
            where
              labels.uuid = photos_with_apple_metadata.uuid
          ) as labels,
          date,
dogsheep-photos 256834907 issue  
621323348 MDU6SXNzdWU2MjEzMjMzNDg= 24 Configurable URL for images simonw 9599 open 0     1 2020-05-19T22:25:56Z 2020-05-20T06:00:29Z   MEMBER  

This is hard-coded at the moment, which is bad:
https://github.com/dogsheep/photos-to-sqlite/blob/d5d69b9019703c47bc251444838578dd752801e2/photos_to_sqlite/cli.py#L269-L272

dogsheep-photos 256834907 issue  
621286870 MDU6SXNzdWU2MjEyODY4NzA= 113 Syntactic sugar for ATTACH DATABASE simonw 9599 open 0     1 2020-05-19T21:10:00Z 2020-05-19T21:11:22Z   OWNER  

https://www.sqlite.org/lang_attach.html

Maybe something like this:

db.attach("other_db", "other_db.db")
sqlite-utils 140912432 issue  
617323873 MDU6SXNzdWU2MTczMjM4NzM= 766 Enable wildcard-searches by default clausjuhl 2181410 open 0     0 2020-05-13T10:14:48Z 2020-05-15T10:12:25Z   NONE  

Hi Simon.

It seems that datasette currently has wildcard-searches disabled by default (along with the boolean search-options, NEAR-queries and more, and despite the docs). If I try out the search-url provided in the docs (https://fara.datasettes.com/fara/FARA_All_ShortForms?_search=manafort), it does not handle wildcard-searches, and I'm unable to make it work on my datasette-instance.

I would argue that wildcard-searches is such a standard query, that it should be enabled by default. Requiring "_searchmode=raw" when using prefix-searches seems unnecessary. Plus: What happens to non-ascii searches when using "_searchmode=raw"? Is the "escape_fts"-function from datasette.utils ignored?

Thanks!

/Claus

datasette 107914493 issue  
615626118 MDU6SXNzdWU2MTU2MjYxMTg= 22 Try out ExifReader simonw 9599 open 0     4 2020-05-11T06:32:13Z 2020-05-14T05:59:53Z   MEMBER  

https://pypi.org/project/ExifReader/

New fork that should be able to handle EXIF in HEIC files.

Forked here: https://github.com/ianare/exif-py/issues/102#issuecomment-626376522

Refs #3

dogsheep-photos 256834907 issue  
616271236 MDU6SXNzdWU2MTYyNzEyMzY= 112 add_foreign_key(...., ignore=True) simonw 9599 open 0     4 2020-05-12T00:24:00Z 2020-05-12T00:27:24Z   OWNER  

When using this library I often find myself wanting to "add this foreign key, but only if it doesn't exist yet". The ignore=True parameter is increasingly being used for this elsewhere in the library (e.g. in create_view()).

sqlite-utils 140912432 issue  
616087149 MDU6SXNzdWU2MTYwODcxNDk= 765 publish heroku should default to currently tagged version simonw 9599 open 0     1 2020-05-11T18:24:06Z 2020-05-11T18:25:43Z   OWNER  

Had a report that deploying to Heroku was using the previously installed version of Datasette, not the latest.

Could be because of this:

https://github.com/simonw/datasette/blob/af6c6c5d6f929f951c0e63bfd1c82e37a071b50f/datasette/publish/heroku.py#L172-L179

Heroku documentation recommends pinning to specific versions https://devcenter.heroku.com/articles/python-pip

So... we could ensure we default to an install value of ["datasette>=current_tag"].
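
A sketch of that default - assuming datasette.version exposes __version__, and ignoring branch or commit installs for the moment:

    from datasette.version import __version__

    # pin the Heroku requirements entry to at least the version of
    # Datasette that is doing the publishing
    install = ["datasette>={}".format(__version__)]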

datasette 107914493 issue  
520655983 MDU6SXNzdWU1MjA2NTU5ODM= 619 "Invalid SQL" page should let you edit the SQL simonw 9599 open 0     1 2019-11-10T20:54:12Z 2020-05-08T20:29:12Z   OWNER  

https://latest.datasette.io/fixtures?sql=select%0D%0A++*%0D%0Afrom%0D%0A++%5Bfoo%5D

Would be useful if this page showed you the invalid SQL you entered so you can edit it and try again.

datasette 107914493 issue  
613777056 MDU6SXNzdWU2MTM3NzcwNTY= 39 issues foreign key to repo isn't working simonw 9599 open 0     0 2020-05-07T05:11:48Z 2020-05-07T05:11:48Z   MEMBER  

https://dogsheep.simonwillison.net/github/issues?_facet=repo

If the foreign key were working, those would be repository names.

From the schema at the bottom of the page:

   [repo] TEXT,

That's the wrong type and not a foreign key.
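
A quick way to confirm both problems with sqlite-utils (a sketch - the database filename is illustrative):

    import sqlite_utils

    db = sqlite_utils.Database("github.db")
    print(db["issues"].columns_dict)   # repo comes back as str rather than int
    print(db["issues"].foreign_keys)   # and repo is missing from this list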

github-to-sqlite 207052882 issue  
613491342 MDU6SXNzdWU2MTM0OTEzNDI= 762 Experiment with PRAGMA hard_heap_limit simonw 9599 open 0     0 2020-05-06T17:33:23Z 2020-05-07T03:08:44Z   OWNER  

This was added in SQLite 2020-01-22 (3.31.0): https://www.sqlite.org/changes.html#version_3_31_0

Add the sqlite3_hard_heap_limit64() interface and the corresponding PRAGMA hard_heap_limit command.

This sounds like it could be a nice extra safety measure.
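
Trying it out is straightforward (a sketch - the filename is illustrative, and unknown PRAGMAs are silently ignored on SQLite versions older than 3.31.0):

    import sqlite3

    conn = sqlite3.connect("fixtures.db")
    print(conn.execute("select sqlite_version()").fetchone())
    # cap the heap at 256MB; queries needing more should fail with SQLITE_NOMEM
    conn.execute("PRAGMA hard_heap_limit = {}".format(256 * 1024 * 1024))
    print(conn.execute("PRAGMA hard_heap_limit").fetchone())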

datasette 107914493 issue  
613422636 MDU6SXNzdWU2MTM0MjI2MzY= 760 Way of seeing full schema for a database simonw 9599 open 0     3 2020-05-06T15:46:08Z 2020-05-06T23:49:06Z   OWNER  

I find myself wanting to quickly figure out all of the BLOB columns in a database.

A /-/schema page showing the full schema (actually since it's per-database probably /dbname/-/schema or /-/schema/dbname) would be really handy.

It would need to be carefully constructed from various queries against sqlite_master - just doing select * from sqlite_master where type='table' isn't quite enough because I also want to show indexes, triggers etc.
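
For example, something roughly like this could pull everything (a sketch - entries with a NULL sql column, such as auto-created indexes, are skipped):

    import sqlite3

    conn = sqlite3.connect("fixtures.db")
    schema = ";\n".join(
        row[0]
        for row in conn.execute(
            "select sql from sqlite_master where sql is not null order by type, name"
        )
    )
    print(schema)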

datasette 107914493 issue  
612860758 MDU6SXNzdWU2MTI4NjA3NTg= 18 Switch CI solution to GitHub Actions with a macOS runner simonw 9599 open 0     1 2020-05-05T20:03:50Z 2020-05-05T23:49:18Z   MEMBER  

Refs #17.

dogsheep-photos 256834907 issue  
612287234 MDU6SXNzdWU2MTIyODcyMzQ= 16 Import machine-learning detected labels (dog, llama etc) from Apple Photos simonw 9599 open 0     13 2020-05-05T02:45:43Z 2020-05-05T05:38:16Z   MEMBER  

Follow-on from #1. Apple Photos runs some very sophisticated machine learning on-device to figure out if photos are of dogs, llamas and so on. I really want to extract those labels out into my own database.

dogsheep-photos 256834907 issue  
602533300 MDU6SXNzdWU2MDI1MzMzMDA= 1 Import photo metadata from Apple Photos into SQLite simonw 9599 open 0   Apple Photos online and securely browsable 5324096 8 2020-04-18T19:23:26Z 2020-05-04T02:41:40Z   MEMBER  

Faces, albums, locations, that kind of thing.

dogsheep-photos 256834907 issue  
608613033 MDU6SXNzdWU2MDg2MTMwMzM= 745 Extract the hash-URL mechanism out into a plugin simonw 9599 open 0     1 2020-04-28T21:00:38Z 2020-04-29T03:35:25Z   OWNER  

0.28 in May 2019 made this feature not-the-default: https://datasette.readthedocs.io/en/stable/changelog.html#v0-28 - see #418

I've not felt the need to use it myself since. I think I should move it into a plugin.

datasette 107914493 issue  
608512747 MDU6SXNzdWU2MDg1MTI3NDc= 14 Annotate photos using the Google Cloud Vision API simonw 9599 open 0     5 2020-04-28T18:09:03Z 2020-04-28T18:19:06Z   MEMBER  

It can detect faces, run OCR, do image labeling (it knows what a lemur is!) and do object localization where it identifies objects and returns bounding polygons for them.
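
A sketch of what calling it might look like with the google-cloud-vision client library - treat the class names as assumptions (older releases spell vision.Image as vision.types.Image), and the filename is illustrative:

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("IMG_0001.jpeg", "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    for label in response.label_annotations:
        print(label.description, label.score)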

dogsheep-photos 256834907 issue  
607888367 MDU6SXNzdWU2MDc4ODgzNjc= 13 Also upload movie files simonw 9599 open 0     2 2020-04-27T22:11:25Z 2020-04-28T00:39:45Z   MEMBER  

The upload command currently only handles static images:

https://github.com/dogsheep/photos-to-sqlite/blob/d939455af00e07866686457ee2fcb9b2d1b7194e/photos_to_sqlite/utils.py#L26-L33

Need to cover movies taken by my phone and DSLR too.

dogsheep-photos 256834907 issue  
607223136 MDU6SXNzdWU2MDcyMjMxMzY= 741 Replace "datasette publish --extra-options" with "--config" simonw 9599 open 0     2 2020-04-27T04:29:04Z 2020-04-27T04:30:24Z   OWNER  

See https://github.com/simonw/datasette-publish-now/issues/9#issuecomment-618155764 - the --extra-options mechanism is in practice just used to set --config options in data that you publish, but that means you end up with pretty messy looking commands:

datasette publish my.db --extra-options="--config default_page_size:50 --config sql_time_limit_ms:3500"

A neater design would be to support --config as an option for datasette publish directly:

datasette publish my.db --config default_page_size:50 --config sql_time_limit_ms:3500
datasette 107914493 issue  
606033104 MDU6SXNzdWU2MDYwMzMxMDQ= 12 If less than 500MB, show size in MB not GB simonw 9599 open 0     1 2020-04-24T04:35:01Z 2020-04-24T04:35:25Z   MEMBER  

Just saw this:

Uploading 0.05 GB
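
A sketch of the proposed formatting logic (the helper name is hypothetical):

    def format_size(num_bytes):
        # below 500 MB report megabytes, otherwise gigabytes
        mb = num_bytes / (1024 * 1024)
        if mb < 500:
            return "{:.1f} MB".format(mb)
        return "{:.2f} GB".format(mb / 1024)

    print("Uploading {}".format(format_size(53 * 1024 * 1024)))  # Uploading 53.0 MB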
dogsheep-photos 256834907 issue  
285168503 MDU6SXNzdWUyODUxNjg1MDM= 176 Add GraphQL endpoint yozlet 173848 open 0     8 2017-12-29T23:21:01Z 2020-04-21T14:16:24Z   NONE  

Would make it much easier to build React & similar frontends. Maybe with https://github.com/graphql-python/sanic-graphql ?

datasette 107914493 issue  
602619330 MDU6SXNzdWU2MDI2MTkzMzA= 45 Use raise_for_status() everywhere simonw 9599 open 0     1 2020-04-19T04:38:28Z 2020-04-19T04:39:22Z   MEMBER  

I keep seeing errors which I think are caused by authentication or rate limit problems, but which surface as unexpected JSON responses - presumably because the body is actually an error message.

Recent example: https://github.com/simonw/jsk-fellows-on-twitter/runs/598892575

Using response.raise_for_status() everywhere will make these errors less confusing.
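
A sketch of the pattern - the URL, parameters and token handling here are illustrative:

    import requests

    token = "..."  # a real bearer token would go here
    response = requests.get(
        "https://api.twitter.com/1.1/users/show.json",
        params={"screen_name": "simonw"},
        headers={"Authorization": "Bearer {}".format(token)},
    )
    # turn 401/403/429 responses into a visible exception, instead of a
    # confusing failure later on when the JSON isn't the shape we expected
    response.raise_for_status()
    data = response.json()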

twitter-to-sqlite 206156866 issue  
602585497 MDU6SXNzdWU2MDI1ODU0OTc= 7 Integrate image content hashing simonw 9599 open 0     1 2020-04-19T00:36:58Z 2020-04-19T00:37:09Z   MEMBER  

To spot duplicate images (where the file content differs such that the sha256 is no longer a match) it would be useful to calculate and store perceptual hashes of some sort.
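
One option is the ImageHash package (pip install ImageHash) - a sketch with illustrative filenames, not a decision about which algorithm to use:

    from PIL import Image
    import imagehash

    hash_a = imagehash.phash(Image.open("IMG_0001.jpeg"))
    hash_b = imagehash.phash(Image.open("IMG_0001-edited.jpeg"))
    # the difference is a Hamming distance - small values suggest near-duplicates
    print(hash_a - hash_b)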

dogsheep-photos 256834907 issue  
602533481 MDU6SXNzdWU2MDI1MzM0ODE= 3 Import EXIF data into SQLite - lens used, ISO, aperture etc simonw 9599 open 0   Apple Photos online and securely browsable 5324096 0 2020-04-18T19:24:31Z 2020-04-18T19:24:31Z   MEMBER   dogsheep-photos 256834907 issue  
573578548 MDU6SXNzdWU1NzM1Nzg1NDg= 89 Ability to customize columns used by extracts= feature simonw 9599 open 0     2 2020-03-01T16:54:48Z 2020-04-18T00:00:42Z   OWNER  

@simonw any thoughts on allowing extracts to specify the lookup column name? If I'm understanding the documentation right, .lookup() allows you to define the "value" column (the documentation uses name), but when you use the extracts keyword as part of .insert(), .upsert() etc. the lookup must be done against a column named "value". I have an existing lookup table that I've populated with columns "id" and "name" as opposed to "id" and "value", and it seems I can't use extracts=, unless I'm missing something...
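
A sketch of the manual workaround this proposal would replace - table and column names are illustrative, and it assumes Species.name has a unique index so .lookup() can find or create rows:

    from sqlite_utils import Database

    db = Database("trees.db")
    row = {"address": "123 Main St", "species": "Quercus agrifolia"}
    # look up (or create) the Species row by its "name" column...
    species_id = db["Species"].lookup({"name": row["species"]})
    # ...then insert the tree with the resolved foreign key by hand
    db["trees"].insert({"address": row["address"], "species_id": species_id})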

Initial thought on how to do this would be to allow the dictionary value to be a (table name, column name) tuple... so:

table = db.table("trees", extracts={"species_id": ("Species", "name")})

I haven't dug too much into the existing code yet, but does this make sense? Worth doing?

Originally posted by @chrishas35 in https://github.com/simonw/sqlite-utils/issues/46#issuecomment-592999503

sqlite-utils 140912432 issue  
530491074 MDU6SXNzdWU1MzA0OTEwNzQ= 14 Command for importing events simonw 9599 open 0     3 2019-11-29T21:28:58Z 2020-04-14T19:38:34Z   MEMBER  

Eg from https://api.github.com/users/simonw/events

Docs here: https://developer.github.com/v3/activity/events/#list-events-performed-by-a-user
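
A sketch of the fetch loop such a command could wrap - the endpoint comes from the docs above, the pagination and auth details are assumptions:

    import requests

    def fetch_events(username, token=None):
        headers = {"Authorization": "token {}".format(token)} if token else {}
        page = 1
        while True:
            response = requests.get(
                "https://api.github.com/users/{}/events".format(username),
                params={"page": page, "per_page": 100},
                headers=headers,
            )
            response.raise_for_status()
            batch = response.json()
            if not batch:
                break
            yield from batch
            page += 1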

github-to-sqlite 207052882 issue  
599776345 MDU6SXNzdWU1OTk3NzYzNDU= 24 Feature idea: github-to-sqlite everything ... simonw 9599 open 0     0 2020-04-14T18:34:00Z 2020-04-14T18:34:00Z   MEMBER  

At the moment, if you want to pull all your repos, issues, issue comments etc you have to do it with a sequence of separate commands.

Consider adding an everything or all command which fetches everything that the tool knows how to fetch, and is designed to be run on a cron in a way that fetches just new stuff each time.

github-to-sqlite 207052882 issue  
594237015 MDU6SXNzdWU1OTQyMzcwMTU= 718 Plugin idea: datasette-redirects simonw 9599 open 0     0 2020-04-05T03:41:38Z 2020-04-05T03:41:38Z   OWNER  

I just had to write a one-off custom plugin to redirect niche-musems.com to www.niche-museums.com (https://github.com/simonw/museums/issues/21) - it would be great if this kind of thing could be handled by a configurable plugin.

https://github.com/simonw/museums/blob/6b1faf00c463b2228860d4d62d104b11935e01b1/plugins/redirect_www.py
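
A sketch of what the configurable plugin's core could look like, using the asgi_wrapper hook - the hard-coded example.com domains stand in for whatever the configuration would supply:

    from datasette import hookimpl

    @hookimpl
    def asgi_wrapper(datasette):
        def wrap(app):
            async def redirect_app(scope, receive, send):
                if scope["type"] == "http":
                    headers = dict(scope.get("headers") or [])
                    host = headers.get(b"host", b"").decode("latin-1")
                    if host == "example.com":
                        location = b"https://www.example.com" + scope["path"].encode("latin-1")
                        await send({
                            "type": "http.response.start",
                            "status": 302,
                            "headers": [[b"location", location]],
                        })
                        await send({"type": "http.response.body", "body": b""})
                        return
                await app(scope, receive, send)
            return redirect_app
        return wrap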

datasette 107914493 issue  
593006814 MDU6SXNzdWU1OTMwMDY4MTQ= 715 Refactor duplicate cell display logic simonw 9599 open 0     0 2020-04-03T00:58:11Z 2020-04-03T00:58:11Z   OWNER  

The logic for rendering cells in table view and in database (or canned query) view is currently very similar:

https://github.com/simonw/datasette/blob/7656fd64d8b6a32ebc34d89c1b8711cc5ea240f7/datasette/views/base.py#L514-L539

Compared with:

https://github.com/simonw/datasette/blob/7656fd64d8b6a32ebc34d89c1b8711cc5ea240f7/datasette/views/table.py#L104-L195

I'll be changing this a bit in #698 but I should still try to clean this up further in the future.

datasette 107914493 issue  
565064079 MDExOlB1bGxSZXF1ZXN0Mzc1MTgwODMy 672 --dirs option for scanning directories for SQLite databases simonw 9599 open 0     15 2020-02-14T02:25:52Z 2020-03-27T01:03:53Z   OWNER simonw/datasette/pulls/672

Refs #417.

datasette 107914493 pull  
534629631 MDU6SXNzdWU1MzQ2Mjk2MzE= 650 Add a glossary to the documentation simonw 9599 open 0     2 2019-12-09T00:23:45Z 2020-03-22T05:31:13Z   OWNER  

Call it glossary.rst - it can use a definition list something like this:

.. _glossary:

Glossary
========

Term
  A definition of the term.

Another term
  Another definition.
datasette 107914493 issue  
471818939 MDU6SXNzdWU0NzE4MTg5Mzk= 48 Jupyter notebook demo of the library, launchable on Binder simonw 9599 open 0     0 2019-07-23T17:05:05Z 2020-03-21T15:21:46Z   OWNER   sqlite-utils 140912432 issue  
581795570 MDU6SXNzdWU1ODE3OTU1NzA= 93 Support more string values for types in .add_column() simonw 9599 open 0     0 2020-03-15T19:32:49Z 2020-03-16T18:15:42Z   OWNER  

https://sqlite-utils.readthedocs.io/en/2.4.2/python-api.html#adding-columns says:

SQLite types you can specify are "TEXT", "INTEGER", "FLOAT" or "BLOB".

As discovered in #92 this isn't the right list of values. I should expand this to match https://www.sqlite.org/datatype3.html
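
Once expanded, things like this should work (a sketch of the desired outcome, against an illustrative existing table):

    import sqlite_utils

    db = sqlite_utils.Database("photos.db")
    db["photos"].add_column("latitude", "REAL")
    db["photos"].add_column("elevation", "NUMERIC")
    db["photos"].add_column("thumbnail", "BLOB")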

sqlite-utils 140912432 issue  

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
, [active_lock_reason] TEXT);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);