issues

1,035 rows where user = 9599 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app
706768798 MDU6SXNzdWU3MDY3Njg3OTg= 170 Release notes for 2.20 simonw 9599 open 0   2.20 5897911 1 2020-09-23T00:13:22Z 2020-09-23T00:14:53Z   OWNER  

https://github.com/simonw/sqlite-utils/compare/2.19...b8e004

sqlite-utils 140912432 issue    
706098005 MDU6SXNzdWU3MDYwOTgwMDU= 167 Review the foreign key pragma stuff simonw 9599 closed 0   2.20 5897911 1 2020-09-22T05:55:20Z 2020-09-23T00:13:02Z 2020-09-23T00:13:02Z OWNER  

It is not possible to enable or disable foreign key constraints in the middle of a multi-statement transaction (when SQLite is not in autocommit mode). Attempting to do so does not return an error; it simply has no effect.

https://sqlite.org/foreignkeys.html
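
A minimal sketch of that behaviour using Python's sqlite3 module (the in-memory database is illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("BEGIN")
# Inside the open transaction this is silently ignored - no error is raised
conn.execute("PRAGMA foreign_keys = OFF")
print(conn.execute("PRAGMA foreign_keys").fetchone())  # still (1,)
conn.execute("COMMIT")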

sqlite-utils 140912432 issue    
706757891 MDU6SXNzdWU3MDY3NTc4OTE= 169 Progress bar for "sqlite-utils extract" simonw 9599 closed 0   2.20 5897911 0 2020-09-22T23:40:21Z 2020-09-23T00:02:40Z 2020-09-23T00:02:40Z OWNER  

Since these operations could take a long time against large tables, it would be neat if there was a progress bar option for the CLI command.

The operations are full table scans so calculating progress shouldn't be too difficult.
_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246831_
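
A rough sketch of how that could look using Click, which sqlite-utils already uses for its CLI - process_row() here is a hypothetical stand-in for the per-row extract work:

import click

def copy_with_progress(db, table_name):
    # The operation is a full table scan, so count(*) gives an exact denominator
    total = db.execute("select count(*) from [{}]".format(table_name)).fetchone()[0]
    with click.progressbar(db[table_name].rows, length=total, label=table_name) as rows:
        for row in rows:
            process_row(row)  # hypothetical: whatever the extract operation does per row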

sqlite-utils 140912432 issue    
470345929 MDU6SXNzdWU0NzAzNDU5Mjk= 42 table.extract(...) method and "sqlite-utils extract" command simonw 9599 closed 0   2.20 5897911 21 2019-07-19T14:09:36Z 2020-09-22T23:39:31Z 2020-09-22T23:37:49Z OWNER  

One of my favourite features of csvs-to-sqlite is that it can "extract" columns into a separate lookup table - for example:

csvs-to-sqlite big_csv_file.csv -c country output.db

This will turn the country column in the resulting table into an integer foreign key against a new country table. You can see an example of what that looks like here: https://san-francisco.datasettes.com/registered-business-locations-3d50679/Business+Corridor - which was extracted from https://san-francisco.datasettes.com/registered-business-locations-3d50679/Registered_Business_Locations_-_San_Francisco?Business%20Corridor=1

I'd like to have the same capability in sqlite-utils - but with the ability to run it against an existing SQLite table rather than just against a CSV.
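
Something along these lines, say (a sketch of a possible invocation against an existing table, mirroring the csvs-to-sqlite example above):

sqlite-utils extract mydata.db mytable country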

sqlite-utils 140912432 issue    
706486323 MDU6SXNzdWU3MDY0ODYzMjM= 973 'bool' object is not callable error simonw 9599 closed 0     2 2020-09-22T15:30:54Z 2020-09-22T15:40:35Z 2020-09-22T15:40:35Z OWNER  

I'm getting this when latest is deployed to Cloud Run:

Traceback (most recent call last):
  File "/usr/local/bin/datasette", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/datasette/cli.py", line 406, in serve
    inspect_data = json.load(open(inspect_file))
TypeError: 'bool' object is not callable

I think I may have broken things in #970 - a980199e61fe7ccf02c2123849d86172d2ae54ff

datasette 107914493 issue    
681375466 MDU6SXNzdWU2ODEzNzU0NjY= 943 await datasette.client.get(path) mechanism for executing internal requests simonw 9599 open 0     31 2020-08-18T22:17:42Z 2020-09-22T15:00:39Z   OWNER  

datasette-graphql works by making internal requests to the TableView class (in order to take advantage of existing pagination logic, plus options like ?_search= and ?_where=) - see #915

I want to support a mod_rewrite style mechanism for putting nicer URLs on top of Datasette pages - I botched that together for a project here using an internal ASGI proxying trick: https://github.com/natbat/tidepools_near_me/commit/ec102c6da5a5d86f17628740d90b6365b671b5e1

If the datasette object provided a documented method for executing internal requests (in a way that makes sense with logging etc - i.e. doesn't get logged as a separate request) both of these use-cases would be much neater.
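
A sketch of what that could look like from plugin code - the method name follows the issue title, and the path and query string are illustrative:

# Inside an async plugin function with access to the datasette instance:
response = await datasette.client.get("/db/table.json?_search=dogs")
data = response.json()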

datasette 107914493 issue    
706167456 MDU6SXNzdWU3MDYxNjc0NTY= 168 Automate (as much as possible) updates published to Homebrew simonw 9599 open 0     1 2020-09-22T08:08:37Z 2020-09-22T08:11:30Z   OWNER  

I'd like to get new sqlite-utils (and Datasette) releases submitted to Homebrew as painlessly as possible.

sqlite-utils 140912432 issue    
706017416 MDU6SXNzdWU3MDYwMTc0MTY= 164 sqlite-utils transform sub-command simonw 9599 closed 0   2.20 5897911 4 2020-09-22T01:32:20Z 2020-09-22T07:57:50Z 2020-09-22T07:48:05Z OWNER  

The .transform() method in #114 warrants an equivalent CLI tool.

sqlite-utils 140912432 issue    
455486286 MDU6SXNzdWU0NTU0ODYyODY= 26 Mechanism for turning nested JSON into foreign keys / many-to-many simonw 9599 open 0     10 2019-06-13T00:52:06Z 2020-09-22T07:56:07Z   OWNER  

The GitHub JSON APIs have a really interesting convention with respect to related objects.

Consider https://api.github.com/repos/simonw/sqlite-utils/issues - here's a truncated subset:

  {
    "id": 449818897,
    "node_id": "MDU6SXNzdWU0NDk4MTg4OTc=",
    "number": 24,
    "title": "Additional Column Constraints?",
    "user": {
      "login": "IgnoredAmbience",
      "id": 98555,
      "node_id": "MDQ6VXNlcjk4NTU1",
      "avatar_url": "https://avatars0.githubusercontent.com/u/98555?v=4",
      "gravatar_id": ""
    },
    "labels": [
      {
        "id": 993377884,
        "node_id": "MDU6TGFiZWw5OTMzNzc4ODQ=",
        "url": "https://api.github.com/repos/simonw/sqlite-utils/labels/enhancement",
        "name": "enhancement",
        "color": "a2eeef",
        "default": true
      }
    ],
    "state": "open"
  }

The user column lists a complete user. The labels column has a list of labels.

Since both user and label have a populated id field, this is actually enough information for us to create records for them AND set up the corresponding foreign key (for user) and m2m relationships (for labels).

It would be really neat if sqlite-utils had some kind of mechanism for correctly processing these kinds of patterns.

Thanks to jq there's not much need for extra customization of the shape here - if we support a narrowly defined structure users can use jq to reshape arbitrary JSON to match.
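
For example, a jq one-liner could pare the issues API response down to just the keys of interest (the key selection here is illustrative):

curl -s https://api.github.com/repos/simonw/sqlite-utils/issues \
  | jq '[.[] | {id, title, user, labels}]'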

sqlite-utils 140912432 issue    
621989740 MDU6SXNzdWU2MjE5ODk3NDA= 114 table.transform() method for advanced alter table simonw 9599 closed 0   2.20 5897911 26 2020-05-20T18:20:46Z 2020-09-22T07:51:37Z 2020-09-22T04:20:02Z OWNER  

SQLite's ALTER TABLE can only do the following:

  • Rename a table
  • Rename a column
  • Add a column

Notably, it cannot drop columns - so tricks like "add a float version of this text column, populate it, then drop the old one and rename" won't work.

The docs here https://www.sqlite.org/lang_altertable.html#making_other_kinds_of_table_schema_changes describe a way of implementing full alters safely within a transaction, but it's fiddly.

  1. Create new table
  2. Copy data
  3. Drop old table
  4. Rename new into old

It would be great if sqlite-utils provided an abstraction to help make these kinds of changes safely.
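
Spelled out in SQL, the documented pattern looks roughly like this (hypothetical table t, converting a text column to a float):

BEGIN;
CREATE TABLE t_new (id INTEGER PRIMARY KEY, value FLOAT);
INSERT INTO t_new (id, value) SELECT id, CAST(value AS FLOAT) FROM t;
DROP TABLE t;
ALTER TABLE t_new RENAME TO t;
COMMIT;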

sqlite-utils 140912432 issue    
557830332 MDExOlB1bGxSZXF1ZXN0MzY5MzQ4MDg0 78 New conversions= feature, refs #77 simonw 9599 closed 0     0 2020-01-31T00:02:33Z 2020-09-22T07:48:29Z 2020-01-31T00:24:31Z OWNER simonw/sqlite-utils/pulls/78 sqlite-utils 140912432 pull    
705975133 MDExOlB1bGxSZXF1ZXN0NDkwNjA3OTQ5 161 table.transform() method simonw 9599 closed 0   2.20 5897911 13 2020-09-21T23:16:59Z 2020-09-22T07:48:24Z 2020-09-22T04:20:02Z OWNER simonw/sqlite-utils/pulls/161

Refs #114

  • Ability to change the primary key
  • Support for changing default value for columns
  • Support for changing NOT NULL status of columns
  • Support for copying existing foreign keys and removing them
  • Support for conversions= parameter (dropped from this PR)
  • Detailed documentation
  • PRAGMA foreign_keys stuff
sqlite-utils 140912432 pull    
706091046 MDU6SXNzdWU3MDYwOTEwNDY= 165 Make .transform() a keyword arguments only function simonw 9599 closed 0   2.20 5897911 0 2020-09-22T05:37:29Z 2020-09-22T06:39:12Z 2020-09-22T06:39:12Z OWNER  

And rename the first argument from columns= to types=
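
In Python terms that just means a bare * in the signature - a sketch (the real option list is longer):

def transform(self, *, types=None, rename=None, drop=None, pk=None):
    # The bare * forces callers to write table.transform(types={...}),
    # never table.transform({...})
    ...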

sqlite-utils 140912432 issue    
706092617 MDExOlB1bGxSZXF1ZXN0NDkwNzAzMTcz 166 Keyword only arguments for transform() simonw 9599 closed 0     0 2020-09-22T05:41:44Z 2020-09-22T06:39:11Z 2020-09-22T06:39:11Z OWNER simonw/sqlite-utils/pulls/166

Refs #165

sqlite-utils 140912432 pull    
706001517 MDU6SXNzdWU3MDYwMDE1MTc= 163 Idea: conversions= could take Python functions simonw 9599 open 0     1 2020-09-22T00:37:12Z 2020-09-22T01:33:13Z   OWNER  

Right now you use conversions= like this:

db["example"].insert({
    "name": "The Bigfoot Discovery Museum"
}, conversions={"name": "upper(?)"})

How about if you could optionally provide a Python function (or a lambda) like this?

db["example"].insert({
    "name": "The Bigfoot Discovery Museum"
}, conversions={"name": lambda s: s.upper()})

This would work by creating a random name for that function, registering it (similar to #162), executing the SQL and then un-registering the custom function at the end.
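
A sketch of that register/execute/un-register dance (the helper name is made up):

import random
import string

def insert_with_callable_conversion(db, table, row, column, fn):
    # Register fn under a throwaway name, reference it in the SQL fragment,
    # then remove it again once the insert has run
    name = "conv_" + "".join(random.choices(string.ascii_lowercase, k=8))
    db.conn.create_function(name, 1, fn)
    try:
        db[table].insert(row, conversions={column: "{}(?)".format(name)})
    finally:
        db.conn.create_function(name, 1, None)  # passing None removes the function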

sqlite-utils 140912432 issue    
705995722 MDU6SXNzdWU3MDU5OTU3MjI= 162 A decorator for registering custom SQL functions simonw 9599 closed 0     2 2020-09-22T00:18:32Z 2020-09-22T00:40:44Z 2020-09-22T00:32:17Z OWNER  

Syntactic sugar for db.conn.create_function - it would work something like this:

import random

import sqlite_utils

db = sqlite_utils.Database("mydb.db")

@db.register_function
def scramble(text):
    chars = list(text)
    random.shuffle(chars)
    return "".join(chars)

The decorator would inspect the function to find its name and arity (number of arguments). Having run the above you could then do:

db.execute("select scramble('hello')").fetchall()
sqlite-utils 140912432 issue    
705840673 MDU6SXNzdWU3MDU4NDA2NzM= 972 Support faceting against arbitrary SQL queries simonw 9599 open 0     1 2020-09-21T19:00:43Z 2020-09-21T19:01:25Z   OWNER  

... support for running facets against arbitrary custom SQL queries is half-done in that facets now execute against wrapped subqueries as of ea66c45df96479ef66a89caa71fff1a97a862646

https://github.com/simonw/datasette/blob/ea66c45df96479ef66a89caa71fff1a97a862646/datasette/facets.py#L192-L200
_Originally posted by @simonw in https://github.com/simonw/datasette/issues/971#issuecomment-696307922_

datasette 107914493 issue    
705827457 MDU6SXNzdWU3MDU4Mjc0NTc= 971 Support the dbstat table simonw 9599 closed 0     7 2020-09-21T18:38:53Z 2020-09-21T19:00:02Z 2020-09-21T18:59:52Z OWNER  

dbstat is a table that is usually available on SQLite giving statistics about the database. For example:

https://fivethirtyeight.datasettes.com/fivethirtyeight?sql=SELECT+*+FROM+%22dbstat%22+WHERE+name%3D%27bachelorette%2Fbachelorette%27%3B

Every row has name = bachelorette/bachelorette:

path   pageno  pagetype  ncell  payload  unused  mx_payload  pgoffset  pgsize
/      89      internal  13     0        3981    0           360448    4096
/000/  91      leaf      66     3792     32      74          368640    4096
/001/  92      leaf      67     3800     14      74          372736    4096
/002/  93      leaf      65     3717     46      70          376832    4096
/003/  94      leaf      68     3742     6       71          380928    4096
/004/  95      leaf      70     3696     42      66          385024    4096
/005/  96      leaf      69     3721     22      71          389120    4096
/006/  97      leaf      70     3737     1       72          393216    4096
/007/  98      leaf      69     3728     15      69          397312    4096
/008/  99      leaf      73     3715     8       64          401408    4096
/009/  100     leaf      73     3705     18      62          405504    4096
/00a/  101     leaf      75     3681     32      62          409600    4096
/00b/  102     leaf      77     3694     9       62          413696    4096
/00c/  103     leaf      74     3673     45      62          417792    4096
/00d/  104     leaf      5      228      3835    48          421888    4096

Other than direct select * from dbstat queries it is completely invisible.

It would be cool if https://fivethirtyeight.datasettes.com/fivethirtyeight/dbstat didn't 404 (on databases for which that table was available).

datasette 107914493 issue    
564833696 MDU6SXNzdWU1NjQ4MzM2OTY= 670 Prototype for Datasette on PostgreSQL simonw 9599 open 0     10 2020-02-13T17:17:55Z 2020-09-21T14:46:10Z   OWNER  

I thought this would never happen, but now that I'm deep in the weeds of running SQLite in production for Datasette Cloud I'm starting to reconsider my policy of only supporting SQLite.

Some of the factors making me think PostgreSQL support could be worth the effort:
- Serverless. I'm getting increasingly excited about writable-database use-cases for Datasette. If it could talk to PostgreSQL then users could easily deploy it on Heroku or other serverless providers that can talk to a managed RDS-style PostgreSQL.
- Existing databases. Plenty of organizations have PostgreSQL databases. They can export to SQLite using db-to-sqlite but that's a pretty big barrier to getting started - being able to run datasette postgresql://connection-string and start trying it out would be a massively better experience.
- Data size. I keep running into use-cases where I want to run Datasette against many GBs of data. SQLite can do this but PostgreSQL is much more optimized for large data, especially given the existence of tools like Citus.
- Marketing. Convincing people to trust their data to SQLite is potentially a big barrier to adoption. Even if I've convinced myself it's trustworthy I still have to convince everyone else.
- It might not be that hard? If this required a ground-up rewrite it wouldn't be worth the effort, but I have a hunch that it may not be too hard - most of the SQL in Datasette should work on both databases since it's almost all portable SELECT statements. If Datasette did DML this would be a lot harder, but it doesn't.
- Plugins! This feels like a natural surface for a plugin - at which point people could add MySQL support and suchlike in the future.

The above reasons feel strong enough to justify a prototype.

datasette 107914493 issue    
705215230 MDU6SXNzdWU3MDUyMTUyMzA= 26 Pagination simonw 9599 open 0     7 2020-09-21T00:14:37Z 2020-09-21T02:55:54Z   MEMBER  

Useful for #16 (timeline view) since you can now filter to just the items on a specific day - but if there are more than 50 items you can't see them all.

dogsheep-beta 197431109 issue    
694493566 MDU6SXNzdWU2OTQ0OTM1NjY= 16 Timeline view simonw 9599 open 0     3 2020-09-06T19:13:58Z 2020-09-21T02:42:29Z   MEMBER  

Ability to browse (and facet) by date.

dogsheep-beta 197431109 issue    
616271236 MDU6SXNzdWU2MTYyNzEyMzY= 112 add_foreign_key(...., ignore=True) simonw 9599 closed 0   2.19 5896742 4 2020-05-12T00:24:00Z 2020-09-20T22:17:34Z 2020-09-20T22:17:34Z OWNER  

When using this library I often find myself wanting to "add this foreign key, but only if it doesn't exist yet". The ignore=True parameter is increasingly being used for this elsewhere in the library (e.g. in create_view()).

sqlite-utils 140912432 issue    
705190723 MDU6SXNzdWU3MDUxOTA3MjM= 160 table.enable_fts(..., replace=True) simonw 9599 closed 0   2.19 5896742 1 2020-09-20T21:36:23Z 2020-09-20T22:05:52Z 2020-09-20T22:05:51Z OWNER  

I noticed that https://til.simonwillison.net/ search doesn't use porter stemming. I'd like to add that, but since the build script always operates on an existing database (to avoid re-rendering markdown and re-building image thumbnails) I'd like it to only add porter stemming if it's not there already.

So I'd like to be able to say "set up FTS to look like this, and fix it if it doesn't".

I think the neatest way to do that is with a replace=True argument to .enable_fts(), for consistency with the .create_view(name, sql, replace=True) method.

So the replace=True argument would check and see if the configured FTS exists already with the correct options (columns, stemming, triggers) - and if any of those are incorrect it would call .disable_fts() and then create a new FTS configuration with the correct options.
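
So the build script could safely run something like this on every invocation (a sketch - the table and columns are illustrative):

db["til"].enable_fts(
    ["title", "body"],
    tokenize="porter",
    create_triggers=True,
    replace=True,  # recreate the FTS configuration only if it differs
)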

sqlite-utils 140912432 issue    
697179806 MDU6SXNzdWU2OTcxNzk4MDY= 157 sqlite-utils add-foreign-keys command simonw 9599 closed 0   2.19 5896742 2 2020-09-09T21:44:30Z 2020-09-20T20:14:30Z 2020-09-20T20:14:30Z OWNER  

Like add-foreign-key but can do multiple foreign keys at once. Inspired by https://github.com/simonw/calands-datasette/blob/99de39dd80a906f5c1f16724467b0cd55ba4ef36/build.sh which does this:

sqlite-utils add-foreign-key calands.db units_with_maps ACCESS_TYP
sqlite-utils add-foreign-key calands.db units_with_maps AGNCY_NAME
sqlite-utils add-foreign-key calands.db units_with_maps AGNCY_LEV
sqlite-utils add-foreign-key calands.db units_with_maps AGNCY_TYP
sqlite-utils add-foreign-key calands.db units_with_maps LAYER
sqlite-utils add-foreign-key calands.db units_with_maps MNG_AGENCY
sqlite-utils add-foreign-key calands.db units_with_maps MNG_AG_LEV
sqlite-utils add-foreign-key calands.db units_with_maps MNG_AG_TYP
sqlite-utils add-foreign-key calands.db units_with_maps COUNTY
sqlite-utils add-foreign-key calands.db units_with_maps DES_TP
sqlite-utils 140912432 issue    
531583658 MDU6SXNzdWU1MzE1ODM2NTg= 68 Add support for porter stemming in FTS simonw 9599 closed 0     1 2019-12-02T22:35:52Z 2020-09-20T04:25:53Z 2020-09-20T04:25:47Z OWNER  

FTS5 can have porter stemming enabled.

sqlite-utils 140912432 issue    
694136490 MDU6SXNzdWU2OTQxMzY0OTA= 15 Add a bunch of config examples simonw 9599 open 0     1 2020-09-05T17:58:43Z 2020-09-18T23:17:39Z   MEMBER  

I can bring these over from my personal Dogsheep.

dogsheep-beta 197431109 issue    
703970713 MDU6SXNzdWU3MDM5NzA3MTM= 23 Sort option should persist between multiple searches simonw 9599 closed 0     0 2020-09-17T23:21:26Z 2020-09-18T22:39:12Z 2020-09-18T22:39:12Z MEMBER  

Following #21

dogsheep-beta 197431109 issue    
703970814 MDU6SXNzdWU3MDM5NzA4MTQ= 24 the JSON object must be str, bytes or bytearray, not 'Undefined' simonw 9599 closed 0     8 2020-09-17T23:21:41Z 2020-09-18T22:33:32Z 2020-09-18T22:33:32Z MEMBER  

Got this on a search results page.

dogsheep-beta 197431109 issue    
704685890 MDU6SXNzdWU3MDQ2ODU4OTA= 25 template_debug mechanism simonw 9599 closed 0     2 2020-09-18T22:11:09Z 2020-09-18T22:12:21Z 2020-09-18T22:12:03Z MEMBER  

I'd prefer it if errors in these template fragments were displayed as errors inline where the fragment should have been inserted, rather than 500ing the whole page - especially since the template fragments are user-provided and could have all kinds of odd errors in them which should be as easy to debug as possible.
_Originally posted by @simonw in https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694554584_

dogsheep-beta 197431109 issue    
703962917 MDU6SXNzdWU3MDM5NjI5MTc= 22 Bug: UI says sorted by relevance in timeline view simonw 9599 closed 0     0 2020-09-17T23:02:07Z 2020-09-17T23:13:14Z 2020-09-17T23:13:14Z MEMBER  

In regular timeline view sort defaults to newest, not relevance - so this UI is incorrect:

https://user-images.githubusercontent.com/9599/93536956-1facf900-f8ff-11ea-889b-bc8356e366df.png

dogsheep-beta 197431109 issue    
703951918 MDU6SXNzdWU3MDM5NTE5MTg= 21 Option to sort search results by date simonw 9599 closed 0     0 2020-09-17T22:32:39Z 2020-09-17T22:55:35Z 2020-09-17T22:55:35Z MEMBER  

Sometimes I want to sort by date, not by relevance.

dogsheep-beta 197431109 issue    
703218756 MDU6SXNzdWU3MDMyMTg3NTY= 50 Commands for making authenticated API calls simonw 9599 open 0     6 2020-09-17T02:39:07Z 2020-09-17T04:02:39Z   MEMBER  

Similar to twitter-to-sqlite fetch, see https://github.com/dogsheep/twitter-to-sqlite/issues/51

github-to-sqlite 207052882 issue    
703246031 MDU6SXNzdWU3MDMyNDYwMzE= 51 github-to-sqlite get should follow rate limits simonw 9599 open 0     0 2020-09-17T04:01:50Z 2020-09-17T04:01:50Z   MEMBER  

From #50 - right now it will crash with an error if it hits the rate limit. Since the rate limit information (including reset time) is available in the headers it could automatically sleep and try again instead.
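
A sketch of that retry loop using the standard GitHub rate limit headers, with requests (which github-to-sqlite already uses) - the helper name is made up:

import time
import requests

def get_respecting_rate_limit(url, headers=None):
    while True:
        response = requests.get(url, headers=headers)
        if response.status_code == 403 and response.headers.get("X-RateLimit-Remaining") == "0":
            # X-RateLimit-Reset is seconds since the epoch
            reset = int(response.headers["X-RateLimit-Reset"])
            time.sleep(max(0, reset - time.time()) + 1)
            continue
        return response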

github-to-sqlite 207052882 issue    
455852801 MDU6SXNzdWU0NTU4NTI4MDE= 507 Every datasette plugin on the ecosystem page should have a screenshot simonw 9599 open 0     4 2019-06-13T17:02:51Z 2020-09-17T02:47:35Z   OWNER  

https://github.com/simonw/datasette/blob/master/docs/ecosystem.rst

datasette 107914493 issue    
703218448 MDU6SXNzdWU3MDMyMTg0NDg= 51 Documentation for twitter-to-sqlite fetch simonw 9599 open 0     0 2020-09-17T02:38:10Z 2020-09-17T02:38:10Z   MEMBER  

It's mentioned in passing in the README but it deserves its own section:

$ twitter-to-sqlite fetch \
    "https://api.twitter.com/1.1/account/verify_credentials.json" \
    | grep '"id"' | head -n 1
twitter-to-sqlite 206156866 issue    
703216044 MDU6SXNzdWU3MDMyMTYwNDQ= 49 Feature: gists and starred gists simonw 9599 open 0     0 2020-09-17T02:30:52Z 2020-09-17T02:30:52Z   MEMBER  

https://developer.github.com/v3/gists/#list-starred-gists

github-to-sqlite 207052882 issue    
653529088 MDU6SXNzdWU2NTM1MjkwODg= 891 Consider using enable_callback_tracebacks(True) simonw 9599 closed 0     5 2020-07-08T19:07:16Z 2020-09-15T21:59:27Z 2020-09-15T21:59:27Z OWNER  

From https://docs.python.org/3/library/sqlite3.html#sqlite3.enable_callback_tracebacks

sqlite3.enable_callback_tracebacks(flag)

By default you will not get any tracebacks in user-defined functions, aggregates, converters, authorizer callbacks etc. If you want to debug them, you can call this function with flag set to True. Afterwards, you will get tracebacks from callbacks on sys.stderr. Use False to disable the feature again.

Maybe turn this on for all of Datasette? Are there any disadvantages to doing that?
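
A quick illustration of what the flag buys you (a sketch):

import sqlite3

sqlite3.enable_callback_tracebacks(True)
conn = sqlite3.connect(":memory:")
conn.create_function("boom", 0, lambda: 1 / 0)
try:
    conn.execute("select boom()")
except sqlite3.OperationalError:
    # Without the flag you only get "user-defined function raised exception";
    # with it, the underlying ZeroDivisionError traceback is printed to stderr
    pass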

datasette 107914493 issue    
688427751 MDU6SXNzdWU2ODg0Mjc3NTE= 956 Push to Docker Hub failed - but it shouldn't run for alpha releases anyway simonw 9599 closed 0     7 2020-08-29T01:09:12Z 2020-09-15T20:46:41Z 2020-09-15T20:36:34Z OWNER  

https://github.com/simonw/datasette/runs/1043709494?check_suite_focus=true

https://user-images.githubusercontent.com/9599/91625110-80c55a80-e959-11ea-8fea-70508c53fcfb.png

  • This step should not run if a release is an alpha or beta
  • When it DOES run it should work
  • See it work for both an alpha and a non-alpha release, then close this ticket
datasette 107914493 issue    
648421105 MDU6SXNzdWU2NDg0MjExMDU= 877 Consider dropping explicit CSRF protection entirely? simonw 9599 closed 0     9 2020-06-30T19:00:55Z 2020-09-15T20:42:05Z 2020-09-15T20:42:04Z OWNER  

https://scotthelme.co.uk/csrf-is-dead/ from Feb 2017 has background here. The SameSite=lax cookie property effectively eliminates CSRF in modern browsers. https://caniuse.com/#search=SameSite shows 92.13% global support for it.

Datasette already uses SameSite=lax when it sets cookies by default: https://github.com/simonw/datasette/blob/af350ba4571b8e3f9708c40f2ddb48fea7ac1084/datasette/utils/asgi.py#L327-L341

A few options then. I could ditch CSRF protection entirely. I could make it optional - turn it off by default, but let users who care about that remaining 7.87% of global users opt back into it.

One catch is login CSRF: I don't see how SameSite=lax protects against that attack.

datasette 107914493 issue    
657747959 MDU6SXNzdWU2NTc3NDc5NTk= 895 SQL query output should show numeric values in a different colour simonw 9599 closed 0     1 2020-07-16T00:28:03Z 2020-09-15T20:40:08Z 2020-09-15T20:40:08Z OWNER  

Compare https://latest.datasette.io/fixtures/sortable with https://latest.datasette.io/fixtures?sql=select+pk1%2C+pk2%2C+content%2C+sortable%2C+sortable_with_nulls%2C+sortable_with_nulls_2%2C+text+from+sortable+order+by+pk1%2C+pk2+limit+101

https://user-images.githubusercontent.com/9599/87612845-82e09c00-c6c0-11ea-806e-93764ca468c4.png

datasette 107914493 issue    
522352520 MDU6SXNzdWU1MjIzNTI1MjA= 634 Don't run tests twice when releasing a tag simonw 9599 closed 0     2 2019-11-13T17:02:42Z 2020-09-15T20:37:58Z 2020-09-15T20:37:58Z OWNER  

Shipping a release currently runs the tests twice: https://travis-ci.org/simonw/datasette/builds/611463728

It does a regular test run on Python 3.6/7/8 - then the "Release tagged version" step runs the tests again before publishing to PyPI! This second run is not necessary.

datasette 107914493 issue    
639072811 MDU6SXNzdWU2MzkwNzI4MTE= 849 Rename master branch to main simonw 9599 closed 0   Datasette 1.0 3268330 10 2020-06-15T19:05:54Z 2020-09-15T20:37:14Z 2020-09-15T20:37:14Z OWNER  

I was waiting for consensus to form around this (and kind-of hoping for trunk since I like the tree metaphor) and it looks like main is it.

I've seen convincing arguments against trunk too - it indicates that the branch has some special significance like in Subversion (where all branches come from trunk) when it doesn't. So main is better anyway.

datasette 107914493 issue    
682184050 MDU6SXNzdWU2ODIxODQwNTA= 946 Exception in tracing code simonw 9599 closed 0     1 2020-08-19T21:12:27Z 2020-09-15T20:16:50Z 2020-09-15T20:16:50Z OWNER  

When using ?_trace=1:

Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/rockybeaches-09H592sC/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/Users/simon/.local/share/virtualenvs/rockybeaches-09H592sC/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/Users/simon/.local/share/virtualenvs/rockybeaches-09H592sC/lib/python3.8/site-packages/datasette/utils/asgi.py", line 150, in __call__
    await self.app(scope, receive, send)
  File "/Users/simon/.local/share/virtualenvs/rockybeaches-09H592sC/lib/python3.8/site-packages/datasette/tracer.py", line 137, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/Users/simon/.local/share/virtualenvs/rockybeaches-09H592sC/lib/python3.8/site-packages/datasette/tracer.py", line 63, in capture_traces
    del tracers[task_id]
KeyError: 4575365856
datasette 107914493 issue    
702069429 MDU6SXNzdWU3MDIwNjk0Mjk= 967 Writable canned queries with magic parameters fail if POST body is empty simonw 9599 closed 0     11 2020-09-15T16:14:43Z 2020-09-15T20:13:10Z 2020-09-15T20:13:10Z OWNER  

When I try to use the new ?_json=1 feature from #880 with magic parameters from #842 I get this error:

Incorrect number of bindings supplied. The current statement uses 1, and there are 0 supplied

datasette 107914493 issue    
449854604 MDU6SXNzdWU0NDk4NTQ2MDQ= 492 Facets not correctly persisted in hidden form fields simonw 9599 closed 0   Datasette 1.0 3268330 4 2019-05-29T14:49:39Z 2020-09-15T20:12:29Z 2020-09-15T20:12:29Z OWNER  

Steps to reproduce: visit https://2a4b892.datasette.io/fixtures/roadside_attractions?_facet_m2m=attraction_characteristic and click "Apply"

Result is a 500: no such column: attraction_characteristic

The error occurs because of this hidden HTML input:

<input type="hidden" name="_facet" value="attraction_characteristic">

This should be:

<input type="hidden" name="_facet_m2m" value="attraction_characteristic">
datasette 107914493 issue    
701584448 MDU6SXNzdWU3MDE1ODQ0NDg= 966 Remove _request_ip example from canned queries documentation simonw 9599 closed 0     0 2020-09-15T03:51:33Z 2020-09-15T03:52:45Z 2020-09-15T03:52:45Z OWNER  

_request_ip isn't valid, so it shouldn't be in the example: https://github.com/simonw/datasette/blob/cb515a9d75430adaf5e545a840bbc111648e8bfd/docs/sql_queries.rst#L320-L322

datasette 107914493 issue    
688622148 MDU6SXNzdWU2ODg2MjIxNDg= 957 Simplify imports of common classes simonw 9599 open 0   Datasette 1.0 3268330 5 2020-08-29T23:44:04Z 2020-09-14T22:20:13Z   OWNER  

There are only a few classes that plugins need to import. It would be nice if these imports were as short and memorable as possible.

For example:

from datasette.app import Datasette
from datasette.utils.asgi import Response

Could both become:

from datasette import Datasette
from datasette import Response
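
One way to support the shorter spellings would be simple re-exports (a sketch of what datasette/__init__.py could contain):

# datasette/__init__.py
from datasette.app import Datasette
from datasette.utils.asgi import Response

__all__ = ["Datasette", "Response"]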
datasette 107914493 issue    
687711713 MDU6SXNzdWU2ODc3MTE3MTM= 955 Release updated datasette-atom and datasette-ics simonw 9599 closed 0   Datasette 0.49 5818042 2 2020-08-28T04:55:21Z 2020-09-14T22:19:46Z 2020-09-14T22:19:46Z OWNER  

These should release straight after Datasette 0.49 with the change from #953.

datasette 107914493 issue    
679808124 MDU6SXNzdWU2Nzk4MDgxMjQ= 940 Move CI to GitHub Actions simonw 9599 closed 0   Datasette 0.49 5818042 20 2020-08-16T19:06:08Z 2020-09-14T22:09:35Z 2020-09-14T22:09:35Z OWNER  

It looks like the tests take 3m33s to run in GitHub Actions, but they're taking more than 8 minutes in Travis

datasette 107914493 issue    
648637666 MDU6SXNzdWU2NDg2Mzc2NjY= 880 POST to /db/canned-query that returns JSON should be supported (for API clients) simonw 9599 closed 0   Datasette 0.49 5818042 11 2020-07-01T03:14:43Z 2020-09-14T21:28:21Z 2020-09-14T21:25:01Z OWNER  

Now that CSRF is solved for API requests (#835) it would be good to support API requests to the .json extension.

datasette 107914493 issue    
701294727 MDU6SXNzdWU3MDEyOTQ3Mjc= 965 Documentation for 404.html, 500.html templates simonw 9599 closed 0   Datasette 0.49 5818042 3 2020-09-14T17:36:59Z 2020-09-14T18:49:49Z 2020-09-14T18:47:22Z OWNER  

This mechanism is not documented: https://github.com/simonw/datasette/blob/30b98e4d2955073ca2bca92ca7b3d97fcd0191bf/datasette/app.py#L1119-L1129

datasette 107914493 issue    
700728217 MDU6SXNzdWU3MDA3MjgyMTc= 964 raise_404 mechanism for custom templates simonw 9599 closed 0   Datasette 0.49 5818042 1 2020-09-14T03:22:15Z 2020-09-14T17:49:44Z 2020-09-14T17:39:34Z OWNER  

Having tried this out I think it does need a raise_404() mechanism - which needs to be smart enough to trigger the default 404 handler without accidentally going into an infinite loop.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/944#issuecomment-691788478_

datasette 107914493 issue    
681516976 MDU6SXNzdWU2ODE1MTY5NzY= 944 Path parameters for custom pages simonw 9599 closed 0   Datasette 0.49 5818042 5 2020-08-19T03:25:17Z 2020-09-14T03:21:45Z 2020-09-14T02:34:58Z OWNER  

Custom pages let you e.g. create a templates/pages/about.html page and have it automatically served at /about.

It would be useful if these pages could capture path patterns. I like the Python format string syntax for this (also used by Starlette): /foo/bar/{slug}.

So... how about embedding those patterns in the filenames themselves?

templates/pages/museums/{slug}.html

Would capture any hits to /museums/something and use that page to serve them.

datasette 107914493 issue    
459590021 MDU6SXNzdWU0NTk1OTAwMjE= 519 Decide what goes into Datasette 1.0 simonw 9599 open 0   Datasette 1.0 3268330 3 2019-06-23T15:47:41Z 2020-09-12T22:48:53Z   OWNER  

Datasette ASGI #272 is a big part of it... but 1.0 will generally be an indicator that Datasette is a stable platform for developers to write plugins and custom templates against. So lots to think about.

datasette 107914493 issue    
627794879 MDU6SXNzdWU2Mjc3OTQ4Nzk= 782 Redesign default JSON format in preparation for Datasette 1.0 simonw 9599 open 0     8 2020-05-30T18:47:07Z 2020-09-12T21:39:03Z   OWNER  

The default JSON just isn't right. I find myself using ?_shape=array for almost everything I build against the API.

datasette 107914493 issue    
323658641 MDU6SXNzdWUzMjM2NTg2NDE= 262 Add ?_extra= mechanism for requesting extra properties in JSON simonw 9599 open 0     3 2018-05-16T14:55:42Z 2020-09-12T18:22:44Z   OWNER  

Datasette views currently work by creating a set of data that should be returned as JSON, then defining an additional, optional template_data() function which is called if the view is being rendered as HTML.

This template_data() function calculates extra template context variables which are necessary for the HTML view but should not be included in the JSON.

Example of how that is used today: https://github.com/simonw/datasette/blob/2b79f2bdeb1efa86e0756e741292d625f91cb93d/datasette/views/table.py#L672-L704

With features like Facets in #255 I'm beginning to want to move more items into the template_data() - in the case of facets it's the suggested_facets array. This saves that feature from being calculated (involving several SQL queries) for the JSON case where it is unlikely to be used.

But... as an API user, I want to still optionally be able to access that information.

Solution: Add a ?_extra=suggested_facets&_extra=table_metadata argument which can be used to optionally request additional blocks to be added to the JSON API.

Then redefine as many of the current template_data() features as extra arguments instead, and teach Datasette to return certain extras by default when rendering templates.

This could allow the JSON representation to be slimmed down further (removing e.g. the table_definition and view_definition keys) while still making that information available to API users who need it.

datasette 107914493 issue    
569275763 MDU6SXNzdWU1NjkyNzU3NjM= 680 Release automation: automate the bit that posts the GitHub release simonw 9599 closed 0     5 2020-02-22T03:50:40Z 2020-09-12T18:18:50Z 2020-09-12T18:18:50Z OWNER  

The most manual part of the release process right now is having to post a GitHub release that matches the updated changelog.

This is particularly annoying because the changelog is in .rst while the GitHub release needs markdown - so I currently manually translate between the two.

Having the release script automatically post a GitHub release at the end would be much more convenient.

datasette 107914493 issue    
649429772 MDU6SXNzdWU2NDk0Mjk3NzI= 886 Reconsider how _actor_X magic parameter deals with missing values simonw 9599 open 0     2 2020-07-02T00:00:38Z 2020-09-11T21:35:26Z   OWNER  

I had to build a custom _actorornull prefix for datasette-saved-queries:

from datasette import hookimpl

def actorornull(key, request):
    if request.actor is None:
        return None
    return request.actor.get(key)


@hookimpl
def register_magic_parameters():
    return [
        ("actorornull", actorornull),
    ]

Maybe the actor magic in Datasette core should do that out of the box?

https://github.com/simonw/datasette/blob/f1f581b7ffcd5d8f3ae6c1c654d813a6641410eb/datasette/default_magic_parameters.py#L14-L17

datasette 107914493 issue    
684925907 MDU6SXNzdWU2ODQ5MjU5MDc= 948 Upgrade CodeMirror simonw 9599 closed 0   Datasette 0.49 5818042 8 2020-08-24T19:55:33Z 2020-09-11T21:34:24Z 2020-08-30T18:03:07Z OWNER  

Datasette currently bundles 5.31.0 (from October 2017) - latest version is 5.57.0 (August 2020). https://codemirror.net/doc/releases.html

datasette 107914493 issue    
691475400 MDU6SXNzdWU2OTE0NzU0MDA= 958 Upgrade to latest Black (20.8b1) simonw 9599 closed 0   Datasette 0.49 5818042 0 2020-09-02T22:24:19Z 2020-09-11T21:34:24Z 2020-09-02T22:25:10Z OWNER  

Black has some changes: https://black.readthedocs.io/en/stable/change_log.html#b0 - in particular:

  • re-implemented support for explicit trailing commas: now it works consistently within any bracket pair, including nested structures (#1288 and duplicates)
  • Black now reindents docstrings when reindenting code around it (#1053)
datasette 107914493 issue    
699622046 MDU6SXNzdWU2OTk2MjIwNDY= 962 datasette --pdb option for debugging errors simonw 9599 closed 0   Datasette 0.49 5818042 1 2020-09-11T18:33:10Z 2020-09-11T21:34:24Z 2020-09-11T18:38:01Z OWNER  

I needed to debug an exception from deep inside a Jinja template the other day. I hacked this together and it helped.

datasette 107914493 issue    
573755726 MDU6SXNzdWU1NzM3NTU3MjY= 690 Mechanism for plugins to add UI to pages in specific locations simonw 9599 open 0     5 2020-03-02T06:48:36Z 2020-09-11T21:33:40Z   OWNER  

Now that we have support for plugins that can write to the database, I'm seeing all sorts of places where a plugin might need to add UI to the table page.

Some examples:

  • datasette-configure-fts needs to add a "configure search for this table" link
  • a plugin that lets you rename or delete tables needs to add a link or button somewhere
  • existing plugins like datasette-vega and datasette-cluster-map already do this with JavaScript

The challenge here is that multiple plugins may want to do this, so simply overriding templates and populating named blocks doesn't entirely work, as templates may override each other.

datasette 107914493 issue    
684111953 MDU6SXNzdWU2ODQxMTE5NTM= 947 datasette --get exit code should reflect HTTP errors simonw 9599 closed 0   Datasette 0.49 5818042 1 2020-08-23T04:17:08Z 2020-09-11T21:33:15Z 2020-09-11T21:33:15Z OWNER  

If you run datasette . --get / and the result is a 500 or 404 error (anything that's not a 200 or a 30x) the exit code from the command should not be 0.

It should still output the returned content to stdout.

This will help with writing soundness checks, as seen in https://til.simonwillison.net/til/til/github-actions_grep-tests.md
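
That would make shell-level checks as simple as this (illustrative):

datasette . --get / || echo "Datasette returned an error status"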

datasette 107914493 issue    
696908389 MDU6SXNzdWU2OTY5MDgzODk= 961 Verification checks for metadata.json on startup simonw 9599 open 0     2 2020-09-09T15:21:53Z 2020-09-09T15:24:31Z   OWNER  

I lost a bunch of time yesterday trying to figure out why a Datasette instance wasn't starting up - it turned out it was because I had a facets: reference that mentioned a column that did not exist.

Catching these on startup would be good.

datasette 107914493 issue    
694500679 MDU6SXNzdWU2OTQ1MDA2Nzk= 17 Rename "table" to "type" simonw 9599 closed 0     2 2020-09-06T19:34:41Z 2020-09-09T03:03:22Z 2020-09-09T03:03:22Z MEMBER  

I think "table" is the wrong name for the concept I'm using it for here.

Two reasons: firstly, table is a reserved word in SQLite. More importantly, it turns out there's not a direct mapping from tables to types of search result. In particular, for GitHub I ended up having two different "tables" of repositories - one for repos created by me, another for repos that I have starred.

dogsheep-beta 197431109 issue    
696045581 MDU6SXNzdWU2OTYwNDU1ODE= 155 rebuild-fts command and table.rebuild_fts() method simonw 9599 closed 0     2 2020-09-08T17:19:26Z 2020-09-08T23:16:10Z 2020-09-08T23:16:10Z OWNER  

https://sqlite.org/forum/forumpost/fa777fff86

Easiest thing would be to run a 'rebuild' to rebuild the FTS index from scratch based on the contents of the content table. i.e.

INSERT INTO licenses_fts(licenses_fts) VALUES('rebuild');

https://www.sqlite.org/fts5.html#the_rebuild_command

sqlite-utils 140912432 issue    
695377804 MDU6SXNzdWU2OTUzNzc4MDQ= 153 table.optimize() should delete junk rows from *_fts_docsize simonw 9599 closed 0     3 2020-09-07T20:31:09Z 2020-09-08T22:18:53Z 2020-09-07T21:16:33Z OWNER  

The second challenge here is cleaning up all of those junk rows in existing *_fts_docsize tables. Doing that just to the demo database from https://github-to-sqlite.dogsheep.net/github.db dropped its size from 22MB to 16MB! Here's the SQL:
DELETE FROM [licenses_fts_docsize] WHERE id NOT IN (SELECT rowid FROM [licenses_fts]);
I can do that as part of the existing table.optimize() method, which optimizes FTS tables.
_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688501064_

sqlite-utils 140912432 issue    
695556681 MDU6SXNzdWU2OTU1NTY2ODE= 19 Figure out incremental re-indexing simonw 9599 open 0     2 2020-09-08T05:23:31Z 2020-09-08T05:27:07Z   MEMBER  

As tables get bigger reindexing everything on a schedule (essentially recreating the entire index from scratch) will start to become a performance bottleneck.

dogsheep-beta 197431109 issue    
695553522 MDU6SXNzdWU2OTU1NTM1MjI= 18 Deleted records stay in the search index simonw 9599 open 0     2 2020-09-08T05:14:23Z 2020-09-08T05:15:51Z   MEMBER  

Here's why: https://github.com/dogsheep/dogsheep-beta/blob/24f7898d41a39218058f174c75ba62f7c0fcfff6/dogsheep_beta/utils.py#L44-L53

That should probably do DELETE FROM index1.search_index WHERE [table] = ? first.

dogsheep-beta 197431109 issue    
695441530 MDU6SXNzdWU2OTU0NDE1MzA= 154 OperationalError: cannot change into wal mode from within a transaction simonw 9599 open 0     2 2020-09-07T23:42:44Z 2020-09-07T23:47:10Z   OWNER  

I'm getting this error when running:

sqlite-utils enable-wal beta.db

OperationalError: cannot change into wal mode from within a transaction

I'm worried that maybe that's because of this new code from #152:

https://github.com/simonw/sqlite-utils/blob/deb2eb013ff85bbc828ebc244a9654f0d9c3139e/sqlite_utils/db.py#L128-L129

sqlite-utils 140912432 issue    
695359607 MDU6SXNzdWU2OTUzNTk2MDc= 150 Feature for tracing SQL queries simonw 9599 closed 0     0 2020-09-07T19:43:08Z 2020-09-07T21:57:01Z 2020-09-07T21:57:01Z OWNER  

Debugging sqlite-utils when something weird happens (e.g. #149) can be a bit tricky since it runs a bunch of different SQL statements behind the scenes.

An optional "tracing" mechanism for seeing what SQL is being executed would be useful.

sqlite-utils 140912432 issue    
695360889 MDExOlB1bGxSZXF1ZXN0NDgxNjE2NzA0 151 Tracer mechanism for seeing underlying SQL simonw 9599 closed 0     0 2020-09-07T19:46:43Z 2020-09-07T21:57:00Z 2020-09-07T21:57:00Z OWNER simonw/sqlite-utils/pulls/151

Refs #150. Needs tests and documentation, including for the new db.execute() and db.executescript() methods.

sqlite-utils 140912432 pull    
695376054 MDU6SXNzdWU2OTUzNzYwNTQ= 152 Turn on recursive_triggers by default simonw 9599 closed 0     2 2020-09-07T20:26:36Z 2020-09-07T21:17:48Z 2020-09-07T20:45:14Z OWNER  

https://www.sqlite.org/pragma.html#pragma_recursive_triggers says:

Prior to SQLite version 3.6.18 (2009-09-11), recursive triggers were not supported. The behavior of SQLite was always as if this pragma was set to OFF. Support for recursive triggers was added in version 3.6.18 but was initially turned OFF by default, for compatibility. Recursive triggers may be turned on by default in future versions of SQLite.

So I think the fix for the complex issue in #149 is to turn on recursive_triggers globally by default for sqlite-utils.

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499924_

sqlite-utils 140912432 issue    
695319258 MDU6SXNzdWU2OTUzMTkyNTg= 149 FTS table with 7 rows has _fts_docsize table with 9,141 rows simonw 9599 closed 0     10 2020-09-07T18:06:16Z 2020-09-07T21:16:34Z 2020-09-07T21:16:34Z OWNER  

I'm seeing a weird issue with some of the SQLite databases that I am using with the FTS5 module.

I have a database with a licenses table that contains 7 rows: https://github-to-sqlite.dogsheep.net/github/licenses

The FTS table also has 7 rows: https://github-to-sqlite.dogsheep.net/github/licenses_fts

Somehow the accompanying licenses_fts_docsize shadow table now has 9,141 rows in it! https://github-to-sqlite.dogsheep.net/github/licenses_fts_docsize

And licenses_fts_data has 41 rows - should I expect that to have 7 rows? https://github-to-sqlite.dogsheep.net/github/licenses_fts_data

I have a hunch that it might be a problem with the triggers. These are the triggers that are updating that FTS table: https://github-to-sqlite.dogsheep.net/github?sql=select+*+from+sqlite_master+where+type+%3D+%27trigger%27+and+tbl_name+%3D+%27licenses%27

All three are triggers on the licenses table:

licenses_ai (AFTER INSERT):
    CREATE TRIGGER [licenses_ai] AFTER INSERT ON [licenses] BEGIN
        INSERT INTO [licenses_fts] (rowid, [name]) VALUES (new.rowid, new.[name]);
    END

licenses_ad (AFTER DELETE):
    CREATE TRIGGER [licenses_ad] AFTER DELETE ON [licenses] BEGIN
        INSERT INTO [licenses_fts] ([licenses_fts], rowid, [name]) VALUES ('delete', old.rowid, old.[name]);
    END

licenses_au (AFTER UPDATE):
    CREATE TRIGGER [licenses_au] AFTER UPDATE ON [licenses] BEGIN
        INSERT INTO [licenses_fts] ([licenses_fts], rowid, [name]) VALUES ('delete', old.rowid, old.[name]);
        INSERT INTO [licenses_fts] (rowid, [name]) VALUES (new.rowid, new.[name]);
    END
sqlite-utils 140912432 issue    
695276328 MDU6SXNzdWU2OTUyNzYzMjg= 148 More attractive indentation of created FTS table schema simonw 9599 closed 0     1 2020-09-07T16:49:30Z 2020-09-07T18:12:50Z 2020-09-07T18:12:50Z OWNER  

On https://github-to-sqlite.dogsheep.net/github/licenses_fts the create table SQL is displayed as:

CREATE VIRTUAL TABLE [licenses_fts] USING FTS5 (
                [name],
                content=[licenses]
            );

It would be more aesthetically pleasing if it looked like this:

CREATE VIRTUAL TABLE [licenses_fts] USING FTS5 (
    [name],
    content=[licenses]
);
sqlite-utils 140912432 issue    
693318095 MDU6SXNzdWU2OTMzMTgwOTU= 14 On FTS exception rerun the query with quoting simonw 9599 closed 0     0 2020-09-04T15:44:18Z 2020-09-05T16:23:01Z 2020-09-05T16:23:01Z MEMBER  

Searching for e.g. #dogfest currently throws an FTS exception - but I want to support advanced FTS query tricks as seen in #13.

https://dogsheep.simonwillison.net/-/beta?q=%23dogfest

fts5: syntax error near "#"

Idea: catch that error and re-run the query with FTS escaping applied!
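
The escaping itself can be as simple as double-quoting every token, since FTS5 treats a quoted string as a literal - a sketch:

def escape_fts_query(query):
    # Wrap each whitespace-separated token in double quotes, doubling any
    # embedded quotes, so characters like # lose their FTS5 syntax meaning
    return " ".join(
        '"{}"'.format(token.replace('"', '""')) for token in query.split()
    )

# escape_fts_query("#dogfest") == '"#dogfest"'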

dogsheep-beta 197431109 issue    
692202408 MDU6SXNzdWU2OTIyMDI0MDg= 12 Idea: maps and GeoJSON support simonw 9599 open 0     0 2020-09-03T18:47:10Z 2020-09-04T01:45:03Z   MEMBER  

It would be cool if the display_sql could return a column populated with GeoJSON which would then automatically be displayed on a map in the results (or maybe default JS would look for a class="geojson" element output by the display template) - a la https://github.com/simonw/datasette-leaflet-geojson

Then I could render workout routes on a map, or Swarm checkin points.

dogsheep-beta 197431109 issue    
692386625 MDU6SXNzdWU2OTIzODY2MjU= 13 Support advanced FTS queries simonw 9599 closed 0     1 2020-09-03T21:29:56Z 2020-09-03T21:40:51Z 2020-09-03T21:40:51Z MEMBER  

"simon willison NOT screenshot", for example.

dogsheep-beta 197431109 issue    
691521965 MDU6SXNzdWU2OTE1MjE5NjU= 9 Mechanism for defining custom display of results simonw 9599 closed 0     8 2020-09-03T00:14:07Z 2020-09-03T21:12:14Z 2020-09-03T21:09:55Z MEMBER  

Part of #3 - in particular I want to make sure my photos are displayed with a thumbnail.

dogsheep-beta 197431109 issue    
689810340 MDU6SXNzdWU2ODk4MTAzNDA= 3 Datasette plugin to provide custom page for running faceted, ranked searches simonw 9599 closed 0     3 2020-09-01T05:00:22Z 2020-09-03T21:01:41Z 2020-09-03T21:01:41Z MEMBER  

This will be a page at /-/beta which renders using a custom template.

It will offer a default timeline view plus search and facet by type/date.

dogsheep-beta 197431109 issue    
689847361 MDU6SXNzdWU2ODk4NDczNjE= 5 Add a context column that's not searchable simonw 9599 closed 0     1 2020-09-01T06:13:42Z 2020-09-03T18:43:50Z 2020-09-03T18:43:50Z MEMBER  

I sometimes like to configure titles that are things like "Comment on issue X" or "Photo in Golden Gate Park" - these shouldn't be included in the search index but should be stored so they can be displayed to provide context.

Add a column for this - probably called context - and make it so it can be populated.

dogsheep-beta 197431109 issue    
691557547 MDU6SXNzdWU2OTE1NTc1NDc= 10 Category 3: received simonw 9599 closed 0     1 2020-09-03T01:40:36Z 2020-09-03T17:38:51Z 2020-09-03T17:38:51Z MEMBER  

A category for things that were sent to me: DMs, emails etc. Follows #7.

dogsheep-beta 197431109 issue    
692125110 MDU6SXNzdWU2OTIxMjUxMTA= 11 Public / Private mechanism simonw 9599 closed 0     1 2020-09-03T16:47:03Z 2020-09-03T17:33:52Z 2020-09-03T17:33:52Z MEMBER  

Some of the data in Dogsheep is stuff that was written publicly - tweets, blog posts, GitHub commits to public repos.

Some of it is private data - emails, photos, direct messages, Swarm checkins, commits to private repos.

Being able to filter for just one or the other (or both) would be useful. Especially when giving demos!

dogsheep-beta 197431109 issue    
691537426 MDU6SXNzdWU2OTE1Mzc0MjY= 959 Internals API idea: results.dicts in addition to results.rows simonw 9599 open 0     0 2020-09-03T00:50:17Z 2020-09-03T00:50:17Z   OWNER  

I just wrote this code:

    results = await database.execute(SEARCH_SQL, {"query": query})
    return [dict(r) for r in results.rows]

How about having results.dicts as a utility property that does that?
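
The property itself would be a one-liner - a sketch on a hypothetical Results class:

class Results:
    def __init__(self, rows):
        self.rows = rows  # list of sqlite3.Row objects

    @property
    def dicts(self):
        # sqlite3.Row converts cleanly to dict, so this mirrors the loop above
        return [dict(row) for row in self.rows]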

datasette 107914493 issue    
691265198 MDU6SXNzdWU2OTEyNjUxOTg= 7 Mechanism for differentiating between "by me" and "liked by me" simonw 9599 closed 0     6 2020-09-02T17:44:37Z 2020-09-02T21:06:28Z 2020-09-02T21:06:28Z MEMBER  

Some of the content I'm indexing is by me - photos I've taken, tweets I wrote, commits, comments I posted.

Some of it is stuff that I've "liked" or "bookmarked" in some way - favourited tweets, Pocket articles, starred GitHub repos.

It would be useful to be able to differentiate between the two.

dogsheep-beta 197431109 issue    
691369691 MDU6SXNzdWU2OTEzNjk2OTE= 8 Create a view for running faceted searches simonw 9599 closed 0     1 2020-09-02T19:44:07Z 2020-09-02T19:50:47Z 2020-09-02T19:50:47Z MEMBER  
select
  search_index_fts.rank,
  search_index.rowid,
  search_index.[table],
  search_index.key,
  search_index.title,
  search_index.timestamp,
  search_index.search_1
from
  search_index join search_index_fts on search_index.rowid = search_index_fts.rowid
order by
  search_index_fts.rank, search_index.timestamp desc
dogsheep-beta 197431109 issue    
689809225 MDU6SXNzdWU2ODk4MDkyMjU= 2 Apply porter stemming simonw 9599 closed 0     2 2020-09-01T04:57:55Z 2020-09-01T20:42:00Z 2020-09-01T20:40:24Z MEMBER  

This can be on by default. You can turn it off for a table in the config file using stemming: none - or maybe tokenize: none to match the terminology used by SQLite and sqlite-utils: https://sqlite-utils.readthedocs.io/en/stable/python-api.html#enabling-full-text-search

dogsheep-beta 197431109 issue    
689850810 MDU6SXNzdWU2ODk4NTA4MTA= 6 Set up a demo instance simonw 9599 open 0     0 2020-09-01T06:20:24Z 2020-09-01T06:20:24Z   MEMBER  

Once I've got the Datasette plugin to a state where it's worth building a demo: #3

I can use data from my public https://github-to-sqlite.dogsheep.net/ demo plus the Pocket data subset I use for the demo in https://github.com/dogsheep/pocket-to-sqlite/issues/5 - I could pull in the https://dogsheep-photos.dogsheep.net/ photos data too.

dogsheep-beta 197431109 issue    
503243784 MDU6SXNzdWU1MDMyNDM3ODQ= 3 Extract images into separate tables simonw 9599 open 0     1 2019-10-07T05:43:01Z 2020-09-01T06:17:45Z   MEMBER  

As already done with authors. Slightly harder because images do not have a universally unique ID. Also need to figure out what to do about there being columns for both image and images.

https://user-images.githubusercontent.com/9599/66287418-9ab20680-e88a-11e9-96bf-6c80d881eff0.png

pocket-to-sqlite 213286752 issue    
689848827 MDU6SXNzdWU2ODk4NDg4Mjc= 6 ISO timestamps simonw 9599 open 0     0 2020-09-01T06:16:42Z 2020-09-01T06:16:42Z   MEMBER  

The time_added, time_updated and time_read columns currently store data like this:

September 19, 2019 - 00:30:30 UTC

Should use ISO instead, e.g. 2020-07-26T01:05:24+00:00
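
The conversion is straightforward with strptime (the helper name is made up):

from datetime import datetime, timezone

def to_iso(value):
    # "September 19, 2019 - 00:30:30 UTC" -> "2019-09-19T00:30:30+00:00"
    dt = datetime.strptime(value, "%B %d, %Y - %H:%M:%S %Z")
    return dt.replace(tzinfo=timezone.utc).isoformat()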

pocket-to-sqlite 213286752 issue    
689839399 MDU6SXNzdWU2ODk4MzkzOTk= 4 Optimize the FTS table simonw 9599 closed 0     1 2020-09-01T05:58:17Z 2020-09-01T06:10:08Z 2020-09-01T06:10:08Z MEMBER   dogsheep-beta 197431109 issue    
689800307 MDU6SXNzdWU2ODk4MDAzMDc= 1 Add an index on the timestamp column simonw 9599 closed 0     0 2020-09-01T04:33:37Z 2020-09-01T04:49:23Z 2020-09-01T04:49:23Z MEMBER  

Since default view will likely be ordered by timestamp descending.

dogsheep-beta 197431109 issue    
542553350 MDU6SXNzdWU1NDI1NTMzNTA= 655 Copy and paste doesn't work reliably on iPhone for SQL editor simonw 9599 closed 0   Datasette 1.0 3268330 3 2019-12-26T13:15:10Z 2020-08-30T17:51:40Z 2020-08-30T17:51:40Z OWNER  

I'm having a lot of trouble copying and pasting from the codemirror editor on my iPhone.

datasette 107914493 issue    
688395275 MDU6SXNzdWU2ODgzOTUyNzU= 144 Run some tests against numpy simonw 9599 closed 0     2 2020-08-28T22:53:00Z 2020-08-28T22:57:05Z 2020-08-28T22:57:04Z OWNER  

Accidentally removed in #143:

https://github.com/simonw/sqlite-utils/blob/d7d3f962861ef32c5ead8f514c8756f5b6f7c4a0/.travis.yml#L18-L19

sqlite-utils 140912432 issue    
688389933 MDU6SXNzdWU2ODgzODk5MzM= 143 Move to GitHub Actions CI simonw 9599 closed 0     1 2020-08-28T22:34:11Z 2020-08-28T22:41:35Z 2020-08-28T22:41:35Z OWNER   sqlite-utils 140912432 issue    
652700770 MDU6SXNzdWU2NTI3MDA3NzA= 119 Ability to remove a foreign key simonw 9599 open 0     2 2020-07-07T22:31:37Z 2020-08-28T21:05:37Z   OWNER  

Useful if you add one but make a mistake and need to undo it without recreating the database from scratch.

sqlite-utils 140912432 issue    
675753042 MDU6SXNzdWU2NzU3NTMwNDI= 131 "insert" command options for column types simonw 9599 open 0     1 2020-08-09T18:59:11Z 2020-08-28T21:04:13Z   OWNER  

The insert command currently results in string types for every column - at least when used against CSV or TSV inputs.

It would be useful if you could do the following:

  • automatically detect the column types based on e.g. the first 1,000 records
  • explicitly state the rule for specific columns

--detect-types could work for the former - or it could do that by default and allow opt-out using --no-detect-types

For specific columns maybe this:

sqlite-utils insert db.db images images.tsv \
  --tsv \
  -c id int \
  -c score float
sqlite-utils 140912432 issue    
688352145 MDU6SXNzdWU2ODgzNTIxNDU= 141 insert-files support for compressed values simonw 9599 open 0     0 2020-08-28T20:59:46Z 2020-08-28T20:59:46Z   OWNER  

The sqlar format supports this; it would be useful if insert-files could support it too.

https://www.sqlite.org/sqlar.html

sqlite-utils 140912432 issue    
688351054 MDU6SXNzdWU2ODgzNTEwNTQ= 140 Idea: insert-files mechanism for adding extra columns with fixed values simonw 9599 open 0     0 2020-08-28T20:57:36Z 2020-08-28T20:57:36Z   OWNER  

Say for example you want to populate a file_type column with the value gif. That could work like this:

sqlite-utils insert-files gifs.db images *.gif \
    -c path -c md5 -c last_modified:mtime \
    -c file_type:text:gif --pk=path

So the column is defined as a text column, and the fixed value is whatever follows the second colon.

sqlite-utils 140912432 issue    
687694947 MDU6SXNzdWU2ODc2OTQ5NDc= 954 Remove old register_output_renderer dict mechanism in Datasette 1.0 simonw 9599 open 0   Datasette 1.0 3268330 1 2020-08-28T04:04:23Z 2020-08-28T04:56:31Z   OWNER  

Documentation says that the old dictionary mechanism will be deprecated by 1.0:

https://github.com/simonw/datasette/blob/799ecae94824640bdff21f86997f69844048d5c3/docs/plugin_hooks.rst#L460
_Originally posted by @simonw in https://github.com/simonw/datasette/issues/953#issuecomment-682312494_

datasette 107914493 issue    

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
, [active_lock_reason] TEXT, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);