{"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-783265830", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 783265830, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MzI2NTgzMA==", "user": {"value": 30665, "label": "frankieroberto"}, "created_at": "2021-02-22T10:21:14Z", "updated_at": "2021-02-22T10:21:14Z", "author_association": "NONE", "body": "@simonw:\r\n\r\n> The problem there is that ?_size=x isn't actually doing the same thing as the SQL limit keyword.\r\n\r\nInteresting! Although I don't think it matters too much what the underlying implementation is - I more meant that `limit` is familiar to developers conceptually as \"up to and including this number, if they exist\", whereas \"size\" is potentially more ambiguous. However, it's probably no big deal either way.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782789598", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782789598, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc4OTU5OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-21T03:30:02Z", "updated_at": "2021-02-21T03:30:02Z", "author_association": "OWNER", "body": "Another benefit to default:object - I could include a key that shows a list of available extras. I could then use that to power an interactive API explorer.", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782765665", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782765665, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc2NTY2NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T23:34:41Z", "updated_at": "2021-02-20T23:34:41Z", "author_association": "OWNER", "body": "OK, I'm back to the \"top level object as the default\" side of things now - it's pretty much unanimous at this point, and it's certainly true that it's not a decision you'll even regret.", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782756398", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782756398, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc1NjM5OA==", "user": {"value": 601316, "label": "simonrjones"}, "created_at": "2021-02-20T22:05:48Z", "updated_at": "2021-02-20T22:05:48Z", "author_association": "NONE", "body": "> I think it\u2019s a good idea if the top level item of the response JSON is always an object, rather than an array, at least as the default.\n\nI agree it is more predictable if the top level item is an object with a rows or data object that contains an array of data, which then allows for other top-level meta data. 
\n\nI can see the argument for removing this and just using an array for convenience - but I think that's OK as an option (as you have now).\n\nRather than have lots of top-level keys you could have a \"meta\" object to contain non-data stuff. You could use something like \"links\" for API endpoint URLs (or use a standard like HAL). Which would then leave the top level a bit cleaner - if that's what you what. \n\nHave you had much feedback from users who use the Datasette API a lot?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782748501", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782748501, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0ODUwMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:58:18Z", "updated_at": "2021-02-20T20:58:18Z", "author_association": "OWNER", "body": "Yet another option: support a `?_path=x` option which returns a nested path from the result. So you could do this:\r\n\r\n`/github/commits.json?_path=rows` - to get back a top-level array pulled from the `\"rows\"` key.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782748093", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782748093, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0ODA5Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:54:52Z", "updated_at": "2021-02-20T20:54:52Z", "author_association": "OWNER", "body": "> Have you given any thought as to whether to pretty print (format with spaces) the output or not? Can be useful for debugging/exploring in a browser or other basic tools which don\u2019t parse the JSON. Could be default (can\u2019t be much bigger with gzip?) or opt-in.\r\n\r\nAdding a `?_pretty=1` option that does that is a great idea, I'm filing a ticket for it: #1237", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782747878", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782747878, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0Nzg3OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:53:11Z", "updated_at": "2021-02-20T20:53:11Z", "author_association": "OWNER", "body": "... 
though thinking about this further, I could re-implement the `select * from commits` (but only return a max of 10 results) feature using a nested `select * from (select * from commits) limit 10` query.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782747743", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782747743, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0Nzc0Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:52:10Z", "updated_at": "2021-02-20T20:52:10Z", "author_association": "OWNER", "body": "> Minor suggestion: rename `size` query param to `limit`, to better reflect that it\u2019s a maximum number of rows returned rather than a guarantee of getting that number, and also for consistency with the SQL keyword?\r\n\r\nThe problem there is that `?_size=x` isn't actually doing the same thing as the SQL `limit` keyword. Consider this query:\r\n\r\nhttps://latest-with-plugins.datasette.io/github?sql=select+*+from+commits - `select * from commits`\r\n\r\nDatasette returns 1,000 results, and shows a \"Custom SQL query returning more than 1,000 rows\" message at the top. That's the `size` kicking in - I only fetch the first 1,000 results from the cursor to avoid exhausting resources. In the JSON version of that at https://latest-with-plugins.datasette.io/github.json?sql=select+*+from+commits there's a `\"truncated\": true` key to let you know what happened.\r\n\r\nI find myself using `?_size=2` against Datasette occasionally if I know the rows being returned are really big and I don't want to load 10+MB of HTML.\r\n\r\nThis is only really a concern for arbitrary SQL queries though - for table pages such as https://latest-with-plugins.datasette.io/github/commits?_size=10 adding `?_size=10` actually puts a `limit 10` on the underlying SQL query.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782747164", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782747164, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0NzE2NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:47:16Z", "updated_at": "2021-02-20T20:47:16Z", "author_association": "OWNER", "body": "(I started a thread on Twitter about this: https://twitter.com/simonw/status/1363220355318358016)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782746755", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782746755, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0Njc1NQ==", "user": {"value": 30665, "label": "frankieroberto"}, "created_at": "2021-02-20T20:44:05Z", "updated_at": "2021-02-20T20:44:05Z", "author_association": "NONE", "body": "Minor suggestion: rename `size` query param to 
`limit`, to better reflect that it\u2019s a maximum number of rows returned rather than a guarantee of getting that number, and also for consistency with the SQL keyword?\r\n\r\nI like the idea of specifying a limit of 0 if you don\u2019t want any rows data - and returning an empty array under the `rows` key seems fine.\r\n\r\nHave you given any thought as to whether to pretty print (format with spaces) the output or not? Can be useful for debugging/exploring in a browser or other basic tools which don\u2019t parse the JSON. Could be default (can\u2019t be much bigger with gzip?) or opt-in.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782746633", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782746633, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0NjYzMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:43:07Z", "updated_at": "2021-02-20T20:43:07Z", "author_association": "OWNER", "body": "Another option: `.json` always returns an object with a list of keys that gets increased through adding `?_extra=` parameters.\r\n\r\n`.jsona` always returns a JSON array of objects\r\n\r\nI had something similar to this in Datasette a few years ago - a `.jsono` extension, which still redirects to the `shape=array` version.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782745199", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782745199, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0NTE5OQ==", "user": {"value": 30665, "label": "frankieroberto"}, "created_at": "2021-02-20T20:32:03Z", "updated_at": "2021-02-20T20:32:03Z", "author_association": "NONE", "body": "I think it\u2019s a good idea if the top level item of the response JSON is always an object, rather than an array, at least as the default. Mainly because it allows you to add extra keys in a backwards-compatible way. 
Also just seems more expected somehow.\r\n\r\nThe API design guidance for the UK government also recommends this: https://www.gov.uk/guidance/gds-api-technical-and-data-standards#use-json\r\n\r\nI also strongly dislike having versioned APIs (eg with a `/v1/` path prefix), as it invariably means that old versions stop working at some point, even though the bit of the API you\u2019re using might not have changed at all.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 1}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782742233", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782742233, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0MjIzMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:09:16Z", "updated_at": "2021-02-20T20:09:16Z", "author_association": "OWNER", "body": "I just noticed that https://latest-with-plugins.datasette.io/github/commits.json-preview?_extra=total&_size=0&_trace=1 executes 35 SQL queries at the moment! A great reminder that a big improvement from this change will be a reduction in queries through not calculating things like suggested facets unless they are explicitly requested.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782741719", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782741719, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0MTcxOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:05:04Z", "updated_at": "2021-02-20T20:05:04Z", "author_association": "OWNER", "body": "> The only advantage of headers is that you don\u2019t need to do .rows, but that\u2019s actually good as a data validation step anyway\u2014if .rows is missing assume there\u2019s an error and do your error handling path instead of parsing the rest.\r\n\r\nThis is something I've not thought very hard about. If there's an error, I need to return a top-level object, not a top-level array, so I can provide details of the error.\r\n\r\nBut this means that client code will have to handle this difference - it will have to know that the returned data can be array-shaped if nothing went wrong, and object-shaped if there's an error.\r\n\r\nThe HTTP status code helps here - calling client code can know that a 200 status code means there will be an array, but an error status code means an object.\r\n\r\nIf developers really hate that the shape could be different, they can always use `?_extra=next` to ensure that the top level item is an object whether or not an error occurred. 
So I think this is OK.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782741107", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782741107, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0MTEwNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T20:00:22Z", "updated_at": "2021-02-20T20:00:22Z", "author_association": "OWNER", "body": "A really exciting opportunity this opens up is for parallel execution - the `facets()` and `suggested_facets()` and `total()` async functions could be called in parallel, which could speed things up if I'm confident the SQLite thread pool can execute on multiple CPU cores (it should be able to because the Python `sqlite3` module releases the GIL while it's executing C code).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782740985", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782740985, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0MDk4NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T19:59:21Z", "updated_at": "2021-02-20T19:59:21Z", "author_association": "OWNER", "body": "This design should be influenced by how it's implemented.\r\n\r\nOne implementation that could be nice is that each of the keys that can be requested - `next_url`, `total` etc - maps to an `async def` function which can do the work. So that expensive `count(*)` will only be executed by the `async def total` function if it is requested.\r\n\r\nThis raises more questions: Both `next` and `next_url` work off the same underlying data, so if they are both requested can we re-use the work that `next` does somehow? 
Maybe by letting these functions depend on each other (so `next_url()` knows to first call `next()`, but only if it hasn't been called already).\r\n\r\nI think I need to flesh out the full default collection of `?_extra=` parameters in order to design how they will work under the hood.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782740604", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782740604, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0MDYwNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T19:56:21Z", "updated_at": "2021-02-20T19:56:33Z", "author_association": "OWNER", "body": "I think I want to support `?_extra=next_url,total` in addition to `?_extra=next_url&_extra=total` - partly because it's less characters to type, and also because I know there exist URL handling libraries that don't know how to handle the same parameter multiple times (though they're going to break against Datasette already, so it's not a big deal).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782740488", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782740488, "node_id": "MDEyOklzc3VlQ29tbWVudDc4Mjc0MDQ4OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T19:55:23Z", "updated_at": "2021-02-20T19:55:23Z", "author_association": "OWNER", "body": "Am I saying you won't get back a key in the response unless you explicitly request it, either by name or by specifying a bundle of extras (e.g. `all` or `paginated`)?\r\n\r\nThe `\"truncated\": true` key that tells you that your arbitrary query returned more than X results but was truncated is pretty important, do I really want people to have to opt-in to that one?\r\n\r\nAlso: having bundles like `all` or `paginated` live in the same namespace as single keys like `next_url` or `total` is a little odd - you can't tell by looking at them if they'll add a key called `all` or if they'll add a bunch of other stuff.\r\n\r\nMaybe bundles could be prefixed with something, perhaps an underscore? 
`?_extra=_all` and `?_extra=_paginated` for example.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782739926", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782739926, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjczOTkyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T19:51:30Z", "updated_at": "2021-02-20T19:52:19Z", "author_association": "OWNER", "body": "Demos:\r\n\r\n- https://latest-with-plugins.datasette.io/github/commits.json-preview\r\n- https://latest-with-plugins.datasette.io/github/commits.json-preview?_extra=next_url\r\n- https://latest-with-plugins.datasette.io/github/commits.json-preview?_extra=total\r\n- https://latest-with-plugins.datasette.io/github/commits.json-preview?_extra=next_url&_extra=total\r\n- https://latest-with-plugins.datasette.io/github/commits.json-preview?_extra=total&_size=0", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782709425", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782709425, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjcwOTQyNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T16:24:54Z", "updated_at": "2021-02-20T16:24:54Z", "author_association": "OWNER", "body": "Having shortcuts means I could support `?_extra=all` for returning ALL possible keys.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782709270", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782709270, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjcwOTI3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T16:23:51Z", "updated_at": "2021-02-20T16:24:11Z", "author_association": "OWNER", "body": "Also how would you opt out of returning the `\"rows\"` key? I sometimes want to do this - if I want to get back just the count or just the facets for example.\r\n\r\nSome options:\r\n\r\n* `/fixtures/roadside_attractions.json?_extra=total&_extra=-rows`\r\n* `/fixtures/roadside_attractions.json?_extra=total&_skip=rows`\r\n* `/fixtures/roadside_attractions.json?_extra=total&_size=0`\r\n\r\nI quite like that last one with `?_size=0`. 
I think it would still return `\"rows\": []` but that's OK.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-782708938", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 782708938, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjcwODkzOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-20T16:22:14Z", "updated_at": "2021-02-20T16:22:14Z", "author_association": "OWNER", "body": "I'm leaning back in the direction of a flat JSON array of objects as the default - this:\r\n\r\n`/fixtures/roadside_attractions.json`\r\n\r\nWould return:\r\n\r\n```json\r\n[\r\n {\r\n \"pk\": 1,\r\n \"name\": \"The Mystery Spot\",\r\n \"address\": \"465 Mystery Spot Road, Santa Cruz, CA 95065\",\r\n \"latitude\": 37.0167,\r\n \"longitude\": -122.0024\r\n },\r\n {\r\n \"pk\": 2,\r\n \"name\": \"Winchester Mystery House\",\r\n \"address\": \"525 South Winchester Boulevard, San Jose, CA 95128\",\r\n \"latitude\": 37.3184,\r\n \"longitude\": -121.9511\r\n },\r\n {\r\n \"pk\": 3,\r\n \"name\": \"Burlingame Museum of PEZ Memorabilia\",\r\n \"address\": \"214 California Drive, Burlingame, CA 94010\",\r\n \"latitude\": 37.5793,\r\n \"longitude\": -122.3442\r\n },\r\n {\r\n \"pk\": 4,\r\n \"name\": \"Bigfoot Discovery Museum\",\r\n \"address\": \"5497 Highway 9, Felton, CA 95018\",\r\n \"latitude\": 37.0414,\r\n \"longitude\": -122.0725\r\n }\r\n]\r\n```\r\nTo get the version that includes pagination information you would use the `?_extra=` parameter. For example:\r\n\r\n`/fixtures/roadside_attractions.json?_extra=total&_extra=next_url`\r\n\r\n```json\r\n{\r\n \"rows\": [\r\n {\r\n \"pk\": 1,\r\n \"name\": \"The Mystery Spot\",\r\n \"address\": \"465 Mystery Spot Road, Santa Cruz, CA 95065\",\r\n \"latitude\": 37.0167,\r\n \"longitude\": -122.0024\r\n },\r\n {\r\n \"pk\": 2,\r\n \"name\": \"Winchester Mystery House\",\r\n \"address\": \"525 South Winchester Boulevard, San Jose, CA 95128\",\r\n \"latitude\": 37.3184,\r\n \"longitude\": -121.9511\r\n },\r\n {\r\n \"pk\": 3,\r\n \"name\": \"Burlingame Museum of PEZ Memorabilia\",\r\n \"address\": \"214 California Drive, Burlingame, CA 94010\",\r\n \"latitude\": 37.5793,\r\n \"longitude\": -122.3442\r\n },\r\n {\r\n \"pk\": 4,\r\n \"name\": \"Bigfoot Discovery Museum\",\r\n \"address\": \"5497 Highway 9, Felton, CA 95018\",\r\n \"latitude\": 37.0414,\r\n \"longitude\": -122.0725\r\n }\r\n ],\r\n \"total\": 4,\r\n \"next_url\": null\r\n}\r\n```\r\nANY usage of the `?_extra=` parameter would turn the list into an object with a `\"rows\"` key.\r\n\r\nOpting in to the `total` is nice because it's actually expensive to run a count, so only doing a count if the user requests it feels good.\r\n\r\nBut... having to add `?_extra=total&_extra=next_url` for the common case of wanting both the total count and the URL to get the next page of results is a bit verbose. 
So maybe support aliases, like `?_extra=paginated` which is a shortcut for `?_extra=total&_extra=next_url`?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782464306", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782464306, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ2NDMwNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:57:32Z", "updated_at": "2021-02-19T23:57:32Z", "author_association": "OWNER", "body": "Need to test this on mobile.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782464215", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782464215, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ2NDIxNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:57:13Z", "updated_at": "2021-02-19T23:57:13Z", "author_association": "OWNER", "body": "Now live on https://latest.datasette.io/_memory", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782462049", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782462049, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ2MjA0OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:51:12Z", "updated_at": "2021-02-19T23:51:12Z", "author_association": "OWNER", "body": "![resize-demo](https://user-images.githubusercontent.com/9599/108573758-4914eb00-72ca-11eb-989c-e642eee68021.gif)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782459550", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782459550, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ1OTU1MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:45:30Z", "updated_at": "2021-02-19T23:45:30Z", "author_association": "OWNER", "body": "Encoded using 
https://meyerweb.com/eric/tools/dencoder/\r\n\r\n`%3Csvg%20aria-labelledby%3D%22cm-drag-to-resize%22%20role%3D%22img%22%20fill%3D%22%23ccc%22%20stroke%3D%22%23ccc%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%2016%2016%22%20width%3D%2216%22%20height%3D%2216%22%3E%0A%20%20%3Ctitle%20id%3D%22cm-drag-to-resize%22%3EDrag%20to%20resize%3C%2Ftitle%3E%0A%20%20%3Cpath%20fill-rule%3D%22evenodd%22%20d%3D%22M1%202.75A.75.75%200%20011.75%202h12.5a.75.75%200%20110%201.5H1.75A.75.75%200%20011%202.75zm0%205A.75.75%200%20011.75%207h12.5a.75.75%200%20110%201.5H1.75A.75.75%200%20011%207.75zM1.75%2012a.75.75%200%20100%201.5h12.5a.75.75%200%20100-1.5H1.75z%22%3E%3C%2Fpath%3E%0A%3C%2Fsvg%3E`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782459405", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782459405, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ1OTQwNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:45:02Z", "updated_at": "2021-02-19T23:45:02Z", "author_association": "OWNER", "body": "I'm going to use a variant of the Datasette menu icon. Here it is in `#ccc` with an ARIA label:\r\n\r\n```svg\r\n<svg aria-labelledby=\"cm-drag-to-resize\" role=\"img\" fill=\"#ccc\" stroke=\"#ccc\" xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 16 16\" width=\"16\" height=\"16\">\r\n  <title id=\"cm-drag-to-resize\">Drag to resize</title>\r\n  <path fill-rule=\"evenodd\" d=\"M1 2.75A.75.75 0 011.75 2h12.5a.75.75 0 110 1.5H1.75A.75.75 0 011 2.75zm0 5A.75.75 0 011.75 7h12.5a.75.75 0 110 1.5H1.75A.75.75 0 011 7.75zM1.75 12a.75.75 0 100 1.5h12.5a.75.75 0 100-1.5H1.75z\"></path>\r\n</svg>\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782458983", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782458983, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ1ODk4Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:43:34Z", "updated_at": "2021-02-19T23:43:34Z", "author_association": "OWNER", "body": "I only want it to resize up and down, not left to right - so I'm not keen on the default resize handle:\r\n\r\n\"cm-resize_demo\"\r\n\r\nhttps://rawgit.com/Sphinxxxx/cm-resize/master/demo/index.html", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1236#issuecomment-782458744", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1236", "id": 782458744, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQ1ODc0NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T23:42:42Z", "updated_at": "2021-02-19T23:42:42Z", "author_association": "OWNER", "body": "I can use https://github.com/Sphinxxxx/cm-resize for this", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 812228314, "label": "Ability to increase size of the SQL editor window"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1212#issuecomment-782430028", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1212", "id": 
782430028, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjQzMDAyOA==", "user": {"value": 4488943, "label": "kbaikov"}, "created_at": "2021-02-19T22:54:13Z", "updated_at": "2021-02-19T22:54:13Z", "author_association": "CONTRIBUTOR", "body": "I will close this issue since it appears only in my particular setup.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 797651831, "label": "Tests are very slow. "}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/619#issuecomment-782246111", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/619", "id": 782246111, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjI0NjExMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T18:11:22Z", "updated_at": "2021-02-19T18:11:22Z", "author_association": "OWNER", "body": "Big usability improvement, see also #1236", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 520655983, "label": "\"Invalid SQL\" page should let you edit the SQL"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1229#issuecomment-782053455", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1229", "id": 782053455, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MjA1MzQ1NQ==", "user": {"value": 295329, "label": "camallen"}, "created_at": "2021-02-19T12:47:19Z", "updated_at": "2021-02-19T12:47:19Z", "author_association": "CONTRIBUTOR", "body": "I believe this pr and #1031 are related and fix the same issue.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 810507413, "label": "ensure immutable databses when starting in configuration directory mode with"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/236#issuecomment-781825726", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/236", "id": 781825726, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTgyNTcyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T05:10:41Z", "updated_at": "2021-02-19T05:10:41Z", "author_association": "OWNER", "body": "Documentation: https://sqlite-utils.datasette.io/en/latest/cli.html#attaching-additional-databases", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811680502, "label": "--attach command line option for attaching extra databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/113#issuecomment-781825187", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/113", "id": 781825187, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTgyNTE4Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T05:09:12Z", "updated_at": "2021-02-19T05:09:12Z", "author_association": "OWNER", "body": "Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#attaching-additional-databases", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 621286870, "label": "Syntactic sugar for ATTACH DATABASE"}, "performed_via_github_app": 
null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781764561", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781764561, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTc2NDU2MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T02:10:21Z", "updated_at": "2021-02-19T02:10:21Z", "author_association": "OWNER", "body": "This feature is now released! https://docs.datasette.io/en/stable/changelog.html#v0-55", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 1, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1235#issuecomment-781736855", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1235", "id": 781736855, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTczNjg1NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T00:52:47Z", "updated_at": "2021-02-19T01:47:53Z", "author_association": "OWNER", "body": "I bumped the two lines in the `Dockerfile` to `FROM python:3.7.10-slim-stretch as build` and ran this to build it:\r\n\r\n docker build -f Dockerfile -t datasetteproject/datasette:python-3-7-10 .\r\n\r\nThen I ran it with:\r\n\r\n docker run -p 8001:8001 -v `pwd`:/mnt datasetteproject/datasette:python-3-7-10 datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db\r\n\r\nhttp://0.0.0.0:8001/-/versions confirmed that it was now running Python 3.7.10", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811589344, "label": "Upgrade Python version used by official Datasette Docker image"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1235#issuecomment-781735887", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1235", "id": 781735887, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTczNTg4Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-19T00:50:21Z", "updated_at": "2021-02-19T00:50:55Z", "author_association": "OWNER", "body": "I'll bump to `3.7.10` for the moment - the fix for 3.8 isn't out until March 1st according to https://news.ycombinator.com/item?id=26186434\r\n\r\nhttps://www.python.org/downloads/release/python-3710/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811589344, "label": "Upgrade Python version used by official Datasette Docker image"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781670827", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781670827, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTY3MDgyNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T22:16:46Z", "updated_at": "2021-02-18T22:16:46Z", "author_association": "OWNER", "body": "Demo is now live here: https://latest.datasette.io/_memory\r\n\r\nThe documentation is at https://docs.datasette.io/en/latest/sql_queries.html#cross-database-queries - it links to this example query: 
https://latest.datasette.io/_memory?sql=select%0D%0A++%27fixtures%27+as+database%2C+*%0D%0Afrom%0D%0A++%5Bfixtures%5D.sqlite_master%0D%0Aunion%0D%0Aselect%0D%0A++%27extra_database%27+as+database%2C+*%0D%0Afrom%0D%0A++%5Bextra_database%5D.sqlite_master", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781599929", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781599929, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU5OTkyOQ==", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2021-02-18T19:59:54Z", "updated_at": "2021-02-18T22:06:42Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=h1) Report\n> Merging [#1232](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=desc) (8876499) into [main](https://codecov.io/gh/simonw/datasette/commit/4df548e7668b5b21d64a267964951e67894f4712?el=desc) (4df548e) will **increase** coverage by `0.03%`.\n> The diff coverage is `100.00%`.\n\n[![Impacted file tree graph](https://codecov.io/gh/simonw/datasette/pull/1232/graphs/tree.svg?width=650&height=150&src=pr&token=eSahVY7kw1)](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=tree)\n\n```diff\n@@ Coverage Diff @@\n## main #1232 +/- ##\n==========================================\n+ Coverage 91.42% 91.46% +0.03% \n==========================================\n Files 32 32 \n Lines 3955 3970 +15 \n==========================================\n+ Hits 3616 3631 +15 \n Misses 339 339 \n```\n\n\n| [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=tree) | Coverage \u0394 | |\n|---|---|---|\n| [datasette/app.py](https://codecov.io/gh/simonw/datasette/pull/1232/diff?src=pr&el=tree#diff-ZGF0YXNldHRlL2FwcC5weQ==) | `95.68% <100.00%> (+0.06%)` | :arrow_up: |\n| [datasette/cli.py](https://codecov.io/gh/simonw/datasette/pull/1232/diff?src=pr&el=tree#diff-ZGF0YXNldHRlL2NsaS5weQ==) | `76.62% <100.00%> (+0.36%)` | :arrow_up: |\n| [datasette/views/database.py](https://codecov.io/gh/simonw/datasette/pull/1232/diff?src=pr&el=tree#diff-ZGF0YXNldHRlL3ZpZXdzL2RhdGFiYXNlLnB5) | `97.19% <100.00%> (+0.01%)` | :arrow_up: |\n\n------\n\n[Continue to review full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=continue).\n> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)\n> `\u0394 = absolute (impact)`, `\u00f8 = not affected`, `? = missing data`\n> Powered by [Codecov](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=footer). Last update [4df548e...8876499](https://codecov.io/gh/simonw/datasette/pull/1232?src=pr&el=lastupdated). 
Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781665560", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781665560, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTY2NTU2MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T22:06:14Z", "updated_at": "2021-02-18T22:06:14Z", "author_association": "OWNER", "body": "The implementation in #1232 is ready to land. It's the simplest-thing-that-could-possibly-work: you can run `datasette one.db two.db three.db --crossdb` and then use the `/_memory` page to run joins across tables from multiple databases.\r\n\r\nIt only works on the first 10 databases that were passed to the command-line. This means that if you have a Datasette instance with hundreds of attached databases (see [Datasette Library](https://github.com/simonw/datasette/issues/417)) this won't be particularly useful for you.\r\n\r\nSo... a better, future version of this feature would be one that lets you join across databases on command - maybe by hitting `/_memory?attach=db1&attach=db2` to get a special connection.\r\n\r\nAlso worth noting: plugins that implement the [prepare_connection()](https://docs.datasette.io/en/stable/plugin_hooks.html#prepare-connection-conn-database-datasette) hook can attach additional databases - so if you need better, customized support for this one way to handle that would be with a custom plugin.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781651283", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781651283, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTY1MTI4Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T21:37:55Z", "updated_at": "2021-02-18T21:37:55Z", "author_association": "OWNER", "body": "UI listing the attached tables:\r\n\r\n\"_memory\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781641728", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781641728, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTY0MTcyOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T21:19:34Z", "updated_at": "2021-02-18T21:19:34Z", "author_association": "OWNER", "body": "I tested the demo deployment like this:\r\n```\r\ndatasette publish cloudrun fixtures.db extra_database.db \\ \r\n -m fixtures.json \\\r\n --plugins-dir=plugins \\\r\n --branch=crossdb \\\r\n --extra-options=\"--setting template_debug 1 --crossdb\" \\\r\n --install=pysqlite3-binary \\\r\n --service=datasette-latest-crossdb\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, 
\"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781637292", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781637292, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTYzNzI5Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T21:11:31Z", "updated_at": "2021-02-18T21:11:31Z", "author_association": "OWNER", "body": "Due to bug #1233 I'm going to publish the additional database as `extra_database.db` rather than `extra database.db` as it is used in the tests.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1233#issuecomment-781636590", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1233", "id": 781636590, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTYzNjU5MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T21:10:08Z", "updated_at": "2021-02-18T21:10:08Z", "author_association": "OWNER", "body": "I think the bug is here: https://github.com/simonw/datasette/blob/640ac7071b73111ba4423812cd683756e0e1936b/datasette/utils/__init__.py#L349-L373", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811458446, "label": "\"datasette publish cloudrun\" cannot publish files with spaces in their name"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781634819", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781634819, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTYzNDgxOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T21:06:43Z", "updated_at": "2021-02-18T21:06:43Z", "author_association": "OWNER", "body": "I'll document this option on https://docs.datasette.io/en/stable/sql_queries.html", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781629841", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781629841, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTYyOTg0MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T20:57:23Z", "updated_at": "2021-02-18T20:57:23Z", "author_association": "OWNER", "body": "The new warning looks like this:\r\n\r\n\"datasette_\u2014_pipenv_shell_\u25b8_Python_\u2014_182\u00d766\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781598585", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781598585, "node_id": 
"MDEyOklzc3VlQ29tbWVudDc4MTU5ODU4NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T19:57:30Z", "updated_at": "2021-02-18T19:57:30Z", "author_association": "OWNER", "body": "It would also be neat if https://latest.datasette.io/ had multiple databases attached in order to demonstrate this feature.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1232#issuecomment-781594632", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1232", "id": 781594632, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU5NDYzMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T19:50:21Z", "updated_at": "2021-02-18T19:50:21Z", "author_association": "OWNER", "body": "It would be neat if the `/_memory` page showed a list of attached databases, to indicate that the `--crossdb` option is working and give people links to click to start running queries.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811407131, "label": "--crossdb option for joining across databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781593169", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781593169, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU5MzE2OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T19:47:34Z", "updated_at": "2021-02-18T19:47:34Z", "author_association": "OWNER", "body": "I have a working version now, moving development to a pull request.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781591015", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781591015, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU5MTAxNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T19:44:02Z", "updated_at": "2021-02-18T19:44:02Z", "author_association": "OWNER", "body": "For the moment I'm going to hard-code a `SQLITE_LIMIT_ATTACHED=10` constant and only attach the first 10 databases to the `_memory` connection.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781574786", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781574786, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU3NDc4Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T19:15:37Z", "updated_at": "2021-02-18T19:15:37Z", "author_association": "OWNER", "body": "`select * from pragma_database_list();` is useful - shows all attached databases for the current connection.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, 
\"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781573676", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781573676, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU3MzY3Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T19:13:30Z", "updated_at": "2021-02-18T19:13:30Z", "author_association": "OWNER", "body": "It turns out SQLite defaults to a maximum of 10 attached databases. This can be increased using a compile-time constant, but even with that it cannot be more than 62: https://stackoverflow.com/questions/9845448/attach-limit-10", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1231#issuecomment-781560989", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1231", "id": 781560989, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU2MDk4OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T18:50:53Z", "updated_at": "2021-02-18T18:50:53Z", "author_association": "OWNER", "body": "Ideally I'd figure out a way to replicate this error in a concurrent unit test.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811367257, "label": "Race condition errors in new refresh_schemas() mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1231#issuecomment-781560865", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1231", "id": 781560865, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU2MDg2NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T18:50:38Z", "updated_at": "2021-02-18T18:50:38Z", "author_association": "OWNER", "body": "I started trying to use locks to resolve this but I've not figured out the right way to do that yet - here's my first experiment:\r\n```diff\r\ndiff --git a/datasette/app.py b/datasette/app.py\r\nindex 9e15a16..1681c9d 100644\r\n--- a/datasette/app.py\r\n+++ b/datasette/app.py\r\n@@ -217,6 +217,7 @@ class Datasette:\r\n self.inspect_data = inspect_data\r\n self.immutables = set(immutables or [])\r\n self.databases = collections.OrderedDict()\r\n+ self._refresh_schemas_lock = threading.Lock()\r\n if memory or not self.files:\r\n self.add_database(Database(self, is_memory=True), name=\"_memory\")\r\n # memory_name is a random string so that each Datasette instance gets its own\r\n@@ -324,6 +325,13 @@ class Datasette:\r\n self.client = DatasetteClient(self)\r\n \r\n async def refresh_schemas(self):\r\n+ return\r\n+ if self._refresh_schemas_lock.locked():\r\n+ return\r\n+ with self._refresh_schemas_lock:\r\n+ await self._refresh_schemas()\r\n+\r\n+ async def _refresh_schemas(self):\r\n internal_db = self.databases[\"_internal\"]\r\n if not self.internal_db_created:\r\n await init_internal_db(internal_db)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811367257, "label": "Race condition errors in new refresh_schemas() mechanism"}, "performed_via_github_app": null} 
{"html_url": "https://github.com/simonw/datasette/issues/1226#issuecomment-781546512", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1226", "id": 781546512, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTU0NjUxMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T18:26:19Z", "updated_at": "2021-02-18T18:26:19Z", "author_association": "OWNER", "body": "This broke CI: https://github.com/simonw/datasette/runs/1929355965?check_suite_focus=true", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808843401, "label": "--port option should validate port is between 0 and 65535"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1226#issuecomment-781530157", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1226", "id": 781530157, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTUzMDE1Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T18:00:15Z", "updated_at": "2021-02-18T18:00:15Z", "author_association": "OWNER", "body": "I can use `click.IntRange(min=None, max=None)` for this. https://click.palletsprojects.com/en/7.x/options/#ranges - inclusive on both edges.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808843401, "label": "--port option should validate port is between 0 and 65535"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-781451701", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4", "id": 781451701, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTQ1MTcwMQ==", "user": {"value": 203343, "label": "Btibert3"}, "created_at": "2021-02-18T16:06:21Z", "updated_at": "2021-02-18T16:06:21Z", "author_association": "NONE", "body": "Awesome!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 778380836, "label": "Feature Request: Gmail"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1230#issuecomment-781330466", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1230", "id": 781330466, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTMzMDQ2Ng==", "user": {"value": 7107523, "label": "Kabouik"}, "created_at": "2021-02-18T13:06:22Z", "updated_at": "2021-02-18T15:22:15Z", "author_association": "NONE", "body": "[Edit] Oh, I just saw the \"Load all\" button under the cluster map as well as the [setting to alter the max number or results](https://docs.datasette.io/en/stable/settings.html#max-returned-rows). So I guess this issue only is about the Vega charts.\r\n\r\n
\r\nNote that datasette-cluster-map also seems to be limited to 998 displayed points: \r\n\r\n![ss-2021-02-18_140548](https://user-images.githubusercontent.com/7107523/108361225-15fb2a80-71ea-11eb-9a19-d885e8513f55.png)\r\n
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 811054000, "label": "Vega charts are plotted only for rows on the visible page, cluster maps only for rows in the remaining pages"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-781077127", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 781077127, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MTA3NzEyNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-18T05:56:30Z", "updated_at": "2021-02-18T05:57:34Z", "author_association": "OWNER", "body": "I'm going to to try prototyping the `--crossdb` option that causes `/_memory` to connect to all databases as a starting point and see how well that works.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 1, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/283#issuecomment-780991910", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/283", "id": 780991910, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MDk5MTkxMA==", "user": {"value": 9308268, "label": "rayvoelker"}, "created_at": "2021-02-18T02:13:56Z", "updated_at": "2021-02-18T02:13:56Z", "author_association": "NONE", "body": "I was going ask you about this issue when we talk during your office-hours schedule this Friday, but was there any support ever added for doing this cross-database joining?\r\n\r\nI have a use-case where could be pretty neat to do analysis using this tool on time-specific databases from snapshots\r\n\r\nhttps://ilsweb.cincinnatilibrary.org/collection-analysis/\r\n\r\n![image](https://user-images.githubusercontent.com/9308268/108294883-ba3a8e00-7164-11eb-9206-fcd5a8cdd883.png)\r\n\r\nand thanks again for such an amazing tool!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 325958506, "label": "Support cross-database joins"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-780817596", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4", "id": 780817596, "node_id": "MDEyOklzc3VlQ29tbWVudDc4MDgxNzU5Ng==", "user": {"value": 306240, "label": "UtahDave"}, "created_at": "2021-02-17T20:01:35Z", "updated_at": "2021-02-17T20:01:35Z", "author_association": "NONE", "body": "I've got this almost working. 
Just needs some polish", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 778380836, "label": "Feature Request: Gmail"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/227#issuecomment-779785638", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/227", "id": 779785638, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTc4NTYzOA==", "user": {"value": 295329, "label": "camallen"}, "created_at": "2021-02-16T11:48:03Z", "updated_at": "2021-02-16T11:48:03Z", "author_association": "NONE", "body": "Thank you @simonw ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807174161, "label": "Error reading csv files with large column data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1226#issuecomment-779467451", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1226", "id": 779467451, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ2NzQ1MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T22:02:46Z", "updated_at": "2021-02-15T22:02:46Z", "author_association": "OWNER", "body": "I'm OK with the current error message shown if you try to use too low a port:\r\n```\r\ndatasette fivethirtyeight.db -p 800 \r\nINFO: Started server process [45511]\r\nINFO: Waiting for application startup.\r\nINFO: Application startup complete.\r\nERROR: [Errno 13] error while attempting to bind on address ('127.0.0.1', 800): permission denied\r\nINFO: Waiting for application shutdown.\r\nINFO: Application shutdown complete.\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808843401, "label": "--port option should validate port is between 0 and 65535"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1226#issuecomment-779467160", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1226", "id": 779467160, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ2NzE2MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T22:01:53Z", "updated_at": "2021-02-15T22:01:53Z", "author_association": "OWNER", "body": "This check needs to happen in two places:\r\n\r\nhttps://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/cli.py#L222-L227\r\n\r\nhttps://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/cli.py#L328-L333", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808843401, "label": "--port option should validate port is between 0 and 65535"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779416619", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779416619, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQxNjYxOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T19:40:57Z", "updated_at": "2021-02-15T21:27:55Z", "author_association": "OWNER", "body": "Tried this experiment (not proper binary search, it only searches downwards):\r\n```python\r\nimport sqlite3\r\n\r\ndb = 
sqlite3.connect(\":memory:\")\r\n\r\ndef tryit(n):\r\n sql = \"select 1 where 1 in ({})\".format(\", \".join(\"?\" for i in range(n)))\r\n db.execute(sql, [0 for i in range(n)])\r\n\r\n\r\ndef find_limit(min=0, max=5_000_000):\r\n value = max\r\n while True:\r\n print('Trying', value)\r\n try:\r\n tryit(value)\r\n return value\r\n except:\r\n value = value // 2\r\n```\r\nRunning `find_limit()` with those default parameters takes about 1.47s on my laptop:\r\n```\r\nIn [9]: %timeit find_limit()\r\nTrying 5000000\r\nTrying 2500000...\r\n1.47 s \u00b1 28 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\r\n```\r\nInterestingly the value it suggested was 156250 - suggesting that the macOS `sqlite3` binary with a 500,000 limit isn't the same as whatever my Python is using here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779448912", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779448912, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0ODkxMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:09:50Z", "updated_at": "2021-02-15T21:09:50Z", "author_association": "OWNER", "body": "I fiddled around and replaced that line with `batch_size = SQLITE_MAX_VARS // num_columns` - which evaluated to `10416` for this particular file. That got me this:\r\n\r\n 40.71s user 1.81s system 98% cpu 43.081 total\r\n\r\n43s is definitely better than 56s, but it's still not as big as the ~26.5s to ~3.5s improvement described by @simonwiles at the top of this issue. I wonder what I'm missing here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779446652", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779446652, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0NjY1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:04:19Z", "updated_at": "2021-02-15T21:04:19Z", "author_association": "OWNER", "body": "... 
but it looks like `batch_size` is hard-coded to 100, rather than `None` - which means it's not being calculated using that value:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L704\r\n\r\nAnd\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L1877", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779445423", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779445423, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0NTQyMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:00:44Z", "updated_at": "2021-02-15T21:01:09Z", "author_association": "OWNER", "body": "I tried changing the hard-coded value from 999 to 156_250 and running `sqlite-utils insert` against a 500MB CSV file, with these results:\r\n```\r\n(sqlite-utils) sqlite-utils % time sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers\r\n [###################################-] 99% 00:00:00sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv\r\n44.74s user 7.61s system 92% cpu 56.601 total\r\n# Increased the setting here\r\n(sqlite-utils) sqlite-utils % time sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers\r\n [###################################-] 99% 00:00:00sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv\r\n39.40s user 5.15s system 96% cpu 46.320 total\r\n```\r\nNot as big a difference as I was expecting.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779417723", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779417723, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQxNzcyMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T19:44:02Z", "updated_at": "2021-02-15T19:47:00Z", "author_association": "OWNER", "body": "`%timeit find_limit(max=1_000_000)` took 378ms on my laptop\r\n\r\n`%timeit find_limit(max=500_000)` took 197ms\r\n\r\n`%timeit find_limit(max=200_000)` reported 53ms per loop\r\n\r\n`%timeit find_limit(max=100_000)` reported 26.8ms per loop.\r\n\r\nAll of these are still slow enough that I'm not comfortable running this search every time the library is imported. 
Allowing users to opt-in to this as a performance enhancement might be better.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779409770", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779409770, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQwOTc3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T19:23:11Z", "updated_at": "2021-02-15T19:23:11Z", "author_association": "OWNER", "body": "On my Mac right now I'm seeing a limit of 500,000:\r\n```\r\n% sqlite3 -cmd \".limits variable_number\"\r\n variable_number 500000\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/227#issuecomment-778854808", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/227", "id": 778854808, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg1NDgwOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T22:46:54Z", "updated_at": "2021-02-14T22:46:54Z", "author_association": "OWNER", "body": "Fix is released in 3.5.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 1, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807174161, "label": "Error reading csv files with large column data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778851721", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778851721, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg1MTcyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T22:23:46Z", "updated_at": "2021-02-14T22:23:46Z", "author_association": "OWNER", "body": "I called this `--no-headers` for consistency with the existing output option: https://github.com/simonw/sqlite-utils/blob/427dace184c7da57f4a04df07b1e84cdae3261e8/sqlite_utils/cli.py#L61-L64", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778849394", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778849394, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0OTM5NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T22:06:53Z", "updated_at": "2021-02-14T22:06:53Z", "author_association": "OWNER", "body": "For the moment I think just adding `--no-header` - which causes column names \"unknown1,unknown2,...\" to be used - should be enough.\r\n\r\nUsers can import with that option, then use `sqlite-utils transform --rename` to rename them.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, 
"performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/229#issuecomment-778844016", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/229", "id": 778844016, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0NDAxNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:22:45Z", "updated_at": "2021-02-14T21:22:45Z", "author_association": "OWNER", "body": "I'm going to use this pattern from https://stackoverflow.com/a/15063941\r\n```python\r\nimport sys\r\nimport csv\r\nmaxInt = sys.maxsize\r\n\r\nwhile True:\r\n # decrease the maxInt value by factor 10 \r\n # as long as the OverflowError occurs.\r\n\r\n try:\r\n csv.field_size_limit(maxInt)\r\n break\r\n except OverflowError:\r\n maxInt = int(maxInt/10)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807817197, "label": "Hitting `_csv.Error: field larger than field limit (131072)`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/229#issuecomment-778843503", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/229", "id": 778843503, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0MzUwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:18:51Z", "updated_at": "2021-02-14T21:18:51Z", "author_association": "OWNER", "body": "I want to set this to the maximum allowed limit, which seems to be surprisingly hard! That StackOverflow thread is full of ideas for that, many of them involving `ctypes`. I'm a bit loathe to add a dependency on `ctypes` though - even though it's in the Python standard library I worry that it might not be available on some architectures.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807817197, "label": "Hitting `_csv.Error: field larger than field limit (131072)`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/229#issuecomment-778843362", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/229", "id": 778843362, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0MzM2Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:17:53Z", "updated_at": "2021-02-14T21:17:53Z", "author_association": "OWNER", "body": "Same issue as #227.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807817197, "label": "Hitting `_csv.Error: field larger than field limit (131072)`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778811746", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778811746, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxMTc0Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T17:39:30Z", "updated_at": "2021-02-14T21:16:54Z", "author_association": "OWNER", "body": "I'm going to detach this from the #131 column types idea.\r\n\r\nThe three things I need to handle here are:\r\n\r\n- The CSV file doesn't have a header row at all, so I need to specify what the column names should be\r\n- The CSV file DOES have a header row but I want to ignore it and use alternative column names\r\n- The CSV doesn't have a header row at all and 
I want to automatically use `unknown1,unknown2...` so I can start exploring it as quickly as possible.\r\n\r\nHere's a potential design that covers the first two:\r\n\r\n`--replace-header=\"foo,bar,baz\"` - ignore whatever is in the first row and pretend it was this instead\r\n`--add-header=\"foo,bar,baz\"` - add a first row with these details, to use as the header\r\n\r\nIt doesn't cover the \"give me unknown column names\" case though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778843086", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778843086, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0MzA4Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:15:43Z", "updated_at": "2021-02-14T21:15:43Z", "author_association": "OWNER", "body": "I'm not convinced the `.has_header()` rules are useful for the kind of CSV files I work with: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/csv.py#L383\r\n\r\n```python\r\n    def has_header(self, sample):\r\n        # Creates a dictionary of types of data in each column. If any\r\n        # column is of a single type (say, integers), *except* for the first\r\n        # row, then the first row is presumed to be labels. If the type\r\n        # can't be determined, it is assumed to be a string in which case\r\n        # the length of the string is the determining factor: if all of the\r\n        # rows except for the first are the same length, it's a header.\r\n        # Finally, a 'vote' is taken at the end for each column, adding or\r\n        # subtracting from the likelihood of the first row being a header.\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778842982", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778842982, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0Mjk4Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:15:11Z", "updated_at": "2021-02-14T21:15:11Z", "author_association": "OWNER", "body": "Implementation tip: I have code that reads the first row and uses it as headers here: https://github.com/simonw/sqlite-utils/blob/8f042ae1fd323995d966a94e8e6df85cc843b938/sqlite_utils/cli.py#L689-L691\r\n\r\nSo if I want to use `unknown1,unknown2...` I can do that by reading the first row, counting the number of columns, generating headers based on that range and then continuing to build that generator (maybe with `itertools.chain()` to replay the record we already read).\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/227#issuecomment-778841704", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/227", "id": 778841704, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0MTcwNA==", "user": 
{"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:05:20Z", "updated_at": "2021-02-14T21:05:20Z", "author_association": "OWNER", "body": "This has also been reported in #229.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807174161, "label": "Error reading csv files with large column data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/225#issuecomment-778841547", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/225", "id": 778841547, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0MTU0Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:04:13Z", "updated_at": "2021-02-14T21:04:13Z", "author_association": "OWNER", "body": "I added a test and fixed this in #234 - thanks for the fix.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 797159961, "label": "fix for problem in Table.insert_all on search for columns per chunk of rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/234#issuecomment-778841278", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/234", "id": 778841278, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODg0MTI3OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T21:02:11Z", "updated_at": "2021-02-14T21:02:11Z", "author_association": "OWNER", "body": "I managed to replicate this in a test:\r\n```python\r\ndef test_insert_all_with_extra_columns_in_later_chunks(fresh_db):\r\n chunk = [\r\n {\"record\": \"Record 1\"},\r\n {\"record\": \"Record 2\"},\r\n {\"record\": \"Record 3\"},\r\n {\"record\": \"Record 4\", \"extra\": 1},\r\n ]\r\n fresh_db[\"t\"].insert_all(chunk, batch_size=2, alter=True)\r\n assert list(fresh_db[\"t\"].rows) == [\r\n {\"record\": \"Record 1\", \"extra\": None},\r\n {\"record\": \"Record 2\", \"extra\": None},\r\n {\"record\": \"Record 3\", \"extra\": None},\r\n {\"record\": \"Record 4\", \"extra\": 1},\r\n ]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808046597, "label": ".insert_all() fails if subsequent chunks contain additional columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/225#issuecomment-778834504", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/225", "id": 778834504, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgzNDUwNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T20:09:30Z", "updated_at": "2021-02-14T20:09:30Z", "author_association": "OWNER", "body": "Thanks for this. 
I'm going to try and get the test suite to run in Windows on GitHub Actions.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 797159961, "label": "fix for problem in Table.insert_all on search for columns per chunk of rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/231#issuecomment-778829456", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/231", "id": 778829456, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgyOTQ1Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T19:37:52Z", "updated_at": "2021-02-14T19:37:52Z", "author_association": "OWNER", "body": "I'm going to add `limit` and `offset` to the following methods:\r\n\r\n- `rows_where()`\r\n- `search_sql()`\r\n- `search()`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808028757, "label": "limit=X, offset=Y parameters for more Python methods"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/231#issuecomment-778828758", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/231", "id": 778828758, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgyODc1OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T19:33:14Z", "updated_at": "2021-02-14T19:33:14Z", "author_association": "OWNER", "body": "The `limit=` parameter is currently only available on the `.search()` method - it would make sense to add this to other methods as well.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808028757, "label": "limit=X, offset=Y parameters for more Python methods"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/224#issuecomment-778828495", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/224", "id": 778828495, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgyODQ5NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T19:31:06Z", "updated_at": "2021-02-14T19:31:06Z", "author_association": "OWNER", "body": "I'm going to add an `offset=` parameter to support this case. 
Thanks for the suggestion!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 792297010, "label": "Add fts offset docs."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778827570", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778827570, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgyNzU3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T19:24:20Z", "updated_at": "2021-02-14T19:24:20Z", "author_association": "OWNER", "body": "Here's the implementation in Python: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/csv.py#L204-L225", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778824361", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778824361, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgyNDM2MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T18:59:22Z", "updated_at": "2021-02-14T18:59:22Z", "author_association": "OWNER", "body": "I think I've got it. I can use `io.BufferedReader()` to get an object I can run `.peek(2048)` on, then wrap THAT in `io.TextIOWrapper`:\r\n\r\n```python\r\n encoding = encoding or \"utf-8\"\r\n buffered = io.BufferedReader(json_file, buffer_size=4096)\r\n decoded = io.TextIOWrapper(buffered, encoding=encoding, line_buffering=True)\r\n if pk and len(pk) == 1:\r\n pk = pk[0]\r\n if csv or tsv:\r\n if sniff:\r\n # Read first 2048 bytes and use that to detect\r\n first_bytes = buffered.peek(2048)\r\n print('first_bytes', first_bytes)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778821403", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778821403, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgyMTQwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T18:38:16Z", "updated_at": "2021-02-14T18:38:16Z", "author_association": "OWNER", "body": "There are two code paths here that matter:\r\n\r\n- For a regular file, can read the first 2048 bytes, then `.seek(0)` before continuing. That's easy.\r\n- `stdin` is harder. I need to read and buffer the first 2048 bytes, then pass an object to `csv.reader()` which will replay that chunk and then play the rest of stdin.\r\n\r\nI'm a bit stuck on the second one. 
Ideally I could use something like `itertools.chain()` but I can't find an alternative for file-like objects.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778818639", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778818639, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxODYzOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T18:22:38Z", "updated_at": "2021-02-14T18:22:38Z", "author_association": "OWNER", "body": "Maybe I shouldn't be using `StreamReader` at all - https://www.python.org/dev/peps/pep-0400/ suggests that it should be deprecated in favour of `io.TextIOWrapper`. I'm using `StreamReader` due to this line: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L667-L668", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778817494", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778817494, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxNzQ5NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T18:16:06Z", "updated_at": "2021-02-14T18:16:06Z", "author_association": "OWNER", "body": "Types involved:\r\n```\r\n(Pdb) type(json_file.raw)\r\n<class '_io.FileIO'>\r\n(Pdb) type(json_file)\r\n<class '_io.BufferedReader'>\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778816333", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778816333, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxNjMzMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T18:08:44Z", "updated_at": "2021-02-14T18:08:44Z", "author_association": "OWNER", "body": "No, you can't `.seek(0)` on stdin:\r\n```\r\n  File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py\", line 678, in insert_upsert_implementation\r\n    json_file.raw.seek(0)\r\nOSError: [Errno 29] Illegal seek\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778815740", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778815740, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxNTc0MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T18:05:03Z", "updated_at": "2021-02-14T18:05:03Z", "author_association": "OWNER", "body": "The challenge here is how to read the first 2048 bytes and then reset the incoming file.\r\n\r\nThe Python docs example looks like this:\r\n\r\n```python\r\nwith 
open('example.csv', newline='') as csvfile:\r\n    dialect = csv.Sniffer().sniff(csvfile.read(1024))\r\n    csvfile.seek(0)\r\n    reader = csv.reader(csvfile, dialect)\r\n```\r\nHere's the relevant code in `sqlite-utils`: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L671-L679\r\n\r\nThe challenge is going to be having the `--sniff` option work with the progress bar. Here's how `file_progress()` works: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/utils.py#L106-L113\r\n\r\nIf `file.raw` is `stdin` can I do the equivalent of `csvfile.seek(0)` on it?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778812684", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/230", "id": 778812684, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxMjY4NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T17:45:16Z", "updated_at": "2021-02-14T17:45:16Z", "author_association": "OWNER", "body": "Running this could take any CSV (or TSV) file and automatically detect the delimiter. If no header row is detected it could add `unknown1,unknown2` headers:\r\n\r\n    sqlite-utils insert db.db data file.csv --sniff\r\n\r\n(Using `--sniff` would imply `--csv`)\r\n\r\nThis could be called `--sniffer` instead but I like `--sniff` better.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 808008305, "label": "--sniff option for sniffing delimiters"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778812050", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778812050, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxMjA1MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T17:41:30Z", "updated_at": "2021-02-14T17:41:30Z", "author_association": "OWNER", "body": "I just spotted that `csv.Sniffer` in the Python standard library has a `.has_header(sample)` method which detects if the first row appears to be a header or not, which is interesting. https://docs.python.org/3/library/csv.html#csv.Sniffer", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778811934", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778811934, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODgxMTkzNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-14T17:40:48Z", "updated_at": "2021-02-14T17:40:48Z", "author_association": "OWNER", "body": "Another pattern that might be useful is to generate a header that is just \"unknown1,unknown2,unknown3\" for each of the columns in the rest of the file. This makes it easy to e.g. 
facet-explore within Datasette to figure out the correct names, then use `sqlite-utils transform --rename` to rename the columns.\r\n\r\nI needed to do that for the https://bl.iro.bl.uk/work/ns/3037474a-761c-456d-a00c-9ef3c6773f4c example.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778511347", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 778511347, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODUxMTM0Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-12T23:27:50Z", "updated_at": "2021-02-12T23:27:50Z", "author_association": "OWNER", "body": "For the moment, a workaround can be to `cat` an additional row onto the start of the file.\r\n\r\n    echo \"name,url,description\" | cat - missing_headings.csv | sqlite-utils insert blah.db table - --csv", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/131#issuecomment-778510528", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/131", "id": 778510528, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODUxMDUyOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-12T23:25:06Z", "updated_at": "2021-02-12T23:25:06Z", "author_association": "OWNER", "body": "If `-c` isn't available, maybe `-t` or `--type` would work for specifying column types:\r\n```\r\nsqlite-utils insert db.db images images.tsv \\\r\n    --tsv \\\r\n    --type id int \\\r\n    --type score float\r\n```\r\nor\r\n```\r\nsqlite-utils insert db.db images images.tsv \\\r\n    --tsv \\\r\n    -t id int \\\r\n    -t score float\r\n```", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 675753042, "label": "sqlite-utils insert: options for column types"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/131#issuecomment-778508887", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/131", "id": 778508887, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODUwODg4Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-12T23:20:11Z", "updated_at": "2021-02-12T23:20:11Z", "author_association": "OWNER", "body": "Annoyingly `-c` is currently a shortcut for `--csv` - so I'd have to do a major version bump to use that.\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L601-L603\r\n\r\nParticularly annoying because I attempted to remove the `-c` shortcut in https://github.com/simonw/sqlite-utils/commit/2c00567aac6d9c79087cfff0d054f64922b1473d#diff-76294b3d4afeb27e74e738daa01c26dd4dc9ccb6f4477451483a2ece1095902eL48 but forgot to remove it from the input options (I removed it from the output options).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 675753042, "label": "sqlite-utils insert: options for column types"}, 
"performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1220#issuecomment-778467759", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1220", "id": 778467759, "node_id": "MDEyOklzc3VlQ29tbWVudDc3ODQ2Nzc1OQ==", "user": {"value": 30607, "label": "aborruso"}, "created_at": "2021-02-12T21:35:17Z", "updated_at": "2021-02-12T21:35:17Z", "author_association": "NONE", "body": "Thank you", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 806743116, "label": "Installing datasette via docker: Path 'fixtures.db' does not exist"}, "performed_via_github_app": null}