github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/pull/1835#issuecomment-1270586897 | https://api.github.com/repos/simonw/datasette/issues/1835 | 1270586897 | IC_kwDOBm6k_c5Lu54R | 9599 | 2022-10-06T19:34:00Z | 2022-10-06T19:34:00Z | OWNER | Wow, great catch! The whole point of inspect data was to avoid this kind of expensive operation on startup so this makes total sense - I had no idea Datasette was still trying to hash a giant file every time the server started. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400121355 | |
https://github.com/simonw/datasette/issues/1480#issuecomment-1269275153 | https://api.github.com/repos/simonw/datasette/issues/1480 | 1269275153 | IC_kwDOBm6k_c5Lp5oR | 9599 | 2022-10-06T03:54:33Z | 2022-10-06T03:54:33Z | OWNER | I've been having success using Fly recently for a project which I thought would be too large for Cloud Run. I wrote about that here: - https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/ | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1015646369 | |
https://github.com/simonw/datasette/issues/1832#issuecomment-1267925830 | https://api.github.com/repos/simonw/datasette/issues/1832 | 1267925830 | IC_kwDOBm6k_c5LkwNG | 9599 | 2022-10-05T04:31:57Z | 2022-10-05T04:31:57Z | OWNER | Turns out this already works - `__bool__` falls back on `__len__`: https://docs.python.org/3/reference/datamodel.html#object.__bool__ > When this method is not defined, [`__len__()`](https://docs.python.org/3/reference/datamodel.html#object.__len__ "object.__len__") is called, if it is defined, and the object is considered true if its result is nonzero. I'll add a test to demonstrate this. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1397193691 | |
https://github.com/simonw/datasette/issues/1832#issuecomment-1267918117 | https://api.github.com/repos/simonw/datasette/issues/1832 | 1267918117 | IC_kwDOBm6k_c5LkuUl | 9599 | 2022-10-05T04:19:52Z | 2022-10-05T04:19:52Z | OWNER | Code can go here: https://github.com/simonw/datasette/blob/b6ba117b7978b58b40e3c3c2b723b92c3010ed53/datasette/database.py#L511-L515 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1397193691 | |
https://github.com/simonw/datasette/issues/1829#issuecomment-1267709546 | https://api.github.com/repos/simonw/datasette/issues/1829 | 1267709546 | IC_kwDOBm6k_c5Lj7Zq | 9599 | 2022-10-04T23:19:24Z | 2022-10-04T23:21:07Z | OWNER | There's also a `check_visibility()` helper which I'm not using in these particular cases but which may be relevant. It's called like this: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L65-L77 And is defined here: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/app.py#L694-L710 It's actually documented as a public method here: https://docs.datasette.io/en/stable/internals.html#await-check-visibility-actor-action-resource-none > This convenience method can be used to answer the question "should this item be considered private, in that it is visible to me but it is not visible to anonymous users?" > > It returns a tuple of two booleans, `(visible, private)`. `visible` indicates if the actor can see this resource. `private` will be `True` if an anonymous user would not be able to view the resource. Note that this documented method cannot actually do the right thing - because it's not being given the multiple permissions that need to be checked in order to completely answer the question. So I probably need to redesign that method a bit. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1396948693 | |
https://github.com/simonw/datasette/issues/1829#issuecomment-1267708232 | https://api.github.com/repos/simonw/datasette/issues/1829 | 1267708232 | IC_kwDOBm6k_c5Lj7FI | 9599 | 2022-10-04T23:17:36Z | 2022-10-04T23:17:36Z | OWNER | Here's the relevant code from the table page: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/table.py#L215-L227 Note how `ensure_permissions()` there takes the table, database and instance into account... but the `private` assignment (used to decide if the padlock should display or not) only considers the `view-table` check. Here's the same code for the database page: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L139-L141 And for canned query pages: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L228-L240 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1396948693 | |
https://github.com/simonw/datasette/issues/485#issuecomment-1264769569 | https://api.github.com/repos/simonw/datasette/issues/485 | 1264769569 | IC_kwDOBm6k_c5LYtoh | 9599 | 2022-10-03T00:04:42Z | 2022-10-03T00:04:42Z | OWNER | I love these tips - tools that can compile a simple machine learning model to a SQL query! Would be pretty cool if I could bundle a model in Datasette itself as a big in-memory SQLite SQL query: - https://github.com/Chryzanthemum/xgb2sql - https://github.com/konstantint/SKompiler | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
447469253 | |
https://github.com/simonw/datasette/issues/1805#issuecomment-1264753894 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1264753894 | IC_kwDOBm6k_c5LYpzm | 9599 | 2022-10-02T23:02:54Z | 2022-10-02T23:02:54Z | OWNER | I'm tempted to add `word-wrap: anywhere` only to links that are known to be longer than a certain threshold. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/datasette/issues/1805#issuecomment-1264753725 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1264753725 | IC_kwDOBm6k_c5LYpw9 | 9599 | 2022-10-02T23:02:17Z | 2022-10-02T23:02:17Z | OWNER | After reverting `word--wrap anywhere` https://latest.datasette.io/_memory?sql=select+%27https%3A%2F%2Fexample.com%2Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/datasette/issues/1828#issuecomment-1264753439 | https://api.github.com/repos/simonw/datasette/issues/1828 | 1264753439 | IC_kwDOBm6k_c5LYpsf | 9599 | 2022-10-02T23:01:17Z | 2022-10-02T23:01:17Z | OWNER | That change deployed and https://github-to-sqlite.dogsheep.net/github/commits now looks like this: <img width="1388" alt="image" src="https://user-images.githubusercontent.com/9599/193480158-de81ac0a-5cb2-4d53-a75c-025c78f293ee.png"> | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1393903845 | |
https://github.com/simonw/datasette/issues/1828#issuecomment-1264738081 | https://api.github.com/repos/simonw/datasette/issues/1828 | 1264738081 | IC_kwDOBm6k_c5LYl8h | 9599 | 2022-10-02T21:34:37Z | 2022-10-02T21:34:37Z | OWNER | I'm running a build of that demo instance here (takes ~30m) https://github.com/dogsheep/github-to-sqlite/actions/runs/3170164705 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1393903845 | |
https://github.com/simonw/datasette/issues/485#issuecomment-1264737290 | https://api.github.com/repos/simonw/datasette/issues/485 | 1264737290 | IC_kwDOBm6k_c5LYlwK | 9599 | 2022-10-02T21:29:59Z | 2022-10-02T21:29:59Z | OWNER | To clarify: the feature this issue is talking about relates to the way Datasette automatically displays foreign key relationships, for example on this page: https://github-to-sqlite.dogsheep.net/github/commits <img width="1233" alt="image" src="https://user-images.githubusercontent.com/9599/193476985-d41148cf-2b2f-49b9-b717-e92145afab31.png"> Each of those columns is a foreign key to another table. The link text that is displayed there comes from the "label column" that has either been configured or automatically detected for that other table. I wonder if this could be handled with a tiny machine learning model that's trained to help pick the best label column? Inputs to that model could include: - The names of the columns - The number of unique values in each column - The type of each column (or maybe only `TEXT` columns should be considered) - How many `null` values there are - Is the column marked as unique? - What's the average (or median or some other statistic) string length of values in each column? Output would be the most likely label column, or some indicator that no likely candidates had been found. My hunch is that this would be better solved using a few extra heuristics rather than by training a model, but it does feel like an interesting opportunity to experiment with a tiny ML model. Asked for tips about this on Twitter: https://twitter.com/simonw/status/1576680930680262658 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
447469253 | |
https://github.com/simonw/datasette/issues/1805#issuecomment-1264736537 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1264736537 | IC_kwDOBm6k_c5LYlkZ | 9599 | 2022-10-02T21:25:37Z | 2022-10-02T21:25:37Z | OWNER | `word-wrap: anywhere` had some nasty side-effects, removing that: - #1828 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262920929 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262920929 | IC_kwDOCGYnMM5LRqTh | 9599 | 2022-09-29T23:06:44Z | 2022-09-29T23:06:44Z | OWNER | Currently the only other use of `-t` is for this: ``` -t, --table Output as a formatted table ``` So I think it's OK to use it to mean something slightly different for this command, since `sqlite-utils insert` doesn't do any output of data in any format. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262918833 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262918833 | IC_kwDOCGYnMM5LRpyx | 9599 | 2022-09-29T23:02:52Z | 2022-09-29T23:02:52Z | OWNER | The other nice thing about having this as a separate command is that I can implement a tiny subset of the overall `sqlite-utils insert` features at first, and then add additional features in subsequent releases. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262917059 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262917059 | IC_kwDOCGYnMM5LRpXD | 9599 | 2022-09-29T22:59:28Z | 2022-09-29T22:59:28Z | OWNER | I quite like `sqlite-utils fast-csv` - I think it's clear enough what it does, and running `--help` can clarify if needed. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262915322 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262915322 | IC_kwDOCGYnMM5LRo76 | 9599 | 2022-09-29T22:57:31Z | 2022-09-29T22:57:42Z | OWNER | Maybe `sqlite-utils fast-csv` is right? Not entirely clear that's an insert though as opposed to a faster version of in-memory querying in the style of `sqlite-utils memory`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262914416 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262914416 | IC_kwDOCGYnMM5LRotw | 9599 | 2022-09-29T22:56:53Z | 2022-09-29T22:56:53Z | OWNER | Potential names/designs: - `sqlite-utils fast data.db rows rows.csv` - `sqlite-utils insert-fast data.db rows rows.csv` - `sqlite-utils fast-csv data.db rows rows.csv` Or more interestingly... what if it could accept multiple CSV files to create multiple tables? - `sqlite-utils fast data.db rows.csv other.csv` Would still need to support creating tables with different names though. Maybe like this: - `sqlite-utils fast data.db -t mytable rows.csv -t othertable other.csv` I seem to be leaning towards `fast` as the command name, but as a standalone command name it's a bit meaningless - how do we know that's about CSV import and not about fast querying or similar? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262913145 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262913145 | IC_kwDOCGYnMM5LRoZ5 | 9599 | 2022-09-29T22:54:13Z | 2022-09-29T22:54:13Z | OWNER | After reviewing `sqlite-utils insert --help` I'm confident that MOST of these options wouldn't make sense for a "fast" mode that just supports CSV and works by piping directly to the `sqlite3` binary: https://github.com/simonw/sqlite-utils/blob/d792dad1cf5f16525da81b1e162fb71d469995f3/docs/cli-reference.rst#L251-L279 I'm going to implement a separate command instead. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/datasette/pull/1825#issuecomment-1260368537 | https://api.github.com/repos/simonw/datasette/issues/1825 | 1260368537 | IC_kwDOBm6k_c5LH7KZ | 9599 | 2022-09-28T04:21:18Z | 2022-09-28T04:21:18Z | OWNER | This is great, thank you very much! https://datasette--1825.org.readthedocs.build/en/1825/deploying.html#running-datasette-using-openrc | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388227245 | |
https://github.com/simonw/datasette/issues/1826#issuecomment-1260357878 | https://api.github.com/repos/simonw/datasette/issues/1826 | 1260357878 | IC_kwDOBm6k_c5LH4j2 | 9599 | 2022-09-28T04:05:45Z | 2022-09-28T04:05:45Z | OWNER | Though now I notice that the copy right there needs to be updated to reflect the new `row` parameter to `render_cell`! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388631785 | |
https://github.com/simonw/datasette/issues/1826#issuecomment-1260357583 | https://api.github.com/repos/simonw/datasette/issues/1826 | 1260357583 | IC_kwDOBm6k_c5LH4fP | 9599 | 2022-09-28T04:05:16Z | 2022-09-28T04:05:16Z | OWNER | This is deliberate. The Datasette plugin system allows you to specify only a subset of the parameters for a hook - in this example, only the `value` is needed so the others can be omitted. There's a note about this at the very top of that documentation page: https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hooks > When you implement a plugin hook you can accept any or all of the parameters that are documented as being passed to that hook. > > For example, you can implement the `render_cell` plugin hook like this even though the full documented hook signature is `render_cell(value, column, table, database, datasette)`: > ```python > @hookimpl > def render_cell(value, column): > if column == "stars": > return "*" * int(value) > ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388631785 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1260355224 | https://api.github.com/repos/simonw/datasette/issues/526 | 1260355224 | IC_kwDOBm6k_c5LH36Y | 9599 | 2022-09-28T04:01:25Z | 2022-09-28T04:01:25Z | OWNER | The ultimate protection against those memory bombs is to support more streaming output formats. Related issues: - #1177 - #1062 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1259693536 | https://api.github.com/repos/simonw/datasette/issues/526 | 1259693536 | IC_kwDOBm6k_c5LFWXg | 9599 | 2022-09-27T15:42:55Z | 2022-09-27T15:42:55Z | OWNER | It's interesting to note WHY the time limit works against this so well. The time limit as-implemented looks like this: https://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201 The key here is `conn.set_progress_handler(handler, n)` - which specifies that the handler function should be called every `n` SQLite operations. The handler function then checks to see if too much time has transpired and conditionally cancels the query. This also doubles up as a "maximum number of operations" guard, which is what's happening when you attempt to fetch an infinite number of rows from an infinite table. That limit code could even be extended to say "exit the query after either 5s or 50,000,000 operations". I don't think that's necessary though. To be honest I'm having trouble with the idea of dropping `max_returned_rows` mainly because what Datasette does (allow arbitrary untrusted SQL queries) is dangerous, so I've designed in multiple redundant defence-in-depth mechanisms right from the start. | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258906440 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258906440 | IC_kwDOBm6k_c5LCWNI | 9599 | 2022-09-27T03:04:37Z | 2022-09-27T03:04:37Z | OWNER | It would be really neat if we could explore this idea in a plugin, but I don't think Datasette has plugin hooks in the right place for that at the moment. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258905781 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258905781 | IC_kwDOBm6k_c5LCWC1 | 9599 | 2022-09-27T03:03:35Z | 2022-09-27T03:03:47Z | OWNER | Yes good point, the time limit does already protect against that. I've been contemplating a permissioned-users-only relaxation of that time limit too, and I got that idea mixed up with this one in my head. On that basis maybe this feature would be safe after all? Would need to do some testing, but it may be that the existing time limit provides enough protection here already. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258864140 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258864140 | IC_kwDOBm6k_c5LCL4M | 9599 | 2022-09-27T01:55:32Z | 2022-09-27T01:55:32Z | OWNER | That recursive query is a great example of the kind of thing having a maximum row limit protects against. Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever. Even if this feature becomes a permission-guarded thing we still need to take that case into account. At the very least it would be good if the query could be cancelled if the client disconnects - so if someone accidentally starts an infinite query they can cancel the request and free up the server resources. It might be a good idea to implement a page that shows "currently running" queries and allows users with the right permission to terminate them from that page. Another option: a "limit of last resort" - either a very high row limit (10,000,000 perhaps) or even a time limit, saying that all queries will be cancelled if they take longer than thirty minutes or similar. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258860845 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258860845 | IC_kwDOBm6k_c5LCLEt | 9599 | 2022-09-27T01:48:31Z | 2022-09-27T01:50:01Z | OWNER | The protection is supposed to be from this line: ```python rows = cursor.fetchmany(max_returned_rows + 1) ``` By capping the call to `.fetchmany()` at `max_returned_rows + 1` (the `+ 1` is to allow detection of whether or not there is a next page) I'm ensuring that Datasette never attempts to iterate over a huge result set. SQLite and the `sqlite3` library seem to handle this correctly. Here's an example: ```pycon >>> import sqlite3 >>> conn = sqlite3.connect(":memory:") >>> cursor = conn.execute(""" ... with recursive counter(x) as ( ... select 0 ... union ... select x + 1 from counter ... ) ... select * from counter""") >>> cursor.fetchmany(10) [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)] ``` `counter` there is an infinitely long table ([see TIL](https://til.simonwillison.net/sqlite/simple-recursive-cte)) - but we can retrieve the first 10 results without going into an infinite loop. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258846992 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258846992 | IC_kwDOBm6k_c5LCHsQ | 9599 | 2022-09-27T01:21:41Z | 2022-09-27T01:21:41Z | OWNER | My main concern here is that public Datasette instances could easily have all of their available database connections consumed by long-running queries - either accidentally or deliberately. I do totally understand the need for this feature though. I think it can absolutely make sense provided it's protected by authentication and permissions. Maybe even limit the number of concurrent downloads at once such that there's always at least one database connection free for other requests. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/pull/1823#issuecomment-1258828705 | https://api.github.com/repos/simonw/datasette/issues/1823 | 1258828705 | IC_kwDOBm6k_c5LCDOh | 9599 | 2022-09-27T00:45:46Z | 2022-09-27T00:45:46Z | OWNER | Also need to do a bit more of an audit to see if there is anywhere else that this style should be applied. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386917344 | |
https://github.com/simonw/datasette/pull/1823#issuecomment-1258828509 | https://api.github.com/repos/simonw/datasette/issues/1823 | 1258828509 | IC_kwDOBm6k_c5LCDLd | 9599 | 2022-09-27T00:45:26Z | 2022-09-27T00:45:26Z | OWNER | I should update the documentation to reflect this change. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386917344 | |
https://github.com/simonw/datasette/issues/1822#issuecomment-1258827688 | https://api.github.com/repos/simonw/datasette/issues/1822 | 1258827688 | IC_kwDOBm6k_c5LCC-o | 9599 | 2022-09-27T00:44:04Z | 2022-09-27T00:44:04Z | OWNER | I'll do this in a PR. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386854246 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1258818028 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1258818028 | IC_kwDOBm6k_c5LCAns | 9599 | 2022-09-27T00:27:53Z | 2022-09-27T00:27:53Z | OWNER | Made a start on this: ```diff diff --git a/datasette/hookspecs.py b/datasette/hookspecs.py index 34e19664..fe0971e5 100644 --- a/datasette/hookspecs.py +++ b/datasette/hookspecs.py @@ -31,25 +31,29 @@ def prepare_jinja2_environment(env, datasette): @hookspec -def extra_css_urls(template, database, table, columns, view_name, request, datasette): +def extra_css_urls( + template, database, table, columns, sql, params, view_name, request, datasette +): """Extra CSS URLs added by this plugin""" @hookspec -def extra_js_urls(template, database, table, columns, view_name, request, datasette): +def extra_js_urls( + template, database, table, columns, sql, params, view_name, request, datasette +): """Extra JavaScript URLs added by this plugin""" @hookspec def extra_body_script( - template, database, table, columns, view_name, request, datasette + template, database, table, columns, sql, params, view_name, request, datasette ): """Extra JavaScript code to be included in <script> at bottom of body""" @hookspec def extra_template_vars( - template, database, table, columns, view_name, request, datasette + template, database, table, columns, sql, params, view_name, request, datasette ): """Extra template variables to be made available to the template - can return dict or callable or awaitable""" ``` ```diff diff --git a/datasette/app.py b/datasette/app.py index 03d1dacc..2f3a46fe 100644 --- a/datasette/app.py +++ b/datasette/app.py @@ -1036,7 +1036,9 @@ class Datasette: return await template.render_async(template_context) - async def _asset_urls(self, key, template, context, request, view_name): + async def _asset_urls( + self, key, template, context, request, view_name, sql, params + ): # Flatten list-of-lists from plugins: seen_urls = set() collected = 
[] @@ -1045,6 +1047,8 @@ class Datasette: database=context.get("database"), … | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1822#issuecomment-1258760299 | https://api.github.com/repos/simonw/datasette/issues/1822 | 1258760299 | IC_kwDOBm6k_c5LByhr | 9599 | 2022-09-26T23:25:12Z | 2022-09-26T23:25:55Z | OWNER | A start: ```diff diff --git a/datasette/utils/asgi.py b/datasette/utils/asgi.py index 8a2fa060..41ade961 100644 --- a/datasette/utils/asgi.py +++ b/datasette/utils/asgi.py @@ -118,7 +118,7 @@ class Request: return dict(parse_qsl(body.decode("utf-8"), keep_blank_values=True)) @classmethod - def fake(cls, path_with_query_string, method="GET", scheme="http", url_vars=None): + def fake(cls, path_with_query_string, *, method="GET", scheme="http", url_vars=None): """Useful for constructing Request objects for tests""" path, _, query_string = path_with_query_string.partition("?") scope = { @@ -204,7 +204,7 @@ class AsgiWriter: ) -async def asgi_send_json(send, info, status=200, headers=None): +async def asgi_send_json(send, info, *, status=200, headers=None): headers = headers or {} await asgi_send( send, @@ -215,7 +215,7 @@ async def asgi_send_json(send, info, status=200, headers=None): ) -async def asgi_send_html(send, html, status=200, headers=None): +async def asgi_send_html(send, html, *, status=200, headers=None): headers = headers or {} await asgi_send( send, @@ -226,7 +226,7 @@ async def asgi_send_html(send, html, status=200, headers=None): ) -async def asgi_send_redirect(send, location, status=302): +async def asgi_send_redirect(send, location, *, status=302): await asgi_send( send, "", @@ -236,12 +236,12 @@ async def asgi_send_redirect(send, location, status=302): ) -async def asgi_send(send, content, status, headers=None, content_type="text/plain"): +async def asgi_send(send, content, status, *, headers=None, content_type="text/plain"): await asgi_start(send, status, headers, content_type) await send({"type": "http.response.body", "body": content.encode("utf-8")}) -async def asgi_start(send, status, headers=None, 
content_type="text/plain"): +async def asgi_start(send, status, *, headers=None, con… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386854246 | |
https://github.com/simonw/datasette/issues/1822#issuecomment-1258757544 | https://api.github.com/repos/simonw/datasette/issues/1822 | 1258757544 | IC_kwDOBm6k_c5LBx2o | 9599 | 2022-09-26T23:21:23Z | 2022-09-26T23:21:23Z | OWNER | Everything on https://docs.datasette.io/en/stable/internals.html that uses keyword arguments should do this I think. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386854246 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1258756231 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1258756231 | IC_kwDOBm6k_c5LBxiH | 9599 | 2022-09-26T23:19:34Z | 2022-09-26T23:19:34Z | OWNER | This is a good idea - it's something I should do before Datasette 1.0. I was a tiny bit worried about compatibility (Datasette is 3.7+) but it looks like they have been in Python since 3.0! | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1819#issuecomment-1258754105 | https://api.github.com/repos/simonw/datasette/issues/1819 | 1258754105 | IC_kwDOBm6k_c5LBxA5 | 9599 | 2022-09-26T23:16:15Z | 2022-09-26T23:16:15Z | OWNER | Demo: https://latest.datasette.io/_memory?sql=with+recursive+counter(x)+as+(%0D%0A++select+0%0D%0A++++union%0D%0A++select+x+%2B+1+from+counter%0D%0A)%2C%0D%0Ablah+as+(select+*+from+counter+limit+5000000)%0D%0Aselect+count(*)+from+blah | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1385026210 | |
https://github.com/simonw/datasette/issues/1819#issuecomment-1258746600 | https://api.github.com/repos/simonw/datasette/issues/1819 | 1258746600 | IC_kwDOBm6k_c5LBvLo | 9599 | 2022-09-26T23:05:40Z | 2022-09-26T23:05:40Z | OWNER | Implementing it like this, so at least you can copy and paste the SQL query back out again: <img width="796" alt="image" src="https://user-images.githubusercontent.com/9599/192395953-48512c94-10e0-4cf8-8ae5-b9e65e3d7b0f.png"> I'm not doing a full textarea because this error can be raised in multiple places, including on the table page itself. It's not just an error associated with the manual query page. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1385026210 | |
https://github.com/simonw/datasette/issues/1819#issuecomment-1258738435 | https://api.github.com/repos/simonw/datasette/issues/1819 | 1258738435 | IC_kwDOBm6k_c5LBtMD | 9599 | 2022-09-26T22:52:19Z | 2022-09-26T22:52:19Z | OWNER | This is a good idea. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1385026210 | |
https://github.com/simonw/datasette/issues/1818#issuecomment-1258735747 | https://api.github.com/repos/simonw/datasette/issues/1818 | 1258735747 | IC_kwDOBm6k_c5LBsiD | 9599 | 2022-09-26T22:47:59Z | 2022-09-26T22:47:59Z | OWNER | Another option here is to tie into a feature I built in `sqlite-utils` with this problem in mind but never introduced on the Datasette side of things: https://sqlite-utils.datasette.io/en/stable/python-api.html#cached-table-counts-using-triggers | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384549993 | |
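For anyone unfamiliar with the trigger approach, here is a rough sketch of the idea in plain `sqlite3`. The `counts` table, its columns and the trigger names are all invented for this example; this is not how `sqlite-utils` itself names or implements things.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table items (name text);
    -- Hypothetical bookkeeping table: one row per tracked table
    create table counts (tbl text primary key, n integer not null);
    insert into counts values ('items', 0);

    create trigger items_insert after insert on items
    begin
        update counts set n = n + 1 where tbl = 'items';
    end;

    create trigger items_delete after delete on items
    begin
        update counts set n = n - 1 where tbl = 'items';
    end;
    """
)

conn.executemany("insert into items values (?)", [("a",), ("b",), ("c",)])
conn.execute("delete from items where name = 'b'")

# Reading the cached count is a single-row lookup, not a table scan
cached = conn.execute("select n from counts where tbl = 'items'").fetchone()[0]
exact = conn.execute("select count(*) from items").fetchone()[0]
```

The trade-off is a small extra cost on every insert and delete in exchange for a cheap count on read.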
https://github.com/simonw/datasette/issues/1818#issuecomment-1258735283 | https://api.github.com/repos/simonw/datasette/issues/1818 | 1258735283 | IC_kwDOBm6k_c5LBsaz | 9599 | 2022-09-26T22:47:19Z | 2022-09-26T22:47:19Z | OWNER | That's a really interesting idea: for a lot of databases (those made out of straight imports from CSV) `max(rowid)` would indeed reflect the size of the table, but would be a MUCH faster operation than attempting a `count(*)`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384549993 | |
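A quick sketch of why `max(rowid)` only works as an estimate, using a throwaway in-memory table (table name and data are made up): the two numbers agree for a table that has only ever been appended to, and drift apart as soon as rows are deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text)")
conn.executemany(
    "insert into t (name) values (?)", [(f"row {i}",) for i in range(1000)]
)

# For an append-only table the two values match
exact = conn.execute("select count(*) from t").fetchone()[0]
estimate = conn.execute("select max(rowid) from t").fetchone()[0]

# After deleting rows, max(rowid) overstates the real count
conn.execute("delete from t where rowid <= 10")
exact_after = conn.execute("select count(*) from t").fetchone()[0]
estimate_after = conn.execute("select max(rowid) from t").fetchone()[0]
```

So this could be presented as an approximate or upper-bound figure, not as a drop-in replacement for `count(*)`.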
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258697384 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258697384 | IC_kwDOCGYnMM5LBjKo | 9599 | 2022-09-26T22:12:45Z | 2022-09-26T22:12:45Z | OWNER | That feels like a slightly different command to me - maybe `sqlite-utils backup data.db data-backup.db`? It doesn't have any of the mechanics for merging tables together. Could be a useful feature separately though. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/datasette/issues/1821#issuecomment-1258692555 | https://api.github.com/repos/simonw/datasette/issues/1821 | 1258692555 | IC_kwDOBm6k_c5LBh_L | 9599 | 2022-09-26T22:06:39Z | 2022-09-26T22:06:39Z | OWNER | - https://github.com/simonw/datasette/actions/runs/3131344150 - https://github.com/simonw/datasette/releases/tag/0.63a0 - https://pypi.org/project/datasette/0.63a0/ | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386734383 | |
https://github.com/simonw/sqlite-utils/issues/494#issuecomment-1258521333 | https://api.github.com/repos/simonw/sqlite-utils/issues/494 | 1258521333 | IC_kwDOCGYnMM5LA4L1 | 9599 | 2022-09-26T19:32:36Z | 2022-09-26T19:32:36Z | OWNER | Tweeted about it too: https://twitter.com/simonw/status/1574481628507668480 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386593843 | |
https://github.com/simonw/sqlite-utils/issues/494#issuecomment-1258516872 | https://api.github.com/repos/simonw/sqlite-utils/issues/494 | 1258516872 | IC_kwDOCGYnMM5LA3GI | 9599 | 2022-09-26T19:28:36Z | 2022-09-26T19:28:36Z | OWNER | New documentation: https://sqlite-utils.datasette.io/en/latest/contributing.html#using-just-and-pipenv | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386593843 | |
https://github.com/simonw/sqlite-utils/issues/483#issuecomment-1258479462 | https://api.github.com/repos/simonw/sqlite-utils/issues/483 | 1258479462 | IC_kwDOCGYnMM5LAt9m | 9599 | 2022-09-26T19:04:29Z | 2022-09-26T19:04:43Z | OWNER | Documentation: - https://sqlite-utils.datasette.io/en/latest/cli.html#cli-install - https://sqlite-utils.datasette.io/en/latest/cli.html#cli-uninstall - https://sqlite-utils.datasette.io/en/latest/cli-reference.html#install - https://sqlite-utils.datasette.io/en/latest/cli-reference.html#uninstall | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363765916 | |
https://github.com/simonw/sqlite-utils/issues/493#issuecomment-1258476455 | https://api.github.com/repos/simonw/sqlite-utils/issues/493 | 1258476455 | IC_kwDOCGYnMM5LAtOn | 9599 | 2022-09-26T19:01:49Z | 2022-09-26T19:01:49Z | OWNER | I tried the tips in https://stackoverflow.com/questions/15258831/how-to-handle-two-dashes-in-rest (not the settings change though, because I might want smart quotes elsewhere) and they didn't work. Maybe I should disable smart quotes entirely? I feel like there should be an escaping trick that works here though. I tried `insert -\\-convert` but it didn't help. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386562662 | |
https://github.com/simonw/sqlite-utils/issues/483#issuecomment-1258451968 | https://api.github.com/repos/simonw/sqlite-utils/issues/483 | 1258451968 | IC_kwDOCGYnMM5LAnQA | 9599 | 2022-09-26T18:37:54Z | 2022-09-26T18:40:41Z | OWNER | The implementation of this can be an almost exact copy of Datasette's, which was added in this commit: https://github.com/simonw/datasette/commit/01fe5b740171bfaea3752fc5754431dac53777e3 Current code for that is here: https://github.com/simonw/datasette/blob/0.62/datasette/cli.py#L319-L340 - which is improved to use the `from runpy import run_module` function. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363765916 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258450447 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258450447 | IC_kwDOCGYnMM5LAm4P | 9599 | 2022-09-26T18:36:23Z | 2022-09-26T18:36:23Z | OWNER | This is also the kind of feature that would need to express itself in both the Python library and the CLI utility. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258449887 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258449887 | IC_kwDOCGYnMM5LAmvf | 9599 | 2022-09-26T18:35:50Z | 2022-09-26T18:35:50Z | OWNER | This is a really interesting idea. I'm nervous about needing to set the rules for how duplicate tables should be merged though. This feels like a complex topic - one where there isn't necessarily an obviously "correct" way of doing it, but where different problems that people are solving might need different merging approaches. Likewise, merging isn't just a database-to-database thing at that point - I could see a need for merging two tables using similar rules to those used for merging two databases. So I think I'd want to have some good concrete use-cases in mind before trying to design how something like this should work. Will leave this thread open for people to drop those in! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/sqlite-utils/issues/492#issuecomment-1258446128 | https://api.github.com/repos/simonw/sqlite-utils/issues/492 | 1258446128 | IC_kwDOCGYnMM5LAl0w | 9599 | 2022-09-26T18:32:14Z | 2022-09-26T18:33:19Z | OWNER | This idea would make more sense if there was a good mechanism to say "run the conversion script held in this file" as opposed to passing it as an option. This would also remove the need to remember bash escaping rules ([see tip](https://til.simonwillison.net/zsh/argument-heredoc))! `shot-scraper` has that for `--javascript`, using the `--input` option: https://shot-scraper.datasette.io/en/stable/javascript.html#shot-scraper-javascript-help Maybe `--convert-script` would work here? Or `--convert-file`? It should accept `-` for stdin too. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386530156 | |
https://github.com/simonw/sqlite-utils/issues/490#issuecomment-1258437060 | https://api.github.com/repos/simonw/sqlite-utils/issues/490 | 1258437060 | IC_kwDOCGYnMM5LAjnE | 9599 | 2022-09-26T18:24:44Z | 2022-09-26T18:24:44Z | OWNER | Just saw your great write-up on this: https://jeqo.github.io/notes/2022-09-24-ingest-logs-sqlite/ | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
1382457780 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256662785 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256662785 | IC_kwDOBm6k_c5K5ycB | 9599 | 2022-09-23T20:53:21Z | 2022-09-23T20:53:21Z | OWNER | Maybe the signature for that method should be: ```python async def render_template( self, templates, context=None, plugin_context=None, request=None, view_name=None ): ``` Where `plugin_context` is a special dictionary of values that can be passed through to plugin hooks that accept them - so `database`, `table`, `columns`, `sql` and `params`. Those would then be passed when specific views call `render_template()` - which they currently do via calling `BaseView.render(...)`, but actually the views that are used for tables and queries don't even call that directly due to the weird design used with `DataView` subclasses that implement a `.data()` method. So yet another change that's blocked on fixing that long-running weird piece of technical debt: - https://github.com/simonw/datasette/issues/1518 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256659788 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256659788 | IC_kwDOBm6k_c5K5xtM | 9599 | 2022-09-23T20:49:22Z | 2022-09-23T20:49:22Z | OWNER | Implementation challenge: all four of those hooks are called inside the `datasette.render_template()` method, which has this signature: https://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/app.py#L945-L947 So I would have to pull the `sql` and `params` variables out of the `context` since they are not being passed to that method. OR I could teach that method to take those as optional arguments. Might be an opportunity to clean up this hack: https://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/app.py#L959-L964 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256652548 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256652548 | IC_kwDOBm6k_c5K5v8E | 9599 | 2022-09-23T20:41:32Z | 2022-09-23T20:41:32Z | OWNER | Which plugin hooks should take `sql` and `params`? - [extra_template_vars(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-template-vars-template-database-table-columns-view-name-request-datasette) - [extra_css_urls(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-css-urls-template-database-table-columns-view-name-request-datasette) - [extra_js_urls(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-js-urls-template-database-table-columns-view-name-request-datasette) - [extra_body_script(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-body-script-template-database-table-columns-view-name-request-datasette) And maybe these: - [render_cell(row, value, column, table, database, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#render-cell-row-value-column-table-database-datasette) - [table_actions(datasette, actor, database, table, request)](https://docs.datasette.io/en/0.62/plugin_hooks.html#table-actions-datasette-actor-database-table-request) I'll start by implementing the first set, then I'll think further about those "maybes". | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256650449 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256650449 | IC_kwDOBm6k_c5K5vbR | 9599 | 2022-09-23T20:38:53Z | 2022-09-23T20:38:53Z | OWNER | I've wanted something like this in the past too. I think the thing to do here might be to add `sql` and `params` arguments to a bunch of the plugin hooks, such that they can see the main query that is being used on the page that they are helping to render. While I'm working on this: https://docs.datasette.io/en/0.62/plugin_hooks.html#register-output-renderer-datasette output renderer functions take `sql` but do not currently take `params` - they should also take `params`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/sqlite-utils/issues/490#issuecomment-1256428818 | https://api.github.com/repos/simonw/sqlite-utils/issues/490 | 1256428818 | IC_kwDOCGYnMM5K45US | 9599 | 2022-09-23T16:37:58Z | 2022-09-23T16:38:35Z | OWNER | It should be possible to achieve this with the `--text` option: https://sqlite-utils.datasette.io/en/stable/cli.html?highlight=text#convert-with-text Given an example like this in `multiline.log`: ``` 2022-03-01T12:04:52: Here is a log message that spans multiple lines 2022-03-01T12:04:52: This is a single line 2022-03-01T12:04:52: Here is another message that spans multiple lines ``` You should be able to run something like this: ``` sqlite-utils insert /tmp/log.db log multiline.log --text --convert " import re r = re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}):(?P<log>.*)', re.MULTILINE) def convert(text): return [m.groupdict() for m in r.finditer(text)] " ``` After running this I get: ``` sqlite-utils rows /tmp/log.db log [{"datetime": "2022-03-01T12:04:52", "log": " Here is a log message"}, {"datetime": "2022-03-01T12:04:52", "log": " This is a single line"}, {"datetime": "2022-03-01T12:04:52", "log": " Here is another message"}] ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1382457780 | |
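Note that the output above only keeps the first line of each message. A possible variation on that `convert()` function, sketched here as an assumption and not tested against `sqlite-utils` itself, uses `re.DOTALL` plus a lookahead so each entry runs until the next timestamp or the end of the text:

```python
import re

# Timestamp prefix as used in the example log above
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"

# Lazily consume text (including newlines) up to the next line that
# starts with a timestamp, or up to the end of the input
ENTRY = re.compile(
    rf"^(?P<datetime>{TIMESTAMP}):(?P<log>.*?)(?=^{TIMESTAMP}:|\Z)",
    re.MULTILINE | re.DOTALL,
)


def convert(text):
    return [
        {"datetime": m.group("datetime"), "log": m.group("log").strip()}
        for m in ENTRY.finditer(text)
    ]


sample = (
    "2022-03-01T12:04:52: Here is a log message\n"
    "that spans multiple lines\n"
    "2022-03-01T12:04:52: This is a single line\n"
    "2022-03-01T12:04:52: Here is another message\n"
    "that spans multiple lines\n"
)
rows = convert(sample)
```

The same `convert()` body could then be passed to `--convert` in place of the one in the comment above.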
https://github.com/simonw/sqlite-utils/issues/358#issuecomment-1255341690 | https://api.github.com/repos/simonw/sqlite-utils/issues/358 | 1255341690 | IC_kwDOCGYnMM5K0v56 | 9599 | 2022-09-22T17:35:23Z | 2022-09-22T17:35:23Z | OWNER | Makes me think also that `sqlite-utils create-table` should have an option to dump out the SQL without actually creating the table. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1082651698 | |
https://github.com/simonw/sqlite-utils/issues/358#issuecomment-1255340974 | https://api.github.com/repos/simonw/sqlite-utils/issues/358 | 1255340974 | IC_kwDOCGYnMM5K0vuu | 9599 | 2022-09-22T17:34:45Z | 2022-09-22T17:34:45Z | OWNER | A few other recipes off the top of my head: - `title:maxlength:20` - set a max length, `length(title) <= 20` - `created:date` - check for `yyyy-mm-dd` date, `select :date == date(:date) is not null` ([demo](https://latest.datasette.io/_memory?sql=select+%3Adate+%3D%3D+date%28%3Adate%29+is+not+null&date=2022-01-01)) - `age:positiveint` - check `age` is a positive integer, `printf('%', age) = age and age > 0` (untested) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1082651698 | |
https://github.com/simonw/sqlite-utils/issues/358#issuecomment-1255333969 | https://api.github.com/repos/simonw/sqlite-utils/issues/358 | 1255333969 | IC_kwDOCGYnMM5K0uBR | 9599 | 2022-09-22T17:29:09Z | 2022-09-22T17:29:09Z | OWNER | Quick demo of a check constraint for JSON validation: ``` sqlite> create table test (id integer primary key, tags text, check (json(tags) is not null)); sqlite> sqlite> insert into test (tags) values ('["one", "two"]'); sqlite> insert into test (tags) values ('["one", "two"'); Error: stepping, malformed JSON (1) ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1082651698 | |
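The same kind of check can be sketched from Python's `sqlite3` module. This version swaps in `json_valid()` in place of the `json(tags) is not null` expression used in the demo, so that bad input fails as a plain CHECK constraint violation; it assumes the SQLite build behind Python includes the JSON functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# json_valid() returns 1 for well-formed JSON and 0 otherwise
conn.execute(
    "create table test ("
    "id integer primary key, tags text, check (json_valid(tags)))"
)

# Well-formed JSON is accepted
conn.execute("insert into test (tags) values (?)", ('["one", "two"]',))

# Malformed JSON (missing closing bracket) violates the constraint
try:
    conn.execute("insert into test (tags) values (?)", ('["one", "two"',))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("select count(*) from test").fetchone()[0]
```

One thing to be aware of: a `null` value for `tags` passes this check, because a CHECK expression that evaluates to NULL does not count as a violation.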
https://github.com/simonw/sqlite-utils/issues/358#issuecomment-1255332217 | https://api.github.com/repos/simonw/sqlite-utils/issues/358 | 1255332217 | IC_kwDOCGYnMM5K0tl5 | 9599 | 2022-09-22T17:27:34Z | 2022-09-22T17:27:34Z | OWNER | I've been thinking about this more recently. I think the first place to explore these will be in the `create-table` command (and underlying APIs). Relevant docs: https://www.sqlite.org/lang_createtable.html#check_constraints > A CHECK constraint may be attached to a column definition or specified as a table constraint. In practice it makes no difference. Each time a new row is inserted into the table or an existing row is updated, the expression associated with each CHECK constraint is evaluated and cast to a NUMERIC value in the same way as a [CAST expression](https://www.sqlite.org/lang_expr.html#castexpr). If the result is zero (integer value 0 or real value 0.0), then a constraint violation has occurred. If the CHECK expression evaluates to NULL, or any other non-zero value, it is not a constraint violation. The expression of a CHECK constraint may not contain a subquery. Something like this: sqlite-utils create-table data.db entries id integer title text tags text --pk id --check tags:json Where `--check tags:json` uses a pre-baked recipe for using the SQLite JSON function to check that the content is valid JSON and reject it otherwise. Then can bundle a bunch of other pre-baked recipes, but also support the following: --check 'x > 3' --check 'length(phone) >= 10' The design reason for the `column:recipe` format here is to reuse `--check` for both pre-defined recipes that affect a single column AND for freeform expressions that get added to the end of the table. Detecting `column name:recipe` with a regex feels safe to me. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1082651698 | |
https://github.com/simonw/sqlite-utils/issues/488#issuecomment-1254033981 | https://api.github.com/repos/simonw/sqlite-utils/issues/488 | 1254033981 | IC_kwDOCGYnMM5Kvwo9 | 9599 | 2022-09-21T17:49:32Z | 2022-09-21T17:50:10Z | OWNER | It looks like SQLite has a `SELECT NULLIF(value, '')` function which returns `null` if that value is equal to `''`. We need to only apply that function to columns that we know to be of type integer or float though - text columns containing empty strings should not be rewritten to null. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373224657 | |
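A small sketch of that idea with plain `sqlite3`, using made-up `raw` and `cleaned` tables: `nullif(age, '')` is applied only to the column headed for an `integer` type, while the text column keeps its empty strings. This is an illustration of the SQL, not the actual `transform()` implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Everything starts out as text, as it would after a CSV import
conn.execute("create table raw (name text, age text)")
conn.executemany(
    "insert into raw values (?, ?)",
    [("Cleo", "5"), ("", ""), ("Pancakes", "")],
)

# Copy into a table where age is an integer column, converting
# empty strings to null for that column only
conn.execute("create table cleaned (name text, age integer)")
conn.execute(
    "insert into cleaned (name, age) "
    "select name, nullif(age, '') from raw"
)

rows = conn.execute(
    "select name, age, typeof(age) from cleaned order by rowid"
).fetchall()
```

The empty `name` on the second row survives untouched, which matches the point that text columns containing empty strings should not be rewritten.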
https://github.com/simonw/sqlite-utils/issues/488#issuecomment-1254032378 | https://api.github.com/repos/simonw/sqlite-utils/issues/488 | 1254032378 | IC_kwDOCGYnMM5KvwP6 | 9599 | 2022-09-21T17:47:54Z | 2022-09-21T17:47:54Z | OWNER | New tests should go in: https://github.com/simonw/sqlite-utils/blob/main/tests/test_transform.py I think the implementation fix needs to go near here: https://github.com/simonw/sqlite-utils/blob/0b315d3fa83c1584eaeec32f24912898621e437a/sqlite_utils/db.py#L1770-L1775 The trick is going to be teaching that generated SQL to know which columns are `integer` or `float` and to convert `""` to `null` as part of that operation. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373224657 | |
https://github.com/simonw/sqlite-utils/issues/488#issuecomment-1254029808 | https://api.github.com/repos/simonw/sqlite-utils/issues/488 | 1254029808 | IC_kwDOCGYnMM5Kvvnw | 9599 | 2022-09-21T17:45:20Z | 2022-09-21T17:45:41Z | OWNER | No, I'm going to say that this is a bug - it's WEIRD having an `integer` or `float` column containing an empty string. I'm OK changing that - I very much doubt anyone is relying on this functionality. So no need for a new option here - just fixing the bug is sensible. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373224657 | |
https://github.com/simonw/datasette/issues/1816#issuecomment-1251724180 | https://api.github.com/repos/simonw/datasette/issues/1816 | 1251724180 | IC_kwDOBm6k_c5Km8uU | 9599 | 2022-09-20T01:13:05Z | 2022-09-20T01:13:05Z | OWNER | Oops, that has a bug: ``` Error: Invalid setting '{key}' in settings.json ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1378640768 | |
https://github.com/simonw/datasette/issues/1816#issuecomment-1251682970 | https://api.github.com/repos/simonw/datasette/issues/1816 | 1251682970 | IC_kwDOBm6k_c5Kmyqa | 9599 | 2022-09-19T23:44:54Z | 2022-09-19T23:44:54Z | OWNER | I was going to add type validation too, but that's actually a bit tricky because the logic for that currently lives in Click option parsing here: https://github.com/simonw/datasette/blob/ddc999ad1296e8c69cffede3e367dda059b8adad/datasette/cli.py#L71-L88 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1378640768 | |
https://github.com/simonw/datasette/issues/1814#issuecomment-1251677554 | https://api.github.com/repos/simonw/datasette/issues/1814 | 1251677554 | IC_kwDOBm6k_c5KmxVy | 9599 | 2022-09-19T23:35:06Z | 2022-09-19T23:35:06Z | OWNER | It might have been useful for Datasette to show an error when started against a `settings.json` file that contains an invalid setting though. | { "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1378495690 | |
https://github.com/simonw/datasette/issues/1814#issuecomment-1251677220 | https://api.github.com/repos/simonw/datasette/issues/1814 | 1251677220 | IC_kwDOBm6k_c5KmxQk | 9599 | 2022-09-19T23:34:30Z | 2022-09-19T23:34:30Z | OWNER | The `settings.json` file can only be used with settings that are set using `--setting name value` - the full list of those is here: https://docs.datasette.io/en/stable/settings.html The `--static` option works differently. In configuration directory mode you can skip it entirely and instead have a `/static/` folder - so your directory structure would look like this: ``` bibliography/static/styles.css ``` And then when you run `datasette bibliography/` the following URL will work: http://127.0.0.1:8001/static/styles.css | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 1, "eyes": 0 } |
1378495690 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1249990033 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1249990033 | IC_kwDOBm6k_c5KgVWR | 9599 | 2022-09-17T03:39:05Z | 2022-09-17T03:39:05Z | OWNER | New docs section on the need to call `await ds.invoke_startup()`: https://github.com/simonw/datasette/blob/ddc999ad1296e8c69cffede3e367dda059b8adad/docs/testing_plugins.rst#setting-up-a-datasette-test-instance | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1249987643 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1249987643 | IC_kwDOBm6k_c5KgUw7 | 9599 | 2022-09-17T03:19:24Z | 2022-09-17T03:19:24Z | OWNER | In looking at the documentation on [writing tests](https://docs.datasette.io/en/latest/testing_plugins.html), there are a lot of examples like this: ```python async def test_that_opens_the_debugger_or_errors(): ds = Datasette([db_path], pdb=True) response = await ds.client.get("/") ``` I really don't like having to tell people to add `await ds.invoke_startup()` to every test that might look like this. Since it's safe to call that function multiple times, I'm going to have `ds.client.get()` and friends call it for you too - so if you forget in a plugin test it won't matter. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1249986079 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1249986079 | IC_kwDOBm6k_c5KgUYf | 9599 | 2022-09-17T03:07:24Z | 2022-09-17T03:07:24Z | OWNER | Datasette's own tests started to break because calls to the `TestClient` were performed without awaiting that method. I fixed that by adding this to `_request()` inside that class: ```python async def _request( self, path, follow_redirects=True, redirect_count=0, method="GET", cookies=None, headers=None, post_body=None, content_type=None, if_none_match=None, ): if not self.ds._startup_invoked: await self.ds.invoke_startup() ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
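The general pattern here, sketched with stand-in classes (`App` and `Client` are invented for this example and are not Datasette's real classes): startup is idempotent, and the client triggers it on first use so tests that forget to call it still work.

```python
import asyncio


class App:
    """Stand-in for an application object with a one-time startup step."""

    def __init__(self):
        self._startup_invoked = False
        self.startup_calls = 0

    async def invoke_startup(self):
        # Safe to call repeatedly: only the first call does any work
        if self._startup_invoked:
            return
        self.startup_calls += 1
        self._startup_invoked = True


class Client:
    """Stand-in test client that makes sure startup has run."""

    def __init__(self, app):
        self.app = app

    async def get(self, path):
        if not self.app._startup_invoked:
            await self.app.invoke_startup()
        return f"GET {path}"


async def main():
    app = App()
    client = Client(app)
    await client.get("/")
    await client.get("/-/versions")
    return app.startup_calls


startup_calls = asyncio.run(main())
```

Two requests, one startup: the guard means callers never have to care whether someone else already ran it.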
https://github.com/simonw/datasette/issues/1809#issuecomment-1249985971 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1249985971 | IC_kwDOBm6k_c5KgUWz | 9599 | 2022-09-17T03:06:32Z | 2022-09-17T03:06:32Z | OWNER | This is likely going to cause some tests in plugins to break, but I'm OK with that - I'll fix them as I find them once this release is out. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1249985741 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1249985741 | IC_kwDOBm6k_c5KgUTN | 9599 | 2022-09-17T03:04:51Z | 2022-09-17T03:04:51Z | OWNER | I'm going to throw an error in `ds.render_template()` if you haven't previously called `await ds.invoke_startup()`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/pull/1812#issuecomment-1249746777 | https://api.github.com/repos/simonw/datasette/issues/1812 | 1249746777 | IC_kwDOBm6k_c5KfZ9Z | 9599 | 2022-09-16T19:50:45Z | 2022-09-16T19:50:45Z | OWNER | Main difference I can see: ![CleanShot 2022-09-16 at 12 49 47@2x](https://user-images.githubusercontent.com/9599/190719563-a7b1bcc7-bfdc-4759-95c1-e19bcd0217c3.png) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1375930971 | |
https://github.com/simonw/datasette/pull/1812#issuecomment-1249745637 | https://api.github.com/repos/simonw/datasette/issues/1812 | 1249745637 | IC_kwDOBm6k_c5KfZrl | 9599 | 2022-09-16T19:49:12Z | 2022-09-16T19:49:12Z | OWNER | Preview looks good. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1375930971 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248621072 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248621072 | IC_kwDOCGYnMM5KbHIQ | 9599 | 2022-09-15T20:56:09Z | 2022-09-15T20:56:09Z | OWNER | Prototype so far: ```diff diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py index 767b170..d96c507 100644 --- a/sqlite_utils/cli.py +++ b/sqlite_utils/cli.py @@ -1762,6 +1762,17 @@ def query( is_flag=True, help="Analyze resulting tables and output results", ) +@click.option("--key", help="read data from this key of the root object") +@click.option( + "--auto-key", + is_flag=True, + help="Find a key in the root object that is a list of objects", +) +@click.option( + "--analyze", + is_flag=True, + help="Analyze resulting tables and output results", +) @load_extension_option def memory( paths, @@ -1784,6 +1795,8 @@ def memory( schema, dump, save, + key, + auto_key, analyze, load_extension, ): @@ -1838,7 +1851,9 @@ def memory( csv_table = stem stem_counts[stem] = stem_counts.get(stem, 1) + 1 csv_fp = csv_path.open("rb") - rows, format_used = rows_from_file(csv_fp, format=format, encoding=encoding) + rows, format_used = rows_from_file( + csv_fp, format=format, encoding=encoding, key=key, auto_key=auto_key + ) tracker = None if format_used in (Format.CSV, Format.TSV) and not no_detect_types: tracker = TypeTracker() diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py index 8754554..2e69c26 100644 --- a/sqlite_utils/utils.py +++ b/sqlite_utils/utils.py @@ -231,6 +231,8 @@ def rows_from_file( encoding: Optional[str] = None, ignore_extras: Optional[bool] = False, extras_key: Optional[str] = None, + key: Optional[str] = None, + auto_key: Optional[bool] = False, ) -> Tuple[Iterable[dict], Format]: """ Load a sequence of dictionaries from a file-like object containing one of four different formats. 
@@ -271,13 +273,31 @@ def rows_from_file( :param encoding: the character encoding to use when reading CSV/TSV data :param ignore_extras: ignore any … | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248591268 | https://api.github.com/repos/simonw/sqlite-utils/issues/486 | 1248591268 | IC_kwDOCGYnMM5Ka_2k | 9599 | 2022-09-15T20:36:02Z | 2022-09-15T20:40:03Z | OWNER | I had a big CSV file lying around, I converted it to other formats like this: sqlite-utils insert /tmp/t.db t /tmp/en.openfoodfacts.org.products.csv --csv sqlite-utils rows /tmp/t.db t --nl > /tmp/big.nl sqlite-utils rows /tmp/t.db t > /tmp/big.json Then tested the progress bar like this: sqlite-utils insert /tmp/t2.db t /tmp/big.nl --nl Output: ``` sqlite-utils insert /tmp/t2.db t /tmp/big.nl --nl [------------------------------------] 0% [#######-----------------------------] 20% 00:00:20 ``` With `--silent` it is silent. And for regular JSON: ``` sqlite-utils insert /tmp/t3.db t /tmp/big.json [####################################] 100% ``` This is actually not doing the right thing. The problem is that `sqlite-utils` doesn't include a streaming JSON parser, so it instead reads that entire JSON file into memory first (exhausting the progress bar to 100% instantly) and then does the rest of the work in-memory while the bar sticks at 100%. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366512990 | |
https://github.com/simonw/sqlite-utils/issues/485#issuecomment-1248597643 | https://api.github.com/repos/simonw/sqlite-utils/issues/485 | 1248597643 | IC_kwDOCGYnMM5KbBaL | 9599 | 2022-09-15T20:39:39Z | 2022-09-15T20:39:52Z | OWNER | A note from PR #486: https://github.com/simonw/sqlite-utils/issues/486#issuecomment-1248591268_ > ``` > sqlite-utils insert /tmp/t3.db t /tmp/big.json > [####################################] 100% > ``` > This is actually not doing the right thing. The problem is that `sqlite-utils` doesn't include a streaming JSON parser, so it instead reads that entire JSON file into memory first (exhausting the progress bar to 100% instantly) and then does the rest of the work in-memory while the bar sticks at 100%. I decided to land this anyway. If a streaming JSON parser is added later it will start to work. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366423176 | |
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248593835 | https://api.github.com/repos/simonw/sqlite-utils/issues/486 | 1248593835 | IC_kwDOCGYnMM5KbAer | 9599 | 2022-09-15T20:37:14Z | 2022-09-15T20:37:14Z | OWNER | I'm going to land this anyway. The lack of a streaming JSON parser is a separate issue, I don't think it should block landing this improvement. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366512990 | |
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248582147 | https://api.github.com/repos/simonw/sqlite-utils/issues/486 | 1248582147 | IC_kwDOCGYnMM5Ka9oD | 9599 | 2022-09-15T20:29:17Z | 2022-09-15T20:29:17Z | OWNER | This looks good to me. I need to run some manual tests before merging (it's a good sign that the automated tests pass though). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366512990 | |
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248568775 | https://api.github.com/repos/simonw/sqlite-utils/issues/486 | 1248568775 | IC_kwDOCGYnMM5Ka6XH | 9599 | 2022-09-15T20:16:14Z | 2022-09-15T20:16:14Z | OWNER | https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md#using-the-python-version-input says you can set the full version: ``` - uses: actions/setup-python@v4 with: python-version: "3.10.6" ``` I'll try that. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366512990 | |
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248567323 | https://api.github.com/repos/simonw/sqlite-utils/issues/486 | 1248567323 | IC_kwDOCGYnMM5Ka6Ab | 9599 | 2022-09-15T20:14:45Z | 2022-09-15T20:14:45Z | OWNER | There's a fix for `mypy` that has landed but isn't out in a release yet: - https://github.com/python/mypy/issues/13385 For the moment looks like pinning to Python 3.10.6 could help. Need to figure out how to do that in GitHub Actions though. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366512990 | |
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248565396 | https://api.github.com/repos/simonw/sqlite-utils/issues/486 | 1248565396 | IC_kwDOCGYnMM5Ka5iU | 9599 | 2022-09-15T20:12:50Z | 2022-09-15T20:12:50Z | OWNER | Annoying `mypy` test failure: ``` /Users/runner/hostedtoolcache/Python/3.10.7/x64/lib/python3.10/site-packages/numpy/__init__.pyi:636: error: Positional-only parameters are only supported in Python 3.8 and greater ``` Looks like this: - https://github.com/python/mypy/issues/13627 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1366512990 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248522618 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248522618 | IC_kwDOCGYnMM5KavF6 | 9599 | 2022-09-15T19:29:20Z | 2022-09-15T19:29:20Z | OWNER | I think refactoring `sqlite-utils insert` to use `rows_from_file` needs to happen as part of this work. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248512739 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248512739 | IC_kwDOCGYnMM5Kasrj | 9599 | 2022-09-15T19:18:24Z | 2022-09-15T19:21:01Z | OWNER | Why doesn't `sqlite-utils insert` use the `rows_from_file` function I wonder? https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864207841 says: > I can refactor `sqlite-utils insert` to use this new code too. Maybe I forgot to do that? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248501824 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248501824 | IC_kwDOCGYnMM5KaqBA | 9599 | 2022-09-15T19:10:48Z | 2022-09-15T19:10:48Z | OWNER | This feels pretty good: ``` % sqlite-utils memory ~/Downloads/CVR_Export_20220908084311/*.json --schema --auto-key CREATE TABLE [BallotTypeContestManifest] ( [BallotTypeId] INTEGER, [ContestId] INTEGER ); CREATE VIEW t1 AS select * from [BallotTypeContestManifest]; CREATE VIEW t AS select * from [BallotTypeContestManifest]; CREATE TABLE [BallotTypeManifest] ( [Description] TEXT, [Id] INTEGER, [ExternalId] TEXT ); ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248484094 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248484094 | IC_kwDOCGYnMM5Kalr- | 9599 | 2022-09-15T18:56:31Z | 2022-09-15T18:56:31Z | OWNER | Actually I quite like `--key X` - it could work for single nested objects too. You could insert a single record like this: ```json { "record": { "id": 1 } } ``` ``` sqlite-utils insert db.db records record.json --key record ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248481303 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248481303 | IC_kwDOCGYnMM5KalAX | 9599 | 2022-09-15T18:54:30Z | 2022-09-15T18:55:14Z | OWNER | Maybe this would make more sense as a mechanism where you can say "Use the data in the key called X" - but there's a special option for "figure out that key automatically". The syntax then could be: `--list-key List` Or for automatic detection: `--list-key-auto` Could also go with `--key List` and `--key-auto` - but would that be as obvious as `--list-key`? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248479485 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248479485 | IC_kwDOCGYnMM5Kakj9 | 9599 | 2022-09-15T18:52:52Z | 2022-09-15T18:53:45Z | OWNER | The most similar option I have at the moment is probably `--flatten`. What would good names for this option be? - `--auto-list` - `--auto-key` - `--inner-key` - `--auto-json` - `--find-list` - `--find-key` Those are all bad. Another option: introduce a new explicit format for it. Right now the explicit formats you can use are: https://github.com/simonw/sqlite-utils/blob/d9b9e075f07a20f1137cd2e34ed5d3f1a3db4ad8/docs/cli-reference.rst#L153-L158 So I could add a `:autojson` format. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248475718 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248475718 | IC_kwDOCGYnMM5KajpG | 9599 | 2022-09-15T18:49:05Z | 2022-09-15T18:49:53Z | OWNER | Here's how I used my prototype to build [that Gist](https://gist.github.com/simonw/0e6901974a14ab7d56c2746a04d72c8c): sqlite-utils memory ~/Downloads/CVR_Export_20220908084311/*.json --schema > database.sql | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248474806 | https://api.github.com/repos/simonw/sqlite-utils/issues/489 | 1248474806 | IC_kwDOCGYnMM5Kaja2 | 9599 | 2022-09-15T18:48:09Z | 2022-09-15T18:48:09Z | OWNER | Built a prototype of this that works really well: ```diff diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py index c0b7bf1..f9a482c 100644 --- a/sqlite_utils/utils.py +++ b/sqlite_utils/utils.py @@ -272,7 +272,19 @@ def rows_from_file( if format == Format.JSON: decoded = json.load(fp) if isinstance(decoded, dict): - decoded = [decoded] + # TODO: Solve for if this isn't what people want + # Does it have just one key that is a list of dicts? + list_keys = [ + k + for k in decoded + if isinstance(decoded[k], list) + and decoded[k] + and all(isinstance(o, dict) for o in decoded[k]) + ] + if len(list_keys) == 1: + decoded = decoded[list_keys[0]] + else: + decoded = [decoded] if not isinstance(decoded, list): raise RowsFromFileBadJSON("JSON must be a list or a dictionary") return decoded, Format.JSON ``` I used that to build this: https://gist.github.com/simonw/0e6901974a14ab7d56c2746a04d72c8c One problem though: right now, if you do this `sqlite-utils` treats it as a single object and adds a `tags` column with JSON in it: ``` echo '{"title": "Hi", "tags": [{"t": "one"}]}' | sqlite-utils insert db.db t - ``` If I implement this new mechanism the above line would behave differently - which would be a backwards incompatible change. So I probably need some kind of opt-in mechanism for this. And I need a good name for it. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374939463 | |
https://github.com/simonw/datasette/issues/1810#issuecomment-1248290151 | https://api.github.com/repos/simonw/datasette/issues/1810 | 1248290151 | IC_kwDOBm6k_c5KZ2Vn | 9599 | 2022-09-15T15:51:04Z | 2022-09-15T15:51:25Z | OWNER | I could prototype this idea as a `datasette-featured-tables` plugin that delivers its own custom `index.html` template. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374626873 | |
https://github.com/simonw/datasette/issues/1810#issuecomment-1248289857 | https://api.github.com/repos/simonw/datasette/issues/1810 | 1248289857 | IC_kwDOBm6k_c5KZ2RB | 9599 | 2022-09-15T15:50:46Z | 2022-09-15T15:50:46Z | OWNER | Idea: allow the user to specify one or more featured tables. Each table is then shown as a summary on the homepage - with the total number of rows and the first 5 rows. If the table has search configured there's a search box too. If the instance has only one database with only one table (excluding hidden tables) it gets featured automatically perhaps (maybe with a way to opt-out of that if you want to). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374626873 | |
https://github.com/simonw/datasette/issues/1810#issuecomment-1248187089 | https://api.github.com/repos/simonw/datasette/issues/1810 | 1248187089 | IC_kwDOBm6k_c5KZdLR | 9599 | 2022-09-15T14:31:36Z | 2022-09-15T14:31:36Z | OWNER | Twitter conversation that inspired this issue: https://twitter.com/psychemedia/status/1570410108785684481 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1374626873 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1247317941 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1247317941 | IC_kwDOBm6k_c5KWI-1 | 9599 | 2022-09-14T21:24:43Z | 2022-09-14T21:24:43Z | OWNER | It looks like Datasette Lite does NOT invoke that method, which is likely a bug: https://github.com/simonw/datasette-lite/blob/e7ccaf621b3cdf613ebaf544304d387f2af32edf/webworker.js#L103-L110 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1247316715 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1247316715 | IC_kwDOBm6k_c5KWIrr | 9599 | 2022-09-14T21:23:10Z | 2022-09-14T21:23:10Z | OWNER | It might be good to have Datasette LOUDLY fail if you attempt to use it without calling `await ds.invoke_startup()`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1247316097 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1247316097 | IC_kwDOBm6k_c5KWIiB | 9599 | 2022-09-14T21:22:24Z | 2022-09-14T21:22:24Z | OWNER | It looks like this is the only place that calls `invoke_startup()`: https://github.com/simonw/datasette/blob/1d64c9a8dac45b9a3452acf8e76dfadea2b0bc49/datasette/cli.py#L590-L591 `datasette-publish-vercel` is the one deployment mechanism that skips running Uvicorn, and it calls that method separately here: https://github.com/simonw/datasette-publish-vercel/blob/1559d979b4e3b1f2f83c51c3c0c10192ff9a6d0c/datasette_publish_vercel/__init__.py#L42-L52 ```python ds = Datasette( [], {database_files}, static_mounts=static_mounts, metadata=metadata{extras}, secret=secret, cors=True, settings={settings}{crossdb} ) asyncio.run(ds.invoke_startup()) app = ds.app() ``` So preparing the Jinja environment inside that function would work fine. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1247314352 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1247314352 | IC_kwDOBm6k_c5KWIGw | 9599 | 2022-09-14T21:20:12Z | 2022-09-14T21:20:12Z | OWNER | The reason to support `await_me_maybe` is in case a hook wants to execute a SQL query as part of configuring the Jinja template loader. That's exactly what `datasette-edit-templates` needs to do, though it's currently achieving that in a `startup()` hook instead: https://github.com/simonw/datasette-edit-templates/blob/087f6a6cabc20020f2b0524f11aa3a7836320848/datasette_edit_templates/__init__.py#L32-L48 ```python @hookimpl def startup(datasette): datasette._edit_templates = {} async def inner(): db = get_database(datasette) # Does the table exist? if not await db.table_exists(TABLE): for sql in CREATE_TABLE: await db.execute_write(sql, block=True) else: # Load all templates from that table rows = await db.execute("select template, body FROM {}".format(TABLE)) for name, content in rows: datasette._edit_templates[name] = content return inner ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1247313134 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1247313134 | IC_kwDOBm6k_c5KWHzu | 9599 | 2022-09-14T21:18:46Z | 2022-09-14T21:18:46Z | OWNER | `await_me_maybe` might be tricky though, because the only place that hook is executed is here: https://github.com/simonw/datasette/blob/8430c3bc7dd22b173c1a8c6cd7180e3b31240cd1/datasette/app.py#L348 Which is inside the `Datasette.__init__` method. To implement an `await` on that would need to move it somewhere else - probably here: https://github.com/simonw/datasette/blob/8430c3bc7dd22b173c1a8c6cd7180e3b31240cd1/datasette/app.py#L391-L393 But I'm not 100% confident that is always executed at the right time to ensure the Jinja environment is properly configured? I think it is, but would need to make sure. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/datasette/issues/1809#issuecomment-1247311275 | https://api.github.com/repos/simonw/datasette/issues/1809 | 1247311275 | IC_kwDOBm6k_c5KWHWr | 9599 | 2022-09-14T21:16:32Z | 2022-09-14T21:16:32Z | OWNER | It should also implement the `await_me_maybe` pattern so you can return an `async` function from it. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1373595927 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1247161510 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1247161510 | IC_kwDOCGYnMM5KViym | 9599 | 2022-09-14T18:39:50Z | 2022-09-14T18:39:50Z | OWNER | Wrote that up as a TIL: https://til.simonwillison.net/python/pypy-macos | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 |