github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006311742 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006311742 | IC_kwDOCGYnMM47-xk- | 9599 | 2022-01-06T06:12:19Z | 2022-01-06T06:12:19Z | OWNER | Got that working: ``` % echo 'This is cool' | sqlite-utils insert words.db words - --text --convert '({"word": w} for w in text.split())' % sqlite-utils dump words.db BEGIN TRANSACTION; CREATE TABLE [words] ( [word] TEXT ); INSERT INTO "words" VALUES('This'); INSERT INTO "words" VALUES('is'); INSERT INTO "words" VALUES('cool'); COMMIT; ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006309834 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006309834 | IC_kwDOCGYnMM47-xHK | 9599 | 2022-01-06T06:08:01Z | 2022-01-06T06:08:01Z | OWNER | For `--text` the conversion function should be allowed to return an iterable instead of a dictionary, in which case it will be treated as the full list of records to be inserted. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006301546 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006301546 | IC_kwDOCGYnMM47-vFq | 9599 | 2022-01-06T05:44:47Z | 2022-01-06T05:44:47Z | OWNER | Just need documentation for `--convert` now against the various different types of input. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006300280 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006300280 | IC_kwDOCGYnMM47-ux4 | 9599 | 2022-01-06T05:40:45Z | 2022-01-06T05:40:45Z | OWNER | I'm going to rename `--all` to `--text`: > - Use `--text` to write the entire input to a column called "text" To avoid that clash with Python's `all()` function. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006299778 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006299778 | IC_kwDOCGYnMM47-uqC | 9599 | 2022-01-06T05:39:10Z | 2022-01-06T05:39:10Z | OWNER | `all` is a bad variable name because it clashes with the Python `all()` built-in function. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006295276 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006295276 | IC_kwDOCGYnMM47-tjs | 9599 | 2022-01-06T05:26:11Z | 2022-01-06T05:26:11Z | OWNER | Here's the traceback if your `--convert` function doesn't return a dict right now: ``` % sqlite-utils insert /tmp/all.db blah /tmp/log.log --convert 'all.upper()' --all Traceback (most recent call last): File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/bin/sqlite-utils", line 33, in <module> sys.exit(load_entry_point('sqlite-utils', 'console_scripts', 'sqlite-utils')()) File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1137, in __call__ return self.main(*args, **kwargs) File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1062, in main rv = self.invoke(ctx) File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1668, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1404, in invoke return ctx.invoke(self.callback, **ctx.params) File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 763, in invoke return __callback(*args, **kwargs) File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py", line 949, in insert insert_upsert_implementation( File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py", line 834, in insert_upsert_implementation db[table].insert_all( File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 2602, in insert_all first_record = next(records) File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 3044, in fix_square_braces for record in records: File 
"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py", line 831, in <genexpr> docs = (decode_base64_values(doc) for doc in docs) File "/Users/simon/Dropbox/Development/s… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006294777 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006294777 | IC_kwDOCGYnMM47-tb5 | 9599 | 2022-01-06T05:24:54Z | 2022-01-06T05:24:54Z | OWNER | > I added a custom error message for if the user's `--convert` code doesn't return a dict. That turned out to be a bad idea because it meant exhausting the iterator early for the check - before we got to the `.insert_all()` code that breaks the iterator up into chunks. I tried fixing that with `itertools.tee()` to run the generator twice but that's grossly memory-inefficient for large imports. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006288444 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006288444 | IC_kwDOCGYnMM47-r48 | 9599 | 2022-01-06T05:07:10Z | 2022-01-06T05:07:10Z | OWNER | And here's a demo of `--convert` used with `--all` - I added a custom error message for if the user's `--convert` code doesn't return a dict. ``` % sqlite-utils insert /tmp/all.db blah /tmp/log.log --convert 'all.upper()' --all Error: Records returned by your --convert function must be dicts % sqlite-utils insert /tmp/all.db blah /tmp/log.log --convert '{"all": all.upper()}' --all % sqlite-utils dump /tmp/all.db BEGIN TRANSACTION; CREATE TABLE [blah] ( [all] TEXT ); INSERT INTO "blah" VALUES('INFO: 127.0.0.1:60581 - "GET / HTTP/1.1" 200 OK INFO: 127.0.0.1:60581 - "GET /FOO/-/STATIC/APP.CSS?CEAD5A HTTP/1.1" 200 OK INFO: 127.0.0.1:60581 - "GET /FAVICON.ICO HTTP/1.1" 200 OK INFO: 127.0.0.1:60581 - "GET /FOO/TIDDLYWIKI HTTP/1.1" 200 OK INFO: 127.0.0.1:60581 - "GET /FOO/-/STATIC/APP.CSS?CEAD5A HTTP/1.1" 200 OK INFO: 127.0.0.1:60584 - "GET /FOO/-/STATIC/SQL-FORMATTER-2.3.3.MIN.JS HTTP/1.1" 200 OK INFO: 127.0.0.1:60586 - "GET /FOO/-/STATIC/CODEMIRROR-5.57.0.MIN.JS HTTP/1.1" 200 OK INFO: 127.0.0.1:60585 - "GET /FOO/-/STATIC/CODEMIRROR-5.57.0.MIN.CSS HTTP/1.1" 200 OK INFO: 127.0.0.1:60588 - "GET /FOO/-/STATIC/CODEMIRROR-5.57.0-SQL.MIN.JS HTTP/1.1" 200 OK INFO: 127.0.0.1:60587 - "GET /FOO/-/STATIC/CM-RESIZE-1.0.1.MIN.JS HTTP/1.1" 200 OK INFO: 127.0.0.1:60586 - "GET /FOO/TIDDLYWIKI/TIDDLERS HTTP/1.1" 200 OK INFO: 127.0.0.1:60586 - "GET /FOO/-/STATIC/APP.CSS?CEAD5A HTTP/1.1" 200 OK INFO: 127.0.0.1:60584 - "GET /FOO/-/STATIC/TABLE.JS HTTP/1.1" 200 OK '); COMMIT; ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006284673 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006284673 | IC_kwDOCGYnMM47-q-B | 9599 | 2022-01-06T04:55:52Z | 2022-01-06T04:55:52Z | OWNER | Test code that just worked for me: ``` sqlite-utils insert /tmp/blah.db blah /tmp/log.log --convert ' bits = line.split() return dict([("b_{}".format(i), bit) for i, bit in enumerate(bits)])' --lines ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006232013 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006232013 | IC_kwDOCGYnMM47-eHN | 9599 | 2022-01-06T02:21:35Z | 2022-01-06T02:21:35Z | OWNER | I'm having second thoughts about this bit: > Your Python code will be passed a "row" variable representing the imported row, and can return a modified row. > > If you are using `--lines` your code will be passed a "line" variable, and for `--all` an "all" variable. The code in question is this: https://github.com/simonw/sqlite-utils/blob/500a35ad4d91c8a6232134ce9406efec11bedff8/sqlite_utils/utils.py#L296-L303 Do I really want to add the complexity of supporting different variable names there? I think always using `value` might be better. Except... `value` made sense for the existing `sqlite-utils convert` command where you are running a conversion function against the value for the column in the current row - is it confusing if applied to lines or documents or `all`? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006230411 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006230411 | IC_kwDOCGYnMM47-duL | 9599 | 2022-01-06T02:17:35Z | 2022-01-06T02:17:35Z | OWNER | Documentation: https://github.com/simonw/sqlite-utils/blob/33223856ff7fe746b7b77750fbe5b218531d0545/docs/cli.rst#inserting-unstructured-data-with---lines-and---all - I went with a single section titled "Inserting unstructured data with --lines and --all" | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006220129 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006220129 | IC_kwDOCGYnMM47-bNh | 9599 | 2022-01-06T01:52:26Z | 2022-01-06T01:52:26Z | OWNER | I'm going to refactor all of the tests for `sqlite-utils insert` into a new `test_cli_insert.py` module. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006219848 | https://api.github.com/repos/simonw/sqlite-utils/issues/361 | 1006219848 | IC_kwDOCGYnMM47-bJI | 9599 | 2022-01-06T01:51:36Z | 2022-01-06T01:51:36Z | OWNER | So far I've just implemented the new help: ``` % sqlite-utils insert --help Usage: sqlite-utils insert [OPTIONS] PATH TABLE FILE Insert records from FILE into a table, creating the table if it does not already exist. By default the input is expected to be a JSON array of objects. Or: - Use --nl for newline-delimited JSON objects - Use --csv or --tsv for comma-separated or tab-separated input - Use --lines to write each incoming line to a column called "line" - Use --all to write the entire input to a column called "all" You can also use --convert to pass a fragment of Python code that will be used to convert each input. Your Python code will be passed a "row" variable representing the imported row, and can return a modified row. If you are using --lines your code will be passed a "line" variable, and for --all an "all" variable. Options: --pk TEXT Columns to use as the primary key, e.g. id --flatten Flatten nested JSON objects, so {"a": {"b": 1}} becomes {"a_b": 1} --nl Expect newline-delimited JSON -c, --csv Expect CSV input --tsv Expect TSV input --lines Treat each line as a single value called 'line' --all Treat input as a single value called 'all' --convert TEXT Python code to convert each item --import TEXT Python modules to import --delimiter TEXT Delimiter to use for CSV files --quotechar TEXT Quote character to use for CSV/TSV --sniff Detect delimiter and quote character --no-headers CSV file has no header row --batch-size INTEGER Commit every X records --alter Alter existing table to add any missing columns --not-null TEXT Columns that should be created as NOT NULL --default <TEXT TEXT>... 
Default value that should be set for a column --e… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1094890366 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997496626 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997496626 | IC_kwDOCGYnMM47dJcy | 9599 | 2021-12-20T00:38:15Z | 2022-01-06T01:29:03Z | OWNER | The implementation of this gets a tiny bit complicated. Ignoring `--convert`, the `--lines` option can internally produce `{"line": ...}` records and the `--all` option can produce `{"all": ...}` records. But... when `--convert` is used, what should the code run against? It could run against those already-converted records but that's a little bit strange, since you'd have to do this: sqlite-utils insert blah.db blah myfile.txt --all --convert '({"item": s} for s in value["all"].split("-"))' Having to use `value["all"]` there is unintuitive. It would be nicer to have an `all` variable to work against. But then for `--lines` should the local variable be called `line`? And how best to summarize these different names for local variables in the inline help for the feature? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/sqlite-utils/issues/360#issuecomment-1006211113 | https://api.github.com/repos/simonw/sqlite-utils/issues/360 | 1006211113 | IC_kwDOCGYnMM47-ZAp | 9599 | 2022-01-06T01:27:53Z | 2022-01-06T01:27:53Z | OWNER | It looks like you were using `sqlite-utils memory` - that works by loading the entire file into an in-memory database, so 170GB is very likely to run out of RAM. The line of code there exhibits another problem: it's reading the entire JSON file into a Python string, so it looks like it's going to run out of RAM even before it gets to the SQLite in-memory database section. To handle a file of this size you'd need to write it to a SQLite database on-disk first. The `sqlite-utils insert` command can do this, and it should be able to "stream" records in from a file without loading the entire thing into memory - but only for JSON-NL and CSV/TSV formats, not for JSON arrays. The code in question is here: https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L738-L773 That's using Python generators for the CSV/TSV/JSON-NL variants... but it's doing this for regular JSON which requires reading the entire thing into memory: https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L767 If you have the ability to control how your 170GB file is generated you may have more luck converting it to CSV or TSV or newline-delimited JSON, then using `sqlite-utils insert` to insert it into a database file. To be honest though I've never tested this tooling with anything nearly that big, so it's possible you'll still run into problems. If you do I'd love to hear about them! I would be tempted to tackle this size of job by writing a custom Python script, either using the `sqlite_utils` Python library or even calling `sqlite3` directly. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1091819089 | |
https://github.com/simonw/datasette/issues/1534#issuecomment-1005975080 | https://api.github.com/repos/simonw/datasette/issues/1534 | 1005975080 | IC_kwDOBm6k_c479fYo | 9599 | 2022-01-05T18:29:06Z | 2022-01-05T18:29:06Z | OWNER | A really big downside to this is that it turns out many CDNs - apparently including Cloudflare - don't support the Vary header at all! More in this thread: https://twitter.com/simonw/status/1478470282931163137 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1065432388 | |
https://github.com/simonw/datasette/issues/1585#issuecomment-1003575286 | https://api.github.com/repos/simonw/datasette/issues/1585 | 1003575286 | IC_kwDOBm6k_c470Vf2 | 9599 | 2022-01-01T15:40:38Z | 2022-01-01T15:40:38Z | OWNER | API tutorial: https://firebase.google.com/docs/hosting/api-deploy | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1091838742 | |
https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1003437288 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | 1003437288 | IC_kwDODFE5qs47zzzo | 28565 | 2021-12-31T19:06:20Z | 2021-12-31T19:06:20Z | NONE | > @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just tried the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text. Shouldn't be hard. The easiest way is probably to remove the `if body.content_type == "text/html"` clause from [utils.py:254](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8e6d487b697ce2e8ad885acf613a157bfba84c59#diff-25ad9dd1ced1b8bfc37fda8444819c803232c08891e4af3d4064aa205d8174eaR254) and just return content directly without parsing. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
954546309 | |
https://github.com/simonw/datasette/issues/1583#issuecomment-1002825217 | https://api.github.com/repos/simonw/datasette/issues/1583 | 1002825217 | IC_kwDOBm6k_c47xeYB | 536941 | 2021-12-30T00:34:16Z | 2021-12-30T00:34:16Z | CONTRIBUTOR | if that is not desirable, it might be good to document that users might want to set up a lifecycle rule to automatically delete these build artifacts. something like https://stackoverflow.com/questions/59937542/can-i-delete-container-images-from-google-cloud-storage-artifacts-bucket | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1090810196 | |
https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1002735370 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | 1002735370 | IC_kwDODFE5qs47xIcK | 203343 | 2021-12-29T18:58:23Z | 2021-12-29T18:58:23Z | NONE | @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just tried the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
954546309 | |
https://github.com/simonw/datasette/issues/1152#issuecomment-1001791592 | https://api.github.com/repos/simonw/datasette/issues/1152 | 1001791592 | IC_kwDOBm6k_c47tiBo | 9599 | 2021-12-27T23:04:31Z | 2021-12-27T23:04:31Z | OWNER | Another option: rethink permissions to always work in terms of WHERE clauses used as part of a SQL query that returns the overall allowed set of databases or tables. This would require rethinking existing permissions, but it might be worthwhile prior to 1.0. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
770598024 | |
https://github.com/simonw/datasette/issues/878#issuecomment-1001699559 | https://api.github.com/repos/simonw/datasette/issues/878 | 1001699559 | IC_kwDOBm6k_c47tLjn | 9599 | 2021-12-27T18:53:04Z | 2021-12-27T18:53:04Z | OWNER | I'm going to see if I can come up with the simplest possible version of this pattern for the `/-/metadata` and `/-/metadata.json` page, then try it for the database query page, before tackling the much more complex table page. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
648435885 | |
https://github.com/dogsheep/twitter-to-sqlite/issues/62#issuecomment-1001222213 | https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/62 | 1001222213 | IC_kwDODEm0Qs47rXBF | 6764957 | 2021-12-26T17:59:25Z | 2021-12-26T17:59:25Z | NONE | just confirmed that this error does not occur when i use my public main account. gets more interesting! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1088816961 | |
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-1001115286 | https://api.github.com/repos/simonw/sqlite-utils/issues/228 | 1001115286 | IC_kwDOCGYnMM47q86W | 1206106 | 2021-12-26T07:01:31Z | 2021-12-26T07:01:31Z | NONE | `--no-headers` does not work? ``` $ echo 'a,1\nb,2' | sqlite-utils memory --no-headers -t - 'select * from stdin' a 1 --- --- b 2 ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
807437089 | |
https://github.com/simonw/datasette/issues/1576#issuecomment-1000935523 | https://api.github.com/repos/simonw/datasette/issues/1576 | 1000935523 | IC_kwDOBm6k_c47qRBj | 9599 | 2021-12-24T21:33:05Z | 2021-12-24T21:33:05Z | OWNER | Another option would be to attempt to import `contextvars` and, if the import fails (for Python 3.6) continue using the current mechanism - then let Python 3.6 users know in the documentation that under Python 3.6 they will miss out on nested traces. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
https://github.com/simonw/datasette/issues/1577#issuecomment-1000673444 | https://api.github.com/repos/simonw/datasette/issues/1577 | 1000673444 | IC_kwDOBm6k_c47pRCk | 9599 | 2021-12-24T06:08:58Z | 2021-12-24T06:08:58Z | OWNER | https://pypistats.org/packages/datasette shows a breakdown of downloads by Python version: <img width="986" alt="image" src="https://user-images.githubusercontent.com/9599/147323253-1ee22d93-3be2-472b-8ead-495d925958e5.png"> It looks like on a recent day I had 4,071 downloads from Python 3.7... and just 2 downloads from Python 3.6! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087913724 | |
https://github.com/simonw/datasette/issues/1534#issuecomment-1000535904 | https://api.github.com/repos/simonw/datasette/issues/1534 | 1000535904 | IC_kwDOBm6k_c47ovdg | 9599 | 2021-12-23T21:44:31Z | 2021-12-23T21:44:31Z | OWNER | A big downside to this is that I would need to use `Vary: Accept` for when Datasette is running behind a cache such as Cloudflare - would that greatly reduce overall cache efficiency due to subtle variations in the accept headers sent by common browsers? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1065432388 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000485719 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000485719 | IC_kwDOBm6k_c47ojNX | 9599 | 2021-12-23T19:19:45Z | 2021-12-23T19:19:45Z | OWNER | All of those removed `block=True` lines in 8c401ee0f054de2f568c3a8302c9223555146407 really help confirm to me that this was a good decision. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000485505 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000485505 | IC_kwDOBm6k_c47ojKB | 9599 | 2021-12-23T19:19:13Z | 2021-12-23T19:19:13Z | OWNER | Updated docs for `execute_write_fn()`: https://github.com/simonw/datasette/blob/75153ea9b94d09ec3d61f7c6ebdf378e0c0c7a0b/docs/internals.rst#await-dbexecute_write_fnfn-blocktrue | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000481686 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000481686 | IC_kwDOBm6k_c47oiOW | 9599 | 2021-12-23T19:09:23Z | 2021-12-23T19:09:23Z | OWNER | Re-opening this because I missed updating some of the docs, and I also need to update Datasette's own code to not use `block=True` in a bunch of places. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000479737 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000479737 | IC_kwDOBm6k_c47ohv5 | 9599 | 2021-12-23T19:04:23Z | 2021-12-23T19:04:23Z | OWNER | Updated documentation: https://github.com/simonw/datasette/blob/00a2895cd2dc42c63846216b36b2dc9f41170129/docs/internals.rst#await-dbexecute_writesql-paramsnone-blocktrue | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000477813 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000477813 | IC_kwDOBm6k_c47ohR1 | 9599 | 2021-12-23T18:59:41Z | 2021-12-23T18:59:41Z | OWNER | I'm going to go with `execute_write(..., block=False)` as the mechanism for fire-and-forget write queries. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000477621 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000477621 | IC_kwDOBm6k_c47ohO1 | 9599 | 2021-12-23T18:59:12Z | 2021-12-23T18:59:12Z | OWNER | The easiest way to change this would be to default to `block=True` such that you need to pass `block=False` to the APIs to have them do fire-and-forget. An alternative would be to add new, separately named methods which do the fire-and-forget thing. If I hadn't recently added `execute_write_script` and `execute_write_many` in #1570 I'd be more into this idea, but I don't want to end up with eight methods - `execute_write`, `execute_write_queue`, `execute_write_many`, `execute_write_many_queue`, `execute_write_script`, `execute_write_script_queue`, `execute_write_fn`, `execute_write_fn_queue`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1579#issuecomment-1000476413 | https://api.github.com/repos/simonw/datasette/issues/1579 | 1000476413 | IC_kwDOBm6k_c47og79 | 9599 | 2021-12-23T18:56:06Z | 2021-12-23T18:56:06Z | OWNER | This is technically a breaking change, but a GitHub code search at https://cs.github.com/?scopeName=All+repos&scope=&q=execute_write%20datasette%20-owner%3Asimonw shows only one repo not-owned-by-me using this, and they're using `block=True`: https://github.com/mfa/datasette-webhook-write/blob/e82440f372a2f2e3ed27d1bd34c9fa3a53b49b94/datasette_webhook_write/__init__.py#L88-L89 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087931918 | |
https://github.com/simonw/datasette/issues/1578#issuecomment-1000471782 | https://api.github.com/repos/simonw/datasette/issues/1578 | 1000471782 | IC_kwDOBm6k_c47ofzm | 9599 | 2021-12-23T18:44:01Z | 2021-12-23T18:44:01Z | OWNER | The example nginx config on https://docs.datasette.io/en/stable/deploying.html#nginx-proxy-configuration is currently: ``` daemon off; events { worker_connections 1024; } http { server { listen 80; location /my-datasette { proxy_pass http://127.0.0.1:8009/my-datasette; proxy_set_header Host $host; } } } ``` This looks to me like it might exhibit the bug. Need to confirm that and figure out an alternative. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087919372 | |
https://github.com/simonw/datasette/issues/1578#issuecomment-1000471371 | https://api.github.com/repos/simonw/datasette/issues/1578 | 1000471371 | IC_kwDOBm6k_c47oftL | 9599 | 2021-12-23T18:42:50Z | 2021-12-23T18:42:50Z | OWNER | Confirmed, that fixed the bug for me on my server. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087919372 | |
https://github.com/simonw/datasette/issues/1578#issuecomment-1000470652 | https://api.github.com/repos/simonw/datasette/issues/1578 | 1000470652 | IC_kwDOBm6k_c47ofh8 | 9599 | 2021-12-23T18:40:46Z | 2021-12-23T18:40:46Z | OWNER | [This StackOverflow answer](https://serverfault.com/a/463932) suggests that the fix is to change this: proxy_pass http://127.0.0.1:8000/; To this: proxy_pass http://127.0.0.1:8000; Quoting the nginx documentation: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass > A request URI is passed to the server as follows: > > - If the `proxy_pass` directive is specified with a URI, then when a request is passed to the server, the part of a [normalized](http://nginx.org/en/docs/http/ngx_http_core_module.html#location) request URI matching the location is replaced by a URI specified in the directive: > > location /name/ { > proxy_pass http://127.0.0.1/remote/; > } > > - If `proxy_pass` is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI: > > location /some/path/ { > proxy_pass http://127.0.0.1; > } | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087919372 | |
https://github.com/simonw/datasette/issues/1578#issuecomment-1000469107 | https://api.github.com/repos/simonw/datasette/issues/1578 | 1000469107 | IC_kwDOBm6k_c47ofJz | 9599 | 2021-12-23T18:36:38Z | 2021-12-23T18:36:38Z | OWNER | This problem doesn't occur on my `localhost` running Uvicorn directly - but I'm seeing it in my production environment that runs Datasette behind an nginx proxy: ``` location / { proxy_pass http://127.0.0.1:8000/; proxy_set_header Host $host; } ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087919372 | |
https://github.com/simonw/datasette/issues/1577#issuecomment-1000462309 | https://api.github.com/repos/simonw/datasette/issues/1577 | 1000462309 | IC_kwDOBm6k_c47odfl | 9599 | 2021-12-23T18:20:46Z | 2021-12-23T18:20:46Z | OWNER | There are a lot of improvements to `asyncio` in 3.7: https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-asyncio | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087913724 | |
https://github.com/simonw/datasette/issues/1577#issuecomment-1000461900 | https://api.github.com/repos/simonw/datasette/issues/1577 | 1000461900 | IC_kwDOBm6k_c47odZM | 9599 | 2021-12-23T18:19:44Z | 2021-12-23T18:19:44Z | OWNER | The 3.7 feature I want to use today is [contextvars](https://docs.python.org/3/library/contextvars.html) - but I have a workaround for the moment, see https://github.com/simonw/datasette/issues/1576#issuecomment-999987418 So I'm going to hold off on dropping 3.6 for a little bit longer. I imagine I'll drop it before Datasette 1.0 though. Leaving this issue open to gather thoughts and feedback on this issue from Datasette users and potential users. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087913724 | |
https://github.com/simonw/datasette/issues/1577#issuecomment-1000461275 | https://api.github.com/repos/simonw/datasette/issues/1577 | 1000461275 | IC_kwDOBm6k_c47odPb | 9599 | 2021-12-23T18:18:11Z | 2021-12-23T18:18:11Z | OWNER | From the Twitter thread, there are still a decent number of LTS Linux releases out there that are stuck on pre-3.7 Python. Though many of those are 3.5 and Datasette dropped support for 3.5 in November 2019: cf7776d36fbacefa874cbd6e5fcdc9fff7661203 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087913724 | |
https://github.com/simonw/datasette/issues/1576#issuecomment-999990414 | https://api.github.com/repos/simonw/datasette/issues/1576 | 999990414 | IC_kwDOBm6k_c47mqSO | 9599 | 2021-12-23T02:08:39Z | 2021-12-23T18:16:35Z | OWNER | It's tiny: I'm tempted to vendor it. https://github.com/Skyscanner/aiotask-context/blob/master/aiotask_context/__init__.py No, I'll add it as a pinned dependency, which I can then drop when I drop 3.6 support. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
https://github.com/simonw/datasette/issues/1576#issuecomment-999987418 | https://api.github.com/repos/simonw/datasette/issues/1576 | 999987418 | IC_kwDOBm6k_c47mpja | 9599 | 2021-12-23T01:59:58Z | 2021-12-23T02:02:12Z | OWNER | Another option: https://github.com/Skyscanner/aiotask-context - looks like it might be better as it's been updated for Python 3.7 in this commit https://github.com/Skyscanner/aiotask-context/commit/67108c91d2abb445655cc2af446fdb52ca7890c4 The Skyscanner one doesn't attempt to wrap any existing factories, but that's OK for my purposes since I don't need to handle arbitrary `asyncio` code written by other people. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
https://github.com/simonw/datasette/issues/1576#issuecomment-999876666 | https://api.github.com/repos/simonw/datasette/issues/1576 | 999876666 | IC_kwDOBm6k_c47mOg6 | 9599 | 2021-12-22T20:59:22Z | 2021-12-22T21:18:09Z | OWNER | This article is relevant: [Context information storage for asyncio](https://blog.sqreen.com/asyncio/) - in particular the section https://blog.sqreen.com/asyncio/#context-inheritance-between-tasks which describes exactly the problem I have and their solution, which involves this trickery: ```python def request_task_factory(loop, coro): child_task = asyncio.tasks.Task(coro, loop=loop) parent_task = asyncio.Task.current_task(loop=loop) current_request = getattr(parent_task, 'current_request', None) setattr(child_task, 'current_request', current_request) return child_task loop = asyncio.get_event_loop() loop.set_task_factory(request_task_factory) ``` They released their solution as a library: https://pypi.org/project/aiocontext/ and https://github.com/sqreen/AioContext - but that company was acquired by Datadog back in April and doesn't seem to be actively maintaining their open source stuff any more: https://twitter.com/SqreenIO/status/1384906075506364417 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
https://github.com/simonw/datasette/issues/1576#issuecomment-999878907 | https://api.github.com/repos/simonw/datasette/issues/1576 | 999878907 | IC_kwDOBm6k_c47mPD7 | 9599 | 2021-12-22T21:03:49Z | 2021-12-22T21:10:46Z | OWNER | `contextvars` can solve this but they were introduced in Python 3.7: https://www.python.org/dev/peps/pep-0567/ Python 3.6 support ends in a few days' time, and it looks like Glitch has updated to 3.7 now - so maybe I can get away with Datasette needing 3.7 these days? Tweeted about that here: https://twitter.com/simonw/status/1473761478155010048 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
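The `contextvars` approach mentioned above can be sketched with stdlib code alone: every task created by `asyncio.gather()` receives a copy of the parent task's context, so a value set in the request handler is visible in child tasks without any task-factory trickery (the variable and function names here are illustrative, not Datasette's):

```python
import asyncio
from contextvars import ContextVar

# Illustrative stand-in for Datasette's per-request state
current_request: ContextVar[str] = ContextVar("current_request", default="none")

async def child(name: str) -> str:
    # Each gathered task inherits a snapshot of the parent's context,
    # so current_request still holds the value set in handle_request()
    return f"{name}:{current_request.get()}"

async def handle_request(request_id: str) -> list:
    current_request.set(request_id)
    return await asyncio.gather(child("facets"), child("count"))

print(asyncio.run(handle_request("req-1")))  # ['facets:req-1', 'count:req-1']
```

This is exactly the 3.7+ feature that makes the version bump tempting: no wrapping of task factories is needed at all.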
https://github.com/simonw/datasette/issues/1576#issuecomment-999874886 | https://api.github.com/repos/simonw/datasette/issues/1576 | 999874886 | IC_kwDOBm6k_c47mOFG | 9599 | 2021-12-22T20:55:42Z | 2021-12-22T20:57:28Z | OWNER | One way to solve this would be to introduce a `set_task_id()` method, which sets an ID which will be returned by `get_task_id()` instead of using `id(current_task(loop=loop))`. It would be really nice if I could solve this using `with` syntax somehow. Something like: ```python with trace_child_tasks(): ( suggested_facets, (facet_results, facets_timed_out), ) = await asyncio.gather( execute_suggested_facets(), execute_facets(), ) ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
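The `with trace_child_tasks():` syntax wished for above could be prototyped on top of `contextvars` (3.7+): pin the current task's ID in a context variable, and have `get_task_id()` prefer the inherited override. A hypothetical sketch, not Datasette's actual implementation:

```python
import asyncio
import contextlib
from contextvars import ContextVar

_task_id_override: ContextVar = ContextVar("_task_id_override", default=None)

def get_task_id():
    # Prefer an inherited override so child tasks report the parent's ID
    override = _task_id_override.get()
    return override if override is not None else id(asyncio.current_task())

@contextlib.contextmanager
def trace_child_tasks():
    # Tasks created inside this block copy the context, including the override
    token = _task_id_override.set(id(asyncio.current_task()))
    try:
        yield
    finally:
        _task_id_override.reset(token)

async def child():
    return get_task_id()

async def parent():
    with trace_child_tasks():
        a, b = await asyncio.gather(child(), child())
    # Both children reported the parent task's ID
    return a == b == id(asyncio.current_task())

print(asyncio.run(parent()))  # True
```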
https://github.com/simonw/datasette/issues/1576#issuecomment-999874484 | https://api.github.com/repos/simonw/datasette/issues/1576 | 999874484 | IC_kwDOBm6k_c47mN-0 | 9599 | 2021-12-22T20:54:52Z | 2021-12-22T20:54:52Z | OWNER | Here's the full current relevant code from `tracer.py`: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/tracer.py#L8-L64 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1087181951 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999870993 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999870993 | IC_kwDOBm6k_c47mNIR | 9599 | 2021-12-22T20:47:18Z | 2021-12-22T20:50:24Z | OWNER | The reason they aren't showing up in the traces is that traces are stored just for the currently executing `asyncio` task ID: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/tracer.py#L13-L25 This is so traces for other incoming requests don't end up mixed together. But there's no current mechanism to track async tasks that are effectively "child tasks" of the current request, and hence should be tracked the same. https://stackoverflow.com/a/69349501/6083 suggests that you pass the task ID as an argument to the child tasks that are executed using `asyncio.gather()` to work around this kind of problem. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999870282 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999870282 | IC_kwDOBm6k_c47mM9K | 9599 | 2021-12-22T20:45:56Z | 2021-12-22T20:46:08Z | OWNER | > New short-term goal: get facets and suggested facets to execute in parallel with the main query. Generate a trace graph that proves that is happening using `datasette-pretty-traces`. I wrote code to execute those in parallel using `asyncio.gather()` - which seems to work but causes the SQL run inside the parallel `async def` functions not to show up in the trace graph at all. ```diff diff --git a/datasette/views/table.py b/datasette/views/table.py index 9808fd2..ec9db64 100644 --- a/datasette/views/table.py +++ b/datasette/views/table.py @@ -1,3 +1,4 @@ +import asyncio import urllib import itertools import json @@ -615,44 +616,37 @@ class TableView(RowTableShared): if request.args.get("_timelimit"): extra_args["custom_time_limit"] = int(request.args.get("_timelimit")) - # Execute the main query! - results = await db.execute(sql, params, truncate=True, **extra_args) - - # Calculate the total count for this query - filtered_table_rows_count = None - if ( - not db.is_mutable - and self.ds.inspect_data - and count_sql == f"select count(*) from {table} " - ): - # We can use a previously cached table row count - try: - filtered_table_rows_count = self.ds.inspect_data[database]["tables"][ - table - ]["count"] - except KeyError: - pass - - # Otherwise run a select count(*) ... 
- if count_sql and filtered_table_rows_count is None and not nocount: - try: - count_rows = list(await db.execute(count_sql, from_sql_params)) - filtered_table_rows_count = count_rows[0][0] - except QueryInterrupted: - pass - - # Faceting - if not self.ds.setting("allow_facet") and any( - arg.startswith("_facet") for arg in request.args - ): - raise BadRequest("_facet= is not allo… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999863269 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999863269 | IC_kwDOBm6k_c47mLPl | 9599 | 2021-12-22T20:35:41Z | 2021-12-22T20:37:13Z | OWNER | It looks like the count has to be executed before facets can be, because the facet_class constructor needs that total count figure: https://github.com/simonw/datasette/blob/6b1384b2f529134998fb507e63307609a5b7f5c0/datasette/views/table.py#L660-L671 It's used in facet suggestion logic here: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/facets.py#L172-L178 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999850191 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999850191 | IC_kwDOBm6k_c47mIDP | 9599 | 2021-12-22T20:29:38Z | 2021-12-22T20:29:38Z | OWNER | New short-term goal: get facets and suggested facets to execute in parallel with the main query. Generate a trace graph that proves that is happening using `datasette-pretty-traces`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999837569 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999837569 | IC_kwDOBm6k_c47mE-B | 9599 | 2021-12-22T20:15:45Z | 2021-12-22T20:15:45Z | OWNER | Also the whole `special_args` v.s. `request.args` thing is pretty confusing, I think that might be an older code pattern back from when I was using Sanic. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999837220 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999837220 | IC_kwDOBm6k_c47mE4k | 9599 | 2021-12-22T20:15:04Z | 2021-12-22T20:15:04Z | OWNER | I think I can move this much higher up in the method, it's a bit confusing having it half way through: https://github.com/simonw/datasette/blob/6b1384b2f529134998fb507e63307609a5b7f5c0/datasette/views/table.py#L414-L436 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-999831967 | https://api.github.com/repos/simonw/datasette/issues/1518 | 999831967 | IC_kwDOBm6k_c47mDmf | 9599 | 2021-12-22T20:04:47Z | 2021-12-22T20:10:11Z | OWNER | I think I might be able to clean up a lot of the stuff in here using the `render_cell` plugin hook: https://github.com/simonw/datasette/blob/6b1384b2f529134998fb507e63307609a5b7f5c0/datasette/views/table.py#L87-L89 The catch with that hook - https://docs.datasette.io/en/stable/plugin_hooks.html#render-cell-value-column-table-database-datasette - is that it gets called for every single cell. I don't want the overhead of looking up the foreign key relationships etc once for every value in a specific column. But maybe I could extend the hook to include a shared cache that gets used for all of the cells in a specific table? Something like this: ```python render_cell(value, column, table, database, datasette, cache) ``` `cache` is a dictionary - and the same dictionary is passed to every call to that hook while rendering a specific page. It's a bit of a gross hack though, and would it ever be useful for plugins outside of the default plugin in Datasette which does the foreign key stuff? If I can think of one other potential application for this `cache` then I might implement it. No, this optimization doesn't make sense: the most complex cell enrichment logic is the stuff that does a `select * from categories where id in (2, 5, 6)` query, using just the distinct set of IDs that are rendered on the current page. That's not going to fit in the `render_cell` hook no matter how hard I try to warp it into the right shape, because it needs full visibility of all of the results that are being rendered in order to collect those unique ID values. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
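The shared-cache variant of `render_cell` floated above — a hypothetical extra `cache` argument, not the hook's real signature — boils down to memoizing per-table introspection across all the cells of one page render:

```python
introspection_calls = 0

def introspect_foreign_keys(table):
    # Stand-in for the expensive per-table foreign key lookup
    global introspection_calls
    introspection_calls += 1
    return {"category_id": "categories"}

def render_cell(value, column, table, cache):
    # Hypothetical `cache` argument: one dict shared per page render,
    # so introspection runs once per table rather than once per cell
    if table not in cache:
        cache[table] = introspect_foreign_keys(table)
    fks = cache[table]
    return f"{value} (fk)" if column in fks else str(value)

cache = {}  # created once per page
rows = [{"id": i, "category_id": i % 3} for i in range(100)]
rendered = [
    render_cell(v, c, "products", cache) for row in rows for c, v in row.items()
]
print(introspection_calls)  # 1
```

As the comment concludes, though, this still can't express the batched `select * from categories where id in (...)` enrichment, which needs to see the whole result set at once.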
https://github.com/simonw/datasette/issues/1181#issuecomment-998999230 | https://api.github.com/repos/simonw/datasette/issues/1181 | 998999230 | IC_kwDOBm6k_c47i4S- | 9308268 | 2021-12-21T18:25:15Z | 2021-12-21T18:25:15Z | NONE | I wonder if I'm encountering the same bug (or something related). I had previously been using the .csv feature to run queries and then fetch results for the pandas `read_csv()` function, but it seems to have stopped working recently. https://ilsweb.cincinnatilibrary.org/collection-analysis/collection-analysis/current_collection-3d56dbf.csv?sql=select%0D%0A++*%0D%0Afrom%0D%0A++bib%0D%0Alimit%0D%0A++100&_size=max Datasette v0.59.4 ![image](https://user-images.githubusercontent.com/9308268/146979957-66911877-2cd9-4022-bc76-fd54e4a3a6f7.png) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
781262510 | |
https://github.com/simonw/datasette/pull/1554#issuecomment-998354538 | https://api.github.com/repos/simonw/datasette/issues/1554 | 998354538 | IC_kwDOBm6k_c47ga5q | 9599 | 2021-12-20T23:52:04Z | 2021-12-20T23:52:04Z | OWNER | Abandoning this since it didn't work how I wanted. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079129258 | |
https://github.com/simonw/datasette/issues/1547#issuecomment-997519202 | https://api.github.com/repos/simonw/datasette/issues/1547 | 997519202 | IC_kwDOBm6k_c47dO9i | 127565 | 2021-12-20T01:36:58Z | 2021-12-20T01:36:58Z | CONTRIBUTOR | Yep, that works -- thanks! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1076388044 | |
https://github.com/simonw/datasette/issues/1547#issuecomment-997514220 | https://api.github.com/repos/simonw/datasette/issues/1547 | 997514220 | IC_kwDOBm6k_c47dNvs | 9599 | 2021-12-20T01:26:25Z | 2021-12-20T01:26:25Z | OWNER | OK, this should hopefully fix that for you: pip install https://github.com/simonw/datasette/archive/f36e010b3b69ada104b79d83c7685caf9359049e.zip | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1076388044 | |
https://github.com/simonw/datasette/issues/1547#issuecomment-997513369 | https://api.github.com/repos/simonw/datasette/issues/1547 | 997513369 | IC_kwDOBm6k_c47dNiZ | 9599 | 2021-12-20T01:24:43Z | 2021-12-20T01:24:43Z | OWNER | @wragge thanks, that's a bug! Working on that in #1575. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1076388044 | |
https://github.com/simonw/datasette/issues/1575#issuecomment-997513177 | https://api.github.com/repos/simonw/datasette/issues/1575 | 997513177 | IC_kwDOBm6k_c47dNfZ | 9599 | 2021-12-20T01:24:25Z | 2021-12-20T01:24:25Z | OWNER | Looks like `specname` is new in Pluggy 1.0: https://github.com/pytest-dev/pluggy/blob/main/CHANGELOG.rst#pluggy-100-2021-08-25 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1084257842 | |
https://github.com/simonw/datasette/issues/1547#issuecomment-997511968 | https://api.github.com/repos/simonw/datasette/issues/1547 | 997511968 | IC_kwDOBm6k_c47dNMg | 127565 | 2021-12-20T01:21:59Z | 2021-12-20T01:21:59Z | CONTRIBUTOR | I've installed the alpha version but get an error when starting up Datasette: ``` Traceback (most recent call last): File "/Users/tim/.pyenv/versions/stock-exchange/bin/datasette", line 5, in <module> from datasette.cli import cli File "/Users/tim/.pyenv/versions/3.8.5/envs/stock-exchange/lib/python3.8/site-packages/datasette/cli.py", line 15, in <module> from .app import Datasette, DEFAULT_SETTINGS, SETTINGS, SQLITE_LIMIT_ATTACHED, pm File "/Users/tim/.pyenv/versions/3.8.5/envs/stock-exchange/lib/python3.8/site-packages/datasette/app.py", line 31, in <module> from .views.database import DatabaseDownload, DatabaseView File "/Users/tim/.pyenv/versions/3.8.5/envs/stock-exchange/lib/python3.8/site-packages/datasette/views/database.py", line 25, in <module> from datasette.plugins import pm File "/Users/tim/.pyenv/versions/3.8.5/envs/stock-exchange/lib/python3.8/site-packages/datasette/plugins.py", line 29, in <module> mod = importlib.import_module(plugin) File "/Users/tim/.pyenv/versions/3.8.5/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "/Users/tim/.pyenv/versions/3.8.5/envs/stock-exchange/lib/python3.8/site-packages/datasette/filters.py", line 9, in <module> @hookimpl(specname="filters_from_request") TypeError: __call__() got an unexpected keyword argument 'specname' ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1076388044 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997507074 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997507074 | IC_kwDOCGYnMM47dMAC | 9599 | 2021-12-20T01:10:06Z | 2021-12-20T01:16:11Z | OWNER | Work-in-progress improved help: ``` Usage: sqlite-utils insert [OPTIONS] PATH TABLE FILE Insert records from FILE into a table, creating the table if it does not already exist. By default the input is expected to be a JSON array of objects. Or: - Use --nl for newline-delimited JSON objects - Use --csv or --tsv for comma-separated or tab-separated input - Use --lines to write each incoming line to a column called "line" - Use --all to write the entire input to a column called "all" You can also use --convert to pass a fragment of Python code that will be used to convert each input. Your Python code will be passed a "row" variable representing the imported row, and can return a modified row. If you are using --lines your code will be passed a "line" variable, and for --all an "all" variable. Options: --pk TEXT Columns to use as the primary key, e.g. id --flatten Flatten nested JSON objects, so {"a": {"b": 1}} becomes {"a_b": 1} --nl Expect newline-delimited JSON -c, --csv Expect CSV input --tsv Expect TSV input --lines Treat each line as a single value called 'line' --all Treat input as a single value called 'all' --convert TEXT Python code to convert each item --import TEXT Python modules to import --delimiter TEXT Delimiter to use for CSV files --quotechar TEXT Quote character to use for CSV/TSV --sniff Detect delimiter and quote character --no-headers CSV file has no header row --batch-size INTEGER Commit every X records --alter Alter existing table to add any missing columns --not-null TEXT Columns that should be created as NOT NULL --default <TEXT TEXT>... 
Default value that should be set for a column --encoding TEXT Character encoding… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997508728 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997508728 | IC_kwDOCGYnMM47dMZ4 | 9599 | 2021-12-20T01:14:43Z | 2021-12-20T01:14:43Z | OWNER | (This makes me want `--extract` from #352 even more.) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/sqlite-utils/issues/163#issuecomment-997502242 | https://api.github.com/repos/simonw/sqlite-utils/issues/163 | 997502242 | IC_kwDOCGYnMM47dK0i | 9599 | 2021-12-20T00:56:45Z | 2021-12-20T00:56:52Z | OWNER | > Maybe `sqlite-utils` should absorb all of the functionality from `sqlite-transform` - having two separate tools doesn't necessarily make sense. I implemented that in: - #251 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
706001517 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997497262 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997497262 | IC_kwDOCGYnMM47dJmu | 9599 | 2021-12-20T00:40:15Z | 2021-12-20T00:40:15Z | OWNER | `--flatten` could do with a better description too. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997496931 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997496931 | IC_kwDOCGYnMM47dJhj | 9599 | 2021-12-20T00:39:14Z | 2021-12-20T00:39:52Z | OWNER | ``` % sqlite-utils insert --help Usage: sqlite-utils insert [OPTIONS] PATH TABLE JSON_FILE Insert records from JSON file into a table, creating the table if it does not already exist. Input should be a JSON array of objects, unless --nl or --csv is used. Options: --pk TEXT Columns to use as the primary key, e.g. id --nl Expect newline-delimited JSON --flatten Flatten nested JSON objects -c, --csv Expect CSV --tsv Expect TSV --convert TEXT Python code to convert each item --import TEXT Python modules to import --delimiter TEXT Delimiter to use for CSV files --quotechar TEXT Quote character to use for CSV/TSV --sniff Detect delimiter and quote character --no-headers CSV file has no header row --batch-size INTEGER Commit every X records --alter Alter existing table to add any missing columns --not-null TEXT Columns that should be created as NOT NULL --default <TEXT TEXT>... Default value that should be set for a column --encoding TEXT Character encoding for input, defaults to utf-8 -d, --detect-types Detect types for columns in CSV/TSV data --load-extension TEXT SQLite extensions to load --silent Do not show progress bar --ignore Ignore records if pk already exists --replace Replace records if pk already exists --truncate Truncate table before inserting records, if table already exists -h, --help Show this message and exit. ``` I can add a bunch of extra help at the top there to explain all of this stuff. That "Input should be a JSON array of objects" bit could be expanded to several paragraphs. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997492872 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997492872 | IC_kwDOCGYnMM47dIiI | 9599 | 2021-12-20T00:23:31Z | 2021-12-20T00:23:31Z | OWNER | I think this should work on JSON, or CSV, or individual lines, or the entire content at once. So I'll require `--lines --convert ...` to import individual lines, or `--all --convert` to run the conversion against the entire input at once. What would `--lines` or `--all` do without `--convert`? Maybe insert records as `{"line": "line of text"}` or `{"all": "whole input"}`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
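The proposed default behaviors reduce to simple inserts; a stdlib `sqlite3` sketch of what `--lines` and `--all` would produce without `--convert` (table names are arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
text = "first line\nsecond line\nthird line"

# Equivalent of --lines: one {"line": ...} record per input line
conn.execute("create table lines (line text)")
conn.executemany(
    "insert into lines values (?)", ((line,) for line in text.splitlines())
)

# Equivalent of --all: the entire input as a single record
# ("all" is a SQL keyword, hence the brackets)
conn.execute("create table everything ([all] text)")
conn.execute("insert into everything values (?)", (text,))

print(conn.execute("select count(*) from lines").fetchone()[0])  # 3
```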
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997486156 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997486156 | IC_kwDOCGYnMM47dG5M | 9599 | 2021-12-19T23:51:02Z | 2021-12-19T23:51:02Z | OWNER | This is going to need a `--import` multi option too. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997485361 | https://api.github.com/repos/simonw/sqlite-utils/issues/356 | 997485361 | IC_kwDOCGYnMM47dGsx | 9599 | 2021-12-19T23:45:30Z | 2021-12-19T23:45:30Z | OWNER | Really interesting example input for this: https://blog.timac.org/2021/1219-state-of-swift-and-swiftui-ios15/iOS13.txt - see https://blog.timac.org/2021/1219-state-of-swift-and-swiftui-ios15/ | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1077431957 | |
https://github.com/simonw/datasette/issues/1565#issuecomment-997474022 | https://api.github.com/repos/simonw/datasette/issues/1565 | 997474022 | IC_kwDOBm6k_c47dD7m | 9599 | 2021-12-19T22:36:49Z | 2021-12-19T22:37:29Z | OWNER | There's no way to pass an extra database name argument through a tagged template literal, so instead I need a method that returns a callable that can be used for the tagged template literal for a specific database - or the default database. This could work (bit weird looking though): ```javascript var rows = await datasette.query("fixtures")`select * from foo`; ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083657868 | |
https://github.com/simonw/datasette/issues/1565#issuecomment-997473856 | https://api.github.com/repos/simonw/datasette/issues/1565 | 997473856 | IC_kwDOBm6k_c47dD5A | 9599 | 2021-12-19T22:35:20Z | 2021-12-19T22:35:20Z | OWNER | Quick prototype of that tagged template `query` function: ```javascript function query(pieces, ...parameters) { var qs = new URLSearchParams(); var sql = pieces[0]; parameters.forEach((param, i) => { sql += `:p${i}${pieces[i + 1]}`; qs.append(`p${i}`, param); }); qs.append("sql", sql); return qs.toString(); } var id = 4; console.log(query`select * from ids where id > ${id}`); ``` Outputs: ``` p0=4&sql=select+*+from+ids+where+id+%3E+%3Ap0 ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083657868 | |
https://github.com/simonw/datasette/issues/1565#issuecomment-997472639 | https://api.github.com/repos/simonw/datasette/issues/1565 | 997472639 | IC_kwDOBm6k_c47dDl_ | 9599 | 2021-12-19T22:25:50Z | 2021-12-19T22:25:50Z | OWNER | Or... ```javascript rows = await datasette.query`select * from searchable where id > ${id}`; ``` And it knows how to turn that into a parameterized call using tagged template literals. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083657868 | |
https://github.com/simonw/datasette/issues/1565#issuecomment-997472509 | https://api.github.com/repos/simonw/datasette/issues/1565 | 997472509 | IC_kwDOBm6k_c47dDj9 | 9599 | 2021-12-19T22:24:50Z | 2021-12-19T22:24:50Z | OWNER | ... huh, it could even expose a JavaScript function that can be called to execute a SQL query. ```javascript datasette.query("select * from blah").then(...) ``` Maybe it takes an optional second argument that specifies the database - defaulting to the one for the current page. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083657868 | |
https://github.com/simonw/datasette/issues/1565#issuecomment-997472370 | https://api.github.com/repos/simonw/datasette/issues/1565 | 997472370 | IC_kwDOBm6k_c47dDhy | 9599 | 2021-12-19T22:23:36Z | 2021-12-19T22:23:36Z | OWNER | This should also expose the JSON API endpoints used to execute SQL against this database. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083657868 | |
https://github.com/simonw/datasette/issues/1518#issuecomment-997472214 | https://api.github.com/repos/simonw/datasette/issues/1518 | 997472214 | IC_kwDOBm6k_c47dDfW | 9599 | 2021-12-19T22:22:08Z | 2021-12-19T22:22:08Z | OWNER | I sketched out a chained SQL builder pattern that might be useful for further tidying up this code - though with the new plugin hook I'm less excited about it than I was: ```python class TableQuery: def __init__(self, table, columns, pks, is_view=False, prev=None): self.table = table self.columns = columns self.pks = pks self.is_view = is_view self.prev = prev # These can be changed for different instances in the chain: self._where_clauses = None self._order_by = None self._page_size = None self._offset = None self._select_columns = None self.select_all_columns = '*' self.select_specified_columns = '*' @property def where_clauses(self): wheres = [] current = self while current: if current._where_clauses is not None: wheres.extend(current._where_clauses) current = current.prev return list(reversed(wheres)) def where(self, where): new_cls = TableQuery(self.table, self.columns, self.pks, self.is_view, self) new_cls._where_clauses = [where] return new_cls @classmethod async def introspect(cls, db, table): return cls( table, columns = await db.table_columns(table), pks = await db.primary_keys(table), is_view = bool(await db.get_view_definition(table)) ) @property def sql_from(self): return f"from {self.table}{self.sql_where}" @property def sql_where(self): if not self.where_clauses: return "" else: return f" where {' and '.join(self.where_clauses)}" @property def sql_no_order_no_limit(self): return f"select {self.select_all_columns} from {self.table}{self.sql_where}" @property def sql(self): return f"select {self.select_specified_columns} from {se… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1058072543 | |
https://github.com/simonw/datasette/issues/1547#issuecomment-997471672 | https://api.github.com/repos/simonw/datasette/issues/1547 | 997471672 | IC_kwDOBm6k_c47dDW4 | 9599 | 2021-12-19T22:18:26Z | 2021-12-19T22:18:26Z | OWNER | I released this [in an alpha](https://github.com/simonw/datasette/releases/tag/0.60a1), so you can try out this fix using: pip install datasette==0.60a1 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1076388044 | |
https://github.com/simonw/datasette/issues/1566#issuecomment-997470633 | https://api.github.com/repos/simonw/datasette/issues/1566 | 997470633 | IC_kwDOBm6k_c47dDGp | 9599 | 2021-12-19T22:12:00Z | 2021-12-19T22:12:00Z | OWNER | Released another alpha, 0.60a1: https://github.com/simonw/datasette/releases/tag/0.60a1 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083669410 | |
https://github.com/simonw/datasette/issues/1545#issuecomment-997462604 | https://api.github.com/repos/simonw/datasette/issues/1545 | 997462604 | IC_kwDOBm6k_c47dBJM | 9599 | 2021-12-19T21:17:08Z | 2021-12-19T21:17:08Z | OWNER | Here's the relevant code: https://github.com/simonw/datasette/blob/4094741c2881c2ada3f3f878b532fdaec7914953/datasette/app.py#L1204-L1219 It's using `route_path.split("/")` which should be OK because that's the incoming `request.path` path - which I would expect to use `/` even on Windows. Then it uses `os.path.join` which should do the right thing. I need to get myself a proper Windows development environment set up to investigate this one. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1075893249 | |
https://github.com/simonw/datasette/issues/1573#issuecomment-997462117 | https://api.github.com/repos/simonw/datasette/issues/1573 | 997462117 | IC_kwDOBm6k_c47dBBl | 9599 | 2021-12-19T21:13:13Z | 2021-12-19T21:13:13Z | OWNER | This might also be the impetus I need to bring the https://datasette.io/plugins/datasette-pretty-traces plugin into Datasette core itself. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1084185188 | |
https://github.com/simonw/datasette/issues/1547#issuecomment-997460731 | https://api.github.com/repos/simonw/datasette/issues/1547 | 997460731 | IC_kwDOBm6k_c47dAr7 | 9599 | 2021-12-19T21:02:15Z | 2021-12-19T21:02:15Z | OWNER | Yes, this is a bug. It looks like the problem is with the `if write:` branch in this code here: https://github.com/simonw/datasette/blob/5fac26aa221a111d7633f2dd92014641f7c0ade9/datasette/views/database.py#L252-L327 Is missing this bit of code: https://github.com/simonw/datasette/blob/5fac26aa221a111d7633f2dd92014641f7c0ade9/datasette/views/database.py#L343-L347 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1076388044 | |
https://github.com/simonw/datasette/issues/1570#issuecomment-997460061 | https://api.github.com/repos/simonw/datasette/issues/1570 | 997460061 | IC_kwDOBm6k_c47dAhd | 9599 | 2021-12-19T20:56:54Z | 2021-12-19T20:56:54Z | OWNER | Documentation: https://docs.datasette.io/en/latest/internals.html#await-db-execute-write-sql-params-none-block-false | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083921371 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997459958 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997459958 | IC_kwDOBm6k_c47dAf2 | 9599 | 2021-12-19T20:55:59Z | 2021-12-19T20:55:59Z | OWNER | Closing this issue because I've optimized this a whole bunch, and it's definitely good enough for the moment. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997325189 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997325189 | IC_kwDOBm6k_c47cfmF | 9599 | 2021-12-19T03:55:01Z | 2021-12-19T20:54:51Z | OWNER | It's a bit annoying that the queries no longer show up in the trace at all now, thanks to running in `.execute_fn()`. I wonder if there's something smart I can do about that - maybe have `trace()` record that function with a traceback even though it doesn't have the executed SQL string? 5fac26aa221a111d7633f2dd92014641f7c0ade9 has the same problem. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997459637 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997459637 | IC_kwDOBm6k_c47dAa1 | 9599 | 2021-12-19T20:53:46Z | 2021-12-19T20:53:46Z | OWNER | Using #1571 showed me that the `DELETE FROM columns/foreign_keys/indexes WHERE database_name = ? and table_name = ?` queries were running way more times than I expected. I came up with a new optimization that just does `DELETE FROM columns/foreign_keys/indexes WHERE database_name = ?` instead. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1566#issuecomment-997457790 | https://api.github.com/repos/simonw/datasette/issues/1566 | 997457790 | IC_kwDOBm6k_c47c_9- | 9599 | 2021-12-19T20:40:50Z | 2021-12-19T20:40:57Z | OWNER | Also release new version of `datasette-pretty-traces` with this feature: - https://github.com/simonw/datasette-pretty-traces/issues/7 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083669410 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997342494 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997342494 | IC_kwDOBm6k_c47cj0e | 9599 | 2021-12-19T07:22:04Z | 2021-12-19T07:22:04Z | OWNER | Another option would be to provide an abstraction that makes it easier to run a group of SQL queries in the same thread at the same time, and have them traced correctly. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997324666 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997324666 | IC_kwDOBm6k_c47cfd6 | 9599 | 2021-12-19T03:47:51Z | 2021-12-19T03:48:09Z | OWNER | Here's a hacked together prototype of running all of that stuff inside a single function passed to `.execute_fn()`: ```diff diff --git a/datasette/utils/internal_db.py b/datasette/utils/internal_db.py index 95055d8..58f9982 100644 --- a/datasette/utils/internal_db.py +++ b/datasette/utils/internal_db.py @@ -1,4 +1,5 @@ import textwrap +from datasette.utils import table_column_details async def init_internal_db(db): @@ -70,49 +71,70 @@ async def populate_schema_tables(internal_db, db): "DELETE FROM tables WHERE database_name = ?", [database_name], block=True ) tables = (await db.execute("select * from sqlite_master WHERE type = 'table'")).rows - tables_to_insert = [] - columns_to_delete = [] - columns_to_insert = [] - foreign_keys_to_delete = [] - foreign_keys_to_insert = [] - indexes_to_delete = [] - indexes_to_insert = [] - for table in tables: - table_name = table["name"] - tables_to_insert.append( - (database_name, table_name, table["rootpage"], table["sql"]) - ) - columns_to_delete.append((database_name, table_name)) - columns = await db.table_column_details(table_name) - columns_to_insert.extend( - { - **{"database_name": database_name, "table_name": table_name}, - **column._asdict(), - } - for column in columns - ) - foreign_keys_to_delete.append((database_name, table_name)) - foreign_keys = ( - await db.execute(f"PRAGMA foreign_key_list([{table_name}])") - ).rows - foreign_keys_to_insert.extend( - { - **{"database_name": database_name, "table_name": table_name}, - **dict(foreign_key), - } - for foreign_key in foreign_keys - ) - indexes_to_delete.append((database_name, table_name)) - indexes = (await db.execute(f"PRAGMA index_list([{table_name}])")).rows - … | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997324156 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997324156 | IC_kwDOBm6k_c47cfV8 | 9599 | 2021-12-19T03:40:05Z | 2021-12-19T03:40:05Z | OWNER | Using the prototype of this: - https://github.com/simonw/datasette-pretty-traces/issues/5 I'm seeing about 180ms spent running all of these queries on startup! ![CleanShot 2021-12-18 at 19 38 37@2x](https://user-images.githubusercontent.com/9599/146663045-46bda669-90de-474f-8870-345182725dc1.png) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997321767 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997321767 | IC_kwDOBm6k_c47cewn | 9599 | 2021-12-19T03:10:58Z | 2021-12-19T03:10:58Z | OWNER | I wonder how much overhead there is switching between the `async` event loop main code and the thread that runs the SQL queries. Would there be a performance boost if I gathered all of the column/index information in a single function run on the thread using `db.execute_fn()` I wonder? It would eliminate a bunch of switching between threads. Would be great to understand how much of an impact that would have. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
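The batching idea in the comment above can be sketched with a plain `sqlite3` connection. The `gather_metadata` function here is hypothetical (Datasette's `db.execute_fn(fn)` would call `fn(conn)` on the database thread); the point is that all the per-table PRAGMA queries run in one function call, with a single hop between the event loop and the thread instead of one per query.

```python
import sqlite3

# Gather all per-table metadata inside ONE function executed on the
# database thread, instead of awaiting a separate query per table
# from the event loop. gather_metadata is a hypothetical sketch.
def gather_metadata(conn):
    tables = [
        row[0]
        for row in conn.execute(
            "select name from sqlite_master where type = 'table'"
        )
    ]
    info = {}
    for table in tables:
        info[table] = {
            "columns": conn.execute(f"PRAGMA table_info([{table}])").fetchall(),
            "foreign_keys": conn.execute(
                f"PRAGMA foreign_key_list([{table}])"
            ).fetchall(),
            "indexes": conn.execute(f"PRAGMA index_list([{table}])").fetchall(),
        }
    return info

conn = sqlite3.connect(":memory:")
conn.execute("create table docs (id integer primary key, body text)")
metadata = gather_metadata(conn)
```

In Datasette terms the call site would be something like `await db.execute_fn(gather_metadata)`, replacing many awaited queries with one thread round-trip.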
https://github.com/simonw/datasette/issues/1555#issuecomment-997321653 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997321653 | IC_kwDOBm6k_c47ceu1 | 9599 | 2021-12-19T03:09:43Z | 2021-12-19T03:09:43Z | OWNER | On that same documentation page I just spotted this: > This feature is experimental and is subject to change. Further documentation will become available if and when the table-valued functions for PRAGMAs feature becomes officially supported. This makes me nervous to rely on pragma function optimizations in Datasette itself. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997321477 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997321477 | IC_kwDOBm6k_c47cesF | 9599 | 2021-12-19T03:07:33Z | 2021-12-19T03:07:33Z | OWNER | If I want to continue supporting SQLite prior to 3.16.0 (2017-01-02) I'll need this optimization to only kick in with versions that support table-valued PRAGMA functions, while keeping the old `PRAGMA foreign_key_list(table)` stuff working for those older versions. That's feasible, but it's a bit more work - and I need to make sure I have robust testing in place for SQLite 3.15.0. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
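The version gate described above could look something like this sketch (names are hypothetical, not Datasette's actual implementation): check `sqlite3.sqlite_version_info` once, use the single table-valued PRAGMA query on 3.16.0+, and fall back to one `PRAGMA foreign_key_list(table)` call per table otherwise.

```python
import sqlite3

def supports_pragma_functions():
    # Table-valued PRAGMA functions need SQLite 3.16.0 (2017-01-02)
    return sqlite3.sqlite_version_info >= (3, 16, 0)

def foreign_key_details(conn, table_names):
    if supports_pragma_functions():
        # One query covering every table at once
        return conn.execute(
            "select sqlite_master.name, fk.* from sqlite_master, "
            "pragma_foreign_key_list(sqlite_master.name) as fk "
            "where sqlite_master.type = 'table'"
        ).fetchall()
    # Older SQLite: one PRAGMA query per table
    results = []
    for name in table_names:
        for row in conn.execute(f"PRAGMA foreign_key_list([{name}])"):
            results.append((name,) + tuple(row))
    return results
```

Both branches return rows that lead with the table name, so callers would not need to know which path ran.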
https://github.com/simonw/datasette/issues/1555#issuecomment-997321327 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997321327 | IC_kwDOBm6k_c47cepv | 9599 | 2021-12-19T03:05:39Z | 2021-12-19T03:05:44Z | OWNER | This caught me out once before in: - https://github.com/simonw/datasette/issues/1276 Turns out Glitch was running SQLite 3.11.0 from 2016-02-15. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997321217 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997321217 | IC_kwDOBm6k_c47ceoB | 9599 | 2021-12-19T03:04:16Z | 2021-12-19T03:04:16Z | OWNER | One thing to watch out for though, from https://sqlite.org/pragma.html#pragfunc > The table-valued functions for PRAGMA feature was added in SQLite version 3.16.0 (2017-01-02). Prior versions of SQLite cannot use this feature. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997321115 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997321115 | IC_kwDOBm6k_c47cemb | 9599 | 2021-12-19T03:03:12Z | 2021-12-19T03:03:12Z | OWNER | Table columns is a bit harder, because `table_xinfo` is only in SQLite 3.26.0 or higher: https://github.com/simonw/datasette/blob/d637ed46762fdbbd8e32b86f258cd9a53c1cfdc7/datasette/utils/__init__.py#L565-L581 So if that function is available: https://latest.datasette.io/fixtures?sql=SELECT%0D%0A++sqlite_master.name%2C%0D%0A++table_xinfo.*%0D%0AFROM%0D%0A++sqlite_master%2C%0D%0A++pragma_table_xinfo%28sqlite_master.name%29+AS+table_xinfo%0D%0AWHERE%0D%0A++sqlite_master.type+%3D+%27table%27 ```sql SELECT sqlite_master.name, table_xinfo.* FROM sqlite_master, pragma_table_xinfo(sqlite_master.name) AS table_xinfo WHERE sqlite_master.type = 'table' ``` And otherwise, using `table_info`: https://latest.datasette.io/fixtures?sql=SELECT%0D%0A++sqlite_master.name%2C%0D%0A++table_info.*%2C%0D%0A++0+as+hidden%0D%0AFROM%0D%0A++sqlite_master%2C%0D%0A++pragma_table_info%28sqlite_master.name%29+AS+table_info%0D%0AWHERE%0D%0A++sqlite_master.type+%3D+%27table%27 ```sql SELECT sqlite_master.name, table_info.*, 0 as hidden FROM sqlite_master, pragma_table_info(sqlite_master.name) AS table_info WHERE sqlite_master.type = 'table' ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
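A sketch of that `table_xinfo`/`table_info` fallback for a single table (the function name is hypothetical): prefer `PRAGMA table_xinfo`, which adds the trailing `hidden` column in SQLite 3.26.0+, and emulate it with `table_info` plus `hidden = 0` on older versions. Note that older SQLite silently ignores an unrecognized PRAGMA rather than raising an error, so the check is for an empty result, not an exception.

```python
import sqlite3

def table_columns_with_hidden(conn, table):
    # table_xinfo rows: (cid, name, type, notnull, dflt_value, pk, hidden)
    rows = conn.execute(f"PRAGMA table_xinfo([{table}])").fetchall()
    if rows:
        return rows
    # Pre-3.26 SQLite ignores the unknown pragma and returns no rows;
    # fall back to table_info and append hidden = 0 to each row
    return [
        tuple(row) + (0,)
        for row in conn.execute(f"PRAGMA table_info([{table}])")
    ]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, body text)")
columns = table_columns_with_hidden(conn, "t")
```

Either branch yields one row per column with `hidden` as the final field.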
https://github.com/simonw/datasette/issues/1555#issuecomment-997320824 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997320824 | IC_kwDOBm6k_c47ceh4 | 9599 | 2021-12-19T02:59:57Z | 2021-12-19T03:00:44Z | OWNER | To list all indexes: https://latest.datasette.io/fixtures?sql=SELECT%0D%0A++sqlite_master.name%2C%0D%0A++index_list.*%0D%0AFROM%0D%0A++sqlite_master%2C%0D%0A++pragma_index_list%28sqlite_master.name%29+AS+index_list%0D%0AWHERE%0D%0A++sqlite_master.type+%3D+%27table%27 ```sql SELECT sqlite_master.name, index_list.* FROM sqlite_master, pragma_index_list(sqlite_master.name) AS index_list WHERE sqlite_master.type = 'table' ``` Foreign keys: https://latest.datasette.io/fixtures?sql=SELECT%0D%0A++sqlite_master.name%2C%0D%0A++foreign_key_list.*%0D%0AFROM%0D%0A++sqlite_master%2C%0D%0A++pragma_foreign_key_list%28sqlite_master.name%29+AS+foreign_key_list%0D%0AWHERE%0D%0A++sqlite_master.type+%3D+%27table%27 ```sql SELECT sqlite_master.name, foreign_key_list.* FROM sqlite_master, pragma_foreign_key_list(sqlite_master.name) AS foreign_key_list WHERE sqlite_master.type = 'table' ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1566#issuecomment-997272328 | https://api.github.com/repos/simonw/datasette/issues/1566 | 997272328 | IC_kwDOBm6k_c47cSsI | 9599 | 2021-12-18T19:18:01Z | 2021-12-18T19:18:01Z | OWNER | Added some useful new documented internal methods in: - #1570 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083669410 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997272223 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997272223 | IC_kwDOBm6k_c47cSqf | 9599 | 2021-12-18T19:17:13Z | 2021-12-18T19:17:13Z | OWNER | That's a good optimization. Still need to deal with the huge flurry of `PRAGMA` queries though before I can consider this done. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
https://github.com/simonw/datasette/issues/1570#issuecomment-997267583 | https://api.github.com/repos/simonw/datasette/issues/1570 | 997267583 | IC_kwDOBm6k_c47cRh_ | 9599 | 2021-12-18T18:46:05Z | 2021-12-18T18:46:12Z | OWNER | This will replace the work done in #1569. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083921371 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997267416 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997267416 | IC_kwDOBm6k_c47cRfY | 9599 | 2021-12-18T18:44:53Z | 2021-12-18T18:45:28Z | OWNER | Rather than adding an `executemany=True` parameter, I'm now thinking a better design might be to have three methods: - `db.execute_write(sql, params=None, block=False)` - `db.execute_writescript(sql, block=False)` - `db.execute_writemany(sql, params_seq, block=False)` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 | |
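The three-method split proposed above maps cleanly onto the three execution styles `sqlite3` already exposes. A minimal synchronous sketch (ignoring Datasette's threaded write queue and the `block=` flag, so this is not the actual implementation):

```python
import sqlite3

class WriteMethods:
    # Hypothetical sketch: each proposed method wraps one of the three
    # sqlite3 execution primitives in its own committed transaction
    def __init__(self, conn):
        self.conn = conn

    def execute_write(self, sql, params=None):
        with self.conn:  # commits on success, rolls back on error
            return self.conn.execute(sql, params or [])

    def execute_writescript(self, sql):
        with self.conn:
            return self.conn.executescript(sql)

    def execute_writemany(self, sql, params_seq):
        with self.conn:
            return self.conn.executemany(sql, params_seq)
```

The appeal of this design is that each method keeps a single, unambiguous contract instead of one method changing behavior based on a boolean flag.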
https://github.com/simonw/datasette/issues/1569#issuecomment-997266687 | https://api.github.com/repos/simonw/datasette/issues/1569 | 997266687 | IC_kwDOBm6k_c47cRT_ | 9599 | 2021-12-18T18:41:40Z | 2021-12-18T18:41:40Z | OWNER | Updated documentation: https://docs.datasette.io/en/latest/internals.html#await-db-execute-write-sql-params-none-executescript-false-block-false | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1083895395 | |
https://github.com/simonw/datasette/issues/1555#issuecomment-997266100 | https://api.github.com/repos/simonw/datasette/issues/1555 | 997266100 | IC_kwDOBm6k_c47cRK0 | 9599 | 2021-12-18T18:40:02Z | 2021-12-18T18:40:02Z | OWNER | The implementation of `cursor.executemany()` looks very efficient - it turns into a call to this C function with `multiple` set to `1`: https://github.com/python/cpython/blob/e002bbc6cce637171fb2b1391ffeca8643a13843/Modules/_sqlite/cursor.c#L468-L469 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1079149656 |
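For reference, the usage contrast that motivates the efficiency point above: `executemany()` prepares the statement once and steps it per parameter row inside the C extension, versus a Python-level loop of individual `execute()` calls. A minimal sketch with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table words (word text)")
rows = [("a",), ("b",), ("c",)]

# One prepared statement, stepped once per row in C - rather than
# parsing and dispatching a separate execute() call per row
conn.executemany("insert into words values (?)", rows)
conn.commit()

count = conn.execute("select count(*) from words").fetchone()[0]
print(count)  # 3
```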