github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212 | https://api.github.com/repos/simonw/datasette/issues/1836 | 1271003212 | IC_kwDOBm6k_c5LwfhM | 536941 | 2022-10-07T01:52:04Z | 2022-10-07T01:52:04Z | CONTRIBUTOR | and if we try immutable mode, which is how files are opened by `datasette inspect`, we duplicate the files!

```python
# test_sql_immutable.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400374908 | |
https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795 | https://api.github.com/repos/simonw/datasette/issues/1836 | 1270992795 | IC_kwDOBm6k_c5Lwc-b | 536941 | 2022-10-07T01:29:15Z | 2022-10-07T01:50:14Z | CONTRIBUTOR | fascinatingly, telling python to open sqlite in read-only mode makes this layer have a size of 0

```python
# test_sql_ro.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

that's quite weird, because setting the file permissions to read-only didn't do anything. (on reflection, that chmod isn't doing anything because the Dockerfile commands are run as root)
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400374908 | |
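The `mode=ro` behaviour above is easy to confirm outside Docker: a read-only URI connection raises if you attempt a write. A minimal check (database path and error handling are illustrative only):

```python
import sqlite3

conn = sqlite3.connect("file:nlrb.db?mode=ro", uri=True)  # path is a placeholder
try:
    conn.execute("create table should_fail (x)")
except sqlite3.OperationalError as ex:
    # Expected: "attempt to write a readonly database"
    print("write rejected:", ex)
```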
https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081 | https://api.github.com/repos/simonw/datasette/issues/1836 | 1270988081 | IC_kwDOBm6k_c5Lwb0x | 536941 | 2022-10-07T01:19:01Z | 2022-10-07T01:27:35Z | CONTRIBUTOR | okay, some progress!! running some sql against a database file causes that file to get duplicated, even though it doesn't appear to change the file.

make a little test script like this:

```python
# test_sql.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

then

```docker
RUN python test_sql.py nlrb.db
```

produced a layer that's the same size as `nlrb.db`!
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400374908 | |
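One way to confirm that the file's bytes are not actually changing, even though the layer duplicates it, is to hash the database before and after a default (read-write) connection runs a query. A sketch, assuming the same `filing` table as the test scripts above:

```python
import hashlib
import sqlite3
import sys


def sha256_of(path):
    with open(path, "rb") as fp:
        return hashlib.sha256(fp.read()).hexdigest()


db = sys.argv[1]
before = sha256_of(db)

conn = sqlite3.connect(f"file:{db}", uri=True)  # default read-write mode
conn.execute("select count(*) from filing").fetchone()
conn.close()

# If these match, the layer duplication is a copy-on-write effect of opening
# the file writable during the image build, not an actual modification.
print(before == sha256_of(db))
```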
https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982 | https://api.github.com/repos/simonw/datasette/issues/1836 | 1270936982 | IC_kwDOBm6k_c5LwPWW | 536941 | 2022-10-07T00:52:41Z | 2022-10-07T00:52:41Z | CONTRIBUTOR | it's not that the inspect command is somehow changing the db files. if i set them to read-only, the "inspect" layer still has the same very large size. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400374908 | |
https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537 | https://api.github.com/repos/simonw/datasette/issues/1836 | 1270923537 | IC_kwDOBm6k_c5LwMER | 536941 | 2022-10-07T00:46:08Z | 2022-10-07T00:46:08Z | CONTRIBUTOR | i thought it was maybe to do with reading through all the files, but that does not seem to be the case if i make a little test file like:

```python
# test_read.py
import hashlib
import sys
import pathlib

HASH_BLOCK_SIZE = 1024 * 1024


def inspect_hash(path):
    """Calculate the hash of a database, efficiently."""
    m = hashlib.sha256()
    with path.open("rb") as fp:
        while True:
            data = fp.read(HASH_BLOCK_SIZE)
            if not data:
                break
            m.update(data)

    return m.hexdigest()


inspect_hash(pathlib.Path(sys.argv[1]))
```

then a line in the Dockerfile like

```docker
RUN python test_read.py nlrb.db && echo "[]" > /etc/inspect.json
```

just produces a layer of `3B`
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400374908 | |
https://github.com/simonw/datasette/pull/1837#issuecomment-1270855853 | https://api.github.com/repos/simonw/datasette/issues/1837 | 1270855853 | IC_kwDOBm6k_c5Lv7it | 22429695 | 2022-10-07T00:01:20Z | 2022-10-07T00:01:20Z | NONE | # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1837?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report Base: **92.50**% // Head: **92.50**% // No change to project coverage :thumbsup: > Coverage data is based on head [(`c12447e`)](https://codecov.io/gh/simonw/datasette/pull/1837?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`eff1124`)](https://codecov.io/gh/simonw/datasette/commit/eff112498ecc499323c26612d707908831446d25?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). > Patch has no changes to coverable lines. <details><summary>Additional details and impacted files</summary> ```diff @@ Coverage Diff @@ ## main #1837 +/- ## ======================================= Coverage 92.50% 92.50% ======================================= Files 35 35 Lines 4400 4400 ======================================= Hits 4070 4070 Misses 330 330 ``` Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) </details> [:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1837?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). :loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400431789 | |
https://github.com/simonw/datasette/pull/1835#issuecomment-1270595328 | https://api.github.com/repos/simonw/datasette/issues/1835 | 1270595328 | IC_kwDOBm6k_c5Lu78A | 22429695 | 2022-10-06T19:42:25Z | 2022-10-06T19:42:25Z | NONE | # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report Base: **91.71**% // Head: **92.50**% // Increases project coverage by **`+0.78%`** :tada: > Coverage data is based on head [(`b4b92df`)](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`cb1e093`)](https://codecov.io/gh/simonw/datasette/commit/cb1e093fd361b758120aefc1a444df02462389a3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). > Patch has no changes to coverable lines. <details><summary>Additional details and impacted files</summary> ```diff @@ Coverage Diff @@ ## main #1835 +/- ## ========================================== + Coverage 91.71% 92.50% +0.78% ========================================== Files 38 35 -3 Lines 4754 4400 -354 ========================================== - Hits 4360 4070 -290 + Misses 394 330 -64 ``` | [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage Δ | | |---|---|---| | [datasette/database.py](https://codecov.io/gh/simonw/datasette/pull/1835/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2RhdGFiYXNlLnB5) | | | | [datasette/utils/shutil\_backport.py](https://codecov.io/gh/simonw/datasette/pull/1835/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3NodXRpbF9iYWNrcG9ydC5weQ==) | | | | [datasette/\_\_… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400121355 | |
https://github.com/simonw/datasette/pull/1835#issuecomment-1270586897 | https://api.github.com/repos/simonw/datasette/issues/1835 | 1270586897 | IC_kwDOBm6k_c5Lu54R | 9599 | 2022-10-06T19:34:00Z | 2022-10-06T19:34:00Z | OWNER | Wow, great catch! The whole point of inspect data was to avoid this kind of expensive operation on startup so this makes total sense - I had no idea Datasette was still trying to hash a giant file every time the server started. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1400121355 | |
https://github.com/simonw/datasette/issues/1480#issuecomment-1269847461 | https://api.github.com/repos/simonw/datasette/issues/1480 | 1269847461 | IC_kwDOBm6k_c5LsFWl | 536941 | 2022-10-06T11:21:49Z | 2022-10-06T11:21:49Z | CONTRIBUTOR | thanks @simonw, i'll spend a little more time trying to figure out why this isn't working on cloudrun, and then will flip over to fly if i can't. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1015646369 | |
https://github.com/simonw/datasette/issues/1480#issuecomment-1269275153 | https://api.github.com/repos/simonw/datasette/issues/1480 | 1269275153 | IC_kwDOBm6k_c5Lp5oR | 9599 | 2022-10-06T03:54:33Z | 2022-10-06T03:54:33Z | OWNER | I've been having success using Fly recently for a project which I thought would be too large for Cloud Run. I wrote about that here: - https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/ | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1015646369 | |
https://github.com/simonw/datasette/issues/1480#issuecomment-1268629159 | https://api.github.com/repos/simonw/datasette/issues/1480 | 1268629159 | IC_kwDOBm6k_c5Lnb6n | 536941 | 2022-10-05T16:00:55Z | 2022-10-05T16:00:55Z | CONTRIBUTOR | as a next step, i'll fetch the docker image from the google registry, and see what memory and disk usage looks like when i run it locally. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1015646369 | |
https://github.com/simonw/datasette/issues/1480#issuecomment-1268613335 | https://api.github.com/repos/simonw/datasette/issues/1480 | 1268613335 | IC_kwDOBm6k_c5LnYDX | 536941 | 2022-10-05T15:45:49Z | 2022-10-05T15:45:49Z | CONTRIBUTOR | running into this as i continue to grow my labor data warehouse. Here, a Cloud Run PM says the container size should **not** count against memory: https://stackoverflow.com/a/56570717 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1015646369 | |
https://github.com/simonw/datasette/issues/1824#issuecomment-1268398461 | https://api.github.com/repos/simonw/datasette/issues/1824 | 1268398461 | IC_kwDOBm6k_c5Lmjl9 | 562352 | 2022-10-05T12:55:05Z | 2022-10-05T12:55:05Z | NONE | Here is some working JavaScript code. There might be a better solution, I'm not a JS expert.

```javascript
var show_hide = document.querySelector(".show-hide-sql > a");
// NOTE: sql_element was missing from the original snippet; this selector is
// an assumption - adjust it to match the SQL block in your template.
var sql_element = document.querySelector(".sql");

// Hide SQL query if the URL opened with #_hide_sql
var hash = window.location.hash;
if (hash === "#_hide_sql") {
    hide_sql();
}

show_hide.setAttribute("href", "#");
show_hide.addEventListener("click", toggle_sql_display);

function toggle_sql_display() {
    if (show_hide.innerText === "hide") {
        hide_sql();
        return;
    }
    if (show_hide.innerText === "show") {
        show_sql();
        return;
    }
}

function hide_sql() {
    sql_element.style.cssText = "display:none";
    show_hide.innerHTML = "show";
    show_hide.setAttribute("href", "#_hide_sql");
}

function show_sql() {
    sql_element.style.cssText = "display:block";
    show_hide.innerHTML = "hide";
    show_hide.setAttribute("href", "#_show_sql");
}
```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1387712501 | |
https://github.com/simonw/datasette/pull/1823#issuecomment-1258833358 | https://api.github.com/repos/simonw/datasette/issues/1823 | 1258833358 | IC_kwDOBm6k_c5LCEXO | 22429695 | 2022-09-27T00:54:15Z | 2022-10-05T04:37:54Z | NONE | # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report Base: **91.58**% // Head: **92.50**% // Increases project coverage by **`+0.91%`** :tada: > Coverage data is based on head [(`b545b6a`)](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`5f9f567`)](https://codecov.io/gh/simonw/datasette/commit/5f9f567acbc58c9fcd88af440e68034510fb5d2b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). > Patch coverage: 90.47% of modified lines in pull request are covered. <details><summary>Additional details and impacted files</summary> ```diff @@ Coverage Diff @@ ## main #1823 +/- ## ========================================== + Coverage 91.58% 92.50% +0.91% ========================================== Files 36 35 -1 Lines 4444 4400 -44 ========================================== Hits 4070 4070 + Misses 374 330 -44 ``` | [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage Δ | | |---|---|---| | [datasette/utils/asgi.py](https://codecov.io/gh/simonw/datasette/pull/1823/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL2FzZ2kucHk=) | `91.06% <88.23%> (ø)` | | | [datasette/app.py](https://codecov.io/gh/simonw/datasette/pull/1823/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2FwcC5weQ==) | `94.11%… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386917344 | |
https://github.com/simonw/datasette/issues/1832#issuecomment-1267925830 | https://api.github.com/repos/simonw/datasette/issues/1832 | 1267925830 | IC_kwDOBm6k_c5LkwNG | 9599 | 2022-10-05T04:31:57Z | 2022-10-05T04:31:57Z | OWNER | Turns out this already works - `__bool__` falls back on `__len__`: https://docs.python.org/3/reference/datamodel.html#object.__bool__

> When this method is not defined, [`__len__()`](https://docs.python.org/3/reference/datamodel.html#object.__len__ "object.__len__") is called, if it is defined, and the object is considered true if its result is nonzero.

I'll add a test to demonstrate this.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1397193691 | |
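A quick illustration of that fallback, using a stand-in class rather than Datasette's actual results object:

```python
class Results:
    def __init__(self, rows):
        self._rows = rows

    def __len__(self):
        return len(self._rows)


# No __bool__ is defined, so truth testing falls back on __len__():
assert not Results([])            # len() == 0, so falsy
assert Results([{"id": 1}])       # len() > 0, so truthy
```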
https://github.com/simonw/datasette/issues/1832#issuecomment-1267918117 | https://api.github.com/repos/simonw/datasette/issues/1832 | 1267918117 | IC_kwDOBm6k_c5LkuUl | 9599 | 2022-10-05T04:19:52Z | 2022-10-05T04:19:52Z | OWNER | Code can go here: https://github.com/simonw/datasette/blob/b6ba117b7978b58b40e3c3c2b723b92c3010ed53/datasette/database.py#L511-L515 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1397193691 | |
https://github.com/simonw/datasette/issues/1829#issuecomment-1267709546 | https://api.github.com/repos/simonw/datasette/issues/1829 | 1267709546 | IC_kwDOBm6k_c5Lj7Zq | 9599 | 2022-10-04T23:19:24Z | 2022-10-04T23:21:07Z | OWNER | There's also a `check_visibility()` helper which I'm not using in these particular cases but which may be relevant.

It's called like this: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L65-L77

And is defined here: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/app.py#L694-L710

It's actually documented as a public method here: https://docs.datasette.io/en/stable/internals.html#await-check-visibility-actor-action-resource-none

> This convenience method can be used to answer the question "should this item be considered private, in that it is visible to me but it is not visible to anonymous users?"
>
> It returns a tuple of two booleans, `(visible, private)`. `visible` indicates if the actor can see this resource. `private` will be `True` if an anonymous user would not be able to view the resource.

Note that this documented method cannot actually do the right thing, because it's not being given the multiple permissions that need to be checked in order to completely answer the question. So I probably need to redesign that method a bit.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1396948693 | |
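For reference, a sketch of how a caller might use the documented helper as it stands; the action and resource values here are invented examples, not code from the issue:

```python
async def show_padlock(datasette, actor):
    # (visible, private) per the documented return value
    visible, private = await datasette.check_visibility(
        actor,
        action="view-table",
        resource=("fixtures", "facetable"),  # example resource only
    )
    if not visible:
        return None  # caller should return a 403
    return private   # True means: render the padlock
```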
https://github.com/simonw/datasette/issues/1829#issuecomment-1267708232 | https://api.github.com/repos/simonw/datasette/issues/1829 | 1267708232 | IC_kwDOBm6k_c5Lj7FI | 9599 | 2022-10-04T23:17:36Z | 2022-10-04T23:17:36Z | OWNER | Here's the relevant code from the table page: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/table.py#L215-L227

Note how `ensure_permissions()` there takes the table, database and instance into account... but the `private` assignment (used to decide if the padlock should display or not) only considers the `view-table` check.

Here's the same code for the database page: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L139-L141

And for canned query pages: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L228-L240
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1396948693 | |
https://github.com/dogsheep/github-to-sqlite/pull/65#issuecomment-1266141699 | https://api.github.com/repos/dogsheep/github-to-sqlite/issues/65 | 1266141699 | IC_kwDODFdgUs5Ld8oD | 231498 | 2022-10-03T22:35:03Z | 2022-10-03T22:35:03Z | NONE | @simonw rebased against latest, please let me know if i should drop this PR. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
923270900 | |
https://github.com/simonw/datasette/issues/1805#issuecomment-1265161668 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1265161668 | IC_kwDOBm6k_c5LaNXE | 562352 | 2022-10-03T09:18:05Z | 2022-10-03T09:18:05Z | NONE | > I'm tempted to add `word-wrap: anywhere` only to links that are known to be longer than a certain threshold.

Makes sense IMHO.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/datasette/issues/485#issuecomment-1264769569 | https://api.github.com/repos/simonw/datasette/issues/485 | 1264769569 | IC_kwDOBm6k_c5LYtoh | 9599 | 2022-10-03T00:04:42Z | 2022-10-03T00:04:42Z | OWNER | I love these tips - tools that can compile a simple machine learning model to a SQL query! Would be pretty cool if I could bundle a model in Datasette itself as a big in-memory SQLite SQL query:

- https://github.com/Chryzanthemum/xgb2sql
- https://github.com/konstantint/SKompiler
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
447469253 | |
https://github.com/simonw/datasette/issues/1805#issuecomment-1264753894 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1264753894 | IC_kwDOBm6k_c5LYpzm | 9599 | 2022-10-02T23:02:54Z | 2022-10-02T23:02:54Z | OWNER | I'm tempted to add `word-wrap: anywhere` only to links that are known to be longer than a certain threshold. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/datasette/issues/1805#issuecomment-1264753725 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1264753725 | IC_kwDOBm6k_c5LYpw9 | 9599 | 2022-10-02T23:02:17Z | 2022-10-02T23:02:17Z | OWNER | After reverting `word-wrap: anywhere`: https://latest.datasette.io/_memory?sql=select+%27https%3A%2F%2Fexample.com%2FaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaAll%20URLs%20on%20these%20demos%20are%20fake%27+as+url+union+all+select+%27but%20the%20pattern%20matters%27… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/datasette/issues/1828#issuecomment-1264753439 | https://api.github.com/repos/simonw/datasette/issues/1828 | 1264753439 | IC_kwDOBm6k_c5LYpsf | 9599 | 2022-10-02T23:01:17Z | 2022-10-02T23:01:17Z | OWNER | That change deployed and https://github-to-sqlite.dogsheep.net/github/commits now looks like this: <img width="1388" alt="image" src="https://user-images.githubusercontent.com/9599/193480158-de81ac0a-5cb2-4d53-a75c-025c78f293ee.png"> | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1393903845 | |
https://github.com/simonw/datasette/issues/1828#issuecomment-1264738081 | https://api.github.com/repos/simonw/datasette/issues/1828 | 1264738081 | IC_kwDOBm6k_c5LYl8h | 9599 | 2022-10-02T21:34:37Z | 2022-10-02T21:34:37Z | OWNER | I'm running a build of that demo instance here (takes ~30m) https://github.com/dogsheep/github-to-sqlite/actions/runs/3170164705 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1393903845 | |
https://github.com/simonw/datasette/issues/485#issuecomment-1264737290 | https://api.github.com/repos/simonw/datasette/issues/485 | 1264737290 | IC_kwDOBm6k_c5LYlwK | 9599 | 2022-10-02T21:29:59Z | 2022-10-02T21:29:59Z | OWNER | To clarify: the feature this issue is talking about relates to the way Datasette automatically displays foreign key relationships, for example on this page: https://github-to-sqlite.dogsheep.net/github/commits

<img width="1233" alt="image" src="https://user-images.githubusercontent.com/9599/193476985-d41148cf-2b2f-49b9-b717-e92145afab31.png">

Each of those columns is a foreign key to another table. The link text that is displayed there comes from the "label column" that has either been configured or automatically detected for that other table.

I wonder if this could be handled with a tiny machine learning model that's trained to help pick the best label column? Inputs to that model could include:

- The names of the columns
- The number of unique values in each column
- The type of each column (or maybe only `TEXT` columns should be considered)
- How many `null` values there are
- Is the column marked as unique?
- What's the average (or median or some other statistic) string length of values in each column?

Output would be the most likely label column, or some indicator that no likely candidates had been found.

My hunch is that this would be better solved using a few extra heuristics rather than by training a model, but it does feel like an interesting opportunity to experiment with a tiny ML model.

Asked for tips about this on Twitter: https://twitter.com/simonw/status/1576680930680262658
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
447469253 | |
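For a sense of what the "few extra heuristics" route might look like, here's a rough standalone sketch; the scoring weights are invented and this is not Datasette's actual detection code:

```python
import sqlite3


def guess_label_column(conn, table):
    """Score TEXT columns by name, uniqueness, null rate and value length."""
    total = conn.execute(f"select count(*) from [{table}]").fetchone()[0] or 1
    best, best_score = None, 0.0
    for _, name, ctype, *_ in conn.execute(f"PRAGMA table_info([{table}])"):
        if (ctype or "").upper() != "TEXT":
            continue
        distinct, nulls, avg_len = conn.execute(
            f"select count(distinct [{name}]), sum([{name}] is null), "
            f"avg(length([{name}])) from [{table}]"
        ).fetchone()
        score = distinct / total - (nulls or 0) / total
        if name.lower() in ("name", "title", "label"):
            score += 0.5  # a likely-sounding column name is a strong hint
        if avg_len and avg_len > 100:
            score -= 0.5  # very long values make poor link text
        if score > best_score:
            best, best_score = name, score
    return best  # None means no likely candidate was found
```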
https://github.com/simonw/datasette/issues/1805#issuecomment-1264736537 | https://api.github.com/repos/simonw/datasette/issues/1805 | 1264736537 | IC_kwDOBm6k_c5LYlkZ | 9599 | 2022-10-02T21:25:37Z | 2022-10-02T21:25:37Z | OWNER | `word-wrap: anywhere` had some nasty side-effects, removing that: - #1828 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363552780 | |
https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223554 | https://api.github.com/repos/simonw/sqlite-utils/issues/409 | 1264223554 | IC_kwDOCGYnMM5LWoVC | 7908073 | 2022-10-01T03:42:50Z | 2022-10-01T03:42:50Z | CONTRIBUTOR | oh weird. it inserts into db2 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1149661489 | |
https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223363 | https://api.github.com/repos/simonw/sqlite-utils/issues/409 | 1264223363 | IC_kwDOCGYnMM5LWoSD | 7908073 | 2022-10-01T03:41:45Z | 2022-10-01T03:41:45Z | CONTRIBUTOR |
```
pytest xklb/check.py --pdb

xklb/check.py:11: in test_transaction
    assert list(db2["t"].rows) == []
E   AssertionError: assert [{'foo': 1}] == []
E    +  where [{'foo': 1}] = list(<generator object Queryable.rows_where at 0x7f2d84d1f0d0>)
E    +  where <generator object Queryable.rows_where at 0x7f2d84d1f0d0> = <Table t (foo)>.rows
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>
> /home/xk/github/xk/lb/xklb/check.py(11)test_transaction()
      9     with db1.conn:
     10         db1["t"].insert({"foo": 1})
---> 11     assert list(db2["t"].rows) == []
     12     assert list(db2["t"].rows) == [{"foo": 1}]
```

It fails because it is already inserted.

btw, if you put these two lines in your pyproject.toml you can get `ipdb` in pytest:

```
[tool.pytest.ini_options]
addopts = "--pdbcls=IPython.terminal.debugger:TerminalPdb --ignore=tests/data --capture=tee-sys --log-cli-level=ERROR"
```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1149661489 | |
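For contrast with the failing test above, the expected behaviour with two plain `sqlite3` connections (no sqlite-utils involved) looks like this; whether sqlite-utils matches it depends on how `db1.conn` manages transactions:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.db")
db1 = sqlite3.connect(path)
db2 = sqlite3.connect(path)
db1.execute("create table t (foo integer)")
db1.commit()

with db1:  # begins a transaction, commits on successful exit
    db1.execute("insert into t values (1)")
    # Uncommitted, so a second connection should not see the row yet:
    assert db2.execute("select * from t").fetchall() == []

# After the with-block commits, the row becomes visible to db2:
assert db2.execute("select * from t").fetchall() == [(1,)]
```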
https://github.com/simonw/sqlite-utils/issues/493#issuecomment-1264219650 | https://api.github.com/repos/simonw/sqlite-utils/issues/493 | 1264219650 | IC_kwDOCGYnMM5LWnYC | 7908073 | 2022-10-01T03:22:50Z | 2022-10-01T03:23:58Z | CONTRIBUTOR | this is likely what you are looking for: https://stackoverflow.com/a/51076749/697964 but yeah I would say just disable smart quotes | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386562662 | |
https://github.com/simonw/datasette/pull/1827#issuecomment-1263570186 | https://api.github.com/repos/simonw/datasette/issues/1827 | 1263570186 | IC_kwDOBm6k_c5LUI0K | 22429695 | 2022-09-30T13:22:15Z | 2022-09-30T13:22:15Z | NONE | # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1827?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report Base: **92.50**% // Head: **92.50**% // No change to project coverage :thumbsup: > Coverage data is based on head [(`1f0c557`)](https://codecov.io/gh/simonw/datasette/pull/1827?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`34defdc`)](https://codecov.io/gh/simonw/datasette/commit/34defdc10aa293294ca01cfab70780755447e1d7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). > Patch has no changes to coverable lines. <details><summary>Additional details and impacted files</summary> ```diff @@ Coverage Diff @@ ## main #1827 +/- ## ======================================= Coverage 92.50% 92.50% ======================================= Files 35 35 Lines 4400 4400 ======================================= Hits 4070 4070 Misses 330 330 ``` Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) </details> [:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1827?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). :loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1392426838 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262920929 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262920929 | IC_kwDOCGYnMM5LRqTh | 9599 | 2022-09-29T23:06:44Z | 2022-09-29T23:06:44Z | OWNER | Currently the only other use of `-t` is for this:

```
  -t, --table    Output as a formatted table
```

So I think it's OK to use it to mean something slightly different for this command, since `sqlite-utils insert` doesn't do any output of data in any format.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262918833 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262918833 | IC_kwDOCGYnMM5LRpyx | 9599 | 2022-09-29T23:02:52Z | 2022-09-29T23:02:52Z | OWNER | The other nice thing about having this as a separate command is that I can implement a tiny subset of the overall `sqlite-utils insert` features at first, and then add additional features in subsequent releases. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262917059 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262917059 | IC_kwDOCGYnMM5LRpXD | 9599 | 2022-09-29T22:59:28Z | 2022-09-29T22:59:28Z | OWNER | I quite like `sqlite-utils fast-csv` - I think it's clear enough what it does, and running `--help` can clarify if needed. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262915322 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262915322 | IC_kwDOCGYnMM5LRo76 | 9599 | 2022-09-29T22:57:31Z | 2022-09-29T22:57:42Z | OWNER | Maybe `sqlite-utils fast-csv` is right? Not entirely clear that's an insert though as opposed to a faster version of in-memory querying in the style of `sqlite-utils memory`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262914416 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262914416 | IC_kwDOCGYnMM5LRotw | 9599 | 2022-09-29T22:56:53Z | 2022-09-29T22:56:53Z | OWNER | Potential names/designs:

- `sqlite-utils fast data.db rows rows.csv`
- `sqlite-utils insert-fast data.db rows rows.csv`
- `sqlite-utils fast-csv data.db rows rows.csv`

Or more interestingly... what if it could accept multiple CSV files to create multiple tables?

- `sqlite-utils fast data.db rows.csv other.csv`

Would still need to support creating tables with different names though. Maybe like this:

- `sqlite-utils fast data.db -t mytable rows.csv -t othertable other.csv`

I seem to be leaning towards `fast` as the command name, but as a standalone command name it's a bit meaningless - how do we know that's about CSV import and not about fast querying or similar?
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262913145 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1262913145 | IC_kwDOCGYnMM5LRoZ5 | 9599 | 2022-09-29T22:54:13Z | 2022-09-29T22:54:13Z | OWNER | After reviewing `sqlite-utils insert --help` I'm confident that MOST of these options wouldn't make sense for a "fast" mode that just supports CSV and works by piping directly to the `sqlite3` binary: https://github.com/simonw/sqlite-utils/blob/d792dad1cf5f16525da81b1e162fb71d469995f3/docs/cli-reference.rst#L251-L279

I'm going to implement a separate command instead.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
944846776 | |
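The "pipe directly to the `sqlite3` binary" approach could look roughly like this. A sketch only, not the eventual sqlite-utils implementation; it assumes a `sqlite3` executable on the PATH and a version of the shell that accepts multiple dot-command arguments:

```python
import subprocess


def fast_csv_import(db_path, table, csv_path):
    # Each argument after the database is run by the sqlite3 shell in order:
    # switch to CSV mode, then bulk-import the file into the table.
    subprocess.run(
        ["sqlite3", db_path, ".mode csv", f".import {csv_path} {table}"],
        check=True,
    )


fast_csv_import("data.db", "rows", "rows.csv")  # example paths
```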
https://github.com/simonw/datasette/issues/370#issuecomment-1261930179 | https://api.github.com/repos/simonw/datasette/issues/370 | 1261930179 | IC_kwDOBm6k_c5LN4bD | 72577720 | 2022-09-29T08:17:46Z | 2022-09-29T08:17:46Z | CONTRIBUTOR | Just watched this video which demonstrates the integration of *any* webapp into JupyterLab: https://youtu.be/FH1dKKmvFtc Maybe this is the answer? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
377155320 | |
https://github.com/simonw/datasette/issues/1624#issuecomment-1261194164 | https://api.github.com/repos/simonw/datasette/issues/1624 | 1261194164 | IC_kwDOBm6k_c5LLEu0 | 38532 | 2022-09-28T16:54:22Z | 2022-09-28T16:54:22Z | NONE | https://github.com/simonw/datasette-cors seems to workaround this | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1122427321 | |
https://github.com/simonw/datasette/issues/1062#issuecomment-1260909128 | https://api.github.com/repos/simonw/datasette/issues/1062 | 1260909128 | IC_kwDOBm6k_c5LJ_JI | 536941 | 2022-09-28T13:22:53Z | 2022-09-28T14:09:54Z | CONTRIBUTOR | if you went this route:

```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while True:
        chunk = c.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk
```

then `time_limit_ms` would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.

i wonder if this was why you were thinking this feature would need a dedicated connection?

---

reading more, there's no real limit i can find on the number of active cursors (or more precisely, active prepared statement objects, because sqlite doesn't really have cursors).

maybe something like this would be okay?

```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    # step through at least one row to evaluate the statement,
    # not sure if this is necessary
    yield c.fetchone()
while True:
    chunk = c.fetchmany(chunk_size)
    if not chunk:
        break
    yield from chunk
```

it seems quite weird that there's not more of a limit on the number of active prepared statements, but i haven't been able to find one.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
732674148 | |
https://github.com/simonw/datasette/issues/1062#issuecomment-1260829829 | https://api.github.com/repos/simonw/datasette/issues/1062 | 1260829829 | IC_kwDOBm6k_c5LJryF | 536941 | 2022-09-28T12:27:19Z | 2022-09-28T12:27:19Z | CONTRIBUTOR | for teaching `register_output_renderer` to stream it seems like the two options are:

1. a [nested query technique](https://github.com/simonw/datasette/issues/526#issuecomment-505162238) to paginate through
2. a fetching model that looks something like:

```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while True:
        chunk = c.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk
```

currently `db.execute` is not a generator, so this would probably need a new method?
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
732674148 | |
https://github.com/simonw/datasette/issues/1826#issuecomment-1260373403 | https://api.github.com/repos/simonw/datasette/issues/1826 | 1260373403 | IC_kwDOBm6k_c5LH8Wb | 66709385 | 2022-09-28T04:30:27Z | 2022-09-28T04:30:27Z | NONE | I'm glad the bug report served some purpose. Frankly I just needed the method signature, that is why the documentation you mention wasn't read. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388631785 | |
https://github.com/simonw/datasette/pull/1825#issuecomment-1260368537 | https://api.github.com/repos/simonw/datasette/issues/1825 | 1260368537 | IC_kwDOBm6k_c5LH7KZ | 9599 | 2022-09-28T04:21:18Z | 2022-09-28T04:21:18Z | OWNER | This is great, thank you very much! https://datasette--1825.org.readthedocs.build/en/1825/deploying.html#running-datasette-using-openrc | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388227245 | |
https://github.com/simonw/datasette/pull/1825#issuecomment-1260368122 | https://api.github.com/repos/simonw/datasette/issues/1825 | 1260368122 | IC_kwDOBm6k_c5LH7D6 | 22429695 | 2022-09-28T04:20:28Z | 2022-09-28T04:20:28Z | NONE | # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1825?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report Base: **91.58**% // Head: **91.58**% // No change to project coverage :thumbsup: > Coverage data is based on head [(`b16eb2f`)](https://codecov.io/gh/simonw/datasette/pull/1825?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`5f9f567`)](https://codecov.io/gh/simonw/datasette/commit/5f9f567acbc58c9fcd88af440e68034510fb5d2b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). > Patch has no changes to coverable lines. > :exclamation: Current head b16eb2f differs from pull request most recent head e7e96dc. Consider uploading reports for the commit e7e96dc to get more accurate results <details><summary>Additional details and impacted files</summary> ```diff @@ Coverage Diff @@ ## main #1825 +/- ## ======================================= Coverage 91.58% 91.58% ======================================= Files 36 36 Lines 4444 4444 ======================================= Hits 4070 4070 Misses 374 374 ``` Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) </details> [:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1825?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388227245 | |
https://github.com/simonw/datasette/issues/1826#issuecomment-1260357878 | https://api.github.com/repos/simonw/datasette/issues/1826 | 1260357878 | IC_kwDOBm6k_c5LH4j2 | 9599 | 2022-09-28T04:05:45Z | 2022-09-28T04:05:45Z | OWNER | Though now I notice that the copy right there needs to be updated to reflect the new `row` parameter to `render_cell`! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388631785 | |
https://github.com/simonw/datasette/issues/1826#issuecomment-1260357583 | https://api.github.com/repos/simonw/datasette/issues/1826 | 1260357583 | IC_kwDOBm6k_c5LH4fP | 9599 | 2022-09-28T04:05:16Z | 2022-09-28T04:05:16Z | OWNER | This is deliberate. The Datasette plugin system allows you to specify only a subset of the parameters for a hook - in this example, only the `value` is needed so the others can be omitted.

There's a note about this at the very top of that documentation page: https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hooks

> When you implement a plugin hook you can accept any or all of the parameters that are documented as being passed to that hook.
>
> For example, you can implement the `render_cell` plugin hook like this even though the full documented hook signature is `render_cell(value, column, table, database, datasette)`:
> ```python
> @hookimpl
> def render_cell(value, column):
>     if column == "stars":
>         return "*" * int(value)
> ```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1388631785 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1260355224 | https://api.github.com/repos/simonw/datasette/issues/526 | 1260355224 | IC_kwDOBm6k_c5LH36Y | 9599 | 2022-09-28T04:01:25Z | 2022-09-28T04:01:25Z | OWNER | The ultimate protection against those memory bombs is to support more streaming output formats. Related issues:

- #1177
- #1062
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1259718517 | https://api.github.com/repos/simonw/datasette/issues/526 | 1259718517 | IC_kwDOBm6k_c5LFcd1 | 536941 | 2022-09-27T16:02:51Z | 2022-09-27T16:04:46Z | CONTRIBUTOR | i think that `max_returned_rows` **is** a defense mechanism, just not for connection exhaustion. `max_returned_rows` is a defense mechanism against **memory bombs**.

if you are potentially yielding out hundreds of thousands or even millions of rows, you need to be quite careful about data flow to not run out of memory on the server, or on the client.

you have a lot of places in your code that are protective of that right now, but `max_returned_rows` acts as the final backstop.

so, given that, it makes sense to treat removing `max_returned_rows` altogether as a non-goal, and instead allow specific codepaths (like streaming CSVs) to bypass it.

that could dramatically lower the surface area for a memory-bomb attack.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1259693536 | https://api.github.com/repos/simonw/datasette/issues/526 | 1259693536 | IC_kwDOBm6k_c5LFWXg | 9599 | 2022-09-27T15:42:55Z | 2022-09-27T15:42:55Z | OWNER | It's interesting to note WHY the time limit works against this so well.

The time limit as-implemented looks like this: https://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201

The key here is `conn.set_progress_handler(handler, n)` - which specifies that the handler function should be called every `n` SQLite operations. The handler function then checks to see if too much time has transpired and conditionally cancels the query.

This also doubles up as a "maximum number of operations" guard, which is what's happening when you attempt to fetch an infinite number of rows from an infinite table. That limit code could even be extended to say "exit the query after either 5s or 50,000,000 operations". I don't think that's necessary though.

To be honest I'm having trouble with the idea of dropping `max_returned_rows` mainly because what Datasette does (allow arbitrary untrusted SQL queries) is dangerous, so I've designed in multiple redundant defence-in-depth mechanisms right from the start.
| { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
459882902 | |
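As a self-contained illustration of that mechanism, here is a simplified sketch (not Datasette's exact implementation) of a progress handler that aborts a query after a deadline:

```python
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def sqlite_timelimit(conn, ms):
    # The handler runs every n SQLite virtual-machine operations; returning a
    # truthy value from it aborts the current query with an OperationalError.
    deadline = time.monotonic() + ms / 1000
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 10000)
    try:
        yield
    finally:
        conn.set_progress_handler(None, 10000)


conn = sqlite3.connect(":memory:")
try:
    with sqlite_timelimit(conn, 20):
        conn.execute(
            "with recursive counter(x) as (select 0 union select x + 1 from counter) "
            "select * from counter limit 10 offset 100000000"
        ).fetchall()
except sqlite3.OperationalError as ex:
    print("query cancelled:", ex)  # expected: interrupted
```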
https://github.com/simonw/datasette/issues/526#issuecomment-1258910228 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258910228 | IC_kwDOBm6k_c5LCXIU | 536941 | 2022-09-27T03:11:07Z | 2022-09-27T03:11:07Z | CONTRIBUTOR | i think this feature would be safe, as it's really only the time limit that can, and imo should, protect against long-running queries, as it is pretty easy to make very expensive queries that don't return many rows.

moving away from `max_returned_rows` will require some thinking about:

1. memory usage and data flows to handle potentially very large result sets
2. how to avoid rendering tens or hundreds of thousands of [html rows](#1655)
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258906440 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258906440 | IC_kwDOBm6k_c5LCWNI | 9599 | 2022-09-27T03:04:37Z | 2022-09-27T03:04:37Z | OWNER | It would be really neat if we could explore this idea in a plugin, but I don't think Datasette has plugin hooks in the right place for that at the moment. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258905781 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258905781 | IC_kwDOBm6k_c5LCWC1 | 9599 | 2022-09-27T03:03:35Z | 2022-09-27T03:03:47Z | OWNER | Yes good point, the time limit does already protect against that. I've been contemplating a permissioned-users-only relaxation of that time limit too, and I got that idea mixed up with this one in my head. On that basis maybe this feature would be safe after all? Would need to do some testing, but it may be that the existing time limit provides enough protection here already. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258878311 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258878311 | IC_kwDOBm6k_c5LCPVn | 536941 | 2022-09-27T02:19:48Z | 2022-09-27T02:19:48Z | CONTRIBUTOR | this sql query doesn't trip up `max_returned_rows` but does time out:

```sql
with recursive counter(x) as (
  select 0
  union
  select x + 1 from counter
)
select * from counter LIMIT 10 OFFSET 100000000
```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258871525 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258871525 | IC_kwDOBm6k_c5LCNrl | 536941 | 2022-09-27T02:09:32Z | 2022-09-27T02:14:53Z | CONTRIBUTOR | thanks @simonw, i learned something i didn't know about sqlite's execution model!

> Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.

why wouldn't the `sqlite_timelimit` guard prevent that?

---

on my local version which has the code to [turn off truncations for query csv](#1820), `sqlite_timelimit` does protect me.

![Screenshot 2022-09-26 at 22-14-31 Error 500](https://user-images.githubusercontent.com/536941/192415680-94b32b7f-868f-4b89-8194-5752d45f6009.png)
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258864140 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258864140 | IC_kwDOBm6k_c5LCL4M | 9599 | 2022-09-27T01:55:32Z | 2022-09-27T01:55:32Z | OWNER | That recursive query is a great example of the kind of thing having a maximum row limit protects against.

Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.

Even if this feature becomes a permission-guarded thing we still need to take that case into account.

At the very least it would be good if the query could be cancelled if the client disconnects - so if someone accidentally starts an infinite query they can cancel the request and free up the server resources.

It might be a good idea to implement a page that shows "currently running" queries and allows users with the right permission to terminate them from that page.

Another option: a "limit of last resort" - either a very high row limit (10,000,000 perhaps) or even a time limit, saying that all queries will be cancelled if they take longer than thirty minutes or similar.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258860845 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258860845 | IC_kwDOBm6k_c5LCLEt | 9599 | 2022-09-27T01:48:31Z | 2022-09-27T01:50:01Z | OWNER | The protection is supposed to be from this line:

```python
rows = cursor.fetchmany(max_returned_rows + 1)
```

By capping the call to `.fetchmany()` at `max_returned_rows + 1` (the `+ 1` is to allow detection of whether or not there is a next page) I'm ensuring that Datasette never attempts to iterate over a huge result set.

SQLite and the `sqlite3` library seem to handle this correctly. Here's an example:

```pycon
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cursor = conn.execute("""
... with recursive counter(x) as (
...   select 0
...   union
...   select x + 1 from counter
... )
... select * from counter""")
>>> cursor.fetchmany(10)
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
```

`counter` there is an infinitely long table ([see TIL](https://til.simonwillison.net/sqlite/simple-recursive-cte)) - but we can retrieve the first 10 results without going into an infinite loop.
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258849766 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258849766 | IC_kwDOBm6k_c5LCIXm | 536941 | 2022-09-27T01:27:03Z | 2022-09-27T01:27:03Z | CONTRIBUTOR | i agree with that concern! but if i'm understanding the code correctly, `max_returned_rows` does not protect against long-running queries in any way. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258846992 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258846992 | IC_kwDOBm6k_c5LCHsQ | 9599 | 2022-09-27T01:21:41Z | 2022-09-27T01:21:41Z | OWNER | My main concern here is that public Datasette instances could easily have all of their available database connections consumed by long-running queries - either accidentally or deliberately. I do totally understand the need for this feature though. I think it can absolutely make sense provided it's protected by authentication and permissions. Maybe even limit the number of concurrent downloads at once such that there's always at least one database connection free for other requests. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
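That "keep one connection free" idea could be sketched with a semaphore sized one below the connection pool; the names and pool size here are hypothetical, not Datasette internals:

```python
import asyncio

POOL_SIZE = 4  # hypothetical database connection pool size
download_slots = asyncio.Semaphore(POOL_SIZE - 1)


async def stream_download(execute_chunks):
    # At most POOL_SIZE - 1 long-running downloads at once, so at least one
    # connection always remains available for ordinary requests.
    async with download_slots:
        async for chunk in execute_chunks():
            yield chunk
```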
https://github.com/simonw/datasette/pull/1823#issuecomment-1258828705 | https://api.github.com/repos/simonw/datasette/issues/1823 | 1258828705 | IC_kwDOBm6k_c5LCDOh | 9599 | 2022-09-27T00:45:46Z | 2022-09-27T00:45:46Z | OWNER | Also need to do a bit more of an audit to see if there is anywhere else that this style should be applied. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386917344 | |
https://github.com/simonw/datasette/pull/1823#issuecomment-1258828509 | https://api.github.com/repos/simonw/datasette/issues/1823 | 1258828509 | IC_kwDOBm6k_c5LCDLd | 9599 | 2022-09-27T00:45:26Z | 2022-09-27T00:45:26Z | OWNER | I should update the documentation to reflect this change. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386917344 | |
https://github.com/simonw/datasette/issues/1822#issuecomment-1258827688 | https://api.github.com/repos/simonw/datasette/issues/1822 | 1258827688 | IC_kwDOBm6k_c5LCC-o | 9599 | 2022-09-27T00:44:04Z | 2022-09-27T00:44:04Z | OWNER | I'll do this in a PR. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386854246 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1258818028 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1258818028 | IC_kwDOBm6k_c5LCAns | 9599 | 2022-09-27T00:27:53Z | 2022-09-27T00:27:53Z | OWNER | Made a start on this:

```diff
diff --git a/datasette/hookspecs.py b/datasette/hookspecs.py
index 34e19664..fe0971e5 100644
--- a/datasette/hookspecs.py
+++ b/datasette/hookspecs.py
@@ -31,25 +31,29 @@ def prepare_jinja2_environment(env, datasette):
 
 
 @hookspec
-def extra_css_urls(template, database, table, columns, view_name, request, datasette):
+def extra_css_urls(
+    template, database, table, columns, sql, params, view_name, request, datasette
+):
     """Extra CSS URLs added by this plugin"""
 
 
 @hookspec
-def extra_js_urls(template, database, table, columns, view_name, request, datasette):
+def extra_js_urls(
+    template, database, table, columns, sql, params, view_name, request, datasette
+):
     """Extra JavaScript URLs added by this plugin"""
 
 
 @hookspec
 def extra_body_script(
-    template, database, table, columns, view_name, request, datasette
+    template, database, table, columns, sql, params, view_name, request, datasette
 ):
     """Extra JavaScript code to be included in <script> at bottom of body"""
 
 
 @hookspec
 def extra_template_vars(
-    template, database, table, columns, view_name, request, datasette
+    template, database, table, columns, sql, params, view_name, request, datasette
 ):
     """Extra template variables to be made available to the template - can return dict or callable or awaitable"""
```

```diff
diff --git a/datasette/app.py b/datasette/app.py
index 03d1dacc..2f3a46fe 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -1036,7 +1036,9 @@ class Datasette:
 
         return await template.render_async(template_context)
 
-    async def _asset_urls(self, key, template, context, request, view_name):
+    async def _asset_urls(
+        self, key, template, context, request, view_name, sql, params
+    ):
         # Flatten list-of-lists from plugins:
         seen_urls = set()
         collected = []
@@ -1045,6 +1047,8 @@ class Datasette:
             database=context.get("database"),
…
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
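Under the proposed signatures in that diff, a plugin would receive the current query alongside the other arguments. A hypothetical example (the `sql` and `params` hook arguments do not exist in a released Datasette):

```python
from datasette import hookimpl

@hookimpl
def extra_template_vars(template, database, table, sql, params, view_name, request, datasette):
    # Expose the current query to custom templates - pluggy lets a
    # plugin accept any subset of the declared hook arguments.
    return {"current_sql": sql, "current_params": params}
```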
https://github.com/simonw/datasette/pull/1820#issuecomment-1258803261 | https://api.github.com/repos/simonw/datasette/issues/1820 | 1258803261 | IC_kwDOBm6k_c5LB9A9 | 536941 | 2022-09-27T00:03:09Z | 2022-09-27T00:03:09Z | CONTRIBUTOR | in the pattern in this PR, `max_returned_rows` controls the maximum rows rendered through html and json, and the csv renderer bypasses that. i think it would be better to have each of these different query renderers take more direct control over how many rows to fetch, instead of relying on the internals of the `execute` method. generally, users will not want to paginate through tens of thousands of results, but often will want to download a full query as json or as csv. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386456717 | |
https://github.com/simonw/datasette/issues/1822#issuecomment-1258760299 | https://api.github.com/repos/simonw/datasette/issues/1822 | 1258760299 | IC_kwDOBm6k_c5LByhr | 9599 | 2022-09-26T23:25:12Z | 2022-09-26T23:25:55Z | OWNER | A start: ```diff diff --git a/datasette/utils/asgi.py b/datasette/utils/asgi.py index 8a2fa060..41ade961 100644 --- a/datasette/utils/asgi.py +++ b/datasette/utils/asgi.py @@ -118,7 +118,7 @@ class Request: return dict(parse_qsl(body.decode("utf-8"), keep_blank_values=True)) @classmethod - def fake(cls, path_with_query_string, method="GET", scheme="http", url_vars=None): + def fake(cls, path_with_query_string, *, method="GET", scheme="http", url_vars=None): """Useful for constructing Request objects for tests""" path, _, query_string = path_with_query_string.partition("?") scope = { @@ -204,7 +204,7 @@ class AsgiWriter: ) -async def asgi_send_json(send, info, status=200, headers=None): +async def asgi_send_json(send, info, *, status=200, headers=None): headers = headers or {} await asgi_send( send, @@ -215,7 +215,7 @@ async def asgi_send_json(send, info, status=200, headers=None): ) -async def asgi_send_html(send, html, status=200, headers=None): +async def asgi_send_html(send, html, *, status=200, headers=None): headers = headers or {} await asgi_send( send, @@ -226,7 +226,7 @@ async def asgi_send_html(send, html, status=200, headers=None): ) -async def asgi_send_redirect(send, location, status=302): +async def asgi_send_redirect(send, location, *, status=302): await asgi_send( send, "", @@ -236,12 +236,12 @@ async def asgi_send_redirect(send, location, status=302): ) -async def asgi_send(send, content, status, headers=None, content_type="text/plain"): +async def asgi_send(send, content, status, *, headers=None, content_type="text/plain"): await asgi_start(send, status, headers, content_type) await send({"type": "http.response.body", "body": content.encode("utf-8")}) -async def asgi_start(send, status, headers=None, content_type="text/plain"): +async def asgi_start(send, status, *, headers=None, con… | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386854246 | |
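The practical effect of those `*` markers, sketched against the patched `Request.fake()` signature (assuming the diff above is applied):

```python
from datasette.utils.asgi import Request

# Fine: options passed by keyword
request = Request.fake("/fixtures?sql=select+1", method="POST")

# TypeError after the change: "method" can no longer be passed
# positionally, so a mistaken positional call fails loudly instead
# of silently binding to the wrong parameter.
Request.fake("/fixtures?sql=select+1", "POST")
```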
https://github.com/simonw/datasette/issues/1822#issuecomment-1258757544 | https://api.github.com/repos/simonw/datasette/issues/1822 | 1258757544 | IC_kwDOBm6k_c5LBx2o | 9599 | 2022-09-26T23:21:23Z | 2022-09-26T23:21:23Z | OWNER | Everything on https://docs.datasette.io/en/stable/internals.html that uses keyword arguments should do this I think. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386854246 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1258756231 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1258756231 | IC_kwDOBm6k_c5LBxiH | 9599 | 2022-09-26T23:19:34Z | 2022-09-26T23:19:34Z | OWNER | This is a good idea - it's something I should do before Datasette 1.0. I was a tiny bit worried about compatibility (Datasette is 3.7+) but it looks like they have been in Python since 3.0! | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1819#issuecomment-1258754105 | https://api.github.com/repos/simonw/datasette/issues/1819 | 1258754105 | IC_kwDOBm6k_c5LBxA5 | 9599 | 2022-09-26T23:16:15Z | 2022-09-26T23:16:15Z | OWNER | Demo: https://latest.datasette.io/_memory?sql=with+recursive+counter(x)+as+(%0D%0A++select+0%0D%0A++++union%0D%0A++select+x+%2B+1+from+counter%0D%0A)%2C%0D%0Ablah+as+(select+*+from+counter+limit+5000000)%0D%0Aselect+count(*)+from+blah | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1385026210 | |
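For readability, the SQL encoded in that demo URL decodes to:

```sql
with recursive counter(x) as (
  select 0
    union
  select x + 1 from counter
),
blah as (select * from counter limit 5000000)
select count(*) from blah
```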
https://github.com/simonw/datasette/issues/1819#issuecomment-1258746600 | https://api.github.com/repos/simonw/datasette/issues/1819 | 1258746600 | IC_kwDOBm6k_c5LBvLo | 9599 | 2022-09-26T23:05:40Z | 2022-09-26T23:05:40Z | OWNER | Implementing it like this, so at least you can copy and paste the SQL query back out again: <img width="796" alt="image" src="https://user-images.githubusercontent.com/9599/192395953-48512c94-10e0-4cf8-8ae5-b9e65e3d7b0f.png"> I'm not doing a full textarea because this error can be raised in multiple places, including on the table page itself. It's not just an error associated with the manual query page. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1385026210 | |
https://github.com/simonw/datasette/issues/1818#issuecomment-1258738740 | https://api.github.com/repos/simonw/datasette/issues/1818 | 1258738740 | IC_kwDOBm6k_c5LBtQ0 | 5363 | 2022-09-26T22:52:45Z | 2022-09-26T22:55:57Z | NONE | thoughts on order of precedence to use: * sqlite-utils count, if present. closest thing to a standard i guess. * row(max_id) if like, the first and/or last x amount of rows ids are all contiguous. kind of a cheap/dumb/imperfect heuristic to see if the table is dump/not dump. if the check passes, still stick on `est.` after the display. * count(*) if enabled in datasette | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384549993 | |
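A minimal sketch of that contiguity heuristic (a hypothetical helper, not part of datasette or sqlite-utils; it only samples the first rows and assumes a standard `rowid` table):

```python
import sqlite3

def estimated_count(conn: sqlite3.Connection, table: str, sample: int = 100):
    """Return (count, is_estimate): max(rowid) when the first rowids
    look contiguous, otherwise an exact (possibly slow) count(*)."""
    lo, hi, n = conn.execute(
        f"select min(rowid), max(rowid), count(*) from "
        f"(select rowid from [{table}] limit {sample})"
    ).fetchone()
    if n and hi - lo == n - 1:
        # Looks like a straight dump - treat max(rowid) as an estimate
        (estimate,) = conn.execute(f"select max(rowid) from [{table}]").fetchone()
        return estimate, True  # display with an "est." suffix
    (exact,) = conn.execute(f"select count(*) from [{table}]").fetchone()
    return exact, False
```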
https://github.com/simonw/datasette/issues/1819#issuecomment-1258738435 | https://api.github.com/repos/simonw/datasette/issues/1819 | 1258738435 | IC_kwDOBm6k_c5LBtMD | 9599 | 2022-09-26T22:52:19Z | 2022-09-26T22:52:19Z | OWNER | This is a good idea. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1385026210 | |
https://github.com/simonw/datasette/issues/1818#issuecomment-1258735747 | https://api.github.com/repos/simonw/datasette/issues/1818 | 1258735747 | IC_kwDOBm6k_c5LBsiD | 9599 | 2022-09-26T22:47:59Z | 2022-09-26T22:47:59Z | OWNER | Another option here is to tie into a feature I built in `sqlite-utils` with this problem in mind but never introduced on the Datasette side of things: https://sqlite-utils.datasette.io/en/stable/python-api.html#cached-table-counts-using-triggers | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384549993 | |
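On the `sqlite-utils` side that mechanism looks roughly like this (a sketch based on the linked documentation; check it for the exact API):

```python
import sqlite_utils

db = sqlite_utils.Database("data.db")

# Install triggers that keep a _counts table up to date as rows
# are inserted and deleted
db.enable_counts()

# Reading the cached counts back is cheap even for huge tables
print(db.cached_counts())  # e.g. {"filing": 1234567}
```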
https://github.com/simonw/datasette/issues/1818#issuecomment-1258735283 | https://api.github.com/repos/simonw/datasette/issues/1818 | 1258735283 | IC_kwDOBm6k_c5LBsaz | 9599 | 2022-09-26T22:47:19Z | 2022-09-26T22:47:19Z | OWNER | That's a really interesting idea: for a lot of databases (those made out of straight imports from CSV) `max(rowid)` would indeed reflect the size of the table, but would be a MUCH faster operation than attempting a `count(*)`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384549993 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258712931 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258712931 | IC_kwDOCGYnMM5LBm9j | 25778 | 2022-09-26T22:31:58Z | 2022-09-26T22:31:58Z | CONTRIBUTOR | Right. The backup command will copy tables completely, but in the case of conflicting table names, the destination gets overwritten silently. That might not be what you want here. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258697384 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258697384 | IC_kwDOCGYnMM5LBjKo | 9599 | 2022-09-26T22:12:45Z | 2022-09-26T22:12:45Z | OWNER | That feels like a slightly different command to me - maybe `sqlite-utils backup data.db data-backup.db`? It doesn't have any of the mechanics for merging tables together. Could be a useful feature separately though. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/datasette/issues/1821#issuecomment-1258692555 | https://api.github.com/repos/simonw/datasette/issues/1821 | 1258692555 | IC_kwDOBm6k_c5LBh_L | 9599 | 2022-09-26T22:06:39Z | 2022-09-26T22:06:39Z | OWNER | - https://github.com/simonw/datasette/actions/runs/3131344150 - https://github.com/simonw/datasette/releases/tag/0.63a0 - https://pypi.org/project/datasette/0.63a0/ | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386734383 | |
https://github.com/simonw/sqlite-utils/issues/494#issuecomment-1258521333 | https://api.github.com/repos/simonw/sqlite-utils/issues/494 | 1258521333 | IC_kwDOCGYnMM5LA4L1 | 9599 | 2022-09-26T19:32:36Z | 2022-09-26T19:32:36Z | OWNER | Tweeted about it too: https://twitter.com/simonw/status/1574481628507668480 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386593843 | |
https://github.com/simonw/sqlite-utils/issues/494#issuecomment-1258516872 | https://api.github.com/repos/simonw/sqlite-utils/issues/494 | 1258516872 | IC_kwDOCGYnMM5LA3GI | 9599 | 2022-09-26T19:28:36Z | 2022-09-26T19:28:36Z | OWNER | New documentation: https://sqlite-utils.datasette.io/en/latest/contributing.html#using-just-and-pipenv | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386593843 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258508215 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258508215 | IC_kwDOCGYnMM5LA0-3 | 25778 | 2022-09-26T19:22:14Z | 2022-09-26T19:22:14Z | CONTRIBUTOR | This might be fairly straightforward using SQLite's backup utility: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
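A minimal sketch of that, using only the standard library (file names are illustrative):

```python
import sqlite3

src = sqlite3.connect("data.db")
dest = sqlite3.connect("data-backup.db")

with dest:
    # Copies the entire database; safe to run even while the
    # source connection is in use elsewhere
    src.backup(dest)

dest.close()
src.close()
```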
https://github.com/simonw/sqlite-utils/issues/483#issuecomment-1258479462 | https://api.github.com/repos/simonw/sqlite-utils/issues/483 | 1258479462 | IC_kwDOCGYnMM5LAt9m | 9599 | 2022-09-26T19:04:29Z | 2022-09-26T19:04:43Z | OWNER | Documentation: - https://sqlite-utils.datasette.io/en/latest/cli.html#cli-install - https://sqlite-utils.datasette.io/en/latest/cli.html#cli-uninstall - https://sqlite-utils.datasette.io/en/latest/cli-reference.html#install - https://sqlite-utils.datasette.io/en/latest/cli-reference.html#uninstall | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363765916 | |
https://github.com/simonw/sqlite-utils/issues/493#issuecomment-1258476455 | https://api.github.com/repos/simonw/sqlite-utils/issues/493 | 1258476455 | IC_kwDOCGYnMM5LAtOn | 9599 | 2022-09-26T19:01:49Z | 2022-09-26T19:01:49Z | OWNER | I tried the tips in https://stackoverflow.com/questions/15258831/how-to-handle-two-dashes-in-rest (not the settings change though, because I might want smart quotes elsewhere) and they didn't work. Maybe I should disable smart quotes entirely? I feel like there should be an escaping trick that works here though. I tried `insert -\\-convert` but it didn't help. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386562662 | |
https://github.com/simonw/sqlite-utils/issues/483#issuecomment-1258451968 | https://api.github.com/repos/simonw/sqlite-utils/issues/483 | 1258451968 | IC_kwDOCGYnMM5LAnQA | 9599 | 2022-09-26T18:37:54Z | 2022-09-26T18:40:41Z | OWNER | The implementation of this can be an almost exact copy of Datasette's, which was added in this commit: https://github.com/simonw/datasette/commit/01fe5b740171bfaea3752fc5754431dac53777e3 Current code for that is here: https://github.com/simonw/datasette/blob/0.62/datasette/cli.py#L319-L340 - which was since improved to use the `run_module()` function from `runpy`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1363765916 | |
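The `runpy` approach amounts to invoking pip in-process, which guarantees packages land in the same environment as the tool itself. A sketch modelled on the linked Datasette code:

```python
import sys
from runpy import run_module

import click

@click.command()
@click.argument("packages", nargs=-1, required=True)
@click.option("-U", "--upgrade", is_flag=True, help="Upgrade packages to latest version")
def install(packages, upgrade):
    """Install packages from PyPI into the same environment as sqlite-utils"""
    args = ["pip", "install"]
    if upgrade:
        args += ["--upgrade"]
    args += list(packages)
    # run_module executes pip as if "python -m pip install ..." had been run
    sys.argv = args
    run_module("pip", run_name="__main__")
```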
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258450447 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258450447 | IC_kwDOCGYnMM5LAm4P | 9599 | 2022-09-26T18:36:23Z | 2022-09-26T18:36:23Z | OWNER | This is also the kind of feature that would need to express itself in both the Python library and the CLI utility. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258449887 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1258449887 | IC_kwDOCGYnMM5LAmvf | 9599 | 2022-09-26T18:35:50Z | 2022-09-26T18:35:50Z | OWNER | This is a really interesting idea. I'm nervous about needing to set the rules for how duplicate tables should be merged though. This feels like a complex topic - one where there isn't necessarily an obviously "correct" way of doing it, but where different problems that people are solving might need different merging approaches. Likewise, merging isn't just a database-to-database thing at that point - I could see a need for merging two tables using similar rules to those used for merging two databases. So I think I'd want to have some good concrete use-cases in mind before trying to design how something like this should work. Will leave this thread open for people to drop those in! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/sqlite-utils/issues/492#issuecomment-1258446128 | https://api.github.com/repos/simonw/sqlite-utils/issues/492 | 1258446128 | IC_kwDOCGYnMM5LAl0w | 9599 | 2022-09-26T18:32:14Z | 2022-09-26T18:33:19Z | OWNER | This idea would make more sense if there were a good mechanism to say "run the conversion script held in this file" as opposed to passing it as an option. It would also mean no longer having to remember bash escaping rules ([see tip](https://til.simonwillison.net/zsh/argument-heredoc))!

`shot-scraper` has that for `--javascript`, using the `--input` option: https://shot-scraper.datasette.io/en/stable/javascript.html#shot-scraper-javascript-help

Maybe `--convert-script` would work here? Or `--convert-file`? It should accept `-` for stdin too. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1386530156 | |
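Hypothetical usage if a `--convert-file` option were added (this flag is the proposal above, not an existing option):

```
# Read the conversion function from a file...
sqlite-utils insert log.db log multiline.log --text --convert-file parse.py

# ...or from standard input, mirroring shot-scraper's --input option
cat parse.py | sqlite-utils insert log.db log multiline.log --text --convert-file -
```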
https://github.com/simonw/sqlite-utils/issues/490#issuecomment-1258437060 | https://api.github.com/repos/simonw/sqlite-utils/issues/490 | 1258437060 | IC_kwDOCGYnMM5LAjnE | 9599 | 2022-09-26T18:24:44Z | 2022-09-26T18:24:44Z | OWNER | Just saw your great write-up on this: https://jeqo.github.io/notes/2022-09-24-ingest-logs-sqlite/ | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
1382457780 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258337011 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258337011 | IC_kwDOBm6k_c5LALLz | 536941 | 2022-09-26T16:49:48Z | 2022-09-26T16:49:48Z | CONTRIBUTOR | i think the smallest change that gets close to what i want is to change the behavior so that `max_returned_rows` is not applied in the `execute` method when we are asking for a csv of a query. there are some infelicities to that approach, but i'll make a PR to make it easier to discuss. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
https://github.com/simonw/datasette/issues/526#issuecomment-1258167564 | https://api.github.com/repos/simonw/datasette/issues/526 | 1258167564 | IC_kwDOBm6k_c5K_h0M | 536941 | 2022-09-26T14:57:44Z | 2022-09-26T15:08:36Z | CONTRIBUTOR | reading the database execute method i have a few questions. https://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/database.py#L229-L242

---

unless i'm missing something (which is very likely!!), the `max_returned_rows` argument doesn't actually offer any protection against running very expensive queries. It's not like adding a `LIMIT max_rows` clause to the query; it makes sense that it isn't, because the query could already have a `LIMIT` clause of its own. Doing something like `select * from (query) limit {max_returned_rows}` **might** be protective, but wouldn't always be.

Instead the code executes the full original query and, if it still has time, fetches out the first `max_rows + 1` rows. this *does* offer some protection against memory exhaustion, as you won't hydrate a huge result set into python (however, there are [data flow patterns](https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113) that could avoid that too)

given the current architecture, i don't see how creating a new connection would be of use.

---

If we just removed the `max_returned_rows` limitation, then i think most things would be fine **except** for the QueryViews. Right now, rendering just [5000 rows takes a lot of client-side memory](https://github.com/simonw/datasette/issues/1655) so some form of pagination would be required. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
459882902 | |
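To illustrate the point about the subquery wrap (an illustrative example, not from the thread): the outer limit caps rows returned, not work performed, so an expensive query still runs to completion:

```sql
-- The inner aggregate must scan the whole table to produce its
-- single row, so the outer limit provides no protection here:
select * from (select count(*) from huge_table) limit 1000;
```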
https://github.com/simonw/datasette/issues/1655#issuecomment-1258166572 | https://api.github.com/repos/simonw/datasette/issues/1655 | 1258166572 | IC_kwDOBm6k_c5K_hks | 536941 | 2022-09-26T14:57:04Z | 2022-09-26T14:57:04Z | CONTRIBUTOR | I think that paginating, even in javascript, could be very helpful. Maybe render json or csv into the page and let javascript load that into the dom? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1163369515 | |
https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113 | https://api.github.com/repos/simonw/datasette/issues/1727 | 1258129113 | IC_kwDOBm6k_c5K_YbZ | 536941 | 2022-09-26T14:30:11Z | 2022-09-26T14:48:31Z | CONTRIBUTOR | from your analysis, it seems like the GIL is blocking on loading of the data from sqlite to python (particularly in the `fetchmany` call)

this is probably a simplistic idea, but what if you had the python code in the `execute` method iterate over the cursor and yield out rows or small chunks of rows.

something like:
```python
with sqlite_timelimit(conn, time_limit_ms):
    try:
        cursor = conn.cursor()
        cursor.execute(sql, params if params is not None else {})
    except:
        ...
    max_returned_rows = self.ds.max_returned_rows
    if max_returned_rows == page_size:
        max_returned_rows += 1
    if max_returned_rows and truncate:
        # stream rows out, stopping once we hit the cap
        for i, row in enumerate(cursor):
            yield row
            if i == max_returned_rows - 1:
                truncated = True
                break
    else:
        # no cap: stream the whole result set
        for row in cursor:
            yield row
        truncated = False
```

this kind of thing works well with a postgres server side cursor, but i'm not sure if it will hold for sqlite.

you would still spend about the same amount of time in python and would be contending for the gil, but it could be non-blocking. depending on the data flow, this could also have some benefit for memory. (data stays in more compact sqlite-land until you need it) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1217759117 | |
https://github.com/simonw/datasette/issues/1818#issuecomment-1257290709 | https://api.github.com/repos/simonw/datasette/issues/1818 | 1257290709 | IC_kwDOBm6k_c5K8LvV | 5363 | 2022-09-25T22:17:06Z | 2022-09-25T22:17:06Z | NONE | I wonder if having an option for displaying the max row id might help too. Not accurate, especially if something was deleted, but useful for DBs that are straight dumps. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384549993 | |
https://github.com/simonw/sqlite-utils/issues/358#issuecomment-1257136801 | https://api.github.com/repos/simonw/sqlite-utils/issues/358 | 1257136801 | IC_kwDOCGYnMM5K7mKh | 11597658 | 2022-09-25T07:15:07Z | 2022-09-25T07:15:59Z | NONE | Hi Simon, looks good. I noticed you wanted to use a regex to detect it; you might be interested in [github.com/iafisher/sqliteparser](https://github.com/iafisher/sqliteparser), which creates an AST of the create table statement. Not every option is supported yet, but I forked it and am adding all the possible options in a create table (and create index) statement. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1082651698 | |
https://github.com/simonw/sqlite-utils/issues/490#issuecomment-1257072258 | https://api.github.com/repos/simonw/sqlite-utils/issues/490 | 1257072258 | IC_kwDOCGYnMM5K7WaC | 6180701 | 2022-09-24T22:01:05Z | 2022-09-24T22:01:05Z | NONE | For completeness, the regex requires a bit more dark magic to capture the following lines; here is a _working_ expression: https://regex101.com/r/rsuEcs/1

```
sqlite-utils insert /tmp/log.db log multiline.log --text --convert "
import re
r = re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\:\s)(?P<log>(.*\s\s.*|.*)+)', re.MULTILINE)
def convert(text):
    return [m.groupdict() for m in r.finditer(text)]
"
```

```sql
BEGIN TRANSACTION;
CREATE TABLE [log] (
   [datetime] TEXT,
   [log] TEXT
);
INSERT INTO "log" VALUES('2022-03-01T12:04:52','Here is a log message
that spans multiple
lines');
INSERT INTO "log" VALUES('2022-03-01T12:04:52','This is a single line');
INSERT INTO "log" VALUES('2022-03-01T12:04:52','Here is another message
that spans multiple
lines');
COMMIT;
``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1382457780 | |
https://github.com/simonw/sqlite-utils/issues/490#issuecomment-1257063174 | https://api.github.com/repos/simonw/sqlite-utils/issues/490 | 1257063174 | IC_kwDOCGYnMM5K7UMG | 6180701 | 2022-09-24T20:50:15Z | 2022-09-24T20:50:15Z | NONE | 🤯 this is beautiful. Thanks @simonw ! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1382457780 | |
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1256858763 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | 1256858763 | IC_kwDOCGYnMM5K6iSL | 7908073 | 2022-09-24T04:50:59Z | 2022-09-24T04:52:08Z | CONTRIBUTOR | Instead of outputting binary data to stdout, the interface might be better like this

```
sqlite-utils merge animals.db cats.db dogs.db
```

similar to `zip`, `ogr2ogr`, etc

Actually I think this might already be possible with `ogr2ogr`. I don't believe spatial data is a requirement, though it might add an `ogc_id` column or something

```
cp cats.db animals.db
ogr2ogr -append animals.db dogs.db
ogr2ogr -append animals.db another.db
``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1383646615 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256781274 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256781274 | IC_kwDOBm6k_c5K6PXa | 50527 | 2022-09-23T22:59:46Z | 2022-09-23T22:59:46Z | CONTRIBUTOR | While you are adding features, would you be future-proofing your APIs if you switched some arguments over to keyword-only arguments, or would that be too disruptive?

Thinking out loud:

```
async def render_template(
    self, templates, *, context=None, plugin_context=None, request=None, view_name=None
):
``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256662785 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256662785 | IC_kwDOBm6k_c5K5ycB | 9599 | 2022-09-23T20:53:21Z | 2022-09-23T20:53:21Z | OWNER | Maybe the signature for that method should be:

```python
async def render_template(
    self, templates, context=None, plugin_context=None, request=None, view_name=None
):
```

Where `plugin_context` is a special dictionary of values that can be passed through to plugin hooks that accept them - so `database`, `table`, `columns`, `sql` and `params`.

Those would then be passed when specific views call `render_template()` - which they currently do via calling `BaseView.render(...)`, but actually the views that are used for tables and queries don't even call that directly due to the weird design used with `DataView` subclasses that implement a `.data()` method.

So yet another change that's blocked on fixing that long-running weird piece of technical debt:

- https://github.com/simonw/datasette/issues/1518 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256659788 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256659788 | IC_kwDOBm6k_c5K5xtM | 9599 | 2022-09-23T20:49:22Z | 2022-09-23T20:49:22Z | OWNER | Implementation challenge: all four of those hooks are called inside the `datasette.render_template()` method, which has this signature: https://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/app.py#L945-L947 So I would have to pull the `sql` and `params` variables out of the `context` since they are not being passed to that method. OR I could teach that method to take those as optional arguments. Might be an opportunity to clean up this hack: https://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/app.py#L959-L964 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256652548 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256652548 | IC_kwDOBm6k_c5K5v8E | 9599 | 2022-09-23T20:41:32Z | 2022-09-23T20:41:32Z | OWNER | Which plugin hooks should take `sql` and `params`? - [extra_template_vars(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-template-vars-template-database-table-columns-view-name-request-datasette) - [extra_css_urls(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-css-urls-template-database-table-columns-view-name-request-datasette) - [extra_js_urls(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-js-urls-template-database-table-columns-view-name-request-datasette) - [extra_body_script(template, database, table, columns, view_name, request, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#extra-body-script-template-database-table-columns-view-name-request-datasette) And maybe these: - [render_cell(row, value, column, table, database, datasette)](https://docs.datasette.io/en/0.62/plugin_hooks.html#render-cell-row-value-column-table-database-datasette) - [table_actions(datasette, actor, database, table, request)](https://docs.datasette.io/en/0.62/plugin_hooks.html#table-actions-datasette-actor-database-table-request) I'll start by implementing the first set, then I'll think further about those "maybes". | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/datasette/issues/1817#issuecomment-1256650449 | https://api.github.com/repos/simonw/datasette/issues/1817 | 1256650449 | IC_kwDOBm6k_c5K5vbR | 9599 | 2022-09-23T20:38:53Z | 2022-09-23T20:38:53Z | OWNER | I've wanted something like this in the past too. I think the thing to do here might be to add `sql` and `params` arguments to a bunch of the plugin hooks, such that they can see the main query that is being used on the page that they are helping to render. While I'm working on this: https://docs.datasette.io/en/0.62/plugin_hooks.html#register-output-renderer-datasette output renderer functions take `sql` but do not currently take `params` - they should also take `params`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1384273985 | |
https://github.com/simonw/sqlite-utils/issues/490#issuecomment-1256428818 | https://api.github.com/repos/simonw/sqlite-utils/issues/490 | 1256428818 | IC_kwDOCGYnMM5K45US | 9599 | 2022-09-23T16:37:58Z | 2022-09-23T16:38:35Z | OWNER | It should be possible to achieve this with the `--text` option: https://sqlite-utils.datasette.io/en/stable/cli.html?highlight=text#convert-with-text Given an example like this in `multiline.log`: ``` 2022-03-01T12:04:52: Here is a log message that spans multiple lines 2022-03-01T12:04:52: This is a single line 2022-03-01T12:04:52: Here is another message that spans multiple lines ``` You should be able to run something like this: ``` sqlite-utils insert /tmp/log.db log multiline.log --text --convert " import re r = re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}):(?P<log>.*)', re.MULTILINE) def convert(text): return [m.groupdict() for m in r.finditer(text)] " ``` After running this I get: ``` sqlite-utils rows /tmp/log.db log [{"datetime": "2022-03-01T12:04:52", "log": " Here is a log message"}, {"datetime": "2022-03-01T12:04:52", "log": " This is a single line"}, {"datetime": "2022-03-01T12:04:52", "log": " Here is another message"}] ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1382457780 | |
https://github.com/simonw/datasette/issues/1415#issuecomment-1255603780 | https://api.github.com/repos/simonw/datasette/issues/1415 | 1255603780 | IC_kwDOBm6k_c5K1v5E | 17532695 | 2022-09-22T22:06:10Z | 2022-09-22T22:06:10Z | NONE | This would be great! I just went through the process of figuring out the minimum permissions for a service account to run `datasette publish cloudrun` for [PUDL](https://github.com/catalyst-cooperative/pudl)'s [datasette deployment](https://data.catalyst.coop/). These are the roles I gave the service account (disclaimer: I'm not sure these are the minimum permissions):

- Cloud Build Service Account: The SA needs this role to publish the build on Cloud Build.
- Cloud Run Admin for the Cloud Run datasette service so the SA can deploy the build.
- I gave the SA the Storage Admin role on the bucket Cloud Build creates to store the build tar files.
- The Viewer Role is [required for storing build logs in the default bucket](https://cloud.google.com/build/docs/running-builds/submit-build-via-cli-api#permissions). More on this below!

The Viewer Role is a Basic IAM role that [Google does not recommend using](https://cloud.google.com/build/docs/running-builds/submit-build-via-cli-api#permissions):

> Caution: Basic roles include thousands of permissions across all Google Cloud services. In production environments, do not grant basic roles unless there is no alternative. Instead, grant the most limited [predefined roles](https://cloud.google.com/iam/docs/understanding-roles#predefined_roles) or [custom roles](https://cloud.google.com/iam/docs/understanding-custom-roles) that meet your needs.

If you don't grant the Viewer role, the `gcloud builds submit` command will successfully create a build but return exit code 1, preventing the script from getting to the cloud run step:

```
ERROR: (gcloud.builds.submit) The build is running, and logs are being written to the default logs bucket. This tool can only stream logs if you are Viewer/Owner of the project and, if applicable, allowed by your VPC-SC security policy. The default logs bucket is always outside any VPC-SC security perimeter. If you want your logs saved inside your VPC-SC perimeter, use your own bucket. See https://cloud.google.com/build/docs… | { "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 1, "eyes": 0 } |
959137143 |