github

Custom SQL query returning 101 rows

html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app
https://github.com/simonw/datasette/issues/766#issuecomment-791509910 https://api.github.com/repos/simonw/datasette/issues/766 791509910 MDEyOklzc3VlQ29tbWVudDc5MTUwOTkxMA== 6371750 2021-03-05T15:57:35Z 2021-03-05T16:35:21Z NONE Hello, I have the same wildcard search problem with an instance of Datasette. http://crbc-dataset.huma-num.fr/inventaires/fonds_auguste_dupouy_1872_1967?_search=gwerz&_sort=rowid is OK, but http://crbc-dataset.huma-num.fr/inventaires/fonds_auguste_dupouy_1872_1967?_search=gwe* is not (FTS is activated on "Reference", "IntituleAnalyse", "NomDuProducteur", "PresentationDuContenu" and "Notes"). Note that a SQL query like the one below, run directly in SQLite from the server's shell, does retrieve results.

`select * from fonds_auguste_dupouy_1872_1967_fts where IntituleAnalyse MATCH "gwe*";`

Thanks,
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
617323873  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791530093 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 791530093 MDEyOklzc3VlQ29tbWVudDc5MTUzMDA5Mw== 306240 2021-03-05T16:28:07Z 2021-03-05T16:28:07Z NONE > I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout.
>
> Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once?

@maxhawkins a limitation of the python mbox module is that it loads the entire mbox into memory. I did find another approach to this problem that didn't use the builtin python mbox module: create a generator so that it doesn't have to load the whole mbox into memory. I was hoping to use standard library modules, but this might be a good reason to investigate that approach a bit more. My worry is making sure a custom processor handles all the ins and outs of the mbox format correctly.

Hm. As I'm writing this, I thought of something. I think I can parse each message one at a time, and then use an mbox function to load each message using the python mbox module. That way the mbox module can still deal with the specifics of the mbox format, but I can use a generator. I'll give that a try.

Thanks for the feedback @maxhawkins and @simonw. @simonw can we hold off on merging this until I can test this new approach?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
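A minimal sketch of the streaming approach described in the comment above, assuming naive splitting on `From ` envelope lines (the function name is hypothetical and this is not the PR's actual implementation):

```python
import email

def iter_mbox_messages(path):
    """Yield parsed messages one at a time instead of loading the
    whole mbox into memory the way mailbox.mbox does."""
    buffer = []
    with open(path, "rb") as fp:
        for line in fp:
            # mbox messages are separated by envelope lines starting with
            # "From "; body lines like this are normally escaped as ">From ",
            # so this naive split mostly works. The parser treats a leading
            # "From " line as the unixfrom envelope.
            if line.startswith(b"From ") and buffer:
                yield email.message_from_bytes(b"".join(buffer))
                buffer = []
            buffer.append(line)
    if buffer:
        yield email.message_from_bytes(b"".join(buffer))

# Usage sketch:
# for message in iter_mbox_messages("gmail.mbox"):
#     print(message.get("Subject"))
```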
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791089881 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 791089881 MDEyOklzc3VlQ29tbWVudDc5MTA4OTg4MQ== 28565 2021-03-05T02:03:19Z 2021-03-05T02:03:19Z NONE I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout. Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/dogsheep-photos/issues/32#issuecomment-791053721 https://api.github.com/repos/dogsheep/dogsheep-photos/issues/32 791053721 MDEyOklzc3VlQ29tbWVudDc5MTA1MzcyMQ== 6213 2021-03-05T00:31:27Z 2021-03-05T00:31:27Z NONE I am getting the same thing for US West (N. California) us-west-1
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
803333769  
https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-790934616 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4 790934616 MDEyOklzc3VlQ29tbWVudDc5MDkzNDYxNg== 203343 2021-03-04T20:54:44Z 2021-03-04T20:54:44Z NONE Sorry for the delay, I got sidetracked after class last night. I am getting the following error:
```
/content# google-takeout-to-sqlite mbox takeout.db Takeout/Mail/gmail.mbox
Usage: google-takeout-to-sqlite [OPTIONS] COMMAND [ARGS]...
Try 'google-takeout-to-sqlite --help' for help.

Error: No such command 'mbox'.
```
On the box, I installed with pip after cloning: https://github.com/UtahDave/google-takeout-to-sqlite.git
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
778380836  
https://github.com/simonw/datasette/issues/1238#issuecomment-790857004 https://api.github.com/repos/simonw/datasette/issues/1238 790857004 MDEyOklzc3VlQ29tbWVudDc5MDg1NzAwNA== 79913 2021-03-04T19:06:55Z 2021-03-04T19:06:55Z NONE @rgieseke Ah, that's super helpful. Thank you for the workaround for now!
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813899472  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790695126 MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg== 9599 2021-03-04T15:20:42Z 2021-03-04T15:20:42Z MEMBER I'm not sure why but my most recent import, when displayed in Datasette, looks like this: <img width="574" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109985836-0ab00080-7cba-11eb-97d5-0631a0835b61.png"> Sorting by `id` in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790693674 MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA== 9599 2021-03-04T15:18:36Z 2021-03-04T15:18:36Z MEMBER I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790669767 MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw== 9599 2021-03-04T14:46:06Z 2021-03-04T14:46:06Z MEMBER A solution could be to pre-process that string by splitting on `(` and dropping everything afterwards, assuming that the `(...)` bit isn't necessary for correctly parsing the date.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
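A minimal sketch of the pre-processing suggested above, assuming the parenthesised comment can be safely discarded before calling `email.utils.parsedate_tz()`. Converting to `str` first also sidesteps the `'Header' object has no attribute 'split'` error from the tracebacks in this thread:

```python
import email.utils

def clean_date(mail_date):
    # str() sidesteps the "'Header' object has no attribute 'split'"
    # AttributeError when the Date header arrives as a Header object.
    mail_date = str(mail_date)
    # Drop the trailing "(...)" comment, keeping the numeric offset.
    return mail_date.split("(")[0].strip()

raw = "Mon, 28 Aug 2006 08:14:08 +0200 (Westeuropäische Sommerzeit)"
print(email.utils.parsedate_tz(clean_date(raw)))
# -> (2006, 8, 28, 8, 14, 8, 0, 1, -1, 7200)
```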
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790668263 MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw== 9599 2021-03-04T14:43:58Z 2021-03-04T14:43:58Z MEMBER I added this code to output a message ID on errors:
```diff
 print("Errors: {}".format(num_errors))
 print(traceback.format_exc())
+print("Message-Id: {}".format(email.get("Message-Id", "None")))
 continue
```
Having found a message ID that had an error, I ran this command to see the context:

    rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox

This was for the following error:
```
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 102, in get_mbox
    message["date"] = get_message_date(email.get("Date"), email.get_from())
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 178, in get_message_date
    datetime_tuple = email.utils.parsedate_tz(mail_date)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz
    res = _parsedate_tz(data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz
    data = data.split()
AttributeError: 'Header' object has no attribute 'split'
```
Here's what I spotted in the `ripgrep` output:
```
177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI>
177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeuropäische Sommerzeit)
177133572-X-Mailer: IncrediMail (5002253)
```
So could it be that `_parsedate_tz` is having trouble with that `Mon, 28 Aug 2006 08:14:08 +0200 (Westeuropäische Sommerzeit)` string?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790391711 MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ== 306240 2021-03-04T07:36:24Z 2021-03-04T07:36:24Z NONE > Looks like you're doing this:
>
> ```python
> elif message.get_content_type() == "text/plain":
>     body = message.get_payload(decode=True)
> ```
>
> So presumably that decodes to a unicode string?
>
> I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.

Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that.

Also, I noticed after I opened this PR that `message.get_payload()` is being deprecated in favor of `message.get_content()` or something like that. I'll see if that handles the decoding better, too.

Thanks for the feedback. I should have time tomorrow to put together some improvements.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790389335 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790389335 MDEyOklzc3VlQ29tbWVudDc5MDM4OTMzNQ== 306240 2021-03-04T07:32:04Z 2021-03-04T07:32:04Z NONE > The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count:
>
> https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67
>
> I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.

The wait is from Python loading the mbox file. This happens regardless of whether you're getting the length of the mbox. The mbox module is on the slow side. It is possible to do one's own parsing of the mbox, but I kind of wanted to avoid doing that.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6 790384087 MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw== 9599 2021-03-04T07:22:51Z 2021-03-04T07:22:51Z MEMBER #3 also mentions the conflicting version with other tools.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
821841046  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790380839 MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ== 9599 2021-03-04T07:17:05Z 2021-03-04T07:17:05Z MEMBER Looks like you're doing this:
```python
elif message.get_content_type() == "text/plain":
    body = message.get_payload(decode=True)
```
So presumably that decodes to a unicode string?

I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790379629 MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ== 9599 2021-03-04T07:14:41Z 2021-03-04T07:14:41Z MEMBER Confirmed: removing the `len()` call does not speed things up, so it's reading through the entire file for some other purpose too.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790378658 MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA== 9599 2021-03-04T07:12:48Z 2021-03-04T07:12:48Z MEMBER It looks like the `body` is being loaded into a BLOB column - so by default in Datasette it looks like this: <img width="1650" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109924808-b4b96980-7c75-11eb-8c9e-307f2ae32d5a.png"> If I `datasette install datasette-render-binary` and then try again I get this: <img width="1487" alt="mbox__mbox_emails__753_446_rows" src="https://user-images.githubusercontent.com/9599/109924944-ea5e5280-7c75-11eb-9a32-404f3d68455f.png"> It would be great if we could store the `body` as unicode text instead. We may have to do something clever to decode it based on some kind of charset header?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
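A minimal, hedged sketch of that charset-based decoding - one way it could look, not necessarily what the PR ended up doing:

```python
def decode_body(message):
    """Decode a message part to text using its declared charset,
    falling back to a lossy UTF-8 decode."""
    payload = message.get_payload(decode=True)  # raw bytes or None
    if payload is None:
        return None
    charset = message.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or incorrect charset header: salvage what we can
        return payload.decode("utf-8", errors="replace")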
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790373024 MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA== 9599 2021-03-04T07:01:58Z 2021-03-04T07:04:06Z MEMBER I got 9 warnings that look like this:
```
Errors: 1
Traceback (most recent call last):
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 103, in get_mbox
    message["date"] = get_message_date(email.get("Date"), email.get_from())
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 167, in get_message_date
    datetime_tuple = email.utils.parsedate_tz(mail_date)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz
    res = _parsedate_tz(data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz
    data = data.split()
AttributeError: 'Header' object has no attribute 'split'
```
It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the `mbox` and see what was going on.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790372621 MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ== 9599 2021-03-04T07:01:18Z 2021-03-04T07:01:18Z MEMBER I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in `healthkit-to-sqlite` - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file. https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the `progress_callback()` bit) is where that happens. It can be a bit of a convoluted pattern, and I'm not at all sure it would work for `mbox` files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
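For illustration, a minimal sketch of the byte-counting pattern described in the comment above, using `tqdm` rather than the custom callback in healthkit-to-sqlite; as that comment notes, the stdlib mbox module doesn't consume chunks like this, so this is the general shape only (file name hypothetical):

```python
import os
from tqdm import tqdm

def read_with_progress(path, chunk_size=1024 * 1024):
    """Yield chunks of a file while advancing a byte-based progress bar
    sized to the total file length."""
    total = os.path.getsize(path)
    with open(path, "rb") as fp, tqdm(total=total, unit="B", unit_scale=True) as bar:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            bar.update(len(chunk))
            yield chunk

# Usage sketch:
# for chunk in read_with_progress("gmail.mbox"):
#     feed_parser(chunk)
```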
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790370485 MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ== 9599 2021-03-04T06:57:25Z 2021-03-04T06:57:48Z MEMBER The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790369076 MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng== 9599 2021-03-04T06:54:46Z 2021-03-04T06:54:46Z MEMBER The Rich-powered progress bar is pretty: ![rich](https://user-images.githubusercontent.com/9599/109923307-71f69200-7c73-11eb-9ee2-8f0a240f3994.gif)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 790312268 MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA== 9599 2021-03-04T05:48:16Z 2021-03-04T05:48:16Z MEMBER Wow, my mbox is a 10.35 GB download!
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/simonw/datasette/pull/1243#issuecomment-790311215 https://api.github.com/repos/simonw/datasette/issues/1243 790311215 MDEyOklzc3VlQ29tbWVudDc5MDMxMTIxNQ== 9599 2021-03-04T05:45:57Z 2021-03-04T05:45:57Z OWNER Thanks!
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
815955014  
https://github.com/simonw/datasette/issues/268#issuecomment-790257263 https://api.github.com/repos/simonw/datasette/issues/268 790257263 MDEyOklzc3VlQ29tbWVudDc5MDI1NzI2Mw== 649467 2021-03-04T03:20:23Z 2021-03-04T03:20:23Z NONE It's kind of an ugly hack, but you can try out what using the fts5 table as an actual datasette-accessible table looks like without changing any datasette code by creating yet another view on top of the fts5 table: `create view proxyview as select *, rank, table_fts as fts from table_fts;` That's now visible from datasette, just like any other view, but you can use `fts match escape_fts(search_string) order by rank`. This is only good as a proof of concept because you're inefficiently going from view -> fts5 external content table -> view -> data table. However, it does show it works.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
323718842  
https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-790198930 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4 790198930 MDEyOklzc3VlQ29tbWVudDc5MDE5ODkzMA== 203343 2021-03-04T00:58:40Z 2021-03-04T00:58:40Z NONE I am just seeing this sorry, yes! I will kick the tires later on tonight. My apologies for the delay.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
778380836  
https://github.com/simonw/datasette/issues/283#issuecomment-789680230 https://api.github.com/repos/simonw/datasette/issues/283 789680230 MDEyOklzc3VlQ29tbWVudDc4OTY4MDIzMA== 605492 2021-03-03T12:28:42Z 2021-03-03T12:28:42Z NONE One note on using this pragma: I got an error on starting datasette, `no such table: pragma_database_list`. I diagnosed this as an older version of sqlite3 (3.14.2); upgrading to a newer version (3.34.2) fixed the issue.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
325958506  
https://github.com/simonw/datasette/issues/268#issuecomment-789409126 https://api.github.com/repos/simonw/datasette/issues/268 789409126 MDEyOklzc3VlQ29tbWVudDc4OTQwOTEyNg== 649467 2021-03-03T03:57:15Z 2021-03-03T03:58:40Z NONE In FTS5, I think doing an FTS search is actually much easier than doing a join against the main table like datasette does now. In fact, FTS5 external content tables provide a transparent interface back to the original table or view. Here's what I'm currently doing:

* Build a view that joins whatever tables I want and rename the columns to non-joiny names (e.g., `chapter.name AS chapter_name` in the view where needed).
* Create an FTS5 table with `content="viewname"`.
* As described in the "external content tables" section (https://www.sqlite.org/fts5.html#external_content_tables), SQL queries can be made directly to the FTS table, which behind the covers makes select calls to the content table when the content of the original columns is needed.
* In addition, you get `rank` and `bm25()` available to you when you select on the `_fts` table.

Unfortunately, datasette doesn't currently seem happy being coerced into doing a real query on an fts5 table. This works:

```
select col1, col2, col3 from table_fts where col1="value" and table_fts match escape_fts("search term") order by rank
```

But this doesn't work in the datasette SQL query interface (the "search" input text field doesn't show up):

```
select col1, col2, col3 from table_fts where col1="value" and table_fts match escape_fts(:search) order by rank
```

For what datasette is doing right now, I think you could just use contentless fts5 tables (`content=""`), since all you care about is the rowid - you're doing a subselect to get the rowid anyway. In fts5, that's just a contentless table. I guess if you want to follow this suggestion, you'd need a somewhat different code path for fts5.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
323718842  
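A minimal sketch of the view-plus-external-content setup described in the comment above; all table, view, and column names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE chapter (book_id INTEGER, name TEXT);

-- A view with "non-joiny" column names, exposing a rowid for fts5
CREATE VIEW search_view AS
  SELECT chapter.rowid AS rowid,
         chapter.name AS chapter_name,
         book.title AS book_title
  FROM chapter JOIN book ON book.id = chapter.book_id;

-- External content: fts5 reads original values back from the view
CREATE VIRTUAL TABLE search_fts USING fts5(
  chapter_name, book_title, content="search_view"
);
""")
conn.execute("INSERT INTO book VALUES (1, 'SQLite Internals')")
conn.execute("INSERT INTO chapter VALUES (1, 'Full-text search')")
# Views have no triggers to keep fts5 in sync, so populate the
# index manually (and re-populate when the underlying data changes)
conn.execute("""
  INSERT INTO search_fts(rowid, chapter_name, book_title)
  SELECT rowid, chapter_name, book_title FROM search_view
""")
print(conn.execute(
    "SELECT chapter_name, book_title, rank FROM search_fts "
    "WHERE search_fts MATCH ? ORDER BY rank", ["search"]
).fetchall())
```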
https://github.com/simonw/datasette/issues/1238#issuecomment-789186458 https://api.github.com/repos/simonw/datasette/issues/1238 789186458 MDEyOklzc3VlQ29tbWVudDc4OTE4NjQ1OA== 198537 2021-03-02T20:19:30Z 2021-03-02T20:19:30Z CONTRIBUTOR A custom `templates/index.html` seems to work, and custom `pages` work as a workaround if you move them to `pages/base_url_dir`.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813899472  
https://github.com/simonw/datasette/issues/1247#issuecomment-787616446 https://api.github.com/repos/simonw/datasette/issues/1247 787616446 MDEyOklzc3VlQ29tbWVudDc4NzYxNjQ0Ng== 9599 2021-03-01T03:50:37Z 2021-03-01T03:50:37Z OWNER I like the `.add_memory_database()` option. I also like that it makes it more obvious that this is a capability of Datasette, since I'm excited to see more plugins, features and tests that take advantage of it.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
818430405  
https://github.com/simonw/datasette/issues/1247#issuecomment-787616158 https://api.github.com/repos/simonw/datasette/issues/1247 787616158 MDEyOklzc3VlQ29tbWVudDc4NzYxNjE1OA== 9599 2021-03-01T03:49:27Z 2021-03-01T03:49:27Z OWNER A couple of options:
```python
datasette.add_memory_database("test_json_array")
# or make that first argument to add_database() optional and support:
datasette.add_database(memory_name="test_json_array")
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
818430405  
https://github.com/simonw/datasette/issues/1246#issuecomment-787611153 https://api.github.com/repos/simonw/datasette/issues/1246 787611153 MDEyOklzc3VlQ29tbWVudDc4NzYxMTE1Mw== 9599 2021-03-01T03:30:57Z 2021-03-01T03:30:57Z OWNER I'm going to try a new pattern for testing this, enabled by #1151 - the test will create a new named in-memory database, write some records to it and then run some test facets against that. This will save me from having to add yet another fixtures table for this.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817597268  
https://github.com/simonw/datasette/issues/1005#issuecomment-787536267 https://api.github.com/repos/simonw/datasette/issues/1005 787536267 MDEyOklzc3VlQ29tbWVudDc4NzUzNjI2Nw== 9599 2021-02-28T22:30:37Z 2021-02-28T22:30:37Z OWNER It's out! https://github.com/encode/httpx/releases/tag/0.17.0
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
718259202  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787532279 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787532279 MDEyOklzc3VlQ29tbWVudDc4NzUzMjI3OQ== 9599 2021-02-28T22:09:37Z 2021-02-28T22:09:37Z OWNER Microsoft's Playwright Python library solves this problem by code-generating both its sync AND its async libraries: https://github.com/microsoft/playwright-python/tree/master/scripts
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787198202 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787198202 MDEyOklzc3VlQ29tbWVudDc4NzE5ODIwMg== 9599 2021-02-27T22:33:58Z 2021-02-27T22:33:58Z OWNER Hah or use this trick, which genuinely rewrites the code at runtime using a class decorator! https://github.com/python-happybase/aiohappybase/blob/0990ef45cfdb720dc987afdb4957a0fac591cb99/aiohappybase/sync/_util.py#L19-L32
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787195536 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787195536 MDEyOklzc3VlQ29tbWVudDc4NzE5NTUzNg== 9599 2021-02-27T22:13:24Z 2021-02-27T22:13:24Z OWNER Some other interesting background reading: https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html - in particular see how SQLAlchemy has an `await conn.run_sync(meta.drop_all)` mechanism for running methods that haven't themselves been provided in an async version
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787190562 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787190562 MDEyOklzc3VlQ29tbWVudDc4NzE5MDU2Mg== 9599 2021-02-27T22:04:00Z 2021-02-27T22:04:00Z OWNER From the poster here: https://github.com/sethmlarson/pycon-async-sync-poster/blob/master/poster.pdf <img width="624" alt="pycon-async-sync-poster_poster_pdf_at_master_·_sethmlarson_pycon-async-sync-poster" src="https://user-images.githubusercontent.com/9599/109401634-9f0a1400-7904-11eb-8b3a-37df0678b8dc.png">
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787186826 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787186826 MDEyOklzc3VlQ29tbWVudDc4NzE4NjgyNg== 9599 2021-02-27T22:01:54Z 2021-02-27T22:01:54Z OWNER `unasync` is an implementation of the exact pattern I was talking about above - it uses the `tokenize` module from the Python standard library to apply some clever rules to transform an async codebase into a sync one. https://unasync.readthedocs.io/en/latest/ - implementation here: https://github.com/python-trio/unasync/blob/v0.5.0/src/unasync/__init__.py
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787175126 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787175126 MDEyOklzc3VlQ29tbWVudDc4NzE3NTEyNg== 9599 2021-02-27T21:55:05Z 2021-02-27T21:55:05Z OWNER "how to use some new tools to more easily maintain a codebase that supports both async and synchronous I/O and multiple async libraries" - yeah that's exactly what I need, thank you!
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787150276 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787150276 MDEyOklzc3VlQ29tbWVudDc4NzE1MDI3Ng== 37962604 2021-02-27T21:27:26Z 2021-02-27T21:27:26Z NONE I had this resource by Seth Michael Larson saved https://github.com/sethmlarson/pycon-async-sync-poster I haven't had a look at it, but it may contain useful info. On twitter, I mentioned passing an aiosqlite connection during the `Database` creation. I'm not 100% familiar with the `sqlite-utils` codebase, so I may be wrong here, but maybe decorating internal functions could be an option? Then they are awaited or not inside the decorator depending on how they are called.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787144523 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787144523 MDEyOklzc3VlQ29tbWVudDc4NzE0NDUyMw== 9599 2021-02-27T21:18:46Z 2021-02-27T21:18:46Z OWNER Here's a really wild idea: I wonder if it would be possible to run a source transformation against either the sync or the async versions of the code to produce the equivalent for the other paradigm? Could that even be as simple as a set of regular expressions against the `await ...` version that strips out or replaces the `await` and `async def` and `async for` statements? If so... I could maintain just the async version, generate the sync version with a script and rely on robust unit testing to guarantee that this actually works.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
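A minimal sketch of that regex idea, assuming the source never uses these keywords inside strings or comments - the weakness that tokenize-based tools like `unasync`, linked above, avoid by rewriting the token stream instead:

```python
import re

# Naive text-level replacements; a real tool should work on tokens.
RULES = [
    (re.compile(r"\basync def\b"), "def"),
    (re.compile(r"\basync for\b"), "for"),
    (re.compile(r"\basync with\b"), "with"),
    (re.compile(r"\bawait\s+"), ""),
]

def unasyncify(source: str) -> str:
    """Strip async/await syntax to produce the sync twin of a module."""
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source

print(unasyncify("async def rows():\n    return await db.execute(sql)"))
# def rows():
#     return db.execute(sql)
```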
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787142066 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787142066 MDEyOklzc3VlQ29tbWVudDc4NzE0MjA2Ng== 9599 2021-02-27T21:17:10Z 2021-02-27T21:17:10Z OWNER I have a hunch this is actually going to be quite difficult, due to the internal complexity of some of the `sqlite-utils` API methods. Consider `db[table].extract(...)` for example. It does a whole bunch of extra queries inside the method - each of those would need to be turned into an `await` call for the async version. Here's the method body today: https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1060-L1152 Writing this method twice - looking similar but with `await ...` tucked in before every internal method it calls that needs to execute SQL - is going to be pretty messy. One thing that would help a LOT is figuring out how to share the majority of the test code. If the exact same tests could run against both the sync and async versions with a bit of test trickery, maintaining parallel implementations would at least be a bit more feasible.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787121933 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787121933 MDEyOklzc3VlQ29tbWVudDc4NzEyMTkzMw== 25778 2021-02-27T19:18:57Z 2021-02-27T19:18:57Z NONE I think HTTPX gets it exactly right, with a clear separation between sync and async clients, each with a basically identical API. (I'm about to switch [feed-to-sqlite](https://github.com/eyeseast/feed-to-sqlite) over to it, from Requests, to eventually make way for async support.)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787120136 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787120136 MDEyOklzc3VlQ29tbWVudDc4NzEyMDEzNg== 9599 2021-02-27T19:04:47Z 2021-02-27T19:04:47Z OWNER Another option here would be to add https://github.com/omnilib/aiosqlite/blob/main/aiosqlite/core.py as a dependency - it's four years old now and actively maintained, and the code is pretty small, so it looks like a solid, stable, reliable dependency.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787118691 https://api.github.com/repos/simonw/sqlite-utils/issues/242 787118691 MDEyOklzc3VlQ29tbWVudDc4NzExODY5MQ== 9599 2021-02-27T18:53:23Z 2021-02-27T18:53:23Z OWNER Datasette has its own implementation of a write queue for exactly this purpose - and there's no reason at all that should stay in Datasette rather than being extracted out and moved over here to `sqlite-utils`. One small concern I have is around the API design. I'd want to keep supporting the existing synchronous API while also providing a similar API with await-based methods. What are some good examples of libraries that do this? I like how https://www.python-httpx.org/ handles it, maybe that's a good example to imitate?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817989436  
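For illustration, a minimal sketch of such a write queue - a single dedicated writer thread that owns the connection, with callers submitting SQL through a queue. This is the general shape of the pattern, not Datasette's actual implementation:

```python
import queue
import sqlite3
import threading

class WriteQueue:
    """Funnel all writes through one thread, so only one connection
    ever writes to the database."""

    def __init__(self, path):
        self._queue = queue.Queue()
        threading.Thread(target=self._worker, args=(path,), daemon=True).start()

    def _worker(self, path):
        conn = sqlite3.connect(path)  # connection lives on this thread only
        while True:
            sql, params, done = self._queue.get()
            conn.execute(sql, params)
            conn.commit()
            done.set()

    def execute(self, sql, params=()):
        done = threading.Event()
        self._queue.put((sql, params, done))
        done.wait()  # block until the write has been applied
```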
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-786925280 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 786925280 MDEyOklzc3VlQ29tbWVudDc4NjkyNTI4MA== 9599 2021-02-26T22:23:10Z 2021-02-26T22:23:10Z MEMBER Thanks! I requested my Gmail export from takeout - once that arrives I'll test it against this and then merge the PR.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/simonw/datasette/issues/1238#issuecomment-786849095 https://api.github.com/repos/simonw/datasette/issues/1238 786849095 MDEyOklzc3VlQ29tbWVudDc4Njg0OTA5NQ== 9599 2021-02-26T19:29:38Z 2021-02-26T19:29:38Z OWNER Here's the test I wrote:
```diff
git diff tests/test_custom_pages.py
diff --git a/tests/test_custom_pages.py b/tests/test_custom_pages.py
index 6a23192..5a71f56 100644
--- a/tests/test_custom_pages.py
+++ b/tests/test_custom_pages.py
@@ -2,11 +2,19 @@ import pathlib
 import pytest
 from .fixtures import make_app_client
 
+TEST_TEMPLATE_DIRS = str(pathlib.Path(__file__).parent / "test_templates")
+
 
 @pytest.fixture(scope="session")
 def custom_pages_client():
+    with make_app_client(template_dir=TEST_TEMPLATE_DIRS) as client:
+        yield client
+
+
+@pytest.fixture(scope="session")
+def custom_pages_client_with_base_url():
     with make_app_client(
-        template_dir=str(pathlib.Path(__file__).parent / "test_templates")
+        template_dir=TEST_TEMPLATE_DIRS, config={"base_url": "/prefix/"}
     ) as client:
         yield client
 
@@ -23,6 +31,12 @@ def test_request_is_available(custom_pages_client):
     assert "path:/request" == response.text
 
 
+def test_custom_pages_with_base_url(custom_pages_client_with_base_url):
+    response = custom_pages_client_with_base_url.get("/prefix/request")
+    assert 200 == response.status
+    assert "path:/prefix/request" == response.text
+
+
 def test_custom_pages_nested(custom_pages_client):
     response = custom_pages_client.get("/nested/nest")
     assert 200 == response.status
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813899472  
https://github.com/simonw/datasette/issues/1238#issuecomment-786848654 https://api.github.com/repos/simonw/datasette/issues/1238 786848654 MDEyOklzc3VlQ29tbWVudDc4Njg0ODY1NA== 9599 2021-02-26T19:28:48Z 2021-02-26T19:28:48Z OWNER I added a debug line just before `for regex, wildcard_template` here: https://github.com/simonw/datasette/blob/afed51b1e36cf275c39e71c7cb262d6c5bdbaa31/datasette/app.py#L1148-L1155 And it showed that for some reason `request.path` is `/prefix/prefix/request` here - the prefix got doubled somehow.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813899472  
https://github.com/simonw/datasette/issues/1238#issuecomment-786841261 https://api.github.com/repos/simonw/datasette/issues/1238 786841261 MDEyOklzc3VlQ29tbWVudDc4Njg0MTI2MQ== 9599 2021-02-26T19:13:44Z 2021-02-26T19:13:44Z OWNER Sounds like a bug - thanks for reporting this.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813899472  
https://github.com/simonw/datasette/issues/1246#issuecomment-786840734 https://api.github.com/repos/simonw/datasette/issues/1246 786840734 MDEyOklzc3VlQ29tbWVudDc4Njg0MDczNA== 9599 2021-02-26T19:12:39Z 2021-02-26T19:12:47Z OWNER Could I take this part:
```python
suggested_facet_sql = """
    select distinct json_type({column})
    from ({sql})
""".format(
    column=escape_sqlite(column), sql=self.sql
)
```
And add `where {column} is not null and {column} != ''` perhaps?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817597268  
https://github.com/simonw/datasette/issues/1246#issuecomment-786840425 https://api.github.com/repos/simonw/datasette/issues/1246 786840425 MDEyOklzc3VlQ29tbWVudDc4Njg0MDQyNQ== 9599 2021-02-26T19:11:56Z 2021-02-26T19:11:56Z OWNER Relevant code: https://github.com/simonw/datasette/blob/afed51b1e36cf275c39e71c7cb262d6c5bdbaa31/datasette/facets.py#L271-L295
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817597268  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-786830832 https://api.github.com/repos/simonw/sqlite-utils/issues/239 786830832 MDEyOklzc3VlQ29tbWVudDc4NjgzMDgzMg== 9599 2021-02-26T18:52:40Z 2021-02-26T18:52:40Z OWNER Could this handle lists of objects too? That would be pretty amazing - if the column has a `[{...}, {...}]` list in it could turn that into a many-to-many.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/datasette/issues/1240#issuecomment-786813506 https://api.github.com/repos/simonw/datasette/issues/1240 786813506 MDEyOklzc3VlQ29tbWVudDc4NjgxMzUwNg== 9599 2021-02-26T18:19:46Z 2021-02-26T18:19:46Z OWNER Linking to rows from custom queries is a lot harder - because given an arbitrary string of SQL it's difficult to analyze it and figure out which (if any) of the returned columns represent a primary key. It's possible to manually write a SQL query that returns a column that will be treated as a link to another page using this plugin, but it's not particularly straight-forward: https://datasette.io/plugins/datasette-json-html
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
814591962  
https://github.com/simonw/datasette/issues/1240#issuecomment-786812716 https://api.github.com/repos/simonw/datasette/issues/1240 786812716 MDEyOklzc3VlQ29tbWVudDc4NjgxMjcxNg== 9599 2021-02-26T18:18:18Z 2021-02-26T18:18:18Z OWNER Agreed, this would be extremely useful. I'd love to be able to facet against custom queries. It's a fair bit of work to implement but it's not impossible. Closing this as a duplicate of #972.
{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
814591962  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-786795132 https://api.github.com/repos/simonw/sqlite-utils/issues/239 786795132 MDEyOklzc3VlQ29tbWVudDc4Njc5NTEzMg== 9599 2021-02-26T17:45:53Z 2021-02-26T17:45:53Z OWNER If there's no primary key in the JSON could use the `hash_id` mechanism.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-786794435 https://api.github.com/repos/simonw/sqlite-utils/issues/239 786794435 MDEyOklzc3VlQ29tbWVudDc4Njc5NDQzNQ== 9599 2021-02-26T17:44:38Z 2021-02-26T17:44:38Z OWNER This came up in office hours!
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/datasette/issues/1244#issuecomment-786786645 https://api.github.com/repos/simonw/datasette/issues/1244 786786645 MDEyOklzc3VlQ29tbWVudDc4Njc4NjY0NQ== 9599 2021-02-26T17:30:38Z 2021-02-26T17:30:38Z OWNER New paragraph at the top of https://docs.datasette.io/en/latest/writing_plugins.html > Want to start by looking at an example? The [Datasette plugins directory](https://datasette.io/plugins) lists more than 50 open source plugins with code you can explore. The [plugin hooks](https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hooks) page includes links to example plugins for each of the documented hooks.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
817528452  
https://github.com/simonw/sqlite-utils/issues/237#issuecomment-786050562 https://api.github.com/repos/simonw/sqlite-utils/issues/237 786050562 MDEyOklzc3VlQ29tbWVudDc4NjA1MDU2Mg== 9599 2021-02-25T16:57:56Z 2021-02-25T16:57:56Z OWNER `sqlite-utils create-view` currently has a `--ignore` option, so adding that to `sqlite-utils drop-view` and `sqlite-utils drop-table` makes sense as well.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
815554385  
https://github.com/simonw/sqlite-utils/issues/237#issuecomment-786049686 https://api.github.com/repos/simonw/sqlite-utils/issues/237 786049686 MDEyOklzc3VlQ29tbWVudDc4NjA0OTY4Ng== 9599 2021-02-25T16:56:42Z 2021-02-25T16:56:42Z OWNER So:
```python
db["my_table"].drop(ignore=True)
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
815554385  
https://github.com/simonw/sqlite-utils/issues/237#issuecomment-786049394 https://api.github.com/repos/simonw/sqlite-utils/issues/237 786049394 MDEyOklzc3VlQ29tbWVudDc4NjA0OTM5NA== 9599 2021-02-25T16:56:14Z 2021-02-25T16:56:14Z OWNER Other methods (`db.create_view()` for example) have `ignore=True` to mean "don't throw an error if this causes a problem", so I'm good with adding that to `.drop_view()`. I don't like using it as the default partly because that would be a very minor breaking API change, but mainly because I don't want to hide mistakes people make - e.g. if you mistype the name of the table you are trying to drop.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
815554385  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786037219 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786037219 MDEyOklzc3VlQ29tbWVudDc4NjAzNzIxOQ== 9599 2021-02-25T16:39:23Z 2021-02-25T16:39:23Z OWNER Example from the docs:
```pycon
>>> db = sqlite_utils.Database(memory=True)
>>> db["dogs"].insert({"name": "Cleo"})
>>> for pk, row in db["dogs"].pks_and_rows_where():
...     print(pk, row)
1 {'rowid': 1, 'name': 'Cleo'}
>>> db["dogs_with_pk"].insert({"id": 5, "name": "Cleo"}, pk="id")
>>> for pk, row in db["dogs_with_pk"].pks_and_rows_where():
...     print(pk, row)
5 {'id': 5, 'name': 'Cleo'}
>>> db["dogs_with_compound_pk"].insert(
...     {"species": "dog", "id": 3, "name": "Cleo"},
...     pk=("species", "id")
... )
>>> for pk, row in db["dogs_with_compound_pk"].pks_and_rows_where():
...     print(pk, row)
('dog', 3) {'species': 'dog', 'id': 3, 'name': 'Cleo'}
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786036355 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786036355 MDEyOklzc3VlQ29tbWVudDc4NjAzNjM1NQ== 9599 2021-02-25T16:38:07Z 2021-02-25T16:38:07Z OWNER Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#listing-rows-with-their-primary-keys
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-786035142 https://api.github.com/repos/simonw/sqlite-utils/issues/239 786035142 MDEyOklzc3VlQ29tbWVudDc4NjAzNTE0Mg== 9599 2021-02-25T16:36:17Z 2021-02-25T16:36:17Z OWNER WIP in a pull request.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786016380 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786016380 MDEyOklzc3VlQ29tbWVudDc4NjAxNjM4MA== 9599 2021-02-25T16:10:01Z 2021-02-25T16:10:01Z OWNER I prototyped this and I like it:
```
In [1]: import sqlite_utils

In [2]: db = sqlite_utils.Database("/Users/simon/Dropbox/Development/datasette/fixtures.db")

In [3]: list(db["compound_primary_key"].pks_and_rows_where())
Out[3]: [(('a', 'b'), {'pk1': 'a', 'pk2': 'b', 'content': 'c'})]
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786007209 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786007209 MDEyOklzc3VlQ29tbWVudDc4NjAwNzIwOQ== 9599 2021-02-25T15:57:50Z 2021-02-25T15:57:50Z OWNER `table.pks_and_rows_where(...)` is explicit and I think less ambiguous than the other options.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786006794 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786006794 MDEyOklzc3VlQ29tbWVudDc4NjAwNjc5NA== 9599 2021-02-25T15:57:17Z 2021-02-25T15:57:28Z OWNER I quite like `pks_with_rows_where(...)` - but grammatically it suggests it will return the primary keys that exist where their rows match the criteria - "pks with rows" can be interpreted as "pks for the rows that..." as opposed to "pks accompanied by rows"
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786005078 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786005078 MDEyOklzc3VlQ29tbWVudDc4NjAwNTA3OA== 9599 2021-02-25T15:54:59Z 2021-02-25T15:56:16Z OWNER Is `pk_rows_where()` a good name? It sounds like it returns "primary key rows", which isn't a thing. It actually returns rows along with their primary key. Other options:

- `table.rows_with_pk_where(...)` - should this return `(row, pk)` rather than `(pk, row)`?
- `table.rows_where_pk(...)`
- `table.pk_and_rows_where(...)`
- `table.pk_with_rows_where(...)`
- `table.pks_with_rows_where(...)` - because rows is pluralized, so pks should be pluralized too?
- `table.pks_rows_where(...)`
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/240#issuecomment-786001768 https://api.github.com/repos/simonw/sqlite-utils/issues/240 786001768 MDEyOklzc3VlQ29tbWVudDc4NjAwMTc2OA== 9599 2021-02-25T15:50:28Z 2021-02-25T15:52:12Z OWNER One option: `.rows_where()` could grow an `ensure_pk=True` option which checks to see if the table is a `rowid` table and, if it is, includes that in the `select`.

Or... how about you can call `.rows_where(..., pks=True)` and it will yield `(pk, rowdict)` tuple pairs instead of just returning the sequence of dictionaries?

I'm always a little bit nervous of methods that vary their return type based on their arguments. Maybe this would be a separate method instead?
```python
for pk, row in table.pk_rows_where(...):
    # ...
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816560819  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785992158 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785992158 MDEyOklzc3VlQ29tbWVudDc4NTk5MjE1OA== 9599 2021-02-25T15:37:04Z 2021-02-25T15:37:04Z OWNER Here's the current implementation of `.extract()`: https://github.com/simonw/sqlite-utils/blob/806c21044ac8d31da35f4c90600e98115aade7c6/sqlite_utils/db.py#L1049-L1074 Tricky detail here: I create the lookup table first, based on the types of the columns that are being extracted. I need to do this because extraction currently uses unique tuples of values, so the table has to be created in advance. But if I'm using these new expand functions to figure out what's going to be extracted, I don't know the names of the columns and their types in advance. I'm only going to find those out during the transformation. This may turn out to be incompatible with how `.extract()` works at the moment. I may need a new method, `.extract_expand()` perhaps? It could be simpler - work only against a single column for example. I can still use the existing `sqlite-utils extract` CLI command though, with a `--json` flag and a rule that you can't run it against multiple columns.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785983837 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785983837 MDEyOklzc3VlQ29tbWVudDc4NTk4MzgzNw== 9599 2021-02-25T15:25:21Z 2021-02-25T15:28:57Z OWNER Problem with calling this argument `transform=` is that the term "transform" already means something else in this library. I could use `convert=` instead... but that doesn't instantly make me think of turning a value into multiple columns.

How about `expand=`? I've not used that term anywhere yet.

`db["Reports"].extract(["Reported by"], expand={"Reported by": json.loads})`

I think that works. You're expanding a single value into several columns of information.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785983070 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785983070 MDEyOklzc3VlQ29tbWVudDc4NTk4MzA3MA== 9599 2021-02-25T15:24:17Z 2021-02-25T15:24:17Z OWNER I'm going to go with last-wins - so if multiple transform functions return the same key the last one will over-write the others.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785980813 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785980813 MDEyOklzc3VlQ29tbWVudDc4NTk4MDgxMw== 9599 2021-02-25T15:21:02Z 2021-02-25T15:23:47Z OWNER Maybe the Python version takes an optional dictionary mapping column names to transformation functions? It could then merge all of those results together - and maybe throw an error if the same key is produced by more than one column.
```python
db["Reports"].extract(["Reported by"], transform={"Reported by": json.loads})
```
Or it could have an option for different strategies if keys collide: first wins, last wins, throw exception, add a prefix to the new column name. That feels a bit too complex for an edge-case though.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785980083 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785980083 MDEyOklzc3VlQ29tbWVudDc4NTk4MDA4Mw== 9599 2021-02-25T15:20:02Z 2021-02-25T15:20:02Z OWNER It would be OK if the CLI version only allows you to specify a single column if you are using the `--json` option.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785979769 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785979769 MDEyOklzc3VlQ29tbWVudDc4NTk3OTc2OQ== 9599 2021-02-25T15:19:37Z 2021-02-25T15:19:37Z OWNER For the Python version I'd like to be able to provide a transformation callback function - which can be `json.loads` but could also be anything else which accepts the value of the current column and returns a Python dictionary of columns and their values to use in the new table.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
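A minimal sketch of what such a callback could look like: `json.loads` covers the JSON case directly, and the custom function below is purely hypothetical, just to show the "value in, dict of new columns out" contract:

```python
import json

# Column holds e.g. '{"name": "CRBC", "dept": 29}' - json.loads is enough:
transform = json.loads

# Or a custom callback for non-JSON values:
def split_full_name(value):
    """Return a dict of column names to values for the new table."""
    first, _, last = value.partition(" ")
    return {"first_name": first, "last_name": last}

print(split_full_name("Auguste Dupouy"))
# {'first_name': 'Auguste', 'last_name': 'Dupouy'}
```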
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785979192 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785979192 MDEyOklzc3VlQ29tbWVudDc4NTk3OTE5Mg== 9599 2021-02-25T15:18:46Z 2021-02-25T15:18:46Z OWNER Likewise the `sqlite-utils extract` command takes one or more columns:
```
Usage: sqlite-utils extract [OPTIONS] PATH TABLE COLUMNS...

  Extract one or more columns into a separate table

Options:
  --table TEXT             Name of the other table to extract columns to
  --fk-column TEXT         Name of the foreign key column to add to the table
  --rename <TEXT TEXT>...  Rename this column in extracted table
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/239#issuecomment-785978689 https://api.github.com/repos/simonw/sqlite-utils/issues/239 785978689 MDEyOklzc3VlQ29tbWVudDc4NTk3ODY4OQ== 9599 2021-02-25T15:18:03Z 2021-02-25T15:18:03Z OWNER The Python `.extract()` method currently starts like this:

```python
def extract(self, columns, table=None, fk_column=None, rename=None):
    rename = rename or {}
    if isinstance(columns, str):
        columns = [columns]
    if not set(columns).issubset(self.columns_dict.keys()):
        raise InvalidColumns(
            "Invalid columns {} for table with columns {}".format(
                columns, list(self.columns_dict.keys())
            )
        )
    ...
```

Note that it takes a list of columns (and treats a string as a single item list). That's because it can be called with a list of columns and it will use them to populate another table of unique tuples of those column values. So a new mechanism that can instead read JSON values from a single column needs to be compatible with that existing design.
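For reference, that existing multi-column behaviour looks like this with the current API (the table and column names here are invented):

```python
import sqlite_utils

db = sqlite_utils.Database("data.db")

# Pull unique (species, common_name) pairs out into a separate "species"
# table, replacing them in "trees" with a foreign key to the new table
db["trees"].extract(["species", "common_name"], table="species")
```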
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816526538  
https://github.com/simonw/sqlite-utils/issues/238#issuecomment-785972074 https://api.github.com/repos/simonw/sqlite-utils/issues/238 785972074 MDEyOklzc3VlQ29tbWVudDc4NTk3MjA3NA== 9599 2021-02-25T15:08:36Z 2021-02-25T15:08:36Z OWNER I bet the bug is in here: https://github.com/simonw/sqlite-utils/blob/806c21044ac8d31da35f4c90600e98115aade7c6/sqlite_utils/db.py#L593-L602
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
816523763  
https://github.com/simonw/datasette/pull/1243#issuecomment-785485597 https://api.github.com/repos/simonw/datasette/issues/1243 785485597 MDEyOklzc3VlQ29tbWVudDc4NTQ4NTU5Nw== 22429695 2021-02-25T00:28:30Z 2021-02-25T00:28:30Z NONE # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1243?src=pr&el=h1) Report
> Merging [#1243](https://codecov.io/gh/simonw/datasette/pull/1243?src=pr&el=desc) (887bfd2) into [main](https://codecov.io/gh/simonw/datasette/commit/726f781c50e88f557437f6490b8479c3d6fabfc2?el=desc) (726f781) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/simonw/datasette/pull/1243/graphs/tree.svg?width=650&height=150&src=pr&token=eSahVY7kw1)](https://codecov.io/gh/simonw/datasette/pull/1243?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##             main    #1243   +/-   ##
=======================================
  Coverage   91.56%   91.56%
=======================================
  Files          34       34
  Lines        4242     4242
=======================================
  Hits         3884     3884
  Misses        358      358
```

------

[Continue to review full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1243?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/simonw/datasette/pull/1243?src=pr&el=footer). Last update [726f781...32652d9](https://codecov.io/gh/simonw/datasette/pull/1243?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
815955014  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-784638394 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 784638394 MDEyOklzc3VlQ29tbWVudDc4NDYzODM5NA== 306240 2021-02-24T00:36:18Z 2021-02-24T00:36:18Z NONE I noticed that @simonw is using black for formatting. I ran black on my additions in this PR.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/simonw/datasette/issues/1241#issuecomment-784567547 https://api.github.com/repos/simonw/datasette/issues/1241 784567547 MDEyOklzc3VlQ29tbWVudDc4NDU2NzU0Nw== 9599 2021-02-23T22:45:56Z 2021-02-23T22:46:12Z OWNER I really like the way the Share feature on Stack Overflow works: https://stackoverflow.com/questions/18934149/how-can-i-use-postgresqls-text-column-type-in-django
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
814595021  
https://github.com/simonw/datasette/issues/1241#issuecomment-784347646 https://api.github.com/repos/simonw/datasette/issues/1241 784347646 MDEyOklzc3VlQ29tbWVudDc4NDM0NzY0Ng== 7107523 2021-02-23T16:55:26Z 2021-02-23T16:57:39Z NONE > I think it's possible that many users these days no longer assume they can paste a URL from the browser address bar (if they ever understood that at all) because too many apps are SPAs with broken URLs.

Absolutely - that's why I thought my corner case with `iframe` preventing access to the datasette URL could actually be relevant in more general situations.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
814595021  
https://github.com/simonw/datasette/issues/1241#issuecomment-784334931 https://api.github.com/repos/simonw/datasette/issues/1241 784334931 MDEyOklzc3VlQ29tbWVudDc4NDMzNDkzMQ== 9599 2021-02-23T16:37:26Z 2021-02-23T16:37:26Z OWNER A "Share link" button would only be needed on the table page and the arbitrary query page I think - and maybe on the row page, especially as that page starts to grow more features in the future.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
814595021  
https://github.com/simonw/datasette/issues/1241#issuecomment-784333768 https://api.github.com/repos/simonw/datasette/issues/1241 784333768 MDEyOklzc3VlQ29tbWVudDc4NDMzMzc2OA== 9599 2021-02-23T16:35:51Z 2021-02-23T16:35:51Z OWNER This can definitely be done with a plugin. Adding to Datasette itself is an interesting idea. I think it's possible that many users these days no longer assume they can paste a URL from the browser address bar (if they ever understood that at all) because too many apps are SPAs with broken URLs. The shareable URLs are actually a key feature of Datasette - so maybe they should be highlighted in the default UI? I built a "copy to clipboard" feature for `datasette-copyable` and wrote up how that works here: https://til.simonwillison.net/javascript/copy-button
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
814595021  
https://github.com/simonw/datasette/issues/1240#issuecomment-784312460 https://api.github.com/repos/simonw/datasette/issues/1240 784312460 MDEyOklzc3VlQ29tbWVudDc4NDMxMjQ2MA== 7107523 2021-02-23T16:07:10Z 2021-02-23T16:08:28Z NONE Likewise, while answering another issue regarding the Vega plugin, I realized that there is no way to link to rows from a custom query - I only get this "Link" column with individual URLs for the default SQL view:

![ss-2021-02-23_170559](https://user-images.githubusercontent.com/7107523/108871491-1e3fd500-75f1-11eb-8f76-5d5a82cc14d7.png)

Or is it there and I am just missing the option in my custom queries?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
814591962  
https://github.com/simonw/datasette/issues/1218#issuecomment-784157345 https://api.github.com/repos/simonw/datasette/issues/1218 784157345 MDEyOklzc3VlQ29tbWVudDc4NDE1NzM0NQ== 1244799 2021-02-23T12:12:17Z 2021-02-23T12:12:17Z NONE Topline: this fixed the same problem for me.

```
brew install python@3.7
ln -s /usr/local/opt/python@3.7/bin/python3.7 /usr/local/opt/python/bin/python3.7
pip3 uninstall -y numpy
pip3 uninstall -y setuptools
pip3 install setuptools
pip3 install numpy
pip3 install datasette-publish-fly
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
803356942  
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-783794520 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 783794520 MDEyOklzc3VlQ29tbWVudDc4Mzc5NDUyMA== 306240 2021-02-23T01:13:54Z 2021-02-23T01:13:54Z NONE Also, @simonw, I created a test based off the existing tests. I think it's working correctly.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
https://github.com/simonw/datasette/issues/1239#issuecomment-783774084 https://api.github.com/repos/simonw/datasette/issues/1239 783774084 MDEyOklzc3VlQ29tbWVudDc4Mzc3NDA4NA== 9599 2021-02-23T00:18:56Z 2021-02-23T00:19:18Z OWNER Bug is here: https://github.com/simonw/datasette/blob/42caabf7e9e6e4d69ef6dd7de16f2cd96bc79d5b/datasette/filters.py#L149-L165

Those `json_each` lines should be:

    select {t}.rowid from {t}, json_each([{t}].[{c}]) j
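A quick sketch of the corrected query shape against a throwaway database (requires SQLite's JSON1 functions; the table and column are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table facetable (tags text)")
conn.execute("""insert into facetable (tags) values ('["tag1", "tag2"]')""")

# Qualifying rowid and the JSON column with the table name, as in the fix above
rows = conn.execute(
    "select facetable.rowid from facetable,"
    " json_each([facetable].[tags]) j where j.value = ?",
    ["tag1"],
).fetchall()
print(rows)  # [(1,)]
```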
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813978858  
https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-783688547 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4 783688547 MDEyOklzc3VlQ29tbWVudDc4MzY4ODU0Nw== 306240 2021-02-22T21:31:28Z 2021-02-22T21:31:28Z NONE @Btibert3 I've opened a PR with my initial attempt at this. Would you be willing to give this a try? https://github.com/dogsheep/google-takeout-to-sqlite/pull/5
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
778380836  
https://github.com/simonw/datasette/issues/1237#issuecomment-783676548 https://api.github.com/repos/simonw/datasette/issues/1237 783676548 MDEyOklzc3VlQ29tbWVudDc4MzY3NjU0OA== 9599 2021-02-22T21:10:19Z 2021-02-22T21:10:25Z OWNER This is another change which is a little bit hard to figure out because I haven't solved #878 yet.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
812704869  
https://github.com/simonw/datasette/issues/1234#issuecomment-783674659 https://api.github.com/repos/simonw/datasette/issues/1234 783674659 MDEyOklzc3VlQ29tbWVudDc4MzY3NDY1OQ== 9599 2021-02-22T21:06:28Z 2021-02-22T21:06:28Z OWNER I'm not going to work on this for a while, but if anyone has needs or ideas around this, they can add them to this issue.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
811505638  
https://github.com/simonw/datasette/issues/1236#issuecomment-783674038 https://api.github.com/repos/simonw/datasette/issues/1236 783674038 MDEyOklzc3VlQ29tbWVudDc4MzY3NDAzOA== 9599 2021-02-22T21:05:21Z 2021-02-22T21:05:21Z OWNER It's good on mobile - iOS at least. Going to close this - open new issues if anyone reports bugs.
{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
812228314  
https://github.com/simonw/sqlite-utils/issues/220#issuecomment-783662968 https://api.github.com/repos/simonw/sqlite-utils/issues/220 783662968 MDEyOklzc3VlQ29tbWVudDc4MzY2Mjk2OA== 649467 2021-02-22T20:44:51Z 2021-02-22T20:44:51Z NONE Actually, coming back to this, I have a clearer use case for enabling fts generation for views: making it easier to bring in text from lookup tables and other joins. The datasette documentation describes populating an fts table like so:

```sql
INSERT INTO "items_fts" (rowid, name, description, category_name)
SELECT items.rowid, items.name, items.description, categories.name
FROM items JOIN categories ON items.category_id = categories.id;
```

Alternatively, if you have fts support in sqlite_utils for views (which sqlite and fts5 support), you can do the same thing just by creating a view that captures the above joins as columns, then creating an fts table from that view. Such an fts table can be created using sqlite_utils, whereas one created with the method above can't. The resulting fts table can then be used by a whole family of related tables and views in the manner you described earlier in this issue.
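As a sketch of the idea (not something sqlite_utils supports today), FTS5's external-content option can point at a view, so the requested behaviour looks roughly like this in raw SQL - assuming `items` and `categories` tables already exist and SQLite was built with FTS5:

```python
import sqlite3

conn = sqlite3.connect("items.db")
conn.executescript("""
create view if not exists items_with_categories as
  select items.rowid as rowid, items.name, items.description,
         categories.name as category_name
  from items join categories on items.category_id = categories.id;

-- an external-content FTS5 table backed by the view
create virtual table if not exists items_fts using fts5(
  name, description, category_name,
  content=items_with_categories, content_rowid=rowid
);

-- populate the index from the view's current contents
insert into items_fts(items_fts) values ('rebuild');
""")
```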
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
783778672  
https://github.com/simonw/datasette/issues/1166#issuecomment-783560017 https://api.github.com/repos/simonw/datasette/issues/1166 783560017 MDEyOklzc3VlQ29tbWVudDc4MzU2MDAxNw== 94334 2021-02-22T18:00:57Z 2021-02-22T18:13:11Z NONE Hi! I don't think Prettier supports this syntax for globs: `datasette/static/*[!.min].js` Are you sure that works? Prettier uses https://github.com/mrmlnc/fast-glob, which in turn uses https://github.com/micromatch/micromatch, and the docs for these packages don't mention this syntax. As per the docs, square brackets should work as in regexes (`foo-[1-5].js`).

Tested it. Apparently, it works as a negated character class in regexes (like `[^.min]`). I wonder where this syntax comes from. Micromatch doesn't support that:

```js
micromatch(['static/table.js', 'static/n.js'], ['static/*[!.min].js']);
// result: ["static/n.js"] -- brackets are treated like [!.min] in regexes, without negation
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
777140799  
https://github.com/simonw/datasette/issues/782#issuecomment-783265830 https://api.github.com/repos/simonw/datasette/issues/782 783265830 MDEyOklzc3VlQ29tbWVudDc4MzI2NTgzMA== 30665 2021-02-22T10:21:14Z 2021-02-22T10:21:14Z NONE @simonw: > The problem there is that ?_size=x isn't actually doing the same thing as the SQL limit keyword. Interesting! Although I don't think it matters too much what the underlying implementation is - I more meant that `limit` is familiar to developers conceptually as "up to and including this number, if they exist", whereas "size" is potentially more ambiguous. However, it's probably no big deal either way.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782789598 https://api.github.com/repos/simonw/datasette/issues/782 782789598 MDEyOklzc3VlQ29tbWVudDc4Mjc4OTU5OA== 9599 2021-02-21T03:30:02Z 2021-02-21T03:30:02Z OWNER Another benefit to default:object - I could include a key that shows a list of available extras. I could then use that to power an interactive API explorer.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782765665 https://api.github.com/repos/simonw/datasette/issues/782 782765665 MDEyOklzc3VlQ29tbWVudDc4Mjc2NTY2NQ== 9599 2021-02-20T23:34:41Z 2021-02-20T23:34:41Z OWNER OK, I'm back to the "top level object as the default" side of things now - it's pretty much unanimous at this point, and it's certainly true that it's not a decision you'll ever regret.
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782756398 https://api.github.com/repos/simonw/datasette/issues/782 782756398 MDEyOklzc3VlQ29tbWVudDc4Mjc1NjM5OA== 601316 2021-02-20T22:05:48Z 2021-02-20T22:05:48Z NONE > I think it’s a good idea if the top level item of the response JSON is always an object, rather than an array, at least as the default.

I agree it is more predictable if the top level item is an object with a rows or data object that contains an array of data, which then allows for other top-level metadata. I can see the argument for removing this and just using an array for convenience - but I think that's OK as an option (as you have now). Rather than have lots of top-level keys you could have a "meta" object to contain non-data stuff. You could use something like "links" for API endpoint URLs (or use a standard like HAL). Which would then leave the top level a bit cleaner - if that's what you want. Have you had much feedback from users who use the Datasette API a lot?
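To make the suggestion concrete, one possible envelope along those lines (the shape is illustrative only, not Datasette's actual format):

```python
response = {
    "meta": {
        # non-data stuff lives here instead of at the top level
        "table": "commits",
        "truncated": False,
    },
    "links": {
        # API endpoint URLs, HAL-style
        "self": "/github/commits.json",
        "next": "/github/commits.json?_next=...",
    },
    "rows": [
        {"sha": "...", "message": "..."},
    ],
}
```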
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782748501 https://api.github.com/repos/simonw/datasette/issues/782 782748501 MDEyOklzc3VlQ29tbWVudDc4Mjc0ODUwMQ== 9599 2021-02-20T20:58:18Z 2021-02-20T20:58:18Z OWNER Yet another option: support a `?_path=x` option which returns a nested path from the result. So you could do this: `/github/commits.json?_path=rows` - to get back a top-level array pulled from the `"rows"` key.
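A client-side sketch of the effect the proposed `?_path=x` option would have - fetch the full object, then drill into one key (the URL and helper function here are illustrative):

```python
import json
import urllib.request

def get_path(url, path):
    # Fetch a JSON response and pull out a nested path, e.g. "rows"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    for key in path.split("."):
        data = data[key]
    return data

rows = get_path("https://latest.datasette.io/fixtures/facetable.json", "rows")
```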
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782748093 https://api.github.com/repos/simonw/datasette/issues/782 782748093 MDEyOklzc3VlQ29tbWVudDc4Mjc0ODA5Mw== 9599 2021-02-20T20:54:52Z 2021-02-20T20:54:52Z OWNER > Have you given any thought as to whether to pretty print (format with spaces) the output or not? Can be useful for debugging/exploring in a browser or other basic tools which don’t parse the JSON. Could be default (can’t be much bigger with gzip?) or opt-in. Adding a `?_pretty=1` option that does that is a great idea, I'm filing a ticket for it: #1237
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782747878 https://api.github.com/repos/simonw/datasette/issues/782 782747878 MDEyOklzc3VlQ29tbWVudDc4Mjc0Nzg3OA== 9599 2021-02-20T20:53:11Z 2021-02-20T20:53:11Z OWNER ... though thinking about this further, I could re-implement the `select * from commits` (but only return a max of 10 results) feature using a nested `select * from (select * from commits) limit 10` query.
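That rewrite is mechanical - wrap the user's SQL in an outer query carrying the limit (a sketch, not Datasette's implementation):

```python
def wrap_with_limit(sql, limit):
    # select * from (<original query>) limit N
    return "select * from ({}) limit {}".format(sql, limit)

wrap_with_limit("select * from commits", 10)
# 'select * from (select * from commits) limit 10'
```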
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782747743 https://api.github.com/repos/simonw/datasette/issues/782 782747743 MDEyOklzc3VlQ29tbWVudDc4Mjc0Nzc0Mw== 9599 2021-02-20T20:52:10Z 2021-02-20T20:52:10Z OWNER > Minor suggestion: rename `size` query param to `limit`, to better reflect that it’s a maximum number of rows returned rather than a guarantee of getting that number, and also for consistency with the SQL keyword? The problem there is that `?_size=x` isn't actually doing the same thing as the SQL `limit` keyword. Consider this query: https://latest-with-plugins.datasette.io/github?sql=select+*+from+commits - `select * from commits` Datasette returns 1,000 results, and shows a "Custom SQL query returning more than 1,000 rows" message at the top. That's the `size` kicking in - I only fetch the first 1,000 results from the cursor to avoid exhausting resources. In the JSON version of that at https://latest-with-plugins.datasette.io/github.json?sql=select+*+from+commits there's a `"truncated": true` key to let you know what happened. I find myself using `?_size=2` against Datasette occasionally if I know the rows being returned are really big and I don't want to load 10+MB of HTML. This is only really a concern for arbitrary SQL queries though - for table pages such as https://latest-with-plugins.datasette.io/github/commits?_size=10 adding `?_size=10` actually puts a `limit 10` on the underlying SQL query.
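The truncation behaviour described above can be sketched like this - fetch one row more than the maximum and use the extra row's presence as the flag (a simplification of the mechanism, not Datasette's actual code):

```python
import sqlite3

def fetch_truncated(conn, sql, max_returned_rows=1000):
    cursor = conn.execute(sql)
    # Pull at most max_returned_rows + 1 rows off the cursor, so the rest
    # of the result set is never materialized in memory
    rows = cursor.fetchmany(max_returned_rows + 1)
    truncated = len(rows) > max_returned_rows
    return rows[:max_returned_rows], truncated
```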
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782747164 https://api.github.com/repos/simonw/datasette/issues/782 782747164 MDEyOklzc3VlQ29tbWVudDc4Mjc0NzE2NA== 9599 2021-02-20T20:47:16Z 2021-02-20T20:47:16Z OWNER (I started a thread on Twitter about this: https://twitter.com/simonw/status/1363220355318358016)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879  
https://github.com/simonw/datasette/issues/782#issuecomment-782746755 https://api.github.com/repos/simonw/datasette/issues/782 782746755 MDEyOklzc3VlQ29tbWVudDc4Mjc0Njc1NQ== 30665 2021-02-20T20:44:05Z 2021-02-20T20:44:05Z NONE Minor suggestion: rename `size` query param to `limit`, to better reflect that it’s a maximum number of rows returned rather than a guarantee of getting that number, and also for consistency with the SQL keyword? I like the idea of specifying a limit of 0 if you don’t want any rows data - and returning an empty array under the `rows` key seems fine. Have you given any thought as to whether to pretty print (format with spaces) the output or not? Can be useful for debugging/exploring in a browser or other basic tools which don’t parse the JSON. Could be default (can’t be much bigger with gzip?) or opt-in.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
627794879