issue_comments

3,861 rows sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
700929721 https://github.com/simonw/datasette/issues/980#issuecomment-700929721 https://api.github.com/repos/simonw/datasette/issues/980 MDEyOklzc3VlQ29tbWVudDcwMDkyOTcyMQ== simonw 9599 2020-09-29T19:21:50Z 2020-09-29T19:21:50Z OWNER

That fixed it: https://latest-with-plugins.datasette.io/fixtures?sql=select%0D%0A++dateutil_rrule(%27FREQ%3DHOURLY%3BCOUNT%3D5%27)%2C%0D%0A++dateutil_rrule_date(%0D%0A++++%27FREQ%3DDAILY%3BCOUNT%3D3%27%2C%0D%0A++++%271st+jan+2020%27%0D%0A++)%3B

https://user-images.githubusercontent.com/9599/94605680-4d266a80-024e-11eb-8818-bc8bd7958df4.png

<style>
@media only screen and (max-width: 576px) {

    .rows-and-columns td:nth-of-type(1):before { content: "dateutil_rrule(\000027FREQ=HOURLY;COUNT=5\000027)"; }

    .rows-and-columns td:nth-of-type(2):before { content: "dateutil_rrule_date(\00000A    \000027FREQ=DAILY;COUNT=3\000027,\00000A    \0000271st jan 2020\000027\00000A  )"; }

}
</style>
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Another rendering glitch with column headers on mobile 710819020  
700490225 https://github.com/simonw/datasette/issues/980#issuecomment-700490225 https://api.github.com/repos/simonw/datasette/issues/980 MDEyOklzc3VlQ29tbWVudDcwMDQ5MDIyNQ== simonw 9599 2020-09-29T06:53:37Z 2020-09-29T06:53:37Z OWNER

This time it's because there are newlines in the column header:

<style>
@media only screen and (max-width: 576px) {

    .rows-and-columns td:nth-of-type(1):before { content: "dateutil_rrule(\000027FREQ=HOURLY;COUNT=5\000027)"; }

    .rows-and-columns td:nth-of-type(2):before { content: "dateutil_rrule_date(
\00000A    \000027FREQ=DAILY;COUNT=3\000027,
\00000A    \0000271st jan 2020\000027
\00000A  )"; }

}
</style>

Those need to be escaped somehow.
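A minimal sketch of one way to escape them, assuming the stray line breaks in the output above are carriage returns from `\r\n` line endings in the SQL. The `_css_re` pattern here is an assumption for illustration, not necessarily the one Datasette uses:

```python
import re

# Assumed pattern: characters that should not appear raw inside a CSS string.
_css_re = re.compile(r"""['"\n\r\\]""")


def escape_css_string(s):
    # Normalize Windows line endings first so "\r\n" becomes a single escape.
    s = s.replace("\r\n", "\n")
    # Replace each matched character with a backslash plus six hex digits.
    return _css_re.sub(
        lambda m: "\\" + "{:X}".format(ord(m.group())).zfill(6), s
    )


header = "dateutil_rrule_date(\r\n    'FREQ=DAILY;COUNT=3'\r\n  )"
print(escape_css_string(header))
```

With this version the escaped header contains no raw line breaks, so the `content: "..."` string stays on one line.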

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Another rendering glitch with column headers on mobile 710819020  
700343373 https://github.com/simonw/datasette/issues/979#issuecomment-700343373 https://api.github.com/repos/simonw/datasette/issues/979 MDEyOklzc3VlQ29tbWVudDcwMDM0MzM3Mw== simonw 9599 2020-09-28T23:56:27Z 2020-09-28T23:56:27Z OWNER

This would benefit https://github.com/simonw/datasette-import-table - which currently ignores the CREATE TABLE and derives the schema by inserting rows.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default table view JSON should include CREATE TABLE 710650633  
700343229 https://github.com/simonw/datasette/issues/979#issuecomment-700343229 https://api.github.com/repos/simonw/datasette/issues/979 MDEyOklzc3VlQ29tbWVudDcwMDM0MzIyOQ== simonw 9599 2020-09-28T23:55:55Z 2020-09-28T23:55:55Z OWNER

Here's the code that adds it to the HTML context: https://github.com/simonw/datasette/blob/c11383e6284e000b2641569457efa16ac9e0d6ae/datasette/views/table.py#L835-L837

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default table view JSON should include CREATE TABLE 710650633  
700320480 https://github.com/simonw/datasette/issues/978#issuecomment-700320480 https://api.github.com/repos/simonw/datasette/issues/978 MDEyOklzc3VlQ29tbWVudDcwMDMyMDQ4MA== simonw 9599 2020-09-28T22:39:18Z 2020-09-28T22:39:18Z OWNER
def escape_css_string(s):
    return _css_re.sub(lambda m: "\\" + ("{:X}".format(ord(m.group())).zfill(6)), s)

That fixes it:
https://user-images.githubusercontent.com/9599/94493173-c23b6680-01a0-11eb-9468-e972c51b015c.png

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rendering glitch with column headings on mobile 710506708  
700319656 https://github.com/simonw/datasette/issues/978#issuecomment-700319656 https://api.github.com/repos/simonw/datasette/issues/978 MDEyOklzc3VlQ29tbWVudDcwMDMxOTY1Ng== simonw 9599 2020-09-28T22:36:44Z 2020-09-28T22:36:44Z OWNER

Weirdly even those leading 0s don't fix it:

https://user-images.githubusercontent.com/9599/94492937-44775b00-01a0-11eb-9c5f-e991af620404.png

But... padding to six characters does! See https://www.w3.org/International/questions/qa-escapes

https://user-images.githubusercontent.com/9599/94492988-61139300-01a0-11eb-8304-bffe448c7d2b.png

In [32]: print('\\' + "{:X}".format(ord('"')).zfill(6))
\000022
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rendering glitch with column headings on mobile 710506708  
700317760 https://github.com/simonw/datasette/issues/978#issuecomment-700317760 https://api.github.com/repos/simonw/datasette/issues/978 MDEyOklzc3VlQ29tbWVudDcwMDMxNzc2MA== simonw 9599 2020-09-28T22:30:25Z 2020-09-28T22:30:25Z OWNER
print('\\' + "{:X}".format(ord('"')).zfill(4))
\0022
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rendering glitch with column headings on mobile 710506708  
700316511 https://github.com/simonw/datasette/issues/978#issuecomment-700316511 https://api.github.com/repos/simonw/datasette/issues/978 MDEyOklzc3VlQ29tbWVudDcwMDMxNjUxMQ== simonw 9599 2020-09-28T22:26:38Z 2020-09-28T22:26:38Z OWNER

The fix may be to use \0022 instead of \22.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rendering glitch with column headings on mobile 710506708  
700314509 https://github.com/simonw/datasette/issues/978#issuecomment-700314509 https://api.github.com/repos/simonw/datasette/issues/978 MDEyOklzc3VlQ29tbWVudDcwMDMxNDUwOQ== simonw 9599 2020-09-28T22:20:51Z 2020-09-28T22:20:51Z OWNER

Here's the HTML for the broken example above:

<style>
@media only screen and (max-width: 576px) {

    .rows-and-columns td:nth-of-type(1):before { content: "dateutil_parse(\2210 october 2020 3pm\22)"; }

    .rows-and-columns td:nth-of-type(2):before { content: "dateutil_easter(\222020\22)"; }

    .rows-and-columns td:nth-of-type(3):before { content: "dateutil_parse_fuzzy(\22This is due 10 september\22)"; }

    .rows-and-columns td:nth-of-type(4):before { content: "dateutil_parse(\221/2/2020\22)"; }

    .rows-and-columns td:nth-of-type(5):before { content: "dateutil_parse(\222020-03-04\22)"; }

    .rows-and-columns td:nth-of-type(6):before { content: "dateutil_parse_dayfirst(\222020-03-04\22)"; }

    .rows-and-columns td:nth-of-type(7):before { content: "dateutil_easter(2020)"; }

}
</style>

The glitch affects the ones where the quote is followed by digits.
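A rough way to see why digits matter: a CSS hex escape can run for up to six hex digits, so digits that follow a short escape get absorbed into it. The sketch below uses a simplified model of that parsing rule (`css_unescape` and `escape_quotes` are hypothetical helpers, not Datasette functions) and compares padding widths of 2, 4 and 6:

```python
import re

# Simplified model of CSS escape parsing: a backslash, one to six hex digits,
# then at most one whitespace character that is swallowed as a terminator.
_css_escape = re.compile(r"\\([0-9A-Fa-f]{1,6})[ \t\n]?")


def css_unescape(s):
    # Decode each hex escape back to the character it stands for.
    return _css_escape.sub(lambda m: chr(int(m.group(1), 16)), s)


def escape_quotes(s, width):
    # Replace each double quote with a hex escape zero-padded to `width` digits.
    return s.replace('"', "\\" + "{:X}".format(ord('"')).zfill(width))


header = 'dateutil_parse("10 october 2020 3pm")'
for width in (2, 4, 6):
    escaped = escape_quotes(header, width)
    print(width, escaped, "->", css_unescape(escaped))
```

Under this model the widths 2 and 4 both decode `\22` plus the following `10` as U+2210, while six digits leave no room for the next characters to be absorbed.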

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rendering glitch with column headings on mobile 710506708  
700313836 https://github.com/simonw/datasette/issues/978#issuecomment-700313836 https://api.github.com/repos/simonw/datasette/issues/978 MDEyOklzc3VlQ29tbWVudDcwMDMxMzgzNg== simonw 9599 2020-09-28T22:19:05Z 2020-09-28T22:19:05Z OWNER

Looks like a bug in this function: https://github.com/simonw/datasette/blob/1f021c37110fc9019b0ef70062c28c335e568ae2/datasette/utils/__init__.py#L269-L274

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rendering glitch with column headings on mobile 710506708  
700012161 https://github.com/simonw/datasette/pull/977#issuecomment-700012161 https://api.github.com/repos/simonw/datasette/issues/977 MDEyOklzc3VlQ29tbWVudDcwMDAxMjE2MQ== codecov[bot] 22429695 2020-09-28T13:37:44Z 2020-09-28T13:37:44Z NONE

Codecov Report

Merging #977 into main will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #977   +/-   ##
=======================================
  Coverage   84.27%   84.27%           
=======================================
  Files          28       28           
  Lines        3847     3847           
=======================================
  Hits         3242     3242           
  Misses        605      605           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9a6d0dc...5c01344.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Update pytest requirement from <6.1.0,>=5.2.2 to >=5.2.2,<6.2.0 710269200  
699762881 https://github.com/simonw/sqlite-utils/issues/181#issuecomment-699762881 https://api.github.com/repos/simonw/sqlite-utils/issues/181 MDEyOklzc3VlQ29tbWVudDY5OTc2Mjg4MQ== simonw 9599 2020-09-28T04:29:23Z 2020-09-28T04:29:23Z OWNER

Relevant code: https://github.com/simonw/sqlite-utils/blob/94fc62857ee2655a21d85f6dae84b67bbfa5956d/sqlite_utils/db.py#L331-L367

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
pk=["id"] should have same effect as pk="id" 709920027  
699718788 https://github.com/simonw/sqlite-utils/issues/180#issuecomment-699718788 https://api.github.com/repos/simonw/sqlite-utils/issues/180 MDEyOklzc3VlQ29tbWVudDY5OTcxODc4OA== simonw 9599 2020-09-28T01:11:45Z 2020-09-28T01:11:45Z OWNER

https://hypothesis.readthedocs.io/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Try running some tests using Hypothesis 709861194  
699690034 https://github.com/simonw/datasette/issues/858#issuecomment-699690034 https://api.github.com/repos/simonw/datasette/issues/858 MDEyOklzc3VlQ29tbWVudDY5OTY5MDAzNA== smithdc1 39445562 2020-09-27T21:23:04Z 2020-09-27T21:23:04Z NONE

Hi Simon,

Thanks so much for all your work on datasette, it's an excellent project and I wish you all the best with it. I particularly enjoyed your talk at the Django London Meetup a short while back.

I've been trying to publish to Heroku from Windows 10 and I was running into this error. I'm not sure why it can't be run without shell=True on Windows but this seems to help. With this change, I am able to publish if I pass in a name to the publish command. When a name is not passed the default of datasette is used and therefore this line here fails (as datasette at heroku already exists) and causes the recession error mentioned above.

https://github.com/simonw/datasette/blob/9a6d0dce282e7fb58c5610e24c74098c923abfdc/datasette/publish/heroku.py#L126

I tried to write a patch for this but I am really struggling with being on Windows (many of the tests seem to fail anyway?), and my lack of knowledge of Mock, so sorry for this. Hope this is of some help.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
publish heroku does not work on Windows 10 642388564  
699524671 https://github.com/simonw/sqlite-utils/issues/179#issuecomment-699524671 https://api.github.com/repos/simonw/sqlite-utils/issues/179 MDEyOklzc3VlQ29tbWVudDY5OTUyNDY3MQ== simonw 9599 2020-09-26T17:31:23Z 2020-09-27T20:31:50Z OWNER

SQL query for detecting integers:

select
  'contains_non_integer' as result
from
  mytable
where
  cast(cast(mycolumn AS INTEGER) AS TEXT) != mycolumn
limit
  1

This will return a single row (containing 'contains_non_integer') as soon as it comes across a value in the column that is not an integer - so it short circuits quickly on TEXT columns with non-integers in them.

If everything in the column is an integer it will scan the whole thing before returning no rows.
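A small sketch of running that check from Python with the standard `sqlite3` module. The helper name `column_has_non_integer` and the sample table are made up for illustration:

```python
import sqlite3


def column_has_non_integer(conn, table, column):
    # Same round-trip cast trick as the query above; "limit 1" lets SQLite
    # stop at the first value that does not survive the cast unchanged.
    sql = (
        'select 1 from "{t}" '
        'where cast(cast("{c}" as integer) as text) != "{c}" '
        "limit 1"
    ).format(t=table, c=column)
    return conn.execute(sql).fetchone() is not None


conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (all_ints text, mixed text)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [("1", "1"), ("2", "1.1"), ("3", "dog"), (None, None)],
)
print(column_has_non_integer(conn, "mytable", "all_ints"))
print(column_has_non_integer(conn, "mytable", "mixed"))
```

NULL values are skipped by the comparison, so they do not count as non-integers here.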

More extensive demo:

select
  value,
  cast(cast(value AS INTEGER) AS TEXT) = value as is_valid_int
from
  (
    select
      '1' as value
    union
    select
      '1.1' as value
    union
    select
      'dog' as value
    union
    select
      null as value
  )

https://latest.datasette.io/fixtures?sql=select%0D%0A++value%2C%0D%0A++cast%28cast%28value+AS+INTEGER%29+AS+TEXT%29+%3D+value+as+is_valid_int%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++%271%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%271.1%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%27dog%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++null+as+value%0D%0A++%29

<table>
<thead> <tr> <th>value</th> <th>is_valid_int</th> </tr> </thead>
<tbody>
<tr> <td> </td> <td> </td> </tr>
<tr> <td>1</td> <td>1</td> </tr>
<tr> <td>1.1</td> <td>0</td> </tr>
<tr> <td>dog</td> <td>0</td> </tr>
</tbody>
</table>
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform/insert --detect-types 709577625  
699684535 https://github.com/simonw/sqlite-utils/issues/179#issuecomment-699684535 https://api.github.com/repos/simonw/sqlite-utils/issues/179 MDEyOklzc3VlQ29tbWVudDY5OTY4NDUzNQ== simonw 9599 2020-09-27T20:30:31Z 2020-09-27T20:30:31Z OWNER

This recipe looks like it might be the way to detect floats:

select
  value,
  cast(cast(value AS REAL) AS TEXT) in (value, value || '.0') as is_valid_float
from
  (
    select
      '1' as value
    union
    select
      '1.1' as value
    union
    select
      'dog' as value
    union
    select
      null as value
  )

Demo: https://latest.datasette.io/fixtures?sql=select%0D%0A++value%2C%0D%0A++cast%28cast%28value+AS+REAL%29+AS+TEXT%29+in+%28value%2C+value+%7C%7C+%27.0%27%29+as+is_valid_float%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++%271%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%271.1%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++%27dog%27+as+value%0D%0A++++union%0D%0A++++select%0D%0A++++++null+as+value%0D%0A++%29

<table>
<thead> <tr> <th>value</th> <th>is_valid_float</th> </tr> </thead>
<tbody>
<tr> <td> </td> <td> </td> </tr>
<tr> <td>1</td> <td>1</td> </tr>
<tr> <td>1.1</td> <td>1</td> </tr>
<tr> <td>dog</td> <td>0</td> </tr>
</tbody>
</table>
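The same expression can be tried against individual values from Python. This is only a sketch using the standard `sqlite3` module; `is_valid_float` is a hypothetical helper name:

```python
import sqlite3


def is_valid_float(conn, value):
    # Cast to REAL and back to TEXT, then accept either an exact match or
    # the original text with ".0" appended (so '1' matches '1.0').
    row = conn.execute(
        "select cast(cast(:v as real) as text) in (:v, :v || '.0')",
        {"v": value},
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
for value in ("1", "1.1", "dog", None):
    print(repr(value), is_valid_float(conn, value))
```

The NULL input comes back as `None` rather than 0 or 1, matching the empty first row in the table above.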
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform/insert --detect-types 709577625  
699526149 https://github.com/simonw/sqlite-utils/issues/179#issuecomment-699526149 https://api.github.com/repos/simonw/sqlite-utils/issues/179 MDEyOklzc3VlQ29tbWVudDY5OTUyNjE0OQ== simonw 9599 2020-09-26T17:43:28Z 2020-09-26T17:43:28Z OWNER

Posed a question about this on the SQLite forum here: https://sqlite.org/forum/forumpost/ab0dcd66ef

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform/insert --detect-types 709577625  
698626768 https://github.com/simonw/sqlite-utils/issues/138#issuecomment-698626768 https://api.github.com/repos/simonw/sqlite-utils/issues/138 MDEyOklzc3VlQ29tbWVudDY5ODYyNjc2OA== simonw 9599 2020-09-24T22:46:56Z 2020-09-24T22:46:56Z OWNER

Yeah this works fine, added a new confirmatory test.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
extracts= doesn't configure foreign keys 684118950  
698578959 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-698578959 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5ODU3ODk1OQ== simonw 9599 2020-09-24T20:44:35Z 2020-09-24T20:50:19Z OWNER

I'm using a click.File() at the moment: https://github.com/simonw/sqlite-utils/blob/5a63b9e88c5887432eb1d7df39f304ea55038437/sqlite_utils/cli.py#L496

I'll need to change that to be something that I can easily measure progress through. Also I should change its name - json_file is a bad name when it sometimes handles csv or tsv instead.

It looks like the argument provided by click.File doesn't provide a way to read the size of the file, so I need to switch that out for a file path instead. https://click.palletsprojects.com/en/7.x/api/#click.Path

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
698579389 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-698579389 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5ODU3OTM4OQ== simonw 9599 2020-09-24T20:45:29Z 2020-09-24T20:45:29Z OWNER

Relevant code: https://github.com/simonw/sqlite-utils/blob/5a63b9e88c5887432eb1d7df39f304ea55038437/sqlite_utils/cli.py#L550-L560

Changing that to track progress through NL-JSON, CSV and TSV shouldn't be too hard.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
698577508 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-698577508 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5ODU3NzUwOA== simonw 9599 2020-09-24T20:41:18Z 2020-09-24T20:41:18Z OWNER

I know how to build this for CSV and TSV - I can read them via a file wrapper that counts how many bytes it has seen.

Not sure how to do it for JSON though. Maybe I could provide it just for newline-delimited JSON? Again I can measure progress based on how many bytes have been read.
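A rough sketch of the byte-counting wrapper idea for CSV, using only the standard library. `ByteCountingFile` and `rows_with_progress` are made-up names, and in practice the total size would come from the file on disk rather than an in-memory buffer:

```python
import csv
import io


class ByteCountingFile:
    """Wraps a binary file and counts the bytes handed out line by line."""

    def __init__(self, fp):
        self.fp = fp
        self.bytes_read = 0

    def __iter__(self):
        for line in self.fp:
            self.bytes_read += len(line)
            yield line


def rows_with_progress(fp, total_bytes):
    counter = ByteCountingFile(fp)
    # Decode lazily so the counter only advances as the CSV reader consumes lines.
    decoded = (line.decode("utf-8") for line in counter)
    for row in csv.DictReader(decoded):
        yield row, counter.bytes_read / total_bytes


data = b"id,name\n1,Cleo\n2,Pancakes\n"
for row, fraction in rows_with_progress(io.BytesIO(data), len(data)):
    print(row, "{:.0%}".format(fraction))
```

The same wrapper should work for TSV (pass `delimiter="\t"` to the reader) and for newline-delimited JSON, since both are consumed one line at a time.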

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
698575545 https://github.com/simonw/sqlite-utils/issues/119#issuecomment-698575545 https://api.github.com/repos/simonw/sqlite-utils/issues/119 MDEyOklzc3VlQ29tbWVudDY5ODU3NTU0NQ== simonw 9599 2020-09-24T20:36:59Z 2020-09-24T20:36:59Z OWNER

This was implemented in #161.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Ability to remove a foreign key 652700770  
698572493 https://github.com/simonw/sqlite-utils/issues/176#issuecomment-698572493 https://api.github.com/repos/simonw/sqlite-utils/issues/176 MDEyOklzc3VlQ29tbWVudDY5ODU3MjQ5Mw== simonw 9599 2020-09-24T20:30:18Z 2020-09-24T20:30:18Z OWNER

Documentation: https://sqlite-utils.readthedocs.io/en/stable/cli.html#transforming-tables

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform column order option 708293114  
698572264 https://github.com/simonw/sqlite-utils/issues/175#issuecomment-698572264 https://api.github.com/repos/simonw/sqlite-utils/issues/175 MDEyOklzc3VlQ29tbWVudDY5ODU3MjI2NA== simonw 9599 2020-09-24T20:29:48Z 2020-09-24T20:29:48Z OWNER

Documentation: https://sqlite-utils.readthedocs.io/en/stable/python-api.html#transforming-a-table

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add docs for .transform(column_order=) 708261775  
698488971 https://github.com/simonw/datasette/issues/976#issuecomment-698488971 https://api.github.com/repos/simonw/datasette/issues/976 MDEyOklzc3VlQ29tbWVudDY5ODQ4ODk3MQ== simonw 9599 2020-09-24T17:42:09Z 2020-09-24T17:42:35Z OWNER

This is complex enough new logic that it will need test coverage - specifically covering tables or databases with strange names.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: -o could open to a more convenient location 708289783  
698444567 https://github.com/simonw/sqlite-utils/issues/177#issuecomment-698444567 https://api.github.com/repos/simonw/sqlite-utils/issues/177 MDEyOklzc3VlQ29tbWVudDY5ODQ0NDU2Nw== simonw 9599 2020-09-24T16:14:47Z 2020-09-24T16:14:47Z OWNER

This is a backwards incompatible change, so technically I should bump the major version to 3. I'm not going to do that, because the feature is brand new and the chance that anyone has written code or shell scripts that use it is vanishingly small.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Simplify .transform(drop_foreign_keys=) and sqlite-transform --drop-foreign-key 708301810  
698438043 https://github.com/simonw/sqlite-utils/issues/176#issuecomment-698438043 https://api.github.com/repos/simonw/sqlite-utils/issues/176 MDEyOklzc3VlQ29tbWVudDY5ODQzODA0Mw== simonw 9599 2020-09-24T16:02:55Z 2020-09-24T16:02:55Z OWNER

I think I'll call this option --column-order with a shortcut of -o.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform column order option 708293114  
698434811 https://github.com/simonw/sqlite-utils/issues/175#issuecomment-698434811 https://api.github.com/repos/simonw/sqlite-utils/issues/175 MDEyOklzc3VlQ29tbWVudDY5ODQzNDgxMQ== simonw 9599 2020-09-24T15:57:17Z 2020-09-24T15:57:17Z OWNER

Landed that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add docs for .transform(column_order=) 708261775  
698434236 https://github.com/simonw/datasette/issues/970#issuecomment-698434236 https://api.github.com/repos/simonw/datasette/issues/970 MDEyOklzc3VlQ29tbWVudDY5ODQzNDIzNg== simonw 9599 2020-09-24T15:56:18Z 2020-09-24T15:56:50Z OWNER

Idea: if a database only has a single table, this could open straight to /db/table. If it has multiple tables but a single database it could open straight to /db.
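Sketched as a function, assuming the input is a plain mapping of database name to a list of table names (a hypothetical shape, not Datasette's internal one) and that percent-encoding is good enough for unusual names:

```python
from urllib.parse import quote


def initial_path(databases):
    """Pick the path to open, given {database_name: [table_name, ...]}."""
    if len(databases) != 1:
        # Several databases (or none): fall back to the index page.
        return "/"
    ((db_name, tables),) = databases.items()
    db_path = "/" + quote(db_name, safe="")
    if len(tables) == 1:
        # A single table in a single database: go straight to it.
        return db_path + "/" + quote(tables[0], safe="")
    return db_path


print(initial_path({"fixtures": ["facetable"]}))
print(initial_path({"fixtures": ["facetable", "searchable"]}))
print(initial_path({"one": ["a"], "two": ["b"]}))
```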

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
request an "-o" option on "datasette server" to open the default browser at the running url 705108492  
698412692 https://github.com/simonw/sqlite-utils/issues/175#issuecomment-698412692 https://api.github.com/repos/simonw/sqlite-utils/issues/175 MDEyOklzc3VlQ29tbWVudDY5ODQxMjY5Mg== simonw 9599 2020-09-24T15:19:28Z 2020-09-24T15:19:28Z OWNER

Need to land #174 first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add docs for .transform(column_order=) 708261775  
698400790 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698400790 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODQwMDc5MA== simonw 9599 2020-09-24T14:59:50Z 2020-09-24T14:59:50Z OWNER

For reusing the lookup table: I'm going to raise an error if a lookup table exists but without the correct columns. The caller can then add those columns and try again.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698184166 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698184166 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODE4NDE2Ng== simonw 9599 2020-09-24T08:01:07Z 2020-09-24T08:01:07Z OWNER

I may revert the now unnecessary undocumented tweaks to the .update() method made in 66d506587eba9f0715267d6560b97c1fa44cc781 as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698182656 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698182656 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODE4MjY1Ng== simonw 9599 2020-09-24T07:58:08Z 2020-09-24T07:58:08Z OWNER

The way the lookup table works here differs from the previous implementation. In the previous implementation the usage of .lookup() meant that an existing table would be modified to fit the new purpose. That no longer happens in this version. Need to make a design decision about how this should work.

It should definitely be possible to use an existing lookup table - imagine a database where several tables have a "Departments" column and we want to extract all of those values out to a single shared "Departments" table.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698182037 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698182037 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODE4MjAzNw== simonw 9599 2020-09-24T07:56:50Z 2020-09-24T07:56:50Z OWNER

I could also be a bit smarter about transaction handling. I think it may be possible to run this entire operation in a single transaction now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698181478 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698181478 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODE4MTQ3OA== simonw 9599 2020-09-24T07:55:45Z 2020-09-24T07:55:45Z OWNER

import functools is no longer needed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698180705 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698180705 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODE4MDcwNQ== simonw 9599 2020-09-24T07:54:10Z 2020-09-24T07:54:10Z OWNER

After running through the steps in https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/ I get a table that looks like this:

https://user-images.githubusercontent.com/9599/94116875-666b8900-fe00-11ea-9e97-2b9ccbfeae29.png

The foreign key columns are all at the end of the table. It would be nicer if they were arranged in the same order as the columns they replaced.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698180113 https://github.com/simonw/sqlite-utils/pull/174#issuecomment-698180113 https://api.github.com/repos/simonw/sqlite-utils/issues/174 MDEyOklzc3VlQ29tbWVudDY5ODE4MDExMw== simonw 9599 2020-09-24T07:53:03Z 2020-09-24T07:53:03Z OWNER

This could do with a little bit more testing - I'm worried there may be column or table name edge cases that are not covered yet. I also need to remove the progress bar code since that no longer makes sense for this implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Much, much faster extract() implementation 707944044  
698178101 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-698178101 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5ODE3ODEwMQ== simonw 9599 2020-09-24T07:48:57Z 2020-09-24T07:49:20Z OWNER

I wonder if I could make this faster by separating it out into a few steps:

* Create the new lookup table with all of the distinct rows

* Add the blank foreign key column

* Run an `UPDATE table SET blah_id = (select id from lookup where thang = table.thang)`

* Drop the value columns
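Those four steps can be sketched against a toy table with the standard `sqlite3` module. This is an illustration only, not the sqlite-utils implementation: the table and column names are invented, and the last step rebuilds the table rather than relying on `ALTER TABLE ... DROP COLUMN`, which older SQLite versions lack:

```python
import sqlite3


def extract_department(conn):
    # 1. Create the lookup table with all of the distinct values.
    conn.execute(
        "create table departments (id integer primary key, name text unique)"
    )
    conn.execute(
        "insert into departments (name) select distinct department from salaries"
    )
    # 2. Add the blank foreign key column.
    conn.execute(
        "alter table salaries add column department_id integer "
        "references departments(id)"
    )
    # 3. Populate it with a single UPDATE using a correlated subquery.
    conn.execute(
        "update salaries set department_id = ("
        "select id from departments "
        "where departments.name = salaries.department)"
    )
    # 4. Drop the value column by copying into a new table.
    conn.executescript(
        """
        create table salaries_new (
            id integer primary key,
            name text,
            department_id integer references departments(id)
        );
        insert into salaries_new select id, name, department_id from salaries;
        drop table salaries;
        alter table salaries_new rename to salaries;
        """
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table salaries (id integer primary key, name text, department text);
    insert into salaries (name, department) values
        ('Alice', 'Parks'), ('Bob', 'Police'), ('Carol', 'Parks');
    """
)
extract_department(conn)
print(conn.execute("select * from salaries").fetchall())
print(conn.execute("select * from departments").fetchall())
```

The work happens in a handful of set-based statements instead of one lookup per row, which is presumably where the speedup comes from.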

My prototype of this knocked the time down from 10 minutes to 4 seconds, so I think the change is worth it!

% date
sqlite-utils extract salaries.db salaries \
   'Department Code' 'Department' \
  --table 'departments' \
  --fk-column 'department_id' \
  --rename 'Department Code' code \
  --rename 'Department' name
date
sqlite-utils extract salaries.db salaries \
   'Union Code' 'Union' \
  --table 'unions' \
  --fk-column 'union_id' \
  --rename 'Union Code' code \
  --rename 'Union' name
date
sqlite-utils extract salaries.db salaries \
   'Job Family Code' 'Job Family' \
  --table 'job_families' \
  --fk-column 'job_family_id' \
  --rename 'Job Family Code' code \
  --rename 'Job Family' name
date
sqlite-utils extract salaries.db salaries \
   'Job Code' 'Job' \
  --table 'jobs' \
  --fk-column 'job_id' \
  --rename 'Job Code' code \
  --rename 'Job' name
date
Thu Sep 24 00:48:16 PDT 2020

Thu Sep 24 00:48:20 PDT 2020

Thu Sep 24 00:48:24 PDT 2020

Thu Sep 24 00:48:28 PDT 2020

Thu Sep 24 00:48:32 PDT 2020
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
698174957 https://github.com/simonw/datasette/issues/123#issuecomment-698174957 https://api.github.com/repos/simonw/datasette/issues/123 MDEyOklzc3VlQ29tbWVudDY5ODE3NDk1Nw== obra 45416 2020-09-24T07:42:05Z 2020-09-24T07:42:05Z NONE

Oh. Awesome.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette serve should accept paths/URLs to CSVs and other file formats 275125561  
698168648 https://github.com/simonw/datasette/issues/123#issuecomment-698168648 https://api.github.com/repos/simonw/datasette/issues/123 MDEyOklzc3VlQ29tbWVudDY5ODE2ODY0OA== simonw 9599 2020-09-24T07:28:38Z 2020-09-24T07:28:38Z OWNER

@obra there's a plugin for that! https://github.com/simonw/datasette-upload-csvs

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette serve should accept paths/URLs to CSVs and other file formats 275125561  
698110492 https://github.com/simonw/datasette/issues/974#issuecomment-698110492 https://api.github.com/repos/simonw/datasette/issues/974 MDEyOklzc3VlQ29tbWVudDY5ODExMDQ5Mg== simonw 9599 2020-09-24T04:50:56Z 2020-09-24T04:51:05Z OWNER

Come to think of it, I've noticed that in the logs when it's running on my laptop. Definitely worth fixing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
static assets and favicon aren't cached by the browser 707849175  
698110186 https://github.com/simonw/datasette/issues/123#issuecomment-698110186 https://api.github.com/repos/simonw/datasette/issues/123 MDEyOklzc3VlQ29tbWVudDY5ODExMDE4Ng== obra 45416 2020-09-24T04:49:51Z 2020-09-24T04:49:51Z NONE

As a half-measure, I'd get value out of being able to upload a CSV and have datasette run csv-to-sqlite on it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette serve should accept paths/URLs to CSVs and other file formats 275125561  
698024773 https://github.com/simonw/datasette/issues/619#issuecomment-698024773 https://api.github.com/repos/simonw/datasette/issues/619 MDEyOklzc3VlQ29tbWVudDY5ODAyNDc3Mw== simonw 9599 2020-09-23T23:31:46Z 2020-09-23T23:31:46Z OWNER

I'm going to have to untangle Datasette's error handling a bit for this - currently the expectation is that exceptions will be handled at a higher level, but I need to rethink that to make it cleaner for views like the "execute custom SQL" view to add their own error handling (and still be able to return the correct HTTP status codes, even with custom pages).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Invalid SQL" page should let you edit the SQL 520655983  
697998045 https://github.com/simonw/datasette/issues/619#issuecomment-697998045 https://api.github.com/repos/simonw/datasette/issues/619 MDEyOklzc3VlQ29tbWVudDY5Nzk5ODA0NQ== simonw 9599 2020-09-23T22:09:06Z 2020-09-23T22:09:06Z OWNER

I'll add this to the successful JSON format:

{
  "ok": true,
  "error": null
}
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Invalid SQL" page should let you edit the SQL 520655983  
697995885 https://github.com/simonw/datasette/issues/619#issuecomment-697995885 https://api.github.com/repos/simonw/datasette/issues/619 MDEyOklzc3VlQ29tbWVudDY5Nzk5NTg4NQ== simonw 9599 2020-09-23T22:02:44Z 2020-09-23T22:08:28Z OWNER

So the JSON (still served with a 500 code) will look something like this:

{
  "ok": false,
  "status": 500,
  "database": "fixtures",
  "query_name": null,
  "rows": [],
  "truncated": false,
  "error": "Error message goes here",
  "columns": [],
  "query": {
    "sql": "the query that broke goes here",
    "params": {}
  },
  "private": false,
  "allow_execute_sql": true,
  "query_ms": 0.8716583251953125,
  "source": "tests/fixtures.py",
  "source_url": "https://github.com/simonw/datasette/blob/master/tests/fixtures.py",
  "license": "Apache License 2.0",
  "license_url": "https://github.com/simonw/datasette/blob/master/LICENSE"
}
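
A minimal sketch of how an error body in that shape could be assembled. The `error_payload` helper and its arguments are hypothetical names for illustration, not part of Datasette, and only a subset of the keys above is included:

```python
import json


def error_payload(database, sql, message, status=500, params=None):
    """Build a JSON-serializable error body (hypothetical helper, not Datasette's API)."""
    return {
        "ok": False,
        "status": status,
        "database": database,
        "rows": [],  # an error returns no rows
        "columns": [],
        "truncated": False,
        "error": message,
        "query": {"sql": sql, "params": params or {}},
    }


payload = error_payload("fixtures", "select * from nope", "no such table: nope")
body = json.dumps(payload, indent=2)
```

Keeping `rows` and `columns` present but empty means a client written for the success shape can still read the response and then check `ok` and `error`.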
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Invalid SQL" page should let you edit the SQL 520655983  
697995303 https://github.com/simonw/datasette/issues/619#issuecomment-697995303 https://api.github.com/repos/simonw/datasette/issues/619 MDEyOklzc3VlQ29tbWVudDY5Nzk5NTMwMw== simonw 9599 2020-09-23T22:01:08Z 2020-09-23T22:01:08Z OWNER

This is a little tricky to solve, because of the location of the form and the need to return JSON as well as HTML. It would be weird if a JSON request came in and got back the standard output from https://latest.datasette.io/fixtures.json when they were expecting to get back JSON in the shape of https://latest.datasette.io/fixtures.json?sql=select%20*%20from%20sqlite_master

I'm going to return the HTML view that you would get for 0 results for a query - https://latest.datasette.io/fixtures?sql=select%201%20limit%200 - but with an error message.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Invalid SQL" page should let you edit the SQL 520655983  
697980061 https://github.com/simonw/datasette/issues/619#issuecomment-697980061 https://api.github.com/repos/simonw/datasette/issues/619 MDEyOklzc3VlQ29tbWVudDY5Nzk4MDA2MQ== simonw 9599 2020-09-23T21:22:42Z 2020-09-23T21:22:42Z OWNER

Yeah that sucks. Bumping this up the priority list.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Invalid SQL" page should let you edit the SQL 520655983  
697973420 https://github.com/simonw/datasette/issues/619#issuecomment-697973420 https://api.github.com/repos/simonw/datasette/issues/619 MDEyOklzc3VlQ29tbWVudDY5Nzk3MzQyMA== obra 45416 2020-09-23T21:07:58Z 2020-09-23T21:07:58Z NONE

I've just run into this after crafting a complex query and discovered that hitting back loses my query.

Even showing me the whole bad query would be a huge improvement over the status quo.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Invalid SQL" page should let you edit the SQL 520655983  
697869886 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697869886 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5Nzg2OTg4Ng== simonw 9599 2020-09-23T18:45:30Z 2020-09-23T18:45:30Z OWNER

There's something to be said for making this operation pausable and resumable, especially if I'm going to make it available in a Datasette plugin at some point.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697866885 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697866885 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5Nzg2Njg4NQ== simonw 9599 2020-09-23T18:43:37Z 2020-09-23T18:43:37Z OWNER

Also what would happen if the table had new rows added to it while that command was running?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697863116 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697863116 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5Nzg2MzExNg== simonw 9599 2020-09-23T18:41:06Z 2020-09-23T18:41:06Z OWNER

Problem with this approach is it's not compatible with progress bars - but if it's several times faster it's worth it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697859772 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697859772 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5Nzg1OTc3Mg== simonw 9599 2020-09-23T18:38:43Z 2020-09-23T18:38:52Z OWNER

I wonder if I could make this faster by separating it out into a few steps:
- Create the new lookup table with all of the distinct rows
- Add the blank foreign key column
- Run an UPDATE table SET blah_id = (select id from lookup where thang = table.thang)
- Drop the value columns
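
The steps above can be sketched with the standard library `sqlite3` module. This is an illustration under assumptions, not the sqlite-utils implementation: the `trees` and `species` tables and their columns are made up, and step 4 rebuilds the table because older SQLite versions have no `ALTER TABLE ... DROP COLUMN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trees (id INTEGER PRIMARY KEY, address TEXT, common_name TEXT);
    INSERT INTO trees (address, common_name) VALUES
        ('52 Vine St', 'Palm'),
        ('12 Oak Ave', 'Oak'),
        ('9 Elm Rd', 'Palm');

    -- 1. Create the lookup table with all of the distinct values
    CREATE TABLE species (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    INSERT INTO species (value) SELECT DISTINCT common_name FROM trees;

    -- 2. Add the blank foreign key column
    ALTER TABLE trees ADD COLUMN species_id INTEGER REFERENCES species(id);

    -- 3. Populate it with a single UPDATE instead of one UPDATE per row
    UPDATE trees SET species_id = (
        SELECT species.id FROM species WHERE species.value = trees.common_name
    );

    -- 4. Drop the value column by rebuilding the table
    CREATE TABLE trees_new (
        id INTEGER PRIMARY KEY,
        address TEXT,
        species_id INTEGER REFERENCES species(id)
    );
    INSERT INTO trees_new SELECT id, address, species_id FROM trees;
    DROP TABLE trees;
    ALTER TABLE trees_new RENAME TO trees;
    """
)

# Join back through the foreign key to confirm nothing was lost
rows = conn.execute(
    "SELECT trees.address, species.value FROM trees "
    "JOIN species ON species.id = trees.species_id ORDER BY trees.id"
).fetchall()
species_count = conn.execute("SELECT COUNT(*) FROM species").fetchone()[0]
```

All of the work happens inside SQLite in a handful of statements, which is why this shape leaves no natural place to report per-row progress.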

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697835956 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697835956 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5NzgzNTk1Ng== simonw 9599 2020-09-23T18:22:49Z 2020-09-23T18:22:49Z OWNER

I ran sudo py-spy top -p 123 against the process while it was running and most of the time is definitely spent in .update():

Total Samples 1000
GIL: 0.00%, Active: 90.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                  
 38.00%  38.00%    3.85s     3.85s   update (sqlite_utils/db.py:1283)
 27.00%  27.00%    2.12s     2.12s   execute (sqlite_utils/db.py:161)
 10.00%  10.00%   0.890s    0.890s   execute (sqlite_utils/db.py:163)
 10.00%  17.00%   0.870s     1.54s   columns (sqlite_utils/db.py:553)
  0.00%   0.00%   0.110s    0.210s   <listcomp> (sqlite_utils/db.py:554)
  0.00%   3.00%   0.100s    0.320s   table_names (sqlite_utils/db.py:191)
  0.00%   0.00%   0.100s    0.100s   __new__ (<string>:1)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697577646 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-697577646 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5NzU3NzY0Ng== simonw 9599 2020-09-23T15:48:51Z 2020-09-23T15:48:51Z OWNER

This can only work when it's reading from a file, not when it's reading from standard input.
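
A rough sketch of why a file is needed: the total size has to be known up front, and a pipe on standard input cannot provide it. The `iter_lines_with_progress` helper is hypothetical and is not how sqlite-utils does it.

```python
import os
import tempfile


def iter_lines_with_progress(path, progress):
    """Yield decoded lines from path, reporting (bytes_read, total_bytes).

    Hypothetical helper: the total is only available because path is a
    real file on disk whose size can be queried before reading starts.
    """
    total = os.path.getsize(path)
    bytes_read = 0
    progress(bytes_read, total)
    with open(path, "rb") as fp:
        for raw in fp:
            bytes_read += len(raw)
            progress(bytes_read, total)
            yield raw.decode("utf-8")


# Write a small CSV to a temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,Palm\n2,Oak\n")

reports = []
lines = list(
    iter_lines_with_progress(tmp.name, lambda done, total: reports.append((done, total)))
)
os.unlink(tmp.name)
```

Counting bytes rather than rows avoids a separate pass over the file just to count lines.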

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
697545290 https://github.com/simonw/datasette/issues/111#issuecomment-697545290 https://api.github.com/repos/simonw/datasette/issues/111 MDEyOklzc3VlQ29tbWVudDY5NzU0NTI5MA== simonw 9599 2020-09-23T15:29:11Z 2020-09-23T15:29:11Z OWNER

This is still a good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add “last_updated” to metadata 274615452  
697473247 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697473247 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5NzQ3MzI0Nw== simonw 9599 2020-09-23T14:45:13Z 2020-09-23T14:45:13Z OWNER

lookup_table.lookup(lookups) is doing a SQL lookup. This could be cached in-memory, maybe with a LRU cache, to avoid looking up the primary key for records that we have recently used.

The .update() method it is calling first does a get() and then does a SQL UPDATE ... WHERE:

https://github.com/simonw/sqlite-utils/blob/1ebffe1dbeaed7311e5b61ed988f4cd701e84808/sqlite_utils/db.py#L1244-L1264

Batching those updates may have an effect. Or finding a way to skip the .get() since we already know we have a valid record.
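
A minimal sketch of the in-memory caching idea, using `functools.lru_cache` and plain `sqlite3`. The `lookup_id` function and the `species` table are assumptions for illustration; this is not the `.lookup()` method itself.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")


@lru_cache(maxsize=1000)
def lookup_id(value):
    """Return the id for value, inserting it first if needed (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO species (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM species WHERE value = ?", (value,)).fetchone()
    return row[0]


# Repeated values are served from the cache and never reach SQLite
ids = [lookup_id(v) for v in ["Palm", "Oak", "Palm", "Palm", "Oak"]]
info = lookup_id.cache_info()
```

Here `info.misses` counts the calls that actually ran SQL, so it shows how many round trips the cache saved.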

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697467833 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697467833 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5NzQ2NzgzMw== simonw 9599 2020-09-23T14:42:03Z 2020-09-23T14:42:03Z OWNER

Here's the loop that's taking the time: https://github.com/simonw/sqlite-utils/blob/1ebffe1dbeaed7311e5b61ed988f4cd701e84808/sqlite_utils/db.py#L892-L897

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697466497 https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697466497 https://api.github.com/repos/simonw/sqlite-utils/issues/172 MDEyOklzc3VlQ29tbWVudDY5NzQ2NjQ5Nw== simonw 9599 2020-09-23T14:41:17Z 2020-09-23T14:41:17Z OWNER

Steps to produce that database:

curl -o salaries.csv 'https://data.sfgov.org/api/views/88g8-5mnd/rows.csv?accessType=DOWNLOAD'
sqlite-utils insert salaries.db salaries salaries.csv --csv
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improve performance of extract operations 707427200  
697073465 https://github.com/simonw/datasette/issues/970#issuecomment-697073465 https://api.github.com/repos/simonw/datasette/issues/970 MDEyOklzc3VlQ29tbWVudDY5NzA3MzQ2NQ== secretGeek 2861690 2020-09-23T01:49:05Z 2020-09-23T01:49:05Z NONE

Oh wow oh wow. Thanks so much Simon. In an astoundingly rough week, this is a shining jewel. 🤣

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
request an "-o" option on "datasette server" to open the default browser at the running url 705108492  
697047591 https://github.com/simonw/sqlite-utils/issues/170#issuecomment-697047591 https://api.github.com/repos/simonw/sqlite-utils/issues/170 MDEyOklzc3VlQ29tbWVudDY5NzA0NzU5MQ== simonw 9599 2020-09-23T00:14:52Z 2020-09-23T00:14:52Z OWNER

@simonw
@db.register_function decorator, closes #162
4824775
@simonw
table.transform() method - closes #114
987dd12
@simonw
Keyword only arguments for transform()
f8e10df

Also renamed columns= to types=

Closes #165

Commits on Sep 22, 2020
@simonw
Implemented sqlite-utils transform command, closes #164
752d261
@simonw
Applied Black
f29f682
@simonw
table.extract() method, refs #42
f855379
@simonw
Docstring for sqlite-utils transform
c755f28
@simonw
Added table.extract(rename=) option, refs #42
c3210f2
@simonw
Applied Black
317071a
@simonw
New .rows_where(select=) argument
7178231
@simonw
table.extract() now works with rowid tables, refs #42
2db6c5b
@simonw
sqlite-utils extract, closes #42
55cf928
@simonw
Progress bar for "sqlite-utils extract", closes #169
5c4d58d
@simonw
Fixed PRAGMA foreign_keys handling for .transform, closes #167

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Release notes for 2.20 706768798  
697037974 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697037974 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAzNzk3NA== simonw 9599 2020-09-22T23:39:31Z 2020-09-22T23:39:31Z OWNER

Documentation for sqlite-utils extract: https://sqlite-utils.readthedocs.io/en/latest/cli.html#extracting-columns-into-a-separate-table

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697031174 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697031174 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAzMTE3NA== simonw 9599 2020-09-22T23:16:00Z 2020-09-22T23:16:00Z OWNER

Trying this demo again:

wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv
sqlite-utils extract global.db power_plants country country_long --table countries --rename country_long name

It worked!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697025403 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697025403 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAyNTQwMw== simonw 9599 2020-09-22T22:57:53Z 2020-09-22T22:57:53Z OWNER

The documentation for the .extract() method is here: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#extracting-columns-into-a-separate-table

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697019944 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697019944 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAxOTk0NA== simonw 9599 2020-09-22T22:40:00Z 2020-09-22T22:40:00Z OWNER

I tried out the prototype of the CLI on the Global Power Plants data:

wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv
sqlite-utils extract global.db power_plants country country_long

This threw an error because rowid columns are not yet supported. I fixed that like so:

sqlite-utils transform global.db power_plants --rename rowid id
sqlite-utils extract global.db power_plants country country_long

That worked! But it didn't play great with Datasette, because the resulting extracted table had columns country and country_long and neither of those is called name or value or title.

Based on this I need to add rowid table support AND I need to implement the proposed rename= argument for renaming columns on their way into the new table.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697013681 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697013681 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAxMzY4MQ== simonw 9599 2020-09-22T22:22:49Z 2020-09-22T22:22:49Z OWNER

The command-line version of this needs to accept a table and one or more columns, then a --table and --fk-column option.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697012111 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697012111 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAxMjExMQ== simonw 9599 2020-09-22T22:18:13Z 2020-09-22T22:18:13Z OWNER

Here's how I'm generating the examples for the documentation:

In [2]: import sqlite_utils

In [3]: db = sqlite_utils.Database(memory=True)

In [4]: db["Trees"].insert({"id": 1, "TreeAddress": "52 Vine St", "CommonName":
   ...: "Palm", "LatinName": "foo"}, pk="id")
Out[4]: <Table Trees (id, TreeAddress, CommonName, LatinName)>

In [5]: db["Trees"].extract(["CommonName", "LatinName"], table="Species", fk_column="species_id")

In [6]: print(db["Trees"].schema)
CREATE TABLE "Trees" (
   [id] INTEGER PRIMARY KEY,
   [TreeAddress] TEXT,
   [species_id] INTEGER,
   FOREIGN KEY(species_id) REFERENCES Species(id)
)

In [7]: print(db["Species"].schema)
CREATE TABLE [Species] (
   [id] INTEGER PRIMARY KEY,
   [CommonName] TEXT,
   [LatinName] TEXT
)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696987925 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987925 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4NzkyNQ== simonw 9599 2020-09-22T21:19:04Z 2020-09-22T21:19:04Z OWNER

Need to make sure this works correctly for rowid tables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696987257 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987257 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4NzI1Nw== simonw 9599 2020-09-22T21:17:34Z 2020-09-22T21:17:34Z OWNER

What to do if the table already exists? The .lookup() function already knows how to modify an existing table to create the correct constraints etc, so I'll rely on that mechanism.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696980709 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980709 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4MDcwOQ== simonw 9599 2020-09-22T21:05:07Z 2020-09-22T21:05:07Z OWNER

So .extract() probably takes a batch_size= argument too, which defaults to maybe 1000.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696980503 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980503 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4MDUwMw== simonw 9599 2020-09-22T21:04:45Z 2020-09-22T21:04:45Z OWNER

table.extract() can take an optional progress= argument which is a callback which will be used to report progress - called after each batch with (num_done, total). It will get called with (0, total) once at the start to allow progress bars to be initialized. The command-line progress bar will use this.
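
The callback contract described here can be sketched as follows. The `process_in_batches` function is a hypothetical stand-in, not the real `table.extract()`, and the default batch size of 1000 is an assumption.

```python
def process_in_batches(items, handle_batch, batch_size=1000, progress=None):
    """Call handle_batch on slices of items, reporting (num_done, total)."""
    total = len(items)
    if progress is not None:
        progress(0, total)  # lets a progress bar initialize itself
    for start in range(0, total, batch_size):
        batch = items[start : start + batch_size]
        handle_batch(batch)
        if progress is not None:
            progress(start + len(batch), total)


calls = []
process_in_batches(
    list(range(2500)),
    handle_batch=lambda batch: None,  # placeholder for the real per-batch work
    batch_size=1000,
    progress=lambda done, total: calls.append((done, total)),
)
```

With 2,500 items and batches of 1,000 the callback fires four times: once at the start and once after each of the three batches.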

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696979626 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979626 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk3OTYyNg== simonw 9599 2020-09-22T21:03:11Z 2020-09-22T21:03:11Z OWNER

And if you want to rename some of the columns in the new table:

db["trees"].extract(["common_name", "latin_name"], table="species", rename={"common_name": "name"})
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696979168 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979168 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk3OTE2OA== simonw 9599 2020-09-22T21:02:24Z 2020-09-22T21:02:24Z OWNER

In Python it looks like this:

# Simple case - species column species_id pointing to species table
db["trees"].extract("species")

# Setting a custom table
db["trees"].extract("species", table="Species")

# Custom foreign key column on trees
db["trees"].extract("species", fk_column="species")

# Extracting multiple columns
db["trees"].extract(["common_name", "latin_name"])
# (this creates a lookup table called common_name_latin_name ref'd by common_name_latin_name_id)

# Or with explicit table (fk_column here defaults to species_id because of the table name)
db["trees"].extract(["common_name", "latin_name"], table="species")
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696976678 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696976678 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk3NjY3OA== simonw 9599 2020-09-22T20:57:57Z 2020-09-22T20:57:57Z OWNER

I think I understand the shape of this feature now. It lets you specify one or more columns on the source table which will be extracted into another table. It uses the .lookup() mechanism to populate that other table, which means each unique column value / pair / triple will be assigned an integer ID.

That integer ID gets written back into the first of the columns that are being transformed. A .transform() call then converts that column to an integer (and drops the additional columns). Finally we set up the new foreign key relationship.
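
An in-memory illustration of that shape, using plain dictionaries rather than SQLite. This is a simplified sketch: the `extract` function below is hypothetical, and it writes the ID into a new key instead of reusing the first extracted column and transforming it.

```python
def extract(rows, columns, fk_column):
    """Split columns out of rows into a lookup list keyed by integer ID."""
    ids = {}  # maps each unique tuple of values to its integer ID
    lookup = []
    new_rows = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in ids:
            ids[key] = len(ids) + 1
            lookup.append({"id": ids[key], **dict(zip(columns, key))})
        # Keep the remaining columns and point at the lookup row instead
        kept = {k: v for k, v in row.items() if k not in columns}
        kept[fk_column] = ids[key]
        new_rows.append(kept)
    return new_rows, lookup


trees = [
    {"id": 1, "address": "52 Vine St", "common_name": "Palm", "latin_name": "Arecaceae"},
    {"id": 2, "address": "12 Oak Ave", "common_name": "Oak", "latin_name": "Quercus"},
    {"id": 3, "address": "9 Elm Rd", "common_name": "Palm", "latin_name": "Arecaceae"},
]
new_trees, species = extract(trees, ["common_name", "latin_name"], "species_id")
```

Each unique (common_name, latin_name) pair ends up as one row in `species`, and every tree row carries only the integer reference.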

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696893774 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893774 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njg5Mzc3NA== simonw 9599 2020-09-22T18:15:33Z 2020-09-22T18:15:33Z OWNER

I think the new foreign key column is called company_name_id by default in this example but can be customized by passing --fk-column=xxx

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696893244 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893244 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njg5MzI0NA== simonw 9599 2020-09-22T18:14:33Z 2020-09-22T18:14:45Z OWNER

Thinking more about this one:

$ sqlite-utils extract my.db \
    dea_sales company_name company_address \
    --table companies

The goal here is to pull the company name and address pair out into a separate table.

Some questions:
- should this first verify that every company_name has just one company_address? I like the idea of a unique constraint on the created table for this.
- what should the foreign key column that gets added to the dea_sales table (pointing at companies) be called?
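
One way the first question could be checked before extracting, sketched with `sqlite3`. The sample rows are invented, and this check is an assumption about how the verification might look rather than anything sqlite-utils does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dea_sales (company_name TEXT, company_address TEXT);
    INSERT INTO dea_sales VALUES
        ('Acme', '1 Main St'),
        ('Acme', '1 Main St'),
        ('Globex', '5 High St'),
        ('Globex', '7 Low Rd');
    """
)

# Names that map to more than one distinct address would break a
# one-address-per-company assumption on the extracted table
conflicts = conn.execute(
    """
    SELECT company_name, COUNT(DISTINCT company_address)
    FROM dea_sales
    GROUP BY company_name
    HAVING COUNT(DISTINCT company_address) > 1
    """
).fetchall()
```

An empty `conflicts` list would mean every company_name has exactly one company_address.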

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
513262013 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513262013 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDUxMzI2MjAxMw== simonw 9599 2019-07-19T14:58:23Z 2020-09-22T18:12:11Z OWNER

CLI design idea:

$ sqlite-utils extract my.db \
    dea_sales company_name

Here we just specify the original table and column - the new extracted table will automatically be called "company_name" and will have "id" and "value" columns, by default.

To set a custom extract table:

$ sqlite-utils extract my.db \
    dea_sales company_name \
    --table companies

And for extracting multiple columns and renaming them on the created table, maybe something like this:

$ sqlite-utils extract my.db \
    dea_sales company_name company_address \
    --table companies \
    --column company_name name \
    --column company_address address
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696800410 https://github.com/simonw/datasette/issues/973#issuecomment-696800410 https://api.github.com/repos/simonw/datasette/issues/973 MDEyOklzc3VlQ29tbWVudDY5NjgwMDQxMA== simonw 9599 2020-09-22T15:35:28Z 2020-09-22T15:35:28Z OWNER

Confirmed in local dev:

% datasette fixtures.db --inspect-file inspect.json
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/bin/datasette", line 11, in <module>
    load_entry_point('datasette', 'console_scripts', 'datasette')()
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/simon/Dropbox/Development/datasette/datasette/cli.py", line 406, in serve
    inspect_data = json.load(open(inspect_file))
TypeError: 'bool' object is not callable
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
'bool' object is not callable error 706486323  
696798114 https://github.com/simonw/datasette/issues/973#issuecomment-696798114 https://api.github.com/repos/simonw/datasette/issues/973 MDEyOklzc3VlQ29tbWVudDY5Njc5ODExNA== simonw 9599 2020-09-22T15:31:25Z 2020-09-22T15:31:25Z OWNER

D'oh because I have a new variable called open.
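
A minimal reproduction of that class of bug. The `serve` function here is a hypothetical cut-down stand-in, not the real code in datasette/cli.py.

```python
def serve(inspect_file, open=False):
    """Hypothetical function whose `open` flag shadows the builtin open()."""
    # Inside this function `open` is the boolean argument, not the builtin,
    # so calling it raises TypeError.
    return open(inspect_file)


try:
    serve("inspect.json")
except TypeError as ex:
    message = str(ex)
```

Renaming the local variable, or calling `builtins.open` explicitly, avoids the clash.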

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
'bool' object is not callable error 706486323  
696788109 https://github.com/simonw/datasette/issues/969#issuecomment-696788109 https://api.github.com/repos/simonw/datasette/issues/969 MDEyOklzc3VlQ29tbWVudDY5Njc4ODEwOQ== simonw 9599 2020-09-22T15:15:14Z 2020-09-22T15:15:14Z OWNER

I don't think a standard "pass these extra arguments to the publish tool" mechanism will work because there's no guarantee that a publisher uses a CLI tool - or if it does, it might make several calls to different CLI tools. The Cloud Run one runs a couple of commands, as illustrated by this test:

https://github.com/simonw/datasette/blob/a648bb82bac201c7658f6fdb499ff8ac17ebd2e8/tests/test_publish_cloudrun.py#L63-L73

Adding a --tar option for datasette publish heroku is a good fix for this though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Passing additional flags to tools used during publishing 705057955  
696778735 https://github.com/simonw/datasette/issues/943#issuecomment-696778735 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc3ODczNQ== simonw 9599 2020-09-22T15:00:13Z 2020-09-22T15:00:39Z OWNER

Am I going to rewrite ALL of my tests to use this instead? It would clean up a lot of test code, at the cost of quite a bit of work.

It would make for much neater plugin tests too, and neater testing documentation: https://docs.datasette.io/en/stable/testing_plugins.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696777886 https://github.com/simonw/datasette/issues/943#issuecomment-696777886 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc3Nzg4Ng== simonw 9599 2020-09-22T14:58:54Z 2020-09-22T14:58:54Z OWNER
import httpx


class DatasetteClient:
    def __init__(self, ds):
        self._client = httpx.AsyncClient(app=ds.app())

    def _fix(self, path):
        if path.startswith("/"):
            path = "http://localhost{}".format(path)
        return path

    async def get(self, path, **kwargs):
        return await self._client.get(self._fix(path), **kwargs)

    async def options(self, path, **kwargs):
        return await self._client.options(self._fix(path), **kwargs)

    async def head(self, path, **kwargs):
        return await self._client.head(self._fix(path), **kwargs)

    async def post(self, path, **kwargs):
        return await self._client.post(self._fix(path), **kwargs)

    async def put(self, path, **kwargs):
        return await self._client.put(self._fix(path), **kwargs)

    async def patch(self, path, **kwargs):
        return await self._client.patch(self._fix(path), **kwargs)

    async def delete(self, path, **kwargs):
        return await self._client.delete(self._fix(path), **kwargs)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696776828 https://github.com/simonw/datasette/issues/943#issuecomment-696776828 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc3NjgyOA== simonw 9599 2020-09-22T14:57:13Z 2020-09-22T14:57:13Z OWNER

I may as well implement all of the HTTP methods supported by the httpx client:

  • get
  • options
  • head
  • post
  • put
  • patch
  • delete
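
One way to cover all seven methods without writing seven near-identical bodies is to generate them in a loop. This is only a sketch under stated assumptions, not the code that shipped: StubAsyncClient is a hypothetical stand-in for httpx.AsyncClient so the example runs with the standard library alone, and _make_method is an invented helper name.

```python
import asyncio


class StubAsyncClient:
    """Hypothetical stand-in for httpx.AsyncClient: echoes what it was asked to do."""

    async def request(self, method, url, **kwargs):
        return (method, url, kwargs)


def _make_method(name):
    # Build an async method that rewrites the path and delegates to the client
    async def method(self, path, **kwargs):
        return await self._client.request(name.upper(), self._fix(path), **kwargs)

    method.__name__ = name
    return method


class DatasetteClient:
    def __init__(self, client):
        self._client = client

    def _fix(self, path):
        # Relative paths are turned into absolute URLs on a placeholder host
        if path.startswith("/"):
            path = "http://localhost{}".format(path)
        return path


# Attach one delegating method per HTTP verb from the list above
for _name in ("get", "options", "head", "post", "put", "patch", "delete"):
    setattr(DatasetteClient, _name, _make_method(_name))


client = DatasetteClient(StubAsyncClient())
result = asyncio.run(client.get("/-/config.json"))
```

Writing the methods out by hand is more verbose but easier to read and to document, so the loop is a trade-off rather than an obvious win.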
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696775516 https://github.com/simonw/datasette/issues/943#issuecomment-696775516 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc3NTUxNg== simonw 9599 2020-09-22T14:55:10Z 2020-09-22T14:55:10Z OWNER

Even smaller DatasetteClient implementation:

class DatasetteClient:
    def __init__(self, ds):
        self._client = httpx.AsyncClient(app=ds.app())

    def _fix(self, path):
        if path.startswith("/"):
            path = "http://localhost{}".format(path)
        return path

    async def get(self, path, **kwargs):
        return await self._client.get(self._fix(path), **kwargs)

    async def post(self, path, **kwargs):
        return await self._client.post(self._fix(path), **kwargs)

    async def options(self, path, **kwargs):
        return await self._client.options(self._fix(path), **kwargs)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696774711 https://github.com/simonw/datasette/issues/943#issuecomment-696774711 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc3NDcxMQ== simonw 9599 2020-09-22T14:53:56Z 2020-09-22T14:53:56Z OWNER

How important is it to use httpx.AsyncClient with a context manager?

https://www.python-httpx.org/async/#opening-and-closing-clients says:

Alternatively, use await client.aclose() if you want to close a client explicitly:

client = httpx.AsyncClient()
...
await client.aclose()

The .aclose() method has a comment saying "Close transport and proxies" - I'm not using proxies, so the relevant implementation seems to be a call to await self._transport.aclose() in https://github.com/encode/httpx/blob/f932af9172d15a803ad40061a4c2c0cd891645cf/httpx/_client.py#L1741-L1751

The transport I am using is a class called ASGITransport in https://github.com/encode/httpx/blob/master/httpx/_transports/asgi.py

The aclose() method on that class does nothing. So it looks like I can instantiate a client without bothering with the async with httpx.AsyncClient bit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696769853 https://github.com/simonw/datasette/issues/943#issuecomment-696769853 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc2OTg1Mw== simonw 9599 2020-09-22T14:46:21Z 2020-09-22T14:46:21Z OWNER

This adds httpx as a dependency - I think I'm OK with that. I use it for testing in all of my plugins anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696769501 https://github.com/simonw/datasette/issues/943#issuecomment-696769501 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5Njc2OTUwMQ== simonw 9599 2020-09-22T14:45:49Z 2020-09-22T14:45:49Z OWNER

I put together a minimal prototype of this and it feels pretty good:

diff --git a/datasette/app.py b/datasette/app.py
index 20aae7d..fb3bdad 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -4,6 +4,7 @@ import collections
 import datetime
 import glob
 import hashlib
+import httpx
 import inspect
 import itertools
 from itsdangerous import BadSignature
@@ -312,6 +313,7 @@ class Datasette:
         self._register_renderers()
         self._permission_checks = collections.deque(maxlen=200)
         self._root_token = secrets.token_hex(32)
+        self.client = DatasetteClient(self)

     async def invoke_startup(self):
         for hook in pm.hook.startup(datasette=self):
@@ -1209,3 +1211,25 @@ def route_pattern_from_filepath(filepath):

 class NotFoundExplicit(NotFound):
     pass
+
+
+class DatasetteClient:
+    def __init__(self, ds):
+        self.app = ds.app()
+
+    def _fix(self, path):
+        if path.startswith("/"):
+            path = "http://localhost{}".format(path)
+        return path
+
+    async def get(self, path, **kwargs):
+        async with httpx.AsyncClient(app=self.app) as client:
+            return await client.get(self._fix(path), **kwargs)
+
+    async def post(self, path, **kwargs):
+        async with httpx.AsyncClient(app=self.app) as client:
+            return await client.post(self._fix(path), **kwargs)
+
+    async def options(self, path, **kwargs):
+        async with httpx.AsyncClient(app=self.app) as client:
+            return await client.options(self._fix(path), **kwargs)

Used like this in ipython:

In [1]: from datasette.app import Datasette

In [2]: ds = Datasette(["fixtures.db"])

In [3]: (await ds.client.get("/-/config.json")).json()
Out[3]: 
{'default_page_size': 100,
 'max_returned_rows': 1000,
 'num_sql_threads': 3,
 'sql_time_limit_ms': 1000,
 'default_facet_size': 30,
 'facet_time_limit_ms': 200,
 'facet_suggest_time_limit_ms': 50,
 'hash_urls': False,
 'allow_facet': True,
 'allow_download': True,
 'suggest_facets': True,
 'default_cache_ttl': 5,
 'default_cache_ttl_hashed': 31536000,
 'cache_size_kb': 0,
 'allow_csv_stream': True,
 'max_csv_mb': 100,
 'truncate_cells_html': 2048,
 'force_https_urls': False,
 'template_debug': False,
 'base_url': '/'}

In [4]: (await ds.client.get("/fixtures/facetable.json?_shape=array")).json()
Out[4]: 
[{'pk': 1,
  'created': '2019-01-14 08:00:00',
  'planet_int': 1,
  'on_earth': 1,
  'state': 'CA',
  'city_id': 1,
  'neighborhood': 'Mission',
  'tags': '["tag1", "tag2"]',
  'complex_array': '[{"foo": "bar"}]',
  'distinct_some_null': 'one'},
 {'pk': 2,
  'created': '2019-01-14 08:00:00',
  'planet_int': 1,
  'on_earth': 1,
  'state': 'CA',
  'city_id': 1,
  'neighborhood': 'Dogpatch',
  'tags': '["tag1", "tag3"]',
  'complex_array': '[]',
  'distinct_some_null': 'two'},
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
693009048 https://github.com/simonw/datasette/issues/943#issuecomment-693009048 https://api.github.com/repos/simonw/datasette/issues/943 MDEyOklzc3VlQ29tbWVudDY5MzAwOTA0OA== simonw 9599 2020-09-15T22:17:30Z 2020-09-22T14:37:00Z OWNER

Maybe instead of implementing datasette.get() and datasette.post() and datasette.request() and datasette.stream() I could instead have a nested object called datasette.client which is a preconfigured AsyncClient instance.

response = await datasette.client.get("/")

Or perhaps this should be a method in case I ever need to be able to await it:

response = await (await datasette.client()).get("/")

This is a bit cosmetically ugly though, I'd rather avoid that if possible.

Maybe I could get this working by returning an object from .client() which provides an await obj.get() method:

response = await datasette.client().get("/")

I don't think there's any benefit to that over await datasette.client.get() though.
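
The difference between the attribute form and the awaitable-method form can be sketched with stand-in classes. Everything here is hypothetical (StubClient, AttributeStyle and AwaitableMethodStyle are invented names, and no real HTTP client is involved); the point is only to show how each call site reads.

```python
import asyncio


class StubClient:
    """Hypothetical client; get() just reports the path it was given."""

    async def get(self, path):
        return "GET {}".format(path)


class AttributeStyle:
    def __init__(self):
        # Built eagerly, so callers never have to await the client itself
        self.client = StubClient()


class AwaitableMethodStyle:
    async def client(self):
        # An async factory leaves room for awaited setup, at the cost of a double await
        return StubClient()


async def main():
    a = await AttributeStyle().client.get("/")
    b = await (await AwaitableMethodStyle().client()).get("/")
    return a, b


results = asyncio.run(main())
```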

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
await datasette.client.get(path) mechanism for executing internal requests 681375466  
696573944 https://github.com/simonw/sqlite-utils/issues/168#issuecomment-696573944 https://api.github.com/repos/simonw/sqlite-utils/issues/168 MDEyOklzc3VlQ29tbWVudDY5NjU3Mzk0NA== simonw 9599 2020-09-22T08:11:30Z 2020-09-22T08:11:30Z OWNER

Huh... maybe I don't need to do anything here? It looks like it's been kept up to date: https://github.com/Homebrew/homebrew-core/commits/master/Formula/sqlite-utils.rb

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Automate (as much as possible) updates published to Homebrew 706167456  
696567988 https://github.com/simonw/sqlite-utils/issues/164#issuecomment-696567988 https://api.github.com/repos/simonw/sqlite-utils/issues/164 MDEyOklzc3VlQ29tbWVudDY5NjU2Nzk4OA== simonw 9599 2020-09-22T07:57:50Z 2020-09-22T07:57:50Z OWNER

Documentation: https://sqlite-utils.readthedocs.io/en/latest/cli.html#transforming-tables

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform sub-command 706017416  
696567460 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696567460 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NjU2NzQ2MA== simonw 9599 2020-09-22T07:56:42Z 2020-09-22T07:56:42Z OWNER

.transform() has landed now which should make this a lot easier to solve.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696566750 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-696566750 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDY5NjU2Njc1MA== simonw 9599 2020-09-22T07:55:00Z 2020-09-22T07:55:00Z OWNER

Problem: extract means something else now, see #47 and the upcoming work in #42.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
696565981 https://github.com/simonw/sqlite-utils/issues/167#issuecomment-696565981 https://api.github.com/repos/simonw/sqlite-utils/issues/167 MDEyOklzc3VlQ29tbWVudDY5NjU2NTk4MQ== simonw 9599 2020-09-22T07:53:13Z 2020-09-22T07:53:13Z OWNER

Confirmed this is a bug: https://www.sqlite.org/lang_altertable.html#making_other_kinds_of_table_schema_changes explicitly says you should do the PRAGMA foreign_keys bits before and after the transaction, not during.

Right now my code does this INSIDE the transaction: https://github.com/simonw/sqlite-utils/blob/f29f6821f2d08e91c5c6d65d885a1bbc0c743bdd/sqlite_utils/db.py#L790-L793
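
The reason the placement matters is that SQLite documents PRAGMA foreign_keys as a no-op inside a transaction. A minimal sketch of that behaviour using the standard library sqlite3 module (the fk_enabled helper is an invented name for illustration):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN and COMMIT statements below are the only transaction boundaries
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")


def fk_enabled(conn):
    # Returns 1 when foreign key enforcement is on, 0 when it is off
    return conn.execute("PRAGMA foreign_keys").fetchone()[0]


conn.execute("BEGIN")
conn.execute("PRAGMA foreign_keys = OFF")  # silently ignored inside a transaction
inside = fk_enabled(conn)
conn.execute("COMMIT")

conn.execute("PRAGMA foreign_keys = OFF")  # takes effect outside a transaction
outside = fk_enabled(conn)
```

Here inside stays at 1 because the pragma was ignored, while outside becomes 0.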

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Review the foreign key pragma stuff 706098005  
696520928 https://github.com/simonw/sqlite-utils/issues/164#issuecomment-696520928 https://api.github.com/repos/simonw/sqlite-utils/issues/164 MDEyOklzc3VlQ29tbWVudDY5NjUyMDkyOA== simonw 9599 2020-09-22T05:50:17Z 2020-09-22T05:50:17Z OWNER

Idea for CLI options:

--type age integer
--drop colname
--rename oldname newname
--not-null col
--not-null-false col
--pk new_id
--pk-none
--default col value
--default-none column
--drop-foreign-key col other_table other_column
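
A rough sketch of how a few of these options could be parsed, using the standard library argparse module purely for illustration. This is an assumption-laden example, not the sqlite-utils implementation: the program name, metavars and the subset of options shown are all chosen for the demo.

```python
import argparse

parser = argparse.ArgumentParser(prog="transform-sketch")
# Options taking two values can be repeated; each use appends a [first, second] pair
parser.add_argument("--type", nargs=2, action="append", metavar=("COLUMN", "TYPE"), default=[])
parser.add_argument("--rename", nargs=2, action="append", metavar=("OLD", "NEW"), default=[])
parser.add_argument("--default", nargs=2, action="append", metavar=("COLUMN", "VALUE"), default=[])
# Options taking one value can also be repeated
parser.add_argument("--drop", action="append", default=[])
parser.add_argument("--not-null", action="append", default=[])
# Primary key is either a column name or explicitly removed
parser.add_argument("--pk")
parser.add_argument("--pk-none", action="store_true")

args = parser.parse_args(
    ["--type", "age", "integer", "--rename", "oldname", "newname", "--drop", "colname", "--pk-none"]
)
```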
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform sub-command 706017416  
696500922 https://github.com/simonw/sqlite-utils/issues/164#issuecomment-696500922 https://api.github.com/repos/simonw/sqlite-utils/issues/164 MDEyOklzc3VlQ29tbWVudDY5NjUwMDkyMg== simonw 9599 2020-09-22T04:22:40Z 2020-09-22T04:22:40Z OWNER

Documentation for the .transform() method #114 (now landed) is here: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#transforming-a-table

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform sub-command 706017416  
696500767 https://github.com/simonw/sqlite-utils/issues/114#issuecomment-696500767 https://api.github.com/repos/simonw/sqlite-utils/issues/114 MDEyOklzc3VlQ29tbWVudDY5NjUwMDc2Nw== simonw 9599 2020-09-22T04:21:45Z 2020-09-22T04:21:45Z OWNER

Documentation: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#transforming-a-table

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.transform() method for advanced alter table 621989740  
696494070 https://github.com/simonw/sqlite-utils/pull/161#issuecomment-696494070 https://api.github.com/repos/simonw/sqlite-utils/issues/161 MDEyOklzc3VlQ29tbWVudDY5NjQ5NDA3MA== simonw 9599 2020-09-22T03:48:58Z 2020-09-22T03:48:58Z OWNER

One last thing. https://www.sqlite.org/lang_altertable.html#making_other_kinds_of_table_schema_change says that the first step should be:

If foreign key constraints are enabled, disable them using PRAGMA foreign_keys=OFF.

And the last steps should be:

If foreign key constraints were originally enabled then run PRAGMA foreign_key_check to verify that the schema change did not break any foreign key constraints.

Commit the transaction started in step 2.

If foreign keys constraints were originally enabled, reenable them now.

I need to implement that.
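
As a sketch of what that ordering looks like in practice, here is the procedure written against the standard library sqlite3 module. The authors/books tables and the transform_books function are hypothetical, chosen only to illustrate the sequence of steps; this is not the sqlite-utils code.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Ursula');
    INSERT INTO books VALUES (1, 'Example', 1);
    """
)


def transform_books(conn):
    # Remember whether enforcement was on, then switch it off outside any transaction
    fk_was_on = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    if fk_was_on:
        conn.execute("PRAGMA foreign_keys = OFF")
    conn.execute("BEGIN")
    try:
        # Create the new shape, copy the rows across, then swap the tables
        conn.execute(
            "CREATE TABLE books_new_tmp ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
            "author_id INTEGER REFERENCES authors(id))"
        )
        conn.execute(
            "INSERT INTO books_new_tmp (id, title, author_id) "
            "SELECT id, title, author_id FROM books"
        )
        conn.execute("DROP TABLE books")
        conn.execute("ALTER TABLE books_new_tmp RENAME TO books")
        if fk_was_on:
            # Verify the change did not break any foreign key constraints
            violations = conn.execute("PRAGMA foreign_key_check").fetchall()
            if violations:
                raise sqlite3.IntegrityError(
                    "foreign key check failed: {}".format(violations)
                )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        # Re-enable enforcement only after the transaction has ended
        if fk_was_on:
            conn.execute("PRAGMA foreign_keys = ON")


transform_books(conn)
```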

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.transform() method 705975133  
696490851 https://github.com/simonw/sqlite-utils/pull/161#issuecomment-696490851 https://api.github.com/repos/simonw/sqlite-utils/issues/161 MDEyOklzc3VlQ29tbWVudDY5NjQ5MDg1MQ== simonw 9599 2020-09-22T03:33:54Z 2020-09-22T03:33:54Z OWNER

It would be neat if .transform(pk=None) converted a primary key table to a rowid table.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.transform() method 705975133  
696488201 https://github.com/simonw/sqlite-utils/pull/161#issuecomment-696488201 https://api.github.com/repos/simonw/sqlite-utils/issues/161 MDEyOklzc3VlQ29tbWVudDY5NjQ4ODIwMQ== simonw 9599 2020-09-22T03:21:16Z 2020-09-22T03:21:16Z OWNER

Just needs documentation now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.transform() method 705975133  
696485791 https://github.com/simonw/sqlite-utils/pull/161#issuecomment-696485791 https://api.github.com/repos/simonw/sqlite-utils/issues/161 MDEyOklzc3VlQ29tbWVudDY5NjQ4NTc5MQ== simonw 9599 2020-09-22T03:10:15Z 2020-09-22T03:10:15Z OWNER

Design decision needed on foreign keys: what does the syntax look like for removing an existing foreign key?

Since I already have a good implementation of add_foreign_key() I'm tempted to only support dropping them. Maybe like this:

table.transform(drop_foreign_keys=[("author_id", "author", "id")])

It's a bit crufty but it's such a rare use-case that I think this will be good enough.
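
For reference, the three values in that tuple line up with what SQLite itself reports for a foreign key. A small sketch using the standard library sqlite3 module, with hypothetical author and book tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    """
)

# foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match)
foreign_keys = [
    (row[3], row[2], row[4])  # (column, other_table, other_column)
    for row in conn.execute("PRAGMA foreign_key_list(book)")
]
```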

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.transform() method 705975133  
696480925 https://github.com/simonw/sqlite-utils/pull/161#issuecomment-696480925 https://api.github.com/repos/simonw/sqlite-utils/issues/161 MDEyOklzc3VlQ29tbWVudDY5NjQ4MDkyNQ== simonw 9599 2020-09-22T02:45:47Z 2020-09-22T02:45:47Z OWNER

I'm not going to do conversions= because it would be inconsistent with how they work elsewhere. The SQL generated by this function looks like this:

INSERT INTO dogs_new_tmp (a, b) SELECT a, b FROM dogs;

So passing conversions={"name": "upper(?)"} wouldn't make sense: since we're not using arguments, there is nowhere for that ? to go.
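
A copy of this kind, in valid SQLite syntax, can be tried out with the standard library sqlite3 module. The table contents below are made up for the demo; the point is that the copy is one statement executed without any bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (a TEXT, b TEXT)")
# Ordinary inserts do use ? placeholders, one per supplied value
conn.executemany("INSERT INTO dogs VALUES (?, ?)", [("cleo", "x"), ("pancakes", "y")])
conn.execute("CREATE TABLE dogs_new_tmp (a TEXT, b TEXT)")
# The table-to-table copy has no bound parameters, so there is no ? to wrap
conn.execute("INSERT INTO dogs_new_tmp (a, b) SELECT a, b FROM dogs")
copied = conn.execute("SELECT a, b FROM dogs_new_tmp ORDER BY a").fetchall()
```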

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.transform() method 705975133  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Query took 1043.779ms · About: github-to-sqlite