6,033 rows sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
869071167 https://github.com/simonw/datasette/issues/1384#issuecomment-869071167 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA3MTE2Nw== simonw 9599 2021-06-26T22:55:36Z 2021-06-26T22:55:36Z OWNER

Just realized I already have an issue open for this, at #860. I'm going to close that and continue work on this in this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869070941 https://github.com/simonw/datasette/issues/1384#issuecomment-869070941 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA3MDk0MQ== simonw 9599 2021-06-26T22:53:34Z 2021-06-26T22:53:34Z OWNER

The await thing is worrying me a lot - it feels like this plugin hook is massively less useful if it can't make its own DB queries and generally do asynchronous stuff - but I'd also like to avoid breaking every existing plugin that calls datasette.metadata(...).

One solution that could work: introduce a new method, maybe await datasette.get_metadata(...), which uses this plugin hook - and keep the existing datasette.metadata() method (which doesn't call the hook) around. This would ensure existing plugins keep on working.

Then, upgrade those plugins separately - with the goal of deprecating and removing .metadata() entirely in Datasette 1.0 - having upgraded the plugins in the meantime.
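A minimal sketch of that coexistence idea - the class and hook wiring here are hypothetical, not Datasette's actual internals:

```python
import asyncio
import inspect


class DatasetteSketch:
    """Hypothetical sketch: sync metadata() kept, awaitable get_metadata() added."""

    def __init__(self, local_metadata=None, hooks=None):
        self._local_metadata = local_metadata or {}
        self._hooks = hooks or []  # plugin hook implementations

    def metadata(self):
        # Existing documented method: synchronous, never calls the hook,
        # so plugins that already depend on it keep working unchanged.
        return self._local_metadata

    async def get_metadata(self):
        # New awaitable method: run each hook, awaiting results when
        # needed, and let the first non-None result win.
        for hook in self._hooks:
            result = hook(datasette=self)
            if inspect.isawaitable(result):
                result = await result
            if result is not None:
                return result
        return self._local_metadata
```

Existing plugins keep calling the sync method; new code opts in to the awaitable one.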

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869070348 https://github.com/simonw/datasette/issues/1384#issuecomment-869070348 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA3MDM0OA== simonw 9599 2021-06-26T22:46:18Z 2021-06-26T22:46:18Z OWNER

Here's where the plugin hook is called, demonstrating the fallback= argument: https://github.com/simonw/datasette/blob/05a312caf3debb51aa1069939923a49e21cd2bd1/datasette/app.py#L426-L472

I'm not convinced of the use-case for passing fallback= to the hook here - is there a reason a plugin might care whether fallback is True or False, seeing as the metadata() method already respects that fallback logic on line 459?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869070076 https://github.com/simonw/datasette/issues/1384#issuecomment-869070076 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA3MDA3Ng== simonw 9599 2021-06-26T22:42:21Z 2021-06-26T22:42:21Z OWNER

Hmmm... that's tricky, since one of the most obvious ways to use this hook is to load metadata from database tables using SQL queries.

@brandonrobertz do you have a working example of using this hook to populate metadata from database tables I can try?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869069926 https://github.com/simonw/datasette/issues/1384#issuecomment-869069926 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA2OTkyNg== simonw 9599 2021-06-26T22:40:15Z 2021-06-26T22:40:53Z OWNER

The documentation says:

datasette: You can use this to access plugin configuration options via datasette.plugin_config(your_plugin_name), or to execute SQL queries.

That's not accurate: since the plugin hook is a regular function, not an awaitable, you can't use it to run await db.execute(...) so you can't execute SQL queries.

I can fix this with the await-me-maybe pattern, used for other plugin hooks: https://simonwillison.net/2020/Sep/2/await-me-maybe/

BUT... that requires changing the ds.metadata() function to be awaitable, which will affect every existing plugin that uses that documented internal method!
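The await-me-maybe helper itself is small - a sketch of the pattern from that post:

```python
import asyncio
import inspect


async def await_me_maybe(value):
    # If the hook returned a callable, call it; if the result
    # (or the value itself) is awaitable, await it.
    if callable(value):
        value = value()
    if inspect.isawaitable(value):
        value = await value
    return value
```

This lets a plugin hook return a plain value, a function, or an async function interchangeably.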

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869069768 https://github.com/simonw/datasette/issues/1384#issuecomment-869069768 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA2OTc2OA== simonw 9599 2021-06-26T22:37:53Z 2021-06-26T22:37:53Z OWNER

The documentation doesn't describe the fallback argument at the moment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869069655 https://github.com/simonw/datasette/issues/1384#issuecomment-869069655 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA2OTY1NQ== simonw 9599 2021-06-26T22:36:14Z 2021-06-26T22:37:37Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  
869068554 https://github.com/simonw/datasette/pull/1368#issuecomment-869068554 https://api.github.com/repos/simonw/datasette/issues/1368 MDEyOklzc3VlQ29tbWVudDg2OTA2ODU1NA== simonw 9599 2021-06-26T22:23:57Z 2021-06-26T22:23:57Z OWNER

The only test failure is Black. I'm going to merge this and then reformat.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
DRAFT: A new plugin hook for dynamic metadata 913865304  
868881190 https://github.com/simonw/sqlite-utils/issues/37#issuecomment-868881190 https://api.github.com/repos/simonw/sqlite-utils/issues/37 MDEyOklzc3VlQ29tbWVudDg2ODg4MTE5MA== simonw 9599 2021-06-25T23:24:28Z 2021-06-25T23:24:28Z OWNER

Maybe I could release a separate Python package, types-sqlite-utils-numpy, which adds an overridden type definition that includes the numpy types?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Experiment with type hints 465815372  
868881033 https://github.com/simonw/sqlite-utils/issues/37#issuecomment-868881033 https://api.github.com/repos/simonw/sqlite-utils/issues/37 MDEyOklzc3VlQ29tbWVudDg2ODg4MTAzMw== simonw 9599 2021-06-25T23:23:49Z 2021-06-25T23:23:49Z OWNER

Twitter conversation about how to add types to the .create_table(columns=) parameter: https://twitter.com/simonw/status/1408532867592818693

Anyone know how to write a mypy type definition for this?

{"id": int, "name": str, "image": bytes, "weight": float}

It's a dict where keys are strings and values are one of int/str/bytes/float (weird API design I know, but I designed this long before I was thinking about mypy)

Looks like this could work:

    def create_table(
        self,
        name,
        columns: Dict[str, Union[Type[int], Type[bytes], Type[str], Type[float]]],
        pk=None,
        foreign_keys=None,
        column_order=None,
        not_null=None,
        defaults=None,
        hash_id=None,
        extracts=None,
    ):

Except... that method can optionally also accept numpy types if numpy is installed. I don't know if it's possible to dynamically change a signature based on an import, since mypy is a static type analyzer and doesn't ever execute the code.
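As a quick sanity check that the Union annotation matches the runtime behaviour, here's a simplified sketch of what such a signature would drive - the helper name and the type-to-SQL mapping are illustrative, not the library's actual implementation:

```python
from typing import Dict, Type, Union

# The annotation from the snippet above, named for reuse
ColumnType = Union[Type[int], Type[bytes], Type[str], Type[float]]


def create_table_sql(name: str, columns: Dict[str, ColumnType]) -> str:
    # Map Python types to SQLite column types (simplified sketch)
    type_map = {int: "INTEGER", bytes: "BLOB", str: "TEXT", float: "FLOAT"}
    column_defs = ", ".join(
        "[{}] {}".format(col, type_map[t]) for col, t in columns.items()
    )
    return "CREATE TABLE [{}] ({})".format(name, column_defs)
```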

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Experiment with type hints 465815372  
868728092 https://github.com/simonw/sqlite-utils/pull/293#issuecomment-868728092 https://api.github.com/repos/simonw/sqlite-utils/issues/293 MDEyOklzc3VlQ29tbWVudDg2ODcyODA5Mg== simonw 9599 2021-06-25T17:39:35Z 2021-06-25T17:39:35Z OWNER

Here's more about this problem: https://github.com/numpy/numpy/issues/15947

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Test against Python 3.10-dev 929748885  
868134040 https://github.com/simonw/sqlite-utils/pull/293#issuecomment-868134040 https://api.github.com/repos/simonw/sqlite-utils/issues/293 MDEyOklzc3VlQ29tbWVudDg2ODEzNDA0MA== simonw 9599 2021-06-25T01:49:44Z 2021-06-25T01:50:33Z OWNER

Test failed on 3.10 with numpy on macOS:

sqlite_utils/__init__.py:1: in <module>
    from .db import Database
sqlite_utils/db.py:48: in <module>
    import numpy as np  # type: ignore
../../../hostedtoolcache/Python/3.10.0-beta.3/x64/lib/python3.10/site-packages/numpy/__init__.py:391: in <module>
    raise RuntimeError(msg)
E   RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, more information is available at https://numpy.org/doc/stable/user/building.html#accelerated-blas-lapack-libraries Otherwise report this to the vendor that provided NumPy.
E   RankWarning: Polyfit may be poorly conditioned
Error: Process completed with exit code 4.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Test against Python 3.10-dev 929748885  
868125750 https://github.com/simonw/sqlite-utils/pull/293#issuecomment-868125750 https://api.github.com/repos/simonw/sqlite-utils/issues/293 MDEyOklzc3VlQ29tbWVudDg2ODEyNTc1MA== codecov[bot] 22429695 2021-06-25T01:42:43Z 2021-06-25T01:42:43Z NONE

Codecov Report

Merging #293 (ae0f46a) into main (747be60) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #293   +/-   ##
=======================================
  Coverage   96.03%   96.03%           
=======================================
  Files           4        4           
  Lines        1994     1994           
=======================================
  Hits         1915     1915           
  Misses         79       79           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 747be60...ae0f46a. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Test against Python 3.10-dev 929748885  
868021624 https://github.com/simonw/sqlite-utils/issues/290#issuecomment-868021624 https://api.github.com/repos/simonw/sqlite-utils/issues/290 MDEyOklzc3VlQ29tbWVudDg2ODAyMTYyNA== simonw 9599 2021-06-24T23:17:38Z 2021-06-24T23:17:38Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`db.query()` method (renamed `db.execute_returning_dicts()`) 926777310  
867209791 https://github.com/simonw/datasette/issues/1377#issuecomment-867209791 https://api.github.com/repos/simonw/datasette/issues/1377 MDEyOklzc3VlQ29tbWVudDg2NzIwOTc5MQ== simonw 9599 2021-06-23T22:51:32Z 2021-06-23T22:51:32Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for plugins to exclude certain paths from CSRF checks 920884085  
867102944 https://github.com/simonw/datasette/pull/1368#issuecomment-867102944 https://api.github.com/repos/simonw/datasette/issues/1368 MDEyOklzc3VlQ29tbWVudDg2NzEwMjk0NA== simonw 9599 2021-06-23T19:32:01Z 2021-06-23T19:32:01Z OWNER

Yes, let's move ahead with getting this into an alpha.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
DRAFT: A new plugin hook for dynamic metadata 913865304  
866466388 https://github.com/simonw/sqlite-utils/issues/291#issuecomment-866466388 https://api.github.com/repos/simonw/sqlite-utils/issues/291 MDEyOklzc3VlQ29tbWVudDg2NjQ2NjM4OA== simonw 9599 2021-06-23T02:10:24Z 2021-06-23T02:10:24Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Adopt flake8 927766296  
866461926 https://github.com/simonw/sqlite-utils/issues/291#issuecomment-866461926 https://api.github.com/repos/simonw/sqlite-utils/issues/291 MDEyOklzc3VlQ29tbWVudDg2NjQ2MTkyNg== simonw 9599 2021-06-23T01:59:57Z 2021-06-23T01:59:57Z OWNER

That shouldn't be failing: it's complaining about these: https://github.com/simonw/sqlite-utils/blob/02898bf7af4a4e484ecc8ec852d5fee98463277b/tests/test_register_function.py#L56-L67

But I added # noqa: F811 to them.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Adopt flake8 927766296  
866241836 https://github.com/simonw/sqlite-utils/issues/289#issuecomment-866241836 https://api.github.com/repos/simonw/sqlite-utils/issues/289 MDEyOklzc3VlQ29tbWVudDg2NjI0MTgzNg== adamchainz 857609 2021-06-22T18:44:36Z 2021-06-22T18:44:36Z NONE

Great!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mypy fixes for rows_from_file() 925677191  
865511810 https://github.com/simonw/sqlite-utils/issues/290#issuecomment-865511810 https://api.github.com/repos/simonw/sqlite-utils/issues/290 MDEyOklzc3VlQ29tbWVudDg2NTUxMTgxMA== simonw 9599 2021-06-22T04:07:34Z 2021-06-22T18:26:21Z OWNER

That documentation section is pretty weak at the moment - here's the whole thing:

Executing queries

The db.execute() and db.executescript() methods provide wrappers around .execute() and .executescript() on the underlying SQLite connection. These wrappers log to the tracer function if one has been registered.
    db = Database(memory=True)
    db["dogs"].insert({"name": "Cleo"})
    db.execute("update dogs set name = 'Cleopaws'")
You can pass parameters as an optional second argument, using either a list or a dictionary. These will be correctly quoted and escaped.
```python
# Using ? and a list:
db.execute("update dogs set name = ?", ["Cleopaws"])

# Or using :name and a dictionary:
db.execute("update dogs set name = :name", {"name": "Cleopaws"})
```

  • Talks about .execute() - I want to talk about .query() instead
  • Doesn't clarify that .execute() returns a Cursor - and assumes you know what to do with one
  • Doesn't show an example of a select query at all
  • The "tracer function" bit is confusing (should at least link to docs further down)
  • For UPDATE should show how to access the number of rows modified (probably using .execute() there)

It does at least cover the two types of parameters, though that could be bulked out.
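For reference, the rows-modified count is already available on the cursor that .execute() returns - a plain sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dogs (name text)")
conn.execute("insert into dogs values ('Cleo')")

# .execute() returns a cursor; after an UPDATE, .rowcount
# holds the number of rows modified
cursor = conn.execute("update dogs set name = 'Cleopaws'")
print(cursor.rowcount)
```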

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`db.query()` method (renamed `db.execute_returning_dicts()`) 926777310  
509681590 https://github.com/simonw/sqlite-utils/issues/37#issuecomment-509681590 https://api.github.com/repos/simonw/sqlite-utils/issues/37 MDEyOklzc3VlQ29tbWVudDUwOTY4MTU5MA== simonw 9599 2019-07-09T15:07:12Z 2021-06-22T18:17:53Z OWNER

Here's a magic incantation for generating types detected through running the tests with https://github.com/Instagram/MonkeyType

pip install pytest-monkeytype
pytest --monkeytype-output=./monkeytype.sqlite3
monkeytype list-modules
monkeytype apply sqlite_utils.utils
monkeytype apply sqlite_utils.cli
monkeytype apply sqlite_utils.db

Here's the result: https://github.com/simonw/sqlite-utils/commit/d18c694fc25b7dd3d76e250c77ddf56d10ddf935

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Experiment with type hints 465815372  
866219755 https://github.com/simonw/sqlite-utils/issues/289#issuecomment-866219755 https://api.github.com/repos/simonw/sqlite-utils/issues/289 MDEyOklzc3VlQ29tbWVudDg2NjIxOTc1NQ== simonw 9599 2021-06-22T18:13:26Z 2021-06-22T18:13:26Z OWNER

Thanks @adamchainz - mypy now has a foothold on this project (and runs in CI).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mypy fixes for rows_from_file() 925677191  
866184260 https://github.com/simonw/sqlite-utils/issues/267#issuecomment-866184260 https://api.github.com/repos/simonw/sqlite-utils/issues/267 MDEyOklzc3VlQ29tbWVudDg2NjE4NDI2MA== simonw 9599 2021-06-22T17:26:18Z 2021-06-22T17:27:27Z OWNER

If a .update() method doesn't work because it collides with an existing dictionary method, a .pk property could still be nice:

for row in db["sometable"].rows:
    db["sometable"].update(row.pk, {"modified": 1})
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
row.update() or row.pk 915421499  
866182655 https://github.com/simonw/sqlite-utils/issues/267#issuecomment-866182655 https://api.github.com/repos/simonw/sqlite-utils/issues/267 MDEyOklzc3VlQ29tbWVudDg2NjE4MjY1NQ== simonw 9599 2021-06-22T17:24:03Z 2021-06-22T17:24:03Z OWNER

I'm re-opening this as a research task because it may be possible to cleanly implement this using a dict subclass - some notes on that here: https://treyhunner.com/2019/04/why-you-shouldnt-inherit-from-list-and-dict-in-python/

Since this would just be for adding methods (and maybe a property for returning the primary keys for a row) the usual disadvantages of subclassing dict described in that article shouldn't apply.

One catch: dictionaries already have a .update() method! So would have to pick another name.
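A sketch of what that dict subclass might look like - the class name and .pk semantics here are speculative:

```python
class Row(dict):
    """Sketch: a dict subclass that adds a .pk property, sidestepping
    the name collision with dict.update()."""

    def __init__(self, *args, pks=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._pks = pks or []

    @property
    def pk(self):
        # Value of the primary key for this row; a tuple for compound keys
        values = [self[p] for p in self._pks]
        return values[0] if len(values) == 1 else tuple(values)
```

Since it only adds a read-only property, the usual pitfalls of subclassing dict described in that article shouldn't bite here.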

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
row.update() or row.pk 915421499  
865510796 https://github.com/simonw/sqlite-utils/issues/290#issuecomment-865510796 https://api.github.com/repos/simonw/sqlite-utils/issues/290 MDEyOklzc3VlQ29tbWVudDg2NTUxMDc5Ng== simonw 9599 2021-06-22T04:04:40Z 2021-06-22T04:04:48Z OWNER

Still needs documentation, which will involve rewriting the whole Executing queries section.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`db.query()` method (renamed `db.execute_returning_dicts()`) 926777310  
865497846 https://github.com/simonw/sqlite-utils/issues/290#issuecomment-865497846 https://api.github.com/repos/simonw/sqlite-utils/issues/290 MDEyOklzc3VlQ29tbWVudDg2NTQ5Nzg0Ng== simonw 9599 2021-06-22T03:21:38Z 2021-06-22T03:21:38Z OWNER

The Python docs say: https://docs.python.org/3/library/sqlite3.html

To retrieve data after executing a SELECT statement, you can either treat the cursor as an iterator, call the cursor’s fetchone() method to retrieve a single matching row, or call fetchall() to get a list of the matching rows.

Looking at the C source code, both fetchmany() and fetchall() work under the hood by assembling a Python list: https://github.com/python/cpython/blob/be1cb3214d09d4bf0288bc45f3c1f167f67e4514/Modules/_sqlite/cursor.c#L907-L972 - see calls to PyList_Append()

So it looks like the most efficient way to iterate over a cursor may well be for row in cursor: - which I think calls this C function: https://github.com/python/cpython/blob/be1cb3214d09d4bf0288bc45f3c1f167f67e4514/Modules/_sqlite/cursor.c#L813-L876
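Both styles side by side, as a sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (n integer)")
conn.executemany("insert into t values (?)", [(i,) for i in range(5)])

# fetchall() assembles every row into a Python list up front:
rows = conn.execute("select n from t").fetchall()

# Iterating the cursor pulls rows from the statement one at a time:
total = sum(n for (n,) in conn.execute("select n from t"))
```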

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`db.query()` method (renamed `db.execute_returning_dicts()`) 926777310  
865495370 https://github.com/simonw/sqlite-utils/issues/290#issuecomment-865495370 https://api.github.com/repos/simonw/sqlite-utils/issues/290 MDEyOklzc3VlQ29tbWVudDg2NTQ5NTM3MA== simonw 9599 2021-06-22T03:14:30Z 2021-06-22T03:14:30Z OWNER

One small problem with the existing method:
https://github.com/simonw/sqlite-utils/blob/8cedc6a8b29180e68326f6b76f249d5e39e4b591/sqlite_utils/db.py#L362-L365

It returns a full list, but what if the user would rather have a generator they can iterate over without loading the results into memory in one go?
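A hypothetical generator variant sketching the alternative - it yields one dict per row rather than materializing the whole result list:

```python
import sqlite3


def query(conn, sql, params=None):
    # Generator sketch: rows are pulled from the cursor lazily,
    # so results are never all held in memory at once.
    cursor = conn.execute(sql, params or [])
    columns = [c[0] for c in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))
```

Callers who do want a list can still wrap it in list(...).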

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`db.query()` method (renamed `db.execute_returning_dicts()`) 926777310  
865491922 https://github.com/simonw/sqlite-utils/issues/290#issuecomment-865491922 https://api.github.com/repos/simonw/sqlite-utils/issues/290 MDEyOklzc3VlQ29tbWVudDg2NTQ5MTkyMg== simonw 9599 2021-06-22T03:05:35Z 2021-06-22T03:05:35Z OWNER

Potential names:

  • db.query(sql) - it's weird to have both this and db.execute() but it is at least short and memorable
  • db.sql(sql)
  • db.execute_d(sql) - ugly
  • db.execute_dicts(sql) - confusing
  • db.execute_sql(sql) - easily confused with db.execute(sql)

I think db.query(sql) may be the best option here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`db.query()` method (renamed `db.execute_returning_dicts()`) 926777310  
865204472 https://github.com/simonw/datasette/pull/1368#issuecomment-865204472 https://api.github.com/repos/simonw/datasette/issues/1368 MDEyOklzc3VlQ29tbWVudDg2NTIwNDQ3Mg== brandonrobertz 2670795 2021-06-21T17:11:37Z 2021-06-21T17:11:37Z CONTRIBUTOR

If this is a concept ACK then I will move onto fixing the tests (adding new ones) and updating the documentation for the new plugin hook.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
DRAFT: A new plugin hook for dynamic metadata 913865304  
865160132 https://github.com/simonw/datasette/pull/1368#issuecomment-865160132 https://api.github.com/repos/simonw/datasette/issues/1368 MDEyOklzc3VlQ29tbWVudDg2NTE2MDEzMg== simonw 9599 2021-06-21T16:07:06Z 2021-06-21T16:08:48Z OWNER

A few tests failed - Black, and the test that checks the docs mention the new hook - but the most interesting failure looks like this one:

            updated_metadata["databases"]["fixtures"]["queries"]["magic_parameters"][
                "allow"
            ] = (allow if "query" in permissions else deny)
>           cascade_app_client.ds._metadata = updated_metadata
E           AttributeError: can't set attribute

From https://github.com/simonw/datasette/blob/0a7621f96f8ad14da17e7172e8a7bce24ef78966/tests/test_permissions.py#L439-L467

This test is directly manipulating _metadata purely for the purposes of simulating different permissions - I think updating it to manipulate _local_metadata instead would fix that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
DRAFT: A new plugin hook for dynamic metadata 913865304  
864621099 https://github.com/simonw/sqlite-utils/issues/278#issuecomment-864621099 https://api.github.com/repos/simonw/sqlite-utils/issues/278 MDEyOklzc3VlQ29tbWVudDg2NDYyMTA5OQ== mcint 601708 2021-06-20T22:39:57Z 2021-06-20T22:39:57Z CONTRIBUTOR

Fair. I looked into it, it looks like it could be done, but it would be a bit ugly. I can upload and link a gist of my exploration. Click can parse a first argument while still recognizing it as a sub-command keyword. From there, the program could:
1. ignore it preemptively if it matches a sub-command
2. and/or check if a (db) file exists at the path.

It would then also need to set a shared db argument variable.

Click also makes it easy to parse arguments from environment variables. If you're amenable, I may submit a patch for only that, which would update each sub-command to check for a DB/SQLITE_UTILS_DB environment variable. The goal would be usage that looks like: DB=./convenient.db sqlite-utils [operation] [args]
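A sketch of the environment-variable half of that idea using Click's envvar= support - the command names here are hypothetical:

```python
import click


@click.group()
@click.option("--db", "db_path", envvar="SQLITE_UTILS_DB",
              help="Database path; falls back to the SQLITE_UTILS_DB env var.")
@click.pass_context
def cli(ctx, db_path):
    # Shared db argument, available to every sub-command via ctx.obj
    ctx.obj = db_path


@cli.command()
@click.pass_obj
def show(db_path):
    """Echo the database path this invocation resolved to."""
    click.echo(db_path or "(no db)")
```

An explicit --db still overrides the environment variable, which matches the usual Click precedence.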

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Support db as first parameter before subcommand, or as environment variable 923697888  
864609271 https://github.com/simonw/sqlite-utils/issues/289#issuecomment-864609271 https://api.github.com/repos/simonw/sqlite-utils/issues/289 MDEyOklzc3VlQ29tbWVudDg2NDYwOTI3MQ== simonw 9599 2021-06-20T20:42:07Z 2021-06-20T20:42:07Z OWNER

Wow, thank you! I didn't know about typing.cast().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mypy fixes for rows_from_file() 925677191  
864594956 https://github.com/simonw/sqlite-utils/issues/286#issuecomment-864594956 https://api.github.com/repos/simonw/sqlite-utils/issues/286 MDEyOklzc3VlQ29tbWVudDg2NDU5NDk1Ng== simonw 9599 2021-06-20T18:38:05Z 2021-06-20T18:38:05Z OWNER

3.10 is out in Homebrew now (they turn that around so fast): https://formulae.brew.sh/formula/sqlite-utils

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add installation instructions 925487946  
864480051 https://github.com/simonw/datasette/issues/1382#issuecomment-864480051 https://api.github.com/repos/simonw/datasette/issues/1382 MDEyOklzc3VlQ29tbWVudDg2NDQ4MDA1MQ== simonw 9599 2021-06-20T00:20:06Z 2021-06-20T00:21:02Z OWNER

Yes you can - thanks for pointing this out, I've added a comment to the install.sh script in the datasette-csvs Glitch project:

pip3 install -U --no-cache-dir -r requirements.txt --user && \
  mkdir -p .data && \
  rm .data/data.db || true && \
  for f in *.csv
    do
      # Add --encoding=latin-1 to the following if your CSVs use a different encoding:
      sqlite-utils insert .data/data.db ${f%.*} $f --csv
    done

So if you edit that file in your own project and change the line to this:

  sqlite-utils insert .data/data.db ${f%.*} $f --csv --encoding=iso-8859-1

It should fix this for you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette with Glitch - is it possible to use CSV with ISO-8859-1 encoding? 925406964  
864476167 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-864476167 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2NDQ3NjE2Nw== simonw 9599 2021-06-19T23:36:48Z 2021-06-19T23:36:48Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
864419283 https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864419283 https://api.github.com/repos/simonw/sqlite-utils/issues/284 MDEyOklzc3VlQ29tbWVudDg2NDQxOTI4Mw== simonw 9599 2021-06-19T15:15:34Z 2021-06-19T15:15:34Z OWNER

I think this code is at fault: https://github.com/simonw/sqlite-utils/blob/5b257949d996fe43dc5d218d4308b88796a90740/sqlite_utils/db.py#L1017-L1023

It's using .pks which adds rowid if it's missing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.transform(types=) turns rowid into a concrete column 925320167  
864418795 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864418795 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxODc5NQ== simonw 9599 2021-06-19T15:11:05Z 2021-06-19T15:11:14Z OWNER

Actually I'm going to go with use_rowid instead - because the table doesn't inherently use a rowid itself, but you should use one if you want to query it in a way that gives you back a primary key.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864418188 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864418188 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxODE4OA== simonw 9599 2021-06-19T15:05:53Z 2021-06-19T15:05:53Z OWNER
    @property
    def uses_rowid(self):
        return not any(column for column in self.columns if column.is_pk)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864417808 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864417808 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxNzgwOA== simonw 9599 2021-06-19T15:03:00Z 2021-06-19T15:03:00Z OWNER

I think I like table.uses_rowid best - it reads well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864417765 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864417765 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxNzc2NQ== simonw 9599 2021-06-19T15:02:42Z 2021-06-19T15:02:42Z OWNER

Some options:

  • table.rowid_only
  • table.rowid_as_pk
  • table.no_pks
  • table.no_pk
  • table.uses_rowid
  • table.use_rowid
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864417493 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864417493 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxNzQ5Mw== simonw 9599 2021-06-19T15:00:43Z 2021-06-19T15:00:43Z OWNER

I have to be careful about the language I use here. Here's the official definition: https://www.sqlite.org/rowidtable.html

A "rowid table" is any table in an SQLite schema that is not a virtual table and is not a WITHOUT ROWID table.

Most tables in a typical SQLite database schema are rowid tables.

Rowid tables are distinguished by the fact that they all have a unique, non-NULL, signed 64-bit integer rowid that is used as the access key for the data in the underlying B-tree storage engine.

So it's not correct to call a table a "rowid table" only if it is missing its own primary keys.

Maybe table.has_rowid is the right language to use here? No, that's no good - because tables with their own primary keys usually also have a rowid.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864417133 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864417133 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxNzEzMw== simonw 9599 2021-06-19T14:57:36Z 2021-06-19T14:57:36Z OWNER

So the logic is:

[column.name for column in self.columns if column.is_pk]

I need to decide on a property name. Existing names are documented here: https://sqlite-utils.datasette.io/en/stable/python-api.html#introspection

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864417031 https://github.com/simonw/sqlite-utils/issues/285#issuecomment-864417031 https://api.github.com/repos/simonw/sqlite-utils/issues/285 MDEyOklzc3VlQ29tbWVudDg2NDQxNzAzMQ== simonw 9599 2021-06-19T14:56:45Z 2021-06-19T14:56:45Z OWNER
>>> db = sqlite_utils.Database(memory=True)
>>> db["rowid_table"].insert({"name": "Cleo"})
<Table rowid_table (name)>
>>> db["regular_table"].insert({"id": 1, "name": "Cleo"}, pk="id")
<Table regular_table (id, name)>
>>> db["rowid_table"].pks
['rowid']
>>> db["regular_table"].pks
['id']

But that's because the .pks property hides the difference: https://github.com/simonw/sqlite-utils/blob/dc94f4bb8cfe922bb2f9c89f8f0f29092ea63133/sqlite_utils/db.py#L805-L810

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Introspection property for telling if a table is a rowid table 925410305  
864416911 https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864416911 https://api.github.com/repos/simonw/sqlite-utils/issues/284 MDEyOklzc3VlQ29tbWVudDg2NDQxNjkxMQ== simonw 9599 2021-06-19T14:55:45Z 2021-06-19T14:55:45Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.transform(types=) turns rowid into a concrete column 925320167  
864416785 https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864416785 https://api.github.com/repos/simonw/sqlite-utils/issues/284 MDEyOklzc3VlQ29tbWVudDg2NDQxNjc4NQ== simonw 9599 2021-06-19T14:54:41Z 2021-06-19T14:54:41Z OWNER
>>> db = sqlite_utils.Database(memory=True)
>>> db["rowid_table"].insert({"name": "Cleo"})
<Table rowid_table (name)>
>>> db["regular_table"].insert({"id": 1, "name": "Cleo"}, pk="id")
<Table regular_table (id, name)>
>>> db["rowid_table"].pks
['rowid']
>>> db["regular_table"].pks
['id']

I think I need an introspection property for working out if a table is a rowid table or not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.transform(types=) turns rowid into a concrete column 925320167  
864416086 https://github.com/simonw/sqlite-utils/issues/283#issuecomment-864416086 https://api.github.com/repos/simonw/sqlite-utils/issues/283 MDEyOklzc3VlQ29tbWVudDg2NDQxNjA4Ng== simonw 9599 2021-06-19T14:49:06Z 2021-06-19T14:49:13Z OWNER

Once again, this is difficult because of the use of a generator here - rows_from_file() only yields rows, so there is no obvious mechanism for it to communicate back to the wrapping code that the detected format was CSV or TSV as opposed to JSON.

I'm going to change rows_from_file() to return a (generator, detected_format) tuple.
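A minimal sketch of that return shape (hypothetical code assuming a buffered binary stream, not the shipped implementation): detect the format up front, then hand back the row generator together with the detected format so callers learn the format without exhausting the generator.

```python
import csv
import io
import json

def rows_from_file(fp, format=None):
    # Sketch only: peek at leading bytes to guess the format, then return
    # a (row_generator, detected_format) tuple instead of just yielding rows.
    # Assumes fp supports peek() (e.g. an io.BufferedReader).
    if format is None:
        first = fp.peek(64).lstrip()
        format = "json" if first[:1] in (b"[", b"{") else "csv"
    if format == "json":
        rows = iter(json.load(fp))
    else:
        rows = iter(csv.DictReader(io.TextIOWrapper(fp, encoding="utf-8")))
    return rows, format
```

Callers can then unpack `rows, detected = rows_from_file(fp)` and branch on `detected` before consuming any rows.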

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
memory: Shouldn't detect types for JSON 925319214  
864358951 https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864358951 https://api.github.com/repos/simonw/sqlite-utils/issues/284 MDEyOklzc3VlQ29tbWVudDg2NDM1ODk1MQ== simonw 9599 2021-06-19T05:30:00Z 2021-06-19T05:30:00Z OWNER

If this can be fixed it will be in the transform_sql() method.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.transform(types=) turns rowid into a concrete column 925320167  
864358680 https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864358680 https://api.github.com/repos/simonw/sqlite-utils/issues/284 MDEyOklzc3VlQ29tbWVudDg2NDM1ODY4MA== simonw 9599 2021-06-19T05:27:13Z 2021-06-19T05:27:13Z OWNER

How easy is it to detect a rowid table? Is it as simple as .pks returning None? If so the documentation should mention that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.transform(types=) turns rowid into a concrete column 925320167  
864354627 https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864354627 https://api.github.com/repos/simonw/sqlite-utils/issues/282 MDEyOklzc3VlQ29tbWVudDg2NDM1NDYyNw== simonw 9599 2021-06-19T04:42:03Z 2021-06-19T04:42:03Z OWNER

Demo:

curl -s 'https://api.github.com/users/simonw/repos?per_page=100' | \
  sqlite-utils memory - 'select sum(size), sum(stargazers_count) from stdin limit 1'
[{"sum(size)": 2042547, "sum(stargazers_count)": 6769}]
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Automatic type detection for CSV data 925305186  
864350407 https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864350407 https://api.github.com/repos/simonw/sqlite-utils/issues/282 MDEyOklzc3VlQ29tbWVudDg2NDM1MDQwNw== simonw 9599 2021-06-19T03:52:20Z 2021-06-19T03:52:20Z OWNER

I'll have an environment variable for --detect-types so users who really want that as the default option can turn it on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Automatic type detection for CSV data 925305186  
864349123 https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864349123 https://api.github.com/repos/simonw/sqlite-utils/issues/282 MDEyOklzc3VlQ29tbWVudDg2NDM0OTEyMw== simonw 9599 2021-06-19T03:36:54Z 2021-06-19T03:36:54Z OWNER

I may change the default for sqlite-utils insert to detect types if I release sqlite-utils 4.0, as a backwards-incompatible change.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Automatic type detection for CSV data 925305186  
864349066 https://github.com/simonw/sqlite-utils/issues/179#issuecomment-864349066 https://api.github.com/repos/simonw/sqlite-utils/issues/179 MDEyOklzc3VlQ29tbWVudDg2NDM0OTA2Ng== simonw 9599 2021-06-19T03:36:04Z 2021-06-19T03:36:04Z OWNER

This work is going to happen in #282.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils transform/insert --detect-types 709577625  
864348954 https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864348954 https://api.github.com/repos/simonw/sqlite-utils/issues/282 MDEyOklzc3VlQ29tbWVudDg2NDM0ODk1NA== simonw 9599 2021-06-19T03:34:42Z 2021-06-19T03:35:46Z OWNER

I built some prototype code here for something which looks at every row in a CSV import and records the likely types: https://gist.github.com/simonw/465f9356f175d1cf86957947dff501d4

This could be used by the command-line tools to figure out what table.transform(types=...) method to use at the end.

This is a different approach to the pure SQL version I tried building in https://github.com/simonw/sqlite-utils/issues/179 - I think this is a better approach though, it's less prone to weird idiosyncrasies of SQLite types, and it's also easy for us to add on to the existing CSV import code in a way that won't require scanning the data twice.
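The general idea can be sketched like this (a simplification for illustration, not the code from the linked gist): for each column, track the narrowest of int/float/str that every value seen so far can be parsed as, widening as new values arrive.

```python
def suggest_types(rows):
    # Sketch: order of candidate types from narrowest to widest
    order = [int, float, str]
    seen = {}  # column name -> index into order (the widest type needed so far)
    for row in rows:
        for col, value in row.items():
            best = seen.get(col, 0)
            # Widen until the value parses (str always succeeds)
            while best < 2:
                try:
                    order[best](value)
                    break
                except (ValueError, TypeError):
                    best += 1
            seen[col] = best
    return {col: order[i] for col, i in seen.items()}
```

A single pass over the rows yields a per-column type suggestion that could then feed `table.transform(types=...)`.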

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Automatic type detection for CSV data 925305186  
864330508 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864330508 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDMzMDUwOA== simonw 9599 2021-06-19T00:34:24Z 2021-06-19T00:34:24Z OWNER

Got this working:

% curl 'https://api.github.com/repos/simonw/datasette/issues' | sqlite-utils memory - 'select id from stdin'
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864328927 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDMyODkyNw== simonw 9599 2021-06-19T00:25:08Z 2021-06-19T00:25:17Z OWNER

I tried writing this function with type hints, but eventually gave up:

def rows_from_file(
    fp: BinaryIO,
    format: Optional[Format] = None,
    dialect: Optional[Type[csv.Dialect]] = None,
    encoding: Optional[str] = None,
) -> Generator[dict, None, None]:
    if format == Format.JSON:
        decoded = json.load(fp)
        if isinstance(decoded, dict):
            decoded = [decoded]
        if not isinstance(decoded, list):
            raise RowsFromFileBadJSON("JSON must be a list or a dictionary")
        yield from decoded
    elif format == Format.CSV:
        decoded_fp = io.TextIOWrapper(fp, encoding=encoding or "utf-8-sig")
        yield from csv.DictReader(decoded_fp)
    elif format == Format.TSV:
        yield from rows_from_file(
            fp, format=Format.CSV, dialect=csv.excel_tab, encoding=encoding
        )
    elif format is None:
        # Detect the format, then call this recursively
        buffered = io.BufferedReader(fp, buffer_size=4096)
        first_bytes = buffered.peek(2048).strip()
        if first_bytes[0] in (b"[", b"{"):
            # TODO: Detect newline-JSON
            yield from rows_from_file(fp, format=Format.JSON)
        else:
            dialect = csv.Sniffer().sniff(first_bytes.decode(encoding, "ignore"))
            yield from rows_from_file(
                fp, format=Format.CSV, dialect=dialect, encoding=encoding
            )
    else:
        raise RowsFromFileError("Bad format")

mypy said:

sqlite_utils/utils.py:157: error: Argument 1 to "BufferedReader" has incompatible type "BinaryIO"; expected "RawIOBase"
sqlite_utils/utils.py:163: error: Argument 1 to "decode" of "bytes" has incompatible type "Optional[str]"; expected "str"
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864323438 https://github.com/simonw/sqlite-utils/issues/281#issuecomment-864323438 https://api.github.com/repos/simonw/sqlite-utils/issues/281 MDEyOklzc3VlQ29tbWVudDg2NDMyMzQzOA== simonw 9599 2021-06-18T23:55:06Z 2021-06-18T23:55:06Z OWNER

The -:json idea is flawed: Click thinks that's the syntax for an option called :json.

I'm going to do stdin:json - which means you can't open a file called stdin - but you could use cat stdin | sqlite-utils memory stdin:json ... instead which is an OK workaround.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for explicitly stating CSV or JSON or TSV for sqlite-utils memory 924992318  
864208476 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864208476 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDIwODQ3Ng== simonw 9599 2021-06-18T18:30:08Z 2021-06-18T23:30:19Z OWNER

So maybe this is a function which can either be told the format or, if none is provided, it detects one for itself.

def rows_from_file(fp, format=None):
    # ...
    yield from rows
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864207841 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864207841 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDIwNzg0MQ== simonw 9599 2021-06-18T18:28:40Z 2021-06-18T18:28:46Z OWNER
def detect_format(fp):
    # ...
    return "csv", fp, dialect
    # or
    return "json", fp, parsed_data
    # or
    return "json-nl", fp, docs

The mixed return types here are ugly. In all of these cases what we really want is to return a generator of {...} objects. So maybe it returns that instead.

def filepointer_to_documents(fp):
    # ...
    yield from documents

I can refactor sqlite-utils insert to use this new code too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864206308 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864206308 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDIwNjMwOA== simonw 9599 2021-06-18T18:25:04Z 2021-06-18T18:25:04Z OWNER

Or... since I'm not using a streaming JSON parser at the moment, if I think something is JSON I can load the entire thing into memory to validate it.

I still need to detect newline-delimited JSON. For that I can consume the first line of the input to see if it's a valid JSON object, then maybe sniff the second line too?

This does mean that if the input is a single line of GIANT JSON it will all be consumed into memory at once, but that's going to happen anyway.

So I need a function which, given a file pointer, consumes from it, detects the type, then returns that type AND a file pointer to the beginning of the file again. I can use io.BufferedReader for this.
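That function can be sketched as follows (invented names, assuming a binary file pointer): `peek()` returns upcoming bytes without advancing the stream, so the wrapped reader handed back to the caller is still positioned at the beginning.

```python
import io

def detect_format_and_rewind(fp):
    # Sketch: wrap the stream so leading bytes can be inspected without
    # being consumed - peek() returns data but leaves the position at 0
    buffered = io.BufferedReader(fp)
    first_bytes = buffered.peek(2048).lstrip()
    if first_bytes[:1] in (b"[", b"{"):
        return "json", buffered
    return "csv", buffered
```

Note this relies on `io.BufferedReader` accepting non-raw streams at runtime, which works in CPython even though type checkers object to it.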

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864129273 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864129273 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDEyOTI3Mw== simonw 9599 2021-06-18T15:47:47Z 2021-06-18T15:47:47Z OWNER

Detecting valid JSON is tricky - just because a stream starts with [ or { doesn't mean the entire stream is valid JSON. You need to parse the entire stream to determine that for sure.

One way to solve this would be with a custom state machine. Another would be to use the ijson streaming parser - annoyingly it throws the same exception class for invalid JSON for different reasons, but the e.args[0] for that exception includes human-readable text about the error - if it's anything other than parse error: premature EOF then it probably means the JSON was invalid.
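To illustrate the underlying point with just the standard library (ijson is the streaming option mentioned above; this non-streaming version loads the whole input): a stream can start like valid JSON and still fail only once the entire thing is parsed.

```python
import json

def is_valid_json(text):
    # Only a complete parse can confirm validity - a promising prefix proves nothing
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json('[{"a": 1}]'))  # True
print(is_valid_json('[{"a": 1}'))   # starts with "[" but is truncated: False
```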

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864128489 https://github.com/simonw/sqlite-utils/issues/278#issuecomment-864128489 https://api.github.com/repos/simonw/sqlite-utils/issues/278 MDEyOklzc3VlQ29tbWVudDg2NDEyODQ4OQ== simonw 9599 2021-06-18T15:46:24Z 2021-06-18T15:46:24Z OWNER

A workaround could be to define a bash or zsh alias of some sort.
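For instance (a sketch with hypothetical names - `dogs.db` stands in for whatever database you use constantly; a shell function is used rather than an alias so it also works in scripts):

```shell
# Bake the database path into a short command; extra arguments pass through
dogs() {
    sqlite-utils dogs.db "$@"
}
# Usage: dogs tables        # equivalent to: sqlite-utils dogs.db tables
```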

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Support db as first parameter before subcommand, or as environment variable 923697888  
864126781 https://github.com/simonw/sqlite-utils/issues/278#issuecomment-864126781 https://api.github.com/repos/simonw/sqlite-utils/issues/278 MDEyOklzc3VlQ29tbWVudDg2NDEyNjc4MQ== simonw 9599 2021-06-18T15:43:19Z 2021-06-18T15:43:19Z OWNER

I don't think it's possible to do this without breaking backwards compatibility, unfortunately.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Support db as first parameter before subcommand, or as environment variable 923697888  
864103005 https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864103005 https://api.github.com/repos/simonw/sqlite-utils/issues/279 MDEyOklzc3VlQ29tbWVudDg2NDEwMzAwNQ== simonw 9599 2021-06-18T15:04:15Z 2021-06-18T15:04:15Z OWNER

To detect JSON, check to see if the stream starts with [ or { - maybe do something more sophisticated than that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory should handle TSV and JSON in addition to CSV 924990677  
864101267 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-864101267 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2NDEwMTI2Nw== simonw 9599 2021-06-18T15:01:41Z 2021-06-18T15:01:41Z OWNER

I'll split the remaining work out into separate issues.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
864099764 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-864099764 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2NDA5OTc2NA== simonw 9599 2021-06-18T14:59:27Z 2021-06-18T14:59:27Z OWNER

I'm going to merge this as-is and work on the JSON/TSV support in a separate issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
864092515 https://github.com/simonw/sqlite-utils/pull/277#issuecomment-864092515 https://api.github.com/repos/simonw/sqlite-utils/issues/277 MDEyOklzc3VlQ29tbWVudDg2NDA5MjUxNQ== simonw 9599 2021-06-18T14:47:57Z 2021-06-18T14:47:57Z OWNER

This is a neat improvement.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
add -h support closes #276 923612361  
863230355 https://github.com/simonw/datasette/pull/1378#issuecomment-863230355 https://api.github.com/repos/simonw/datasette/issues/1378 MDEyOklzc3VlQ29tbWVudDg2MzIzMDM1NQ== codecov[bot] 22429695 2021-06-17T13:16:17Z 2021-06-17T13:16:17Z NONE

Codecov Report

Merging #1378 (0c132d1) into main (83e9c8b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1378   +/-   ##
=======================================
  Coverage   91.68%   91.68%           
=======================================
  Files          34       34           
  Lines        4340     4340           
=======================================
  Hits         3979     3979           
  Misses        361      361           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83e9c8b...0c132d1. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Update pytest-xdist requirement from <2.3,>=2.2.1 to >=2.2.1,<2.4 923910375  
863205049 https://github.com/simonw/sqlite-utils/pull/277#issuecomment-863205049 https://api.github.com/repos/simonw/sqlite-utils/issues/277 MDEyOklzc3VlQ29tbWVudDg2MzIwNTA0OQ== codecov[bot] 22429695 2021-06-17T12:40:49Z 2021-06-17T12:40:49Z NONE

Codecov Report

Merging #277 (abbd324) into main (a19ce1a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #277   +/-   ##
=======================================
  Coverage   96.06%   96.06%           
=======================================
  Files           4        4           
  Lines        1828     1829    +1     
=======================================
+ Hits         1756     1757    +1     
  Misses         72       72           
| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| sqlite_utils/cli.py | 94.03% <100.00%> (+<0.01%) | :arrow_up: |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a19ce1a...abbd324. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
add -h support closes #276 923612361  
862817185 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862817185 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjgxNzE4NQ== codecov[bot] 22429695 2021-06-17T00:15:34Z 2021-06-17T00:15:34Z NONE

Codecov Report

:exclamation: No coverage uploaded for pull request base (main@78aebb6). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #273   +/-   ##
=======================================
  Coverage        ?   96.10%           
=======================================
  Files           ?        4           
  Lines           ?     1873           
  Branches        ?        0           
=======================================
  Hits            ?     1800           
  Misses          ?       73           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78aebb6...df7a37b. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862617165 https://github.com/simonw/sqlite-utils/issues/275#issuecomment-862617165 https://api.github.com/repos/simonw/sqlite-utils/issues/275 MDEyOklzc3VlQ29tbWVudDg2MjYxNzE2NQ== simonw 9599 2021-06-16T18:34:51Z 2021-06-16T18:34:51Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Enable code coverage 922955697  
862605436 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862605436 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjYwNTQzNg== simonw 9599 2021-06-16T18:19:05Z 2021-06-16T18:19:05Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862574390 https://github.com/simonw/sqlite-utils/issues/270#issuecomment-862574390 https://api.github.com/repos/simonw/sqlite-utils/issues/270 MDEyOklzc3VlQ29tbWVudDg2MjU3NDM5MA== frafra 4068 2021-06-16T17:34:49Z 2021-06-16T17:34:49Z NONE

Sorry, I got confused because SQLite has a JSON column type, even if it is treated as TEXT, and I thought automatic facets were available for JSON arrays stored as JSON only :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Cannot set type JSON 919314806  
862494864 https://github.com/simonw/sqlite-utils/issues/267#issuecomment-862494864 https://api.github.com/repos/simonw/sqlite-utils/issues/267 MDEyOklzc3VlQ29tbWVudDg2MjQ5NDg2NA== simonw 9599 2021-06-16T15:51:28Z 2021-06-16T16:26:15Z OWNER

I did add a slightly clumsy mechanism recently to help a bit here though: the pks_and_rows_where() method: https://sqlite-utils.datasette.io/en/stable/python-api.html#listing-rows-with-their-primary-keys

More details in the issue for that feature: #240

The idea here is that if you want to call update you need the primary key for the row - so you can do this:

for pk, row in db["sometable"].pks_and_rows_where():
    db["sometable"].update(pk, {"modified": 1})

The pk may end up as a single value or a tuple depending on if the table has a compound primary key - but you don't need to worry about that if you use this method as it will return the correct primary key value for you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
row.update() or row.pk 915421499  
862495803 https://github.com/simonw/sqlite-utils/issues/131#issuecomment-862495803 https://api.github.com/repos/simonw/sqlite-utils/issues/131 MDEyOklzc3VlQ29tbWVudDg2MjQ5NTgwMw== simonw 9599 2021-06-16T15:52:33Z 2021-06-16T15:52:33Z OWNER

I like -t or --type for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils insert: options for column types 675753042  
862493179 https://github.com/simonw/sqlite-utils/issues/267#issuecomment-862493179 https://api.github.com/repos/simonw/sqlite-utils/issues/267 MDEyOklzc3VlQ29tbWVudDg2MjQ5MzE3OQ== simonw 9599 2021-06-16T15:49:13Z 2021-06-16T15:49:13Z OWNER

The big challenge here is that the rows returned by this library aren't objects, they are Python dictionaries - so adding methods to them isn't possible without changing the type that is returned by these methods.

Part of the philosophy of the library is that it should make it as easy as possible to round-trip between Python dictionaries and SQLite table data, so I don't think adding methods like this is going to fit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
row.update() or row.pk 915421499  
862491721 https://github.com/simonw/sqlite-utils/issues/270#issuecomment-862491721 https://api.github.com/repos/simonw/sqlite-utils/issues/270 MDEyOklzc3VlQ29tbWVudDg2MjQ5MTcyMQ== simonw 9599 2021-06-16T15:47:06Z 2021-06-16T15:47:06Z OWNER

SQLite doesn't have a JSON column type - it has JSON processing functions, but they operate against TEXT columns - so there's nothing I can do here unfortunately.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Cannot set type JSON 919314806  
862491016 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862491016 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ5MTAxNg== simonw 9599 2021-06-16T15:46:13Z 2021-06-16T15:46:13Z OWNER

Columns from data imported from CSV in this way are currently treated as TEXT, which means numeric sorts and suchlike won't work as people might expect. It would be good to do automatic type detection here, see #179.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862485408 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862485408 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ4NTQwOA== simonw 9599 2021-06-16T15:38:58Z 2021-06-16T15:39:28Z OWNER

Also sqlite-utils memory reflects the existing sqlite-utils :memory: mechanism, which is a point in its favour.

And it helps emphasize that the file you are querying will be loaded into memory, so probably don't try this against a 1GB CSV file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862484557 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862484557 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ4NDU1Nw== simonw 9599 2021-06-16T15:37:51Z 2021-06-16T15:38:34Z OWNER

I wonder if there's a better name for this than sqlite-utils memory?

  • sqlite-utils memory hello.csv "select * from hello"
  • sqlite-utils mem hello.csv "select * from hello"
  • sqlite-utils temp hello.csv "select * from hello"
  • sqlite-utils adhoc hello.csv "select * from hello"
  • sqlite-utils scratch hello.csv "select * from hello"

I think memory is best. I don't like the others, except for scratch which is OK.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862479704 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862479704 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ3OTcwNA== simonw 9599 2021-06-16T15:31:31Z 2021-06-16T15:31:31Z OWNER

Plus, could I make this change to sqlite-utils query without breaking backwards compatibility? Adding a new sqlite-utils memory command is completely safe from that perspective.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862478881 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862478881 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ3ODg4MQ== simonw 9599 2021-06-16T15:30:24Z 2021-06-16T15:30:24Z OWNER

But... sqlite-utils my.csv "select * from my" is a much more compelling initial experience than sqlite-utils memory my.csv "select * from my".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862475685 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862475685 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ3NTY4NQ== simonw 9599 2021-06-16T15:26:19Z 2021-06-16T15:29:38Z OWNER

Here's a radical idea: what if I combined sqlite-utils memory into sqlite-utils query?

The trick here would be to detect if the arguments passed on the command-line refer to SQLite databases or if they refer to CSV/JSON data that should be imported into temporary tables.

Detecting a SQLite database file is actually really easy - they all start with the same binary string:

>>> open("my.db", "rb").read(100)
b'SQLite format 3\x00...

(Need to carefully check that a CSV file with SQLite format 3 as the first column name doesn't accidentally get interpreted as a SQLite DB though).

So then what would the semantics of sqlite-utils query (which is also the default command) be?

  • sqlite-utils mydb.db "select * from x"
  • sqlite-utils my.csv "select * from my"
  • sqlite-utils mydb.db my.csv "select * from mydb.x join my on ..." - this is where it gets weird. We can't import the CSV data directly into mydb.db - it's supposed to go into the in-memory database - so now we need to start using database aliases like mydb.x because we passed at least one other file?

The complexity here is definitely in the handling of a combination of SQLite database files and CSV filenames. Also, sqlite-utils query doesn't accept multiple filenames at the moment, so that will change.

I'm not 100% sold on this as being better than having a separate sqlite-utils memory command, as seen in #273.
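The header check itself can be sketched in a few lines (a hypothetical helper, not code from the repo):

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # the first 16 bytes of every SQLite database file

def is_sqlite_db(path):
    # Peek at the on-disk header rather than trusting the file extension
    with open(path, "rb") as fp:
        return fp.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

Since the 16-byte magic string ends in a NUL byte, an ordinary text file - even a CSV whose first column is literally named "SQLite format 3" - is very unlikely to match, which helps with the false-positive concern above.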

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862046009 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862046009 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjA0NjAwOQ== simonw 9599 2021-06-16T05:15:38Z 2021-06-16T05:15:38Z OWNER

I'm going to add a --encoding option - it will affect ALL CSV input files, so if you have CSV files with different encodings you'll need to sort that mess out yourself (likely by importing each CSV file separately into a database using sqlite-utils insert with different --encoding values).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862045639 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862045639 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjA0NTYzOQ== simonw 9599 2021-06-16T05:14:38Z 2021-06-16T05:14:38Z OWNER

Can't share much code though since a bunch of that insert stuff is specific to that command - showing progress bars, returning errors on illegal option combinations etc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862045438 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862045438 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjA0NTQzOA== simonw 9599 2021-06-16T05:14:00Z 2021-06-16T05:14:00Z OWNER

I should probably refactor the CSV/JSON/loading stuff into a function in utils.py in order to share some of the implementation with the existing sqlite-utils insert code: https://github.com/simonw/sqlite-utils/blob/287cdcae8908916687f2ecccc87c38549d004ac6/sqlite_utils/cli.py#L691-L734

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862043974 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862043974 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjA0Mzk3NA== simonw 9599 2021-06-16T05:10:12Z 2021-06-16T05:10:12Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862042110 https://github.com/simonw/sqlite-utils/pull/273#issuecomment-862042110 https://api.github.com/repos/simonw/sqlite-utils/issues/273 MDEyOklzc3VlQ29tbWVudDg2MjA0MjExMA== simonw 9599 2021-06-16T05:05:51Z 2021-06-16T05:06:11Z OWNER

Initial documentation is here: https://github.com/simonw/sqlite-utils/blob/c7234cae8336b8525034e8f917d82dd0699abd42/docs/cli.rst#running-queries-directly-against-csv-data

It only talks about CSV at the moment - needs to be updated to mention JSON too once that is implemented.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils memory command for directly querying CSV/JSON data 922099793  
862040971 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862040971 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjA0MDk3MQ== simonw 9599 2021-06-16T05:02:56Z 2021-06-16T05:02:56Z OWNER

Moving this to a PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862040906 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862040906 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjA0MDkwNg== simonw 9599 2021-06-16T05:02:47Z 2021-06-16T05:02:47Z OWNER

Got a prototype working!

 % curl -s 'https://fivethirtyeight.datasettes.com/polls/president_approval_polls.csv?_size=max&_stream=1' | sqlite-utils memory - 'select * from t limit 5' --nl 
{"rowid": "1", "question_id": "139304", "poll_id": "74225", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "568", "pollster": "YouGov", "sponsor_ids": "352", "sponsors": "Economist", "display_name": "YouGov", "pollster_rating_id": "391", "pollster_rating_name": "YouGov", "fte_grade": "B", "sample_size": "1500", "population": "a", "population_full": "a", "methodology": "Online", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://docs.cdn.yougov.com/y9zsit5bzd/weeklytrackingreport.pdf", "source": "538", "yes": "42.0", "no": "53.0"}
{"rowid": "2", "question_id": "139305", "poll_id": "74225", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "568", "pollster": "YouGov", "sponsor_ids": "352", "sponsors": "Economist", "display_name": "YouGov", "pollster_rating_id": "391", "pollster_rating_name": "YouGov", "fte_grade": "B", "sample_size": "1155", "population": "rv", "population_full": "rv", "methodology": "Online", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://docs.cdn.yougov.com/y9zsit5bzd/weeklytrackingreport.pdf", "source": "538", "yes": "44.0", "no": "55.0"}
{"rowid": "3", "question_id": "139306", "poll_id": "74226", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "23", "pollster": "American Research Group", "sponsor_ids": "", "sponsors": "", "display_name": "American Research Group", "pollster_rating_id": "9", "pollster_rating_name": "American Research Group", "fte_grade": "B", "sample_size": "1100", "population": "a", "population_full": "a", "methodology": "Live Phone", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://americanresearchgroup.com/economy/", "source": "538", "yes": "30.0", "no": "66.0"}
{"rowid": "4", "question_id": "139307", "poll_id": "74226", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "23", "pollster": "American Research Group", "sponsor_ids": "", "sponsors": "", "display_name": "American Research Group", "pollster_rating_id": "9", "pollster_rating_name": "American Research Group", "fte_grade": "B", "sample_size": "990", "population": "rv", "population_full": "rv", "methodology": "Live Phone", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://americanresearchgroup.com/economy/", "source": "538", "yes": "29.0", "no": "67.0"}
{"rowid": "5", "question_id": "139298", "poll_id": "74224", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "1528", "pollster": "AtlasIntel", "sponsor_ids": "", "sponsors": "", "display_name": "AtlasIntel", "pollster_rating_id": "546", "pollster_rating_name": "AtlasIntel", "fte_grade": "B/C", "sample_size": "5188", "population": "a", "population_full": "a", "methodology": "Online", "start_date": "1/15/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/19/21 21:52", "notes": "", "url": "https://projects.fivethirtyeight.com/polls/20210119_US_Atlas2.pdf", "source": "538", "yes": "44.6", "no": "53.9"}
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
862018937 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862018937 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjAxODkzNw== simonw 9599 2021-06-16T03:59:28Z 2021-06-16T04:00:05Z OWNER

Mainly for debugging purposes it would be useful to be able to save the created in-memory database back to a file again later. This could be done with:

sqlite-utils memory blah.csv --save saved.db

Can use .iterdump() to implement this: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.iterdump

Maybe instead (or as-well-as) offer --dump which dumps out the SQL from that.
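A rough sketch of how --save and --dump could work on top of .iterdump() (the blah table and saved.db filename are placeholders for the CSV-loaded in-memory database):

```python
import os
import sqlite3
import tempfile

# Stand-in for the in-memory database built from CSV input
mem = sqlite3.connect(":memory:")
mem.execute("create table blah (id integer, name text)")
mem.executemany("insert into blah values (?, ?)", [(1, "one"), (2, "two")])
mem.commit()

# --dump: emit the schema and data as SQL text
dump_sql = "\n".join(mem.iterdump())
print(dump_sql)

# --save saved.db: replay that SQL into a database file on disk
saved_path = os.path.join(tempfile.mkdtemp(), "saved.db")
disk = sqlite3.connect(saved_path)
disk.executescript(dump_sql)
disk.commit()
print(disk.execute("select count(*) from blah").fetchone()[0])  # 2
```

iterdump() wraps its output in BEGIN TRANSACTION/COMMIT, which executescript() handles fine, so --save is essentially --dump piped back into a new connection.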

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861989987 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861989987 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4OTk4Nw== simonw 9599 2021-06-16T02:34:21Z 2021-06-16T02:34:21Z OWNER

The documentation already covers this

$ sqlite-utils :memory: "select sqlite_version()"
[{"sqlite_version()": "3.29.0"}]

https://sqlite-utils.datasette.io/en/latest/cli.html#running-queries-and-returning-json

sqlite-utils memory "select sqlite_version()" is a little bit more intuitive than that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861987651 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861987651 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4NzY1MQ== simonw 9599 2021-06-16T02:27:20Z 2021-06-16T02:27:20Z OWNER

Solution: sqlite-utils memory - attempts to detect the input: if it starts with a { or [ it's likely JSON, otherwise fall back to the csv.Sniffer() mechanism. Or you can use sqlite-utils memory -:csv to specifically indicate the type of input.
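That detection logic might look something like this sketch (detect_format is a hypothetical helper, not part of sqlite-utils):

```python
import csv

def detect_format(raw: str) -> str:
    # JSON almost always starts with { or [; otherwise try CSV sniffing
    stripped = raw.lstrip()
    if stripped.startswith(("{", "[")):
        return "json"
    try:
        csv.Sniffer().sniff(raw[:2048])
        return "csv"
    except csv.Error:
        return "unknown"

print(detect_format('[{"id": 1, "name": "one"}]'))  # json
print(detect_format("id,name\n1,one\n2,two\n"))     # csv
```

Sniffing only the first couple of kilobytes keeps this cheap even when a large file is streamed in via stdin.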

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861985944 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861985944 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4NTk0NA== simonw 9599 2021-06-16T02:22:52Z 2021-06-16T02:22:52Z OWNER

Another option: allow an optional :suffix specifying the type of the file. If this is missing we detect based on the filename.

sqlite-utils memory somefile:csv "select * from somefile"

One catch: how to treat - for standard input?

cat blah.csv | sqlite-utils memory - "select * from stdin"

That's fine for CSV, but what about TSV or JSON or nl-JSON? Maybe this:

cat blah.csv | sqlite-utils memory -:json "select * from stdin"

Bit weird though. The alternative would be to support this:

cat blah.csv | sqlite-utils memory --load-csv -

But that's verbose compared to the version without the long --load-x option.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861984707 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861984707 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4NDcwNw== simonw 9599 2021-06-16T02:19:48Z 2021-06-16T02:19:48Z OWNER

This is going to need to be a separate command, for relatively non-obvious reasons.

sqlite-utils blah.db "select * from x"

Is equivalent to this, because query is the default sub-command:

sqlite-utils query blah.db "select * from x"

But... this means that making the filename optional doesn't actually work - because then this is ambiguous:

sqlite-utils --load-csv blah.csv "select * from blah"

So instead, I'm going to add a new sub-command. I'm currently thinking memory to reflect that this command operates on an in-memory database:

sqlite-utils memory --load-csv blah.csv "select * from blah"

I still think I need to use --load-csv rather than --csv because one interesting use-case for this is loading in CSV and converting it to JSON, or vice-versa.

Another option: allow multiple arguments which are filenames, and use the extension (or sniff the content) to decide what to do with them:

sqlite-utils memory blah.csv foo.csv "select * from foo join blah on ..."

This would require the last positional argument to always be a SQL query, and would treat all other positional arguments as files that should be imported into memory.
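The multiple-filenames idea - every positional argument except the final SQL query becomes a table in the in-memory database - could be sketched like this (load_csvs_into_memory is a hypothetical helper; it skips the type detection and type affinity handling a real implementation would need):

```python
import csv
import os
import sqlite3
import tempfile

def load_csvs_into_memory(paths):
    # One in-memory database; each file becomes a table named after
    # its filename (without the extension)
    conn = sqlite3.connect(":memory:")
    for path in paths:
        table = os.path.splitext(os.path.basename(path))[0]
        with open(path, newline="") as fp:
            rows = list(csv.reader(fp))
        headers, data = rows[0], rows[1:]
        conn.execute(
            "create table [{}] ({})".format(
                table, ", ".join("[{}]".format(h) for h in headers)
            )
        )
        conn.executemany(
            "insert into [{}] values ({})".format(
                table, ", ".join("?" * len(headers))
            ),
            data,
        )
    return conn

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "blah.csv"), "w") as fp:
    fp.write("id,name\n1,one\n2,two\n")
with open(os.path.join(tmp, "foo.csv"), "w") as fp:
    fp.write("id,score\n1,10\n")

conn = load_csvs_into_memory(
    [os.path.join(tmp, "blah.csv"), os.path.join(tmp, "foo.csv")]
)
print(
    conn.execute(
        "select blah.name, foo.score from blah join foo on blah.id = foo.id"
    ).fetchall()
)  # [('one', '10')]
```

Note everything lands as TEXT here - the score comes back as '10' - which is exactly the kind of thing the detection logic discussed elsewhere in this thread would improve on.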

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861944202 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861944202 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk0NDIwMg== eyeseast 25778 2021-06-16T01:41:03Z 2021-06-16T01:41:03Z NONE

So, I do things like this a lot, too. I like the idea of piping in from stdin. Something like this would be nice to do in a makefile:

cat file.csv | sqlite-utils --csv --table data - 'SELECT * FROM data WHERE col="whatever"' > filtered.csv

If you assumed that you're always piping out the same format you're piping in, the option names don't have to change. Depends how much you want to change formats.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861891835 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891835 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTgzNQ== simonw 9599 2021-06-15T23:09:31Z 2021-06-15T23:09:31Z OWNER

--load-csv and --load-json and --load-nl and --load-tsv are unambiguous.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861891693 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891693 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTY5Mw== simonw 9599 2021-06-15T23:09:08Z 2021-06-15T23:09:08Z OWNER

Problem: --csv and --json and --nl are already options for sqlite-utils query - need new non-conflicting names.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861891272 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891272 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTI3Mg== simonw 9599 2021-06-15T23:08:02Z 2021-06-15T23:08:02Z OWNER

--csv - should work though, for reading from stdin. The table can be called stdin.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861891110 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891110 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTExMA== simonw 9599 2021-06-15T23:07:38Z 2021-06-15T23:07:38Z OWNER

--csvt seems unnecessary to me: if people want to load different CSV files with the same filename (but in different directories) they will get an error unless they rename the files first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
861890689 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861890689 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MDY4OQ== simonw 9599 2021-06-15T23:06:37Z 2021-06-15T23:06:37Z OWNER

How about --json and --nl and --tsv too? Imitating the format options for sqlite-utils insert.

And what happens if you provide a filename too? I'm tempted to say that the --csv stuff still gets loaded into an in-memory database but it's given a name and can then be joined against using SQLite memory.blah syntax.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);