issue_comments

10 rows where "updated_at" is on date 2022-06-21 sorted by updated_at descending

issue 3

  • Option for importing CSV data using the SQLite .import mechanism 6
  • Incorrect syntax highlighting in docs CLI reference 3
  • Use Just to automate running tests and linters locally 1

user 1

  • simonw 10

author_association 1

  • OWNER 10
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1162234441 https://github.com/simonw/sqlite-utils/issues/446#issuecomment-1162234441 https://api.github.com/repos/simonw/sqlite-utils/issues/446 IC_kwDOCGYnMM5FRkpJ simonw 9599 2022-06-21T19:28:35Z 2022-06-21T19:28:35Z OWNER

just -l now does this:

    % just -l
    Available recipes:
        black         # Apply Black
        cog           # Rebuild docs with cog
        default       # Run tests and linters
        lint          # Run linters: black, flake8, mypy, cog
        test *options # Run pytest with supplied options

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Use Just to automate running tests and linters locally 1277328147  
1162231111 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162231111 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM5FRj1H simonw 9599 2022-06-21T19:25:44Z 2022-06-21T19:25:44Z OWNER

Pushed that prototype to a branch.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
1162223668 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162223668 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM5FRiA0 simonw 9599 2022-06-21T19:19:22Z 2022-06-21T19:22:15Z OWNER

Built a prototype of --fast for the sqlite-utils memory command:

```
% time sqlite-utils memory taxi.csv 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' --fast
passenger_count  COUNT(*)  AVG(total_amount)
                 128020    32.2371511482553
0                42228     17.0214016766151
1                1533197   17.6418833067999
2                286461    18.0975870711456
3                72852     17.9153958710923
4                25510     18.452774990196
5                50291     17.2709248175672
6                32623     17.6002964166367
7                2         87.17
8                2         95.705
9                1         113.6
sqlite-utils memory taxi.csv --fast  12.71s user 0.48s system 104% cpu 12.627 total
```

Takes 13s - about the same time as calling `sqlite3 :memory: ...` directly, as seen in https://til.simonwillison.net/sqlite/one-line-csv-operations

Without the --fast option that takes several minutes (262s, roughly 4m20s)!

Here's the prototype so far:

```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..1c83ef6 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -14,6 +14,8 @@ import io
 import itertools
 import json
 import os
+import shutil
+import subprocess
 import sys
 import csv as csv_std
 import tabulate
@@ -1669,6 +1671,7 @@ def query(
     is_flag=True,
     help="Analyze resulting tables and output results",
 )
+@click.option("--fast", is_flag=True, help="Fast mode, only works with CSV and TSV")
 @load_extension_option
 def memory(
     paths,
@@ -1692,6 +1695,7 @@ def memory(
     save,
     analyze,
     load_extension,
+    fast,
 ):
     """Execute SQL query against an in-memory database, optionally populated by imported data
 
@@ -1719,6 +1723,22 @@ def memory(
     \b
         sqlite-utils memory animals.csv --schema
     """
+    if fast:
+        if (
+            attach
+            or flatten
+            or param
+            or encoding
+            or no_detect_types
+            or analyze
+            or load_extension
+        ):
+            raise click.ClickException(
+                "--fast mode does not support any of the following options: --attach, --flatten, --param, --encoding, --no-detect-types, --analyze, --load-extension"
+            )
+        # TODO: Figure out and pass other supported options
+        memory_fast(paths, sql)
+        return
     db = sqlite_utils.Database(memory=True)
     # If --dump or --save or --analyze used but no paths detected, assume SQL query is a path:
     if (dump or save or schema or analyze) and not paths:
@@ -1791,6 +1811,33 @@ def memory(
     )
 
 
+def memory_fast(paths, sql):
+    if not shutil.which("sqlite3"):
+        raise click.ClickException("sqlite3 not found in PATH")
+    args = ["sqlite3", ":memory:", "-cmd", ".mode csv"]
+    table_names = []
+
+    def name(path):
+        base_name = pathlib.Path(path).stem or "t"
+        table_name = base_name
+        prefix = 1
+        while table_name in table_names:
+            prefix += 1
+            table_name = "{}_{}".format(base_name, prefix)
+        return table_name
+
+    for path in paths:
+        table_name = name(path)
+        table_names.append(table_name)
+        args.extend(
+            ["-cmd", ".import {} {}".format(pathlib.Path(path).resolve(), table_name)]
+        )
+
+    args.extend(["-cmd", ".mode column"])
+    args.append(sql)
+    subprocess.run(args)
+
+
 def _execute_query(
     db, sql, param, raw, table, csv, tsv, no_headers, fmt, nl, arrays, json_cols
 ):
```
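
The table-naming step in the prototype can be tried on its own. This is a minimal standalone sketch that follows the same logic as the `name()` helper in the diff; the function name `unique_table_names` and the example paths are hypothetical and not part of sqlite-utils.

```python
import pathlib


def unique_table_names(paths):
    """Derive one table name per path, de-duplicating repeated file stems."""
    table_names = []
    for path in paths:
        # Fall back to "t" when the path has no usable stem
        base_name = pathlib.Path(path).stem or "t"
        table_name = base_name
        suffix = 1
        # Append _2, _3, ... until the name is not already taken
        while table_name in table_names:
            suffix += 1
            table_name = "{}_{}".format(base_name, suffix)
        table_names.append(table_name)
    return table_names


print(unique_table_names(["2021/taxi.csv", "2022/taxi.csv", "zones.tsv"]))
# ['taxi', 'taxi_2', 'zones']
```

Two files with the same stem therefore end up as separate tables (`taxi` and `taxi_2`) instead of being imported into one.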

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
1162186856 https://github.com/simonw/sqlite-utils/issues/447#issuecomment-1162186856 https://api.github.com/repos/simonw/sqlite-utils/issues/447 IC_kwDOCGYnMM5FRZBo simonw 9599 2022-06-21T18:48:46Z 2022-06-21T18:48:46Z OWNER

That fixed it:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect syntax highlighting in docs CLI reference 1278571700  
1162179354 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162179354 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM5FRXMa simonw 9599 2022-06-21T18:44:03Z 2022-06-21T18:44:03Z OWNER

The thing I like about that --fast option is that it could selectively use this alternative mechanism just for the files for which it can work (CSV and TSV files). I could also add a --fast option to sqlite-utils memory which could then kick in only for operations that involve just TSV and CSV files.
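
One way the "kick in only for CSV and TSV files" check might look is sketched below. It assumes the format can be judged from the file extension alone, which is a simplification; `can_use_fast` and `FAST_SUFFIXES` are hypothetical names, not existing sqlite-utils code.

```python
import pathlib

# Assumption: the file format is inferred from the extension only
FAST_SUFFIXES = {".csv", ".tsv"}


def can_use_fast(paths):
    """Return True only if every input path looks like a CSV or TSV file."""
    return bool(paths) and all(
        pathlib.Path(path).suffix.lower() in FAST_SUFFIXES for path in paths
    )


print(can_use_fast(["taxi.csv", "zones.TSV"]))
print(can_use_fast(["taxi.csv", "dogs.json"]))
```

A single non-CSV/TSV input makes the check fail, so the caller could fall back to the regular import path for the whole operation.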

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
1161869859 https://github.com/simonw/sqlite-utils/issues/447#issuecomment-1161869859 https://api.github.com/repos/simonw/sqlite-utils/issues/447 IC_kwDOCGYnMM5FQLoj simonw 9599 2022-06-21T15:00:42Z 2022-06-21T15:00:42Z OWNER

Deploying that to https://sqlite-utils.datasette.io/en/latest/cli-reference.html#insert

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect syntax highlighting in docs CLI reference 1278571700  
1161857806 https://github.com/simonw/sqlite-utils/issues/447#issuecomment-1161857806 https://api.github.com/repos/simonw/sqlite-utils/issues/447 IC_kwDOCGYnMM5FQIsO simonw 9599 2022-06-21T14:55:51Z 2022-06-21T14:58:14Z OWNER

https://stackoverflow.com/a/44379513 suggests that the fix is:

.. code-block:: text

Or set this in conf.py:

highlight_language = "none"

I like that better - I don't like that all :: blocks default to being treated as Python code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect syntax highlighting in docs CLI reference 1278571700  
1161849874 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1161849874 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM5FQGwS simonw 9599 2022-06-21T14:49:12Z 2022-06-21T14:49:12Z OWNER

Since there are all sorts of existing options for sqlite-utils insert that won't work with this, maybe it would be better to have an entirely separate command - this for example:

sqlite-utils fast-insert data.db mytable data.csv
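
A rough sketch of what such a command could do internally, assuming it shells out to the `sqlite3` binary in the same way as the one-line CSV trick. The `fast-insert` command does not exist; `fast_insert_args` and `fast_insert` are hypothetical helpers, and paths containing spaces are not handled here.

```python
import pathlib
import shutil
import subprocess


def fast_insert_args(db_path, table, csv_path):
    """Build the sqlite3 command line that imports csv_path into table."""
    return [
        "sqlite3",
        str(db_path),
        "-cmd",
        ".mode csv",
        # The final argument is run as a dot-command after the -cmd options
        ".import {} {}".format(pathlib.Path(csv_path).resolve(), table),
    ]


def fast_insert(db_path, table, csv_path):
    """Run the import, failing early if the sqlite3 binary is unavailable."""
    if not shutil.which("sqlite3"):
        raise RuntimeError("sqlite3 not found in PATH")
    subprocess.run(fast_insert_args(db_path, table, csv_path), check=True)


print(fast_insert_args("data.db", "mytable", "data.csv"))
```

Keeping the argument construction separate from the subprocess call makes it possible to inspect the command without needing `sqlite3` installed.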
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
882052693 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052693 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM40kw5V simonw 9599 2021-07-18T12:57:54Z 2022-06-21T13:17:15Z OWNER

Another implementation option would be to use the CSV virtual table mechanism. This could avoid shelling out to the sqlite3 binary, but requires solving the harder problem of compiling and distributing a loadable SQLite module: https://www.sqlite.org/csv.html

(Would be neat to produce a Python wheel of this, see https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/)

This would also help solve the challenge of making this optimization available to the sqlite-utils memory command. That command operates against an in-memory database so it's not obvious how it could shell out to a binary.
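
For illustration, here is a sketch of how the virtual table route could look from Python's `sqlite3` module. It assumes a csv extension has already been compiled from SQLite's `csv.c`; the extension path is hypothetical, `csv_vtable_sql` and `attach_csv` are made-up helper names, and filenames containing quotes are not handled.

```python
import sqlite3


def csv_vtable_sql(table, filename, header=True):
    """Build a CREATE VIRTUAL TABLE statement for SQLite's csv module."""
    args = ["filename='{}'".format(filename)]
    if header:
        # Treat the first line of the file as column names
        args.append("header=YES")
    return "CREATE VIRTUAL TABLE temp.{} USING csv({})".format(table, ", ".join(args))


def attach_csv(conn, extension_path, table, filename):
    """Load a compiled csv extension and expose filename as a virtual table."""
    # Requires a Python build whose sqlite3 module supports loadable extensions
    conn.enable_load_extension(True)
    conn.load_extension(extension_path)  # hypothetical path, e.g. "./csv"
    conn.enable_load_extension(False)
    conn.execute(csv_vtable_sql(table, filename))


print(csv_vtable_sql("taxi", "taxi.csv"))
```

Because the extension is loaded into the same connection, this approach would work against an in-memory database without shelling out.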

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
1160991031 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1160991031 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM5FM1E3 simonw 9599 2022-06-21T00:35:20Z 2022-06-21T00:35:20Z OWNER

Relevant TIL: https://til.simonwillison.net/sqlite/one-line-csv-operations

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);