html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155310521,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155310521,IC_kwDOCGYnMM5E3KO5,9599,2022-06-14T14:58:50Z,2022-06-14T14:58:50Z,OWNER,"Interesting challenge in writing tests for this: if you give `csv.Sniffer` a short example with an invalid row in it sometimes it picks the wrong delimiter!
id,name\r\n1,Cleo,oops
It decided the delimiter there was `e`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155317293,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155317293,IC_kwDOCGYnMM5E3L4t,9599,2022-06-14T15:04:01Z,2022-06-14T15:04:01Z,OWNER,"I think that's unavoidable: it looks like `csv.Sniffer` only works if you feed it a CSV file with an equal number of values in each row, which is understandable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155350755,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155350755,IC_kwDOCGYnMM5E3UDj,9599,2022-06-14T15:25:18Z,2022-06-14T15:25:18Z,OWNER,"That broke `mypy`:
`sqlite_utils/utils.py:229: error: Incompatible types in assignment (expression has type ""Iterable[Dict[Any, Any]]"", variable has type ""DictReader[str]"")`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155358637,IC_kwDOCGYnMM5E3V-t,9599,2022-06-14T15:31:34Z,2022-06-14T15:31:34Z,OWNER,"Getting this past `mypy` is really hard!
```
% mypy sqlite_utils
sqlite_utils/utils.py:189: error: No overload variant of ""pop"" of ""MutableMapping"" matches argument type ""None""
sqlite_utils/utils.py:189: note: Possible overload variants:
sqlite_utils/utils.py:189: note: def pop(self, key: str) -> str
sqlite_utils/utils.py:189: note: def [_T] pop(self, key: str, default: Union[str, _T] = ...) -> Union[str, _T]
```
That's because of this line:
row.pop(None)
Which is legit here - we have a dictionary where one of the keys is `None` and we want to remove that key. But the baked in type is apparently `def pop(self, key: str) -> str`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1155364367,https://api.github.com/repos/simonw/sqlite-utils/issues/412,1155364367,IC_kwDOCGYnMM5E3XYP,9599,2022-06-14T15:36:28Z,2022-06-14T15:36:28Z,OWNER,"Here's as far as I got with my initial prototype, in `sqlite_utils/pandas.py`:
```python
from .db import Database as _Database, Table as _Table, View as _View
import pandas as pd
from typing import (
    Iterable,
    Union,
    Optional,
)


class Database(_Database):
    def query(
        self, sql: str, params: Optional[Union[Iterable, dict]] = None
    ) -> pd.DataFrame:
        return pd.DataFrame(super().query(sql, params))

    def table(self, table_name: str, **kwargs) -> Union[""Table"", ""View""]:
        ""Return a table object, optionally configured with default options.""
        klass = View if table_name in self.view_names() else Table
        return klass(self, table_name, **kwargs)


class PandasQueryable:
    def rows_where(
        self,
        where: str = None,
        where_args: Optional[Union[Iterable, dict]] = None,
        order_by: str = None,
        select: str = ""*"",
        limit: int = None,
        offset: int = None,
    ) -> pd.DataFrame:
        return pd.DataFrame(
            super().rows_where(
                where,
                where_args,
                order_by=order_by,
                select=select,
                limit=limit,
                offset=offset,
            )
        )


class Table(PandasQueryable, _Table):
    pass


class View(PandasQueryable, _View):
    pass
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1160182768,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155389614,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155389614,IC_kwDOCGYnMM5E3diu,9599,2022-06-14T15:54:03Z,2022-06-14T15:54:03Z,OWNER,"Filed an issue against `python/typeshed`:
- https://github.com/python/typeshed/issues/8075","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/441#issuecomment-1155421299,https://api.github.com/repos/simonw/sqlite-utils/issues/441,1155421299,IC_kwDOCGYnMM5E3lRz,9599,2022-06-14T16:23:52Z,2022-06-14T16:23:52Z,OWNER,Actually I have a thought for something that could help here: I could add a mechanism for inserting additional where filters and parameters into that `.search()` method.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1257724585,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155666672,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155666672,IC_kwDOCGYnMM5E4hLw,9599,2022-06-14T20:11:52Z,2022-06-14T20:11:52Z,OWNER,I'm going to rename `restkey` to `extras_key` for consistency with `ignore_extras`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/443#issuecomment-1155672522,https://api.github.com/repos/simonw/sqlite-utils/issues/443,1155672522,IC_kwDOCGYnMM5E4inK,9599,2022-06-14T20:18:58Z,2022-06-14T20:18:58Z,OWNER,New documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#reading-rows-from-a-file,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1269998342,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155672675,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155672675,IC_kwDOCGYnMM5E4ipj,9599,2022-06-14T20:19:07Z,2022-06-14T20:19:07Z,OWNER,Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#reading-rows-from-a-file,"{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/442#issuecomment-1155714131,https://api.github.com/repos/simonw/sqlite-utils/issues/442,1155714131,IC_kwDOCGYnMM5E4sxT,9599,2022-06-14T21:07:50Z,2022-06-14T21:07:50Z,OWNER,"Here's the commit where I added that originally, including a test: https://github.com/simonw/sqlite-utils/commit/1a93b72ba710ea2271eaabc204685a27d2469374","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1269886084,
https://github.com/simonw/sqlite-utils/issues/442#issuecomment-1155748444,https://api.github.com/repos/simonw/sqlite-utils/issues/442,1155748444,IC_kwDOCGYnMM5E41Jc,9599,2022-06-14T21:55:15Z,2022-06-14T21:55:15Z,OWNER,Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#setting-the-maximum-csv-field-size-limit,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1269886084,
https://github.com/simonw/sqlite-utils/issues/433#issuecomment-1155749696,https://api.github.com/repos/simonw/sqlite-utils/issues/433,1155749696,IC_kwDOCGYnMM5E41dA,9599,2022-06-14T21:57:05Z,2022-06-14T21:57:05Z,OWNER,Marking this as help wanted because I can't figure out how to replicate it!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239034903,
https://github.com/simonw/sqlite-utils/issues/441#issuecomment-1155750270,https://api.github.com/repos/simonw/sqlite-utils/issues/441,1155750270,IC_kwDOCGYnMM5E41l-,9599,2022-06-14T21:57:57Z,2022-06-14T21:57:57Z,OWNER,I added `where=` and `where_args=` parameters to that `.search()` method - updated documentation is here: https://sqlite-utils.datasette.io/en/latest/python-api.html#searching-with-table-search,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1257724585,
https://github.com/simonw/sqlite-utils/issues/431#issuecomment-1155753397,https://api.github.com/repos/simonw/sqlite-utils/issues/431,1155753397,IC_kwDOCGYnMM5E42W1,9599,2022-06-14T22:01:38Z,2022-06-14T22:01:38Z,OWNER,"Yeah, I think it would be neat if the library could support self-referential many-to-many in a nice way.
I'm not sure about the `left_name/right_name` design though. Would it be possible to have this work as the user intends, by spotting that the other table name `""people""` matches the name of the current table?
```python
db[""people""].insert({""name"": ""Mary""}, pk=""name"").m2m(
""people"", [{""name"": ""Michael""}, {""name"": ""Suzy""}], m2m_table=""parent_child"", pk=""name""
)
```
The created table could look like this:
```sql
CREATE TABLE [parent_child] (
[people_id_1] TEXT REFERENCES [people]([name]),
[people_id_2] TEXT REFERENCES [people]([name]),
PRIMARY KEY ([people_id_1], [people_id_2])
)
```
I've not thought very hard about this, so the design I'm proposing here might not work.
Are there other reasons people might want the `left_name=` and `right_name=` parameters? If so then I'm much happier with those.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1227571375,
https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155756742,https://api.github.com/repos/simonw/sqlite-utils/issues/432,1155756742,IC_kwDOCGYnMM5E43LG,9599,2022-06-14T22:05:38Z,2022-06-14T22:05:49Z,OWNER,"I don't like the idea of `table_names()` returning names of tables from connected databases as well, because it feels like it could lead to surprising behaviour - especially if those connected databases turn out to have table names that are duplicated in the main connected database.
It would be neat if functions like `.rows_where()` worked though.
One thought would be to support something like this:
```python
rows = db[""otherdb.tablename""].rows_where()
```
But... `.` is a valid character in a SQLite table name. So `""otherdb.tablename""` might ambiguously refer to a table called `tablename` in a connected database with the alias `otherdb`, OR a table in the current database with the name `otherdb.tablename`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1236693079,
https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155758664,https://api.github.com/repos/simonw/sqlite-utils/issues/432,1155758664,IC_kwDOCGYnMM5E43pI,9599,2022-06-14T22:07:50Z,2022-06-14T22:07:50Z,OWNER,"Another potential fix: add an `alias=` parameter to `rows_where()` and other similar methods. Then you could do this:
```python
rows = db[""tablename""].rows_where(alias=""otherdb"")
```
This feels wrong to me: `db[""tablename""]` is the bit that is supposed to return a table object. Having part of what that table object is exist as a parameter to other methods is confusing.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1236693079,
https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155759857,https://api.github.com/repos/simonw/sqlite-utils/issues/432,1155759857,IC_kwDOCGYnMM5E437x,9599,2022-06-14T22:09:07Z,2022-06-14T22:09:07Z,OWNER,"Third option, and I think the one I like the best:
```python
rows = db.table(""tablename"", alias=""otherdb"").rows_where()
```
The `db.table(tablename)` method already exists as an alternative to `db[tablename]`: https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-table-configuration
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1236693079,
https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155764064,https://api.github.com/repos/simonw/sqlite-utils/issues/432,1155764064,IC_kwDOCGYnMM5E449g,9599,2022-06-14T22:15:44Z,2022-06-14T22:15:44Z,OWNER,"Implementing this would be a pretty big change - initial instinct is that I'd need to introduce a `self.alias` property to `Queryable` (the superclass of `Table` and `View`) and a new `self.name_with_alias` getter which returns `alias.tablename` if `alias` is set to a not-None value. Then I'd need to rewrite every piece of code like this:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L1161
To look like this instead:
```python
sql = ""select {} from [{}]"".format(select, self.name_with_alias)
```
But some parts would be harder - for example:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L1227-L1231
Would have to know to query `alias.sqlite_master` instead.
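For reference, SQLite exposes the schema of each attached database under its alias. A minimal sketch using only the standard `sqlite3` module, with `otherdb` and `tablename` as the placeholder names from above (this is an illustration, not code from `sqlite-utils`):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Attach a second in-memory database under the alias otherdb.
conn.execute('ATTACH DATABASE ? AS otherdb', (':memory:',))
conn.execute('CREATE TABLE otherdb.tablename (id INTEGER)')

# The attached database has its own sqlite_master, addressed via the alias.
other_tables = [
    row[0]
    for row in conn.execute(
        'SELECT name FROM otherdb.sqlite_master WHERE type = ?', ('table',)
    )
]
# The main database's sqlite_master does not list the attached table.
main_tables = [
    row[0]
    for row in conn.execute(
        'SELECT name FROM sqlite_master WHERE type = ?', ('table',)
    )
]
```

So any introspection query that reads `sqlite_master` would need the alias prefix to see tables in the attached database.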
The cached table counts logic like this would need a bunch of changes too:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L644-L657","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1236693079,
https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155764428,https://api.github.com/repos/simonw/sqlite-utils/issues/432,1155764428,IC_kwDOCGYnMM5E45DM,9599,2022-06-14T22:16:21Z,2022-06-14T22:16:21Z,OWNER,"Initial idea of how the `.table()` method would change:
```diff
diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 7a06304..3ecb40b 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -474,11 +474,12 @@ class Database:
self._tracer(sql, None)
return self.conn.executescript(sql)
- def table(self, table_name: str, **kwargs) -> Union[""Table"", ""View""]:
+ def table(self, table_name: str, alias: Optional[str] = None, **kwargs) -> Union[""Table"", ""View""]:
""""""
Return a table object, optionally configured with default options.
:param table_name: Name of the table
+ :param alias: The database alias to use, if referring to a table in another connected database
""""""
klass = View if table_name in self.view_names() else Table
return klass(self, table_name, **kwargs)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1236693079,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155767202,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155767202,IC_kwDOCGYnMM5E45ui,9599,2022-06-14T22:21:10Z,2022-06-14T22:21:10Z,OWNER,"I can't figure out why that error is being swallowed like that. The most likely culprit was this code:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L1021-L1043
But I tried changing it like this:
```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..ed26fdd 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -1023,6 +1023,7 @@ def insert_upsert_implementation(
docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs
)
except Exception as e:
+ raise
if (
isinstance(e, sqlite3.OperationalError)
and e.args
```
And your steps to reproduce still got to 49% and then failed silently.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155767915,https://api.github.com/repos/simonw/sqlite-utils/issues/440,1155767915,IC_kwDOCGYnMM5E455r,9599,2022-06-14T22:22:27Z,2022-06-14T22:22:27Z,OWNER,I forgot to add equivalents of `extras_key=` and `ignore_extras=` to the CLI tool - will do that in a separate issue.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250629388,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155769216,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155769216,IC_kwDOCGYnMM5E46OA,9599,2022-06-14T22:24:49Z,2022-06-14T22:25:06Z,OWNER,"I have a hunch that this crash may be caused by a CSV value which is too long, as addressed at the library level in:
- #440
But not yet addressed in the CLI tool, see:
- #444
Either way though, I really don't like that errors like this are swallowed!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155771462,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155771462,IC_kwDOCGYnMM5E46xG,9599,2022-06-14T22:28:38Z,2022-06-14T22:28:38Z,OWNER,"Maybe this isn't a CSV field value problem - I tried this patch and didn't seem to hit the new breakpoints:
```diff
diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py
index d2ccc5f..f1b823a 100644
--- a/sqlite_utils/utils.py
+++ b/sqlite_utils/utils.py
@@ -204,13 +204,17 @@ def _extra_key_strategy(
# DictReader adds a 'None' key with extra row values
if None not in row:
yield row
- elif ignore_extras:
+ continue
+ else:
+ breakpoint()
+ if ignore_extras:
# ignoring row.pop(none) because of this issue:
# https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637
row.pop(None) # type: ignore
yield row
elif not extras_key:
extras = row.pop(None) # type: ignore
+ breakpoint()
raise RowError(
""Row {} contained these extra values: {}"".format(row, extras)
)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155772244,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155772244,IC_kwDOCGYnMM5E469U,9599,2022-06-14T22:30:03Z,2022-06-14T22:30:03Z,OWNER,"Tried this:
```
% python -i $(which sqlite-utils) insert --csv --delimiter "";"" --encoding ""utf-16-le"" test test.db csv
[------------------------------------] 0%
[#################-------------------] 49% 00:00:01Traceback (most recent call last):
File ""/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py"", line 1072, in main
ctx.exit()
File ""/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py"", line 692, in exit
raise Exit(code)
click.exceptions.Exit: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ""/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/bin/sqlite-utils"", line 33, in <module>
sys.exit(load_entry_point('sqlite-utils', 'console_scripts', 'sqlite-utils')())
File ""/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py"", line 1137, in __call__
return self.main(*args, **kwargs)
File ""/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py"", line 1090, in main
sys.exit(e.exit_code)
SystemExit: 0
>>>
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155776023,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155776023,IC_kwDOCGYnMM5E474X,9599,2022-06-14T22:36:07Z,2022-06-14T22:36:07Z,OWNER,"Wait! The arguments in that are the wrong way round. This is correct:
sqlite-utils insert --csv --delimiter "";"" --encoding ""utf-16-le"" test.db test csv
It still outputs the following:
[------------------------------------] 0%
[#################-------------------] 49% 00:00:02%
But it creates a `test.db` file that is 6.2MB.
That database has 3142 rows in it:
```
% sqlite-utils tables test.db --counts -t
table count
------- -------
test 3142
```
I converted that `csv` file to utf-8 like so:
iconv -f UTF-16LE -t UTF-8 csv > utf8.csv
And it contains 3142 lines:
```
% wc -l utf8.csv
3142 utf8.csv
```
So my hunch here is that the problem is actually that the progress bar doesn't know how to correctly measure files in `utf-16-le` encoding!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155781399,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155781399,IC_kwDOCGYnMM5E49MX,9599,2022-06-14T22:45:41Z,2022-06-14T22:45:41Z,OWNER,TIL how to use `iconv`: https://til.simonwillison.net/linux/iconv,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155782835,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155782835,IC_kwDOCGYnMM5E49iz,9599,2022-06-14T22:48:22Z,2022-06-14T22:49:53Z,OWNER,"Here's the code that implements the progress bar in question: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L918-L932
It calls `file_progress()` which looks like this:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L159-L175
Which uses this:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L148-L156","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155784284,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155784284,IC_kwDOCGYnMM5E495c,9599,2022-06-14T22:51:03Z,2022-06-14T22:52:13Z,OWNER,"Yes, this is the problem. The progress bar length is set to the length in bytes of the file - `os.path.getsize(file.name)` - but it's then incremented by the length of each DECODED line in turn.
So if the file is in `utf-16-le` (twice the size of `utf-8` for mostly-ASCII content) the progress bar will finish at around 50%!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155788944,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155788944,IC_kwDOCGYnMM5E4_CQ,9599,2022-06-14T23:00:24Z,2022-06-14T23:00:24Z,OWNER,"The progress bar only works if the file-like object passed to it has a `fp.fileno()` that isn't 0 (for stdin) - that's how it detects that the file is something which it can measure the size of in order to show progress.
If we know the file size in bytes AND we know the character encoding, can we change `UpdateWrapper` to update the number of bytes-per-character instead?
I don't think so: I can't see a way of definitively saying ""for this encoding the number of bytes per character is X"" - and in fact I'm pretty sure that question doesn't even make sense since variable-length encodings exist.
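A small sketch of why no single bytes-per-character figure exists. The sample strings are made up for illustration and none of this is code from `sqlite-utils`:

```python
# Compare decoded length with encoded size for a few sample strings.
# The escapes are: e-acute, the euro sign, and an emoji outside the BMP.
samples = ['id;name', 'caf\u00e9', '\u20ac100', 'hi \U0001F600']


def bytes_per_char(text, encoding):
    # Ratio of encoded bytes to decoded characters for this text.
    return len(text.encode(encoding)) / len(text)


ratios_utf8 = [bytes_per_char(s, 'utf-8') for s in samples]
ratios_utf16 = [bytes_per_char(s, 'utf-16-le') for s in samples]
```

Plain ASCII comes out at 1 byte per character in `utf-8` and 2 in `utf-16-le`, which matches the bar stopping near 50%, but the other samples show the ratio shifting with the content in both encodings.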
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155789101,https://api.github.com/repos/simonw/sqlite-utils/issues/439,1155789101,IC_kwDOCGYnMM5E4_Et,9599,2022-06-14T23:00:45Z,2022-06-14T23:00:45Z,OWNER,"I'm going to mark this as ""help wanted"" and leave it open. I'm glad that it's not actually a bug where errors get swallowed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1250495688,
https://github.com/simonw/sqlite-utils/issues/434#issuecomment-1155791109,https://api.github.com/repos/simonw/sqlite-utils/issues/434,1155791109,IC_kwDOCGYnMM5E4_kF,9599,2022-06-14T23:04:40Z,2022-06-14T23:04:40Z,OWNER,"Definitely a bug - thanks for the detailed write-up!
You're right, the code at fault is here:
https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L2213-L2231","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1243151184,
https://github.com/simonw/sqlite-utils/issues/434#issuecomment-1155794149,https://api.github.com/repos/simonw/sqlite-utils/issues/434,1155794149,IC_kwDOCGYnMM5E5ATl,9599,2022-06-14T23:09:54Z,2022-06-14T23:09:54Z,OWNER,"A test that demonstrates the problem:
```python
@pytest.mark.parametrize(""reverse_order"", (True, False))
def test_detect_fts_similar_tables(fresh_db, reverse_order):
    # https://github.com/simonw/sqlite-utils/issues/434
    table1, table2 = (""demo"", ""demo2"")
    if reverse_order:
        table1, table2 = table2, table1

    fresh_db[table1].insert({""title"": ""Hello""}).enable_fts(
        [""title""], fts_version=""FTS4""
    )
    fresh_db[table2].insert({""title"": ""Hello""}).enable_fts(
        [""title""], fts_version=""FTS4""
    )
    assert fresh_db[table1].detect_fts() == ""{}_fts"".format(table1)
    assert fresh_db[table2].detect_fts() == ""{}_fts"".format(table2)
```
The order matters - so this test currently passes in one direction and fails in the other:
```
> assert fresh_db[table2].detect_fts() == ""{}_fts"".format(table2)
E AssertionError: assert 'demo2_fts' == 'demo_fts'
E - demo_fts
E + demo2_fts
E ? +
tests/test_introspect.py:53: AssertionError
========================================================================================= short test summary info =========================================================================================
FAILED tests/test_introspect.py::test_detect_fts_similar_tables[True] - AssertionError: assert 'demo2_fts' == 'demo_fts'
=============================================================================== 1 failed, 1 passed, 855 deselected in 1.00s ===============================================================================
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1243151184,
https://github.com/simonw/sqlite-utils/issues/434#issuecomment-1155801812,https://api.github.com/repos/simonw/sqlite-utils/issues/434,1155801812,IC_kwDOCGYnMM5E5CLU,9599,2022-06-14T23:23:32Z,2022-06-14T23:23:32Z,OWNER,"Since table names can be quoted like this:
```sql
CREATE VIRTUAL TABLE ""searchable_fts""
USING FTS4 (text1, text2, [name with . and spaces], content=""searchable"")
```
OR like this:
```sql
CREATE VIRTUAL TABLE ""searchable_fts""
USING FTS4 (text1, text2, [name with . and spaces], content=[searchable])
```
This fix looks to be correct to me (copying from the updated `test_with_trace()` test):
```python
(
""SELECT name FROM sqlite_master\n""
"" WHERE rootpage = 0\n""
"" AND (\n""
"" sql LIKE :like\n""
"" OR sql LIKE :like2\n""
"" OR (\n""
"" tbl_name = :table\n""
"" AND sql LIKE '%VIRTUAL TABLE%USING FTS%'\n""
"" )\n""
"" )"",
{
""like"": ""%VIRTUAL TABLE%USING FTS%content=[dogs]%"",
""like2"": '%VIRTUAL TABLE%USING FTS%content=""dogs""%',
""table"": ""dogs"",
},
)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1243151184,
https://github.com/simonw/sqlite-utils/issues/430#issuecomment-1155803262,https://api.github.com/repos/simonw/sqlite-utils/issues/430,1155803262,IC_kwDOCGYnMM5E5Ch-,9599,2022-06-14T23:26:11Z,2022-06-14T23:26:11Z,OWNER,"It looks like `PRAGMA temp_store` was the right option to use here: https://www.sqlite.org/pragma.html#pragma_temp_store
`temp_store_directory` is listed as deprecated here: https://www.sqlite.org/pragma.html#pragma_temp_store_directory
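A minimal sketch of setting that pragma from Python's standard `sqlite3` module, assuming an in-memory connection for illustration (not code from `sqlite-utils`):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Documented values: 0 = DEFAULT, 1 = FILE, 2 = MEMORY.
conn.execute('PRAGMA temp_store = MEMORY')
# Reading the pragma back returns the numeric setting.
temp_store = conn.execute('PRAGMA temp_store').fetchone()[0]
```

The setting applies per connection, so it would need to be run each time a database is opened.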
I'm going to turn this into a help-wanted documentation issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1224112817,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804459,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155804459,IC_kwDOCGYnMM5E5C0r,9599,2022-06-14T23:28:18Z,2022-06-14T23:28:18Z,OWNER,"I think these become part of the `_import_options` list which is used in a few places:
https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L765-L800","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804591,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155804591,IC_kwDOCGYnMM5E5C2v,9599,2022-06-14T23:28:36Z,2022-06-14T23:28:36Z,OWNER,I'm going with `--extras-key` and `--ignore-extras` as the two new options.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815186,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155815186,IC_kwDOCGYnMM5E5FcS,9599,2022-06-14T23:48:16Z,2022-06-14T23:48:16Z,OWNER,"This is tricky to implement because of this code: https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L938-L945
It's reconstructing each document using the known headers here:
`docs = (dict(zip(headers, row)) for row in reader)`
So my first attempt at this - the diff here - did not have the desired result:
```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..00b920b 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -6,7 +6,7 @@ import hashlib
import pathlib
import sqlite_utils
from sqlite_utils.db import AlterError, BadMultiValues, DescIndex
-from sqlite_utils.utils import maximize_csv_field_size_limit
+from sqlite_utils.utils import maximize_csv_field_size_limit, _extra_key_strategy
from sqlite_utils import recipes
import textwrap
import inspect
@@ -797,6 +797,15 @@ _import_options = (
""--encoding"",
help=""Character encoding for input, defaults to utf-8"",
),
+ click.option(
+ ""--ignore-extras"",
+ is_flag=True,
+ help=""If a CSV line has more than the expected number of values, ignore the extras"",
+ ),
+ click.option(
+ ""--extras-key"",
+ help=""If a CSV line has more than the expected number of values put them in a list in this column"",
+ ),
)
@@ -885,6 +894,8 @@ def insert_upsert_implementation(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter,
upsert,
@@ -909,6 +920,10 @@ def insert_upsert_implementation(
raise click.ClickException(""--flatten cannot be used with --csv or --tsv"")
if encoding and not (csv or tsv):
raise click.ClickException(""--encoding must be used with --csv or --tsv"")
+ if ignore_extras and extras_key:
+ raise click.ClickException(
+ ""--ignore-extras and --extras-key cannot be used together""
+ )
if pk and len(pk) == 1:
pk = pk[0]
encoding = encoding or ""utf-8-sig""
@@ -935,7 +950,9 @@ def insert_upsert_implementation(
csv_reader_args[""delimiter""] = delimiter
if quotechar:
csv_reader_args[""quotechar""] = quotechar
- reader = csv_std.reader(decoded, **csv_reader_args)
+ reader = _extra_key_strategy(
+ csv_std.reader(decoded, **csv_reader_args), ignore_extras, extras_key
+ )
first_row = next(reader)
if no_headers:
headers = [""untitled_{}"".format(i + 1) for i in range(len(first_row))]
@@ -1101,6 +1118,8 @@ def insert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter,
detect_types,
@@ -1176,6 +1195,8 @@ def insert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter=alter,
upsert=False,
@@ -1214,6 +1235,8 @@ def upsert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
alter,
not_null,
default,
@@ -1254,6 +1277,8 @@ def upsert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter=alter,
upsert=True,
@@ -1297,6 +1322,8 @@ def bulk(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
load_extension,
):
""""""
@@ -1331,6 +1358,8 @@ def bulk(
sniff=sniff,
no_headers=no_headers,
encoding=encoding,
+ ignore_extras=ignore_extras,
+ extras_key=extras_key,
batch_size=batch_size,
alter=False,
upsert=False,
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815956,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155815956,IC_kwDOCGYnMM5E5FoU,9599,2022-06-14T23:49:56Z,2022-07-07T16:39:18Z,OWNER,"Yeah my initial implementation there makes no sense:
```python
csv_reader_args = {""dialect"": dialect}
if delimiter:
    csv_reader_args[""delimiter""] = delimiter
if quotechar:
    csv_reader_args[""quotechar""] = quotechar
reader = _extra_key_strategy(
    csv_std.reader(decoded, **csv_reader_args), ignore_extras, extras_key
)
first_row = next(reader)
if no_headers:
    headers = [""untitled_{}"".format(i + 1) for i in range(len(first_row))]
    reader = itertools.chain([first_row], reader)
else:
    headers = first_row
docs = (dict(zip(headers, row)) for row in reader)
```
Because my `_extra_key_strategy()` helper function is designed to work against `csv.DictReader` - not against `csv.reader()` which returns a sequence of lists, not a sequence of dictionaries.
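To illustrate the difference using only the standard library (the sample data is made up):

```python
import csv
import io

data = 'id,name\n1,Cleo,oops\n'

# csv.reader yields plain lists, so the extra value is just a third item.
list_rows = list(csv.reader(io.StringIO(data)))

# csv.DictReader collects extra values in a list under the restkey,
# which defaults to None.
dict_rows = list(csv.DictReader(io.StringIO(data)))
```

Only the `DictReader` rows carry the `None` key that the helper looks for.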
In fact, what's happening here is that `dict(zip(headers, row))` is ignoring anything in the row that doesn't correspond to a header:
```pycon
>>> list(zip([""a"", ""b""], [1, 2, 3]))
[('a', 1), ('b', 2)]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/441#issuecomment-1155515426,https://api.github.com/repos/simonw/sqlite-utils/issues/441,1155515426,IC_kwDOCGYnMM5E38Qi,1448859,2022-06-14T17:53:43Z,2022-06-14T17:53:43Z,NONE,"That would be handy (additional where filters) but I think the trick with the `with` statement is already an order of magnitude better than what I had thought of, so my problem is solved by it (plus I got to learn about `with` today!)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1257724585,