id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
1434911255,I_kwDOCGYnMM5VhwIX,510,Cannot enable FTS5 despite it being available,1176293,closed,0,,,3,2022-11-03T16:03:49Z,2022-11-18T18:37:52Z,2022-11-17T10:36:28Z,NONE,,"When I do `sqlite-utils enable-fts my.db table_name column_name` (with or without `--fts5`), I get an FTS4 virtual table instead of the expected FTS5. FTS5 is, however, available, and the Python/SQLite versions do not seem to be the issue. I can manually create the FTS5 virtual table, and then Datasette also works with it from this same Python environment.

`>>> sqlite3.version`
`2.6.0`
`>>> sqlite3.sqlite_version`
`3.39.4`

`PRAGMA compile_options;` includes `ENABLE_FTS5`. `sqlite-utils, version 3.30`.

Any ideas what's happening and how to fix it?",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/510/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1392690202,I_kwDOCGYnMM5TAsQa,495,Support JSON values returned from .convert() functions,649467,closed,0,,,3,2022-09-30T16:33:49Z,2022-10-25T21:23:37Z,2022-10-25T21:23:28Z,NONE,,"When using the convert function on a JSON column, the result of the conversion function must be a string. If the return value is either a dict (object) or a list (array), the convert call will error out with an unhelpful user-defined function exception.

It makes sense that, since the original column value was a string and required conversion to data structures, the result should be converted back into a JSON string as well. However, other functions auto-convert to a JSON string representation, so the fact that convert doesn't could be surprising. At least the documentation should note this requirement, because the sqlite error messages won't readily reveal the issue.

If only sqlite's JSON column type meant something :)",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/495/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1353074021,I_kwDOCGYnMM5QpkVl,474,Add an option for specifying column names when inserting CSV data,14294,open,0,,,3,2022-08-27T15:29:59Z,2022-08-31T03:42:36Z,,NONE,,"https://sqlite-utils.datasette.io/en/stable/cli.html#csv-files-without-a-header-row

> The first row of any CSV or TSV file is expected to contain the names of the columns in that file.
> If your file does not include this row, you can use the `--no-headers` option to specify that the tool should not use that first row as headers.
> If you do this, the table will be created with column names called `untitled_1` and `untitled_2` and so on. You can then rename them using the `sqlite-utils transform ... --rename` command.

It would be nice to be able to specify the column names when importing CSV/TSV without a header row, via an extra command line option.
(renaming a column of a large table can take a long time, which makes it an inconvenient workaround)",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/474/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1227571375,I_kwDOCGYnMM5JK0Cv,431,Allow making m2m relation of a table to itself,738408,open,0,,,3,2022-05-06T08:30:43Z,2022-06-23T14:12:51Z,,NONE,,"I am building a database in which one of the tables has a many-to-many relationship to itself. As far as I can see, this is not (yet) possible using `.m2m()` in sqlite-utils. This may be a bit of a niche use case, so feel free to close this issue if you feel it would introduce too much complexity compared to the benefits.

Example: suppose I have a table of people, and I want to store the information that John and Mary have two children, Michael and Suzy. It would be neat if I could do something like this:

```python
from sqlite_utils import Database

db = Database(memory=True)
db[""people""].insert({""name"": ""John""}, pk=""name"").m2m(
    ""people"", [{""name"": ""Michael""}, {""name"": ""Suzy""}], m2m_table=""parent_child"", pk=""name""
)
db[""people""].insert({""name"": ""Mary""}, pk=""name"").m2m(
    ""people"", [{""name"": ""Michael""}, {""name"": ""Suzy""}], m2m_table=""parent_child"", pk=""name""
)
```

But if I do that, the many-to-many table `parent_child` has only one column:

```
CREATE TABLE [parent_child] (
   [people_id] TEXT REFERENCES [people]([name]),
   PRIMARY KEY ([people_id], [people_id])
)
```

This could be solved by adding one or two keyword arguments to `.m2m()`, e.g. `.m2m(..., left_name=None, right_name=None)` or `.m2m(..., names=(None, None))`.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/431/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1243151184,I_kwDOCGYnMM5KGPtQ,434,`detect_fts()` identifies the wrong table if tables have names that are subsets of each other,559711,closed,0,,,3,2022-05-20T13:28:31Z,2022-06-14T23:24:09Z,2022-06-14T23:24:09Z,NONE,,"Windows 10
Python 3.9.6

When I was running a full-text search through the Python library, I noticed that the query was being run against a different full-text search table than the one I was trying to search.

I took a look at the following function:

https://github.com/simonw/sqlite-utils/blob/841ad44bacaff05ec79ef78166d12e80c82ba6d7/sqlite_utils/db.py#L2213

and noticed:

```python
sql LIKE '%VIRTUAL TABLE%USING FTS%content=%{table}%'
```

My database contains tables with similar names, and `%{table}%` was matching another table whose name merely ended differently. I have included a sample test that shows this occurring: I search for Marsupials in db[""books""] and The Clue of the Broken Blade is returned. This occurs because the search for Marsupials was ""successfully"" done against db[""booksb""] and rowid 1 is returned; ""The Clue of the Broken Blade"" has a rowid of 1 in db[""books""], and that is what is returned from the search.

```python
def test_fts_search_with_similar_table_names(fresh_db):
    db = Database(memory=True)
    db[""books""].insert_all(
        [
            {
                ""title"": ""The Clue of the Broken Blade"",
                ""author"": ""Franklin W. Dixon"",
            },
            {
                ""title"": ""Habits of Australian Marsupials"",
                ""author"": ""Marlee Hawkins"",
            },
        ]
    )
    db[""booksb""].insert(
        {
            ""title"": ""Habits of Australian Marsupials"",
            ""author"": ""Marlee Hawkins"",
        }
    )
    db[""booksb""].enable_fts([""title"", ""author""])
    db[""books""].enable_fts([""title"", ""author""])
    query = ""Marsupials""
    assert [
        {
            ""rowid"": 1,
            ""title"": ""Habits of Australian Marsupials"",
            ""author"": ""Marlee Hawkins"",
        },
    ] == list(db[""books""].search(query))
```",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/434/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1175744654,I_kwDOCGYnMM5GFHCO,417,insert fails on JSONL with whitespace,9954,closed,0,,,3,2022-03-21T17:58:14Z,2022-03-25T21:19:06Z,2022-03-25T21:17:13Z,NONE,,"Any JSON that is newline-delimited and has whitespace (newlines) between the start of a JSON object and an attribute fails due to a parse error.

e.g. given the valid JSONL:

```
{
  ""attribute"": ""value""
}
{
  ""attribute"": ""value2""
}
```

I would expect that `sqlite-utils insert --nl my.db mytable file.jsonl` would properly import the data into `mytable`. However, the following error is thrown instead:

`json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)`

It makes sense that, since the file is intended to be newline separated, the thing being parsed is ""{"" (which obviously fails); however, the default newline-separated output of `jq` isn't compact. Using `jq -c` avoids this problem, but the fix is unintuitive and undocumented.

Proposed solutions:

1. Default to a ""loose"" newline-separated parse; this could be implemented internally as [the equivalent of] a `jq -c` filter ahead of the insert step.
2. Catch the JSONDecodeError (or pre-empt it in the case of a record === ""{\n"") and give the user a ""it looks like your json isn't _actually_ newline-delimited; try running it through `jq -c` instead"" error message.

It might just have been too early in the morning when I was playing with this, but running pipes of data through sqlite-utils without the 'knack' of it led to some false starts.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/417/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1123903919,I_kwDOCGYnMM5C_Wmv,397,Support IF NOT EXISTS for table creation,738408,closed,0,,,3,2022-02-04T07:41:15Z,2022-02-06T01:30:46Z,2022-02-06T01:29:01Z,NONE,,"Currently, I have a bunch of code that looks like this:

```python
subjects = db[""subjects""] if db[""subjects""].exists() else db[""subjects""].create({
   ...
})
```

It would be neat if sqlite-utils could simplify that by supporting `CREATE TABLE IF NOT EXISTS`, so that I'd be able to write, e.g.

```python
subjects = db[""subjects""].create({...}, if_not_exists=True)
```",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/397/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
534507142,MDU6SXNzdWU1MzQ1MDcxNDI=,69,Feature request: enable extensions loading,30607,closed,0,,,3,2019-12-08T08:06:25Z,2022-02-05T00:04:25Z,2020-10-16T18:42:49Z,NONE,,"Hi, it would be great to add a parameter that enables loading of any SQLite extension you need.
Something like ""-ext modspatialite"". In this way your great tool would be even more comfortable and powerful. Thank you very much",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/69/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 1039037439,PR_kwDOCGYnMM4t0uaI,333,Add functionality to read Parquet files.,2118708,closed,0,,,3,2021-10-28T23:43:19Z,2021-11-25T19:47:35Z,2021-11-25T19:47:35Z,NONE,simonw/sqlite-utils/pulls/333,"I needed this for a project of mine, and I thought it'd be useful to have it in sqlite-utils (It's also mentioned in #248 ). The current implementation works (data is read & data types are inferred correctly. I've added a single straightforward test case, but @simonw please let me know if there are any non-obvious flags/combinations I should test too.",140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/333/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0, 1004613267,I_kwDOCGYnMM474S6T,328,Invalid JSON output when no rows,12752,closed,0,,,3,2021-09-22T18:37:26Z,2021-09-22T20:21:34Z,2021-09-22T20:20:18Z,NONE,,"`sqlite-utils query` generates a JSON output with the result from the query: ```json [{...},{...}] ``` If no rows are returned by the query, I'm expecting an empty JSON array: ```json [] ``` But actually I'm getting an empty string. To be consistent, the output should be `[]` when the request succeeds (return code == `0`).",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/328/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 925677191,MDU6SXNzdWU5MjU2NzcxOTE=,289,Mypy fixes for rows_from_file(),857609,closed,0,,,3,2021-06-20T20:34:59Z,2021-06-22T18:44:36Z,2021-06-22T18:13:26Z,NONE,,"Following https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927 You had two mypy errors. The first: > sqlite_utils/utils.py:157: error: Argument 1 to ""BufferedReader"" has incompatible type ""BinaryIO""; expected ""RawIOBase"" Looking at the `BufferedReader` docs, it seems to expect a `RawIOBase`, and this [has been copied into typeshed](https://github.com/python/typeshed/blob/9ec2f8712480c57353cea097a65d75a2c4ec1846/stdlib/io.pyi#L100). There may be scope to change how `BufferedReader` is documented and typed upstream, but for now it wouldn't be too bad to use a `typing.cast()`: ``` # Detect the format, then call this recursively buffered = io.BufferedReader( cast(io.RawIOBase, fp), # Undocumented BufferedReader support for BinaryIO buffer_size=4096, ) ``` The second error seems to be flagging a legitimate bug in your code: > sqlite_utils/utils.py:163: error: Argument 1 to ""decode"" of ""bytes"" has incompatible type ""Optional[str]""; expected ""str"" From your type hints, `encoding` may be `None`. In the CSV format block, you use `encoding or ""utf-8-sig""` to set a default, maybe that's desirable in this case too? 
",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/289/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 919250621,MDU6SXNzdWU5MTkyNTA2MjE=,269,bool type not supported,4068,closed,0,,,3,2021-06-11T22:00:36Z,2021-06-15T01:34:10Z,2021-06-15T01:34:10Z,NONE,,"Hi! Thank you for sharing this very nice tool :) It would be nice to have support for more types, like `bool`: it is not possible to convert to boolean at the moment. My suggestion would be to handle it as `bool(int(value))`, like csvkit does.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/269/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 815554385,MDU6SXNzdWU4MTU1NTQzODU=,237,"db[""my_table""].drop(ignore=True) parameter, plus sqlite-utils drop-table --ignore and drop-view --ignore",649467,closed,0,,,3,2021-02-24T14:55:06Z,2021-02-25T17:11:41Z,2021-02-25T17:11:41Z,NONE,,"When I'm generating a derived table in python, I often drop the table and create it from scratch. However, the first time I generate the table, it doesn't exist, so the drop raises an exception. That means more boilerplate. I was going to submit a pull request that adds an ""if_exists"" option to the `drop` method of tables and views. However, for a utility like sqlite_utils, perhaps the ""IF EXISTS"" SQL semantics is what you want most of the time, and thus should be the default. What do you think?",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/237/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 783778672,MDU6SXNzdWU3ODM3Nzg2NzI=,220,Better error message for *_fts methods against views,649467,closed,0,,,3,2021-01-11T23:24:00Z,2021-02-22T20:44:51Z,2021-02-14T22:34:26Z,NONE,,"enable_fts and its related methods only work on tables, not views. Could those methods and possibly others move up to the Queryable superclass? ",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/220/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 807817197,MDU6SXNzdWU4MDc4MTcxOTc=,229,Hitting `_csv.Error: field larger than field limit (131072)`,631242,closed,0,,,3,2021-02-13T19:52:44Z,2021-02-14T21:33:33Z,2021-02-14T21:33:33Z,NONE,,"I have a csv file where one of the fields is so large it is throwing an exception with this error and stops loading: ``` _csv.Error: field larger than field limit (131072) ``` The stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633 There is a way to handle this that helps: https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072 One issue I had with this problem was sqlite-utils only provides limited context as to where the problem line is. There is the progress bar, but that is by percent rather than by line number. It would have been helpful if it could have provided a line number. Also, it would have been useful if it had allowed the loading to continue with later lines. 
",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/229/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 559197745,MDU6SXNzdWU1NTkxOTc3NDU=,82,Tutorial command no longer works,10350886,closed,0,,,3,2020-02-03T16:36:11Z,2020-02-27T04:16:43Z,2020-02-27T04:16:30Z,NONE,,"Issue with command on [tutorial](https://simonwillison.net/2019/Feb/25/sqlite-utils/) on Simon's site. The following command no longer works, and breaks with the previous too many variables error: #50 ``` cmd > curl ""https://data.nasa.gov/resource/y77d-th95.json"" | \ sqlite-utils insert meteorites.db meteorites - --pk=id ``` Output: ``` cmd Traceback (most recent call last): File ""continuum\miniconda3\envs\main\lib\runpy.py"", line 193, in _run_module_as_main ""__main__"", mod_spec) File ""continuum\miniconda3\envs\main\lib\runpy.py"", line 85, in _run_code exec(code, run_globals) File ""Continuum\miniconda3\envs\main\Scripts\sqlite-utils.exe\__main__.py"", line 9, in File ""continuum\miniconda3\envs\main\lib\site-packages\click\core.py"", line 764, in __call__ return self.main(*args, **kwargs) File ""continuum\miniconda3\envs\main\lib\site-packages\click\core.py"", line 717, in main rv = self.invoke(ctx) File ""continuum\miniconda3\envs\main\lib\site-packages\click\core.py"", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File ""continuum\miniconda3\envs\main\lib\site-packages\click\core.py"", line 956, in invoke return ctx.invoke(self.callback, **ctx.params) File ""continuum\miniconda3\envs\main\lib\site-packages\click\core.py"", line 555, in invoke return callback(*args, **kwargs) File ""continuum\miniconda3\envs\main\lib\site-packages\sqlite_utils\cli.py"", line 434, in insert default=default, File ""continuum\miniconda3\envs\main\lib\site-packages\sqlite_utils\cli.py"", line 384, in insert_upsert_implementation docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs File ""continuum\miniconda3\envs\main\lib\site-packages\sqlite_utils\db.py"", line 1081, in insert_all result = self.db.conn.execute(query, params) sqlite3.OperationalError: too many SQL variables ``` My thought is that maybe the dataset grew over the last few years and so didn't run into this issue before. No error when I reduce the count of entries to 83. Once the number of entries hits 84 the command fails. // This passes ``` cmd type meteorite_83.txt | sqlite-utils insert meteorites.db meteorites - --pk=id ``` // But this fails ``` cmd type meteorite_84.txt | sqlite-utils insert meteorites.db meteorites - --pk=id ``` A potential fix might be to chunk the incoming data? I can work on a PR if pointed in right direction. 
",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/82/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 403922644,MDU6SXNzdWU0MDM5MjI2NDQ=,8,Problems handling column names containing spaces or - ,82988,closed,0,,,3,2019-01-28T17:23:28Z,2019-04-14T15:29:33Z,2019-02-23T21:09:03Z,NONE,,"Irrrespective of whether using column names containing a space or - character is good practice, SQLite does allow it, but `sqlite-utils` throws an error in the following cases: ```python from sqlite_utils import Database dbname = 'test.db' DB = Database(sqlite3.connect(dbname)) import pandas as pd df = pd.DataFrame({'col1':range(3), 'col2':range(3)}) #Convert pandas dataframe to appropriate list/dict format DB['test1'].insert_all( df.to_dict(orient='records') ) #Works fine ``` However: ```python df = pd.DataFrame({'col 1':range(3), 'col2':range(3)}) DB['test1'].insert_all(df.to_dict(orient='records')) ``` throws: ``` --------------------------------------------------------------------------- OperationalError Traceback (most recent call last) in () 1 import pandas as pd 2 df = pd.DataFrame({'col 1':range(3), 'col2':range(3)}) ----> 3 DB['test1'].insert_all(df.to_dict(orient='records')) /usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order) 327 jsonify_if_needed(record.get(key, None)) for key in all_columns 328 ) --> 329 result = self.db.conn.execute(sql, values) 330 self.db.conn.commit() 331 self.last_id = result.lastrowid OperationalError: near ""1"": syntax error ``` and: ```python df = pd.DataFrame({'col-1':range(3), 'col2':range(3)}) DB['test1'].upsert_all(df.to_dict(orient='records')) ``` results in: ``` --------------------------------------------------------------------------- OperationalError Traceback (most recent call last) in () 1 import pandas as pd 2 df = pd.DataFrame({'col-1':range(3), 'col2':range(3)}) ----> 3 DB['test1'].insert_all(df.to_dict(orient='records')) /usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order) 327 jsonify_if_needed(record.get(key, None)) for key in all_columns 328 ) --> 329 result = self.db.conn.execute(sql, values) 330 self.db.conn.commit() 331 self.last_id = result.lastrowid OperationalError: near ""-"": syntax error ```",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/8/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed