id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason 1988525411,I_kwDOCGYnMM52hn1j,603,Pyhton 3.12 Bug report,1324252,open,0,,,1,2023-11-10T22:57:48Z,2023-12-08T05:10:31Z,,NONE,,"I start with new python3 verison 3.12.0 Also have the error where connect DataBase ``` Traceback (most recent call last): File ""/home/t/Development/python/FKPJ/ClinicSYS/run.py"", line 1, in import re, os, io, json, sqlite_utils, requests, pytz, logging File ""/home/t/.local/lib/python3.12/site-packages/sqlite_utils/__init__.py"", line 1, in from .db import Database File ""/home/t/.local/lib/python3.12/site-packages/sqlite_utils/db.py"", line 277, in class Database: File ""/home/t/.local/lib/python3.12/site-packages/sqlite_utils/db.py"", line 306, in Database filename_or_conn: Optional[Union[str, pathlib.Path, sqlite3.Connection]] = None, ^^^^^^^^^^^^^^^^^^ ``` This bug come from `sqlite-utils` since's v3.33. Anyone get the same ? As well now of the resolved plan just keep the sqlite-utils version in python3.12 with v3.32.1 [tested] but where are the sqlite3.Connection problem.... This won't happen on python version down to 3.11[tested] Just the python3.12.0, I have test this error are come from the sqlite3 connection The error say from `sqlite_utils` and with the sqlite3 Connection, what can I do. Let fix together.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/603/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",, 1733198948,I_kwDOCGYnMM5nToRk,555,Filter table by a large bunch of ids,10843208,open,0,,,1,2023-05-31T00:29:51Z,2023-06-14T22:01:57Z,,NONE,,"Hi! this might be a question related to both SQLite & sqlite-utils, and you might be more experienced with them. I have a large bunch of ids, and I'm wondering which is the best way to query them in terms of performance, and simplicity if possible. The naive approach would be something like `select * from table where rowid in (?, ?, ?...)` but that wouldn't scale if ids are >1k. Another approach might be creating a temp table, or in-memory db table, insert all ids in that table and then join with the target one. I failed to attach an in-memory db both using sqlite-utils, and plain sql's execute(), so my closest approach is something like, ```python def filter_existing_video_ids(video_ids): db = get_db() # contains a ""videos"" table db.execute(""CREATE TEMPORARY TABLE IF NOT EXISTS tmp (video_id TEXT NOT NULL PRIMARY KEY)"") db[""tmp""].insert_all([{""video_id"": video_id} for video_id in video_ids]) for row in db[""tmp""].rows_where(""video_id not in (select video_id from videos)""): yield row[""video_id""] db[""tmp""].drop() ``` That kinda worked, I couldn't find an option in sqlite-utils's `create_table()` to tell it's a temporary table. Also, `tmp` table is not dropped finally, neither using `.drop()` despite being created with the keyword `TEMPORARY`. I believe it should be automatically dropped after connection/session ends though I read.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/555/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",, 1720096994,I_kwDOCGYnMM5mhpji,554,"`IndexError` when doing `.insert(..., pk='id')` after `insert_all`",1231935,open,0,,,1,2023-05-22T17:13:02Z,2023-05-22T17:18:33Z,,NONE,,"I believe this is related to https://github.com/simonw/sqlite-utils/issues/98. When `pk` is specified by table A's `insert` call, it throws an index error if a different table has written a row with a higher rowid than exists in the first table. Here's a basic example: ```py from sqlite_utils import Database def test_pk_for_insert(fresh_db): user = {""id"": ""abc"", ""name"": ""david""} fresh_db[""users""].insert(user, pk=""id"") fresh_db[""comments""].insert_all( [ {""id"": ""def"", ""text"": ""ok""}, {""id"": ""ghi"", ""text"": ""great""}, ], ) fresh_db[""users""].insert( user, ignore=True, # BUG: when specifying pk on the second insert call # db.py goes into a block it doesn't expect and we get the error pk=""id"", ) if __name__ == ""__main__"": db = Database(""bug.db"") if db[""users""].exists(): raise ValueError( ""bug only shows on a new database - remove bug.db before running the script"" ) test_pk_for_insert(db) ``` The error is: ```py File ""/Users/david/projects/reddit-to-sqlite/.venv/lib/python3.11/site-packages/sqlite_utils/db.py"", line 2960, in insert_chunk row = list(self.rows_where(""rowid = ?"", [self.last_rowid]))[0] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^ IndexError: list index out of range ``` The issue is in this block: https://github.com/simonw/sqlite-utils/blob/2747257a3334d55e890b40ec58fada57ae8cfbfd/sqlite_utils/db.py#L2954-L2958 relevant locals are: - `pk`: `'id'` - `result.lastrowid`: `2` What's most interesting is the comment `# self.last_rowid will be 0 if a ""INSERT OR IGNORE"" happened`, which doesn't seem to be the case here. ",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/554/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",, 907795562,MDU6SXNzdWU5MDc3OTU1NjI=,265,Using enable_fts before search term,36287,open,0,,,1,2021-06-01T01:43:34Z,2023-04-01T17:27:18Z,,NONE,,"Many thanks for the sqlite-utils suite of utilities. Has made my life much much easier. I used this to create a table and enable FTS. All works fine. The datasette utility detects FTS and shows a text box. Searching for a term using that interface works well. However, when I start to use features by following https://www.sqlite.org/fts5.html section **""3. Full-text Query Syntax""** I seem to run into issues that I suspect is due to `escape_fts` wrapper function. As an example, if i search for the term `""^குகை"" `on the text box in datasette it produces 140 results. However, when i tweak the query produced by datasette to not use ""escape_fts"" it produces 5 results. Similarly, when I try to restrict the search to a single column in FTS using a spec like `{title : ^குகை}` it returns no rows. The same thing pulls results when used without `escape_fts`. The text in the table is in Tamil language and the search term is a Tamil word. ``` ... where posts_fts match escape_fts(:search) ``` vs ``` ... where posts_fts match (:search) ``` Any ideas why? How can I get the benefits of both escaping as well as utilizing different facets of providing / controlling search terms? Thanks.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/265/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",, 836829560,MDU6SXNzdWU4MzY4Mjk1NjA=,248,support for Apache Arrow / parquet files I/O,649467,open,0,,,1,2021-03-20T14:59:30Z,2021-10-28T23:46:48Z,,NONE,,"I just started looking at Apache Arrow using pyarrow for import and export of tabular datasets, and it looks quite compelling. It might be worth looking at for sqlite-utils and/or datasette. As a test, I took a random jsonl data dump of a dataset I have with floats, strings, and ints and converted it to arrow's parquet format using the naive `pyarrow.parquet.write_file()` command, which has automatic type inferrence. It compressed down to 7% of the original size. Conversion of a 26MB JSON file and serializing it to parquet was eyeblink instantaneous. Parquet files are portable and can be directly imported into pandas and other analytics software. The only hangup is the automatic type inference of the naive reader. It's great for general laziness and for parsing JSON columns (it correctly interpreted a table of mine with a JSON array). However, I did get an exception for a string column where most entries looked integer-like but had a couple values that weren't -- the reader tried to coerce all of them for some reason, even though the JSON type is string. Since the writer optionally takes a schema, it shouldn't be too hard to grab the sqlite header types. With some additional hinting, you might get datetime columns and JSON, which are native Arrow types. Somewhat tangentially, someone even wrote an sqlite vfs extension for Parquet: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html ",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/248/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,