id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason 1224112817,I_kwDOCGYnMM5I9nqx,430,Document how to use `PRAGMA temp_store` to avoid errors when running VACUUM against huge databases,9308268,open,0,,,2,2022-05-03T13:33:58Z,2022-06-14T23:26:37Z,,NONE,,"I'm trying to figure out a way to get the `table.extract()` method to complete successfully -- I'm not sure if maybe the cause (and a possible solution) of this on Ubuntu Server 22.04 is to adjust some of the PRAGMA values within SQLite itself ... on another Linux system (PopOS), using this method on this same database appears to work just fine. Here's the bit that's causing the error, and the resulting error output: ```python # combine these columns into 1 table ""bib_properties"" : # best_title # bib_level_code # mat_type # material_code # best_author db[""circ_trans""].extract( [""best_title"", ""bib_level_code"", ""mat_type"", ""material_code"", ""best_author""], table=""bib_properties"", fk_column=""bib_properties_id"" ) db[""circ_trans""].extract( [""call_number""], table=""call_number"", fk_column=""call_number_id"", rename={""call_number"": ""value""} ) ``` ```python --------------------------------------------------------------------------- OperationalError Traceback (most recent call last) Input In [17], in () 1 # combine these columns into 1 table ""bib_properties"" : 2 # best_title 3 # bib_level_code 4 # mat_type 5 # material_code 6 # best_author ----> 7 db[""circ_trans""].extract( 8 [""best_title"", ""bib_level_code"", ""mat_type"", ""material_code"", ""best_author""], 9 table=""bib_properties"", 10 fk_column=""bib_properties_id"" 11 ) 13 db[""circ_trans""].extract( 14 [""call_number""], 15 table=""call_number"", 16 fk_column=""call_number_id"", 17 rename={""call_number"": ""value""} 18 ) File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1764, in Table.extract(self, columns, table, fk_column, rename) 1761 column_order.append(c.name) 1763 # Drop the unnecessary columns and rename lookup column -> 1764 self.transform( 1765 drop=set(columns), 1766 rename={magic_lookup_column: fk_column}, 1767 column_order=column_order, 1768 ) 1770 # And add the foreign key constraint 1771 self.add_foreign_key(fk_column, table, ""id"") File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1526, in Table.transform(self, types, rename, drop, pk, not_null, defaults, drop_foreign_keys, column_order) 1524 with self.db.conn: 1525 for sql in sqls: -> 1526 self.db.execute(sql) 1527 # Run the foreign_key_check before we commit 1528 if pragma_foreign_keys_was_on: File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:465, in Database.execute(self, sql, parameters) 463 return self.conn.execute(sql, parameters) 464 else: --> 465 return self.conn.execute(sql) OperationalError: database or disk is full ``` This database is about 17G in total size, so I'm assuming the error is coming from the vacuum ... where i'm assuming it's maybe trying to do the temp storage in a location that doesn't have sufficient room. The disk space is more than ample on the host in question (1.8T is free in the directory where the sqlite db resides) The `/tmp` directory however is limited on a smaller disk associated with the OS I'm trying to think if there's a way to set the `PRAGMA temp_store` or maybe if it's `temp_store_directory` that I'm after ... to use the same local directory of where the file is located (maybe this is a property of the version of sqlite on the system?) ```python # SET the temp file store to be a file ... print(db.execute('PRAGMA temp_store').fetchall()) print(db.execute('PRAGMA temp_store=FILE').fetchall()) print(db.execute('PRAGMA temp_store').fetchall()) # the users home directory ... print(db.execute(""PRAGMA temp_store_directory='/home/plchuser/'"").fetchall()) print(db.execute(""PRAGMA sqlite3_temp_directory='/home/plchuser/'"").fetchall()) print(db.execute(""PRAGMA temp_store_directory"").fetchall()) print(db.execute(""PRAGMA sqlite3_temp_directory"").fetchall()) ``` ```text [(1,)] [] [(1,)] [] [] [('/home/plchuser/',)] [] ``` Here's the docs on the Temporary File Storage Locations https://www.sqlite.org/tempfiles.html",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/430/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",, 1243151184,I_kwDOCGYnMM5KGPtQ,434,`detect_fts()` identifies the wrong table if tables have names that are subsets of each other,559711,closed,0,,,3,2022-05-20T13:28:31Z,2022-06-14T23:24:09Z,2022-06-14T23:24:09Z,NONE,,"Windows 10 Python 3.9.6 When I was running a full text search through the Python library, I noticed that the query was being run on a different full text search table than the one I was trying to search. I took a look at the following function https://github.com/simonw/sqlite-utils/blob/841ad44bacaff05ec79ef78166d12e80c82ba6d7/sqlite_utils/db.py#L2213 and noticed: ```python sql LIKE '%VIRTUAL TABLE%USING FTS%content=%{table}%' ``` My database contains tables with similar names and %{table}% was matching another table that ended differently in its name. I have included a sample test that shows this occurring: I search for Marsupials in db[""books""] and The Clue of the Broken Blade is returned. This occurs since the search for Marsupials was ""successfully"" done against db[""booksb""] and rowid 1 is returned. ""The Clue of the Broken Blade"" has a rowid of 1 in db[""books""] and this is what is returned from the search. ```python def test_fts_search_with_similar_table_names(fresh_db): db = Database(memory=True) db[""books""].insert_all( [ { ""title"": ""The Clue of the Broken Blade"", ""author"": ""Franklin W. Dixon"", }, { ""title"": ""Habits of Australian Marsupials"", ""author"": ""Marlee Hawkins"", }, ] ) db[""booksb""].insert( { ""title"": ""Habits of Australian Marsupials"", ""author"": ""Marlee Hawkins"", } ) db[""booksb""].enable_fts([""title"", ""author""]) db[""books""].enable_fts([""title"", ""author""]) query = ""Marsupials"" assert [ { ""rowid"": 1, ""title"": ""Habits of Australian Marsupials"", ""author"": ""Marlee Hawkins"", }, ] == list(db[""books""].search(query)) ``` ",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/434/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 1250629388,I_kwDOCGYnMM5KixcM,440,CSV files with too many values in a row cause errors,4068,closed,0,,,20,2022-05-27T10:54:44Z,2022-06-14T22:23:01Z,2022-06-14T20:12:46Z,NONE,,"*Original title: csv.DictReader can have None as key* In some cases, `csv.DictReader` can have `None` as key for unnamed columns, and a list of values as value. `sqlite_utils.utils.rows_from_file` cannot handle that: ```python url=""https://artsdatabanken.no/Fab2018/api/export/csv"" db = sqlite_utils.Database("":memory"") with urlopen(url) as fab: reader, _ = sqlite_utils.utils.rows_from_file(fab, encoding=""utf-16le"") db[""fab2018""].insert_all(reader, pk=""Id"") ``` Result: ``` Traceback (most recent call last): File """", line 3, in File ""/home/user/.local/pipx/venvs/sqlite-utils/lib/python3.8/site-packages/sqlite_utils/db.py"", line 2924, in insert_all chunk = list(chunk) File ""/home/user/.local/pipx/venvs/sqlite-utils/lib/python3.8/site-packages/sqlite_utils/db.py"", line 3454, in fix_square_braces if any(""["" in key or ""]"" in key for key in record.keys()): File ""/home/user/.local/pipx/venvs/sqlite-utils/lib/python3.8/site-packages/sqlite_utils/db.py"", line 3454, in if any(""["" in key or ""]"" in key for key in record.keys()): TypeError: argument of type 'NoneType' is not iterable ``` Code: https://github.com/simonw/sqlite-utils/blob/59be60c471fd7a2c4be7f75e8911163e618ff5ca/sqlite_utils/db.py#L3454 `sqlite-utils insert` from command line is not affected by this issue.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/440/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 1236693079,I_kwDOCGYnMM5JtnBX,432,"Support `rows_where()`, `delete_where()` etc for attached alias databases",11597658,open,0,,,5,2022-05-16T06:38:58Z,2022-06-14T22:16:48Z,,NONE,,"Hi, I noticed `rows_where()` doesn't return any rows from tables which are from attached databases. The `exists()` function returns false. As far as I can see this is because the `table_names()` function only looks for table names in the current database and not in attached (or temp) databases. Besides, `rows_where()`, also `insert_all()` and `delete_where()` didn't do what I was expecting because of this. For the moment I've patched `table_names()` for myself, see below but I'm not sure what the total impact is on the other functions like lookup truncate etc which all use `exists()`. Also `view_names()` doesn't look for views in attached or temp databases. ```python def table_names(self, fts4: bool = False, fts5: bool = False) -> List[str]: ""A list of string table names in this database."" where = [""type = 'table'""] if fts4: where.append(""sql like '%USING FTS4%'"") if fts5: where.append(""sql like '%USING FTS5%'"") dbs = [x[1] for x in self.execute('pragma database_list').fetchall()] lst=[] for db in dbs: sql = ""select name from {} where {}"".format(db+"".sqlite_master"","" AND "".join(where)) lst.extend(r[0] for r in self.execute(sql).fetchall()) return lst ```",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/432/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",, 1257724585,I_kwDOCGYnMM5K91qp,441,Combining `rows_where()` and `search()` to limit which rows are searched,1448859,closed,0,,,4,2022-06-02T06:01:55Z,2022-06-14T21:57:57Z,2022-06-14T21:54:38Z,NONE,,"What is the right way to limit a full text search query to some rows of a table? For example, I have a table that contains the following columns: `title`, `content`, `owner` (each row represents a document). The `owner` column is a username. It feels right to store all documents in one table, instead of having one table per owner. In particular because I'd like to full text search all documents, only documents owned by one user and documents owned by a set of users. I tried to combine `.rows_where(""owner = ?"", ""1234"")` and `.search()` from the `Table` class but I don't think that is meant to work. I discovered `.search_sql()` as a way to generate the FTS SQL statement. By hand I can edit it to add a `AND [original].[owner] = :owner` to the `where` clause. This seems to do what I want. My two questions: 1. is adding a `AND ...` to the `where` clause actually the right thing to do or should I be doing something else (my SQL skills are low)? 2. is there a built-in to sqlite-utils way to achieve this? Right now I am thinking I will make my own version of `search_sql()` that generates a query that contains an additional `owner = :owner` for my particular use-case. Bonus question: is this generally useful/something to add to sqlite-utils or too niche?",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/441/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 1269886084,I_kwDOCGYnMM5LsOyE,442,`maximize_csv_field_size_limit()` utility function,9599,closed,0,,,2,2022-06-13T19:54:54Z,2022-06-14T21:55:15Z,2022-06-14T21:31:49Z,OWNER,,"This code here runs only if `cli.py` is imported: https://github.com/simonw/sqlite-utils/blob/7ddf5300886a32d6daf60cf1d71efe492b65c87e/sqlite_utils/cli.py#L50-L59 I found myself needing the same fix in another library: - https://github.com/simonw/datasette-socrata/issues/13 It should be a documented utility function.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/442/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed 1160182768,I_kwDOCGYnMM5FJvvw,412,Optional Pandas integration,9599,open,0,,,13,2022-03-05T01:49:27Z,2022-06-14T15:36:29Z,,OWNER,,"It would be neat if there was a way to use this more seamlessly with Pandas, in particular Pandas dataframes - but without making Pandas a required dependency.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/412/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,