github

This data as json, CSV

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	pull_request	body	repo	type	active_lock_reason	performed_via_github_app	reactions	draft	state_reason
1160182768	I_kwDOCGYnMM5FJvvw	412	Optional Pandas integration	9599	open	0			13	2022-03-05T01:49:27Z	2022-06-14T15:36:29Z		OWNER		It would be neat if there was a way to use this more seamlessly with Pandas, in particular Pandas dataframes - but without making Pandas a required dependency.	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
1224112817	I_kwDOCGYnMM5I9nqx	430	Document how to use `PRAGMA temp_store` to avoid errors when running VACUUM against huge databases	9308268	open	0			2	2022-05-03T13:33:58Z	2022-06-14T23:26:37Z		NONE		I'm trying to figure out a way to get the `table.extract()` method to complete successfully -- I'm not sure if maybe the cause (and a possible solution) of this on Ubuntu Server 22.04 is to adjust some of the PRAGMA values within SQLite itself ... on another Linux system (PopOS), using this method on this same database appears to work just fine. Here's the bit that's causing the error, and the resulting error output: ```python # combine these columns into 1 table "bib_properties" : # best_title # bib_level_code # mat_type # material_code # best_author db["circ_trans"].extract( ["best_title", "bib_level_code", "mat_type", "material_code", "best_author"], table="bib_properties", fk_column="bib_properties_id" ) db["circ_trans"].extract( ["call_number"], table="call_number", fk_column="call_number_id", rename={"call_number": "value"} ) ``` ```python --------------------------------------------------------------------------- OperationalError Traceback (most recent call last) Input In [17], in <cell line: 7>() 1 # combine these columns into 1 table "bib_properties" : 2 # best_title 3 # bib_level_code 4 # mat_type 5 # material_code 6 # best_author ----> 7 db["circ_trans"].extract( 8 ["best_title", "bib_level_code", "mat_type", "material_code", "best_author"], 9 table="bib_properties", 10 fk_column="bib_properties_id" 11 ) 13 db["circ_trans"].extract( 14 ["call_number"], 15 table="call_number", 16 fk_column="call_number_id", 17 rename={"call_number": "value"} 18 ) File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1764, in Table.extract(self, columns, table, fk_column, rename) 1761 column_order.append(c.name) 1763 # Drop the unnecessary columns and rename lookup column -> 1764 self.transform( 1765 drop=set(columns), 1766 rename={magic_lookup_column:…	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/430/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
1236693079	I_kwDOCGYnMM5JtnBX	432	Support `rows_where()`, `delete_where()` etc for attached alias databases	11597658	open	0			5	2022-05-16T06:38:58Z	2022-06-14T22:16:48Z		NONE		Hi, I noticed `rows_where()` doesn't return any rows from tables which are from attached databases. The `exists()` function returns false. As far as I can see this is because the `table_names()` function only looks for table names in the current database and not in attached (or temp) databases. Besides, `rows_where()`, also `insert_all()` and `delete_where()` didn't do what I was expecting because of this. For the moment I've patched `table_names()` for myself, see below but I'm not sure what the total impact is on the other functions like lookup truncate etc which all use `exists()`. Also `view_names()` doesn't look for views in attached or temp databases. ```python def table_names(self, fts4: bool = False, fts5: bool = False) -> List[str]: "A list of string table names in this database." where = ["type = 'table'"] if fts4: where.append("sql like '%USING FTS4%'") if fts5: where.append("sql like '%USING FTS5%'") dbs = [x[1] for x in self.execute('pragma database_list').fetchall()] lst=[] for db in dbs: sql = "select name from {} where {}".format(db+".sqlite_master"," AND ".join(where)) lst.extend(r[0] for r in self.execute(sql).fetchall()) return lst ```	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/432/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
1243151184	I_kwDOCGYnMM5KGPtQ	434	`detect_fts()` identifies the wrong table if tables have names that are subsets of each other	559711	closed	0			3	2022-05-20T13:28:31Z	2022-06-14T23:24:09Z	2022-06-14T23:24:09Z	NONE		Windows 10 Python 3.9.6 When I was running a full text search through the Python library, I noticed that the query was being run on a different full text search table than the one I was trying to search. I took a look at the following function https://github.com/simonw/sqlite-utils/blob/841ad44bacaff05ec79ef78166d12e80c82ba6d7/sqlite_utils/db.py#L2213 and noticed: ```python sql LIKE '%VIRTUAL TABLE%USING FTS%content=%{table}%' ``` My database contains tables with similar names and %{table}% was matching another table that ended differently in its name. I have included a sample test that shows this occurring: I search for Marsupials in db["books"] and The Clue of the Broken Blade is returned. This occurs since the search for Marsupials was "successfully" done against db["booksb"] and rowid 1 is returned. "The Clue of the Broken Blade" has a rowid of 1 in db["books"] and this is what is returned from the search. ```python def test_fts_search_with_similar_table_names(fresh_db): db = Database(memory=True) db["books"].insert_all( [ { "title": "The Clue of the Broken Blade", "author": "Franklin W. Dixon", }, { "title": "Habits of Australian Marsupials", "author": "Marlee Hawkins", }, ] ) db["booksb"].insert( { "title": "Habits of Australian Marsupials", "author": "Marlee Hawkins", } ) db["booksb"].enable_fts(["title", "author"]) db["books"].enable_fts(["title", "author"]) query = "Marsupials" assert [ { "rowid": 1, "title": "Habits of Australian Marsupials", "author": "Marlee Hawkins", }, ] == list(db["books"].search(query)) ```	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/434/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
1250629388	I_kwDOCGYnMM5KixcM	440	CSV files with too many values in a row cause errors	4068	closed	0			20	2022-05-27T10:54:44Z	2022-06-14T22:23:01Z	2022-06-14T20:12:46Z	NONE		Original title: csv.DictReader can have None as key In some cases, `csv.DictReader` can have `None` as key for unnamed columns, and a list of values as value. `sqlite_utils.utils.rows_from_file` cannot handle that: ```python url="https://artsdatabanken.no/Fab2018/api/export/csv" db = sqlite_utils.Database(":memory") with urlopen(url) as fab: reader, _ = sqlite_utils.utils.rows_from_file(fab, encoding="utf-16le") db["fab2018"].insert_all(reader, pk="Id") ``` Result: ``` Traceback (most recent call last): File "<stdin>", line 3, in <module> File "/home/user/.local/pipx/venvs/sqlite-utils/lib/python3.8/site-packages/sqlite_utils/db.py", line 2924, in insert_all chunk = list(chunk) File "/home/user/.local/pipx/venvs/sqlite-utils/lib/python3.8/site-packages/sqlite_utils/db.py", line 3454, in fix_square_braces if any("[" in key or "]" in key for key in record.keys()): File "/home/user/.local/pipx/venvs/sqlite-utils/lib/python3.8/site-packages/sqlite_utils/db.py", line 3454, in <genexpr> if any("[" in key or "]" in key for key in record.keys()): TypeError: argument of type 'NoneType' is not iterable ``` Code: https://github.com/simonw/sqlite-utils/blob/59be60c471fd7a2c4be7f75e8911163e618ff5ca/sqlite_utils/db.py#L3454 `sqlite-utils insert` from command line is not affected by this issue.	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
1257724585	I_kwDOCGYnMM5K91qp	441	Combining `rows_where()` and `search()` to limit which rows are searched	1448859	closed	0			4	2022-06-02T06:01:55Z	2022-06-14T21:57:57Z	2022-06-14T21:54:38Z	NONE		What is the right way to limit a full text search query to some rows of a table? For example, I have a table that contains the following columns: `title`, `content`, `owner` (each row represents a document). The `owner` column is a username. It feels right to store all documents in one table, instead of having one table per owner. In particular because I'd like to full text search all documents, only documents owned by one user and documents owned by a set of users. I tried to combine `.rows_where("owner = ?", "1234")` and `.search()` from the `Table` class but I don't think that is meant to work. I discovered `.search_sql()` as a way to generate the FTS SQL statement. By hand I can edit it to add a `AND [original].[owner] = :owner` to the `where` clause. This seems to do what I want. My two questions: 1. is adding a `AND ...` to the `where` clause actually the right thing to do or should I be doing something else (my SQL skills are low)? 2. is there a built-in to sqlite-utils way to achieve this? Right now I am thinking I will make my own version of `search_sql()` that generates a query that contains an additional `owner = :owner` for my particular use-case. Bonus question: is this generally useful/something to add to sqlite-utils or too niche?	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/441/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
1269886084	I_kwDOCGYnMM5LsOyE	442	`maximize_csv_field_size_limit()` utility function	9599	closed	0			2	2022-06-13T19:54:54Z	2022-06-14T21:55:15Z	2022-06-14T21:31:49Z	OWNER		This code here runs only if `cli.py` is imported: https://github.com/simonw/sqlite-utils/blob/7ddf5300886a32d6daf60cf1d71efe492b65c87e/sqlite_utils/cli.py#L50-L59 I found myself needing the same fix in another library: - https://github.com/simonw/datasette-socrata/issues/13 It should be a documented utility function.	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/442/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed