github

This data as json, CSV

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	pull_request	body	repo	type	active_lock_reason	performed_via_github_app	reactions	draft	state_reason
1382457780	I_kwDOCGYnMM5SZqG0	490	Ability to insert multi-line files	6180701	closed	0			4	2022-09-22T13:29:22Z	2022-09-26T18:24:44Z	2022-09-23T16:37:58Z	NONE		I was looking into how to parse application log files that contain multiline text (e.g. Java stack traces) into sqlite. I can see that at the moment `--lines` helps, but falls short when processing multi-line texts. I wonder if this functionality would be useful for sqlite-utils. A similar approach to Elastic logstash/filebeat can be adopted: https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html Potential changes: - add a `--multiline` option - additional properties for - multiline-pattern (regex expression) - multiline-negate: true/false - multiline-what: previous or next Or if this is achievable in a different way, please share. Thanks!	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/490/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
1257724585	I_kwDOCGYnMM5K91qp	441	Combining `rows_where()` and `search()` to limit which rows are searched	1448859	closed	0			4	2022-06-02T06:01:55Z	2022-06-14T21:57:57Z	2022-06-14T21:54:38Z	NONE		What is the right way to limit a full text search query to some rows of a table? For example, I have a table that contains the following columns: `title`, `content`, `owner` (each row represents a document). The `owner` column is a username. It feels right to store all documents in one table, instead of having one table per owner. In particular because I'd like to full text search all documents, only documents owned by one user and documents owned by a set of users. I tried to combine `.rows_where("owner = ?", "1234")` and `.search()` from the `Table` class but I don't think that is meant to work. I discovered `.search_sql()` as a way to generate the FTS SQL statement. By hand I can edit it to add a `AND [original].[owner] = :owner` to the `where` clause. This seems to do what I want. My two questions: 1. is adding a `AND ...` to the `where` clause actually the right thing to do or should I be doing something else (my SQL skills are low)? 2. is there a built-in to sqlite-utils way to achieve this? Right now I am thinking I will make my own version of `search_sql()` that generates a query that contains an additional `owner = :owner` for my particular use-case. Bonus question: is this generally useful/something to add to sqlite-utils or too niche?	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/441/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
1063388037	I_kwDOCGYnMM4_YgOF	343	Provide function to generate hash_id from specified columns	82988	closed	0			4	2021-11-25T10:12:12Z	2022-03-02T04:25:25Z	2022-03-02T04:25:25Z	NONE		Hi I note that you define `_hash()` to create a `hash_id` from non-id column values in a table [here](https://github.com/simonw/sqlite-utils/blob/8f386a0d300d1b1c76132bb75972b755049fb742/sqlite_utils/db.py#L2996). It would be useful to be able to call a complementary function to generate a corresponding `_id` from a subset of specified columns when adding items to another table, eg to support the creation of foreign keys. Or is there a better pattern for doing that?	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/343/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
990844088	MDU6SXNzdWU5OTA4NDQwODg=	325	sqlite-utils memory can't deal with multiple files with the same name	144773	closed	0			4	2021-09-08T08:14:42Z	2021-09-22T20:52:56Z	2021-09-22T20:45:45Z	NONE		When I use multiple files with the same name, e.g. in `sqlite-utils memory a/bug.csv b/bug.csv`, sqlite-utils creates invalid views. ``` Traceback (most recent call last): File "/home/karl/.local/bin/sqlite-utils", line 8, in <module> sys.exit(cli()) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py", line 1137, in __call__ return self.main(args, kwargs) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py", line 1062, in main rv = self.invoke(ctx) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py", line 1668, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py", line 1404, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py", line 763, in invoke return __callback(args, *kwargs) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/sqlite_utils/cli.py", line 1299, in memory db[csv_table].transform(types=tracker.types) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/sqlite_utils/db.py", line 1287, in transform self.db.execute(sql) File "/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/sqlite_utils/db.py", line 421, in execute return self.conn.execute(sql) sqlite3.OperationalError: error in view t1: no such table: main.bug ``` This can be reproduced with ```sh #!/bin/bash mkdir foo mkdir bar echo -e 'col1,col2\nval1,val2' > foo/bug.csv echo -e 'col3,col4\nval3,val4' > bar/bug.csv sqlite-utils memory /bug.csv 'SELECT 1' ``` Ideally, the tables would get unique names by including the next path segment until the names are unique. But just making the numbered t* aliases work would be good enough. This problem can of course be worked around by…	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/325/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
919314806	MDU6SXNzdWU5MTkzMTQ4MDY=	270	Cannot set type JSON	4068	closed	0			4	2021-06-11T23:53:22Z	2021-06-16T17:34:49Z	2021-06-16T15:47:06Z	NONE		It would be great if the column type could be set to JSON. That would not be different from handling a regular string. It would be something like `repr(value)` and it would work with both JSON and CSV inputs, no matter if `value` is a real list or just a string representing a list.	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/270/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
807174161	MDU6SXNzdWU4MDcxNzQxNjE=	227	Error reading csv files with large column data	295329	closed	0			4	2021-02-12T11:51:47Z	2021-02-16T11:48:03Z	2021-02-14T21:17:19Z	NONE		Feel free to close this issue - I mostly added it for reference for future folks that run into this :) I have a CSV file with one column that has very long strings. When i try to import this file via the `insert` command I get the following error: ``` sqlite-utils insert database.db table_name file_with_large_column.csv Traceback (most recent call last): File "/usr/local/bin/sqlite-utils", line 10, in <module> sys.exit(cli()) File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__ return self.main(args, kwargs) File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke return callback(args, kwargs) File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 774, in insert default=default, File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 705, in insert_upsert_implementation docs, pk=pk, batch_size=batch_size, alter=alter, extra_kwargs File "/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py", line 1852, in insert_all first_record = next(records) File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 703, in <genexpr> docs = (decode_base64_values(doc) for doc in docs) File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 681, in <genexpr> docs = (dict(zip(headers, row)) for row in reader) _csv.Error: field larger than field limit (131072) ``` Built with the docker image `datasetteproject/datasette:0.54` with the following versions: ``` # sqlite-utils --version sqlite-utils, version 3.4.1 # datasette --version datas…	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/227/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
549287310	MDU6SXNzdWU1NDkyODczMTA=	76	order_by mechanism	10501166	closed	0			4	2020-01-14T02:06:03Z	2020-04-16T06:23:29Z	2020-04-16T03:13:06Z	NONE		In some cases, I want to iterate rows in a table with `ORDER BY` clause. It would be nice to have a `rows_order_by` function similar to `rows_where`. In a more general case, `rows_filter` function might be added to allow more customized filtering to iterate rows.	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/76/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
593751293	MDU6SXNzdWU1OTM3NTEyOTM=	97	Adding a "recreate" flag to the `Database` constructor	1448859	closed	0			4	2020-04-04T05:41:10Z	2020-04-15T14:29:31Z	2020-04-13T03:52:29Z	NONE		I have a [script](https://github.com/betatim/binder-datasette/blob/master/create-db.ipynb) that imports data into a sqlite DB. When I re-run that script I'd like to remove the existing sqlite DB, instead of adding to it. The pragmatic answer is to add the check and file deletion to my script. However I thought it would be easy and useful for others to add a `recreate=True` flag to `db = sqlite_utils.Database("binder-launches.db")`. After taking a look at the code for it I am not so sure any more. This is because the connection string could be a URL (or "connection string") like `"file:///tmp/foo.db"`. I don't know what the equivalent of `os.path.exists()` is for a connection string or how to detect that something is a connection string and raise an error "can't use recreate=True and conn_string at the same time". Does anyone have an idea/suggestion where to start investigating?	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/97/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed