github
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1217759117 | I_kwDOBm6k_c5IlYeN | 1727 | Research: demonstrate if parallel SQL queries are worthwhile | 9599 | open | 0 | 32 | 2022-04-27T18:54:21Z | 2022-09-26T14:48:31Z | OWNER | I added parallel SQL query execution here: - https://github.com/simonw/datasette/issues/1723 My hunch is that this will take advantage of multiple cores, since Python's `sqlite3` module releases the GIL once a query is passed to SQLite. I'd really like to prove this is the case though. Just not sure how to do it! Larger question: is this performance optimization actually improving performance at all? Under what circumstances is it worthwhile? | 107914493 | issue | { "url": "https://api.github.com/repos/simonw/datasette/issues/1727/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
||||||||
1363765916 | I_kwDOCGYnMM5RSWqc | 483 | `sqlite-utils install` command | 9599 | closed | 0 | 2 | 2022-09-06T20:13:55Z | 2022-09-26T19:04:43Z | 2022-09-26T18:57:15Z | OWNER | With the addition of `--functions` in: - #471 In addition to the existing `convert` command, there are now very good reasons to want to install additional packages into the same virtual environment as `sqlite-utils` itself, to allow them to be used with those features. This isn't easy if you installed the tool with `pipx` or `brew install sqlite-utils`. Datasette solved this problem with the `datasette install` command: - https://github.com/simonw/datasette/issues/925 `sqlite-utils` could benefit from the same idea. | 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/483/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | ||||||
1382457780 | I_kwDOCGYnMM5SZqG0 | 490 | Ability to insert multi-line files | 6180701 | closed | 0 | 4 | 2022-09-22T13:29:22Z | 2022-09-26T18:24:44Z | 2022-09-23T16:37:58Z | NONE | I was looking into how to parse application log files that contain multiline text (e.g. Java stack traces) into sqlite. I can see that at the moment `--lines` helps, but falls short when processing multi-line texts. I wonder if this functionality would be useful for sqlite-utils. A similar approach to Elastic logstash/filebeat can be adopted: https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html Potential changes: - add a `--multiline` option - additional properties for - multiline-pattern (regex expression) - multiline-negate: true/false - multiline-what: previous or next Or if this is achievable in a different way, please share. Thanks! | 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/490/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | ||||||
1385026210 | I_kwDOBm6k_c5SjdKi | 1819 | Preserve query on timeout | 2182 | closed | 0 | 3 | 2022-09-25T13:32:31Z | 2022-09-26T23:16:15Z | 2022-09-26T23:06:06Z | CONTRIBUTOR | If a query hits the timeout it shows a message like: > SQL query took too long. The time limit is controlled by the [sql_time_limit_ms](https://docs.datasette.io/en/stable/settings.html#sql-time-limit-ms) configuration option. But the query is lost. Hitting the browser back button shows the query _before_ the one that errored. It would be nice if the query that errored was preserved for more tweaking. This would make it similar to how "invalid syntax" works since #1346 / #619. | 107914493 | issue | { "url": "https://api.github.com/repos/simonw/datasette/issues/1819/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | ||||||
1386530156 | I_kwDOCGYnMM5SpMVs | 492 | Idea: ability to pass extra variables to `--convert` scripts | 9599 | open | 0 | 1 | 2022-09-26T18:30:45Z | 2022-09-26T18:33:19Z | OWNER | Got this idea from this example in https://jeqo.github.io/notes/2022-09-24-ingest-logs-sqlite/ ```bash sqlite-utils insert /tmp/kafka-logs.db logs server.log.2022-09-24-21 --text --convert " import re r = re.compile(r'^\[(?P<datetime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] (?P<level>\w+) (?P<log>(.+(\n(?\!\[).+|)+))', re.MULTILINE) def convert(text): rows = [m.groupdict() for m in r.finditer(text)] for row in rows: row.update({'server': 'localhost'}) row.update({'component': 'broker'}) return rows " ``` And the accompanying note: > The `row.update` allows to label rows as I’m planning to ingest logs from different hosts and potentially different components. This made me think: it might be neat if you could inject additional variable values into that script with extra command-line options, to make this kind of reuse easier. Something like this: ```bash sqlite-utils insert /tmp/kafka-logs.db logs server.log.2022-09-24-21 --text --convert " import re r = re.compile(r'^\[(?P<datetime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] (?P<level>\w+) (?P<log>(.+(\n(?\!\[).+|)+))', re.MULTILINE) def convert(text): rows = [m.groupdict() for m in r.finditer(text)] for row in rows: row.update({'server': server}) row.update({'component': component}) return rows " --var server "localhost" --var component "broker" ``` | 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/492/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
||||||||
1386593843 | I_kwDOCGYnMM5Spb4z | 494 | Document how to use Just | 9599 | closed | 0 | 2 | 2022-09-26T19:25:12Z | 2022-09-26T19:32:36Z | 2022-09-26T19:26:39Z | OWNER | I'm using `just` a lot know, based on this file - I should add that to https://sqlite-utils.datasette.io/en/latest/contributing.html https://github.com/simonw/sqlite-utils/blob/afbd2b2cba45cccb305c3d4638d18db4dd3d4bbd/Justfile#L1-L24 | 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/494/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | ||||||
1386734383 | I_kwDOBm6k_c5Sp-Mv | 1821 | Release Datasette 0.63a0 | 9599 | closed | 0 | 1 | 2022-09-26T21:15:27Z | 2022-09-26T22:06:39Z | 2022-09-26T22:06:39Z | OWNER | > - The [prepare_jinja2_environment(env, datasette)](https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-prepare-jinja2-environment) plugin hook now accepts an optional `datasette` argument. Hook implementations can also now return an `async` function which will be awaited automatically. ([#1809](https://github.com/simonw/datasette/issues/1809)) > - `--load-extension` option now supports entrypoints. Thanks, Alex Garcia. ([#1789](https://github.com/simonw/datasette/pull/1789)) > - New tutorial: [Cleaning data with sqlite-utils and Datasette](https://datasette.io/tutorials/clean-data). > - Facet size can now be set per-table with the new `facet_size` table metadata option. ([#1804](https://github.com/simonw/datasette/issues/1804)) > - `truncate_cells_html` setting now also affects long URLs in columns. ([#1805](https://github.com/simonw/datasette/issues/1805)) > - `Database(is_mutable=)` now defaults to `True`. ([#1808](https://github.com/simonw/datasette/issues/1808)) > - Non-JavaScript textarea now increases height to fit the SQL query. ([#1786](https://github.com/simonw/datasette/issues/1786)) > - More detailed command descriptions on the [CLI reference](https://docs.datasette.io/en/latest/cli-reference.html#cli-reference) page. ([#1787](https://github.com/simonw/datasette/issues/1787)) > - Datasette no longer enforces upper bounds on its depenedencies. ([#1800](https://github.com/simonw/datasette/issues/1800)) > - Facets are now displayed with better line-breaks in long values. Thanks, Daniel Rech. ([#1794](https://github.com/simonw/datasette/pull/1794)) > - The `settings.json` file used in [Configuration directory mode](https://docs.datasette.io/en/latest/settings.html#config-dir) is now validated on startup. ([#1816](https://github.com/simonw/datasette/issues/1816)) | 107914493 | issue | { "url": "https://api.github.com/repos/simonw/datasette/issues/1821/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed |