{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1271003212, "node_id": "IC_kwDOBm6k_c5LwfhM", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:52:04Z", "updated_at": "2022-10-07T01:52:04Z", "author_association": "CONTRIBUTOR", "body": "and if we try immutable mode, which is how things are opened by `datasette inspect` we duplicate the files!!!\r\n\r\n```python\r\n# test_sql_immutable.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270992795, "node_id": "IC_kwDOBm6k_c5Lwc-b", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:29:15Z", "updated_at": "2022-10-07T01:50:14Z", "author_association": "CONTRIBUTOR", "body": "fascinatingly, telling python to open sqlite in read only mode makes this layer have a size of 0\r\n\r\n```python\r\n# test_sql_ro.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```\r\n\r\nthat's quite weird because setting the file permissions to read only didn't do anything. (on reflection, that chmod isn't doing anything because the dockerfile commands are run as root)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270988081, "node_id": "IC_kwDOBm6k_c5Lwb0x", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:19:01Z", "updated_at": "2022-10-07T01:27:35Z", "author_association": "CONTRIBUTOR", "body": "okay, some progress!! running some sql against a database file causes that file to get duplicated even if it doesn't apparently change the file.\r\n\r\nmake a little test script like this:\r\n\r\n```python\r\n# test_sql.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```\r\n\r\nthen \r\n\r\n```docker\r\nRUN python test_sql.py nlrb.db\r\n```\r\n\r\nproduced a layer that's the same size as `nlrb.db`!!\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270936982, "node_id": "IC_kwDOBm6k_c5LwPWW", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T00:52:41Z", "updated_at": "2022-10-07T00:52:41Z", "author_association": "CONTRIBUTOR", "body": "it's not that the inspect command is somehow changing the db files. if i set them to only read-only, the \"inspect\" layer still has the same very large size.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270923537, "node_id": "IC_kwDOBm6k_c5LwMER", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T00:46:08Z", "updated_at": "2022-10-07T00:46:08Z", "author_association": "CONTRIBUTOR", "body": "i thought it was maybe to do with reading through all the files, but that does not seem to be the case\r\n\r\nif i make a little test file like:\r\n\r\n```python\r\n# test_read.py\r\nimport hashlib\r\nimport sys\r\nimport pathlib\r\n\r\nHASH_BLOCK_SIZE = 1024 * 1024\r\n\r\ndef inspect_hash(path):\r\n \"\"\"Calculate the hash of a database, efficiently.\"\"\"\r\n m = hashlib.sha256()\r\n with path.open(\"rb\") as fp:\r\n while True:\r\n data = fp.read(HASH_BLOCK_SIZE)\r\n if not data:\r\n break\r\n m.update(data)\r\n\r\n return m.hexdigest()\r\n\r\ninspect_hash(pathlib.Path(sys.argv[1]))\r\n```\r\n\r\nthen a line in the Dockerfile like\r\n\r\n```docker\r\nRUN python test_read.py nlrb.db && echo \"[]\" > /etc/inspect.json\r\n```\r\n\r\njust produes a layer of `3B`\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null}