Lino Website

pysqlite and unicode strings

Sunday, 1. May 2005 18:23.

After upgrading pysqlite from 0.4.3 to 2.0a4, some of my test cases failed because of encoding conversion problems.

If you get a UnicodeDecodeError, here is a hint: make sure that any string you pass to cursor.execute() is a Unicode string if it contains non-ASCII characters. Otherwise pysqlite will decode it as UTF-8, which is not necessarily the encoding your data is in.

The following code snippet demonstrates what can happen:

  #coding: latin1
  import pysqlite2.dbapi2 as sqlite
  conn = sqlite.connect(":memory:")  # throwaway in-memory database
  csr = conn.cursor()
  csr.execute("create table foo (name char)")
  csr.execute(u"insert into foo (name) values ('Ännchen')")  # note the u prefix
  csr.execute("select * from foo")
  for row in csr:
      print row

The result is:

  (u'\xc4nnchen',)

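The capital «Ä» comes back as the Unicode code point 196 (0xc4). That is also its byte value in latin1, but it lies outside the ASCII range, which ends at 127.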
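
To make the byte values concrete, here is a small sketch that prints the code point of «Ä» together with its latin1 and UTF-8 byte representations. It is written in Python 3 syntax, which is an assumption on my side since the snippet above is Python 2, and the variable names are purely illustrative.

```python
# Sketch in Python 3 syntax (the snippet in the post itself is Python 2).
char = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS

code_point = ord(char)                  # 196, i.e. 0xc4
latin1_bytes = char.encode("latin-1")   # a single byte: b'\xc4'
utf8_bytes = char.encode("utf-8")       # two bytes: b'\xc3\x84'

print(hex(code_point), latin1_bytes, utf8_bytes)
```

The same character is one byte in latin1 but two bytes in UTF-8, so bytes written in one encoding cannot simply be read back in the other.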

But if you omit the 'u' prefix from the "insert into" statement, so that it is passed as a plain byte string, you get a very different result:

  (,)
  Traceback (most recent call last):
    File "tmp.py", line 8, in ?
      for row in csr:
  UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
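
The decoding failure can be reproduced without any database. The sketch below is again in Python 3 syntax (an assumption, as above), and `raw` and `decode_error` are illustrative names. It takes the bytes that a latin1 source file holds for 'Ännchen' and decodes them as UTF-8, which is what pysqlite attempts with a plain byte string.

```python
# Bytes of 'Ännchen' as they appear in a latin1-encoded source file.
raw = "\u00c4nnchen".encode("latin-1")  # b'\xc4nnchen'

decode_error = None
try:
    # 0xc4 announces a two-byte UTF-8 sequence, but the next byte ('n')
    # is not a valid continuation byte, so decoding fails.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(decode_error).__name__)
```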

It may be surprising that pysqlite cannot see that I was working in latin1 encoding. I don't know enough about Unicode to decide whether this is a Python pitfall or a pysqlite bug...
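
One way to sidestep the question is to decode byte strings yourself, using the encoding you know they are in, before handing them to execute(). What follows is a minimal sketch and not pysqlite's own mechanism. It assumes Python 3 and the `sqlite3` module from the standard library rather than the pysqlite 2.0a4 setup described here, and `to_text` is a hypothetical helper.

```python
import sqlite3


def to_text(value, encoding="latin-1"):
    """Hypothetical helper: decode bytes with a known encoding, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


conn = sqlite3.connect(":memory:")
csr = conn.cursor()
csr.execute("create table foo (name char)")

raw_name = b"\xc4nnchen"  # 'Ännchen' as latin1 bytes
# Bind the decoded value as a parameter instead of embedding it in the SQL text.
csr.execute("insert into foo (name) values (?)", (to_text(raw_name),))

csr.execute("select name from foo")
rows = csr.fetchall()
print(rows)
conn.close()
```

Because the value is decoded before it reaches the database layer, the driver never has to guess the encoding of the bytes.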

Copyright 2001-2007 Luc Saffre.
http://lino.saffre-rumma.ee
Generated 2007-06-07 16:22:26