Tuesday, January 8, 2013

Re: ORM, Oracle and UTF-8 encoding problem.

Tested against latest master. Same behaviour.

The Oracle backend's base.py contains the following piece of code:

    # Check whether cx_Oracle was compiled with the WITH_UNICODE option.  This will
    # also be True in Python 3.0.
    if int(Database.version.split('.', 1)[0]) >= 5 and not hasattr(Database, 'UNICODE'):
        convert_unicode = force_text
    else:
        convert_unicode = force_bytes

It was added in <https://github.com/django/django/commit/dcf3be7a62>

The thing is, my cx_Oracle is version 5.1.2 and it does define
cx_Oracle.UNICODE.

So the check fails and Django uses smart_str / force_bytes.
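
To make the branch selection concrete, here is a minimal sketch of the same condition run against stand-in objects. The function name chosen_converter and the two SimpleNamespace objects are hypothetical and only mimic the two attributes the check reads; this is not the real cx_Oracle module and not Django's code.

```python
from types import SimpleNamespace


def chosen_converter(database):
    """Return the name of the converter the quoted check would select."""
    major = int(database.version.split('.', 1)[0])
    if major >= 5 and not hasattr(database, 'UNICODE'):
        return 'force_text'
    return 'force_bytes'


# Stand-in objects (assumptions for illustration, not the real module).
with_unicode_attr = SimpleNamespace(version='5.1.2', UNICODE=object())
without_unicode_attr = SimpleNamespace(version='5.1.2')

print(chosen_converter(with_unicode_attr))
print(chosen_converter(without_unicode_attr))
```

With version 5.1.2 and a UNICODE attribute present, the condition is false, which matches the force_bytes behaviour described above.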

If I remove that check and set convert_unicode to force_text / force_unicode,
everything works as expected.
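
A side note on the symptom from my first mail, as a rough sketch only. Plain Python codecs stand in here for whatever conversion happens between Django, cx_Oracle and the database, and latin-1 is just an example of a charset without Cyrillic; both are assumptions. The point is that four Cyrillic letters came back as four \xbf characters, which looks more like a per-character replacement during charset conversion than like UTF-8 bytes being read back in the wrong encoding.

```python
original = '0 \u0442\u0435\u0441\u0442 test'
observed = '0 \xbf\xbf\xbf\xbf test'  # value reported in the thread

# Hypothesis A: UTF-8 bytes decoded as a single-byte charset.
# Each Cyrillic letter is two bytes in UTF-8, so the string grows.
mojibake = original.encode('utf-8').decode('latin-1')

# Hypothesis B: each unrepresentable character is replaced one-for-one.
# Python substitutes '?'; it is swapped for '\xbf' here only to mirror
# the reported value.
replaced = (original.encode('latin-1', errors='replace')
            .decode('latin-1')
            .replace('?', '\xbf'))

print(len(original), len(mojibake), len(replaced))
print(replaced == observed)
```

Only hypothesis B keeps the string length unchanged and reproduces the reported value, so the data was probably pushed through a non-Unicode charset somewhere on the way in or out.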

9.1.2013 8:56, Jani Tiainen wrote:
> 8.1.2013 21:00, akaariai wrote:
>> I created the following test case in Django's test suite, in modeltests/
>> basic/tests.py:
>>     def test_unicode(self):
>>         # Note: from __future__ import unicode_literals is in effect...
>>         a = Article.objects.create(
>>             headline='0 \u0442\u0435\u0441\u0442 test', pub_date=datetime.now())
>>         self.assertEqual(
>>             Article.objects.get(pk=a.pk).headline, '0 \u0442\u0435\u0441\u0442 test')
>>
>> This does pass on Oracle when using Django's master branch, both with
>> Python 2.7 and 3.3.
>>
>> Django's backend is doing all sorts of trickery behind the scenes to
>> get correct unicode handling. I am not sure where the problem is. What
>> Django version are you using?
>
> Sorry about forgetting the version info. I tested with 1.3.1 and 1.4.1, and
> both gave the same behaviour.
>
> And I know that there is quite a lot of trickery going on. I'll try to
> figure out what causes that problem.
>
>> On 8 Jan, 17:34, Jani Tiainen <rede...@gmail.com> wrote:
>>> Hi,
>>>
>>> I've been trying to save UTF-8 characters to an Oracle database without
>>> success.
>>>
>>> I've verified that the database is indeed UTF-8 capable.
>>>
>>> I can insert UTF-8 characters directly using cx_Oracle.
>>>
>>> But when I use the ORM, it trashes the characters.
>>>
>>> Model I use:
>>>
>>> class MyTest(models.Model):
>>>     txt = CharField(max_length=128)
>>>
>>> s = u'0 \u0442\u0435\u0441\u0442 test'
>>>
>>> i = MyTest()
>>> i.txt = s
>>> i.save()
>>>
>>> i2 = MyTest.objects.get(id=i.id)
>>> print i2.txt
>>>
>>> u'0 \xbf\xbf\xbf\xbf test'
>>>
>>> So what happens here? It looks like Django trashes my unicode string at
>>> some (unknown) point.
>>>
>>> Additional note:
>>>
>>> If I use cursor() from the Django connection object, strings get broken
>>> as well. So it must be the Django Oracle backend doing something evil.
>>>
>>> --
>>> Jani Tiainen
>>>
>>> - Well planned is half done and a half done has been sufficient
>>> before...
>>
>
>


--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...

