twitter: comp.lang.c - 24 new messages in 8 topics

comp.lang.c
http://groups.google.com/group/comp.lang.c?hl=en

comp.lang.c@googlegroups.com

Today's topics:

==============================================================================
TOPIC: usage of size_t
http://groups.google.com/group/comp.lang.c/t/19e0ad96d01b9898?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 6:30 am
From: Kelsey Bjarnason

[snips]

On Mon, 01 Mar 2010 21:15:52 +0000, Richard Bos wrote:

> Keith Thompson <kst-u@mib.org> wrote:
>
>> Kelsey Bjarnason <kbjarnason@gmail.com> writes:
>> > On Wed, 24 Feb 2010 17:16:24 +0000, Richard Bos wrote:
>> >> Yes, but you are on record as never having had to contend with tiny,
>> >> small, medium, compact, large and huge memory models. Things could
>> >> get... interesting.
>> >
>> > Particularly since two pointers which actually point to the same
>> > thing may not compare equal, due to segment:offset variations.
>> >
>> > At least huge pointers normalized. :)
>> >
>> > (It is *so* nice not to have to deal with that crap anymore.)
>>
>> If two pointers that point to the same thing don't compare equal, then
>> the implementation is non-conforming. C99 6.5.9p6:
>
> IIRC the only way to get two such pointers was by pointer manipulation
> that would be considered to have undefined behaviour in ISO C, BICBW.

Not necessarily.

Consider an array, and two pointers. Set one to, say, the 20th element,
set the other to the first element and increment it 20 times. Both refer
to the same position in memory, but there's no guarantee - at least with
"far" pointers - that they will compare equal.

This was a side effect of the fact that in the x86 segmented memory
models (eg 16-bit-land) the same address could have multiple
representations, so without normalization, there was no assurance the
pointers would be equal, despite being equivalent.

huge pointers, by contrast, were normalized, so this problem didn't
occur, but the normalization imposed an overhead, which, if you didn't
need it, was pointless.

"Regular" - unadorned, plain, standard C-style pointers - were neither
far nor huge, were limited (IIRC) to a single segment and as a result
needed no segment portion of their address - and thus didn't suffer the
same issue.

DOS. The short route to insanity. :)
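
For readers who never met segmented addressing, the aliasing described above can be modelled in portable C. This is only a sketch: the struct far_ptr type and the helper names are invented for illustration and are not any compiler's actual far-pointer representation. It assumes the real-mode rule that the physical address is segment * 16 + offset, and one common normalization that keeps the offset in the range 0..15.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode far pointer (not a real compiler type). */
struct far_ptr {
    uint16_t seg;   /* segment part */
    uint16_t off;   /* offset part  */
};

/* Physical address the pair refers to: segment * 16 + offset. */
static uint32_t linear(struct far_ptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Compare the raw representation, as an unnormalized comparison would. */
static int same_bits(struct far_ptr a, struct far_ptr b)
{
    return a.seg == b.seg && a.off == b.off;
}

/*
 * Normalize so that the offset is always 0..15 (assumed scheme).
 * Addresses past 1 MiB are truncated by the cast; the sketch ignores that.
 */
static struct far_ptr normalize(struct far_ptr p)
{
    uint32_t lin = linear(p);
    struct far_ptr n = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xF) };
    return n;
}
```

With the pairs 1234:0010 and 1235:0000, linear() gives 0x12350 for both, same_bits() reports them as different, and same_bits() on the normalized pairs reports them as equal. The extra work in normalize() is the overhead mentioned above.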

==============================================================================
TOPIC: Is there a better way to achieve this ?
http://groups.google.com/group/comp.lang.c/t/dd4de60290d2834a?hl=en
==============================================================================

== 1 of 8 ==
Date: Tues, Mar 2 2010 6:42 am
From: Francis Moreau

Hello,

I have the following requirement: 3 ints (a, b, limit). The sum of
'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
adjusted so that the sum of 'a' and 'b' is equal to 'limit'.

Basically this can be written in C like this:

int a, b, limit;

/* some code that setup the 3 variables */

if (a < 0 || b < 0)
goto end;
if (a + b > b) /* test for overflow (correct with GCC but not
portable) */
goto end;
if (a + b > limit)
a -= a + b - limit;
if (a < 0)
goto end;

/* ... */

So this should work (ok this uses an undefined behaviour but this code
is only intended to be compiled by GCC) but it looks quite
complicated.

Can anybody think to something more 'elegant' ?

Thanks

== 2 of 8 ==
Date: Tues, Mar 2 2010 6:53 am
From: Malcolm McLean

On Mar 2, 4:42 pm, Francis Moreau <francis.m...@gmail.com> wrote:
> Hello,
>
> I have the following requirement: 3 ints (a, b, limit). The sum of
> 'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
> adjusted so that the sum of 'a' and 'b' is equal to 'limit'.
>
> Basically this can be written in C like this:
>
> int a, b, limit;
>
> /* some code that setup the 3 variables */
>
> if (a < 0 || b < 0)
> goto end;
> if (a + b > b) /* test for overflow (correct with GCC but not
> portable) */
> goto end;
> if (a + b > limit)
> a -= a + b - limit;
> if (a < 0)
> goto end;
>
> /* ... */
>
> So this should work (ok this uses an undefined behaviour but this code
> is only intended to be compiled by GCC) but it looks quite
> complicated.
>
> Can anybody think to something more 'elegant' ?
>
> Thanks

int geta(a, b, limit)
{
unsigned int t;
int answer;

if(a >= 0 && b >= 0)
{
t = (unsigned) a + (unsigned) b;
if(t > limit)
answer = limit - b;
else
answer = a;
}
/* you haven't specified your other cases. What do we do if one or
both are negative?*/
return answer;
}

== 3 of 8 ==
Date: Tues, Mar 2 2010 6:54 am
From: Andrew Poelstra

On 2010-03-02, Francis Moreau <francis.moro@gmail.com> wrote:
> Hello,
>
> I have the following requirement: 3 ints (a, b, limit). The sum of
> 'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
> adjusted so that the sum of 'a' and 'b' is equal to 'limit'.
>
> Basically this can be written in C like this:
>
> int a, b, limit;
>
> /* some code that setup the 3 variables */
>
> if (a < 0 || b < 0)
> goto end;
> if (a + b > b) /* test for overflow (correct with GCC but not
> portable) */
> goto end;
> if (a + b > limit)
> a -= a + b - limit;
> if (a < 0)
> goto end;
>
> /* ... */
>
> So this should work (ok this uses an undefined behaviour but this code
> is only intended to be compiled by GCC) but it looks quite
> complicated.
>
> Can anybody think to something more 'elegant' ?
>
> Thanks

I think:

if(a > limit - b)
a = limit - b;

The rest of your code does things you didn't specify, and
it's not clear where 'end' goes to, so I can't tell how to
clean up the rest.

--
Andrew Poelstra
http://www.wpsoftware.net/andrew

== 4 of 8 ==
Date: Tues, Mar 2 2010 9:40 am
From: CarlosB

On Mar 2, 11:42 am, Francis Moreau <francis.m...@gmail.com> wrote:
> Hello,
>
> I have the following requirement: 3 ints (a, b, limit). The sum of
> 'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
> adjusted so that the sum of 'a' and 'b' is equal to 'limit'.
>
> Basically this can be written in C like this:
>
> int a, b, limit;
>
> /* some code that setup the 3 variables */
>
> if (a < 0 || b < 0)
> goto end;
> if (a + b > b) /* test for overflow (correct with GCC but not
> portable) */
> goto end;
> if (a + b > limit)
> a -= a + b - limit;
> if (a < 0)
> goto end;
>
> /* ... */
>
> So this should work (ok this uses an undefined behaviour but this code
> is only intended to be compiled by GCC) but it looks quite
> complicated.
>
> Can anybody think to something more 'elegant' ?
>
> Thanks

a=(a+b > limit) ? limit-b : a;

== 5 of 8 ==
Date: Tues, Mar 2 2010 9:52 am
From: Francis Moreau

On Mar 2, 3:53 pm, Malcolm McLean <malcolm.mcle...@btinternet.com>
wrote:
> On Mar 2, 4:42 pm, Francis Moreau <francis.m...@gmail.com> wrote:
>
> > Hello,
>
> > I have the following requirement: 3 ints (a, b, limit). The sum of
> > 'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
> > adjusted so that the sum of 'a' and 'b' is equal to 'limit'.
>
> > Basically this can be written in C like this:
>
> > int a, b, limit;
>
> > /* some code that setup the 3 variables */
>
> > if (a < 0 || b < 0)
> > goto end;
> > if (a + b > b) /* test for overflow (correct with GCC but not
> > portable) */
> > goto end;
> > if (a + b > limit)
> > a -= a + b - limit;
> > if (a < 0)
> > goto end;
>
> > /* ... */
>
> > So this should work (ok this uses an undefined behaviour but this code
> > is only intended to be compiled by GCC) but it looks quite
> > complicated.
>
> > Can anybody think to something more 'elegant' ?
>
> > Thanks
>
> int geta(a, b, limit)
> {
> unsigned int t;
> int answer;
>
> if(a >= 0 && b >= 0)
> {
> t = (unsigned) a + (unsigned) b;
> if(t > limit)
> answer = limit - b;
> else
> answer = a;
> }

This doesn't seem right: for example if a=5 b=600 limit=500

In your case answer is equal to -100, which shouldn't happen since
answer must be >= 0.

> /* you haven't specified your other cases. What do we do if one or
> both are negative?*/

It should terminate (goto end; statement) for example by calling
exit().

== 6 of 8 ==
Date: Tues, Mar 2 2010 9:56 am
From: Francis Moreau

On Mar 2, 3:54 pm, Andrew Poelstra <apoels...@localhost.localdomain>
wrote:
> On 2010-03-02, Francis Moreau <francis.m...@gmail.com> wrote:
>
> > Hello,
>
> > I have the following requirement: 3 ints (a, b, limit). The sum of
> > 'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
> > adjusted so that the sum of 'a' and 'b' is equal to 'limit'.
>
> > Basically this can be written in C like this:
>
> > int a, b, limit;
>
> > /* some code that setup the 3 variables */
>
> > if (a < 0 || b < 0)
> > goto end;
> > if (a + b > b) /* test for overflow (correct with GCC but not
> > portable) */
> > goto end;
> > if (a + b > limit)
> > a -= a + b - limit;
> > if (a < 0)
> > goto end;
>
> > /* ... */
>
> > So this should work (ok this uses an undefined behaviour but this code
> > is only intended to be compiled by GCC) but it looks quite
> > complicated.
>
> > Can anybody think to something more 'elegant' ?
>
> > Thanks
>
> I think:
>
> if(a > limit - b)
> a = limit - b;
>

This doesn't trap overflow or negative values...

> The rest of your code does things you didn't specify, and
> it's not clear where 'end' goes to, so I can't tell how to
> clean up the rest.

'end' goes to "exit(1);" for example. Otherwise 'a' is used to compute
new values.

== 7 of 8 ==
Date: Tues, Mar 2 2010 10:13 am
From: Tim Rentsch

Francis Moreau <francis.moro@gmail.com> writes:

> I have the following requirement: 3 ints (a, b, limit). The sum of
> 'a' and 'b' shouldn't be bigger than limit otherwise 'a' should be
> adjusted so that the sum of 'a' and 'b' is equal to 'limit'.
>
> Basically this can be written in C like this:
>
> int a, b, limit;
>
> /* some code that setup the 3 variables */
>
> if (a < 0 || b < 0)
> goto end;
> if (a + b > b) /* test for overflow (correct with GCC but not
> portable) */

Presumably you meant a + b < b in this test (still bad because
it relies on GCC overflow behavior, but this test is the one
I think you meant).

> goto end;
> if (a + b > limit)
> a -= a + b - limit;
> if (a < 0)
> goto end;
>
> /* ... */
>
> So this should work (ok this uses an undefined behaviour but this code
> is only intended to be compiled by GCC) but it looks quite
> complicated.
>
> Can anybody think to something more 'elegant' ?

if( a<0 || b<0 || limit<b ) goto end;
if( a > limit - b ) a = limit - b;
/* Here a >= 0, b >= 0, a+b <= limit */

== 8 of 8 ==
Date: Tues, Mar 2 2010 10:14 am
From: Andrew Poelstra

On 2010-03-02, Francis Moreau <francis.moro@gmail.com> wrote:
> On Mar 2, 3:54 pm, Andrew Poelstra <apoels...@localhost.localdomain>
> wrote:
>>
>> I think:
>>
>>   if(a > limit - b)
>>     a = limit - b;
>>
>
> This doesn't trap overflow or negative values...
>

Where is there overflow to be trapped?
For negative values, use:

if(a > limit - b)
a = limit - b;

if(a < 0 || b < 0 || limit < 0)
exit(1);

>> The rest of your code does things you didn't specify, and
>> it's not clear where 'end' goes to, so I can't tell how to
>> clean up the rest.
>
> 'end' goes to "exit(1);" for example. Otherwise 'a' is used to compute
> new values.

Given these specifications, the above two conditionals should
be sufficient.

--
Andrew Poelstra
http://www.wpsoftware.net/andrew

==============================================================================
TOPIC: Stylistic questions on UNIX C coding.
http://groups.google.com/group/comp.lang.c/t/51d2b24a60d73f18?hl=en
==============================================================================

== 1 of 7 ==
Date: Tues, Mar 2 2010 6:45 am
From: Kelsey Bjarnason

[snips]

On Mon, 01 Mar 2010 01:06:04 -0800, Nick Keighley wrote:

> one difficulty with embedded <tab-chars> as the only layout character is
> that you lose fine control.
>
> void pippo (int n)
> {
> if ((n == PHOTON) ||
> (n == LEPTON) ||
> (n == HADRON))
> {
> send_msg ("claim nobel!");
> }
> }
>
> I don't see how this layout can survive spaceless layout or variable tab
> stops. Presumably you don't require layout like this.

Frequently. Here's a tip: indent the ( to a tab stop. All lines up -
and allows the individual viewer to work to his own preferences.

>> >> Contrast that to hitting delete on a line which uses spaces instead
>> >> of tabs. All this does is mess up the formatting, as the editor is
>> >> almost certain to treat a space as a _space_, as it should, not as a
>> >> tab, which it _shouldn't_, because the character involved is not a
>> >> tab, but a space.
>>
>> > get a better editor
>>
>> I have a better editor.
>
> :-)

Indeed. The whole spaces-vs-tabs thing is one of those "Obviously, my
way is superior, you heathen (heathfield?) infidel!" issues.

>> One that understands the difference between spaces and tabs. One that
>> does *not* do something as brain-dead as deleting _multiple_ characters
>> when I press delete once. Any editor which deletes multiple items on a
>> single delete simply cannot be trusted, it's liable to destroy
>> something.
>
> no. My editor (this is actually a configurable option) only does this
> for spaces. I assure you it works well in practice.

I wouldn't trust any editor, regardless of configuration, which deleted
multiple characters on a single "delete" keypress. Too freaking
dangerous to be used on anything more important than "to do" lists.

> If I want to insert <tab-chars> (eg. for the dreaded make file) I have
> to change an option on my editor. Normally I do not want to insert
> <tab-chars>

Wait, you have to _reconfigure_ your editor to insert tabs when you hit
tab, because by default, pressing tab doesn't give you a tab, it gives
you spaces? What does spacebar give you? Ampersands? Lemme guess:
"enter" activates "close file without saving". And delete nukes anything
from zero to 8 characters.

>> But then you're on my side of the
>> fence, with spaces being defective by design for indent.
>
> this is phrasing it rather strongly. There are plenty of people who have
> a different opinion from you.

Yes, well, obviously, their opinion is wrong and silly-headed.

(grin, duck, run)

>> If they weren't
>> defective by design, you wouldn't be using tab instead of space.
>
> is that <tab-char> or <tab-key>?

Yes. The tab key puts in tabs. The spacebar puts in spaces. Enter puts
in enter, 'Q' puts in 'Q' and so forth.

> [it's getting repetitive]

Shall we do emacs versus vi next? ;)

>> If your editor - and the notion of spaces for indentation - weren't
>> defective by design, you wouldn't need to use a special key to insert
>> spaces; that's why you have a space bar. The fact you have to resort
>> to something entirely different, the tab key, is prima facie evidence
>> the whole notion of spaces to indent is as defective as it appears.
>
> this is opinion masquerading as fact.

Aren't all such "debates"?

The issue of "stylistic conventions", in the realm of C, is one of those
cases where, beyond a few basic rules of thumb - all-upper-case for
macros, for example - there are no universally agreed upon approaches,
each with its adherents and detractors, each with its benefits and
drawbacks.

>> Now, if your editor worked properly, using tabs instead of spaces, with
>> the tab key inserting tabs as it should, then when viewed on someone
>> else's display, rather than yours, it would show the code as *they*
>> prefer to view it, rather than as *you* have decided is the only true
>> way which everyone should be forced to view it in.
>
> I like finer control over my layout than you do apparently

Quite the contrary. I like delete to delete _one_ character, tab to
insert a tab, and so forth. Not quasi-random side-effects.

>> Really, isn't this just a case of imposing your own layout conventions
>> on others, rather than using a common sense approach which actually
>> lets everyone view the code in their own preferred manner? Without
>> having to defeat the needless additional complication of converting
>> some godawful arrangement you happen to like into something actually
>> manageable?
>
> but you also require others to agree to your conventions.

Not really; at least my approach allows them to layout their code to
their visual preferences, without mucking it up for everyone else.

> I bet you'd hate this approach
>
> Code laid out like this with 4 character indent xxxx
> yyyy
> zzzz
> wwww
>
> and using S and T to represent the "layout characters" actually looked
> like this
>
> xxxx
> SSSSyyyy
> Tzzzz
> TSSSSwwww

Blech. Why do that? Use tabs for indent, the way they're intended.

> the worst of all possible worlds!

No, the worst of all possible worlds would be having to be a maintenance
coder for Jeff Relf's magical news client. :)

== 2 of 7 ==
Date: Tues, Mar 2 2010 7:08 am
From: raltbos@xs4all.nl (Richard Bos)

ram@zedat.fu-berlin.de (Stefan Ram) wrote:

> Eric Sosman <esosman@ieee-dot-org.invalid> writes:
> >double rx, ry, rz; /* position */
> >double vx, vy, vz; /* velocity */
> >double ax, ay, az; /* acceleration */
> >... is, to my eye, a lot more readable than
> >/* position */
> >double rx;
> >double ry;
> >double rz;
>
> You can have your cake and eat it, too:
>
> double rx; double ry; double rz; /* position */
> double vx; double vy; double vz; /* velocity */
> double ax; double ay; double az; /* acceleration */

Yes, there's always _someone_ who prefers the worst of both worlds.

Richard

== 3 of 7 ==
Date: Tues, Mar 2 2010 7:13 am
From: ram@zedat.fu-berlin.de (Stefan Ram)

Kelsey Bjarnason <kbjarnason@gmail.com> writes:
>Frequently. Here's a tip: indent the ( to a tab stop. All lines up -
>and allows the individual viewer to work to his own preferences.

The text

Talphabeta(T(
TTT(

, where 'T' denotes a tab character, will be rendered as

alphabeta( (
(

, when there are tab positions at 8, 16, 24, 32, and so on.
So now, the final parentheses of every line are aligned.

But with tab positions at 4, 8, 12, 16, 20, and so on, it
will be rendered as

alphabeta( (
(

So it is not possible to align characters of a line that
are not the first non-tab character of a line using tabs
in such a way that they will be aligned for every tab width.
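
The column arithmetic behind this can be checked with a few lines of C. This is a sketch only: the function name is made up, tab_width is assumed to be greater than zero, and every character other than a tab is assumed to occupy exactly one column.

```c
#include <stddef.h>

/*
 * Column (0-based) at which the last character of s is drawn when tab
 * stops occur every tab_width columns. A '\t' advances to the next tab
 * stop; any other character advances by one column.
 */
static size_t column_of_last_char(const char *s, size_t tab_width)
{
    size_t col = 0;    /* column where the next character will start */
    size_t last = 0;   /* column where the previous character started */

    for (; *s != '\0'; ++s) {
        last = col;
        if (*s == '\t')
            col += tab_width - col % tab_width;
        else
            ++col;
    }
    return last;
}
```

For the two lines "\talphabeta(\t(" and "\t\t\t(", the final parenthesis lands in column 24 on both lines with a tab width of 8, but in columns 16 and 12 with a tab width of 4.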

== 4 of 7 ==
Date: Tues, Mar 2 2010 7:46 am
From: "bartc"

"Kelsey Bjarnason" <kbjarnason@gmail.com> wrote in message
news:m7ydnd62sLPnuRDWnZ2dnUVZ_tAAAAAA@giganews.com...
> On Mon, 01 Mar 2010 01:06:04 -0800, Nick Keighley wrote:

>> If I want to insert <tab-chars> (eg. for the dreaded make file) I have
>> to change an option on my editor. Normally I do not want to insert
>> <tab-chars>
>
> Wait, you have to _reconfigure_ your editor to insert tabs when you hit
> tab, because by default, pressing tab doesn't give you a tab, it gives
> you spaces? What does spacebar give you? Ampersands? Lemme guess:
> "enter" activates "close file without saving". And delete nukes anything
> from zero to 8 characters.

On my typewriter, Tab moves the carriage up to the next tab-stop. Backspace
moves the carriage back, one space at a time.

So Tab is just a kind of macro on that machine.

To emulate the same behaviour in an editor, you don't want or need to store
actual tab characters, you just need the ability to quickly skip forward or
back to the next tab-stop position.

But one problem is ensuring the same set of tab-stops are used across
editors, or even across files in the same editor.

> Yes. The tab key puts in tabs. The spacebar puts in spaces. Enter puts
> in enter, 'Q' puts in 'Q' and so forth.

Printable characters tend to be inserted into the text. Non-printable chars
and special keys could do anything. I suppose Esc puts in escape characters
and F1 puts in 'F1' characters?

--
Bart

== 5 of 7 ==
Date: Tues, Mar 2 2010 10:23 am
From: Branimir Maksimovic

Nick Keighley wrote:
> On 2 Mar, 09:11, Branimir Maksimovic <bm...@hotmail.com> wrote:
>> Nick Keighley wrote:
>>> On 27 Feb, 16:39, Rich Webb <bbew...@mapson.nozirev.ten> wrote:
>>>> On Sat, 27 Feb 2010 08:30:16 -0800 (PST), Nick Keighley
>>>> <nick_keighley_nos...@hotmail.com> wrote:
>>>>> On 27 Feb, 08:39, James Harris <james.harri...@googlemail.com> wrote:
>
>
>>>>>> [...] what do Windows
>>>>>> users use to enter and edit source code?
>>>>> the IDE, ConText, emacs, Word
>>>> vi! Nowadays likely in its gvim incarnation, of course.
>>> VI VI VI!
>>> the editor of the beast
>> Well, when I used VI it was because there wasn't anything
>> better on machine. Who said: VI editor that beeps and corrupts your
>> files?
>> Once I encrypted my source code by accident with that thing.
>
>
> "The Real Programmer wants a "you asked for it, you got it"
> text editor--complicated, cryptic, powerful, unforgiving,
> dangerous. TECO, to be precise."
Heh, I prefer Joe Allen's editor and using it for development ;)

http://joe-editor.sourceforge.net/index.html

Best editor out there IMO! ;)

Greets

== 6 of 7 ==
Date: Tues, Mar 2 2010 10:45 am
From: Kelsey Bjarnason

[snips]

On Tue, 02 Mar 2010 15:13:46 +0000, Stefan Ram wrote:

> So it is not possible to align characters of a line that are not the
> first non-tab character of a line using tabs in such a way that they
> will be aligned for every tab width.

And thus, we have a conclusive argument for using spaces, which are
guaranteed never to work for any setup but the original author's, *and*
fail in a way virtually impossible to recover from?

Missed a step there, I think. :)

== 7 of 7 ==
Date: Tues, Mar 2 2010 12:48 pm
From: Tim Streater

On 02/03/2010 18:23, Branimir Maksimovic wrote:
> Nick Keighley wrote:
>> On 2 Mar, 09:11, Branimir Maksimovic <bm...@hotmail.com> wrote:
>>> Nick Keighley wrote:
>>>> On 27 Feb, 16:39, Rich Webb <bbew...@mapson.nozirev.ten> wrote:
>>>>> On Sat, 27 Feb 2010 08:30:16 -0800 (PST), Nick Keighley
>>>>> <nick_keighley_nos...@hotmail.com> wrote:
>>>>>> On 27 Feb, 08:39, James Harris <james.harri...@googlemail.com> wrote:
>>
>>
>>>>>>> [...] what do Windows
>>>>>>> users use to enter and edit source code?
>>>>>> the IDE, ConText, emacs, Word
>>>>> vi! Nowadays likely in its gvim incarnation, of course.
>>>> VI VI VI!
>>>> the editor of the beast
>>> Well, when I used VI it was because there wasn't anything
>>> better on machine. Who said: VI editor that beeps and corrupts your
>>> files?
>>> Once I encrypted my source code by accident with that thing.
>>
>>
>> "The Real Programmer wants a "you asked for it, you got it"
>> text editor--complicated, cryptic, powerful, unforgiving,
>> dangerous. TECO, to be precise."
> Heh, I prefer Joe Allen's editor and using it for development ;)
>
> http://joe-editor.sourceforge.net/index.html
>
> Best editor out there IMO! ;)

Revolting. Looks like a DOS editor. When was this written, 1980?

--
Tim

"That the freedom of speech and debates or proceedings in Parliament
ought not to be impeached or questioned in any court or place out of
Parliament"

Bill of Rights 1689

==============================================================================
TOPIC: UTF-8 and wchar_t
http://groups.google.com/group/comp.lang.c/t/6e69f9f50e29243f?hl=en
==============================================================================

== 1 of 4 ==
Date: Tues, Mar 2 2010 7:03 am
From: Mikko Rauhala

On Tue, 02 Mar 2010 14:28:50 +0100, Michal Nazarewicz <mina86@tlen.pl> wrote:
> On the other hand, is my understanding correct that if
> __STDC_ISO_10646__ macro is defined then wchar_t in fact represents
> Unicode code points? If so then I could check for that macro and signal
> #error if it's not defined, right?

Yes, but note also that both 16-bit and 32-bit Unicode wchar_t
implementations are in use (the former on Windows and latter
on some *nix systems at least).

(Don't know about wprintf() and unrepresentable characters offhand.)
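
Both points can be probed at run time instead of with #error. The sketch below uses invented function names and reports only what the compiler exposes; it does not say how a particular platform encodes characters above U+FFFF when wchar_t is 16 bits wide.

```c
#include <limits.h>
#include <wchar.h>

/* Returns 1 if the implementation promises ISO 10646 values in wchar_t. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bits, e.g. 16 or 32. */
static unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}
```

A program that needs one wchar_t per code point would want wchar_is_iso10646() to return 1 and wchar_bits() to be at least 21.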

--
Mikko Rauhala <mjr@iki.fi> - http://www.iki.fi/mjr/blog/
The Finnish Pirate Party - http://piraattipuolue.fi/
World Transhumanist Association - http://transhumanism.org/
Singularity Institute - http://singinst.org/

== 2 of 4 ==
Date: Tues, Mar 2 2010 10:16 am
From: lacos@ludens.elte.hu (Ersek, Laszlo)

In article <87hboyc0od.fsf@erwin.mina86.com>, Michal Nazarewicz <mina86@tlen.pl> writes:
> Hello everyone,
>
> I am facing a situation where I need to handle UTF-8 input along with
> input from standard input (ie. locale dependent multibyte). In the end,
> after some computations, concatenations, etc I need to output it to
> standard output (again locale dependent multibyte).
>
> What I want to do is convert both the UTF-8 input as well as data from
> standard input to an array of wchar_t and then output it using wprintf()
> (or one of the other "wide" functions).

Handle stdin and stdout like you intend, ie. with setlocale() and the
implicit conversion provided by <stdio.h> functions.

For the UTF-8 input coming from elsewhere: if you can stick with glibc,
just call

#include <iconv.h>

convdesc = iconv_open("WCHAR_T", "UTF-8");

http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html

Otherwise, you'll have to switch at least the LC_CTYPE locale category
manually, and proceed with the separate input like with stdin.
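
A minimal sketch of the iconv route follows. It assumes glibc, where "WCHAR_T" is accepted as an encoding name; the function name utf8_to_wcs and its return convention are invented for illustration, and error handling is reduced to a single failure value.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert a NUL-terminated UTF-8 string into dst, which has room for
 * dst_len wide characters including the terminator. Returns the number
 * of wide characters written (terminator excluded), or (size_t)-1 on
 * any failure: bad input, output too small, or no such conversion.
 */
static size_t utf8_to_wcs(const char *src, wchar_t *dst, size_t dst_len)
{
    if (dst_len == 0)
        return (size_t)-1;

    /* "WCHAR_T" as an encoding name is a glibc extension (assumption). */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;             /* iconv() does not modify the input */
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = (dst_len - 1) * sizeof(wchar_t);

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    size_t written = (dst_len - 1) - out_left / sizeof(wchar_t);
    dst[written] = L'\0';
    return written;
}
```

On a glibc system, converting the three bytes 'h', 0xC3, 0xA9 should yield two wide characters, 'h' and 0xE9.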

Cheers,
lacos

== 3 of 4 ==
Date: Tues, Mar 2 2010 11:20 am
From: lacos@ludens.elte.hu (Ersek, Laszlo)

In article <87hboyc0od.fsf@erwin.mina86.com>, Michal Nazarewicz <mina86@tlen.pl> writes:

> Also, what happens when I say to wprintf() a string which contains wide
> character which has no representation in current locale (ie. some funky
> unicode character where locale is set to ISO-8859-1 encoding)?

wprintf() will return a negative value [and errno will be set to EILSEQ].

> Can I somehow instruct the standard library function to print, say,
> a question mark in such situations or do I have to handle such cases by
> myself?

On second thought, you might be better off if you converted the output
with iconv() too, from WCHAR_T to the codeset used by the current
locale.

http://www.opengroup.org/onlinepubs/007908775/xsh/iconv.html
----v----
If iconv() encounters a character in the input buffer that is valid, but
for which an identical character does not exist in the target codeset,
iconv() performs an implementation-dependent conversion on this
character.
----^----

(You would have to test this.)

You should be able to get the codeset used by the current locale by
calling

nl_langinfo(CODESET)

http://www.opengroup.org/onlinepubs/007908775/xsh/nl_langinfo.html

(Sorry for being glibc/SUSv2-specific.)
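
If iconv() is not available, the question-mark substitution can also be done by hand with standard C. Note that this is a different technique from the iconv() suggestion above: it uses wcrtomb(), which returns (size_t)-1 for a wide character the current locale cannot represent. The function name is invented for illustration, and the sketch assumes a stateless target encoding, so no final shift sequence is emitted.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert src to the current locale's multibyte encoding, writing '?'
 * for every wide character that wcrtomb() rejects. dst has room for
 * dst_len bytes including the terminator.
 * Returns 0 on success, -1 if dst is too small.
 */
static int wcs_to_mbs_lossy(const wchar_t *src, char *dst, size_t dst_len)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];
    size_t used = 0;

    memset(&st, 0, sizeof st);
    for (; *src != L'\0'; ++src) {
        size_t n = wcrtomb(buf, *src, &st);
        if (n == (size_t)-1) {
            /* Unrepresentable: the state is unspecified now, so reset it. */
            memset(&st, 0, sizeof st);
            buf[0] = '?';
            n = 1;
        }
        if (used + n >= dst_len)        /* keep one byte for the '\0' */
            return -1;
        memcpy(dst + used, buf, n);
        used += n;
    }
    if (used >= dst_len)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

The result depends on the locale selected with setlocale(). In the default "C" locale a character such as U+20AC is not representable, so it comes out as '?'.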

Cheers,
lacos

== 4 of 4 ==
Date: Tues, Mar 2 2010 12:42 pm
From: Michal Nazarewicz

> In article <87hboyc0od.fsf@erwin.mina86.com>,
> Michal Nazarewicz <mina86@tlen.pl> writes:
>> Also, what happens when I say to wprintf() a string which contains wide
>> character which has no representation in current locale (ie. some funky
>> unicode character where locale is set to ISO-8859-1 encoding)?

lacos@ludens.elte.hu (Ersek, Laszlo) writes:
> wprintf() will return a negative value [and errno will be set to EILSEQ].

>> Can I somehow instruct the standard library function to print, say,
>> a question mark in such situations or do I have to handle such cases by
>> myself?
>
> On second thought, you might be better off if you converted the output
> with iconv() too, from WCHAR_T to the codeset used by the current
> locale.
[...]

> (Sorry for being glibc/SUSv2-specific.)

Thanks for all the links and information. I have been considering
iconv() but didn't notice that it can do conversion to/from wchar_t as
well and that was my biggest concern. I'll be sure to look more into
it.

Being SUS specific is not a big issue since perfect portability is not
my goal (ie. what Mikko Rauhala confirmed earlier about
__STDC_ISO_10646__ (thanks Mikko!) was enough for me as I imagine
"major" implementations use unicode as internal representation of
wchar_t) however depending on glibc may hurt me a bit as my code won't
quite work on, say, BSD then.

Anyway, thanks for all the comments and links!

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86*tlen.pl>--<jid:mina86*jabber.org>--ooO--(_)--Ooo--

==============================================================================
TOPIC: Edward Nilges' lie
http://groups.google.com/group/comp.lang.c/t/14c6f4a4afe68f60?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 8:42 am
From: Seebs

On 2010-03-02, Nick Keighley <nick_keighley_nospam@hotmail.com> wrote:
> On 1 Mar, 04:18, spinoza1111 <spinoza1...@yahoo.com> wrote:
><stuff>

> I thought you weren't posting anymore?

Please read the subject line.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

==============================================================================
TOPIC: Warning to newbies
http://groups.google.com/group/comp.lang.c/t/9597fd702985dff4?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 1:03 pm
From: Tim Rentsch

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

> "io_x" <a@b.c.invalid> writes:
>
>> "Ben Bacarisse" <ben.usenet@bsb.me.uk> ha scritto nel messaggio
>> news:0.e6be55c75fde5c5b1f8e.20100207132700GMT.87ljf5w50b.fsf@bsb.me.uk...
>>> Willem <willem@stack.nl> writes:
>>>
>>>> santosh wrote:
>>>> ) Anyway, I didn't follow the previous discussions in the thread
>>>> ) closely, but for what it appears to do, your code appears to be an
>>>> ) overkill. What about a solution involving strstr(), malloc()/realloc()
>>>> ) and strcat()?
>>>>
>>>> I've never liked strcat(), and I'm not sure about the performance of
>>>> realloc(), so this is roughly what I would do:
>>> <snip code>
>>>
>>> Snap! Here is what I'd do. I think it is structurally the same:
>>>
>>> char *replace(const char *src, const char *match, const char *replacement)
>>> {
>>> size_t mlen = strlen(match), matches = 0;
>>> for (const char *t, *s = src; t = strstr(s, match); s = t + mlen)
>>> ++matches;
>>> size_t rlen = strlen(replacement);
>>> char *result = malloc(strlen(src) + matches*rlen - matches*mlen + 1);
>>
>> How are you so sure that
>> strlen(src) + matches*rlen - matches*mlen + 1
>> not overflow the unsigned type?
>
> I'm not. In fact I know that there can't be any simple expression
> that does not cause the result to be wrong because it can be wrong "by
> logic" so to speak. If the input is "axxxx....xxx" of length SIZE_MAX
> replacing the 'a' with "bb" will result in a string whose length is
> not representable in a size_t.
>
> You have a good point none the less. I should have written
>
> strlen(src) - matches*mlen + matches*rlen + 1
>
> because it will work for a wider range of inputs.

Eh? If SIZE_MAX > INT_MAX (and it almost certainly is), these
two expressions will have exactly the same value. Are you
worried about the case where SIZE_MAX <= INT_MAX?
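
The point can be checked directly. The sketch below assumes the usual situation where SIZE_MAX > INT_MAX, so size_t is not promoted to int and all arithmetic is carried out modulo SIZE_MAX + 1; the function names are invented for illustration.

```c
#include <stddef.h>

/* Order used in the original replace(): add first, then subtract. */
static size_t result_size_add_first(size_t src_len, size_t matches,
                                    size_t mlen, size_t rlen)
{
    return src_len + matches * rlen - matches * mlen + 1;
}

/* Proposed order: subtract first, then add. */
static size_t result_size_sub_first(size_t src_len, size_t matches,
                                    size_t mlen, size_t rlen)
{
    return src_len - matches * mlen + matches * rlen + 1;
}
```

Even when an intermediate sum wraps around, for example with src_len = SIZE_MAX - 1, one match, mlen = 4 and rlen = 2, both functions return SIZE_MAX - 2. Reordering the terms therefore does not widen the range of inputs that work.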

==============================================================================
TOPIC: newbie question on understanding the main() function
http://groups.google.com/group/comp.lang.c/t/47facf1454751d70?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Mar 2 2010 1:12 pm
From: Tim Rentsch

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

> Philip Potter <pgp@doc.ic.ac.uk> writes:
>
>> On 08/02/2010 13:27, Ben Bacarisse wrote:
>>> An incorrect definition of
>>> main is not a constraint violation (I am not sure why, but there it
>>> is) so the compiler can be silent about it if it chooses to be.
>>
>> One reason might be that implementations are free to allow alternative
>> definitions of main() other than int main(void) and int main(int,
>> char**). For example, the fairly widely used
>> int main(int argc, char **argv, char **envp)
>> would violate any constraint requiring main to be one of the two
>> standard prototypes.
>
> I'd have thought that the wording could permit implementation-defined
> alternatives (without a diagnostic) but I take your point.
>
> Another way would be to do what is done with constant expressions --
> extensions are allowed but anything that is not a constant expression
> as defined by the standard must be diagnosed. [snip]

As far as I know implementation-specific constant expressions are
not required to be given a diagnostic (assuming 6.6p3 isn't
violated), since 6.6p10 explicitly and specifically allows them
as constant expressions. Do you have a supporting citation? No
constraints are violated as far as I know.

==============================================================================


Tuesday, March 2, 2010