Details
Summary: Bytecode Format Enhancements Needed
Bug#: 402
Product: libraries
Component: Bitcode Writer
Status: RESOLVED
Resolution: FIXED
Hardware: All
OS: All
Version: trunk
Priority: P2
Severity: enhancement
Target Milestone: ---

: 263 392
People
Reporter: Reid Spencer <rspencer@reidspencer.com>
Assigned To: Reid Spencer <rspencer@reidspencer.com>

Description:   Opened: 2004-07-07 10:02
This bug is just to capture some enhancements to the bytecode format that 
we're planning for 1.3 so that we cover as many changes as possible in release 
1.3 and not disrupt users again in 1.4.

Encode Types As 24-bit Quantities. 
==================================
We need a new primitive, uint24_vbr, to encode types. Because of the use of 
bit fields for global variables and elsewhere, types are currently not fully 
32-bit quantities (see bug 392). The recommended plan is to always encode 
types into 24-bit fields, but provide for extension by using the value (2^24)-1
as an indicator that what follows is a uint64_vbr that contains the type. It 
is unlikely that many, if any, bytecode files will need more than 16 million 
distinct types.
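
A minimal sketch of the escape scheme described above, not the actual writer code. The function names (writeVBR, readVBR, writeTypeID, readTypeID) are hypothetical, and the VBR byte layout (7 payload bits per byte, low-order group first, high bit set on every byte but the last) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Escape value from the description: (2^24)-1.
constexpr std::uint64_t kTypeEscape = (1u << 24) - 1;

// Assumed VBR layout: 7 payload bits per byte, low-order group first,
// high bit set on every byte except the last one.
void writeVBR(std::vector<std::uint8_t> &out, std::uint64_t v) {
  while (v > 0x7F) {
    out.push_back(static_cast<std::uint8_t>(0x80 | (v & 0x7F)));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

std::uint64_t readVBR(const std::vector<std::uint8_t> &in, std::size_t &pos) {
  std::uint64_t v = 0;
  unsigned shift = 0;
  while (true) {
    assert(shift < 64 && "VBR value does not fit in 64 bits");
    std::uint8_t b = in.at(pos++);  // at() throws on truncated input
    v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
    if (!(b & 0x80))
      return v;
    shift += 7;
  }
}

// Type IDs below the escape value are written directly. Anything else
// (including the escape value itself) is written as the escape marker
// followed by the full ID as a 64-bit VBR.
void writeTypeID(std::vector<std::uint8_t> &out, std::uint64_t id) {
  if (id < kTypeEscape) {
    writeVBR(out, id);
  } else {
    writeVBR(out, kTypeEscape);
    writeVBR(out, id);
  }
}

std::uint64_t readTypeID(const std::vector<std::uint8_t> &in, std::size_t &pos) {
  std::uint64_t v = readVBR(in, pos);
  if (v == kTypeEscape)
    v = readVBR(in, pos);  // extended form: the real ID follows
  return v;
}
```

Note that a type ID exactly equal to the escape value has to take the extended form, otherwise the reader could not tell it apart from the marker.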

VBRize the Block Headers
========================
While block headers are currently only 8 bytes, in very small files (say, one 
containing a few types) their overhead becomes quite large. We can skip the 
alignment of these fields (possibly saving a few bytes) and pack both the block 
type and block length into a single uint32_vbr. This will provide 8 bits for 
the block type (it's doubtful we'll ever need more than 256 block types) and 
24 bits for the block length. Similarly, it's doubtful we'd ever need a 
single block longer than 16 MBytes.
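
The 8/24 split proposed above could be sketched as follows. The description only gives the field widths, so placing the block type in the high 8 bits is an assumption, and the helper names (packBlockHeader, blockType, blockLength) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kLengthBits = 24;
constexpr std::uint32_t kLengthMask = (1u << kLengthBits) - 1;

// Pack an 8-bit block type and a 24-bit block length into one 32-bit word.
// Assumption: type occupies the high 8 bits, length the low 24 bits.
std::uint32_t packBlockHeader(std::uint8_t type, std::uint32_t length) {
  assert(length <= kLengthMask && "block length needs more than 24 bits");
  return (static_cast<std::uint32_t>(type) << kLengthBits) | length;
}

std::uint8_t blockType(std::uint32_t header) {
  return static_cast<std::uint8_t>(header >> kLengthBits);
}

std::uint32_t blockLength(std::uint32_t header) {
  return header & kLengthMask;
}
```

Under this layout a block of type 0x12 and length 0x345678 packs to the single word 0x12345678, which would then be written as one uint32_vbr.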

CDRize Binary Data Content
==========================
We should use a standard for representing various binary quantities in the 
bytecode file. Integers are pretty much handled by VBR. However, float and 
double types should be regularized to IEEE format and written according to a 
cross-platform standard such as CDR (CORBA), NDR (DCE RPC), or XDR (Sun RPC). 
CDR is the most modern but has its shortcomings. There might be other 
applicable standards too. Strings should be regularized to a standard format 
as well.
------- Comment #1 From Chris Lattner 2004-07-07 16:06:54 -------
I'm not sure that I understand the final item.  We already standardize on IEEE 
floating point, in little endian mode.  Things may get sticky when and if we 
ever support a target that uses non-IEEE floating point, but this may never 
happen.  Also, I don't understand your point about strings: LLVM has no support 
for strings (and doesn't want it).

-Chris
------- Comment #2 From Reid Spencer 2004-07-07 18:52:04 -------
My point in suggesting CDR was exactly that: the "what ifs" of the future. If we
were to just say "we encode with CDR" then that settles all questions for now
and the future. I find it much preferable to say "we encode using CDR rules"
rather than provide a long list and description of the way we encode various
fundamental data types. I think the users would appreciate it too. The only
question is whether CDR is the right choice for a rule set. It's not
particularly compressed.

As for strings, we most certainly do have them and we encode them little endian.
Symbol tables have strings and we handle the global string constants very
specially in the bytecode format. 
------- Comment #3 From Chris Lattner 2004-07-07 18:56:02 -------
Okay, let me restate this.  If we end up supporting other FP formats, a LOT of 
other stuff will have to change as well.  I would much rather make this change 
lazily, rather than build in something up-front that we don't have any 
experience with and we don't know that we need.

w.r.t. strings, we most certainly do not have them.  :)  What we have are 
arrays of bytes, not strings.  They happen to commonly be used as strings by 
certain front-ends (like the C front-end), but they aren't special in any way.  
Likewise with SymbolTable entries, they are just arrays of bytes, not "strings".

-Chris
------- Comment #4 From Reid Spencer 2004-07-07 23:04:07 -------
Some more rebuttal:

Choosing CDR as a format says nothing about how it gets implemented. I agree
that we should do things incrementally. Right now, the only part of CDR that
we'd implement is the way IEEE floats and doubles are encoded. If we end up
supporting a platform that doesn't have IEEE fp then we'd have to deal with that
at the time but the end result would be to encode it as IEEE fp using CDR.
Making the decision to use CDR gives some stability to our specification of
bytecode files without saying anything about how we go about implementing it. 
As for experience with it, I have plenty. I was a CORBA architect for AT&T
Wireless for two years.  Hence, my slant towards CDR. If you put aside the
implementation aspect of this change, what do you have against CDR? What other
alternatives do you suggest? Nothing? One off, piecemeal implementation that
isn't compatible with anything else out there? Why do we need to re-invent this
wheel?

As for strings, I think we're into parsing semantics here. I was suggesting
string in the classical notion as an ordered list of characters, not suggesting
that we store std::string. In that sense the "array of char" is the string I'm
talking about. Would it be okay with you if I stored strings in the bytecode
with all the even indexed characters first and then all the odd indexed ones? I
think not, little-endian "strings" are what we store, what is natural, and what
is common in various specs like CDR.  Let's stop having this silly discussion.
------- Comment #5 From Chris Lattner 2004-07-07 23:35:58 -------
Disclaimer (should have said this before): I know nothing about CDR.

> As for experience with it, I have plenty.

I'm sorry, I didn't mean experience with CDR, I meant experience with the future
problems we will run into with the .bc file format. :)

> What other alternatives do you suggest? Nothing?

I suggest we stay with what we have until there is a reason to change it.

> In that sense the "array of char" is the string I'm talking about.

Okay.  I just want to make sure that it is absolutely clear that what we are
storing in the .bc files is not sufficient for, say, Java or MSIL strings. 
Also, it is not even true that we are storing C strings either.  We are
literally special casing random arrays of bytes, that's all.  They just happen
to be commonly used as strings.

In any case, I don't think there is much point in continuing this discussion in
this bug!

-Chris
------- Comment #6 From Reid Spencer 2004-07-07 23:41:05 -------
Agreed.

I'll drop the CDR discussion for now but should we start extending to esoteric
platforms or getting interest from users about using common data
representations, I will once again become a strong advocate of CDR :)

No further comments on the string or CDR topics needed.

However, there are possibly other small incremental bc enhancements that could
be made before 1.3. We should document those here.
------- Comment #7 From Chris Lattner 2004-07-08 00:42:58 -------
> I'll drop the CDR discussion for now but should we start extending to esoteric
> platforms or getting interest from users about using common data
> representations, I will once again become a strong advocate of CDR :)

Please do!

> However, there are possibly other small incremental bc enhancements that could
> be made before 1.3. We should document those here.

Sounds great.  Bug 263 is one of them :)

-Chris
------- Comment #8 From Reid Spencer 2004-07-25 12:44:23 -------
Mine
------- Comment #9 From Reid Spencer 2004-07-25 16:23:40 -------
Types have been made 24-bit quantities that overflow to 32-bit if necessary.
Block headers could not be made vbr because the size field has to be constant
size in order for the fixup logic to work. However, the header has been reduced
by 50% to a single 32-bit quantity. CDRizing various types won't be done until
necessary, so this bug is done.
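
To illustrate why a fixed-size field matters for fixup: the writer does not know a block's size until the body has been emitted, so it reserves space for the size up front and patches it afterwards. The sketch below is not the actual writer code; the names (beginBlock, endBlock, writeU32At) are hypothetical, the header carries only a size for brevity, and little-endian byte order is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Overwrite 4 bytes at pos with v, little-endian (assumed byte order).
void writeU32At(Buffer &out, std::size_t pos, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out[pos + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Reserve a fixed 4-byte placeholder for the size and remember where it is.
std::size_t beginBlock(Buffer &out) {
  std::size_t headerPos = out.size();
  out.resize(out.size() + 4);  // zero-filled placeholder
  return headerPos;
}

// Once the body is written, patch the placeholder in place. This works
// only because the field has the same width regardless of the value.
void endBlock(Buffer &out, std::size_t headerPos) {
  std::size_t bodySize = out.size() - (headerPos + 4);
  assert(bodySize <= UINT32_MAX && "block body too large for 32-bit size");
  writeU32At(out, headerPos, static_cast<std::uint32_t>(bodySize));
}
```

With a vbr size field, the number of bytes needed would depend on the final size, so the placeholder might turn out too small or too large and the already-written body would have to be shifted.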
