Our hardware-based skinning doesn't work on certain Intel graphics
devices. Fall back to software skinning on GPUs that return an OpenGL
3.1 context even though we explicitly request a 3.2 one.
Might fix#1459, #1689.
It isn't enough to check the top-level mesh, because attached meshes also
get a transformation from script. At least cotton branches sometimes do
that.
So check the matrix directly before using it and skip rendering.
Not sure why this broke, maybe glGetIntegerv does not return the current
polygon mode anymore. Either way, we don't need to remember the previous
setting but can just always reset it to GL_FILL afterwards.
The buffer offset was encoded in the VAO which would be shared across all
submeshes. Instead, don't encode the buffer offset into the VAO but apply it
with glDrawElementsBaseVertex().
This is required for glDrawElements() with a core profile. Furthermore, for
meshes that have a fixed face ordering (non-transparent meshes at 100%
completion), this puts more static data on the GPU that we don't have to
upload every frame.
For meshes with non-fixed face ordering, we still have to upload faces every
frame, since the ordering might change if the mesh rotates. We could still
improve this by figuring out if the order actually changed after the sort
step, and only update to the GPU if it has. In many cases it hasn't, so there
is some potential for more optimization there. But that's for later.
The DynamiteBox sets the PictureTransformation to a degenerate matrix, to
prevent it from being rendered. Exit early in this case, since we are not
going to render anything anyway, and avoid the assertion when inverting the
matrix.
Replace the hardcoded VAI_ constants with that. The VAI constants are now
moved to C4DrawGL as C4SSA_ constants, similar to the C4SSU ones. This allows
to introduce other attributes to replace vertex positions, normals, colors
and texture coordinates with attributes in later commits. This, in turn, is
needed because the built-in attributes are no longer available in the OpenGL
core profile.
Also, while at it, cleanup C4Shader a bit. Use std::vector<> instead of
maintaining an own array of uniform names. Delete the vertex and fragment
objects after the full shader program has been linked. Make sure that
C4Shader::Init keeps the old program in place if the new one cannot be
compiled or linked.
Code outside of C4FoWRegion should not care about the internals of the
class. Therefore, we remove direct access to the backing surface (and
secondary buffer surface) and replace it instead with accessors that
return those few values that are required by outside code.
In the game initialization code, we're setting the thread locale to the
user's default locale. This is bad when generating shader code because
it may lead to floats being written with commas instead of dots for the
decimal separator, which in turn confuses the GLSL compiler (because it
parses the comma as the comma operator). So now we just temporarily set
the C locale while generating shader code.
The C++ standard library comes with perfectly fine implementations of
these functions, so there's no point in reimplementing them just for the
hell of it.
It looks like sometimes graphics drivers don't report the correct number for
maximum uniforms, since the workaround was enabled even though Newton
confirmed disabling the workaround worked just fine on his GPU. Therefore,
don't listen to what the graphics driver is saying but just try to compile the
shader, and fall back to the workaround if the compilation fails.
In comparison to the old system, this is a downgrade - instead of being
able to set a full color mapping by gamma ramp, we now get just a value
per colour channel.
Upside is that we do not need to play around with the global gamma ramps
any more, which was arguably the wrong way to do it.
This commit will likely break everything that has been using gamma so far.
Instead of doing the transformation when drawing a mesh. This allows making
the OpenGL normal matrix more consistent, since it does not include the
Ogre-To-Clonk transformation, and so that the transformation does not need
to be inverted in the shader.
As a side effect, all Attach transformations were updated, since before
they were specified in the OGRE reference frame, not the Clonk reference
frame.
4x3 matrices use the same number of uniform components as 4x4 ones.
If we're short on uniform components, don't transpose the transformation
matrix before sending it to the shader, and transpose it in the shader
itself instead, saving 4 components per bone.
The last row of the bone transformation matrix always is 0,0,0,1 so
there's no point in uploading it. Also reducing the max bone count to 80
which means the uniform array will fit into the available space on 6000
and 7000 series Geforce GPUs.
As long as we're not actually using a different shader for meshes
without bones, we need to upload an identity matrix so there's defined
data in the bone slot.
Instead of transforming all vertices on the CPU every time an animation
progresses, we now only recalculate the skeleton, leaving the heavy
lifting for the GPU. This also means we no longer have to push all
vertices onto the bus every frame, because the mesh isn't changing and
can therefore be stored in a GL_STATIC_DRAW VBO when it's first loaded.
The downside of this approach is that there's only a limited number of
uniforms and vertex attributes we can pass to the shader. At the moment
these limits are a maximum of 128 bones per skeleton, and no vertex can
be influenced by more than 8 bones at once. So far this is no problem,
as the most complex skeleton in the base game uses less than 64 bones
and no more than 6 bone weights per vertex.
Instead of having the default vertex shader hard-coded into the engine,
allow to load it from Graphics.ocg. There's still a fall-back version
wired into the engine because we can't return an error from
GetVertexShaderCodeForPass.
While we're still not doing skinning on the GPU, copying the vertex data
to a VBO immediately after updating the animation allows us to re-use
that data for unanimated meshes. It also allows us to store unanimated
data on the GPU, instead of transferring it over the bus for each frame.
This should improve cache coherency by having all surface tiles adjacent
instead of strewn across the heap. This will also remove an indirection
in the common case of only using one tile.
With this change, an additional rectangle is stored in C4FoWRegion that
represents the area covered by the viewport in fractional floating point
coordinates. This allows the light texture to be created for an arbitrary
portion of the landscape, and the coordinate transformations for the
shaders will still work.
Also, since the additional rectangle uses floating point precision, the
computed coordinate transformations do now give the exact same result as for
the landscape pixel-by-pixel, and there should not be any offsets left.
I also hope that this change improves or fixes the single-pixel-lines of sky
that are sometimes seen at the edges of the viewport.
This gets rid of GL state changes for questionable gain. It also fixes drawing
of fade sky backgrounds in global viewports (where, for some reason, the shade
model was set to GL_FLAT instead of GL_SMOOTH).
There were two problems with the previous transforms:
1) For inverting the Y axis for the ambient map, the total height of the
output window is needed, not only the viewport region.
2) The Y offset to only use the part of the light texture that is being
rendered to was not applied.
In order to keep the transformations more readable, a new lightweight class
C4FragTransform has been introduced which can only handle translations
and scales in x and y.